What is “Regularization” and how does it help with overfitting?

Shadi Balandeh
2 min read · Jan 22, 2024


Would you always select the model with a higher goodness-of-fit number?

If it has a higher R-squared, it must do better in predicting future values, right?

Well, not necessarily!

The culprit is 𝐨𝐯𝐞𝐫𝐟𝐢𝐭𝐭𝐢𝐧𝐠, and a solution often overlooked is “𝐫𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧”.

Overfitting is a common issue in machine learning where a model learns the details and noise in the training data to such an extent that it hurts the model’s performance on new, unseen data.

Essentially, the model has memorized the training data rather than learning to generalize from it.

There are various ways to handle overfitting, such as using cross-validation to detect it or collecting more training data.

💡 However, one equally effective technique is ‘𝐫𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧’.

Regularization works by adding a penalty to the model’s complexity, encouraging it to be simpler and to focus on the main trends in the data rather than memorizing specific details.

This makes the model more robust, less sensitive to noise, and more generalizable to new data.

There are two primary regularization techniques: L1 and L2.
Both add a penalty term to the model’s loss function, but they do it differently.

𝐋1 𝐑𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧 (𝐋𝐚𝐬𝐬𝐨): It adds the sum of the absolute values of the coefficients as the penalty. This can shrink some coefficients exactly to zero, effectively selecting only the most relevant features.

𝐋2 𝐑𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧 (𝐑𝐢𝐝𝐠𝐞): It adds the sum of the squared coefficients as the penalty, which spreads the shrinkage across all features, pulling them closer to zero but rarely exactly to zero.
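
To make this concrete, here is a minimal scikit-learn sketch comparing plain linear regression with Lasso (L1) and Ridge (L2). The synthetic data and the alpha values (which control the strength of the penalty) are purely illustrative.

```python
# A minimal sketch (assumes scikit-learn is installed); the data and
# alpha values below are illustrative, not recommendations.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data: 5 informative features plus 15 pure-noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("OLS (no penalty)", LinearRegression()),
    ("L1 / Lasso", Lasso(alpha=0.1)),   # penalty: alpha * sum(|coef|)
    ("L2 / Ridge", Ridge(alpha=1.0)),   # penalty: alpha * sum(coef**2)
]:
    model.fit(X_train, y_train)
    zeroed = np.sum(np.isclose(model.coef_, 0.0))
    print(f"{name}: test R^2 = {model.score(X_test, y_test):.3f}, "
          f"coefficients exactly zero = {zeroed}")
```

Typically, Lasso drives most of the noise-feature coefficients to exactly zero, while Ridge keeps them small but nonzero.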

𝐑𝐞𝐠𝐮𝐥𝐚𝐫𝐢𝐳𝐚𝐭𝐢𝐨𝐧 𝐢𝐧 𝐃𝐞𝐞𝐩 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠

Deep learning models, with their vast number of parameters, are particularly prone to overfitting.

Two popular techniques to prevent this are Dropout and Early Stopping.

𝐃𝐫𝐨𝐩𝐨𝐮𝐭: It randomly ‘drops’ (temporarily zeroes out) a fraction of neurons at each training step, ensuring that the network doesn’t become overly reliant on any specific neuron.
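
As a rough illustration, here is how Dropout layers might be added to a small Keras network; the layer sizes and the 0.3 dropout rate are illustrative choices, not recommendations.

```python
# A minimal Keras sketch of Dropout (assumes TensorFlow is installed).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),  # zeroes out 30% of activations each training step
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```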

𝐄𝐚𝐫𝐥𝐲 𝐒𝐭𝐨𝐩𝐩𝐢𝐧𝐠: It monitors the model’s performance on a validation set and stops training once the performance starts deteriorating.
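
And here is a sketch of early stopping with Keras’s EarlyStopping callback, continuing from the Dropout model above; the data arrays and the patience setting are placeholders.

```python
# Continues from the Dropout sketch ("model" and "tf" are assumed to be
# defined there); the random data and patience value are placeholders.
import numpy as np

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(800, 20)), rng.normal(size=(800,))
X_val, y_val = rng.normal(size=(200, 20)), rng.normal(size=(200,))

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the loss on the validation set
    patience=5,                  # stop after 5 epochs with no improvement
    restore_best_weights=True,   # roll back to the weights of the best epoch
)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=200,
    callbacks=[early_stop],      # training halts early if val_loss stalls
)
```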

Popular Python libraries provide ready-made implementations of all four techniques: scikit-learn for Lasso and Ridge, and deep learning frameworks such as Keras or PyTorch for Dropout and early stopping.

Remember, in machine learning, sometimes less is more!

📢 I regularly write about common data science mistakes and pitfalls of data-driven decision making in simple words.
Please follow to join along!
