Lecture "Applied data science: Regularisation" includes content: variable subset selection, shrinkage methods, dimension reduction, considerations in high dimensions,... We invite you to consult!
- Regularisation
- Overview
1. Introduction
2. Application
3. EDA
4. Learning Process
5. Bias-Variance Tradeoff
6. Regression (review)
7. Classification
8. Validation
9. Regularisation
10. Clustering
11. Evaluation
12. Deployment
13. Ethics
- Lecture outline
● Variable subset selection
○ Best subset selection
○ Stepwise selection methods - forward, backward, hybrid
● Shrinkage methods
○ Ridge regression
○ Lasso
○ Elastic net
● Dimension reduction
○ Principal components analysis and regression
● Considerations in high dimensions
- Best subset selection
- Example data
- Best subset selection
RSS and R^2 for all possible regression models of Balance on the predictors
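To make the 2^p cost concrete, here is a minimal sketch of best subset selection with scikit-learn; the predictor DataFrame `X` and response `y` (e.g. Balance from a Credit-style dataset) are assumed inputs, not taken from the slides:

```python
# Minimal sketch of best subset selection: fit every non-empty subset of
# predictors and keep the lowest-RSS model of each size.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression

def best_subset(X, y):
    best = {}  # subset size -> (RSS, predictor names)
    for size in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, size):
            cols = list(subset)
            model = LinearRegression().fit(X[cols], y)
            rss = float(np.sum((y - model.predict(X[cols])) ** 2))
            if size not in best or rss < best[size][0]:
                best[size] = (rss, cols)
    return best
```

Because the loop fits 2^p - 1 models, this is only practical for small p; the winner of each size would then be compared on an estimate of test error, as discussed below.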
- Forward stepwise selection
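For the stepwise methods, scikit-learn provides SequentialFeatureSelector; a sketch is below. Note that it scores each candidate step by cross-validation rather than by training RSS as in the textbook algorithm, and `X`, `y` and the target size of 4 are assumptions:

```python
# Forward (or backward) stepwise selection via scikit-learn.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=4,            # assumed target size; often tuned by CV
    direction="forward",               # set to "backward" for backward stepwise
    scoring="neg_mean_squared_error",
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support())               # boolean mask over the columns of X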
- Backward stepwise selection
- Hybrid stepwise selection
Similar to forward stepwise selection, except that after adding a new variable to the model, we remove any existing variables that no longer contribute (statistically significantly) to explaining the response.
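A hand-rolled sketch of this hybrid procedure, using p-values from statsmodels OLS; the entry/removal thresholds (0.05 and 0.10) and the inputs `X`, `y` are illustrative assumptions:

```python
# Hybrid stepwise selection: greedy forward additions, each followed by
# pruning of variables that are no longer statistically significant.
import statsmodels.api as sm

def hybrid_stepwise(X, y, p_enter=0.05, p_remove=0.10):
    selected, remaining = [], list(X.columns)
    while remaining:
        # Forward step: add the most significant candidate, if any qualifies.
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break
        selected.append(best)
        remaining.remove(best)
        # Backward step: drop existing variables that lost significance.
        while selected:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues.drop("const").idxmax()
            if fit.pvalues[worst] < p_remove:
                break
            selected.remove(worst)
            remaining.append(worst)
    return selected
```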
- Observations
- Best subset selection is computationally demanding because we have to fit 2^p models.
- Stepwise selection methods have a computational advantage over best subset selection because they fit only 1 + p(p+1)/2 models (the null model plus p + (p-1) + ... + 1 candidate fits).
- Forward and backward selection do not guarantee the best possible model out of all 2^p models. Hybrid selection gets closer to best subset selection while preserving the computational advantage of forward stepwise selection.
- Backward selection can only be used when n > p (the full model must be fit first).
- RSE can be a better metric than RSS or R^2 for selecting the best training model. Why? (Hint: RSE = sqrt(RSS / (n - d - 1)) adjusts for the number of predictors d, so unlike RSS and R^2 it does not automatically improve as variables are added.)
- Selecting the best model
RSS and R-squared are associated with the training error
=> not suitable for selecting the best model
=> choose the best model based on estimates of the test error obtained by adjusting the training error (to account for the bias due to overfitting)
… or estimate the test error directly with cross-validation
- Selecting the best model (indirectly)
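The slide presents these criteria as figures that are not reproduced here; for reference, the standard adjusted-training-error criteria (in ISLR's notation, for a least-squares model with d predictors, n observations, and error-variance estimate sigma-hat^2; treat the exact scaling as edition-dependent) are:

```latex
% Smaller C_p, AIC, BIC are better; larger adjusted R^2 is better.
C_p = \frac{1}{n}\left(\mathrm{RSS} + 2d\hat\sigma^2\right)
\qquad
\mathrm{AIC} \propto \frac{1}{n}\left(\mathrm{RSS} + 2d\hat\sigma^2\right)
\qquad
\mathrm{BIC} = \frac{1}{n}\left(\mathrm{RSS} + \log(n)\,d\hat\sigma^2\right)
\qquad
\text{Adjusted } R^2 = 1 - \frac{\mathrm{RSS}/(n-d-1)}{\mathrm{TSS}/(n-1)}
```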
- Selecting the best model (with cross validation)
[Figure: cross-validated MSE for the candidate models, marking the smallest average MSE and the one-standard-error-rule boundaries]
Selected (best) model: Balance ~ Income + Limit + Cards + Student_Yes
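A sketch of this selection procedure: estimate each candidate model's test MSE by k-fold cross-validation, then apply the one-standard-error rule. The `candidates` dictionary and the `credit` DataFrame with its Balance column are assumed placeholders:

```python
# Cross-validated model selection with the one-standard-error rule.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cv_mse(X, y, cv=10):
    """Mean CV MSE and its standard error across folds."""
    scores = -cross_val_score(LinearRegression(), X, y,
                              scoring="neg_mean_squared_error", cv=cv)
    return scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))

# candidates: {model label -> list of predictor columns}, assumed given.
results = {name: cv_mse(credit[cols], credit["Balance"])
           for name, cols in candidates.items()}
best_mse, best_se = min(results.values(), key=lambda t: t[0])
# One-SE rule: among models within one SE of the minimum, prefer the smallest.
within_one_se = [name for name, (mse, _) in results.items()
                 if mse <= best_mse + best_se]
```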
- Shrinkage methods - Ridge regression and the Lasso
Ridge regression: minimise RSS + lambda * sum_j(beta_j^2) (L2 penalty)
Lasso: minimise RSS + lambda * sum_j(|beta_j|) (L1 penalty)
- Ridge regression and the lasso add bias to the estimation of the betas via lambda => reduce variance & improve predictive performance.
- If lambda = 0, ridge regression and the lasso reduce to OLS.
- When lambda is very large:
- ridge coefficients are shrunk toward 0 (but never exactly to 0)
- lasso coefficients become exactly 0 (so the lasso performs variable selection)
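A sketch of these behaviours with scikit-learn, whose `alpha` plays the role of lambda (its penalty scaling differs slightly from the textbook's); `X` and `y` are assumed standardised predictors and a numeric response:

```python
# Ridge, lasso, and elastic net next to plain OLS.
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

ols   = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)                      # L2 penalty
lasso = Lasso(alpha=10.0).fit(X, y)                      # L1 penalty
enet  = ElasticNet(alpha=10.0, l1_ratio=0.5).fit(X, y)   # L1/L2 mix

# As alpha -> 0 the penalised fits approach OLS; as alpha grows, ridge
# coefficients shrink toward 0 while lasso coefficients hit exactly 0.
print(ols.coef_, ridge.coef_, lasso.coef_, enet.coef_, sep="\n")
```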
- Shrinkage methods - Ridge regression and the Lasso
It is best to standardise the predictors before doing ridge regression and lasso
- OLS regression coefficients are scale equivariant; ridge regression and lasso coefficients are not.
- Ridge regression and lasso coefficients are shrunk toward zero and toward each other, so coefficients that are not on the same scale are shrunk unequally.
- Standardisation brings the predictors to the same scale => allows us to rank the relative importance of the predictors: more important predictors have larger standardised coefficients.
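In practice the standardisation can be bundled with the penalised fit; a minimal sketch, assuming a numeric feature matrix `X` and response `y`:

```python
# Standardise predictors, then fit ridge, inside one pipeline so the
# scaling is learned on training data only (e.g. within cross-validation).
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)

# Standardised coefficients share a scale, so their magnitudes give a
# rough ranking of predictor importance.
print(model.named_steps["ridge"].coef_)
```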