
Lecture Applied data science: Regularisation


Lecture "Applied data science: Regularisation" includes content: variable subset selection, shrinkage methods, dimension reduction, considerations in high dimensions,... We invite you to consult!


Text content: Lecture Applied data science: Regularisation

  1. Regularisation
  2. Overview 1. Introduction 2. Application 3. EDA 4. Learning Process 5. Bias-Variance Tradeoff 6. Regression (review) 7. Classification 8. Validation 9. Regularisation 10. Clustering 11. Evaluation 12. Deployment 13. Ethics
  3. Lecture outline
     ● Variable subset selection
       ○ Best subset selection
       ○ Stepwise selection methods - forward, backward, hybrid
     ● Shrinkage methods
       ○ Ridge regression
       ○ Lasso
       ○ Elastic net
     ● Dimension reduction
       ○ Principal components analysis and regression
     ● Considerations in high dimensions
  4. Best subset selection
  5. Example data
  6. Best subset selection RSS and R2 for all possible regression models of Balance on the predictors
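A minimal sketch of the exhaustive search behind this slide (illustrative, not the lecture's own code). It assumes X is a pandas DataFrame of dummy-encoded predictors and y is the Balance column of a Credit-style data set; the function name best_subset_selection is hypothetical.

```python
# For every subset of predictors, fit OLS and record RSS and R^2,
# then keep the best model of each size (lowest RSS within that size).
from itertools import combinations

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def best_subset_selection(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    rows = []
    for k in range(1, X.shape[1] + 1):
        for subset in combinations(X.columns, k):
            Xs = X[list(subset)]
            model = LinearRegression().fit(Xs, y)
            rss = float(np.sum((y - model.predict(Xs)) ** 2))
            rows.append({"size": k, "predictors": subset,
                         "RSS": rss, "R2": model.score(Xs, y)})
    results = pd.DataFrame(rows)
    # Within each model size, lowest RSS is equivalent to highest R^2
    return results.loc[results.groupby("size")["RSS"].idxmin()]
```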
  7. Best subset selection
  8. Forward stepwise selection
  9. Forward stepwise selection
  10. Forward stepwise selection
  11. Backward stepwise selection
  12. Backward stepwise selection
  13. Backward stepwise selection
  14. Hybrid stepwise selection Similar to forward stepwise selection, except that after adding a new variable to the model, we remove any existing variables that no longer (statistically significantly) contribute to explaining the response.
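A minimal sketch of this hybrid procedure (added here for illustration, not from the slides), assuming statsmodels OLS so that per-coefficient p-values are available; the entry and removal thresholds are illustrative choices.

```python
# Hybrid (forward-backward) stepwise selection based on p-values.
import statsmodels.api as sm

def hybrid_stepwise(X, y, enter=0.05, remove=0.10):
    selected, remaining = [], list(X.columns)
    while remaining:
        # Forward step: add the candidate with the lowest p-value, if significant
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= enter:
            break
        selected.append(best)
        remaining.remove(best)
        # Backward step: drop previously added variables that are no longer significant
        while True:
            fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
            worst = fit.pvalues.drop("const").idxmax()
            if fit.pvalues[worst] < remove:
                break
            selected.remove(worst)
            remaining.append(worst)
    return selected
```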
  15. Observations - Best subset selection is computationally demanding because we have to fit 2^p models. - Stepwise selection methods have a computational advantage over best subset selection because they only have to fit 1 + p(p+1)/2 models. - Forward and backward selection do not guarantee the best possible model out of the 2^p models. Hybrid selection gets closer to best subset selection while preserving the computational advantage of forward stepwise selection. - Backward selection can only be used when n > p. - RSE can be a better metric than RSS or R2 when selecting the best model on the training data. Why?
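As a worked illustration of these counts (added here; the slide states only the formulas), with p = 10 predictors:

$$2^{10} = 1024 \qquad \text{versus} \qquad 1 + \frac{10 \times 11}{2} = 56.$$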
  16. Selecting the best model RSS and R-squared measure training error => not suitable for selecting the best model => choose the best model based on the following estimates of test error, obtained by adjusting the training error (to account for the bias due to overfitting) … or estimate test error directly with cross-validation
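One common adjustment of this kind, given here purely as an illustration (the specific criteria on the following slide are not reproduced in this text), is the adjusted R-squared, which penalises the ordinary R-squared for the number of predictors p relative to the sample size n:

$$\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$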
  17. Selecting the best model (indirectly)
  18. Selecting the best model (with cross validation) [Plot: cross-validated MSE per model size, with one-standard-error-rule boundaries and the smallest average MSE marked.] Selected (best) model: Balance ~ Income + Limit + Cards + Student_Yes
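A minimal sketch of the cross-validation step and the one-standard-error rule (illustrative, not the lecture's code). It assumes a dict candidates mapping each model size to the best predictor subset found by subset selection on the Credit-style data.

```python
# Estimate test MSE by k-fold CV for each candidate model, then pick the
# smallest model whose CV MSE is within one standard error of the minimum.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def cv_mse(X, y, n_splits=10):
    scores = cross_val_score(LinearRegression(), X, y,
                             scoring="neg_mean_squared_error", cv=n_splits)
    return -scores.mean(), scores.std(ddof=1) / np.sqrt(n_splits)

def one_se_rule(candidates, X, y):
    """candidates: {size: list of predictor names} -> chosen predictor list."""
    stats = {size: cv_mse(X[cols], y) for size, cols in candidates.items()}
    best_size = min(stats, key=lambda s: stats[s][0])
    threshold = stats[best_size][0] + stats[best_size][1]
    # Smallest model whose average CV MSE is within one SE of the minimum
    chosen = min(s for s, (mse, _) in stats.items() if mse <= threshold)
    return candidates[chosen]
```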
  19. Shrinkage methods - Ridge regression and the Lasso [Formulas for the ridge and lasso objectives.] - Ridge regression and the lasso add bias to the estimation of the betas via lambda => reduce variance & improve predictive performance. - If lambda = 0, ridge regression and the lasso reduce to OLS. - When lambda is very large, ridge coefficients are shrunk toward 0 while lasso coefficients become exactly 0.
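The standard penalised least-squares criteria for the two methods (restated here; the slide's own formulas are not reproduced in this text) are:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|$$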
  20. Shrinkage methods - Ridge regression and the Lasso It is best to standardise the predictors before doing ridge regression and lasso. - OLS regression coefficients are scale equivariant; ridge regression and lasso coefficients are not. - Ridge regression and lasso coefficients are shrunk toward zero and toward each other. - Coefficients of predictors that are not on the same scale would be shrunk unequally. - Standardisation brings the predictors to the same scale => allows us to rank the relative importance of the predictors: more important predictors have larger standardised coefficients.
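A minimal sketch of standardising before shrinkage in scikit-learn (illustrative; the synthetic data and the lambda grid are assumptions, and scikit-learn calls the tuning parameter alpha rather than lambda).

```python
# Standardise predictors, then fit ridge and lasso over a grid of penalties.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a Credit-style (X, y) so the snippet runs on its own
X, y = make_regression(n_samples=400, n_features=10, noise=25.0, random_state=0)
lambdas = np.logspace(-3, 3, 100)

# The scaler brings every predictor to mean 0, standard deviation 1 before the penalised fit
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=lambdas)).fit(X, y)
lasso = make_pipeline(StandardScaler(), LassoCV(alphas=lambdas, max_iter=10_000)).fit(X, y)

# Standardised coefficients are on a common scale, so their magnitudes give a
# rough ranking of predictor importance; the lasso sets some exactly to zero.
print(ridge[-1].coef_)
print(lasso[-1].coef_)
```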