
Data Mining (Khai phá dữ liệu) Lecture: Ensemble Models - Trịnh Tấn Đạt

Shared by: _ _ | Date: | File type: PDF | Pages: 90


This chapter of the Data Mining (Khai phá dữ liệu) lecture series, Ensemble Models, covers: introduction; voting; bagging; boosting; stacking and blending; learning ensembles; methods of constructing ensembles; the bias-variance tradeoff; simple ensemble techniques; and more. Please see the lecture for the full details.


Text content: Data Mining (Khai phá dữ liệu) Lecture: Ensemble Models - Trịnh Tấn Đạt

  1. Trịnh Tấn Đạt, Faculty of Information Technology (Khoa CNTT), Đại Học Sài Gòn. Email: trinhtandat@sgu.edu.vn. Website: https://sites.google.com/site/ttdat88/
  2. Contents  Introduction  Voting  Bagging  Boosting  Stacking and Blending
  3. Introduction
  4. Definition  An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically, by weighted or unweighted voting) to classify new examples.  Ensembles are often much more accurate than the individual classifiers that make them up.
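A minimal sketch of an unweighted (hard) voting ensemble, assuming scikit-learn; the dataset and base models are illustrative choices, not ones prescribed by the lecture:

```python
# Hard-voting ensemble: combine three different classifiers by unweighted
# majority vote. Dataset and base models are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # unweighted majority vote over the predicted labels
)
ensemble.fit(X_train, y_train)
print("voting ensemble accuracy:", ensemble.score(X_test, y_test))
```

Passing a weights list to VotingClassifier instead gives the weighted variant mentioned in the definition.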
  5. Learning Ensembles  Learn multiple alternative definitions of a concept using different training data or different learning algorithms.  Combine decisions of multiple definitions, e.g. using voting.  [Diagram: Training Data is split into Data 1, Data 2, …, Data K; each set is given to Learner 1, Learner 2, …, Learner K, producing Model 1, Model 2, …, Model K, which a Model Combiner merges into the Final Model]
  6. Necessary and Sufficient Condition  For the idea to work, the classifiers should be  Accurate  Diverse  Accurate: Has an error rate better than random guessing on new instances  Diverse: They make different errors on new data points
  7. Why do they Work?  Suppose there are 25 base classifiers  Each classifier has an error rate ε = 0.35  Assume the classifiers are independent  Probability that the ensemble classifier makes a wrong prediction (i.e. that a majority of the 25 votes are wrong): $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06$  Marquis de Condorcet (1785): the majority vote is wrong with exactly this probability.
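The 0.06 figure can be reproduced with a short computation using only Python's standard library (under the slide's independence assumption):

```python
# Probability that a majority vote of 25 independent classifiers is wrong,
# each base classifier having error rate eps = 0.35 (the sum from the slide).
from math import comb

n, eps = 25, 0.35
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 3))  # -> 0.06
```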
  8. Value of Ensembles  When combining multiple independent and diverse decisions, each of which is more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced.  Human ensembles are demonstrably better  How many jelly beans are in the jar?: individual estimates vs. the group average.
  9. A Motivating Example  Suppose that you are a patient with a set of symptoms  Instead of taking the opinion of just one doctor (classifier), you decide to take the opinions of several doctors!  Is this a good idea? Indeed it is.  Consult many doctors and then, based on their combined diagnoses, you can get a fairly accurate idea of the true diagnosis.
  10. The Wisdom of Crowds  The collective knowledge of a diverse and independent body of people typically exceeds the knowledge of any single individual and can be harnessed by voting
  11. When do Ensembles Work?  Ensemble methods work better with ‘unstable’ classifiers  Classifiers that are sensitive to minor perturbations in the training set  Examples:  Decision trees  Rule-based classifiers  Artificial neural networks
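A small demonstration of this instability, under assumed choices of dataset and model: two decision trees trained on training sets that differ by only a few examples can already disagree on new points.

```python
# Two decision trees trained on nearly identical training sets can still
# disagree on new examples. Dataset choice is an illustrative assumption.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree_a = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Perturb the training set slightly: drop five examples and refit.
keep = np.ones(len(X_train), dtype=bool)
keep[:5] = False
tree_b = DecisionTreeClassifier(random_state=0).fit(X_train[keep], y_train[keep])

disagree = np.mean(tree_a.predict(X_test) != tree_b.predict(X_test))
print(f"test points where the two trees disagree: {disagree:.1%}")
```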
  12. Ensembles  Homogeneous ensembles: all individual models are obtained with the same learning algorithm, on slightly different datasets  Use a single, arbitrary learning algorithm but manipulate the training data to make it learn multiple models:  Data1  Data2  …  Data K  Learner1 = Learner2 = … = Learner K  Different methods of changing the training data:  Bagging: resample the training data  Boosting: reweight the training data  Heterogeneous ensembles: individual models are obtained with different algorithms  Stacking and Blending  The combining mechanism: the outputs of the classifiers (level-0 classifiers) are used as training data for another classifier (the level-1 classifier)
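A minimal stacking sketch with scikit-learn, in which the level-0 classifiers' outputs become training data for a level-1 combiner; the particular models and dataset are illustrative assumptions:

```python
# Stacking: level-0 classifiers' cross-validated predictions become the
# training features of a level-1 combiner. Models and dataset are
# illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    estimators=[  # level-0 classifiers (different algorithms)
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=5000),  # level-1 classifier
    cv=5,  # level-0 outputs for the level-1 model come from cross-validation
)
print("stacking CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```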
  13. Methods of Constructing Ensembles 1. Manipulate training data set 2. Cross-validated Committees 3. Weighted Training Examples 4. Manipulating Input Features 5. Manipulating Output Targets 6. Injecting Randomness
  14. Methods of Constructing Ensembles - 1 1. Manipulate the training data set  Bagging (bootstrap aggregation)  On each run, bagging presents the learning algorithm with a training set drawn randomly, with replacement, from the original training data. This process is called bootstrapping.  Each bootstrap sample contains, on average, about 63.2% of the original training examples, with several examples appearing multiple times
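The 63.2% figure comes from each example being missed by a bootstrap sample with probability (1 - 1/n)^n, which tends to 1/e. A quick empirical check with NumPy (sample size and trial count are arbitrary):

```python
# Empirical check: a bootstrap sample of size n (drawn with replacement)
# covers on average about 1 - 1/e ~ 63.2% of the distinct original examples.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1000, 200
coverage = [len(np.unique(rng.integers(0, n, size=n))) / n for _ in range(trials)]
print("empirical coverage:", np.mean(coverage))    # ~0.632
print("theoretical value:", 1 - (1 - 1 / n) ** n)  # -> 1 - 1/e as n grows
```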
  15. Methods of Constructing Ensembles - 2 2. Cross-validated Committees  Construct training sets by leaving out disjoint subsets of the training data  Idea similar to k-fold cross validation 3. Weighted Training Examples  Maintain a set of weights over the training examples. At each iteration the weights are changed to place more emphasis on misclassified examples (AdaBoost)
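A minimal AdaBoost sketch with scikit-learn; the dataset and hyperparameters are illustrative assumptions, and the library's default weak learner is a depth-1 decision stump:

```python
# AdaBoost: after each round the training-example weights are increased on
# the points the current weak learner misclassified, so the next learner
# focuses on them. Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Default weak learner: a depth-1 decision stump.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("AdaBoost CV accuracy:", cross_val_score(ada, X, y, cv=5).mean())
```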
  16. Methods of Constructing Ensembles - 3 4. Manipulating Input Features  Works if the input features are highly redundant (e.g., down sampling FFT bins) 5. Manipulating Output Targets 6. Injecting Randomness
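One common way to manipulate input features is the random-subspace idea: train each base model on a random subset of the features. A sketch using scikit-learn's BaggingClassifier (parameter values are illustrative assumptions):

```python
# Random-subspace style ensemble: every base tree is trained on all examples
# but only a random half of the input features. Parameter values are
# illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

subspace = BaggingClassifier(
    n_estimators=50,
    max_features=0.5,  # each base tree sees a random 50% of the features
    bootstrap=False,   # keep all training examples; vary only the features
    random_state=0,
)
print("random-subspace CV accuracy:", cross_val_score(subspace, X, y, cv=5).mean())
```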
  17. Variance and Bias  Bias is due to systematic differences between the model and the true function.  Variance represents the sensitivity of the model to the individual data points it was trained on.
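A rough simulation of this behaviour under an assumed setup (noisy samples of a known function, polynomial fits with NumPy): low-degree fits show high bias and low variance, while high-degree fits show the reverse.

```python
# Bias-variance illustration under an assumed setup: fit polynomials of
# several degrees to many noisy samples of a known function and compare the
# squared bias and variance of their predictions on a fixed test grid.
import numpy as np

rng = np.random.default_rng(0)
def true_f(x):
    return np.sin(2 * np.pi * x)
x_test = np.linspace(0, 1, 50)

def bias_variance(degree, n_datasets=200, n_points=50, noise=0.3):
    preds = []
    for _ in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = true_f(x) + rng.normal(0, noise, n_points)
        preds.append(np.polyval(np.polyfit(x, y, degree), x_test))
    preds = np.array(preds)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)  # systematic error
    variance = np.mean(preds.var(axis=0))                        # spread across datasets
    return bias2, variance

for degree in (1, 3, 9):
    b2, var = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b2:.3f}, variance = {var:.3f}")
```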
  18.-20. Variance and Bias  [Figure-only slides illustrating the bias-variance tradeoff; no recoverable text]