
Advanced Machine Learning Lecture: Ensemble Model - Trịnh Tấn Đạt (2024)

The lecture "Advanced Machine Learning: Ensemble Model" covers: introduction, voting, bagging, boosting, and stacking and blending. The detailed content follows.
- Trịnh Tấn Đạt, Khoa CNTT – Đại Học Sài Gòn (Faculty of Information Technology, Saigon University). Email: trinhtandat@sgu.edu.vn. Website: https://sites.google.com/site/ttdat88/
- Contents: Introduction; Voting; Bagging; Boosting; Stacking and Blending
- Introduction
- Definition. An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically by weighted or unweighted voting) to classify new examples. Ensembles are often much more accurate than the individual classifiers that make them up.
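The weighted or unweighted voting mentioned in the definition can be sketched in a few lines of Python (a minimal illustration; the class labels, weights, and the `vote` helper are made up for this example):

```python
from collections import defaultdict

def vote(predictions, weights=None):
    """Combine base-classifier predictions for one example by (weighted) voting."""
    weights = weights or [1.0] * len(predictions)  # unweighted = all weights equal
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

print(vote(["cat", "dog", "cat"]))                   # unweighted majority -> "cat"
print(vote(["cat", "dog", "dog"], [3.0, 1.0, 1.0]))  # a large weight can override the majority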
- Learning Ensembles. Learn multiple alternative definitions of a concept using different training data or different learning algorithms, then combine the decisions of the multiple models, e.g. by voting. [Diagram: Training Data → Data 1, Data 2, …, Data K → Learner 1, Learner 2, …, Learner K → Model 1, Model 2, …, Model K → Model Combiner → Final Model]
- Necessary and Sufficient Condition. For the idea to work, the classifiers should be accurate and diverse. Accurate: each has an error rate better than random guessing on new instances. Diverse: they make different errors on new data points.
- Why Do They Work? Suppose there are 25 base classifiers, each with error rate $\varepsilon = 0.35$, and assume the classifiers' errors are independent. The majority-vote ensemble makes a wrong prediction only when 13 or more classifiers err, which happens with probability $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} \approx 0.06$. (Marquis de Condorcet, 1785, studied the probability that a majority vote is wrong.)
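The 0.06 figure on the slide can be checked directly by summing the binomial probability that 13 or more of the 25 independent classifiers err (a small sketch; `ensemble_error` is an illustrative name, not from the slides):

```python
from math import comb

def ensemble_error(n, eps):
    """P(majority vote is wrong) for n independent classifiers with error rate eps."""
    k = n // 2 + 1  # minimum number of wrong votes for the majority to be wrong
    return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(k, n + 1))

print(round(ensemble_error(25, 0.35), 2))  # matches the 0.06 on the slide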
- Value of Ensembles. When combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced. Human ensembles are demonstrably better: how many jelly beans in the jar? Individual estimates vs. the group average.
- A Motivating Example. Suppose that you are a patient with a set of symptoms. Instead of taking the opinion of just one doctor (classifier), you decide to take the opinion of a few doctors. Is this a good idea? Indeed it is: consult many doctors, and based on their combined diagnoses you can get a fairly accurate diagnosis.
- The Wisdom of Crowds The collective knowledge of a diverse and independent body of people typically exceeds the knowledge of any single individual and can be harnessed by voting
- When Do Ensembles Work? Ensemble methods work better with "unstable" classifiers, i.e. classifiers that are sensitive to minor perturbations in the training set. Examples: decision trees, rule-based learners, artificial neural networks.
- Ensembles. Homogeneous ensembles: all individual models are obtained with the same learning algorithm, on slightly different datasets. A single, arbitrary learning algorithm is used, but the training data is manipulated so it learns multiple models (Data 1, Data 2, …, Data K; Learner 1 = Learner 2 = … = Learner K). Different methods for changing the training data: Bagging (resample the training data) and Boosting (reweight the training data). Heterogeneous ensembles: individual models are obtained with different algorithms. In Stacking and Blending, the combining mechanism is that the outputs of the base classifiers (level-0 classifiers) are used as training data for another classifier (the level-1 classifier).
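The level-0 / level-1 idea behind stacking can be sketched as follows (toy numbers; in practice the level-0 predictions would come from held-out folds so the level-1 model does not overfit):

```python
# Probability outputs of two hypothetical level-0 models on three validation examples.
level0_a = [0.9, 0.2, 0.7]
level0_b = [0.8, 0.4, 0.6]

# Level-1 training set: one feature vector per example, built from level-0 outputs.
# A level-1 classifier would then be trained on these features plus the true labels.
meta_features = list(zip(level0_a, level0_b))
print(meta_features)  # [(0.9, 0.8), (0.2, 0.4), (0.7, 0.6)]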
- Methods of Constructing Ensembles 1. Manipulate training data set 2. Cross-validated Committees 3. Weighted Training Examples 4. Manipulating Input Features 5. Manipulating Output Targets 6. Injecting Randomness
- Methods of Constructing Ensembles - 1. 1. Manipulate the training data set: Bagging (bootstrap aggregating). On each run, bagging presents the learning algorithm with a training set drawn randomly, with replacement, from the original training data; this process is called bootstrapping. Each bootstrap sample contains, on average, 63.2% of the original training data, with several examples appearing multiple times.
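The 63.2% figure comes from the fact that an example is included in a bootstrap sample with probability $1 - (1 - 1/n)^n \to 1 - 1/e \approx 0.632$; it is easy to verify empirically (a sketch with an arbitrary seed and sample size):

```python
import random

random.seed(0)
n = 100_000
bootstrap = [random.randrange(n) for _ in range(n)]  # draw n indices with replacement
unique_fraction = len(set(bootstrap)) / n            # fraction of original data included
print(unique_fraction)  # close to 1 - 1/e ≈ 0.632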
- Methods of Constructing Ensembles - 2. 2. Cross-validated committees: construct training sets by leaving out disjoint subsets of the training data, an idea similar to k-fold cross-validation. 3. Weighted training examples: maintain a set of weights over the training examples; at each iteration the weights are changed to place more emphasis on misclassified examples (AdaBoost).
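One reweighting step of the kind AdaBoost uses can be sketched like this (a minimal illustration; `alpha` stands for the classifier weight AdaBoost derives from the round's error rate, and the example flags are made up):

```python
import math

def reweight(weights, correct, alpha):
    """Up-weight misclassified examples, down-weight correct ones, then renormalize."""
    new = [w * math.exp(alpha if not ok else -alpha)
           for w, ok in zip(weights, correct)]
    z = sum(new)                 # normalization constant
    return [w / z for w in new]

# Four examples with equal starting weight; the fourth was misclassified this round.
w = reweight([0.25] * 4, [True, True, True, False], alpha=0.5)
print(w)  # the misclassified fourth example now carries more weight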
- Methods of Constructing Ensembles - 3. 4. Manipulating input features: works if the input features are highly redundant (e.g., downsampled FFT bins). 5. Manipulating output targets. 6. Injecting randomness.
- Variance and Bias Bias is due to differences between the model and the true function. Variance represents the sensitivity of the model to individual data points
- Variance and Bias (three further slides of illustrative plots; figures not reproduced in this text)



