Clustering
Overview
1. Introduction
2. Application
3. EDA
4. Learning Process
5. Bias-Variance Tradeoff
6. Regression (review)
7. Classification
8. Validation
9. Regularisation
10. Clustering
11. Evaluation
12. Deployment
13. Ethics
Lecture outline
- Exemplary technique - K-means clustering
- Exemplary technique - Hierarchical clustering
- Practical issues in clustering
- Case study
Unsupervised learning and clustering
- Tends to be more subjective than supervised learning
- Often a part of the exploratory data analysis
- No universally accepted mechanism to validate the results
- Clustering - partition a data set into distinct, non-overlapping groups
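One common way to state the partition requirement formally: let C_1, ..., C_K be sets containing the indices of the observations in each of the K clusters. Every observation belongs to at least one cluster, and no observation belongs to more than one:

```latex
C_1 \cup C_2 \cup \dots \cup C_K = \{1, \dots, n\},
\qquad
C_k \cap C_{k'} = \emptyset \quad \text{for all } k \neq k'
```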
Exemplary technique - K-means clustering
- Assign each observation to exactly one of K clusters (K must be predefined)
- A good clustering is one for which the within-cluster variation is as small as possible
- There are up to K^n ways to assign n observations to K clusters, so exhaustive search is infeasible even for modest n; hence the approximating algorithm…
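A minimal sketch of the usual approximating procedure (Lloyd-style K-means), assuming NumPy is available. Within-cluster variation is measured here as the sum of squared Euclidean distances from each observation to its cluster centroid; this is one common choice and equals half of the pairwise squared-distance definition averaged over cluster size. The function names `within_cluster_variation` and `kmeans`, and the rule of re-seeding an empty cluster with a random observation, are hypothetical choices for illustration.

```python
import numpy as np

def within_cluster_variation(X, labels, k):
    """Sum over clusters of squared Euclidean distances to each cluster's centroid."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical sketch: alternate centroid updates and nearest-centroid reassignment."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign each observation to one of the k clusters
    labels = rng.integers(0, k, size=len(X))
    for _ in range(n_iter):
        # Step 2a: compute each cluster's centroid (assumption: an empty cluster
        # is re-seeded with a randomly chosen observation)
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else X[rng.integers(len(X))]
            for j in range(k)
        ])
        # Step 2b: reassign each observation to its nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once assignments no longer change
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centroids
```

Because the result depends on the random initial assignment and the procedure only reaches a local optimum, a common practice is to run it from several seeds and keep the labelling with the smallest `within_cluster_variation`.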