
Lecture Applied data science: Clustering

File type: PDF | Pages: 21


The lecture "Applied data science: Clustering" covers: Exemplary technique - K-means clustering; Exemplary technique - Hierarchical clustering; Practical issues in clustering; Case study;...


Text content: Lecture Applied data science: Clustering

  1. Clustering
  2. Overview
     1. Introduction
     2. Application
     3. EDA
     4. Learning Process
     5. Bias-Variance Tradeoff
     6. Regression (review)
     7. Classification
     8. Validation
     9. Regularisation
     10. Clustering
     11. Evaluation
     12. Deployment
     13. Ethics
  3. Lecture outline
     - Exemplary technique - K-means clustering
     - Exemplary technique - Hierarchical clustering
     - Practical issues in clustering
     - Case study
  4. Unsupervised learning and clustering
     - Tends to be more subjective than supervised learning
     - Often part of exploratory data analysis
     - No universally accepted mechanism for validating the results
     - Clustering: partitioning a data set into distinct, non-overlapping groups
  5. Exemplary technique - K-means clustering
     - Assign each observation to exactly one of K clusters (K must be specified in advance)
     - A good clustering is one for which the within-cluster variation is as small as possible
     - There are almost K^n ways to partition n observations into K clusters, so an approximating algorithm…
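The slide does not define "within-cluster variation". A minimal sketch, assuming one common squared-Euclidean definition, W(C_k) = (1/|C_k|) Σ_{i,i'∈C_k} Σ_j (x_ij − x_i'j)², where the sum runs over all ordered pairs in cluster k. The function names are hypothetical. The example also shows that this quantity equals twice the sum of squared distances to each cluster's centroid, which is why K-means can work with centroids instead of all pairs.

```python
import numpy as np

def within_cluster_variation(X, labels, k):
    """W(C_k): sum of squared Euclidean distances over all ordered pairs
    of observations in cluster k, divided by the cluster size."""
    members = X[labels == k]
    diffs = members[:, None, :] - members[None, :, :]  # shape (|C_k|, |C_k|, p)
    return float((diffs ** 2).sum() / len(members))

def total_within_variation(X, labels):
    """Objective K-means tries to minimise: the sum of W(C_k) over all clusters."""
    return sum(within_cluster_variation(X, labels, k) for k in np.unique(labels))

def total_ss_to_centroids(X, labels):
    """Sum of squared distances from each observation to its cluster centroid."""
    return float(sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                     for k in np.unique(labels)))

X = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 7.0]])
labels = np.array([0, 0, 1, 1])
print(total_within_variation(X, labels))     # 8.0
print(2 * total_ss_to_centroids(X, labels))  # 8.0 (same value)
```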
  6. Exemplary technique - K-means clustering
  7. Exemplary technique - K-means clustering
     - The steps above are repeated until the cluster assignments stop changing
     - The algorithm finds only a local optimum, which depends on the initial assignment
     - Run the algorithm several times from different random initialisations and select the best solution, i.e. the one with the smallest total within-cluster variation summed over all clusters
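The loop described on these slides (compute centroids, reassign, repeat until stable, then restart several times and keep the best run) can be sketched as follows. This is a minimal sketch assuming NumPy, squared Euclidean distance and a random initial assignment of labels; the function names and the parameters `n_init`, `max_iter` and `seed` are hypothetical. scikit-learn's `KMeans` class (with its own `n_init` parameter) is the usual production alternative.

```python
import numpy as np

def kmeans_once(X, K, rng, max_iter=100):
    """One K-means run starting from a random initial assignment (assumed initialisation)."""
    n = len(X)
    labels = rng.integers(0, K, size=n)  # step 1: random initial cluster labels
    for _ in range(max_iter):
        # step 2a: centroid of each cluster; an empty cluster is re-seeded with a random observation
        centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else X[rng.integers(n)] for k in range(K)])
        # step 2b: reassign each observation to its nearest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: a local optimum has been reached
        labels = new_labels
    wss = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
              for k in np.unique(labels))
    return labels, float(wss)

def kmeans(X, K, n_init=10, seed=0):
    """Run K-means n_init times and keep the run with the smallest
    total within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    best_labels, best_wss = None, np.inf
    for _ in range(n_init):
        labels, wss = kmeans_once(X, K, rng)
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss

# Usage on two synthetic, well-separated groups of 20 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))])
labels, wss = kmeans(X, K=2)
```

Because each run can stop in a different local optimum, comparing runs by their total within-cluster sum of squares is what makes the restarts useful.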
  8. Exemplary technique - Agglomerative hierarchical clustering
  9. Exemplary technique - Agglomerative hierarchical clustering
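  10. The dendrogram
     - 'Hierarchical' means that the clusters obtained by cutting the dendrogram at a given height are nested within the clusters obtained by cutting it at any greater height
     - This nesting is not realistic for every data set (e.g. when the best split into 2 groups and the best split into 3 groups are not nested), so hierarchical clustering is not a suitable approach for all data sets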
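A naive sketch of agglomerative (bottom-up) clustering, assuming Euclidean dissimilarity and complete, single or average linkage: start with every observation as its own cluster and repeatedly merge the two closest clusters, recording the merge height. It is O(n³) and meant only for small examples; the names `agglomerative` and `cut` are hypothetical, and SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster` are the usual library route. `cut` replays the first n − K merges, which for these linkages (whose merge heights never decrease) corresponds to cutting the dendrogram where K clusters remain.

```python
import numpy as np

def agglomerative(X, linkage="complete"):
    """Return the merge history as a list of (cluster_a, cluster_b, height)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> member indices
    agg = {"complete": np.max, "single": np.min, "average": np.mean}[linkage]
    merges, next_id = [], len(X)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # linkage: summarise all pairwise distances between the two clusters
                h = agg(D[np.ix_(clusters[a], clusters[b])])
                if best is None or h < best[2]:
                    best = (a, b, h)
        a, b, h = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # fuse the closest pair
        merges.append((a, b, float(h)))
        next_id += 1
    return merges

def cut(merges, n, K):
    """Replay the first n - K merges and return one cluster label per observation."""
    clusters = {i: [i] for i in range(n)}
    next_id = n
    for a, b, _ in merges[: n - K]:
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    labels = np.empty(n, dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
merges = agglomerative(X, linkage="complete")
print(cut(merges, len(X), 3))
```

Checking that every cluster from `cut(merges, n, K + 1)` lies inside a single cluster of `cut(merges, n, K)` confirms the nesting property of the dendrogram.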
  11. Choice of dissimilarities
     - Euclidean distance
     - Manhattan distance
     - Jaccard distance
     - Cosine distance
     - Correlation-based distance
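A minimal sketch of the five dissimilarities listed above, assuming numeric NumPy vectors (binary presence/absence vectors for Jaccard). The cosine and correlation distances use the common convention "1 − similarity"; other conventions exist. The function names are hypothetical; SciPy's `scipy.spatial.distance` module provides tested equivalents (`euclidean`, `cityblock`, `jaccard`, `cosine`, `correlation`).

```python
import numpy as np

def euclidean(x, y):
    return float(np.sqrt(((x - y) ** 2).sum()))

def manhattan(x, y):
    # sum of absolute coordinate differences ("city block" distance)
    return float(np.abs(x - y).sum())

def jaccard_distance(a, b):
    # binary vectors: 1 - |A ∩ B| / |A ∪ B|; defined as 0 when both are all-zero
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 0.0 if union == 0 else float(1 - np.logical_and(a, b).sum() / union)

def cosine_distance(x, y):
    # 1 - cosine of the angle between x and y (undefined for zero vectors)
    return float(1 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def correlation_distance(x, y):
    # 1 - Pearson correlation between the two observations' feature profiles
    return float(1 - np.corrcoef(x, y)[0, 1])

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(euclidean(x, y), manhattan(x, y))  # 5.0 7.0
```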
  12. Choice of dissimilarity
     - Euclidean distance: similar items are a short distance apart
     - Correlation-based distance: similar items are strongly correlated, i.e. their feature profiles have a similar shape even if their magnitudes differ
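The two measures can disagree about which items are "similar". A small illustration with three hypothetical observations over four features: `b` has the same profile as `a` shifted upwards, while `c` has similar magnitudes to `a` but the opposite profile. Euclidean distance pairs `a` with `c`; correlation-based distance (assumed here to be 1 − Pearson correlation) pairs `a` with `b`.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = a + 10.0          # same shape as a, higher level
c = a[::-1].copy()    # similar level to a, reversed shape

def euclid(u, v):
    return float(np.linalg.norm(u - v))

def corr_dist(u, v):
    return float(1 - np.corrcoef(u, v)[0, 1])

print(euclid(a, b), euclid(a, c))        # 20.0 vs ~4.47: Euclidean says a is closer to c
print(corr_dist(a, b), corr_dist(a, c))  # ~0.0 vs ~2.0: correlation says a is closer to b
```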
  13. Practical issues in clustering
     - Standardising features before clustering
     - Hierarchical clustering: choice of dissimilarity measure, type of linkage and number of clusters
     - K-means clustering: choice of the number of clusters K
     - Do the clusters represent true (natural) subgroups in the data?
     - Clustering methods are not robust to perturbations of the data
     - Clustering results are only a starting point for forming hypotheses about the data
     - Understanding clustering results:
        - Use the names (or characteristic attributes) of the elements in each cluster
        - Use an exemplar member of each cluster
     - Clusters may be used as labels for subsequent predictive analytics
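Two of the practical points above, sketched under stated assumptions: z-score standardisation, so that no feature dominates the distance merely because of its units, and choosing an exemplar for each cluster as its medoid (the member with the smallest total Euclidean distance to the other members). The medoid is one possible choice of exemplar, assumed here; `standardise` and `exemplars` are hypothetical names.

```python
import numpy as np

def standardise(X):
    """Return z-scores: each column rescaled to mean 0 and standard deviation 1."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # assumption: constant columns are centred but not scaled
    return (X - mu) / sd

def exemplars(X, labels):
    """Map each cluster label to the row index of its medoid."""
    result = {}
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        members = X[idx]
        D = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
        result[int(k)] = int(idx[D.sum(axis=1).argmin()])  # smallest total distance
    return result

X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
              [10.0, 10.0], [11.0, 10.0], [12.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(exemplars(X, labels))  # {0: 1, 1: 4}
```

Unlike a centroid, a medoid is always an actual observation, so it can be named and inspected directly when interpreting a cluster.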
  14. Case study - Clustering financial centres
     - 15 cities: Ho Chi Minh City, Manila, Jakarta, Kuala Lumpur, Bangkok, Mumbai, Hong Kong, Singapore, Beijing, Shanghai, Shenzhen, Seoul, Busan, Taipei, Tokyo
     - 55 instrument factors in five groups: Business Environment (20), Financial Sector Development (9), Human Capital (7), Infrastructure (8), Reputation (11)