Classification
Overview
1. Introduction
2. Application
3. EDA
4. Learning Process
5. Bias-Variance Tradeoff
6. Regression (review)
7. Classification
8. Validation
9. Regularisation
10. Clustering
11. Evaluation
12. Deployment
13. Ethics
Lecture outline
- Classification
- Logistic regression review
- Classification evaluation metrics
- The expected value framework
Classification problems
The response is categorical, e.g. credit card default (Yes/No) or favourite movie type (Action/Drama/Animation)
Example techniques: logistic regression, classification trees, K-NN, etc.
Logistic regression formulation
Logistic regression coefficients are estimated by maximising the likelihood function
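For reference, the standard formulation for a binary response Y and a single predictor X can be written as below. The symbols p(x), \beta_0 and \beta_1 are conventional notation assumed here, since the slide's own formula does not survive in the text; L denotes the likelihood that is maximised over the coefficients.

```latex
\begin{align}
p(x) &= \Pr(Y = 1 \mid X = x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} \\
\log\frac{p(x)}{1 - p(x)} &= \beta_0 + \beta_1 x \\
L(\beta_0, \beta_1) &= \prod_{i:\, y_i = 1} p(x_i) \prod_{i:\, y_i = 0} \bigl(1 - p(x_i)\bigr)
\end{align}
```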
Logistic regression example
Full data set (training + test)
               responding
               Yes     No
student_Yes    127   2817
student_No     206   6850
Total          333   9667

Training set
               responding
               Yes     No
student_Yes     84   1959
student_No     150   4808
Total          234   6767

Test set
               responding
               Yes     No
student_Yes     43    858
student_No      56   2042
Total           99   2900
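As an illustration of maximum likelihood on these counts, the sketch below fits a logistic regression with student status as the only predictor, using the training-set figures (84/1959 for students, 150/4808 for non-students). This is an assumption made for the example: the model actually fitted in the lecture may use other predictors, and its results are not reproduced here. With a single 0/1 predictor the maximum-likelihood estimates have a closed form, so no optimiser is needed; the function names are hypothetical.

```python
import math

# Training-set counts: (responding Yes, responding No)
TRAIN = {"student_Yes": (84, 1959), "student_No": (150, 4808)}

def sigmoid(z):
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, counts=TRAIN):
    """Log-likelihood of P(Yes) = sigmoid(b0 + b1 * is_student) on grouped counts."""
    total = 0.0
    for group, (yes, no) in counts.items():
        x = 1.0 if group == "student_Yes" else 0.0
        p = sigmoid(b0 + b1 * x)
        total += yes * math.log(p) + no * math.log(1.0 - p)
    return total

def fit_single_binary_predictor(counts=TRAIN):
    """Closed-form MLE: with one 0/1 predictor the fitted odds equal the observed odds."""
    yes1, no1 = counts["student_Yes"]
    yes0, no0 = counts["student_No"]
    b0 = math.log(yes0 / no0)        # log-odds for non-students
    b1 = math.log(yes1 / no1) - b0   # log odds ratio, students vs non-students
    return b0, b1

b0, b1 = fit_single_binary_predictor()
print(f"intercept = {b0:.4f}, student coefficient = {b1:.4f}")
print(f"P(Yes | student)     = {sigmoid(b0 + b1):.4f}")
print(f"P(Yes | non-student) = {sigmoid(b0):.4f}")
```

The fitted probabilities reproduce the observed response rates in each group, which is what maximising the likelihood implies for this one-predictor model; moving either coefficient away from these values lowers the log-likelihood.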
Logistic regression results
Logistic regression results interpretation
Prediction from multiple classifiers
The ROC curve
Each point corresponds to a confusion matrix
Point A is more ‘conservative’ than B, which is more ‘conservative’ than C
Points that are closer to the upper left are preferred. Point (0,1) represents the perfect classifier
Points along the diagonal represent random guessing; no classifier should fall below it, in the lower-right region
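The idea that each point on the curve corresponds to one confusion matrix can be sketched as follows: sweep the decision threshold from high (conservative) to low and record the false positive rate and true positive rate at each step. The scores and labels are made up for illustration, and `roc_points` is a hypothetical helper, not a library function.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, from most to least conservative."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted Yes
    for threshold in sorted(set(scores), reverse=True):
        # Confusion-matrix cells for the rule "predict Yes if score >= threshold"
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical predicted probabilities and true labels (1 = Yes, 0 = No)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR = {fpr:.2f}   TPR = {tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); a classifier that ranks every Yes above every No passes through (0, 1).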
The ROC curves from different classifiers
                  p      n
Predicted Yes    46     12
Predicted No     53   2888
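The usual evaluation metrics follow directly from these four counts. The sketch below reads the p and n columns as actual Yes and actual No (the column totals, 99 and 2900, match the test set), and uses a hypothetical helper named `metrics`.

```python
# Confusion-matrix counts: 46 and 12 predicted Yes, 53 and 2888 predicted No
TP, FP = 46, 12      # predicted Yes: actual Yes, actual No
FN, TN = 53, 2888    # predicted No:  actual Yes, actual No

def metrics(tp, fp, fn, tn):
    """Common classification metrics computed from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "true positive rate": tp / (tp + fn),    # y-axis of the ROC plot
        "false positive rate": fp / (fp + tn),   # x-axis of the ROC plot
        "precision": tp / (tp + fp),
    }

for name, value in metrics(TP, FP, FN, TN).items():
    print(f"{name}: {value:.4f}")
```

Accuracy is high mainly because responders are rare; the true positive rate shows that fewer than half of the actual responders are caught.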
The expected value analytical framework
The targeted marketing example.
Assume that we sell the product for $200, the production-related cost is $100, and the shipping and handling cost is $1. What is the minimum probability of responding that we should target?
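One way to work this out, as a sketch: targeting a customer is worthwhile when the expected value p × benefit − (1 − p) × cost is positive. The code assumes that a responder yields $200 − $100 − $1 = $99 and that a non-responder costs only the $1 shipping and handling, which is consistent with the $99 / −$1 cost/benefit figures used elsewhere in the lecture. The function name is hypothetical.

```python
def break_even_probability(price, production_cost, handling_cost):
    """Response probability at which targeting a customer has zero expected value."""
    benefit_if_yes = price - production_cost - handling_cost  # profit when the customer responds
    cost_if_no = handling_cost                                # loss when the customer does not
    # Solve p * benefit_if_yes - (1 - p) * cost_if_no = 0 for p
    return cost_if_no / (benefit_if_yes + cost_if_no)

p_min = break_even_probability(price=200, production_cost=100, handling_cost=1)
print(f"Target customers whose probability of responding exceeds {p_min:.2%}")
```

Under these assumptions the break-even point is 1 / (99 + 1), i.e. a response probability of 1%.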
Expected value of a classifier
From the above example, let’s use 0.35 as the threshold and assume the cost/benefit matrix below. What is the expected value per customer of the logistic regression classifier?
                 Actual Yes   Actual No
Predicted Yes       $99          -$1
Predicted No         $0           $0
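The expected value per customer is the sum, over the four cells, of the cell's probability times its cost or benefit. The sketch below assumes that the confusion matrix given earlier for the test set (46, 12, 53, 2888) is the one obtained at the 0.35 threshold; the text does not state this explicitly, so treat the counts as an assumption.

```python
# Confusion-matrix counts, keyed by (predicted, actual).
# ASSUMPTION: these are the test-set counts at the 0.35 threshold.
counts = {
    ("yes", "yes"): 46, ("yes", "no"): 12,
    ("no", "yes"): 53,  ("no", "no"): 2888,
}

# Cost/benefit in dollars for each cell, from the matrix above
values = {
    ("yes", "yes"): 99, ("yes", "no"): -1,
    ("no", "yes"): 0,   ("no", "no"): 0,
}

def expected_value_per_customer(counts, values):
    """Sum over cells of P(cell) * value(cell)."""
    n = sum(counts.values())
    return sum(counts[cell] / n * values[cell] for cell in counts)

ev = expected_value_per_customer(counts, values)
print(f"Expected value per customer: ${ev:.2f}")
```

With these counts the total is (46 × 99 − 12 × 1) / 2999, roughly $1.51 per customer.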
The profit curves
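Cost/benefit matrix with a $1 cost per false positive:
                 Actual Yes   Actual No
Predicted Yes       $99          -$1
Predicted No         $0           $0

Cost/benefit matrix with a $10 cost per false positive:
                 Actual Yes   Actual No
Predicted Yes       $99         -$10
Predicted No         $0           $0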
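A profit curve can be sketched by ranking customers from highest to lowest predicted probability and accumulating profit as more of them are targeted. The scores and labels below are made up for illustration and are not the lecture's data; only the two cost/benefit settings ($99 / −$1 and $99 / −$10) come from the matrices above. `profit_curve` is a hypothetical helper.

```python
def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Cumulative profit when targeting the top-k customers, for k = 0..n."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    profits = [0]  # k = 0: nobody is targeted
    for _, responded in ranked:
        gain = benefit_tp if responded == 1 else cost_fp
        profits.append(profits[-1] + gain)
    return profits

# Hypothetical predicted probabilities and true responses (1 = Yes, 0 = No)
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]

curve_cheap = profit_curve(scores, labels, benefit_tp=99, cost_fp=-1)
curve_costly = profit_curve(scores, labels, benefit_tp=99, cost_fp=-10)

for k, (a, b) in enumerate(zip(curve_cheap, curve_costly)):
    print(f"top {k} targeted: profit ${a} (FP cost $1), ${b} (FP cost $10)")
```

Raising the false-positive cost pulls the whole curve down and makes targeting further down the ranking less attractive, which is the comparison the two matrices set up.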