Classification

Overview

1.

Introduction

8. Validation

2. Application

9. Regularisation

3. EDA

10. Clustering

4. Learning Process

11. Evaluation

5. Bias-Variance Tradeoff

12. Deployment

6. Regression (review)

13. Ethics

7. Classification

Lecture outline

- Classification - Logistic regression review - Classification evaluation metrics The expected value framework -

Classification problems

Response is categorical, e.g. credit card default (Yes/No), favourite movie types (Action/Drama/Animation)

Exemplary techniques - logistic regression, classification tree, K-NN, etc.

Logistic regression formulation

Logistic regression coefficients are estimated by maximising the likelihood function

Logistic regression example

responding

Yes

No

127

2817

student_Yes

206

6850

student_No

333

9667

Total

Training set

responding

Yes No

student_Yes 84 1959

student_No 150 4808

Total 234 6767

Test set responding

student_Yes

43

858

Yes No

student_No 56 2042

Total 99 2900

Logistic regression results

Logistic regression results interpretation

Prediction from multiple classifiers

The ROC curve

The ROC curve

Each point corresponds to a confusion matrix

Point A is more ‘conservative’ than B, which is more ‘conservative’ than C

Points that are closer to the upper left are preferred. Point (0,1) represents the perfect classifier

Points along the diagonal represent random guessing - no classifiers should be in the lower right

The ROC curves from different classifiers

p

n

Predicted Yes

46

12

Predicted No

53

2888

The expected value analytical framework

The targeted marketing example.

Assume that we sell the product for $200, production related cost is $100 and shipping and handling cost is $1. What would be the minimum probability of responding we should target.

Expected value of a classifier

Expected value of a classifier

From the above example, let’s use 0.35 as the threshold and assume the matrix of cost/benefit information is as below. What would be total expected value of the logistic regression classifier per customer?

Actual Yes Actual No

Predicted Yes

$99

$-1

Predicted No

$0

$0

The profit curves

Actual Yes Actual No Actual Yes Actual No

Predicted No

$0

$0

Predicted No

$0

$0

Predicted Yes $99 $-1 Predicted Yes $99 $-10