Chapter 3 NUMERICAL MEASURES

MBA Nguyen Tien Dung
School of Economics and Management
Website: https://sites.google.com/site/nguyentiendungbkhn
Email: dung.nguyentien3@hust.edu.vn

Main Contents

3.1 MEASURES OF LOCATION
3.2 MEASURES OF VARIABILITY
3.3 MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTION OF OUTLIERS

© Nguyễn Tiến Dũng Applied Statistics for Business


3.1 MEASURES OF LOCATION

● Mean
● Median
● Mode
● Percentiles
● Quartiles


Mean

Population mean

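● A population, say, a data set of the ages of students in 5 classes. We denote:
● X: the random variable (age)
● $X_1, X_2, \dots, X_N$: the population values
● N: population size (say, N = 200)
● Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

Sample mean

● A random sample taken from the population:
● $x_1, x_2, \dots, x_n$: the sample values
● n: sample size (say, n = 30)
● Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

● The sample mean $\bar{x}$ is an unbiased point estimator of the population mean $\mu$.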
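A minimal Python sketch of the population and sample means: in both cases the sum of the values is divided by how many there are (N for the population, n for the sample). The helper name `arithmetic_mean` and the age values are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)


# Hypothetical ages: a tiny "population" and a sample drawn from it.
population_ages = [19, 20, 20, 21, 22, 23]
sample_ages = [20, 21, 23]

mu = arithmetic_mean(population_ages)   # population mean, divides by N
x_bar = arithmetic_mean(sample_ages)    # sample mean, divides by n
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
```

The same function serves both cases; only the data passed in (all N population values or the n sample values) changes.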


Median

● The median is the value in the middle when the data are arranged in ascending order (smallest value to largest value).

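● Given a set of observations $x_1, x_2, \dots, x_n$, arrange the data in ascending order (smallest value to largest value).

● The median is the value at position (n + 1)/2: $Me = x_{((n+1)/2)}$
● If n = 2k + 1 (n odd), then $Me = x_{k+1}$
● If n = 2k (n even), then $Me = 0.5(x_k + x_{k+1})$
● Sample 1: 1 3 5 8 10 → n = 5, k = 2, k + 1 = 3, so $Me = x_3 = 5$
● Sample 2: 1 3 5 8 9 10 → n = 6, (n + 1)/2 = 3.5, so $Me = 0.5(x_3 + x_4) = 0.5(5 + 8) = 6.5$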
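The odd/even rule can be sketched in a few lines of Python. `median` here is a hypothetical helper written for illustration (the standard library's `statistics.median` gives the same results); note that Python indexes from 0, so the (k+1)th value is `xs[k]`.

```python
def median(values):
    """Median via the position rule Me = x_((n+1)/2) on sorted data."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median needs at least one value")
    k, remainder = divmod(n, 2)
    if remainder == 1:                  # n = 2k + 1: Me = x_(k+1)
        return xs[k]                    # 0-based index k is the (k+1)th value
    return 0.5 * (xs[k - 1] + xs[k])    # n = 2k: Me = 0.5(x_k + x_(k+1))


print(median([1, 3, 5, 8, 10]))     # Sample 1, prints 5
print(median([1, 3, 5, 8, 9, 10]))  # Sample 2, prints 6.5
```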


Mode

● The mode is the value that occurs with the greatest frequency.

● 1 1 2 2 3 4 4 4 5 5 6 6 → Mode = 4
● 1 2 2 3 4 4 4 5 5 6 6 6 → Mode = 4, 6 (multiple modes)
● 1 1 2 2 3 3 4 4 5 5 6 6 → no mode (every value occurs equally often)
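A sketch of the mode rules above. Python's `statistics.multimode` returns every most-frequent value, but when all values tie it returns all of them, which does not match the "no mode" convention on this slide; so this version counts with `collections.Counter` instead. The helper name `modes` is hypothetical, and treating a data set with a single distinct value as having that value as its mode is an assumption.

```python
from collections import Counter


def modes(values):
    """Every value with the greatest frequency, in ascending order.

    Returns [] ("no mode") when two or more distinct values all occur
    equally often, following the convention on the slide.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # all frequencies tie: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6]))  # [4]
print(modes([1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]))  # [4, 6]
print(modes([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]))  # []
```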


Percentile (Textbook)

● Anderson 2014: The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.


Percentile (Excel)

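● Arrange the data in ascending order. Position of the kth percentile:

● $p_k = \frac{k(n-1)}{100} + 1$

● Value of the kth percentile:

● If $p_k$ is an integer: $P_k = x_{(p_k)}$
● If $p_k$ is not an integer, interpolate linearly between the two neighboring values: with $j = \lfloor p_k \rfloor$, $P_k = x_{(j)} + (p_k - j)(x_{(j+1)} - x_{(j)})$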
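A Python sketch of the Excel position-and-interpolation rule, which is the rule behind Excel's PERCENTILE.INC. The function name `percentile_inc` is hypothetical; k is given on the 0 to 100 scale, and positions are 1-based as on the slide, so `xs[j - 1]` is the jth sorted value.

```python
import math


def percentile_inc(values, k):
    """kth percentile (0 <= k <= 100) at position p_k = k(n-1)/100 + 1.

    Uses linear interpolation when p_k is not an integer.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0 or not 0 <= k <= 100:
        raise ValueError("need data and 0 <= k <= 100")
    pos = k * (n - 1) / 100 + 1   # 1-based position p_k
    j = math.floor(pos)           # integer part of p_k
    frac = pos - j                # fractional part of p_k
    if frac == 0:
        return xs[j - 1]          # p_k is an integer: x_(p_k)
    return xs[j - 1] + frac * (xs[j] - xs[j - 1])


data = [1, 3, 5, 8, 9, 10]
print(percentile_inc(data, 25))  # position 2.25 -> 3 + 0.25 * (5 - 3) = 3.5
```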


Quartiles

● Q1: the first quartile = the 25th percentile
● Q2: the second quartile = the 50th percentile = Median
● Q3: the third quartile = the 75th percentile


Quartiles (Excel & MegaStat)

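● Q1: the first quartile

● Position: $q_1 = \frac{1 \cdot (n-1)}{4} + 1$
● Value: $Q_1 = x_{(q_1)}$

● Q2: the second quartile

● Position: $q_2 = \frac{2(n-1)}{4} + 1$
● Value: $Q_2 = x_{(q_2)}$ = Median

● Q3: the third quartile

● Position: $q_3 = \frac{3(n-1)}{4} + 1$
● Value: $Q_3 = x_{(q_3)}$

● When a position is not an integer, interpolate between the two neighboring values, as for percentiles.
● Recommendation: use the Excel & MegaStat procedure.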
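Python's standard library can reproduce these positions: `statistics.quantiles(data, n=4, method="inclusive")` (Python 3.8+) places the ith cut point at position i(n-1)/4 + 1 and interpolates linearly, which should agree with Excel's QUARTILE.INC. That it also matches MegaStat in every case is an assumption not checked here; the wrapper name `quartiles_inc` is hypothetical.

```python
import statistics


def quartiles_inc(values):
    """(Q1, Q2, Q3) using the positions q_i = i(n-1)/4 + 1.

    method="inclusive" treats the minimum as the 0th and the maximum as
    the 100th percentile, which gives exactly these positions.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


print(quartiles_inc([1, 3, 5, 8, 9, 10]))  # (3.5, 6.5, 8.75)
```

For n = 6, the positions are 2.25, 3.5 and 4.75, so Q1 = 3 + 0.25(5 - 3) = 3.5, Q2 = 6.5 (the median) and Q3 = 8 + 0.75(9 - 8) = 8.75.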


3.2 MEASURES OF VARIABILITY

● Range
● Interquartile Range
● Variance
● Standard Deviation
● Coefficient of Variation


Different Variances


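● Range = Max - Min
● Interquartile range: IQR = Q3 - Q1
● Population variance $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$ and population standard deviation $\sigma = \sqrt{\sigma^2}$

● Sample variance $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ and sample standard deviation $s = \sqrt{s^2}$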
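A sketch that computes the range, IQR, variance and standard deviation for one data set. Assumptions: the variance divides the sum of squared deviations by N for a population and by n - 1 for a sample, and the IQR uses the inclusive (Excel-style) quartiles. `variability` is a hypothetical helper name and the data are illustrative; the results can be cross-checked against `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics


def variability(values, population=False):
    """Range, IQR, variance and standard deviation of a data set.

    The sum of squared deviations is divided by N (population=True) or
    by n - 1 (sample, the default).
    """
    xs = list(values)
    n = len(xs)
    centre = sum(xs) / n
    ss = sum((x - centre) ** 2 for x in xs)   # sum of squared deviations
    var = ss / n if population else ss / (n - 1)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "variance": var,
        "std_dev": math.sqrt(var),
    }


print(variability([1, 3, 5, 8, 9, 10]))
```

Setting `population=True` switches the divisor from n - 1 to N, mirroring σ² versus s².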


Calculating the Mean and Std. Deviation

● Sample Data
● Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{256}{4} = 64$
● Sample standard deviation: $s = \sqrt{64} = 8$


Sample Variance & Standard Deviation


Coefficient of Variation

● A measure of how large the standard deviation is relative to the mean, expressed as a percentage:

$CV = \frac{s}{\bar{x}} \times 100\%$ (sample) or $CV = \frac{\sigma}{\mu} \times 100\%$ (population)
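A sketch of the coefficient of variation, CV = (standard deviation / mean) × 100%, using the standard library's `statistics.stdev` (s, divisor n - 1) or `statistics.pstdev` (σ, divisor N) together with `statistics.fmean` (Python 3.8+). The helper name `coefficient_of_variation` and the data are hypothetical.

```python
import statistics


def coefficient_of_variation(values, population=False):
    """CV = (standard deviation / mean) * 100, returned as a percentage.

    Uses s and x-bar for a sample (default) or sigma and mu for a population.
    """
    centre = statistics.fmean(values)
    if centre == 0:
        raise ValueError("CV is undefined when the mean is 0")
    sd = statistics.pstdev(values) if population else statistics.stdev(values)
    return sd / centre * 100


print(f"{coefficient_of_variation([1, 3, 5, 8, 9, 10]):.1f}%")
```

Because CV has no units, it lets you compare the spread of data sets measured on different scales.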


3.3 MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTION OF OUTLIERS

Patterns of Skewness


Skewness and Kurtosis


z-Scores

● Suppose we have a sample of n observations, with values $x_1, x_2, \dots, x_n$, sample mean $\bar{x}$ and sample standard deviation s. The z-score of $x_i$ is

$z_i = \frac{x_i - \bar{x}}{s}$

● The z-score is often called the standardized value.

● The z-score $z_i$ can be interpreted as the number of standard deviations $x_i$ is from the mean $\bar{x}$.
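A short sketch of standardization, z_i = (x_i - x̄) / s, where s is the sample standard deviation (divisor n - 1). The helper name `z_scores` and the data are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardized values z_i = (x_i - x_bar) / s for a sample."""
    x_bar = statistics.fmean(values)
    s = statistics.stdev(values)   # sample standard deviation (n - 1)
    return [(x - x_bar) / s for x in values]


z = z_scores([1, 2, 3, 4, 10])
print([round(v, 2) for v in z])
```

Here x̄ = 4 and s = √12.5 ≈ 3.54, so the value 10 lies about 1.7 standard deviations above the mean; z-scores of a sample always sum to zero (up to rounding).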


Chebyshev’s Theorem

● At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is any value greater than 1.

(P. L. Chebyshev, 1821-1894)

● Implications:

● At least $1 - 1/2^2 = 0.75$, or 75%, of the data values must be within z = 2 standard deviations of the mean.

● At least $1 - 1/3^2 \approx 0.89$, or 89%, of the data values must be within z = 3 standard deviations of the mean.

● At least $1 - 1/4^2 \approx 0.94$, or 94%, of the data values must be within z = 4 standard deviations of the mean.
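A sketch that evaluates Chebyshev's bound 1 - 1/z² and checks it against the observed share of a data set within z standard deviations of its mean. The helper names and the data are hypothetical; the sample standard deviation s is used, which is at least as large as the population-style σ, so the bound still holds.

```python
import statistics


def chebyshev_bound(z):
    """Minimum fraction of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def share_within(values, z):
    """Observed fraction of values with |x - x_bar| <= z * s."""
    x_bar = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum(abs(x - x_bar) <= z * s for x in values) / len(values)


for z in (2, 3, 4):
    print(z, round(chebyshev_bound(z), 4))  # prints 2 0.75, 3 0.8889, 4 0.9375 (one pair per line)

print(share_within([1, 2, 3, 4, 10], 2))    # prints 1.0, at least the 0.75 guaranteed
```

The theorem gives only a lower bound for any distribution; actual shares are often much higher.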


Chebyshev’s Inequality Theorem


Empirical Rule

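● Applies to data with a bell-shaped (approximately normal) distribution:

● About 68% of observations are within 1 standard deviation of the mean.
● About 95% of observations are within 2 standard deviations of the mean.
● Nearly all (about 99.7%) observations are within 3 standard deviations of the mean.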
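The empirical-rule percentages are the probabilities that a normal variable falls within 1, 2 or 3 standard deviations of its mean. A sketch using `statistics.NormalDist` (Python 3.8+), assuming the data follow a normal distribution exactly:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# P(-k < Z < k) for a standard normal variable Z, k = 1, 2, 3
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} std. dev.: {share:.1%}")
```

This prints about 68.3%, 95.4% and 99.7%. For data that are not bell-shaped, only Chebyshev's weaker guarantees apply.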


Detection of Outliers

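● Outliers: some data points may have unusually large or unusually small values; such extreme values are called outliers.

● Lower limit = Q1 - 1.5 × IQR
● Upper limit = Q3 + 1.5 × IQR
● If $x_{(i)}$ < Lower limit → a low outlier
● If $x_{(i)}$ > Upper limit → a high outlier
● If $x_{(i)} < Q_1 - 3 \times IQR$ or $x_{(i)} > Q_3 + 3 \times IQR$ → an extreme value
● For example: 1 2 3 4 10. With the Excel quartile positions, Q1 = 2 and Q3 = 4, so IQR = 2, Lower limit = -1 and Upper limit = 7. The value 10 is above 7, so it is a high outlier; it equals, but does not exceed, Q3 + 3 × IQR = 10, so it is not classified as an extreme value.
● Sources of outliers:

● Data-recording errors → should be corrected
● An inappropriate observation (e.g., one incorrectly included in the data set) → should be removed
● Correctly recorded but unusual values → should be retained, but flagged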
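A sketch of the fence rules: 1.5 × IQR limits for low and high outliers and 3 × IQR limits for extreme values, with quartiles from the inclusive (Excel QUARTILE.INC-style) positions. The helper name `classify_outliers` and the labels it returns are hypothetical.

```python
import statistics


def classify_outliers(values):
    """Return (lower limit, upper limit, list of (value, label)) for flagged values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = []
    for x in values:
        if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
            flagged.append((x, "extreme value"))   # beyond the 3 x IQR fences
        elif x < lower:
            flagged.append((x, "low outlier"))
        elif x > upper:
            flagged.append((x, "high outlier"))
    return lower, upper, flagged


print(classify_outliers([1, 2, 3, 4, 10]))  # (-1.0, 7.0, [(10, 'high outlier')])
```

Flagging a value does not decide what to do with it: as listed above, that depends on whether it is a recording error, an inappropriate observation, or a genuine unusual value.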


3.4 EXPLORATORY DATA ANALYSIS

● Five-number summary:
1. Smallest value (Min)
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value (Max)
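A minimal sketch of the five-number summary, the input to a boxplot, using the inclusive (Excel-style) quartiles from `statistics.quantiles` (Python 3.8+). The helper name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) with inclusive (Excel-style) quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


print(five_number_summary([1, 2, 3, 4, 10]))  # (1, 2.0, 3.0, 4.0, 10)
```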


Boxplot (Box-and-whisker plot)


3.5 MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES

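● Covariance: a descriptive measure of the linear association between two variables.
● Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$

● Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$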
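A sketch of the covariance: the sum of the cross-products of deviations, (x_i - x̄)(y_i - ȳ), divided by n - 1 for a sample or by N for a population. The helper name `covariance` and the paired data are hypothetical; Python 3.10+ also offers `statistics.covariance` for the sample version.

```python
def covariance(xs, ys, population=False):
    """Sum of (x_i - x_bar)(y_i - y_bar) divided by n - 1 (sample) or N."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / n if population else cross / (n - 1)


x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]
print(covariance(x, y))  # 1.5 (positive: y tends to rise with x)
```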


Example

● Question: Is there a correlation (relationship) between x and y?


Drawing a scatter diagram


Interpretation of Sample Covariance

● $s_{xy}$ is positive → a positive linear relationship
● $s_{xy}$ is negative → a negative linear relationship
● $s_{xy}$ is about 0 → no apparent linear relationship


Correlation Coefficient

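● Pearson product moment correlation coefficient for sample data:

$r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of x and y

● Pearson product moment correlation coefficient for population data:

$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$, where $\sigma_x$ and $\sigma_y$ are the population standard deviations of x and y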


● $r_{xy} > 0$: a positive linear relationship
● $r_{xy} < 0$: a negative linear relationship
● The absolute value of $r_{xy}$ ranges from 0 to 1, so $-1 \le r_{xy} \le 1$

● The closer $|r_{xy}|$ is to 1, the tighter (stronger) the linear relationship

● Excel application: the CORREL() function
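A sketch of the sample correlation r_xy = s_xy / (s_x s_y). The n - 1 divisors in the covariance and the two standard deviations cancel, so the code works with sums of squares and cross-products directly. The helper name `pearson_r` and the data are hypothetical; for the same data, Excel's CORREL() should return the same value.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation r_xy = s_xy / (s_x * s_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # squares for x
    syy = sum((y - y_bar) ** 2 for y in ys)                       # squares for y
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))
```

Here r ≈ 0.77, a fairly strong positive linear relationship; unlike the covariance, r does not depend on the units of x and y.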


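3.6 THE WEIGHTED MEAN AND WORKING WITH GROUPED DATA

● A simple mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

● A weighted mean: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$

● Example: calculate a GPA (Grade Point Average)
● Marks of the courses: $x_1, x_2, \dots, x_n$
● Credits of the courses: $w_1, w_2, \dots, w_n$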
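A sketch of the GPA as a weighted mean, Σ w_i x_i / Σ w_i, with course credits as weights. The helper name `weighted_mean` and the marks and credits are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need equal-length inputs and a non-zero total weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [8, 6, 9]      # hypothetical course marks x_i
credits = [3, 2, 4]    # hypothetical credits w_i
print(weighted_mean(marks, credits))  # 8.0
print(sum(marks) / len(marks))        # simple mean, ignores the credits
```

The weighted GPA (72 / 9 = 8.0) is higher than the simple mean here because the highest mark belongs to the course with the most credits.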


Grouped Data

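● Sample mean for grouped data: $\bar{x} = \frac{\sum f_i M_i}{n}$, where $M_i$ is the midpoint of class i, $f_i$ is the frequency of class i, and $n = \sum f_i$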


Sample Variance

● Sample variance for grouped data: $s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n-1}$


Population Mean and Variance for Grouped Data

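● Population mean: $\mu = \frac{\sum f_i M_i}{N}$, where $N = \sum f_i$

● Population variance: $\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$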
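A sketch of the grouped-data formulas: each class is represented by its midpoint M_i weighted by its frequency f_i, and the variance divides by n - 1 (sample) or N (population). The helper name `grouped_stats` and the frequency table (classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25) are hypothetical.

```python
def grouped_stats(midpoints, freqs, population=False):
    """Mean and variance from class midpoints M_i and frequencies f_i."""
    if len(midpoints) != len(freqs):
        raise ValueError("need one frequency per class midpoint")
    n = sum(freqs)
    centre = sum(f * m for m, f in zip(midpoints, freqs)) / n
    ss = sum(f * (m - centre) ** 2 for m, f in zip(midpoints, freqs))
    var = ss / n if population else ss / (n - 1)
    return centre, var


mean, s2 = grouped_stats([5, 15, 25], [2, 3, 5])
print(mean, round(s2, 2))
```

With these frequencies, the mean is 180 / 10 = 18.0 and the sum of weighted squared deviations is 610, giving s² = 610 / 9 ≈ 67.78 (or σ² = 61.0 with `population=True`). Grouped results are approximations, since every value in a class is treated as its midpoint.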


Exercises for Homework

Section         Exercises
3.1             1, 5, 6, 11, 16 – 7, 10 (Excel)
3.2             25, 26 – 27, 32 (Excel)
3.3             37, 41 – 44, 45 (Excel)
3.4             48, 49, 51 – 52, 53 (Excel)
3.5             55, 58 – 57, 59 (Excel)
3.6             -
Supplementary   63, 68 (Excel)
