Lecture 3. NUMERICAL SUMMARY

• Data Measurements
  - Locations
  - Variability Measures
  - Shape

PROBABILITY & STATISTICS – Bui Duong Hai – NEU – www.mfe.edu.vn/buiduonghai

• [1] Chapter 3, pp. 99 - 162
• [3] Chapter 2

Comparison

• Profit of two projects, A & B

[Figure: two bar charts, "Profit of Project A (million)" and "Profit of Project B (million)". Horizontal axis: profit from 1 to 6. Bar labels in each chart: 30%, 20%, 20%, 15%, 10%, 5%.]

Comparison

[Figure: two bar charts, "Profit of Project C (million)" and "Profit of Project D (million)". Horizontal axis: profit from 1 to 9. Bar labels in each chart: 30%, 20%, 20%, 15%, 8%, 5%, 2%, 0%, 0%.]

Comparison

[Figure: two bar charts, "Profit of Project E (million)" and "Profit of Project F (million)". Horizontal axis: profit from -1 to 6. Bar labels for E: 20%, 20%, 15%, 15%, 10%, 10%, 5%, 5%. Bar labels for F: 40%, 40%, 10%, 10%, 0%, 0%, 0%, 0%.]

Data Measurements

• Location:
  - Minimum, Maximum
  - Central Tendency: Mean, Median, Mode
  - Quantile: Quartile, Percentile

• Variability:
  - Range
  - Variance (Var)
  - Standard Deviation (SD)
  - Coefficient of Variation (CV)
  - Interquartile Range (IQR)

3.1. Mean (arithmetic mean)

• Applies to scale variables only

• Mean = (sum of all values) / (number of values)

Population data $\{x_1, x_2, \dots, x_N\}$:

$$\mu = \frac{x_1 + x_2 + \dots + x_N}{N}$$

Sample data $\{x_1, x_2, \dots, x_n\}$:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

• The mean has the same unit as the original data

Weighted mean

• Prices ($) in Quarters 1, 2, 3, 4 are 10, 12, 18, 14, respectively.

$$\bar{x} = \frac{\sum x_i}{n} = \frac{10 + 12 + 18 + 14}{4} = 13.5$$

• Is there any difference if the sales volumes in Quarters 1, 2, 3, 4 are 70, 90, 110, 130?

                      Q1    Q2    Q3    Q4
Value x_i (Price)     10    12    18    14
Weight w_i (Volume)   70    90    110   130

Weighted Mean

• In general, for grouped data:

$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_k x_k}{w_1 + w_2 + \dots + w_k} = \frac{\sum w_i x_i}{\sum w_i}$$

• For the price example:

$$\bar{x} = \frac{70 \times 10 + 90 \times 12 + 110 \times 18 + 130 \times 14}{70 + 90 + 110 + 130} = \frac{5580}{400} = 13.95$$
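The simple and the weighted mean of the price example can be checked with a short Python sketch. The variable names are illustrative and not part of the lecture.

```python
# Prices and sales volumes for Quarters 1 to 4, taken from the example above.
prices = [10, 12, 18, 14]
volumes = [70, 90, 110, 130]

# Simple (unweighted) mean: every quarter counts equally.
simple_mean = sum(prices) / len(prices)

# Weighted mean: each price is weighted by the volume sold at that price.
weighted_mean = sum(w * x for w, x in zip(volumes, prices)) / sum(volumes)

print(f"simple mean   = {simple_mean:.2f}")    # 13.50
print(f"weighted mean = {weighted_mean:.2f}")  # 13.95
```

The weighted mean is higher because the larger volumes were sold in the quarters with the higher prices.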

Mean of Grouped data

• Frequency, Proportion, Percent table

Wage ($)                           7      8      9
Number of workers (Frequency)      4      10     6
Proportion (Relative frequency)    0.2    0.5    0.3
Percent                            20%    50%    30%
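As a minimal sketch, the mean wage can be computed from the table either with frequencies or with proportions as weights; both routes should give the same value. The variable names are illustrative.

```python
# Wage table from the slide: value and number of workers (frequency).
wages = [7, 8, 9]
frequencies = [4, 10, 6]

n = sum(frequencies)                        # 20 workers in total
proportions = [f / n for f in frequencies]  # 0.2, 0.5, 0.3

# Mean with frequencies as weights: sum(f * x) / sum(f).
mean_from_frequency = sum(f * x for f, x in zip(frequencies, wages)) / n

# Mean with proportions as weights: sum(p * x), since the proportions sum to 1.
mean_from_proportion = sum(p * x for p, x in zip(proportions, wages))

print(f"{mean_from_frequency:.2f}")   # 8.10
print(f"{mean_from_proportion:.2f}")  # 8.10
```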

Compare the Mean

• Compare the means of the following data:
  - Data 1: {10, 10, 11, 12, 12}
  - Data 2: {5, 5, 6, 6, 100}

• The mean is easily affected by extreme values (outliers)
  - This may lead to a biased comparison
  - Use the other measures
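A small Python sketch of this comparison, using the standard `statistics` module, shows how a single extreme value moves the mean while the median stays close to the typical values.

```python
import statistics

data1 = [10, 10, 11, 12, 12]
data2 = [5, 5, 6, 6, 100]   # 100 is the extreme value

mean1, mean2 = statistics.mean(data1), statistics.mean(data2)
median1, median2 = statistics.median(data1), statistics.median(data2)

print(f"Data 1: mean = {mean1:.1f}, median = {median1}")  # mean 11.0, median 11
print(f"Data 2: mean = {mean2:.1f}, median = {median2}")  # mean 24.4, median 6
```

By the mean, Data 2 looks larger than Data 1; by the median, four of its five values are clearly smaller.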

3.2. Median

• The median, denoted by $m_e$, is the midpoint of the ordered list of values

• The median can also be applied to ordinal variables

Ex. Data: {5, 6, 9, 5, 6}
  - Ordered data: {5, 5, 6, 6, 9}: Median =
  - Ordered data: {6, 6, 7, 8, 9, 11}: Median =
  - Data: {XXS, XS, S, S, S, M, L, XL, XXL}: Median =
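The three exercises can be checked with the sketch below. For the ordinal data, the ranking list `size_order` is an assumption introduced here to tell Python how the categories are ordered.

```python
import statistics

# Odd number of values: the median is the single middle value.
median_odd = statistics.median([5, 6, 9, 5, 6])
# Even number of values: the median is the average of the two middle values.
median_even = statistics.median([6, 6, 7, 8, 9, 11])

# Ordinal data: sort by rank, then take the middle item (odd count here).
size_order = ["XXS", "XS", "S", "M", "L", "XL", "XXL"]  # assumed ranking
sizes = ["XXS", "XS", "S", "S", "S", "M", "L", "XL", "XXL"]
ranked = sorted(sizes, key=size_order.index)
median_size = ranked[len(ranked) // 2]

print(median_odd)   # 6
print(median_even)  # 7.5
print(median_size)  # S
```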

Median

• The median is the 'cutoff point' between the lower 50% and the upper 50% parts

• Discrete vs Continuous

[Figure: a discrete and a continuous distribution, each split at the median into a lower 50% and an upper 50%.]

3.3. Mode

• The mode, denoted by $m_0$, is the value that occurs most often: the frequency of (X = $m_0$) is the largest.
• There may be no mode or several modes.
• The mode can also be applied to nominal variables.

• Example: What are the modes?
  - Data 1: {5, 6, 6, 7, 7, 7, 9}
  - Data 2: {5, 6, 7, 8, 9}
  - Data 3: {5, 6, 9, 5, 6}
  - Data 4: {Yellow, Yellow, Red, Blue, Green}
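The example can be worked through with the sketch below. The helper `modes` is hypothetical, and it assumes one common convention: when every distinct value occurs equally often, the data set is reported as having no mode.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] when every value ties (no mode)."""
    counts = Counter(data)
    top = max(counts.values())
    # Assumed convention: if all distinct values share the same count, no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]

print(modes([5, 6, 6, 7, 7, 7, 9]))                        # [7]
print(modes([5, 6, 7, 8, 9]))                              # []  (no mode)
print(modes([5, 6, 9, 5, 6]))                              # [5, 6]
print(modes(["Yellow", "Yellow", "Red", "Blue", "Green"])) # ['Yellow']
```

The last call works on text labels, which illustrates that the mode only needs counting, not ordering or arithmetic.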

Mean, Median, Mode

[Figure: four dot plots, each on an axis from 0 to 10. Labels shown on the plots: "No Mode"; "Median = 3"; "Mean = 3"; "Median = 3"; "Mean = 4"; "Mean = 4.8"; "Mode: 7"; "Median = 5.5"; "Mean = Median = Mode = 5".]

Mean, Median, Mode

[Figure: three distribution curves with the positions of mean, median and mode marked.]

• Left skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right skewed: Mode < Median < Mean

Grouped data

• Customers' waiting time

Waiting time    Frequency
0 - 5           15
5 - 10          20
10 - 15         8
15 - 20         5
20 +            2

• The median is in the group [5 - 10)
• Modal group:
• Mean: use the middle value of each group

Waiting time (middle value)    2.5    7.5    12.5    17.5    22.5
Frequency                      15     20     8       5       2
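A minimal sketch of the grouped-data calculations follows. It assumes, as the middle value 22.5 on the slide implies, that the open group "20 +" is treated as 20 - 25, and that the modal group is simply the group with the highest frequency (reasonable here because all groups have equal width).

```python
# (lower bound, upper bound, frequency); the last group is closed at 25 by assumption.
groups = [(0, 5, 15), (5, 10, 20), (10, 15, 8), (15, 20, 5), (20, 25, 2)]

n = sum(freq for _, _, freq in groups)  # 50 customers

# Mean: replace every observation by the middle value of its group.
grouped_mean = sum((lo + hi) / 2 * freq for lo, hi, freq in groups) / n

# Median group: the first group whose cumulative frequency reaches n / 2.
cumulative = 0
median_group = None
for lo, hi, freq in groups:
    cumulative += freq
    if cumulative >= n / 2:
        median_group = (lo, hi)
        break

# Modal group: the group with the largest frequency.
modal_lo, modal_hi, _ = max(groups, key=lambda g: g[2])
modal_group = (modal_lo, modal_hi)

print(n, grouped_mean)  # 50 8.4
print(median_group)     # (5, 10)
print(modal_group)      # (5, 10)
```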


3.4. Quartile

• Divide the data into 4 equal parts by 3 cutoff points: the 3 quartiles $Q_1, Q_2, Q_3$

[Figure: the ordered data split into four parts of 25% each by $Q_1$, $Q_2$, $Q_3$.]

• 2nd quartile: $Q_2$ = Median

Quantile

• Divide into 5 equal parts by 4 cutoff points: 4 quintiles
• Divide into 10 equal parts by 9 cutoff points: 9 deciles
• 100 equal parts: 99 percentiles
• 10th percentile = 1st decile
• 20th percentile = 2nd decile = 1st quintile
• 25th percentile = 1st quartile
• 50th percentile = 2nd quartile = median
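The identities in this list can be illustrated with Python's `statistics.quantiles`. The data set below is hypothetical, and the `"inclusive"` interpolation method is only one of several conventions, so individual cutoff values may differ slightly from a textbook or spreadsheet rule; the identities themselves hold for any single method.

```python
import math
import statistics

data = list(range(1, 10))  # hypothetical sample: 1, 2, ..., 9

def cut_points(values, parts):
    """Cutoff points that divide the data into `parts` equal-sized parts."""
    return statistics.quantiles(values, n=parts, method="inclusive")

quartiles = cut_points(data, 4)      # 3 cutoff points
quintiles = cut_points(data, 5)      # 4 cutoff points
deciles = cut_points(data, 10)       # 9 cutoff points
percentiles = cut_points(data, 100)  # 99 cutoff points

print(quartiles)  # [3.0, 5.0, 7.0]

# The k-th percentile sits at index k - 1 of the list.
print(math.isclose(percentiles[9], deciles[0]))                # 10th pct = 1st decile
print(math.isclose(percentiles[19], deciles[1]),
      math.isclose(percentiles[19], quintiles[0]))             # 20th pct = 2nd decile = 1st quintile
print(math.isclose(percentiles[24], quartiles[0]))             # 25th pct = 1st quartile
print(math.isclose(percentiles[49], statistics.median(data)))  # 50th pct = median
```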

Microsoft Excel Function

Measures                         Command / Function
Mean                             = average(data)
Median                           = median(data)
Mode                             = mode(data)
Quartile k (k = 1,2,3)           = quartile(data, k)
Percentile k (k = 1,2,…,99)      = percentile(data, k/100)

Note: Excel's percentile function expects the second argument as a fraction between 0 and 1, so the 25th percentile is percentile(data, 0.25).

Variability

[Figure: four dot plots, each on an axis from 0 to 9, all with Mean = Median = 5.]

• Central tendency alone may not provide sufficient information about the data.
• Data sets can have the same mean and median but differ in variability (dispersion, spread).

3.5. Range

• Range = largest value - smallest value = $x_{max} - x_{min}$
• Simplest, but poorest information.

[Figure: two dot plots on an axis from 0 to 10, one with Range = 7 and one with Range = 6.]

3.6. Variance & Standard Deviation

• Sample data: $x_1, x_2, \dots, x_n$ with mean $\bar{x}$
• Deviation: $x_i - \bar{x}$, which is (+) or (-) or zero
• Sum of Squares: $SS = \sum (x_i - \bar{x})^2$
• Variance:

$$S^2 = \frac{SS}{n - 1} = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

• The unit of the variance is the squared unit of $X$

Standard Deviation

• The standard deviation is the square root of the variance:

$$S = \sqrt{S^2}$$

• The standard deviation has the same unit as $X$
• Variance & S.D. measure the "absolute" variability
• If $S_X > S_Y$ then:
  - $X$ is more variable, dispersed, widespread, fluctuating than $Y$
  - $Y$ is more stable, concentrated than $X$

Population and Sample

• Difference between Population and Sample

             Population                           Sample
Data         $\{x_1, x_2, \dots, x_N\}$           $\{x_1, x_2, \dots, x_n\}$
Mean         $\mu = \frac{\sum x_i}{N}$           $\bar{x} = \frac{\sum x_i}{n}$
SS           $SS = \sum (x_i - \mu)^2$            $SS = \sum (x_i - \bar{x})^2$
Variance     $\sigma^2 = \frac{SS}{N}$            $S^2 = \frac{SS}{n - 1}$
Std. Dev.    $\sigma = \sqrt{\sigma^2}$           $S = \sqrt{S^2}$
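The sketch below contrasts the two columns of the table on one small data set, {5, 6, 9, 5, 6} from the median example, treated first as a whole population (divisor N) and then as a sample (divisor n - 1). The last lines cross-check the hand computation against the standard `statistics` module.

```python
import math
import statistics

data = [5, 6, 9, 5, 6]
n = len(data)

mean = sum(data) / n                     # 6.2
ss = sum((x - mean) ** 2 for x in data)  # sum of squares, about 10.8

pop_var = ss / n            # population variance: divide by N
sample_var = ss / (n - 1)   # sample variance: divide by n - 1
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(f"SS = {ss:.2f}")                                            # 10.80
print(f"population: var = {pop_var:.2f}, sd = {pop_sd:.3f}")       # 2.16, 1.470
print(f"sample:     var = {sample_var:.2f}, sd = {sample_sd:.3f}") # 2.70, 1.643

# Cross-check with the library functions.
print(math.isclose(pop_var, statistics.pvariance(data)))
print(math.isclose(sample_var, statistics.variance(data)))
```

The sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.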

Compare variability

• Compare 3 samples
  - Firm A: Profit ($ mil.): (5, 6, 7, 8, 9)
  - Firm B: Profit ($ mil.): (51, 53, 55, 57, 59)
  - Firm C: Price ($): (15, 16, 17, 18, 19)

     Mean       S^2           S            CV        SS
A    7 ($m)     2.5 ($m)^2    1.58 ($m)    22.6 %    10
B    55 ($m)    10 ($m)^2     3.16 ($m)    5.7 %     40
C    17 ($)     2.5 ($)^2     1.58 ($)     9.3 %     10

3.7. Coefficient of Variation

$$CV = \frac{S}{\bar{x}} \times 100\%$$

• CV is expressed in %, independent of the unit of the data.
• CV measures "relative" variation.
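As a sketch, the comparison table for Firms A, B and C can be reproduced with the `statistics` module, taking CV as the sample standard deviation divided by the mean, times 100%. The helper name `summarize` is illustrative.

```python
import statistics

samples = {
    "A": [5, 6, 7, 8, 9],        # profit, $ million
    "B": [51, 53, 55, 57, 59],   # profit, $ million
    "C": [15, 16, 17, 18, 19],   # price, $
}

def summarize(data):
    """Return mean, sample variance, sample S.D., CV (%) and sum of squares."""
    mean = statistics.mean(data)
    s2 = statistics.variance(data)  # divides by n - 1
    s = statistics.stdev(data)
    cv = s / mean * 100
    ss = s2 * (len(data) - 1)
    return mean, s2, s, cv, ss

for name, data in samples.items():
    mean, s2, s, cv, ss = summarize(data)
    print(f"{name}: mean={mean:.0f}  S2={s2:.1f}  S={s:.2f}  CV={cv:.1f}%  SS={ss:.0f}")
# A: mean=7  S2=2.5  S=1.58  CV=22.6%  SS=10
# B: mean=55  S2=10.0  S=3.16  CV=5.7%  SS=40
# C: mean=17  S2=2.5  S=1.58  CV=9.3%  SS=10
```

Firms A and C have the same S, but relative to their means A fluctuates far more; B has the largest S and the smallest CV.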

3.8. Interquartile Range

• The interquartile range is the range between the 3rd quartile and the 1st quartile
• $IQR = Q_3 - Q_1$
• IQR is the width of the middle 50% of the data values

[Figure: the ordered data split into four parts of 25% each; the IQR spans the two middle parts.]

Outlier

• There are a Lower Limit (LL) and an Upper Limit (UL) for the data
• Observations smaller than LL or greater than UL are outliers
• By quartiles:
  - Lower Limit: $LL = Q_1 - 1.5 \times IQR$
  - Upper Limit: $UL = Q_3 + 1.5 \times IQR$

Key-point and Boxplot

• Find the 5 key points and the outliers

Salary            10    11    12    13    14    15    16    17    18
No. of workers    10    16    30    19    14    10    0     0     1

• Boxplot

[Figure: boxplot of the salary data; the whiskers extend at most 1.5 × IQR beyond the quartiles.]
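A possible way to work this exercise in Python is sketched below. Two things are assumptions: the frequencies are read so that they reproduce the key points 10, 11, 12, 13.5, 18 shown with the boxplot (10 workers at salary 10, and so on up to 1 worker at 18), and the helper `percentile` uses one common textbook rule (average two neighbours when the position is a whole number, otherwise round the position up). Other quartile rules give slightly different values.

```python
salaries = [10, 11, 12, 13, 14, 15, 16, 17, 18]
workers = [10, 16, 30, 19, 14, 10, 0, 0, 1]

# Expand the frequency table into a sorted list of 100 observations.
data = sorted(v for v, f in zip(salaries, workers) for _ in range(f))

def percentile(sorted_data, p):
    """p-th percentile by an assumed textbook rule (see the text above)."""
    position, remainder = divmod(p * len(sorted_data), 100)
    if remainder == 0:
        # Whole-number position i: average the i-th and (i+1)-th values (1-based).
        return (sorted_data[position - 1] + sorted_data[position]) / 2
    # Fractional position: round up to the next observation.
    return float(sorted_data[position])

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(min(data), q1, q2, q3, max(data))  # 10 11.0 12.0 13.5 18
print(iqr, lower_limit, upper_limit)     # 2.5 7.25 17.25
print(outliers)                          # [18]
```

Under these assumptions the single worker with salary 18 lies above the upper limit and would be drawn as an outlier beyond the whisker.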

Table, Histogram, Boxplot

Value (Salary)    10    11    12    13    14    15    16    17    18
Freq.             10    16    30    19    14    10    0     0     1

[Figure: histogram of Salary (values 10 to 18, frequency axis 0 to 35) above a boxplot with the key points 10, 11, 12, 13.5, 18.]

Boxplot: Key values and Whiskers

[Figure: boxplots of the profit of projects A to F.]

           A      B      C       D       E      F
Max        6      6      7       9       6      4
Q3         5      4      6       6       4      3
Q2         4.5    2.5    5.5     4.5     2.5    2.5
Q1         3      2      4       4       1      2
Min        1      1      1       3       -1     1
Mean       4.2    2.8    5.16    4.84    2.5    2.5

Boxplot

[Figure: boxplots by year (2014, 2015, 2016, 2017), each marking Max, Q3, Q2, Q1, Min and Mean.]

3.9. Skewness (Sk)

[Figure: five distribution shapes with their skewness values.]

• Sk = -1.3: left long tail
• Sk = -0.3: left short tail
• Sk = 0: two-tail
• Sk = 0.3: right short tail
• Sk = 1.3: right long tail

3.10. Covariance & Correlation

• Covariance: the combined variability of $X$ and $Y$; in a sample:

$$Cov(X, Y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

[Figure: two scatter plots, each divided into four regions by the lines "Mean of X" and "Mean of Y"; one shows positive covariance, the other negative covariance.]

Correlation Coefficient

$$r_{XY} = \frac{Cov(X, Y)}{S_X S_Y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

• $-1 \le r_{XY} \le 1$, no unit
• $r_{XY}$ measures the linear relationship between $X$ and $Y$
  - $r_{XY} = -1$: linear negative
  - $-1 < r_{XY} < 0$: negatively correlated
  - $r_{XY} = 0$: not correlated
  - $0 < r_{XY} < 1$: positively correlated
  - $r_{XY} = 1$: linear positive

Correlation

• Graph and correlation coefficient ($r$)

[Figure: scatter plots with their correlation coefficients: r = 0.5 (positively correlated, weak); r = 0.8 (positively correlated, strong); r = 0 (not correlated); r = -0.5 (negatively correlated).]

Correlation Coefficient

• X: Advertising; Y: Sales

Month    X     Y     $x_i - \bar{x}$    $y_i - \bar{y}$    $(x_i - \bar{x})(y_i - \bar{y})$
Jan      5     10
Feb      6     15
Mar      8     10
Apr      9     18
May      12    32
Sum
Mean
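The table can be filled in with the sketch below. It assumes the data read Jan (5, 10), Feb (6, 15), Mar (8, 10), Apr (9, 18), May (12, 32), which is one reading of the slide; the variable names are illustrative.

```python
import math

advertising = [5, 6, 8, 9, 12]   # X, assumed pairing by month (Jan to May)
sales = [10, 15, 10, 18, 32]     # Y

n = len(advertising)
mean_x = sum(advertising) / n    # 8.0
mean_y = sum(sales) / n          # 17.0

# Deviation columns of the table.
dx = [x - mean_x for x in advertising]
dy = [y - mean_y for y in sales]

sum_xy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products: 86
sum_xx = sum(a * a for a in dx)              # 30
sum_yy = sum(b * b for b in dy)              # 328

cov_xy = sum_xy / (n - 1)                    # sample covariance
r_xy = sum_xy / math.sqrt(sum_xx * sum_yy)   # correlation coefficient

print(f"Cov(X, Y) = {cov_xy:.2f}")  # 21.50
print(f"r = {r_xy:.3f}")            # 0.867
```

Under this reading, advertising and sales are strongly positively correlated.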

3.11. Standardized value

• The Z-score of one value in the data; it has no unit

$$z_i = \frac{x_i - \bar{x}}{S}$$

Ex. Compare the Microeconomics and Macroeconomics scores of one student in one class if:
  - Micro score = 7.5; Macro score = 9
  - Mean of Micro in class = 6; Mean of Macro = 7
  - S.D. of Micro = 1; S.D. of Macro = 2
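A short sketch of the comparison, with a hypothetical helper `z_score` that takes the value, the class mean and the class standard deviation:

```python
def z_score(value, mean, sd):
    """Standardized value: how many standard deviations `value` lies above the mean."""
    return (value - mean) / sd

z_micro = z_score(7.5, mean=6, sd=1)
z_macro = z_score(9, mean=7, sd=2)

print(z_micro)  # 1.5
print(z_macro)  # 1.0
```

Although the raw Macro score (9) is higher, the Micro score is further above its class mean in standard-deviation units, so relative to the class the student did better in Microeconomics.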

Excel: Statistical Functions

Statistic                       Function
Sum                             = SUM(array)
Mean X                          = AVERAGE(array)
Median                          = MEDIAN(array)
Kth Quartile (Q1, Q2, Q3)       = QUARTILE(array, k)
Sample variance (S2)            = VAR(array)
Sample S.D (S)                  = STDEV(array)
Covariance Cov(X,Y)             = COVARIANCE.S(array1, array2)
Correlation rXY                 = CORREL(array1, array2)
X ~ N(µ,σ2); P(X < b)           = NORMDIST(b, µ, σ, 1)

Note: the older function COVAR(array1, array2) divides by n rather than n - 1, so it returns the population covariance, not the sample covariance defined in 3.10.


Exercise

[1] Chapter 3:
  - (p110) 2, 3, 6, 7, 11, 13
  - (p120) 26, 27, 29, 33
  - (p133) 49, 50, 52
  - (p143) 56, 58, 59
  - (p152) 62, 63, 70
  - Case Problem 1, 4