Chapter 3 NUMERICAL MEASURES
MBA Nguyen Tien Dung
School of Economics and Management
Website: https://sites.google.com/site/nguyentiendungbkhn
Email: dung.nguyentien3@hust.edu.vn
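Main Contents
3.1 MEASURES OF LOCATION
3.2 MEASURES OF VARIABILITY
3.3 MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTION OF OUTLIERS
3.4 EXPLORATORY DATA ANALYSIS
3.5 MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES
3.6 THE WEIGHTED MEAN AND WORKING WITH GROUPED DATA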
© Nguyễn Tiến Dũng Applied Statistics for Business
3.1 MEASURES OF LOCATION
● Mean
● Median
● Mode
● Percentiles
● Quartiles
Mean
Population mean
● A population, say, a data set about the ages of students in 5 classes. We denote:
● X: the random variable (age)
● X1, X2, …, XN: the population values
● N: population size (say, N = 200)
● Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample mean
● A random sample taken from the population:
● x1, x2, …, xn: the sample values
● n: sample size (say, n = 30)
● Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
● The sample mean is an unbiased point estimator of the population mean μ.
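A small simulation makes the two means and the "unbiased" claim concrete. This is a sketch only: the ages are randomly generated (hypothetical, uniform between 18 and 25), not data from the course, and the sample sizes follow the slide's N = 200 and n = 30.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: ages of N = 200 students, between 18 and 25
population = [random.randint(18, 25) for _ in range(200)]
N = len(population)
mu = sum(population) / N                      # population mean

# One simple random sample of n = 30 students (without replacement)
sample = random.sample(population, 30)
n = len(sample)
x_bar = sum(sample) / n                       # sample mean

# Unbiasedness, illustrated: the average of many sample means lands close to mu
means = [statistics.mean(random.sample(population, 30)) for _ in range(2000)]

print(f"mu = {mu:.2f}, one x_bar = {x_bar:.2f}, "
      f"average of 2000 x_bar values = {statistics.mean(means):.2f}")
```

A single sample mean can miss μ noticeably, but the average over many repeated samples sits very close to it, which is what "unbiased" means.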
Median
● The median is the value in the middle when the data are arranged in ascending order (smallest value to largest value).
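● A set of observations: x1, x2, …, xn. Arrange the data in ascending order (smallest value to largest value).
● The median is the value at position (n+1)/2: Me = x((n+1)/2)
● If n = 2k+1 (n odd), then Me = x(k+1)
● If n = 2k (n even), then Me = 0.5(x(k) + x(k+1))
● Sample 1: 1 3 5 8 10. n = 5, k = 2, k+1 = 3, so Me = x(3) = 5
● Sample 2: 1 3 5 8 9 10. n = 6, (n+1)/2 = 3.5, k = 3, so Me = 0.5(x(3) + x(4)) = 0.5(5 + 8) = 6.5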
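The odd/even rule can be written as a short function. This is a minimal sketch; the function name `median` is our own, and it reproduces the two samples worked above.

```python
def median(values):
    """Median by the slide's rule: sort, then take the middle value (n odd)
    or the average of the two middle values (n even)."""
    x = sorted(values)              # ascending order
    n = len(x)
    k, remainder = divmod(n, 2)     # n = 2k + 1 or n = 2k
    if remainder == 1:
        return x[k]                 # Me = x(k+1); Python lists are 0-based
    return 0.5 * (x[k - 1] + x[k])  # Me = 0.5 * (x(k) + x(k+1))

print(median([1, 3, 5, 8, 10]))     # Sample 1: 5
print(median([1, 3, 5, 8, 9, 10]))  # Sample 2: 6.5
```

Python's standard library also offers `statistics.median`, which follows the same convention.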
Mode
● The mode is the value that occurs with the greatest frequency.
● 1 1 2 2 3 4 4 4 5 5 6 6: Mode = 4
● 1 2 2 3 4 4 4 5 5 6 6 6: Mode = 4, 6 (multiple modes)
● 1 1 2 2 3 3 4 4 5 5 6 6: no mode (every value occurs equally often)
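Counting frequencies reproduces the three cases above. This sketch follows the slide's convention that there is no mode when every value occurs equally often; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """All values with the greatest frequency; [] if every value ties (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if all(c == top for c in counts.values()):
        return []  # every value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6]))  # [4]
print(modes([1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]))  # [4, 6]
print(modes([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]))  # []
```

Note that Python's `statistics.multimode` uses a different convention: when all values tie, it returns all of them rather than an empty list.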
Percentile (Textbook)
● Anderson 2014: The pth percentile is a value such that at least p percent of the observations are less than or equal to this value and at least (100 - p) percent of the observations are greater than or equal to this value.
Percentile (Excel)
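● Position of the kth percentile (data sorted in ascending order):
● $p_k = \frac{k(n-1)}{100} + 1$
● Value of the kth percentile:
● If p_k is an integer: P_k = x(p_k)
● If p_k is not an integer, interpolate linearly: with i the integer part of p_k, P_k = x(i) + (p_k - i)(x(i+1) - x(i))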
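The position-plus-interpolation rule above translates directly into code. This is a sketch under the assumption that it mirrors Excel's `PERCENTILE.INC` (the slide's "Excel" method); the function name `percentile_inc` is our own.

```python
import math

def percentile_inc(values, k):
    """k-th percentile (0 <= k <= 100): position p_k = k(n - 1)/100 + 1,
    with linear interpolation when p_k is not an integer."""
    x = sorted(values)
    n = len(x)
    pos = k * (n - 1) / 100 + 1      # 1-based position p_k
    i = math.floor(pos)              # integer part of p_k
    frac = pos - i                   # fractional part of p_k
    if frac == 0:
        return x[i - 1]              # p_k is an integer: x(p_k)
    return x[i - 1] + frac * (x[i] - x[i - 1])  # between x(i) and x(i+1)

data = [1, 3, 5, 8, 9, 10]
print(percentile_inc(data, 25))  # position 2.25 -> 3 + 0.25 * (5 - 3) = 3.5
print(percentile_inc(data, 50))  # position 3.5  -> 6.5, equal to the median
```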
Quartiles
● Q1: the first quartile = the 25th percentile
● Q2: the second quartile = the 50th percentile = Median
● Q3: the third quartile = the 75th percentile
Quartiles (Excel & MegaStat)
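● Q1: the first quartile
● Position: q1 = 1(n - 1)/4 + 1; Value: Q1 = x(q1)
● Q2: the second quartile
● Position: q2 = 2(n - 1)/4 + 1; Value: Q2 = x(q2) = Median
● Q3: the third quartile
● Position: q3 = 3(n - 1)/4 + 1; Value: Q3 = x(q3)
● If a position is not an integer, interpolate between the two neighboring ordered values, as for percentiles.
● Recommendation: use the Excel & MegaStat procedure.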
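Python's standard library can produce the same quartiles. As far as we can tell, `statistics.quantiles` with `method="inclusive"` uses positions i(n - 1)/4 + 1 with linear interpolation, i.e. the rule above; treat that equivalence as an assumption worth checking against Excel on your own data.

```python
import statistics

data = [1, 3, 5, 8, 9, 10]  # Sample 2 from the median slide

# method="inclusive" treats the minimum as the 0th and the maximum as the
# 100th percentile and interpolates linearly between ordered values
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 3.5 6.5 8.75
```

Requires Python 3.8 or later. The default `method="exclusive"` gives different values, so the argument matters.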
3.2 MEASURES OF VARIABILITY
● Range
● Interquartile Range
● Variance
● Standard Deviation
● Coefficient of Variation
Different Variances
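● Range = Max - Min
● Interquartile Range: IQR = Q3 - Q1
● Population variance σ² and population standard deviation σ: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$, $\sigma = \sqrt{\sigma^2}$
● Sample variance s² and sample standard deviation s: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$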
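All five measures can be computed with Python's `statistics` module. This sketch reuses Sample 2 from the median slide; the quartile method is assumed to match the Excel rule, and the key difference to notice is the divisor: n - 1 for the sample functions, N for the population ones.

```python
import statistics

data = [1, 3, 5, 8, 9, 10]  # Sample 2 from the median slide

data_range = max(data) - min(data)                   # Range = Max - Min
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                        # IQR = Q3 - Q1

s2 = statistics.variance(data)        # sample variance, divisor n - 1
s = statistics.stdev(data)            # sample standard deviation
sigma2 = statistics.pvariance(data)   # population variance, divisor N
sigma = statistics.pstdev(data)       # population standard deviation

print(data_range, iqr, s2, round(s, 4), round(sigma2, 4), round(sigma, 4))
```

Here the squared deviations from the mean 6 sum to 64, so s² = 64/5 = 12.8 while σ² = 64/6 ≈ 10.67.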
Calculating the Mean and Std. Deviation
● Sample data (n = 5; the sum of squared deviations from the mean is 256)
● Sample variance: s² = 256 / (5 - 1) = 256 / 4 = 64
● Sample standard deviation: s = sqrt(64) = 8
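The step-by-step table (value, deviation, squared deviation) can be reproduced in a few lines. The slide's own data table is not reproduced here, so the five values below are illustrative ones chosen to give the same sum of squared deviations (256) with n = 5.

```python
import math

data = [46, 54, 42, 46, 32]  # illustrative values, n = 5
n = len(data)
mean = sum(data) / n                        # 220 / 5 = 44.0

deviations = [x - mean for x in data]       # x_i - x_bar (they sum to 0)
squared = [d ** 2 for d in deviations]      # (x_i - x_bar)^2
ss = sum(squared)                           # sum of squared deviations

s2 = ss / (n - 1)                           # sample variance
s = math.sqrt(s2)                           # sample standard deviation

for x, d, sq in zip(data, deviations, squared):
    print(f"{x:>4} {d:>6.1f} {sq:>7.1f}")
print(f"sum of squares = {ss}, s^2 = {s2}, s = {s}")  # 256.0, 64.0, 8.0
```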
Sample Variance & Standard Deviation
Coefficient of Variation
● A measure of how large the standard deviation is relative to the mean, expressed as a percentage:
● Sample: $CV = \frac{s}{\bar{x}} \times 100\%$ or population: $CV = \frac{\sigma}{\mu} \times 100\%$
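As a quick check of the sample formula, here is a sketch using illustrative values whose mean is 44 and sample standard deviation is 8 (hypothetical data, not from the slides).

```python
import statistics

data = [46, 54, 42, 46, 32]  # illustrative sample: mean 44, s = 8

x_bar = statistics.mean(data)
s = statistics.stdev(data)
cv = s / x_bar * 100          # CV = (s / x_bar) * 100%
print(f"CV = {cv:.2f}%")      # 8 / 44 * 100, about 18.18%
```

Because the CV is unit-free, it lets you compare variability across data sets measured on different scales or with very different means.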
3.3 MEASURES OF DISTRIBUTION SHAPE, RELATIVE LOCATION, AND DETECTION OF OUTLIERS
Patterns of Skewness
Skewness and Kurtosis
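The skewness patterns can also be summarized numerically. The slides do not give a formula, so this sketch assumes the adjusted sample skewness n / ((n - 1)(n - 2)) · Σ((x_i - x̄)/s)³, which is the definition Excel's SKEW() documents. Positive values indicate a longer right tail, negative values a longer left tail.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric: approximately 0
print(sample_skewness([1, 2, 3, 4, 10]))  # long right tail: positive
```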
z-Scores
● Suppose we have a sample of n observations, with the values denoted by x1, x2, . . . , xn.
● The z-score of xi: $z_i = \frac{x_i - \bar{x}}{s}$
● The z-score is often called the standardized value.
● The z-score, zi, can be interpreted as the number of standard deviations xi is from the mean x̄.
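A minimal sketch of standardizing a sample, using illustrative values with mean 44 and s = 8 (hypothetical data) so the z-scores come out as round numbers.

```python
import statistics

data = [46, 54, 42, 46, 32]  # illustrative sample: mean 44, s = 8
x_bar = statistics.mean(data)
s = statistics.stdev(data)

z = [(x - x_bar) / s for x in data]   # z_i = (x_i - x_bar) / s
print(z)  # [0.25, 1.25, -0.25, 0.25, -1.5]
```

For example, 32 lies 1.5 standard deviations below the mean.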
Chebyshev’s Theorem (Pafnuty Chebyshev, 1821-1894)
● At least $(1 - 1/z^2)$ of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
● Implications:
● At least 0.75, or 75%, of the data values must be within z = 2 standard deviations of the mean.
● At least 0.89, or 89%, of the data values must be within z = 3 standard deviations of the mean.
● At least 0.94, or 94%, of the data values must be within z = 4 standard deviations of the mean.
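The bound is simple to compute and to check against data. This sketch defines two hypothetical helpers: one for the theoretical minimum share, one for the observed share within z sample standard deviations.

```python
import statistics

def chebyshev_lower_bound(z):
    """Minimum share of values within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

def share_within(values, z):
    """Observed share of values with |x - x_bar| <= z * s."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return sum(abs(x - x_bar) <= z * s for x in values) / len(values)

for z in (2, 3, 4):
    print(z, round(chebyshev_lower_bound(z), 4))  # 0.75, 0.8889, 0.9375

print(share_within([1, 2, 3, 4, 10], 2))  # observed share, never below 0.75
```

The theorem holds for any distribution shape, which is why its bounds are much more conservative than the Empirical Rule.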
Chebyshev’s Inequality Theorem
Empirical Rule
● For data with a bell-shaped distribution:
● Approximately 68% of observations are within 1 std. dev. of the mean.
● Approximately 95% of observations are within 2 std. dev. of the mean.
● Almost all (about 99.7%) observations are within 3 std. dev. of the mean.
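The 68/95/99.7 figures come from the normal distribution. A sketch using `statistics.NormalDist` (Python 3.8+) recovers them from the standard normal CDF.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# probability that a normal value lies within k standard deviations of the mean
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sd: {p:.4f}")
```

The printed shares are about 0.6827, 0.9545 and 0.9973, which the slide rounds to 68%, 95% and nearly 100%.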
Detection of Outliers
● Outliers: data points with unusually large or unusually small values.
● Lower limit = Q1 - 1.5 × IQR
● Upper limit = Q3 + 1.5 × IQR
● If x(i) < Lower limit: a low outlier
● If x(i) > Upper limit: a high outlier
● If x(i) < Q1 - 3 × IQR or x(i) > Q3 + 3 × IQR: an extreme value
● For example: 1 2 3 4 10. With the Excel quartile positions, Q1 = 2, Q3 = 4, IQR = 2, so Lower limit = 2 - 3 = -1 and Upper limit = 4 + 3 = 7. Hence 10 is a high outlier; it is not an extreme value, since Q3 + 3 × IQR = 10.
● Sources of outliers and how to handle them:
● Incorrectly recorded data: correct them
● An observation that does not belong to the data set: remove it
● Correctly recorded but unusual values: retain them, but note them
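The fence rules can be applied mechanically. This sketch assumes the Excel-style quartiles (`method="inclusive"`); the function name `iqr_outliers` and the shape of its result are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Classify values with the 1.5 * IQR fences and the 3 * IQR extreme fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "lower": lower,
        "upper": upper,
        "low": [x for x in values if x < lower],
        "high": [x for x in values if x > upper],
        "extreme": [x for x in values if x < q1 - 3 * iqr or x > q3 + 3 * iqr],
    }

print(iqr_outliers([1, 2, 3, 4, 10]))
# {'lower': -1.0, 'upper': 7.0, 'low': [], 'high': [10], 'extreme': []}
```

Flagging is only the first step. Whether to correct, remove or keep a flagged value still depends on its source, as listed above.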
3.4 EXPLORATORY DATA ANALYSIS
● Five-number summary:
1. Smallest value (Min)
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Largest value (Max)
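The five-number summary is the input to a boxplot. This is a minimal sketch; it assumes the Excel-style quartiles and uses the small data set from the outlier example.

```python
import statistics

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) with Excel-style quartiles."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

print(five_number_summary([1, 2, 3, 4, 10]))  # (1, 2.0, 3.0, 4.0, 10)
```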
Boxplot (Box-and-whisker plot)
3.5 MEASURES OF ASSOCIATION BETWEEN TWO VARIABLES
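● Covariance: a descriptive measure of the linear association between two variables.
● Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
● Population covariance: $\sigma_{xy} = \frac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$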
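Both covariance formulas fit in one function. The paired data below are illustrative values (the slide's example table is not reproduced here), and the function name `covariance` is our own.

```python
# Illustrative paired data (hypothetical, not the slide's example)
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def covariance(xs, ys, sample=True):
    """Sample covariance (divisor n - 1) or population covariance (divisor N)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    return cross / (n - 1) if sample else cross / n

print(covariance(x, y))                # sample: 99 / 9 = 11.0
print(covariance(x, y, sample=False))  # population formula: 99 / 10 = 9.9
```

From Python 3.10, `statistics.covariance` computes the sample version directly.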
Example
● Question: Is there any correlation / relationship between x and y?
Drawing a scatter diagram
Interpretation of Sample Covariance
● s_xy > 0: a positive linear relationship
● s_xy < 0: a negative linear relationship
● s_xy ≈ 0: no apparent linear relationship
Correlation Coefficient
● Pearson Product Moment Correlation Coefficient for Sample Data: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
● Pearson Product Moment Correlation Coefficient for Population Data: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
● rxy > 0: a positive linear relationship
● rxy < 0: a negative linear relationship
● rxy ranges from -1 to +1, so its absolute value lies between 0 and 1
● The closer |rxy| is to 1, the stronger (tighter) the linear relationship
● Excel application: the CORREL() function
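The sample correlation coefficient is the sample covariance divided by the product of the two sample standard deviations. This sketch uses the same kind of illustrative paired data (hypothetical values); `CORREL()` in Excel should give the same figure for the same inputs.

```python
import math

x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]          # illustrative data
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def pearson_r(xs, ys):
    """r_xy = s_xy / (s_x * s_y), all with divisor n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in xs) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in ys) / (n - 1))
    return s_xy / (s_x * s_y)

r = pearson_r(x, y)
print(round(r, 4))
```

The result is about 0.93, a strong positive linear relationship. Python 3.10+ also provides `statistics.correlation`.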
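3.6 THE WEIGHTED MEAN AND WORKING WITH GROUPED DATA
● A simple mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
● A weighted mean: $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
● Example: calculating a GPA (Grade Point Average)
● Marks of the courses: x1, x2, …, xn
● Credits of the courses (weights): w1, w2, …, wn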
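A GPA is a credit-weighted mean of course marks. This is a minimal sketch with hypothetical marks and credits for three courses, showing how it differs from the simple mean.

```python
# Hypothetical marks and credits for three courses
marks = [8, 6, 9]      # x_i
credits = [3, 2, 3]    # w_i

def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)"""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

gpa = weighted_mean(marks, credits)   # (24 + 12 + 27) / 8
simple = sum(marks) / len(marks)      # unweighted mean, for comparison
print(gpa, round(simple, 4))          # 7.875 7.6667
```

The low mark counts for fewer credits here, so the weighted GPA is higher than the simple average.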
Grouped Data
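● Sample mean for grouped data: $\bar{x} = \frac{\sum_i f_i M_i}{n}$, where $M_i$ is the midpoint of class i, $f_i$ is its frequency, and $n = \sum_i f_i$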
Sample Variance
● Sample variance for grouped data: $s^2 = \frac{\sum_i f_i (M_i - \bar{x})^2}{n - 1}$
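With grouped data, each class midpoint stands in for all observations in its class, weighted by the class frequency. The frequency table below is hypothetical.

```python
# Hypothetical frequency distribution: class midpoints M_i and frequencies f_i
midpoints = [5, 15, 25]
freqs = [2, 5, 3]

n = sum(freqs)                                             # n = 10
x_bar = sum(f * m for f, m in zip(freqs, midpoints)) / n   # 160 / 10 = 16.0

# sum of f_i * (M_i - x_bar)^2 = 242 + 5 + 243 = 490
s2 = sum(f * (m - x_bar) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
print(x_bar, round(s2, 2))  # 16.0 54.44
```

For the population versions, divide by N instead of n - 1 in the variance.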
Population Mean and Variance for Grouped Data
● Population mean: $\mu = \frac{\sum_i f_i M_i}{N}$
● Population variance: $\sigma^2 = \frac{\sum_i f_i (M_i - \mu)^2}{N}$
Exercises for Homework
Section | Exercises
3.1 | 1, 5, 6, 11, 16 – 7, 10 (Excel)
3.2 | 25, 26 – 27, 32 (Excel)
3.3 | 37, 41 – 44, 45 (Excel)
3.4 | 48, 49, 51 – 52, 53 (Excel)
3.5 | 55, 58 – 57, 59 (Excel)
3.6 | -
Supplementary | 63, 68 (Excel)