Advanced Econometrics - Part II
Chapter 4: Discrete choice analysis: Multinomial Models
Chapter 4
DISCRETE CHOICE ANALYSIS:
MULTINOMIAL MODELS
We look at settings with multiple, unordered choices.
A key notion here is the “independence of irrelevant alternatives” (IIA) property.
Models for discrete choice with more than two choices: suppose that for the $i$-th consumer, faced with $J$ choices ($j = 1, 2, \dots, J$), the utility of choice $j$ is:
$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}$$
If the consumer makes choice $j$ in particular, then we assume that $U_{ij}$ is the maximum among the $J$ alternatives:
$$\rightarrow \ \text{Prob}(U_{ij} > U_{ik}) \quad \text{for all } k \neq j$$
This is the probability that individual $i$ makes choice $j$:
$$Y_i = j \quad \text{if} \quad U_{ij} > U_{ik} \ \text{ for all } k \neq j$$
The model is made operational by a particular choice of distribution for the disturbances.
Let $Y_i$ be a random variable that indicates the choice made. McFadden (1974) has shown that
$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta)}{\sum_{j=1}^{J}\exp(X_{ij}\beta)}$$
if and only if the $J$ disturbances are independent and identically distributed with type I extreme value distribution:
$$F(\varepsilon_{ij}) = \exp(-e^{-\varepsilon_{ij}})$$
Utility depends on $Z_{ij}$, which includes aspects specific to the individual ($i$) as well as to the choice ($j$). Let $Z_{ij} = [X_{ij}, w_i]$ and $\theta = [\beta, \alpha]$.
• $X_{ij}$ varies across choices ($j$) (and possibly across individuals ($i$) as well).
• $w_i$ contains the characteristics of individual $i$, and is therefore the same for all choices.
Then:
$$\text{Prob}(Y_i = j) = \frac{\exp(Z_{ij}\theta)}{\sum_{j=1}^{J}\exp(Z_{ij}\theta)}$$
Nam T. Hoang UNE Business School
University of New England
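$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta + \alpha w_i)}{\sum_{j=1}^{J}\exp(X_{ij}\beta + \alpha w_i)} = \frac{[\exp(X_{ij}\beta)]\exp(\alpha w_i)}{\sum_{j=1}^{J}[\exp(X_{ij}\beta)]\exp(\alpha w_i)} = \frac{\exp(X_{ij}\beta)}{\sum_{j=1}^{J}\exp(X_{ij}\beta)}$$
The factor $\exp(\alpha w_i)$ is common to all alternatives and cancels, so with a common coefficient $\alpha$ the individual-specific characteristics drop out of the choice probabilities.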
For example, consider a model of shopping centre choice by an individual. The choice depends on the number of stores $S_{ij}$, the distance from the centre of the city $D_{ij}$, and the income of the individual $I_i$, which varies across individuals but not across the choices:
$$\rightarrow \ Z_{ij} = (S_{ij},\ D_{ij},\ I_i)$$
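The cancellation of individual-specific terms can be checked numerically. The following is a minimal sketch in Python, assuming made-up values for $X_{ij}\beta$ and $\alpha w_i$ (the numbers and the helper name `logit_probs` are illustrative and not part of the notes):

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities exp(v_j) / sum_k exp(v_k) for one individual."""
    m = max(utilities)                          # subtract the max for numerical stability
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical systematic utilities X_ij * beta for three alternatives.
x_beta = [0.5, -0.2, 1.0]
# Hypothetical individual-specific term alpha * w_i, identical for every alternative.
alpha_w = 3.7

p_choice_only = logit_probs(x_beta)
p_with_individual = logit_probs([xb + alpha_w for xb in x_beta])

print(p_choice_only)
print(p_with_individual)   # same probabilities: the common term cancels
```

This is why individual characteristics can only enter the model through choice-specific coefficients $\alpha_j$, as in the multinomial logit model.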
I. THE MULTINOMIAL LOGIT MODEL:
Suppose we have only individual-specific characteristics $w_i$, which are the same for all choices. The model's response probability is:
$$P_{ij} = \text{Prob}(Y_i = j \mid w_i) = \frac{\exp(\alpha_j w_i)}{1 + \sum_{j=1}^{J}\exp(\alpha_j w_i)}$$
for all choices $j = 1, \dots, J$. For the first choice $j = 0$, to satisfy $\sum_{j=0}^{J} P_{ij} = 1$:
$$P_{i0} = \text{Prob}(Y_i = 0 \mid w_i) = \frac{1}{1 + \sum_{j=1}^{J}\exp(\alpha_j w_i)}$$
The log-likelihood:
$$\ln L = \sum_{i=1}^{n}\sum_{j=0}^{J} d_{ij}\ln P_{ij}$$
where $d_{ij} = 1$ if alternative $j$ is chosen by individual $i$, and 0 if not.
$$\frac{\partial \ln L}{\partial \alpha_j} = \sum_{i=1}^{n}(d_{ij} - P_{ij})\,w_i, \qquad j = 1, \dots, J$$
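The score formula can be verified against a numerical derivative of the log-likelihood. The sketch below assumes a scalar characteristic $w_i$ and a small made-up data set; the function names and all numbers are illustrative assumptions, not part of the notes:

```python
import math

def mnl_probs(w, alphas):
    """P_ij for j = 0..J with a scalar w; alphas[j-1] is alpha_j and alpha_0 = 0."""
    expv = [1.0] + [math.exp(a * w) for a in alphas]
    total = sum(expv)
    return [e / total for e in expv]

def log_lik(data, alphas):
    """ln L = sum_i sum_j d_ij ln P_ij, where data holds (w_i, chosen alternative)."""
    return sum(math.log(mnl_probs(w, alphas)[y]) for w, y in data)

def score(data, alphas):
    """d ln L / d alpha_j = sum_i (d_ij - P_ij) w_i for j = 1..J."""
    grad = [0.0] * len(alphas)
    for w, y in data:
        p = mnl_probs(w, alphas)
        for j in range(1, len(alphas) + 1):
            d_ij = 1.0 if y == j else 0.0
            grad[j - 1] += (d_ij - p[j]) * w
    return grad

# Hypothetical observations (w_i, y_i) with three alternatives 0, 1, 2.
data = [(0.5, 0), (1.2, 1), (-0.3, 2), (2.0, 1), (0.1, 0)]
alphas = [0.4, -0.7]

analytic = score(data, alphas)

# Central finite differences as an independent check of the analytic score.
h = 1e-6
numeric = []
for j in range(len(alphas)):
    up = alphas[:]
    up[j] += h
    down = alphas[:]
    down[j] -= h
    numeric.append((log_lik(data, up) - log_lik(data, down)) / (2 * h))

print(analytic)
print(numeric)
```

Setting the score to zero gives the first-order conditions that a maximum likelihood routine solves.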
The marginal effects of the characteristics on probabilities:
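$$\delta_{ij} = \frac{\partial P_{ij}}{\partial w_i} = P_{ij}\Big[\alpha_j - \sum_{e=0}^{J} P_{ie}\alpha_e\Big] = P_{ij}[\alpha_j - \bar{\alpha}], \qquad \bar{\alpha} = \sum_{e=0}^{J} P_{ie}\alpha_e$$
For the $k$-th characteristic:
$$\frac{\partial P_{ij}}{\partial w_{ik}} = P_{ij}\Big[\alpha_{jk} - \sum_{e=0}^{J} P_{ie}\alpha_{ek}\Big]$$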
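A short numerical sketch of these marginal effects, again assuming a scalar $w_i$ and hypothetical coefficients with the normalisation $\alpha_0 = 0$. It checks that the effects sum to zero across alternatives and agree with a finite-difference derivative:

```python
import math

def mnl_probs(w, alphas):
    """P_ij for j = 0..J with a scalar w; alpha_0 is normalised to zero."""
    expv = [1.0] + [math.exp(a * w) for a in alphas]
    total = sum(expv)
    return [e / total for e in expv]

def marginal_effects(w, alphas):
    """dP_ij/dw = P_ij * (alpha_j - sum_e P_ie * alpha_e) for j = 0..J."""
    full = [0.0] + list(alphas)
    p = mnl_probs(w, alphas)
    alpha_bar = sum(pj * aj for pj, aj in zip(p, full))
    return [pj * (aj - alpha_bar) for pj, aj in zip(p, full)]

w = 0.8                 # hypothetical value of the characteristic
alphas = [0.4, -0.7]    # hypothetical alpha_1, alpha_2

effects = marginal_effects(w, alphas)

# Finite-difference derivative of each probability with respect to w.
h = 1e-6
numeric = [(a - b) / (2 * h)
           for a, b in zip(mnl_probs(w + h, alphas), mnl_probs(w - h, alphas))]

print(effects)
print(sum(effects))     # probabilities sum to one, so the effects sum to zero
```

Note that the sign of a marginal effect depends on $\alpha_j - \bar{\alpha}$, so it need not match the sign of $\alpha_j$ itself.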
II. CONDITIONAL LOGIT MODEL:
When the data consist of choice-specific characteristics ($X_{ij}$) instead of individual-specific characteristics, the model is:
$$\text{Prob}(Y_i = j \mid X_{i1}, X_{i2}, \dots, X_{iJ}) = \text{Prob}(Y_i = j \mid X_i) = P_{ij} = \frac{\exp(X_{ij}\beta)}{\sum_{j=0}^{J}\exp(X_{ij}\beta)}$$
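Notes:
• In the multinomial logit model, $w_i$ is unchanged across choices while $\alpha_j$ varies.
• In the conditional logit model, $X_{ij}$ varies across choices while $\beta$ is unchanged.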
The multinomial logit model can be viewed as a special case of this. Suppose we have a vector of individual characteristics $X_i$ with dimension $K$ ($K \times 1$). Then define for each choice $j$ the vector $X_{ij}$ as follows:
$$X_{i0}' = [\,0 \ \ 0 \ \dots \ 0\,], \quad X_{i1}' = [\,X_i' \ \ 0 \ \dots \ 0\,], \ \dots, \ X_{ij}' = [\,0 \ \dots \ X_i' \ \dots \ 0\,], \ \dots, \ X_{iJ}' = [\,0 \ \ 0 \ \dots \ X_i'\,]$$
So $X_{ij}$ varies for each choice: it has $X_i'$ in the $j$-th block and zeros elsewhere. The coefficient vector stacks the choice-specific coefficients, each of dimension $K \times 1$:
$$\rightarrow \ \beta = (\beta_1', \beta_2', \dots, \beta_J')'$$
$$P_{ij} = \frac{\exp(X_{ij}'\beta)}{\sum_{j=0}^{J}\exp(X_{ij}'\beta)} = \frac{\exp(X_i'\beta_j)}{1 + \sum_{j=1}^{J}\exp(X_i'\beta_j)}, \qquad j = 1, \dots, J$$
In this model, the coefficients are not directly tied to the marginal effects:
$$\frac{\partial P_{ij}}{\partial x_{im}} = [P_{ij}(\mathbf{1}(j = m) - P_{im})]\beta$$
where $\mathbf{1}(j = m)$ equals 1 if $j = m$ and 0 if not.
Log-likelihood:
$$\ln L = \sum_{i=1}^{n}\sum_{j=0}^{J} d_{ij}\ln P_{ij}$$
III. MIXED LOGIT MODEL:
For a model that combines the two models:
$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta + \alpha_j W_i)}{\sum_{j=1}^{J}\exp(X_{ij}\beta + \alpha_j W_i)}$$
$$\rightarrow \ \Pr[Y_i = j] = \frac{\exp(Z_{ij}\theta)}{\sum_{j=1}^{J}\exp(Z_{ij}\theta)}$$
where
$$Z_{i1} = [\,X_{i1} \ \ W_i \ \ 0 \ \dots \ 0\,], \quad Z_{i2} = [\,X_{i2} \ \ 0 \ \ W_i \ \dots \ 0\,], \ \dots, \ Z_{ij} = [\,X_{ij} \ \ 0 \ \dots \ W_i \ \dots \ 0\,], \ \dots, \ Z_{iJ} = [\,X_{iJ} \ \ 0 \ \dots \ 0 \ \ W_i\,]$$
$$\theta = (\beta, \ \alpha_1, \ \dots, \ \alpha_j, \ \dots, \ \alpha_J)'$$
This model does not share the following advantage of the conditional logit model: if an additional alternative is added to the choice set, one can predict its probability of selection, since the parameters of the conditional logit model do not vary across alternatives.
IV. INDEPENDENCE OF IRRELEVANT ALTERNATIVES:
• The ratio of probabilities of any two alternatives is independent of the introduction of a
third alternative. This is unrealistic in many economic choice models.
• In the multinomial logit and conditional logit models, $P_{ij}/P_{im}$ is independent of the remaining probabilities. This property is called the Independence of Irrelevant Alternatives (IIA).
• Consider the conditional probability of choosing $j$ given that you choose either $j$ or $l$:
$$\text{Prob}(Y_i = j \mid Y_i \in \{j, l\}) = \frac{\Pr(Y_i = j)}{\Pr(Y_i = j) + \Pr(Y_i = l)} = \frac{\exp(X_{ij}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)}$$
• This probability does not depend on the characteristics $X_{im}$ of alternatives $m$ other than $j$ and $l$. The traditional example is McFadden's famous blue bus/red bus example.
• Suppose there are initially three choices: commuting by car, by red bus or by blue bus.
• People are indifferent between red and blue buses:
$$U_{i,\text{redbus}} = U_{i,\text{bluebus}}$$
With the choice between the blue and red bus being random, suppose:
$$X_{i,\text{redbus}} = X_{i,\text{bluebus}} = X_{i,\text{bus}}$$
Then suppose that the probability of commuting by bus is
$$\Pr(Y_i = \text{bus}) = \Pr(Y_i = \text{redbus or bluebus}) = \frac{\exp(X_{i,\text{bus}}\beta^*)}{\exp(X_{i,\text{bus}}\beta^*) + \exp(X_{i,\text{car}}\beta^*)}$$
and
$$\Pr(Y_i = \text{redbus} \mid Y_i = \text{bus}) = \frac{1}{2}$$
• That would imply that the conditional probability of commuting by car, given that one commutes by car or red bus, would differ from the same conditional probability if there is no blue bus. Presumably taking away the blue bus choice would lead all the current blue bus users to shift to the red bus, not to cars.
• $\dfrac{P_{il}}{P_{ik}} = \exp(X_{il}\beta - X_{ik}\beta)$ does not depend on any alternative other than $l$ and $k$.
• The conditional logit model does not allow for this type of substitution pattern. Again, consider commuters initially choosing between two modes of transportation, car and red bus. Suppose $P_{i,\text{car}} = P_{i,\text{red bus}}$ (each equal to $\frac{1}{2}$ with only two alternatives). So
$$\frac{P_{ic}}{P_{irb}} = \exp(X_{\text{car}}\beta - X_{\text{redbus}}\beta) = 1$$
• Now suppose a third choice, the blue bus, is added. Assuming bus commuters do not care about the colour of the bus, consumers will choose between the two buses with equal probability. The ratio of their probabilities of taking the blue bus and the red bus is 1:
$$\frac{P_{irb}}{P_{ibb}} = 1$$
But then IIA implies that $P_{ic}/P_{irb}$ is the same whether or not another alternative (the blue bus) is added, so we have:
$$\frac{P_{ic}}{P_{irb}} = 1, \quad \frac{P_{irb}}{P_{ibb}} = 1 \quad \text{and} \quad P_{ic} + P_{irb} + P_{ibb} = 1 \ \Rightarrow \ P_{ic} = P_{irb} = P_{ibb} = \frac{1}{3}$$
which are the probabilities that the logit model predicts.
• In real life, however, we would expect the probability of taking a car to remain the same when a new bus is introduced that is exactly the same as the old bus. We would expect the original probability of taking the bus to be split between the two buses after the second one is introduced. That is, we would expect:
$$P_{ic} = \frac{1}{2}, \quad P_{irb} = \frac{1}{4}, \quad P_{ibb} = \frac{1}{4}$$
• In this case, the logit model, because of its IIA property, underestimates the probability of taking a car and overestimates the probability of taking a bus. The ratio of the probabilities of car and red bus, $P_{ic}/P_{irb}$, actually changes (from 1 to 2) with the introduction of the blue bus, rather than remaining constant as required by the logit model.
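The red bus/blue bus numbers can be reproduced with a few lines of Python. This is only a sketch: the systematic utilities are all set to zero so that the initial car and red bus probabilities are equal, and the "expected" shares are the ones stated in the text rather than model output.

```python
import math

def logit_probs(utilities):
    """Logit probabilities for a dict mapping alternative name -> systematic utility."""
    expv = {name: math.exp(v) for name, v in utilities.items()}
    total = sum(expv.values())
    return {name: e / total for name, e in expv.items()}

# Equal utilities (an assumption of this sketch) give equal initial probabilities.
two_modes = logit_probs({"car": 0.0, "red bus": 0.0})
three_modes = logit_probs({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})

# Shares the text argues we would expect in reality after adding the blue bus.
expected = {"car": 0.5, "red bus": 0.25, "blue bus": 0.25}

ratio_before = two_modes["car"] / two_modes["red bus"]
ratio_logit = three_modes["car"] / three_modes["red bus"]
ratio_expected = expected["car"] / expected["red bus"]

print(two_modes)        # car 1/2, red bus 1/2
print(three_modes)      # each 1/3: the logit prediction under IIA
print(ratio_before, ratio_logit, ratio_expected)
```

The logit ratio of car to red bus stays at 1 after the blue bus is added, while the expected ratio moves to 2.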
• The same kind of misprediction arises with logit models whenever the attributes of another alternative change.
Suppose individuals have a choice of three restaurants: the Purdue restaurant (P), the Krannert restaurant (K) and the Chauncey restaurant (C), with prices $P_p = 95$, $P_k = 85$ and $P_c = 5$, and quality $Q_p = 10$, $Q_k = 9$, $Q_c = 2$.
Suppose that the market shares for the 3 restaurants are $S_p = 0.1$, $S_k = 0.25$ and $S_c = 0.65$. The conditional logit model is
$$U_{ij} = -0.2\,P_j + 2\,Q_j + \varepsilon_{ij} \ \rightarrow \ \frac{P_{ip}}{P_{ic}} = \frac{0.1}{0.65}$$
Suppose that the Krannert restaurant raises its price to 1000 (taking it out of business). The conditional logit model would predict $P_{ip} = 0.13$ and $P_{ic} = 0.87$, to satisfy
$$\frac{P_{ip}}{P_{ic}} = \frac{0.1}{0.65} = \text{const}$$
This seems implausible: people who were planning to go to Krannert would appear to be more likely to go to PMU than to go to the Chauncey restaurant, so one would expect $S_p \approx 0.35$; $S_c \approx 0.65$.
(IIA does not hold in reality → the conditional logit model is not valid in this case.)
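The predicted shares 0.13 and 0.87 follow from reallocating Krannert's share in proportion to the remaining shares, which is what a constant ratio $P_{ip}/P_{ic}$ requires. A small sketch of that arithmetic (the function name `drop_alternative` is an illustrative assumption):

```python
def drop_alternative(shares, name):
    """Remove one alternative and rescale the rest proportionally (the IIA prediction)."""
    remaining = {k: v for k, v in shares.items() if k != name}
    total = sum(remaining.values())
    return {k: v / total for k, v in remaining.items()}

# Market shares from the text.
shares = {"Purdue": 0.10, "Krannert": 0.25, "Chauncey": 0.65}

iia_prediction = drop_alternative(shares, "Krannert")

ratio_before = shares["Purdue"] / shares["Chauncey"]
ratio_after = iia_prediction["Purdue"] / iia_prediction["Chauncey"]

print({k: round(v, 2) for k, v in iia_prediction.items()})
print(ratio_before, ratio_after)   # the ratio is unchanged under IIA
```

The more plausible outcome described in the text (0.35 and 0.65) would change this ratio, which is exactly what the conditional logit model cannot produce.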
IIA: adding another alternative, or changing the characteristics of a third alternative, does not affect the ratio between the probabilities of two alternatives.
• Test of IIA
Hausman & McFadden offer tests of the IIA assumption based on the observation that, if the conditional logit model is true, $\beta$ can be consistently estimated by conditional logit using any subset of the alternatives. Hausman's test compares the estimate of $\beta$ using all alternatives with the estimate using a subset of alternatives:
$$H_0: \hat{\beta}_s = \hat{\beta}_f \qquad (s: \text{restricted subset}, \ f: \text{full set})$$
$$(\hat{\beta}_s - \hat{\beta}_f)'\,[\hat{V}_s - \hat{V}_f]^{-1}\,(\hat{\beta}_s - \hat{\beta}_f) \sim \chi^2$$
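To make the statistic concrete, here is a minimal sketch for the case of two coefficients, where the 2×2 inverse can be written out by hand. All estimates and covariance matrices below are hypothetical numbers chosen for illustration, not results from any data set. The p-value uses the fact that a chi-square variable with 2 degrees of freedom has survival function $\exp(-q/2)$.

```python
import math

def hausman_statistic_2d(beta_s, beta_f, v_s, v_f):
    """Hausman statistic (b_s - b_f)' [V_s - V_f]^{-1} (b_s - b_f) for two coefficients."""
    d0 = beta_s[0] - beta_f[0]
    d1 = beta_s[1] - beta_f[1]
    # Elements of the (symmetric) difference matrix V_s - V_f.
    a11 = v_s[0][0] - v_f[0][0]
    a12 = v_s[0][1] - v_f[0][1]
    a22 = v_s[1][1] - v_f[1][1]
    det = a11 * a22 - a12 * a12
    # Quadratic form using the explicit inverse of a symmetric 2x2 matrix.
    stat = (d0 * d0 * a22 - 2.0 * d0 * d1 * a12 + d1 * d1 * a11) / det
    p_value = math.exp(-stat / 2.0)   # chi-square survival function, 2 degrees of freedom
    return stat, p_value

# Hypothetical estimates: f = all alternatives, s = restricted subset.
beta_f = [1.00, -0.50]
beta_s = [1.10, -0.55]
v_f = [[0.05, 0.01], [0.01, 0.06]]
v_s = [[0.09, 0.02], [0.02, 0.15]]

stat, p_value = hausman_statistic_2d(beta_s, beta_f, v_s, v_f)
print(stat, p_value)   # a large p-value would not reject H0 (IIA)
```

In finite samples the difference $\hat{V}_s - \hat{V}_f$ need not be positive definite, in which case the statistic is not well behaved; the sketch does not handle that case.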
• We need IIA to hold to apply the conditional logit model
$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}$$
• If we reject $H_0$ → IIA does not hold → the conditional logit model is not a valid model in this case. The IIA assumption needs to hold in reality to apply the conditional logit model.
The IIA property follows from the initial assumption that the $\varepsilon_{ij}$ are independent with extreme value distributions.
V. NESTED LOGIT MODEL:
• If the test of IIA fails (reject $H_0: \hat{\beta}_s = \hat{\beta}_f$) then the conditional logit model is not valid. We need to modify the multinomial logit model.
One way to introduce correlation between the choices is through nesting them. Suppose the set of choices {0, 1, …, J} can be partitioned into $S$ sets $B_1, B_2, \dots, B_S$, so that the full set of choices can be written as:
$$\{0, 1, \dots, J\} = \bigcup_{s=1}^{S} B_s$$
Let $Z_s$ be set-specific characteristics (branch characteristics). McFadden (1981) studied the following model:
• Conditional probability (adjusted with $\rho_s$):
$$\Pr(Y_i = j \mid X_i, Y_i \in B_s) = \frac{\exp(\rho_s^{-1} X_{ij}\beta)}{\sum_{l \in B_s}\exp(\rho_s^{-1} X_{il}\beta)}$$
• Within the sets, the correlation coefficient for the $\varepsilon_{ij}$ is equal to $(1 - \rho_s^2)$. Between the sets the $\varepsilon_{ij}$ are independent. The probabilities are adjusted by $\rho_s$ in each group.
The probability of a choice in the set $B_s$ is
$$\Pr(Y_i \in B_s \mid X_i) = \frac{\exp(Z_s\alpha)\Big[\sum_{l \in B_s}\exp(\rho_s^{-1} X_{il}\beta)\Big]^{\rho_s}}{\sum_{t=1}^{S}\exp(Z_t\alpha)\Big[\sum_{l \in B_t}\exp(\rho_t^{-1} X_{il}\beta)\Big]^{\rho_t}}$$
$$\rightarrow \ \Pr(Y_i = j \mid X_i) = \Pr(Y_i = j \mid X_i, Y_i \in B_s) \cdot \Pr(Y_i \in B_s \mid X_i) \quad \text{for } j \in B_s$$
If we fix $\rho_s = 1$ for all $s$, then
$$\Pr(Y_i = j \mid X_i) = \frac{\exp(X_{ij}\beta + Z_s\alpha)}{\sum_{t=1}^{S}\sum_{l \in B_t}\exp(X_{il}\beta + Z_t\alpha)}$$
and we are back in the conditional logit model.
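The reduction to the conditional logit model at $\rho_s = 1$ can be checked numerically. The sketch below computes the nested logit probabilities as the product of the within-nest and nest probabilities; the utilities, nest structure and function name are hypothetical choices for illustration.

```python
import math

def nested_logit_probs(xb, za, rho, nests):
    """Nested logit choice probabilities.

    xb[j]    : X_ij * beta for alternative j
    za[s]    : Z_s * alpha for nest s
    rho[s]   : nest parameter rho_s
    nests[s] : list of alternatives in nest B_s
    """
    # Sum of exp(X_il * beta / rho_s) over the members of each nest.
    nest_sums = [sum(math.exp(xb[l] / rho[s]) for l in members)
                 for s, members in enumerate(nests)]
    # Numerator of Pr(Y in B_s): exp(Z_s alpha) * (nest sum)^rho_s.
    weights = [math.exp(za[s]) * nest_sums[s] ** rho[s] for s in range(len(nests))]
    total = sum(weights)
    probs = {}
    for s, members in enumerate(nests):
        for j in members:
            within = math.exp(xb[j] / rho[s]) / nest_sums[s]   # Pr(Y = j | Y in B_s)
            probs[j] = within * weights[s] / total             # times Pr(Y in B_s)
    return probs

# Hypothetical values: alternative 0 alone in one nest, alternatives 1 and 2 together.
xb = {0: 0.2, 1: -0.1, 2: 0.4}
za = [0.0, 0.3]
nests = [[0], [1, 2]]

p_nested = nested_logit_probs(xb, za, [0.5, 0.5], nests)
p_rho_one = nested_logit_probs(xb, za, [1.0, 1.0], nests)

# Conditional logit with utility X_ij beta + Z_s alpha, for comparison.
denom = sum(math.exp(xb[l] + za[s]) for s, members in enumerate(nests) for l in members)
p_conditional = {j: math.exp(xb[j] + za[s]) / denom
                 for s, members in enumerate(nests) for j in members}

print(p_nested)
print(p_rho_one)
print(p_conditional)   # matches p_rho_one
```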
In general this model corresponds to individuals choosing the option with the highest utility, where the utility of choice $j$ in set $B_s$ for individual $i$ is
$$U_{ij} = X_{ij}\beta + Z_s\alpha + \varepsilon_{ij}$$
McFadden supposes that the joint distribution function of the $\varepsilon_{ij}$ is
$$F(\varepsilon_{i0}, \dots, \varepsilon_{iJ}) = \exp\Big(-\sum_{s=1}^{S}\Big(\sum_{j \in B_s}\exp(-\rho_s^{-1}\varepsilon_{ij})\Big)^{\rho_s}\Big)$$
From this he derives the results above.
• How do we estimate these models?
One approach is to construct the log-likelihood and maximize it directly. That is complicated, especially since the log-likelihood function is not concave (but it is not impossible).
An easier alternative is to use the nesting structure directly. Within a nest we have a conditional logit model with coefficient $\rho_s^{-1}\beta$. Hence we can directly estimate $\rho_s^{-1}\beta$ using the concavity of the conditional logit model (the Newton-Raphson procedure will converge to a global maximum). Denote these estimates of $\rho_s^{-1}\beta$ by $\hat{\lambda}_s$.
Then the probability of a particular set $B_s$ can be used to estimate $\rho_s$ and $\alpha$ through:
$$\Pr(Y_i \in B_s \mid X_i) = \frac{\exp(Z_s\alpha)\Big[\sum_{l \in B_s}\exp(X_{il}\hat{\lambda}_s)\Big]^{\rho_s}}{\sum_{t=1}^{S}\exp(Z_t\alpha)\Big[\sum_{l \in B_t}\exp(X_{il}\hat{\lambda}_t)\Big]^{\rho_t}} = \frac{\exp(Z_s\alpha + \rho_s\hat{W}_s)}{\sum_{t=1}^{S}\exp(Z_t\alpha + \rho_t\hat{W}_t)}$$
where
$$\hat{W}_s = \ln\sum_{l \in B_s}\exp(X_{il}\hat{\lambda}_s)$$
is called the “inclusive value”.
• We have another conditional logit model with likelihood function:
$$L = \prod_{s=1}^{S}\ \prod_{i:\, Y_i \in B_s}\Pr(Y_i \in B_s \mid X_i) = \prod_{s=1}^{S}\ \prod_{i:\, Y_i \in B_s}\frac{\exp(Z_s\alpha + \rho_s\hat{W}_s)}{\sum_{t=1}^{S}\exp(Z_t\alpha + \rho_t\hat{W}_t)}$$
• These models can be extended to many layers of nests. It should be noted that both the order of the nests and the elements of each nest are very important.
VI. MULTINOMIAL PROBIT MODEL:
• A natural alternative model that avoids the IIA problem, which arises because the logit errors are uncorrelated across choices, is to work with normally distributed errors ($\varepsilon_{ij} \sim N(\cdot)$) that may be correlated across choices. We no longer assume that $\varepsilon_{ij}$ follows the extreme value distribution.
• Note that the extreme value distribution is close to the normal distribution, but the EV distribution is much easier to compute with.
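$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}, \qquad j = 0, 1, \dots, J$$
$$U_i = \begin{pmatrix} U_{i0} \\ U_{i1} \\ \vdots \\ U_{iJ} \end{pmatrix} = \begin{pmatrix} X_{i0}\beta + \varepsilon_{i0} \\ X_{i1}\beta + \varepsilon_{i1} \\ \vdots \\ X_{iJ}\beta + \varepsilon_{iJ} \end{pmatrix}$$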
• The cost of using the normal distribution is the complicated likelihood function.
With:
$$\varepsilon_i = (\varepsilon_{i0}, \varepsilon_{i1}, \dots, \varepsilon_{iJ})', \qquad \varepsilon_i \mid X_i \sim N(0, \Sigma)$$
With unrestricted covariance matrix $\Sigma$:
$$\Pr(Y_i = q) = \Pr[U_{iq} > U_{ij} \ \text{for all } j \neq q]$$
or
$$\Pr(Y_i = q) = \Pr[\varepsilon_{i1} - \varepsilon_{iq} < (X_{iq} - X_{i1})\beta; \ \dots; \ \varepsilon_{iJ} - \varepsilon_{iq} < (X_{iq} - X_{iJ})\beta]$$
• The main obstacle to implementing the multinomial probit model is the difficulty of computing the multivariate normal probabilities for any J > 2.
• Recent results on accurate simulation of multinormal integrals have made estimation of the MNP model feasible.
• Read Geweke, Keane and Runkle (1994), Review of Economics and Statistics 76(4), for the method if you want to use the multinomial probit model.
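• For J = 3:
$$P(y_i = 1) = P(U_{i1} > U_{i2}; \ U_{i1} > U_{i3}) = P\big(\varepsilon_{i2} - \varepsilon_{i1} < (X_{i1} - X_{i2})\beta; \ \ \varepsilon_{i3} - \varepsilon_{i1} < (X_{i1} - X_{i3})\beta\big)$$
Writing $u_1 = \varepsilon_{i2} - \varepsilon_{i1}$ and $u_2 = \varepsilon_{i3} - \varepsilon_{i1}$, with joint normal density $f(u_1, u_2)$:
$$\rightarrow \ P(y_i = 1) = \int_{-\infty}^{(X_{i1} - X_{i2})\beta}\int_{-\infty}^{(X_{i1} - X_{i3})\beta} f(u_1, u_2)\,du_2\,du_1, \qquad \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \sim N(0, \Sigma^*)$$
Where:
$$\Sigma^* = \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}\Sigma\begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}$$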
• Each element of the likelihood is a double integral and must be evaluated numerically.
• This model does not suffer from the IIA problem.
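A crude simulation can illustrate why correlated normal errors avoid the IIA problem. The sketch below is a simple frequency simulator (counting how often each alternative has the highest simulated utility), not the simulation method of Geweke, Keane and Runkle. It assumes zero systematic utilities, unit error variances, and a correlation `rho` between the red bus and blue bus errors; all of these are illustrative assumptions.

```python
import math
import random

def simulate_probit_shares(rho, draws=20000, seed=0):
    """Simulated choice shares for car, red bus and blue bus under normal errors.

    The car error is independent; the two bus errors have correlation rho.
    """
    rng = random.Random(seed)
    counts = {"car": 0, "red bus": 0, "blue bus": 0}
    for _ in range(draws):
        e_car = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        e_red = z1
        e_blue = rho * z1 + math.sqrt(1.0 - rho * rho) * z2   # corr(e_red, e_blue) = rho
        utilities = {"car": e_car, "red bus": e_red, "blue bus": e_blue}
        counts[max(utilities, key=utilities.get)] += 1
    return {name: c / draws for name, c in counts.items()}

independent = simulate_probit_shares(rho=0.0)
correlated = simulate_probit_shares(rho=0.99)

print(independent)   # roughly one third each, as in the logit case
print(correlated)    # car share close to one half, buses share the rest
```

With nearly perfectly correlated bus errors the two buses behave like a single alternative, so the car share stays near one half, which is the substitution pattern the red bus/blue bus example calls for.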
VII. ORDERED LOGIT, ORDERED PROBIT & SEQUENTIAL MODELS:
1. Ordered Probit:
$$Y_i^* = X_i\beta + \varepsilon_i$$
$Y_i^*$ is unobservable. We observe:
$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ 1 & \text{if } 0 < Y_i^* \le \mu_1 \\ 2 & \text{if } \mu_1 < Y_i^* \le \mu_2 \\ \ \vdots & \\ J & \text{if } \mu_{J-1} \le Y_i^* \end{cases}$$
$\mu_1, \mu_2, \dots, \mu_{J-1}$ are unknown parameters to be estimated with $\beta$.
Assume that $\varepsilon$ is normally distributed across observations. Normalize the mean and variance of $\varepsilon$: $\varepsilon \sim N(0, 1)$.
We have:
$$P(y_i = 0 \mid X_i) = \Phi(-X_i\beta)$$
$$P(y_i = 1 \mid X_i) = \Phi(\mu_1 - X_i\beta) - \Phi(-X_i\beta)$$
$$P(y_i = 2 \mid X_i) = \Phi(\mu_2 - X_i\beta) - \Phi(\mu_1 - X_i\beta)$$
$$\vdots$$
$$P(y_i = J \mid X_i) = 1 - \Phi(\mu_{J-1} - X_i\beta)$$
We must have $0 < \mu_1 < \mu_2 < \dots < \mu_{J-1}$ (for all the probabilities to be positive).
Likelihood function:
$$L = \prod_{j \in \{0, 1, \dots, J\}}\ \prod_{i:\, Y_i = j}\Pr(Y_i = j \mid X_i) \qquad (\text{product over all observations})$$
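Marginal Effects:
$$\frac{\partial P(Y_i = 0 \mid X_i)}{\partial x_{ik}} = -\phi(X_i\beta)\,\beta_k$$
$$\frac{\partial P(Y_i = j \mid X_i)}{\partial x_{ik}} = [\phi(\mu_{j-1} - X_i\beta) - \phi(\mu_j - X_i\beta)]\,\beta_k \qquad (\text{with } \mu_0 = 0)$$
$$\frac{\partial P(Y_i = J \mid X_i)}{\partial x_{ik}} = \phi(\mu_{J-1} - X_i\beta)\,\beta_k$$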
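The ordered probit probabilities and marginal effects can be computed directly from the standard normal cdf and pdf. The following sketch uses `math.erf` for $\Phi$; the index value, cutoffs and coefficient are hypothetical, and the first threshold is fixed at 0 as in the text.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def ordered_probit_probs(xb, cutoffs):
    """P(y = 0), ..., P(y = J) given X_i*beta and cutoffs [mu_1, ..., mu_{J-1}]."""
    thresholds = [0.0] + list(cutoffs)
    cdf = [norm_cdf(t - xb) for t in thresholds]
    probs = [cdf[0]]
    probs += [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]
    probs.append(1.0 - cdf[-1])
    return probs

def marginal_effects(xb, cutoffs, beta_k):
    """dP(y = j)/dx_k for j = 0..J, following the formulas above."""
    thresholds = [0.0] + list(cutoffs)
    pdf = [norm_pdf(t - xb) for t in thresholds]
    effects = [-pdf[0] * beta_k]
    effects += [(pdf[k - 1] - pdf[k]) * beta_k for k in range(1, len(pdf))]
    effects.append(pdf[-1] * beta_k)
    return effects

xb = 0.4               # hypothetical value of X_i * beta
cutoffs = [0.8, 1.9]   # hypothetical mu_1 < mu_2, so J = 3 and y takes values 0..3
beta_k = 0.5           # hypothetical coefficient on x_k

probs = ordered_probit_probs(xb, cutoffs)
effects = marginal_effects(xb, cutoffs, beta_k)

print(probs, sum(probs))
print(effects, sum(effects))
```

With a positive $\beta_k$, the effect on the lowest category is negative and the effect on the highest category is positive, while the signs for the middle categories are ambiguous.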
2. Ordered Logit:
Replacing $\Phi$ with the logistic function
$$F(X_i\beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$$
gives the ordered logit model.
3. Sequential Multinomial Models:
A special case of an ordered variable (where choices have a natural ranking) is a sequential variable. This occurs when the second event is dependent on the first event, the third event is dependent on the previous two events, and so on. Person $i$ being in the $n$th category means person $i$ has been through all $(n-1)$ previous categories:
$$y_i = \begin{cases} 1 & \text{not high school} \\ 2 & \text{high school, not college} \\ 3 & \text{college} \end{cases}$$
$$\Pr[y_i = 2] = \Pr[y_i = 2 \mid y_i \neq 1] \times \Pr[y_i \neq 1] = \Phi(X_2\beta_2)\,(1 - \Phi(X_1\beta_1))$$
Notes: $P(y_i = 2)$ means $P(y_i = 2 \text{ and } y_i \neq 1)$.
The parameters $\beta_1$ and $\beta_2$ can be estimated by maximizing the log-likelihood:
$$\ln L = \sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}\ln p_{ij}$$
where $y_{ij} = 1$ if $y_i = j$ and 0 otherwise, $p_{i1} = \Phi(X_{1i}\beta_1)$, $p_{i2}$ is given in the preceding equation and $p_{i3} = 1 - p_{i1} - p_{i2}$.
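The three category probabilities of the schooling example can be sketched as follows. The index values $X_1\beta_1$ and $X_2\beta_2$ are hypothetical, and the function names are illustrative assumptions.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sequential_probs(xb1, xb2):
    """Category probabilities (p1, p2, p3) for the three-stage schooling example.

    p1 = Phi(X1*beta1), p2 = Phi(X2*beta2) * (1 - p1), p3 = 1 - p1 - p2.
    """
    p1 = norm_cdf(xb1)
    p2 = norm_cdf(xb2) * (1.0 - p1)
    p3 = 1.0 - p1 - p2
    return p1, p2, p3

def log_lik_contribution(y, xb1, xb2):
    """ln p_ij for an individual observed in category y (1, 2 or 3)."""
    return math.log(sequential_probs(xb1, xb2)[y - 1])

xb1, xb2 = -0.3, 0.6   # hypothetical index values for one individual
p1, p2, p3 = sequential_probs(xb1, xb2)

print(p1, p2, p3)
print(log_lik_contribution(2, xb1, xb2))
```

Since $p_3 = (1 - \Phi(X_2\beta_2))(1 - \Phi(X_1\beta_1))$, each stage can be read as a binary probit on the subsample that reached that stage.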

