Advanced Econometrics - Part II

Chapter 4: Discrete choice analysis: Multinomial Models

Chapter 4

DISCRETE CHOICE ANALYSIS:

MULTINOMIAL MODELS

We look at settings with multiple, unordered choices.

A key notion here is the “independence of irrelevant alternatives” (IIA) property.

Models for discrete choice with more than two choices: suppose that, for the $i$-th consumer faced with $J$ choices ($j = 1, 2, \dots, J$), the utility of choice $j$ is:

$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}$$

If the consumer makes choice $j$ in particular, then we assume that $U_{ij}$ is the maximum among the $J$ alternatives. The probability that individual $i$ makes choice $j$ is therefore

$$\text{Prob}(U_{ij} > U_{ik}) \quad \text{for all } k \neq j,$$

that is, $Y_i = j$ if $U_{ij} > U_{ik}$ for all $k \neq j$.

The model is made operational by a particular choice of distribution for the disturbances. Let $Y_i$ be a random variable that indicates the choice made. McFadden (1974) has shown that

$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta)}{\sum_{k=1}^{J}\exp(X_{ik}\beta)}$$

if and only if the $J$ disturbances are independent and identically distributed with type I extreme value distribution:

$$F(\varepsilon_{ij}) = \exp(-e^{-\varepsilon_{ij}})$$

Then, writing the index in the general form $Z_{ij}\theta$:

$$\text{Prob}(Y_i = j) = \frac{\exp(Z_{ij}\theta)}{\sum_{k=1}^{J}\exp(Z_{ik}\theta)}$$

Utility depends on $Z_{ij}$, which includes aspects specific to the individual ($i$) as well as to the choice ($j$). Let

$$Z_{ij} = [X_{ij},\; w_i], \qquad \theta = [\beta,\; \alpha]$$

• $X_{ij}$ varies across choices ($j$) (and possibly across individuals ($i$) as well).

• $w_i$ contains the characteristics of individual ($i$), and is therefore the same for all choices.

Nam T. Hoang UNE Business School

University of New England


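With a common coefficient $\alpha$ on $w_i$, the individual-specific term cancels from the probability:

$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta + \alpha w_i)}{\sum_{k=1}^{J}\exp(X_{ik}\beta + \alpha w_i)} = \frac{[\exp(X_{ij}\beta)]\exp(\alpha w_i)}{\left[\sum_{k=1}^{J}\exp(X_{ik}\beta)\right]\exp(\alpha w_i)} = \frac{\exp(X_{ij}\beta)}{\sum_{k=1}^{J}\exp(X_{ik}\beta)}$$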
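The cancellation can be checked numerically. The following is a minimal sketch, not part of the original notes: the index values and the function name `choice_probabilities` are made up for illustration, and the only claim tested is that adding the same constant to every alternative's index leaves the logit probabilities unchanged.

```python
import math

def choice_probabilities(index_values):
    """Logit probabilities exp(v_j) / sum_k exp(v_k) for one individual."""
    m = max(index_values)  # subtract the max for numerical stability
    weights = [math.exp(v - m) for v in index_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values of X_ij * beta for three choices (illustration only).
x_beta = [0.5, 1.0, -0.2]
# Hypothetical individual-specific term alpha * w_i, identical across choices.
alpha_w = 0.7

p_without = choice_probabilities(x_beta)
p_with = choice_probabilities([v + alpha_w for v in x_beta])

print(p_without)
print(p_with)  # same probabilities: the common term cancels
```

This is why individual characteristics can only enter through alternative-specific coefficients $\alpha_j$, as in the multinomial logit model.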

For example, a model of shopping centre choice by an individual depends on: the number of stores $S_{ij}$, the distance from the centre of the city $D_{ij}$, and the income of the individual $I_i$, which varies across individuals but not across the choices:

$$Z_{ij} = (S_{ij},\; D_{ij},\; I_i)$$

I. THE MULTINOMIAL LOGIT MODEL:

Suppose we have only individual-specific characteristics $w_i$, which are the same for all choices. The model's response probabilities are:

$$P_{ij} = \text{Prob}(Y_i = j \mid w_i) = \frac{\exp(\alpha_j w_i)}{1 + \sum_{k=1}^{J}\exp(\alpha_k w_i)}$$

for all choices $j = 1, \dots, J$.

For the first choice $j = 0$, to satisfy $\sum_{j=0}^{J} P_{ij} = 1$:

$$P_{i0} = \text{Prob}(Y_i = 0 \mid w_i) = \frac{1}{1 + \sum_{k=1}^{J}\exp(\alpha_k w_i)}$$

The log-likelihood:

$$\ln L = \sum_{i=1}^{n}\sum_{j=0}^{J} d_{ij}\ln P_{ij}$$

where $d_{ij} = 1$ if alternative $j$ is chosen by individual $i$, and 0 if not. Its derivatives are:

$$\frac{\partial \ln L}{\partial \alpha_j} = \sum_{i=1}^{n}(d_{ij} - P_{ij})\,w_i, \qquad j = 1, \dots, J$$
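The log-likelihood and its derivatives can be sketched in a few lines. This is an illustrative example under simplifying assumptions that are not in the notes: $w_i$ is a scalar, the data and coefficient values are made up, and the function names are hypothetical. The analytic score $\sum_i (d_{ij} - P_{ij}) w_i$ is compared with a finite-difference derivative of $\ln L$.

```python
import math

def mnl_probs(alpha, w):
    """Return [P_i0, ..., P_iJ] for scalar w; alpha = [alpha_1..alpha_J], alpha_0 = 0."""
    expo = [1.0] + [math.exp(a * w) for a in alpha]
    total = sum(expo)
    return [e / total for e in expo]

def log_likelihood(alpha, data):
    """data is a list of (w_i, chosen_j) pairs with chosen_j in 0..J."""
    return sum(math.log(mnl_probs(alpha, w)[j]) for w, j in data)

def score(alpha, data):
    """Analytic derivative: sum_i (d_ij - P_ij) * w_i for j = 1..J."""
    grad = [0.0] * len(alpha)
    for w, chosen in data:
        p = mnl_probs(alpha, w)
        for j in range(1, len(alpha) + 1):
            d_ij = 1.0 if chosen == j else 0.0
            grad[j - 1] += (d_ij - p[j]) * w
    return grad

# Made-up observations (w_i, choice) and coefficients, for illustration only.
data = [(0.5, 0), (1.2, 1), (-0.3, 2), (2.0, 1), (0.1, 0), (1.5, 2)]
alpha = [0.4, -0.1]

# Central finite differences of ln L as an independent check of the score.
eps = 1e-6
numeric = []
for j in range(len(alpha)):
    up = list(alpha)
    down = list(alpha)
    up[j] += eps
    down[j] -= eps
    numeric.append((log_likelihood(up, data) - log_likelihood(down, data)) / (2 * eps))

analytic = score(alpha, data)
print(analytic)
print(numeric)
```

In practice the score would be passed to a numerical optimiser; the sketch only verifies the formula.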


The marginal effects of the characteristics on probabilities:

$$\delta_{ij} = \frac{\partial P_{ij}}{\partial w_{ik}} = P_{ij}\left[\alpha_{jk} - \sum_{\ell=0}^{J} P_{i\ell}\,\alpha_{\ell k}\right] = P_{ij}\left[\alpha_{jk} - \bar{\alpha}_k\right], \qquad \bar{\alpha}_k = \sum_{\ell=0}^{J} P_{i\ell}\,\alpha_{\ell k}$$
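A short numerical check of the marginal-effect formula follows. It is a sketch under assumptions not in the notes: a single scalar characteristic $w_i$, made-up coefficients with $\alpha_0 = 0$, and hypothetical function names. The analytic effects $P_{ij}(\alpha_j - \bar{\alpha})$ are compared with finite differences of the probabilities.

```python
import math

def mnl_probs(alpha_full, w):
    """alpha_full = [alpha_0, ..., alpha_J] with alpha_0 = 0; scalar w."""
    expo = [math.exp(a * w) for a in alpha_full]
    total = sum(expo)
    return [e / total for e in expo]

def marginal_effects(alpha_full, w):
    """dP_ij/dw = P_ij * (alpha_j - sum_l P_il * alpha_l)."""
    p = mnl_probs(alpha_full, w)
    alpha_bar = sum(pl * al for pl, al in zip(p, alpha_full))
    return [pj * (aj - alpha_bar) for pj, aj in zip(p, alpha_full)]

alpha_full = [0.0, 0.8, -0.5]  # hypothetical coefficients
w = 1.3                        # hypothetical characteristic value

effects = marginal_effects(alpha_full, w)

# Finite-difference derivative of each probability with respect to w.
h = 1e-6
p_up = mnl_probs(alpha_full, w + h)
p_down = mnl_probs(alpha_full, w - h)
numeric = [(u - d) / (2 * h) for u, d in zip(p_up, p_down)]

print(effects)
print(numeric)
```

Because the probabilities sum to one, the effects sum to zero across alternatives, and the sign of an effect need not match the sign of $\alpha_j$.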

II. CONDITIONAL LOGIT MODEL:

When the data consist of choice-specific attributes ($X_{ij}$) instead of individual-specific characteristics, the model is:

$$P_{ij} = \text{Prob}(Y_i = j \mid X_{i1}, X_{i2}, \dots, X_{iJ}) = \text{Prob}(Y_i = j \mid X_i) = \frac{\exp(X_{ij}\beta)}{\sum_{k=0}^{J}\exp(X_{ik}\beta)}$$

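Notes:

• When $w_i$ is unchanged across choices (multinomial logit) → $\alpha_j$ varies across choices.

• When $X_{ij}$ varies across choices (conditional logit) → $\beta$ is unchanged across choices.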

The multinomial logit model can be viewed as a special case of this. Suppose we have a vector of individual characteristics $X_i$ with dimension $K$ ($X_i$ is $K \times 1$). Then define for each choice $j$ the vector $X_{ij}$ as follows:

$$X_{i0} = [\,0 \;\; 0 \;\; \dots \;\; 0\,], \quad X_{i1} = [\,X_i' \;\; 0 \;\; \dots \;\; 0\,], \quad \dots, \quad X_{ij} = [\,0 \;\; \dots \;\; X_i' \;\; \dots \;\; 0\,], \quad \dots, \quad X_{iJ} = [\,0 \;\; 0 \;\; \dots \;\; X_i'\,]$$

so that $X_i'$ occupies the $j$-th block of $X_{ij}$, and $X_{ij}$ varies for each choice. With the stacked coefficient vector

$$\beta = \begin{bmatrix}\beta_1 \\ \beta_2 \\ \vdots \\ \beta_J\end{bmatrix}$$

(each $\beta_j$ of dimension $K \times 1$), we have $X_{ij}\beta = X_i'\beta_j$ and

$$P_{ij} = \frac{\exp(X_{ij}\beta)}{\sum_{k=0}^{J}\exp(X_{ik}\beta)} = \frac{\exp(X_i'\beta_j)}{1 + \sum_{k=1}^{J}\exp(X_i'\beta_k)}$$

In this model, the coefficients are not directly tied to the marginal effects:

$$\frac{\partial P_{ij}}{\partial x_{im}} = \left[P_{ij}\left(\mathbf{1}(j = m) - P_{im}\right)\right]\beta$$

where $\mathbf{1}(j = m)$ equals 1 if $j = m$ and 0 if not.

Log likelihood:

$$\ln L = \sum_{i=1}^{n}\sum_{j=1}^{J} d_{ij}\ln P_{ij}$$

III. MIXED LOGIT MODEL:

For a model that combines the two models:

$$\text{Prob}(Y_i = j) = \frac{\exp(X_{ij}\beta + \alpha_j W_i)}{\sum_{k=1}^{J}\exp(X_{ik}\beta + \alpha_k W_i)}$$

or, equivalently,

$$\Pr[Y_i = j] = \frac{\exp(Z_{ij}\theta)}{\sum_{k=1}^{J}\exp(Z_{ik}\theta)}$$

with

$$Z_{i1} = [\,X_{i1} \;\; W_i \;\; 0 \;\; \dots \;\; 0\,], \quad Z_{i2} = [\,X_{i2} \;\; 0 \;\; W_i \;\; \dots \;\; 0\,], \quad \dots, \quad Z_{ij} = [\,X_{ij} \;\; 0 \;\; \dots \;\; W_i \;\; \dots \;\; 0\,], \quad \dots, \quad Z_{iJ} = [\,X_{iJ} \;\; 0 \;\; \dots \;\; 0 \;\; W_i\,]$$

$$\theta = \begin{bmatrix}\beta \\ \alpha_1 \\ \vdots \\ \alpha_j \\ \vdots \\ \alpha_J\end{bmatrix}$$

This model does not share one advantage of the conditional logit model. In the conditional logit model, if an additional alternative is added to the choice set, one can predict its probability of selection, because the parameters do not vary across alternatives. Here the $\alpha_j$ are alternative-specific, so no coefficient is available for a new alternative.

IV. INDEPENDENCE OF IRRELEVANT ALTERNATIVES:

• The ratio of probabilities of any two alternatives is independent of the introduction of a

third alternative. This is unrealistic in many economic choice models.

• In the multinomial logit and conditional logit models, $P_{ij}/P_{im}$ is independent of the remaining probabilities, a property called the Independence of Irrelevant Alternatives (IIA).

• Consider the conditional probability of choosing $j$ given that you choose either $j$ or $l$:

$$\text{Prob}(Y_i = j \mid Y_i \in \{j, l\}) = \frac{\Pr(Y_i = j)}{\Pr(Y_i = j) + \Pr(Y_i = l)} = \frac{\exp(X_{ij}\beta)}{\exp(X_{ij}\beta) + \exp(X_{il}\beta)}$$

• This probability does not depend on the characteristics $X_{im}$ of alternatives $m$ other than $j$ and $l$. The traditional example is McFadden's famous blue bus/red bus example.
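The IIA property can be seen directly in a small computation. The sketch below uses made-up index values $X_{ij}\beta$ and hypothetical function names; it changes only the third alternative's index and confirms that the conditional probability of $j$ given $\{j, l\}$ is unaffected.

```python
import math

def logit_probs(index_values):
    """Conditional logit probabilities for one individual."""
    expo = [math.exp(v) for v in index_values]
    total = sum(expo)
    return [e / total for e in expo]

def prob_j_given_j_or_l(index_values, j, l):
    """Pr(Y = j | Y in {j, l}) computed from the full set of probabilities."""
    p = logit_probs(index_values)
    return p[j] / (p[j] + p[l])

# Hypothetical X_ij * beta values; only the third alternative differs.
v_before = [1.0, 0.2, -0.5]
v_after = [1.0, 0.2, 3.0]

before = prob_j_given_j_or_l(v_before, 0, 1)
after = prob_j_given_j_or_l(v_after, 0, 1)
closed_form = math.exp(1.0) / (math.exp(1.0) + math.exp(0.2))

print(before, after, closed_form)
```

All three numbers coincide, although the unconditional probabilities of the first two alternatives fall sharply when the third alternative becomes more attractive.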

• Suppose there are initially three choices: commuting by car, by red bus or by blue bus.

• People are indifferent between red and blue buses:

$$U_{i,\text{redbus}} = U_{i,\text{bluebus}}$$

With the choice between the blue and red bus being random, suppose:

$$X_{i,\text{redbus}} = X_{i,\text{bluebus}} = X_{i,\text{bus}}$$

Then suppose that the probability of commuting by bus is

$$\Pr(Y_i = \text{bus}) = \Pr(Y_i = \text{redbus or bluebus}) = \frac{\exp(X_{i,\text{bus}}\beta^*)}{\exp(X_{i,\text{bus}}\beta^*) + \exp(X_{i,\text{car}}\beta^*)}$$

and

$$\Pr(Y_i = \text{redbus} \mid Y_i = \text{bus}) = \frac{1}{2}$$

• That would imply that the conditional probability of commuting by car, given that one commutes by car or red bus, would differ from the same conditional probability if there is no blue bus. Presumably taking away the blue bus choice would lead all the current blue bus users to shift to the red bus, not to cars.

• $\dfrac{P_{il}}{P_{ik}} = \dfrac{\exp(X_{il}\beta)}{\exp(X_{ik}\beta)}$ does not depend on any alternative other than $l$ and $k$.

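• The conditional logit model does not allow for this type of substitution pattern. Again, consider commuters initially choosing between two modes of transportation, car and red bus. Suppose $X_{i,\text{car}}\beta = X_{i,\text{redbus}}\beta$, so that

$$P_{i,\text{car}} = P_{i,\text{redbus}} = \frac{1}{2}, \qquad \frac{P_{ic}}{P_{irb}} = \frac{\exp(X_{i,\text{car}}\beta)}{\exp(X_{i,\text{redbus}}\beta)} = 1$$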

• Now suppose a third choice, blue bus, is added. Assuming bus commuters do not care about the colour of the bus, consumers will choose between the two buses with equal probability. The ratio of the probabilities of taking the red bus and the blue bus is 1: $\dfrac{P_{irb}}{P_{ibb}} = 1$.

But IIA implies that $\dfrac{P_{ic}}{P_{irb}}$ is the same whether or not another alternative (blue bus) is added, so we have:

$$\frac{P_{ic}}{P_{irb}} = 1, \qquad \frac{P_{irb}}{P_{ibb}} = 1 \qquad \text{and} \qquad P_{ic} + P_{irb} + P_{ibb} = 1$$

The only probabilities satisfying these conditions are

$$P_{ic} = P_{irb} = P_{ibb} = \frac{1}{3},$$

which are the probabilities that the logit model predicts.
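The logit prediction for the bus example can be reproduced in a few lines. This is a sketch with assumed values: all indices $X_{ij}\beta$ are set to zero so that car and bus are equally attractive, and the function name is hypothetical.

```python
import math

def logit_probs(index_values):
    """Conditional logit probabilities for one individual."""
    expo = [math.exp(v) for v in index_values]
    total = sum(expo)
    return [e / total for e in expo]

# Assumed equal indices for car and bus (illustration only).
v_car = 0.0
v_bus = 0.0

# Two alternatives: car, red bus.
two = logit_probs([v_car, v_bus])

# Three alternatives: car, red bus, and an identical blue bus.
three = logit_probs([v_car, v_bus, v_bus])

print(two)    # car and red bus share the market equally
print(three)  # each alternative gets one third under the logit model
```

The ratio of the car and red bus probabilities equals 1 in both cases, so the car share is forced down from 1/2 to 1/3 when the duplicate bus is added.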

• In real life, however, we would expect the probability of taking a car to remain the same

when a new bus is introduced that is exactly the same as the old bus. We would expect the

original probability of taking the bus to be split between the two buses after the second one is introduced. That is, we would expect:

$$P_{ic} = \frac{1}{2}, \qquad P_{irb} = \frac{1}{4}, \qquad P_{ibb} = \frac{1}{4}$$

• In this case, the logit model, because of its IIA property, underestimates the probability of taking a car (1/3 instead of 1/2) and overestimates the probability of taking a bus. The ratio of the probabilities of car and red bus, $P_c/P_{rb}$, actually changes with the introduction of the blue bus (from 1 to 2), rather than remaining constant as required by the logit model.

• The same kind of misprediction arises with logit models whenever the attributes of another alternative change.

→ Suppose individuals have a choice of three restaurants: the Purdue restaurant (P), the Krannert restaurant (K) and the Chauncey restaurant (C), with prices $P_p = 95$, $P_k = 85$ and $P_c = 5$, and quality $Q_p = 10$, $Q_k = 9$, $Q_c = 2$. Suppose that the market shares of the three restaurants are $S_p = 0.1$, $S_k = 0.25$ and $S_c = 0.65$, and that utility in the conditional logit model is

$$U_{ij} = -0.2\,P_j + 2\,Q_j + \varepsilon_{ij}$$

so that $\dfrac{P_{ip}}{P_{ic}} = \dfrac{0.1}{0.65}$.

→ Suppose that the Krannert restaurant raises its price to 1000 (taking it out of business).

→ The conditional logit model would predict $P_{ip} = 0.13$ and $P_{ic} = 0.87$, to satisfy $\dfrac{P_{ip}}{P_{ic}} = \dfrac{0.1}{0.65} = \text{const}$.

→ This seems implausible: people who were planning to go to Krannert would appear to be more likely to go to the Purdue restaurant (PMU) than to the Chauncey restaurant, so one would expect $S_p \approx 0.35$ and $S_c \approx 0.65$.

(IIA does not hold in reality → the conditional logit model is not valid in this case.)
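The predicted shares of 0.13 and 0.87 follow from IIA alone: when an alternative is removed, the logit model rescales the remaining shares proportionally. A minimal sketch, using the market shares quoted in the example and a hypothetical helper name:

```python
def drop_alternative(shares, name):
    """Remove one alternative and rescale the rest proportionally (IIA)."""
    remaining = {k: v for k, v in shares.items() if k != name}
    total = sum(remaining.values())
    return {k: v / total for k, v in remaining.items()}

# Market shares from the restaurant example.
shares = {"Purdue": 0.10, "Krannert": 0.25, "Chauncey": 0.65}

new_shares = drop_alternative(shares, "Krannert")
print(new_shares)
```

The ratio of the Purdue and Chauncey shares stays at 0.1/0.65, which is exactly the restriction that seems implausible here.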

IIA: adding another alternative, or changing the characteristics of a third alternative, does not affect the ratio of the probabilities of any two alternatives.

• Test of IIA

Hausman & McFadden offer tests of the IIA assumption based on the observation that, if the conditional logit model is true, $\beta$ can be consistently estimated by conditional logit using any subset of alternatives. Hausman's test compares the estimate of $\beta$ using all alternatives with the estimate using a subset of alternatives:

$$(\hat{\beta}_s - \hat{\beta}_f)'\,[\hat{V}_s - \hat{V}_f]^{-1}\,(\hat{\beta}_s - \hat{\beta}_f) \sim \chi^2$$

$H_0: \hat{\beta}_s = \hat{\beta}_f$, where $s$ denotes the restricted subset and $f$ the full set of alternatives.
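As an illustration of the arithmetic, the sketch below evaluates the Hausman statistic for two coefficients. Everything numeric here is made up (the estimates, the covariance matrices), and the function name is hypothetical; the statistic would be compared with a chi-squared distribution whose degrees of freedom equal the number of coefficients compared.

```python
def hausman_statistic(b_s, b_f, V_s, V_f):
    """(b_s - b_f)' [V_s - V_f]^{-1} (b_s - b_f) for two coefficients."""
    d = [b_s[0] - b_f[0], b_s[1] - b_f[1]]
    # Elements of the 2x2 difference matrix V_s - V_f.
    a = V_s[0][0] - V_f[0][0]
    b = V_s[0][1] - V_f[0][1]
    c = V_s[1][0] - V_f[1][0]
    e = V_s[1][1] - V_f[1][1]
    det = a * e - b * c
    inv = [[e / det, -b / det], [-c / det, a / det]]
    return sum(d[r] * inv[r][k] * d[k] for r in range(2) for k in range(2))

# Hypothetical estimates from the subset (s) and the full set (f).
b_s = [0.52, -0.31]
b_f = [0.50, -0.30]
V_s = [[0.010, 0.002], [0.002, 0.008]]
V_f = [[0.006, 0.001], [0.001, 0.005]]

stat = hausman_statistic(b_s, b_f, V_s, V_f)
critical_5pct_2df = 5.991  # 5% critical value of chi-squared with 2 d.f.
print(stat, stat > critical_5pct_2df)
```

With these made-up numbers the statistic is far below the critical value, so $H_0$ would not be rejected. In finite samples $\hat{V}_s - \hat{V}_f$ may fail to be positive definite, which this sketch does not handle.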

• We need IIA to hold in order to apply the conditional logit model

$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}$$

→ If we reject $H_0$ → IIA does not hold → the conditional logit model is not valid in this case.

→ The IIA assumption needs to hold in reality for the conditional logit model to apply.

→ The IIA property follows from the initial assumption that the $\varepsilon_{ij}$ are independent with extreme value distributions.

V. NESTED LOGIT MODEL:

• If the test of IIA fails (we reject $H_0: \hat{\beta}_s = \hat{\beta}_f$), then the conditional logit model is not valid. We need to modify the multinomial logit model.

One way to introduce correlation between the choices is through nesting them. Suppose the set of choices $\{0, 1, \dots, J\}$ can be partitioned into $S$ sets $B_1, B_2, \dots, B_S$, so that the full set of choices can be written as:

$$\{0, 1, \dots, J\} = \bigcup_{s=1}^{S} B_s$$

Let $Z_s$ be set-specific characteristics (branch characteristics). McFadden (1981) studied the following model:

• Conditional probability:

$$\Pr(Y_i = j \mid X_i, Y_i \in B_s) = \frac{\exp(\rho_s^{-1} X_{ij}\beta)}{\sum_{l \in B_s}\exp(\rho_s^{-1} X_{il}\beta)}$$

• Within the sets, the correlation coefficient for the $\varepsilon_{ij}$ is equal to $(1 - \rho_s^2)$. Between the sets, the $\varepsilon_{ij}$ are independent → the probabilities are adjusted by $\rho_s$ in each group.

The probability of a choice in the set $B_s$ is

$$\Pr(Y_i \in B_s \mid X_i) = \frac{\exp(Z_s\alpha)\left[\sum_{l \in B_s}\exp(\rho_s^{-1} X_{il}\beta)\right]^{\rho_s}}{\sum_{t=1}^{S}\exp(Z_t\alpha)\left[\sum_{l \in B_t}\exp(\rho_t^{-1} X_{il}\beta)\right]^{\rho_t}}$$

If we fix $\rho_s = 1$ for all $s$, then

$$\Pr(Y_i = j \mid X_i) = \frac{\exp(X_{ij}\beta + Z_s\alpha)}{\sum_{t=1}^{S}\sum_{l \in B_t}\exp(X_{il}\beta + Z_t\alpha)}$$

and we are back in the conditional logit model.
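The two-level structure can be made concrete with a short sketch that multiplies the within-nest probability by the nest probability. All inputs are made up, and the argument names (`v` for $X_{ij}\beta$, `z_alpha` for $Z_s\alpha$, `rho` for $\rho_s$) are hypothetical. The last part checks that setting every $\rho_s = 1$ reproduces the conditional logit probabilities.

```python
import math

def nested_logit_probs(v, nests, z_alpha, rho):
    """Choice probabilities Pr(Y = j) = Pr(j | B_s) * Pr(B_s).

    v[j] plays the role of X_ij * beta, nests is a list of lists of
    alternative indices, z_alpha[s] of Z_s * alpha, rho[s] of rho_s.
    """
    inner = [sum(math.exp(v[l] / rho[s]) for l in nest)
             for s, nest in enumerate(nests)]
    nest_weight = [math.exp(z_alpha[s]) * inner[s] ** rho[s]
                   for s in range(len(nests))]
    total = sum(nest_weight)
    probs = [0.0] * len(v)
    for s, nest in enumerate(nests):
        for j in nest:
            within = math.exp(v[j] / rho[s]) / inner[s]
            probs[j] = within * nest_weight[s] / total
    return probs

# Hypothetical example: alternative 0 alone, alternatives 1 and 2 nested.
v = [0.3, 0.1, 0.1]
nests = [[0], [1, 2]]
z_alpha = [0.0, 0.2]

p_nested = nested_logit_probs(v, nests, z_alpha, [1.0, 0.5])
p_rho_one = nested_logit_probs(v, nests, z_alpha, [1.0, 1.0])

# Conditional logit with index X_ij * beta + Z_s * alpha for comparison.
u = [0.3 + 0.0, 0.1 + 0.2, 0.1 + 0.2]
expo = [math.exp(x) for x in u]
p_conditional = [e / sum(expo) for e in expo]

print(p_nested)
print(p_rho_one)
print(p_conditional)
```

With $\rho_s < 1$ in the second nest, the two nested alternatives take less probability from the alternative outside the nest than the conditional logit model implies.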

→ In general, this model corresponds to individuals choosing the option with the highest utility, where the utility of choice $j$ in set $B_s$ for individual $i$ is

$$U_{ij} = X_{ij}\beta + Z_s\alpha + \varepsilon_{ij}$$

McFadden supposes that the joint distribution function of the $\varepsilon_{ij}$ is

$$F(\varepsilon_{i0}, \dots, \varepsilon_{iJ}) = \exp\left(-\sum_{s=1}^{S}\left(\sum_{j \in B_s}\exp(-\rho_s^{-1}\varepsilon_{ij})\right)^{\rho_s}\right)$$

→ From this he derives the results given above.

• How do we estimate these models?

→ One approach is to construct the log-likelihood and maximize it directly. That is complicated, especially since the log-likelihood function is not concave (but it is not impossible).

→ An easier alternative is to use the nesting structure directly. Within a nest we have a conditional logit model with coefficient $\rho_s^{-1}\beta$. Hence we can directly estimate $\rho_s^{-1}\beta$ using the concavity of the conditional logit model (the Newton-Raphson procedure will converge to a global maximum). Denote these estimates of $\rho_s^{-1}\beta = \lambda_s$ by $\hat{\lambda}_s$.

→ Then the probability of a particular set $B_s$ can be used to estimate $\rho_s$ and $\alpha$ through:

$$\Pr(Y_i \in B_s \mid X_i) = \frac{\exp(Z_s\alpha)\left[\sum_{l \in B_s}\exp(X_{il}\hat{\lambda}_s)\right]^{\rho_s}}{\sum_{t=1}^{S}\exp(Z_t\alpha)\left[\sum_{l \in B_t}\exp(X_{il}\hat{\lambda}_t)\right]^{\rho_t}} = \frac{\exp(Z_s\alpha + \rho_s\hat{W}_s)}{\sum_{t=1}^{S}\exp(Z_t\alpha + \rho_t\hat{W}_t)}$$

where

$$\hat{W}_s = \ln\left(\sum_{l \in B_s}\exp(X_{il}\hat{\lambda}_s)\right)$$

is called the “inclusive value”.

• We have another conditional logit model with likelihood function:

$$L = \prod_{i=1}^{n}\prod_{s:\,Y_i \in B_s}\Pr(Y_i \in B_s \mid X_i) = \prod_{i=1}^{n}\prod_{s:\,Y_i \in B_s}\frac{\exp(Z_s\alpha + \rho_s\hat{W}_s)}{\sum_{t=1}^{S}\exp(Z_t\alpha + \rho_t\hat{W}_t)}$$

• These models can be extended to many layers of nests. It should be noted that both the order of the nests and the elements of each nest are very important.

VI. MULTINOMIAL PROBIT MODEL:

• A natural alternative model that avoids the IIA problem, which arises when the errors are correlated across choices, is to work with normally distributed errors ($\varepsilon_{ij} \sim N(\cdot)$). We no longer assume $\varepsilon_{ij} \sim$ extreme value distribution.

• Note that the extreme value distribution is similar to the normal distribution, but the EV distribution is much easier to calculate with.

$$U_{ij} = X_{ij}\beta + \varepsilon_{ij}, \qquad j = 0, 1, 2, \dots, J$$

$$U_i = \begin{bmatrix}U_{i0} \\ U_{i1} \\ \vdots \\ U_{iJ}\end{bmatrix} = \begin{bmatrix}X_{i0}\beta + \varepsilon_{i0} \\ X_{i1}\beta + \varepsilon_{i1} \\ \vdots \\ X_{iJ}\beta + \varepsilon_{iJ}\end{bmatrix}$$

• The cost of using the normal distribution is the complicated likelihood function.

With:

$$\varepsilon_i = \begin{bmatrix}\varepsilon_{i0} \\ \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iJ}\end{bmatrix} \,\Big|\, X_i \sim N(0, \Sigma)$$

with unrestricted covariance matrix $\Sigma$,

$$\Pr(Y_i = j) = \Pr[U_{ij} > U_{iq} \text{ for all } q \neq j]$$

or

$$\Pr(Y_i = q) = \Pr[\varepsilon_{ik} - \varepsilon_{iq} < (X_{iq} - X_{ik})\beta \text{ for all } k \neq q]$$

• The main obstacle to the implementation of the Multinomial probit model is the difficulty

in computing the multivariate normal probabilities for any J > 2.

• Recent results on the accurate simulation of multivariate normal integrals have made estimation of the MNP model feasible.

• Read: Geweke, Keane and Runkle (1994), Review of Economics and Statistics 76, No. 4, for the method, if you want to use the multinomial probit model.

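• For $J = 3$:

$$P(y_i = 1) = P(U_{i1} > U_{i2};\; U_{i1} > U_{i3}) = P\big(\varepsilon_{i2} - \varepsilon_{i1} < (X_{i1} - X_{i2})\beta;\;\; \varepsilon_{i3} - \varepsilon_{i1} < (X_{i1} - X_{i3})\beta\big)$$

$$= \int_{-\infty}^{(X_{i1} - X_{i2})\beta}\int_{-\infty}^{(X_{i1} - X_{i3})\beta} f(u_1, u_2)\,du_2\,du_1$$

where $u_1 = \varepsilon_{i2} - \varepsilon_{i1}$, $u_2 = \varepsilon_{i3} - \varepsilon_{i1}$, $f$ is the joint density of $(u_1, u_2)$, and

$$\begin{bmatrix}u_1 \\ u_2\end{bmatrix} \sim N(0, \Sigma^*), \qquad \Sigma^* = \begin{bmatrix}-1 & 1 & 0 \\ -1 & 0 & 1\end{bmatrix}\Sigma\begin{bmatrix}-1 & -1 \\ 1 & 0 \\ 0 & 1\end{bmatrix}$$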

• Each element of the likelihood is a double integral and must be evaluated numerically.

• This model does not suffer from the IIA problem.
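To give a feel for simulation-based evaluation, the following sketch approximates multinomial probit choice probabilities by a crude Monte Carlo frequency count. It is not the simulator discussed by Geweke, Keane and Runkle; the covariance matrices, index values, number of draws and function names are all assumptions made for illustration. The second case gives the two bus alternatives highly correlated errors, which the logit model cannot do.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma must be positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def simulate_probit_probs(x_beta, sigma, draws=20000, seed=0):
    """Share of simulated draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    n = len(x_beta)
    counts = [0] * n
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        u = [x_beta[i] + eps[i] for i in range(n)]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]

# Alternatives: car, red bus, blue bus, all with the same assumed index.
x_beta = [0.0, 0.0, 0.0]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.99], [0.0, 0.99, 1.0]]

p_indep = simulate_probit_probs(x_beta, independent)
p_corr = simulate_probit_probs(x_beta, correlated)
print(p_indep)  # roughly one third each
print(p_corr)   # car close to one half, the two buses splitting the rest
```

With nearly identical buses the car share stays close to 1/2, which is the substitution pattern argued for in the red bus/blue bus example.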

VII. ORDERED LOGIT, ORDERED PROBIT & SEQUENTIAL MODELS:

1. Ordered Probit:

$$Y_i^* = X_i\beta + \varepsilon_i$$

$Y_i^*$ is unobservable. What we observe is:

$$Y_i = \begin{cases}0 & \text{if } Y_i^* \le 0 \\ 1 & \text{if } 0 < Y_i^* \le \mu_1 \\ 2 & \text{if } \mu_1 < Y_i^* \le \mu_2 \\ \;\vdots \\ J & \text{if } \mu_{J-1} \le Y_i^*\end{cases}$$

$\mu_1, \mu_2, \dots, \mu_{J-1}$ are unknown parameters to be estimated with $\beta$.

Assume that $\varepsilon$ is normally distributed across observations. Normalize the mean and variance of $\varepsilon$: $\varepsilon \sim N(0, 1)$.

We have:

$$\begin{aligned}
P(y_i = 0 \mid X_i) &= \Phi(-X_i\beta) \\
P(y_i = 1 \mid X_i) &= \Phi(\mu_1 - X_i\beta) - \Phi(-X_i\beta) \\
P(y_i = 2 \mid X_i) &= \Phi(\mu_2 - X_i\beta) - \Phi(\mu_1 - X_i\beta) \\
&\;\;\vdots \\
P(y_i = J \mid X_i) &= 1 - \Phi(\mu_{J-1} - X_i\beta)
\end{aligned}$$

We must have $0 < \mu_1 < \mu_2 < \dots < \mu_{J-1}$ (for all the probabilities to be positive).

Likelihood function:

$$L = \prod_{j \in [0, \dots, J]}\;\prod_{\text{all observations with } Y_i = j}\Pr(Y_i = j)$$
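The ordered probit probabilities are simple differences of normal CDF values, as the following sketch shows. The index value and thresholds are made up, the function names are hypothetical, and the standard normal CDF is built from `math.erf`.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(x_beta, mu):
    """Return [P(y=0), ..., P(y=J)] given mu = [mu_1, ..., mu_{J-1}].

    The first threshold is normalised to 0, so the cut points are
    0 < mu_1 < ... < mu_{J-1}.
    """
    cuts = [0.0] + list(mu)
    cdf = [norm_cdf(c - x_beta) for c in cuts]
    probs = [cdf[0]]
    probs += [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]
    probs.append(1.0 - cdf[-1])
    return probs

# Hypothetical index X_i * beta and thresholds for J = 3 (four categories).
probs = ordered_probit_probs(0.4, [0.8, 1.9])
print(probs)
```

The probabilities are positive only when the thresholds are increasing, which is the ordering restriction stated above.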

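Marginal Effects:

$$\frac{\partial P(Y_i = 0 \mid X_i)}{\partial x_{ik}} = -\phi(X_i\beta)\,\beta_k$$

$$\frac{\partial P(Y_i = j \mid X_i)}{\partial x_{ik}} = [\phi(\mu_{j-1} - X_i\beta) - \phi(\mu_j - X_i\beta)]\,\beta_k \qquad (0 < j < J, \text{ with } \mu_0 = 0)$$

$$\frac{\partial P(Y_i = J \mid X_i)}{\partial x_{ik}} = \phi(\mu_{J-1} - X_i\beta)\,\beta_k$$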

2. Ordered Logit:

Replacing $\Phi$ with the logistic function

$$F(X_i\beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$$

gives the ordered logit model.

3. Sequential Multinomial Models:

A special case of an ordered variable (where choices have a natural ranking) is a sequential variable. This occurs when the second event is dependent on the first event, the third event is dependent on the previous two events, and so on. → Person $i$ being in the $n$-th category means that person $i$ has been through all $(n-1)$ previous categories:

$$y_i = \begin{cases}1 & \text{not high school} \\ 2 & \text{high school, not college} \\ 3 & \text{college}\end{cases}$$

$$\Pr[y_i = 2] = \Pr[y_i \neq 1] \times \Pr[y_i = 2 \mid y_i \neq 1] = \Phi(X_{1i}\beta_1)\,(1 - \Phi(X_{2i}\beta_2))$$

Here $\Phi(X_{1i}\beta_1)$ is the probability of completing high school ($y_i \neq 1$) and $\Phi(X_{2i}\beta_2)$ is the probability of going on to college given high school.

Note: $P(y_i = 2)$ means $P(y_i \neq 1 \text{ and } y_i = 2)$.

The parameters $\beta_1$ and $\beta_2$ can be estimated by maximizing the log-likelihood:

$$\ln L = \sum_{i=1}^{n}\sum_{j=1}^{m} y_{ij}\ln p_{ij}$$

where $y_{ij} = 1$ if individual $i$ is in category $j$ and 0 otherwise, $p_{i1} = 1 - \Phi(X_{1i}\beta_1)$, $p_{i2}$ is given in the preceding equation and $p_{i3} = 1 - p_{i1} - p_{i2}$.
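A small sketch of the sequential probabilities and the log-likelihood follows. It assumes one particular reading of the model, namely that $\Phi(X_{1i}\beta_1)$ is the probability of completing high school and $\Phi(X_{2i}\beta_2)$ the probability of going on to college given high school; the index values, data and function names are made up for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sequential_probs(xb1, xb2):
    """Return (p1, p2, p3) for the three education categories.

    Assumption: norm_cdf(xb1) = Pr(y != 1), norm_cdf(xb2) = Pr(y = 3 | y != 1).
    """
    finish_high_school = norm_cdf(xb1)
    go_to_college = norm_cdf(xb2)
    p1 = 1.0 - finish_high_school
    p2 = finish_high_school * (1.0 - go_to_college)
    p3 = 1.0 - p1 - p2
    return p1, p2, p3

def log_likelihood(observations):
    """observations: list of (xb1, xb2, category) with category in {1, 2, 3}."""
    total = 0.0
    for xb1, xb2, category in observations:
        probs = sequential_probs(xb1, xb2)
        total += math.log(probs[category - 1])
    return total

p1, p2, p3 = sequential_probs(0.5, -0.2)
print(p1, p2, p3)

# Made-up observations, purely illustrative.
sample = [(0.5, -0.2, 2), (1.0, 0.3, 3), (-0.4, 0.0, 1)]
print(log_likelihood(sample))
```

Under this reading $p_3$ equals the product of the two stage probabilities, so the three probabilities always sum to one.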