International Choice Modelling Conference 2015

The use of heuristic optimization algorithms to facilitate maximum simulated likelihood estimation of random parameter logit models
Arne Risa Hole, Hong Il Yoo

Last modified: 11 May 2015


With an increase in desktop computing power, the estimation of the random parameter logit model (RPL) has become increasingly common in empirical applications. Also known as mixed logit, RPL provides a flexible framework for analyzing discrete choice data. RPL can approximate any random utility maximization model arbitrarily well, subject to specifying a suitable joint distribution of parameters (McFadden and Train, 2000), and readily accommodates interpersonal preference heterogeneity while also addressing panel correlation across repeated choices by the same person (Revelt and Train, 1998). Related applications can be found in several research areas to which the structural analysis of individual preferences is relevant, including environmental economics (Layton and Brown, 2000), labor economics (van Soest et al., 2002), transportation economics (Small et al., 2005), international economics (Basile et al., 2008), and health economics (Sivey et al., 2012).

While RPL is specified by augmenting the parameters of the multinomial logit model (MNL) with random heterogeneity, RPL poses estimation issues which are absent in the context of MNL. Perhaps the best-known issue is that the RPL likelihood often lacks a closed-form expression and needs to be approximated by a simulated integral. The associated computational challenges have motivated several studies to explore the use of alternative density simulators to obtain the best approximation from a given number of draws (Train, 2009, pp.205-236). They have also motivated studies on the use of alternative estimation methods (Huber and Train, 2001; Harding and Hausman, 2007; Train, 2008) which are less computationally demanding than the method of maximum simulated likelihood (MSL), though MSL remains by far the most popular due to its ready applicability to various RPL specifications.
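
For concreteness, the simulated likelihood mentioned above can be written out as follows. The notation is an assumption introduced here for illustration and does not come from the paper: person n faces choice situations t = 1, ..., T, x_{njt} holds the attributes of alternative j, y_{nt} is the chosen alternative, f(β | θ) is the assumed mixing density, and β_n^r is the r-th of R draws from it.

```latex
% Choice probability of person n's observed sequence (no closed form in general)
P_n(\theta) = \int \prod_{t=1}^{T}
  \frac{\exp\left(x_{n y_{nt} t}'\beta\right)}{\sum_{j} \exp\left(x_{njt}'\beta\right)}
  \, f(\beta \mid \theta)\, d\beta

% Simulated counterpart: average over R draws \beta_n^r \sim f(\beta \mid \theta)
\hat{P}_n(\theta) = \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T}
  \frac{\exp\left(x_{n y_{nt} t}'\beta_n^{r}\right)}{\sum_{j} \exp\left(x_{njt}'\beta_n^{r}\right)}

% Simulated log-likelihood maximized by MSL
\mathrm{SLL}(\theta) = \sum_{n=1}^{N} \ln \hat{P}_n(\theta)
```

Setting the variance of the mixing density to zero collapses the integral to a single MNL probability, which is the sense in which MNL is a special case of RPL.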

This paper proposes an estimation strategy to address another prominent issue, on which only limited practical guidance exists. Specifically, in contrast with its MNL counterpart, the RPL likelihood is not globally concave and may feature several local maxima. As in other similar contexts of non-linear estimation, the selection of "good" starting values for estimated parameters is crucial to avoiding potentially false inferences based on the estimates associated with inferior local maxima. But in the RPL literature, an explicit discussion of which starting values have been used is rarely presented, and how to obtain "good" starting values has not been the subject of inquiry as far as we know. At least on the basis of a few studies reporting their starting value search processes (Greene and Hensher, 2010, p.418; Knox et al., 2013, p.74), the conventional practice seems to be to take starting values from the estimated special cases of a preferred RPL specification.

Our proposed estimation strategy makes joint use of heuristic optimization algorithms and conventional gradient-based algorithms to obtain the MSL estimates of RPL. The central idea is to use heuristic algorithms to locate a starting point which is likely to be close to the global maximum, and then to use gradient-based algorithms to refine this point further to a local maximum which thus stands a good chance of being the global maximum. For the heuristic search step, we consider two parsimonious but effective algorithms which can be easily implemented by non-specialists in heuristic optimization: the differential evolution (DE) algorithm (Storn and Price, 1997) and the particle swarm optimization (PSO) algorithm (Eberhart and Kennedy, 1995). Sometimes called global search routines (Fox, 2007, p.1013), these population-based algorithms are well-suited to the task of locating candidate solutions away from inferior maxima, as they search comprehensively over the parameter space in looking for directions of improvement. Like other gradient-free algorithms, however, they tend to be much slower than gradient-based algorithms in refining a candidate solution to a nearby maximum. Our estimation strategy exploits the global search efficiency of the population-based heuristics and the local search efficiency of gradient-based algorithms, in the sense of Dorsey and Mayer (1995).
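
The two-step idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy objective (a negated Rastrigin function standing in for a multimodal simulated log-likelihood), the rand/1/bin variant of DE, all tuning values (population size, weight, crossover rate, step size), and the plain gradient ascent with numerical derivatives that stands in for the quasi-Newton routines typically used in practice.

```python
import math
import random


def toy_objective(x):
    """Hypothetical multimodal objective (negated Rastrigin); global maximum 0 at the origin."""
    return -sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def differential_evolution(f, bounds, rng, pop_size=30, generations=200,
                           weight=0.7, crossover=0.9):
    """Step 1: gradient-free global search (DE rand/1/bin) that maximizes f."""
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, all different from member i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for k in range(dim):
                if rng.random() < crossover or k == forced:
                    value = pop[a][k] + weight * (pop[b][k] - pop[c][k])
                    lo, hi = bounds[k]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][k]
                trial.append(value)
            trial_fitness = f(trial)
            if trial_fitness >= fitness[i]:  # keep the trial only if it is no worse
                pop[i], fitness[i] = trial, trial_fitness
    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best]


def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    grad = []
    for k in range(len(x)):
        up, down = list(x), list(x)
        up[k] += h
        down[k] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def gradient_ascent(f, start, initial_step=1e-3, max_iter=500, tol=1e-8):
    """Step 2: local refinement by gradient ascent with step halving."""
    x = list(start)
    fx = f(x)
    for _ in range(max_iter):
        grad = numerical_gradient(f, x)
        if max(abs(g) for g in grad) < tol:
            break
        step = initial_step
        while step > 1e-12:
            candidate = [xk + step * gk for xk, gk in zip(x, grad)]
            f_candidate = f(candidate)
            if f_candidate > fx:
                x, fx = candidate, f_candidate
                break
            step *= 0.5
        else:
            break  # no improving step found: treat x as a local maximum
    return x, fx


if __name__ == "__main__":
    rng = random.Random(0)
    bounds = [(-5.0, 5.0)] * 2
    start = differential_evolution(toy_objective, bounds, rng)
    print("hybrid:", gradient_ascent(toy_objective, start))
    print("gradient only, poor start:", gradient_ascent(toy_objective, [3.1, -2.9]))
```

In this toy setting, gradient ascent started from an arbitrary point simply climbs to the nearest local maximum, whereas the DE step first moves the starting point into a promising region. Because DE is partly random, a different seed may yield a different starting point, which is why repeated runs are informative.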

We investigate the performance of the DE- and PSO-assisted estimation strategies in four different empirical data sets of varied sizes. While these strategies can be applied to the estimation of any RPL specification, the four case studies primarily focus on the generalized multinomial logit model (GMNL) of Fiebig et al. (2010). GMNL extends the traditional specification featuring normally distributed coefficients by accommodating interpersonal variations in the overall scale of utility, and tends to perform favorably against other extensions and variants of the traditional specification (Keane and Wasi, 2013). GMNL has been rapidly gaining influence in the empirical literature, as partly attested by its availability as canned commands in software packages like NLOGIT and Stata despite its relative novelty.              


We find that the DE-assisted strategy can be a very useful tool for checking the adequacy of the estimates that have been obtained by following (what is likely to be) the conventional practice. In all four data sets, the DE-assisted strategy locates solutions which attain higher likelihoods than the best conventionally obtained solutions do. Since the updating rules employed by the heuristic algorithms are partly random, the DE- and PSO-assisted strategies may find different solutions over different estimation runs. Under most computational settings we have explored, the DE-assisted strategy finds those improved solutions with high enough empirical frequencies to suggest that a small number of DE-assisted estimation runs would be sufficient for detecting whether a preferred conventional solution is at an inferior maximum. While the PSO-assisted strategy also locates solutions improving on the best conventional solutions in all four data sets, it does so with much lower empirical frequencies. Moreover, in each data set, the solution attaining the highest likelihood we have found comes from the DE-assisted strategy.


In terms of likelihood values, the best DE-assisted solution is always farther from the best conventional solution than the latter is from the worst conventional solution that displays acceptable convergence diagnostics. Yet, in terms of possible analytic conclusions, the best DE-assisted and best conventional solutions show much more agreement than the best and worst conventional solutions. It appears that many of the policy implications drawn from a carefully selected solution could stand, even when the solution is at an inferior maximum.
