International Choice Modelling Conference 2017

Incorporating External Information into Discrete Choice Models with Incomplete Choice Data

Last modified: 28 March 2017


This paper studies the estimation of discrete choice models when the exact alternative chosen is not fully observed, but the choice is known to come from a subgroup of the total choice set. This situation occurs in vehicle choice, where commonly used survey data record only the make, model, and year of the vehicle and not the exact trim line. For example, in many studies we observe that the household chose a 2013 Honda Civic, but do not know whether it is a 2013 Honda Civic DX, LX, or EX. Similarly, residential location choice data frequently record only the neighborhood chosen and not the details of the actual residence. In many of these examples there are also external data that give aggregate market shares at the exact choice level, and these external data can be used to improve estimation of the choice model parameters.

Berry, Levinsohn, and Pakes (BLP) developed a methodology for incorporating external market share information when discrete choices are completely observed. Brownstone and Li (2014) proposed a maximum likelihood estimator for the case where choices are not completely observed, and Wong (2015) showed how this estimator could be extended to include external market share data using Generalized Minimum Distance estimation. This paper looks more carefully at which parameters can be identified in this situation, and considers the case where the external information is itself subject to error. For example, in the automobile choice case considered in Wong (2015), the survey data cover vehicles purchased by households for personal use, but the external market share data also include purchases by commercial fleet users, which introduces measurement error into the market shares.

This paper derives the likelihood function for the incomplete choice data case where the underlying choice model is conditional logit. We show analytically that incomplete choice data always reduce the precision of the maximum likelihood estimators relative to the case where choices are completely observed. Even though the underlying choice model is conditional logit, the likelihood function with partial choice observability is not globally concave. However, for the Monte Carlo examples considered in this paper, we did not encounter numerical convergence problems with maximum likelihood (ML) estimation. We then consider maximum likelihood estimation when exact market shares are available, and carry out Monte Carlo studies showing that these market share data permit identification of alternative-specific constants for the elemental alternatives. Furthermore, instead of using the contraction mapping algorithm proposed in BLP, we propose several improved algorithms to incorporate the market shares.
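As a rough illustration of the incomplete choice data likelihood, the sketch below assumes that the probability of observing a group is the sum of the conditional logit probabilities of the elemental alternatives in that group. The function name `broad_choice_loglik` and the array layout (`X`, `groups`, `observed_group`) are hypothetical and chosen only for this example; the paper's own derivation may differ in detail.

```python
import numpy as np

def broad_choice_loglik(beta, X, groups, observed_group):
    """Log-likelihood when only the group of the chosen alternative is observed.

    beta           : (K,) generic coefficients
    X              : (N, J, K) attributes of the J elemental alternatives
    groups         : (J,) integer group label of each elemental alternative
    observed_group : (N,) group label observed for each decision maker
    (All names and shapes are assumptions made for this sketch.)
    """
    v = X @ beta                                   # (N, J) systematic utilities
    v = v - v.max(axis=1, keepdims=True)           # stabilise the exponentials
    expv = np.exp(v)
    probs = expv / expv.sum(axis=1, keepdims=True) # conditional logit probabilities
    # Indicator of which elemental alternatives belong to the observed group.
    in_group = groups[None, :] == observed_group[:, None]
    group_prob = (probs * in_group).sum(axis=1)    # sum over the observed group
    return np.log(group_prob).sum()
```

When every elemental alternative forms its own group, this reduces to the usual conditional logit log-likelihood; when all alternatives share one group, every observation has probability one and the data carry no information about `beta`.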

The remainder of the paper shows how to incorporate uncertainty in the external market share information using a Bayesian framework. This uncertainty is specified by an informative prior distribution on the market shares, for which we use the Dirichlet distribution. The BLP fixed point algorithm can be used to map this prior on the market shares to a prior distribution on the alternative-specific parameters of the elemental alternatives. Almost all utility specifications used in applied work contain interactions between generic variables and alternative indicators, so it is unrealistic to postulate independent priors for the coefficients of the generic variables and the alternative-specific constants. We therefore propose using a training sample to generate the prior distribution on all of the model coefficients. The remainder of the sample is used to implement a Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm to draw values from the posterior distribution. Given that MCMC and ML estimation algorithms can require a large number of iterations before convergence, our improved algorithms to incorporate the market shares can substantially reduce the computational time relative to the original BLP contraction mapping.
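The mapping from a Dirichlet prior on market shares to an implied prior on alternative-specific constants can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the plain BLP-style fixed point iteration rather than the improved algorithms proposed in the paper, it holds the non-constant part of utility `V` fixed, and it normalises the first constant to zero. The function names, the `concentration` parameter, and the simulated inputs are all hypothetical.

```python
import numpy as np

def predicted_shares(delta, V):
    """Average conditional logit probabilities; V is (N, J) utility without constants."""
    u = V + delta[None, :]
    u = u - u.max(axis=1, keepdims=True)
    p = np.exp(u)
    p /= p.sum(axis=1, keepdims=True)
    return p.mean(axis=0)

def blp_contraction(target_shares, V, tol=1e-10, max_iter=10_000):
    """Iterate delta <- delta + log(target) - log(predicted), with delta[0] fixed at 0."""
    delta = np.zeros(V.shape[1])
    for _ in range(max_iter):
        step = np.log(target_shares) - np.log(predicted_shares(delta, V))
        delta = delta + step
        delta -= delta[0]                      # location normalisation (an assumption)
        if np.max(np.abs(step - step[0])) < tol:
            break
    return delta

def asc_prior_draws(base_shares, concentration, V, n_draws, rng):
    """Draw shares from Dirichlet(concentration * base_shares) and map each to constants."""
    shares = rng.dirichlet(concentration * base_shares, size=n_draws)
    return np.array([blp_contraction(s, V) for s in shares])

# Illustrative use with simulated utilities (not data from the paper).
rng = np.random.default_rng(0)
V = rng.normal(size=(200, 4))
base = np.array([0.4, 0.3, 0.2, 0.1])
loose = asc_prior_draws(base, 50.0, V, 100, rng)    # diffuse prior on shares
tight = asc_prior_draws(base, 5000.0, V, 100, rng)  # precise prior on shares
print(loose.std(axis=0), tight.std(axis=0))
```

A larger `concentration` corresponds to a tighter prior on the shares and therefore a tighter implied prior on the constants. In a full analysis the generic coefficients would vary as well, which is why an independent prior on the two sets of coefficients is hard to justify.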

Finally, the paper carries out simulations to investigate how the tightness of the prior distribution for the market shares affects the posterior distribution. These simulations show that unless the prior on the market shares is very precise, there is very little learning about the alternative-specific constants. These results suggest that studies using the BLP approach where the external market share data do not closely match the population generating the choice data are likely to be misleading.

The Bayesian methods introduced here also apply to estimating discrete choice models from choice-based samples with known market shares (see, for example, Bierlaire, Bolduc, and McFadden, 2008). Although we formally analyze only the case where the underlying discrete choice model is conditional logit, the same methods could be applied to more flexible models.



Bierlaire, M., D. Bolduc and D. McFadden. The estimation of generalized extreme value models from choice-based samples, Transportation Research Part B 42 (2008) 381–394.

Brownstone, D. and P. Li. A Model for Broad Choice Data, Working Paper, Department of Economics, University of California, Irvine, February, 2014.

Wong, T. J. Econometric Models in Transportation, Ph.D. thesis, Department of Economics, University of California, Irvine, June, 2015.
