International Choice Modelling Conference 2015

Enabling the use of cross-nested logit for very large problems through indirect inference
Stephane Hess, Andrew Daly, Mark Bradley, Maren Outwater

Last modified: 11 May 2015


The field of choice modelling has seen many exciting developments in the last couple of decades, primarily in the form of models allowing for complex correlation structures between alternatives and flexible variations in sensitivities across decision makers. While the added computational burden of these models is a nuisance in academic work, it has severely limited the appeal and use of such models in large-scale applied work, where, especially in a transport context, the number of alternatives and attributes tends to be much larger. As an example, many models of joint mode and destination choice have tens of thousands of alternatives and dozens of parameters. This is very different from most academic studies, and makes the estimation of even Multinomial and Nested Logit models time-consuming.

With the benefits of advanced models potentially being even more important in the complex decision processes studied in applied work, this presentation looks at the use of a technique recently introduced into transport research for such cases. The method, known as indirect inference, relies on understanding the relationship between the parameters of simple and complex models for a given dataset and thus being able to ‘predict’ the parameters of advanced model structures without actually estimating them on the data.

There are four steps in the use of indirect inference, which can most simply be described as follows:

1. We start by simulating choices with the true model, which we cannot estimate, for a large number (say K) of different possible values for the parameters of that model.

2. We then take a simpler model that we can estimate, known as the auxiliary model and approximating the true model, and estimate it on each of the K datasets created in simulation.

3. We create a ‘binding function’ which explains the relationship between the estimates of the auxiliary model and the parameters of the true model used in simulating the choices. This binding function thus captures the impact of the misspecification introduced by using the auxiliary model.

4. We finally estimate the auxiliary model on the real data and apply the inverse of the binding function to these estimates to obtain inferred values for the parameters of the true model on the real data.
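
The four steps can be illustrated with a deliberately small toy example. The sketch below is not the authors' implementation: it assumes a binary probit as a stand-in for the "true" model and a one-parameter binary logit as the auxiliary model, and it approximates the binding function with a quadratic polynomial. All function names, sample sizes and parameter ranges are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_true_model(beta, x, rng):
    """Simulate binary choices from the stand-in 'true' model (a probit)."""
    utility = beta * x + rng.standard_normal(x.shape[0])
    return (utility > 0).astype(float)

def estimate_auxiliary(y, x, iters=25):
    """Estimate the auxiliary model (one-parameter binary logit) by Newton-Raphson."""
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-b * x))
        grad = np.sum((y - p) * x)               # score of the logit log-likelihood
        hess = -np.sum(p * (1.0 - p) * x ** 2)   # second derivative (always negative)
        b -= grad / hess
    return b

# Explanatory variable of the (hypothetical) dataset, held fixed throughout.
n = 20000
x = rng.standard_normal(n)

# Step 1: simulate choices with the true model for K parameter values.
true_betas = np.linspace(0.2, 1.5, 14)           # K = 14, an arbitrary choice
simulated = [simulate_true_model(b, x, rng) for b in true_betas]

# Step 2: estimate the auxiliary model on each simulated dataset.
aux_estimates = np.array([estimate_auxiliary(y, x) for y in simulated])

# Step 3: binding function, auxiliary estimate as a function of the true parameter.
binding_coef = np.polyfit(true_betas, aux_estimates, deg=2)

# Step 4: estimate the auxiliary model on the "real" data (simulated here with a
# known parameter so the result can be checked) and invert the binding function.
beta_real = 0.8
y_real = simulate_true_model(beta_real, x, rng)
aux_real = estimate_auxiliary(y_real, x)

grid = np.linspace(0.2, 1.5, 2001)               # numerical inversion on a grid
beta_inferred = grid[np.argmin(np.abs(np.polyval(binding_coef, grid) - aux_real))]

print(f"auxiliary estimate: {aux_real:.3f}, inferred true parameter: {beta_inferred:.3f}")
```

In this toy setting the auxiliary logit estimate is systematically larger than the probit parameter, and the inverse of the binding function maps it back to a value close to the one used to generate the data. In a real application the binding function would be multivariate and the grid inversion would be replaced by a suitable numerical solver.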

We apply this technique in the context of estimating a cross-nested logit model for the joint choice of destination and mode for long-distance travel in the United States. Initial work on advanced destination and mode choice models compared cross-nested logit models with multinomial logit and nested logit (with mode above destination and destination above mode nests). This work showed clear advantages of cross-nested logit over standard techniques, with gains in model fit and different elasticity results, as well as correlation along both dimensions of choice. As a first step in demonstrating the potential of the indirect inference method, we worked in a limited context using data from California with 4 modes and 58 destinations (232 alternatives), where we are able to estimate the model on the full set of alternatives and then show that indirect inference can correct for sampling bias when estimation uses just a subset of alternatives. We finally use a US-wide dataset with 4 modes and 73,057 destinations, thus giving us a total of 292,228 alternatives, far beyond anything that would be feasible for classical estimation of the cross-nested logit model with current software capabilities.
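
To make the cross-nested structure concrete, the following sketch computes choice probabilities for a small joint mode and destination choice set. It assumes the generalised nested logit form of the cross-nested logit, with one nest per mode and one nest per destination and each alternative allocated equally (0.5 each) to its mode nest and its destination nest. The utilities, nesting parameters and allocation values are hypothetical and are not taken from the study.

```python
import numpy as np

def cnl_probabilities(V, alpha, mu):
    """
    Cross-nested logit probabilities (generalised nested logit form).

    V     : (J,) utilities of the J alternatives
    alpha : (J, M) allocation of alternatives to M nests, rows summing to 1
    mu    : (M,) nesting parameters in (0, 1]; mu = 1 gives no correlation
    """
    # y[j, m] = (alpha_jm * exp(V_j)) ** (1 / mu_m); zero where alpha_jm = 0
    y = (alpha * np.exp(V)[:, None]) ** (1.0 / mu)[None, :]
    nest_sum = y.sum(axis=0)                      # one sum per nest
    p_alt_given_nest = y / nest_sum               # P(j | m)
    nest_weight = nest_sum ** mu
    p_nest = nest_weight / nest_weight.sum()      # P(m)
    return p_alt_given_nest @ p_nest              # P(j) = sum_m P(m) * P(j | m)

# Toy choice set: 2 modes x 3 destinations = 6 joint alternatives.
n_modes, n_dests = 2, 3
J = n_modes * n_dests

# Nests 0..1 are mode nests, nests 2..4 are destination nests.
alpha = np.zeros((J, n_modes + n_dests))
for m in range(n_modes):
    for d in range(n_dests):
        j = m * n_dests + d
        alpha[j, m] = 0.5                # share allocated to the mode nest
        alpha[j, n_modes + d] = 0.5      # share allocated to the destination nest

V = np.array([0.2, -0.1, 0.4, 0.0, 0.3, -0.5])   # hypothetical utilities
mu = np.array([0.7, 0.7, 0.5, 0.5, 0.5])         # hypothetical nesting parameters

p = cnl_probabilities(V, alpha, mu)
print(p.round(4), p.sum())
```

With all nesting parameters set to 1 the function reduces to multinomial logit, which gives a simple sanity check. Note that the cost of each evaluation grows with the number of alternatives times the number of nests, which hints at why direct estimation becomes impractical for hundreds of thousands of alternatives.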

The work with the California data showed that the technique was able to estimate models of this structure without excessive sampling bias. The work with US-wide data further showed that the procedure was feasible and gave reasonable results with computer resources much smaller than would be required to estimate the full cross-nested model. This opens the possibility for more extensive application of the comparatively new technique.


This work was funded by the Federal Highway Administration as part of the Exploratory Advanced Research program to develop a Long Distance Passenger Travel Demand Modeling Framework for the United States. 

