International Choice Modelling Conference 2017

Model combination for capturing the inconsistency in the aggregate prediction
Shiva Habibi, Marcus Sundberg

Last modified: 28 March 2017


Discrete choice models are often estimated at the level of individual decision-making units, such as individuals or households, but are used to predict an aggregate quantity such as the market share of clean cars or the average response to a policy change. Many sources contribute to uncertainty in such predictions: some arise from the data, others from uncertainty in the model specification. To reflect any source of uncertainty, a model must provide the probability distribution of the forecast rather than only a point prediction (Sims, 1986). In discrete choice models, the consistent way of aggregating over individuals is sample enumeration (Train, 2009, p. 31), in which the choice probabilities are aggregated over all individuals. This method, however, yields only a point prediction. To obtain a distribution of the prediction, one has to simulate the choice model. We show with a case study that the simulated variance of an aggregate prediction is very small, so the distribution of the prediction is sharply centered around the mean. We argue that any model estimated on a large number of individuals who are assumed to be independent will have difficulty replicating the variance of the aggregate prediction, because the prediction errors of many independent decisions offset one another when summed.
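The offsetting argument can be illustrated with a minimal sketch. Under independence, the variance of the simulated share is (1/N^2) * sum_i p_i (1 - p_i), which is at most 1/(4N) and therefore shrinks as the number of individuals N grows. The choice probabilities below are hypothetical draws, not output of the nested logit model used in this study, and the names `n_individuals`, `n_draws` and `p` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical probabilities of choosing a clean car, one per individual
# (illustrative values, not estimates from the paper's model).
n_individuals = 10_000
p = rng.uniform(0.05, 0.35, size=n_individuals)

# Sample enumeration: the aggregate share is the mean choice probability.
share_enumerated = p.mean()

# Simulation: draw independent choices and recompute the share many times.
n_draws = 400
choices = rng.random((n_draws, n_individuals)) < p
simulated_shares = choices.mean(axis=1)

# Standard deviation of the share implied by independent Bernoulli choices.
sd_analytical = np.sqrt(np.sum(p * (1.0 - p))) / n_individuals
sd_simulated = simulated_shares.std(ddof=1)

print(f"enumerated share: {share_enumerated:.4f}")
print(f"simulated sd:     {sd_simulated:.5f}")
print(f"analytical sd:    {sd_analytical:.5f}")
```

With a full vehicle register in place of 10,000 individuals, the same calculation gives an even narrower band, which is consistent with the very sharp predictive distribution described above.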

We propose to tackle the aggregate prediction problem by employing and developing model combination methods that combine aggregate and disaggregate models. Aggregate models are estimated at the level of the aggregate prediction. Combining an aggregate model with logit models captures the forecast error between the observed data and the model's aggregate prediction. Model combination has been used in many fields and has, in general, improved prediction performance (for reviews see, e.g., Clemen, 1989; Hoeting et al., 1999). However, to the best of our knowledge, it has not been used to tackle the aggregate prediction problem. Besides solving the problem of aggregation over individuals, combining logit models with aggregate models can benefit from the possibility of a more precise specification of the aggregate equations, owing to the smaller number of interactions available at the aggregate level (Grunfeld and Griliches, 1960), and from the fact that aggregate data may be measured more accurately (Aigner and Goldfeld, 1974).

We apply model combination at both the aggregate and the disaggregate level, using mixture model and latent variable model approaches. We address the problem of combining models that are estimated at different aggregation levels and propose to use the aggregate likelihood to combine them at the aggregate level. The aggregate likelihood is the likelihood of the aggregate data given the aggregate point prediction of the model. We suggest using the aggregate likelihood for model combination and selection when the purpose of modeling is aggregate prediction.
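As a rough illustration of combining already estimated models at the aggregate level, the sketch below evaluates a Gaussian aggregate likelihood and estimates mixture weights with a simple EM iteration in the spirit of Raftery et al. (2005). The Gaussian kernel, the fixed standard deviation `sigma`, the function names and the numbers are all assumptions made for this example; it is not the implementation used in this study.

```python
import numpy as np

def aggregate_log_likelihood(observed, predicted, sigma):
    """Gaussian log-likelihood of observed aggregate shares given point predictions."""
    resid = observed - predicted
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - resid**2 / (2 * sigma**2)))

def em_mixture_weights(observed, predictions, sigma, n_iter=200):
    """Estimate mixture weights while the component models stay fixed.

    predictions has shape (n_models, n_periods).
    """
    n_models = predictions.shape[0]
    weights = np.full(n_models, 1.0 / n_models)
    # Density of each observation under each model's Gaussian kernel.
    dens = np.exp(-((observed - predictions) ** 2) / (2 * sigma**2))
    dens /= np.sqrt(2 * np.pi * sigma**2)
    for _ in range(n_iter):
        # E-step: responsibility of each model for each period.
        resp = weights[:, None] * dens
        resp /= resp.sum(axis=0, keepdims=True)
        # M-step: a weight is the model's average responsibility.
        weights = resp.mean(axis=1)
    return weights

# Hypothetical monthly shares and two sets of aggregate point predictions.
observed = np.array([0.10, 0.12, 0.15, 0.14, 0.18, 0.20])
pred_flat = np.full(6, 0.15)  # no within-year variation
pred_dynamic = np.array([0.11, 0.12, 0.14, 0.15, 0.17, 0.19])
predictions = np.vstack([pred_flat, pred_dynamic])

sigma = 0.02  # assumed, not estimated
weights = em_mixture_weights(observed, predictions, sigma)
print("weights:", np.round(weights, 3))
print("logL flat:   ", aggregate_log_likelihood(observed, pred_flat, sigma))
print("logL dynamic:", aggregate_log_likelihood(observed, pred_dynamic, sigma))
```

Because the weights are estimated with the component models held fixed, the sketch matches the setting in which model parameters and combination weights are not estimated simultaneously.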

The application of interest is to predict the monthly share of clean cars in the Swedish car fleet; the prediction question is this monthly share in the Swedish market in the short-run future. We have access to two data sources. The first is the car register, which contains all passenger cars in the Swedish fleet from 2008 to 2012 and some characteristics of each car. The second provides very detailed information about all cars in the Swedish market. For the results presented in this study, we merge the two sources to impute the alternatives and some of the main characteristics missing from the register data, such as price. We estimate a nested logit model on the pooled 2008-2012 data and present preliminary prediction results to illustrate some prediction problems. This model serves as the benchmark. We combine it with a regression tree to capture individual heterogeneity in a much more complex way than prior knowledge would allow. Individual heterogeneity plays an important role in consistent aggregation. Moreover, we combine the benchmark model with a time-series model to capture the dynamics of the market share of clean cars at the aggregate level. Models are combined through a latent variable model (Ben-Akiva et al., 2002; Walker and Ben-Akiva, 2002) and a mixture model approach (Raftery et al., 2005).

It should be noted that we take already estimated models and combine them, i.e. we do not estimate the model combination weights and the model parameters simultaneously. We consider the situation in which large-scale models are already available and are to be used for prediction, for example when national travel/demand models or car ownership/car type models are to be combined with other models to improve their performance.

The monthly prediction results and their confidence band are shown in Figure 1. As can be seen, the confidence band is very narrow, indicating that the predictions are very certain even though they are wrong. There is also no variation across months within a year, because the specification of the utility function contains no individual-specific or time-specific variables and the choice sets change only once a year. Individual-specific variables were not included in the model because they did not improve the likelihood function significantly.

The prediction results so far show that all the combined models perform better than any single model. However, the models obtained by combination at the aggregate level perform better than the latent variable models obtained by disaggregate combination.
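The comparison criterion, root mean squared error (RMSE), can be sketched as follows. The observed shares, the predictions and the model labels are hypothetical placeholders chosen only to show the calculation; they are not results of this study.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted aggregate shares."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

# Hypothetical monthly shares and candidate predictions (illustrative only).
observed = [0.10, 0.12, 0.15, 0.14]
candidates = {
    "model_a": [0.15, 0.15, 0.15, 0.15],
    "model_b": [0.11, 0.12, 0.14, 0.15],
}

scores = {name: rmse(observed, pred) for name, pred in candidates.items()}
best = min(scores, key=scores.get)  # lowest RMSE is preferred
print(scores, "best:", best)
```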

Finally, we are aware that estimating both the models and the weights on the same dataset may increase the risk of over-fitting. However, we did not have enough macro-level data points to form an independent dataset for investigating the out-of-sample prediction performance of the combined models.



Figure 1: Monthly prediction of NMNL over the period 2008-2012.

Table 1: Comparing predictive performance of different models based on root mean squared error (RMSE)



Time series (auto-regressive)
Latent variable model (combination at disaggregate level)
Mixture of auto-regressive and latent variable models (combination at aggregate level)


Aigner, D. J. and Goldfeld, S. M. (1974). Estimation and prediction from aggregate data when aggregates are measured more accurately than their components, Econometrica: Journal of the Econometric Society pp. 113-134.

Ben-Akiva, M., McFadden, D., Train, K., Walker, J., Bhat, C., Bierlaire, M., Bolduc, D., Börsch-Supan, A., Brownstone, D., Bunch, D. S. et al. (2002). Hybrid choice models: progress and challenges, Marketing Letters 13(3): 163-175.

Clemen, R. T. (1989). Combining forecasts: A review and annotated bibliography, International Journal of Forecasting 5(4): 559-583.

Grunfeld, Y. and Griliches, Z. (1960). Is aggregation necessarily bad?, The Review of Economics and Statistics pp. 1-13.

Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999). Bayesian model averaging: A tutorial, Statistical Science 14(4): 382-401.

Raftery, A. E., Gneiting, T., Balabdaoui, F. and Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles, Monthly Weather Review 133(5): 1155-1174.

Sims, C. A. (1986). Are forecasting models usable for policy analysis?, Quarterly Review (Win): 2-16.

Train, K. E. (2009). Discrete choice methods with simulation, Cambridge University Press.

Walker, J. and Ben-Akiva, M. (2002). Generalized random utility model, Mathematical Social Sciences 43(3): 303-343.
