International Choice Modelling Conference, International Choice Modelling Conference 2017

Latent Class model using discriminative restricted Boltzmann Machine
Melvin Wong, Bilal Farooq, Guillaume-Alexandre Bilodeau

Last modified: 28 March 2017



Semi-supervised learning algorithms have been applied widely to classification and prediction tasks in big data analysis, yet hardly any studies on discrete choice models make use of these machine learning concepts. Semi-supervised learning with generative models is an attractive strategy given the proliferation of big data and of online services that provide large volumes of data. Such models offer a way to learn the underlying structure of the data with little guidance from a priori classes or choice labels. The discrete choice literature shows a similar trend toward rich, streaming data in travel demand analysis (Toole et al., 2015; Vij and Shankari, 2015), but these cases are mainly for offline analysis. We therefore suggest that new research directions in choice modeling place more emphasis on semi-supervised learning for online demand analysis, in order to replicate the successful machine learning applications seen in big data analysis.

In this paper, we propose a novel modeling methodology for discrete choice analysis that uses generative modeling and efficient training algorithms from machine learning to describe a latent class choice model. To our knowledge, this has not been done before in the travel choice modeling literature. We hope this study offers new insight into how machine learning algorithms can add value to choice modeling, both for behavioral interpretation and for understanding the complex phenomena in the choice process. This is all the more important given the recently emerging intelligent transportation market (Vij and Walker, 2016; Anselmetti, 2016).


Semi-supervised learning has been extremely successful in areas such as speech and word generation and recommender systems, owing to breakthroughs in advanced generative models and training methods. In particular, semi-supervised learning methods have featured in machine learning applications that use large, sparsely labeled data volumes, for example the Netflix Prize competition and synthetic music generation (Salakhutdinov et al., 2007; Boulanger-Lewandowski et al., 2012). Much of this success is attributed to the development of semi-supervised learning methods for the restricted Boltzmann machine (RBM) and to efficient training algorithms such as stochastic gradient descent (SGD) and contrastive divergence (CD) (Hinton, 2002). The discriminative RBM extends the generative RBM with classification, so that it can be used for probabilistic decision making or for predicting future observations from past data (Larochelle and Bengio, 2008; Larochelle et al., 2012). A generative model has the potential to learn individual variations entirely from the data without supervision. It can uncover the latent structure better than traditional discrete choice models because it is a full probabilistic model of all variables (i.e. it estimates the joint distribution of the input variables and output choices). A conditional probabilistic model such as a mixed logit, by contrast, would normally assume an arbitrary underlying distribution (normal, log-normal, triangular, etc.) for the random input variables, which may not represent real data accurately. The discriminative models used in discrete choice analysis do not attempt to uncover the underlying probability distributions; however, they are much faster to compute than generative models.


The graphical model shown in Figure 1 is a single-layer discriminative RBM, which models the relation between the observed "visible" inputs, the output choices and the "hidden" classes. A predetermined number of hidden units represent the different latent classes, and each is parameterized by a weight factor and a constant. The RBM is characterized by an energy-based function (Larochelle and Bengio, 2008), which plays a role similar to the utility function in choice models. The probability of each choice is the multinomial logit of the energy function. The contrastive divergence learning algorithm is a fast and efficient sampling-based optimization that relies on an approximation of the log-likelihood gradient. Sampling starts close to the data distribution and the parameters are updated after a small number of sampling steps, which can result in good parameter estimates (Hinton, 2002). This is the predominant training methodology for RBM models in the literature.
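To make the link between the energy function and the logit form concrete, the following is a minimal sketch of the choice probability of a discriminative RBM in the form given by Larochelle and Bengio (2008), where the hidden units are summed out analytically. It is an illustration under stated assumptions, not the model estimated in this study: the parameter names (W, U, c, d), the layer sizes and the random initial values are all hypothetical.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def choice_probabilities(x, W, U, c, d):
    """Return p(y | x) for every candidate choice y.

    Hypothetical shapes (not taken from the abstract):
      x: (n_inputs,)            observed input variables
      W: (n_hidden, n_inputs)   input-to-hidden weights
      U: (n_hidden, n_choices)  choice-to-hidden weights
      c: (n_hidden,)            hidden-unit constants
      d: (n_choices,)           choice-specific constants
    """
    # Hidden-unit activation for each candidate choice: (n_hidden, n_choices).
    activation = (c + W @ x)[:, None] + U
    # Negative free energy of each choice, analogous to a utility.
    neg_free_energy = d + softplus(activation).sum(axis=0)
    # Multinomial logit (softmax) over choices, shifted for stability.
    neg_free_energy -= neg_free_energy.max()
    expo = np.exp(neg_free_energy)
    return expo / expo.sum()

# Illustrative parameters only: 4 inputs, 3 hidden (latent class) units, 3 choices.
rng = np.random.default_rng(0)
n_inputs, n_hidden, n_choices = 4, 3, 3
W = rng.normal(scale=0.1, size=(n_hidden, n_inputs))
U = rng.normal(scale=0.1, size=(n_hidden, n_choices))
c = np.zeros(n_hidden)
d = np.zeros(n_choices)
x = np.array([1.0, 0.0, 1.0, 0.0])

probs = choice_probabilities(x, W, U, c, d)
print(probs)
```

In this form the negative free energy of each alternative plays the part of its utility, so setting the choice-to-hidden weights and the choice constants to zero gives equal probabilities for all alternatives, just as equal utilities would in a multinomial logit.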

Case Study

In this study we present a discriminative RBM choice model and compare its performance with an operational-level, dynamic latent class mixed logit model of next-link direction choice that we developed in our previous research on the same dataset. The dataset consists of GPS location, time, availability of facilities and hourly weather data. The choice is defined by the next link direction over a series of links forming a trajectory, as shown in Figure 2. We then generate new synthetic representations of the input data from the model and forecast the next-step choices from these samples. Finally, we compare the accuracy of the simulated choices against our observed data. Our previous study showed that our discrete choice analysis approach can yield a maximum choice prediction accuracy of 85%. Semi-supervised learning methods in the literature often achieve accuracy above 90%; we therefore hypothesize that a similar approach should yield much better results.


Toole, J.L., Colak, S., Sturt, B., Alexander, L.P., Evsukoff, A. and González, M.C., 2015. The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies, 58, pp.162-177.

Vij, A. and Shankari, K., 2015. When is big data big enough? Implications of using GPS-based surveys for travel demand analysis. Transportation Research Part C: Emerging Technologies, 56, pp.446-462.

Vij, A. and Walker, J.L., 2016. How, when and why integrated choice and latent variable models are latently useful. Transportation Research Part B: Methodological, 90, pp.192-217.

Anselmetti, R., 2016. The Autonomous Vehicle: End of the Road, or the Beginning of A New Era?: Concept and Challenges of a Disruptive Innovation within the Automotive Industry. Master's thesis, KTH, School of Industrial Engineering and Management, Entrepreneurship and Innovation.

Salakhutdinov, R., Mnih, A. and Hinton, G., 2007, June. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th international conference on Machine learning (pp. 791-798). ACM.

Boulanger-Lewandowski, N., Bengio, Y. and Vincent, P., 2012. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. arXiv preprint arXiv:1206.6392.

Hinton, G.E., 2002. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8), pp.1771-1800.

Larochelle, H. and Bengio, Y., 2008. Classification using discriminative restricted Boltzmann machines. In Proceedings of the 25th international conference on Machine learning (pp. 536-543). ACM.

Larochelle, H., Mandel, M., Pascanu, R. and Bengio, Y., 2012. Learning algorithms for the classification restricted Boltzmann machine. Journal of Machine Learning Research, 13(Mar), pp.643-669.
