Last modified: 28 March 2017

#### Abstract

Hybrid Choice Models (HCMs) are becoming increasingly popular in the literature due to their ability to incorporate attitudes and perceptions into individuals’ decision rules. Psychological constructs can be measured by so-called indicator variables, which are usually based on respondents’ reported agreement with particular statements, measured on a Likert scale. There is consensus in the literature that incorporating such indicator variables directly into the choice model can lead to at least two problems. First, such variables are functions of the underlying psychological constructs rather than direct measures of them, and are therefore likely to be subject to measurement error. Treating them as error-free can lead to biased estimates. Second, indicator variables are often expected to be endogenous, for example because other psychological constructs, not controlled for in the model, can influence both an attitude and a choice. Endogeneity matters because it makes the Maximum Likelihood estimator inconsistent.

Multiple published papers claim that HCMs improve on the direct incorporation of indicators into the choice model with regard to both measurement error and endogeneity. It is quite clear that HCMs account for measurement bias: the latent variable framework explicitly assumes that measurement error exists and formulates a proper statistical model for it. The case of endogeneity is less clear. Even though indicator variables do not enter the choice model directly, no application to date accounts for correlations between latent variables and choices (e.g., by introducing correlation between the error terms in the structural equations and the choice model). The same holds for correlations between measurement equations and choices. Instead, the models available in the literature assume that the error terms in the three parts of the model (the discrete choice, structural and measurement components) are independent. As a result, we argue that HCMs are unlikely to mitigate endogeneity bias. To support this argument, we conducted a Monte Carlo simulation which illustrates the bias that arises when correlations between the structural or measurement component and the choice component are not accounted for. In this simulation we assume that latent variables enter the choice model as interactions with attributes. Although other specifications exist, for example the latent class multinomial logit model in which latent variables explain class probabilities, we wanted to investigate the most popular case. We estimate 7 models for every artificially generated dataset to analyze the different specification decisions researchers can make and to investigate their consequences.

Our contribution is as follows. Firstly, we differentiate between two possible types of endogeneity of indicator variables: latent variable (LV) endogeneity and measurement equation (ME) endogeneity. The former arises when the psychological construct itself is correlated with the error terms in the choice model, whereas the latter corresponds to the case in which the measurement error is correlated with the error terms in the choice model. We believe that a clear definition of endogeneity that distinguishes its two types can improve the quality of the discussion. Understanding in what situations each type arises, and how likely each is to occur, is left for future empirical studies. Secondly, we use Monte Carlo simulation to show that although HCMs can theoretically be tailored to address endogeneity, the commonly used specifications, which do not explicitly account for possible correlations between the components of the model, do not do so. Specifically, we show that endogeneity bias persists even when the latent variable framework is used. In our case, ME-endogeneity can lead to higher overall bias than LV-endogeneity. We also show how to correct for this problem: when the necessary correlations are explicitly accounted for, the models properly recover the true parameters. We propose two ways to introduce these correlations into the model: either by directly modelling the correlation between latent factors and random taste heterogeneity, or by using additional latent variables to capture residual correlation (the former solution would only work in the case of LV-endogeneity). In addition, ME-endogeneity could also be accounted for by directly modelling the correlation between measurement errors and the error terms in the choice model (although such a specification could be infeasible for the Maximum Likelihood estimator, it could be implemented using Bayesian methods). As all these adjustments complicate the overall structure of the HCM, the resulting models require careful consideration of identification. Because general conditions for HCM identification are not known, this may constitute a problem.
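
To make the two definitions concrete, one possible way to write down such a model is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $z_i$ are individual characteristics, $x_{ijt}$ are the attributes of alternative $j$ in choice task $t$, $\eta_i$ is random taste heterogeneity, and the measurement equation is written as linear for simplicity (Likert-scale indicators are often modelled with an ordered specification instead).

$$
\begin{aligned}
\mathrm{LV}_i &= \gamma' z_i + \xi_i && \text{(structural equation)} \\
I_{ik} &= \lambda_k \, \mathrm{LV}_i + \nu_{ik} && \text{(measurement equation for indicator } k\text{)} \\
U_{ijt} &= \left(\beta + \delta \, \mathrm{LV}_i + \eta_i\right)' x_{ijt} + \varepsilon_{ijt} && \text{(choice model)}
\end{aligned}
$$

Under this notation, the commonly used specifications assume that $\xi_i$, $\nu_{ik}$ and the choice-model error terms $(\eta_i, \varepsilon_{ijt})$ are mutually independent. LV-endogeneity would correspond to $\xi_i$ being correlated with the choice-model error terms, and ME-endogeneity to $\nu_{ik}$ being correlated with them.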

Lastly, our simulation sheds some light on the measurement bias issue. As expected, measurement bias can cause a significant deviation in coefficient values even when endogeneity bias is not present. Interestingly, in some cases measurement and endogeneity bias shift the coefficient values in opposite directions and therefore to some extent cancel each other out. This result may explain why some empirical studies report that there is not much difference between using the latent variable framework and incorporating indicator variables directly into the choice model.
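
The mechanisms described above can be illustrated with a toy Monte Carlo experiment. The sketch below is not the simulation design of the paper: it assumes a binary logit with a single attribute, lets the psychological variable enter as a main effect rather than as an interaction, uses a continuous indicator instead of a Likert item, and does not estimate an HCM at all. Every function name, parameter value and correlation level is a hypothetical choice made for illustration. Using the true latent variable as a regressor stands in for the best case of a measurement model that removes measurement error but leaves the independence assumption in place.

```python
import numpy as np

SD_LOGISTIC = np.pi / np.sqrt(3.0)  # standard deviation of a standard logistic draw


def fit_logit(X, y, n_iter=30):
    """Binary logit estimated by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        information = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        beta = beta + np.linalg.solve(information, gradient)
    return beta


def estimate_once(n, rho_lv, rho_me, use_indicator, rng,
                  b_x=-1.0, b_lv=1.0, sigma_e=1.0):
    """Simulate one dataset and return the estimated coefficient on the
    psychological regressor (true value: b_lv). All values are illustrative."""
    eps = rng.logistic(size=n)   # choice-model error
    z = eps / SD_LOGISTIC        # same error, rescaled to unit variance
    # LV-endogeneity: the latent variable is correlated with the choice error.
    lv = rho_lv * z + np.sqrt(1.0 - rho_lv**2) * rng.standard_normal(n)
    # ME-endogeneity: the measurement error is correlated with the choice error.
    me = sigma_e * (rho_me * z + np.sqrt(1.0 - rho_me**2) * rng.standard_normal(n))
    indicator = lv + me
    x = rng.standard_normal(n)   # exogenous attribute
    y = (b_x * x + b_lv * lv + eps > 0).astype(float)
    regressor = indicator if use_indicator else lv
    X = np.column_stack([np.ones(n), x, regressor])
    return fit_logit(X, y)[2]


SCENARIOS = {
    "baseline": dict(rho_lv=0.0, rho_me=0.0, use_indicator=False),
    "measurement_error": dict(rho_lv=0.0, rho_me=0.0, use_indicator=True),
    "lv_endogeneity": dict(rho_lv=0.5, rho_me=0.0, use_indicator=False),
    "me_endogeneity": dict(rho_lv=0.0, rho_me=0.5, use_indicator=True),
}


def run_scenarios(n=5000, reps=20, seed=0):
    """Average the estimated coefficient over `reps` simulated datasets."""
    rng = np.random.default_rng(seed)
    results = {}
    for name, cfg in SCENARIOS.items():
        estimates = [estimate_once(n, rng=rng, **cfg) for _ in range(reps)]
        results[name] = float(np.mean(estimates))
    return results


if __name__ == "__main__":
    for name, value in run_scenarios().items():
        print(f"{name:>18}: mean estimate = {value:.3f} (true value 1.0)")
```

In this toy setting, measurement error alone pulls the coefficient towards zero, LV-endogeneity with a positive correlation pushes it away from zero even though the latent variable is observed without error, and in the ME-endogeneity scenario the two biases act in opposite directions. The relative magnitudes depend entirely on the assumed parameter values and should not be read as a replication of the paper's results.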