International Choice Modelling Conference 2015

The Effect of Choice Set Misspecification on Welfare Measures in Random Utility Models
Vic Adamowicz

Last modified: 11 May 2015


Random utility (RU) models are widely used in environmental valuation, but choice set formation (CSF) models within the RU framework are rarely applied in this literature, even though previous research has shown that ignoring CSF, when it exists, leads to biased parameter estimates and welfare measures. This paper uses Monte Carlo (MC) experiments to investigate whether ignoring or misspecifying CSF biases welfare measures and, if so, how large that bias is.


We systematically compare welfare measures estimated by both CSF and non-CSF models on data generated by a two-stage process (CSF followed by evaluation). The non-CSF models, which ignore CSF, are the Multinomial Logit (MNL) model and the Random Parameter Logit (RPL) model. The CSF models are the Independent Availability Logit (IAL) model (Swait and Ben-Akiva 1987a, b), the Constrained Multinomial Logit (CMNL) model (Martinez et al. 2009), and two “hybrid” models that use information from the CMNL to incorporate CSF into the MNL and into a form of the IAL. We also explore the sensitivity of welfare measures to different forms of CSF, and we assess whether CSF shows up as unobserved heterogeneity when the CSF process is omitted in estimation.
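To make the difference between these model families concrete, the following is a minimal sketch of the choice probabilities under the MNL, the IAL, and the CMNL for a single decision maker. It is an illustration, not the estimation code used in the paper. The function names and the numerical values are hypothetical, and the availability probabilities `avail` are taken as given rather than modelled as a function of price. The IAL sums over all non-empty choice sets, which is why its cost grows quickly with the number of alternatives, while the CMNL adds the log of the availability probability to the utility as a penalty.

```python
from itertools import combinations

import numpy as np


def mnl_probs(v):
    """Standard MNL probabilities for deterministic utilities v."""
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()


def ial_probs(v, avail):
    """IAL sketch: P(i) = sum over non-empty sets C of Q(C) * P(i | C).

    avail[j] is the (assumed, given) probability that alternative j
    enters the choice set, independently of the other alternatives.
    """
    n = len(v)
    probs = np.zeros(n)
    p_empty = np.prod(1.0 - avail)  # probability of an empty choice set
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            idx = list(subset)
            mask = np.zeros(n, dtype=bool)
            mask[idx] = True
            # Probability of this exact set, conditional on it being non-empty
            q = np.prod(avail[mask]) * np.prod(1.0 - avail[~mask]) / (1.0 - p_empty)
            probs[idx] += q * mnl_probs(v[idx])
    return probs


def cmnl_probs(v, avail):
    """CMNL sketch: MNL with utilities penalised by log availability."""
    return mnl_probs(v + np.log(avail))


# Hypothetical utilities and availability probabilities for 3 alternatives
v = np.array([0.5, 0.0, -0.5])
avail = np.array([0.9, 0.5, 0.2])
print("MNL ", mnl_probs(v))
print("IAL ", ial_probs(v, avail))
print("CMNL", cmnl_probs(v, avail))
```

When every alternative is available with probability one, both the IAL and the CMNL sketches collapse to the MNL, which is the sense in which the MNL ignores CSF.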


Our simulation model can be considered a representation of a destination choice model or a recreation demand / travel cost model, similar to forms often examined using revealed preference data. The simulation setting comprises 500 people, 4 choice scenarios per person (4 trips or site choices), 5 alternatives per scenario, and two attributes (price and quality). Price and quality are drawn from a uniform distribution and do not change over the 4 choice scenarios (as in a typical recreation demand specification). Choice set formation is characterized as a probabilistic function of price (travel cost). In one setting all individuals have the same CSF process (a price “cutoff” plus a random component in the CSF process), while in another setting the CSF process differs across groups of respondents in terms of their sensitivity to price as a determinant of inclusion in the choice set. We conduct 200 replications of this data generation and estimation procedure. We examine welfare measures associated with a price increase for the first alternative. We simulate a price increase as the policy change because it affects both the CSF sub-process and the utilities.
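A two-stage data generation process of this kind can be sketched as follows for the homogeneous-CSF setting. This is a hedged illustration only. The abstract does not report the parameter values, the range of the uniform distribution, or the functional form of the cutoff, so `beta_price`, `beta_quality`, `cutoff`, `cutoff_scale`, the unit interval for the attributes, and the logistic availability function are all assumptions. Redrawing the choice set on every trip and falling back to the cheapest alternative when a drawn choice set is empty are likewise assumptions of this sketch.

```python
import numpy as np


def simulate_choices(n_people=500, n_trips=4, n_alts=5,
                     beta_price=-1.0, beta_quality=1.0,
                     cutoff=0.6, cutoff_scale=0.1, seed=0):
    """One replication of a two-stage (CSF, then evaluation) process."""
    rng = np.random.default_rng(seed)

    # Attributes are fixed across trips for each person (assumed U(0, 1))
    price = rng.uniform(0.0, 1.0, size=(n_people, n_alts))
    quality = rng.uniform(0.0, 1.0, size=(n_people, n_alts))

    # Stage 1: soft price cutoff. Availability probability falls as price
    # rises above the cutoff (assumed logistic form).
    avail_prob = 1.0 / (1.0 + np.exp(-(cutoff - price) / cutoff_scale))
    draws = rng.uniform(size=(n_people, n_trips, n_alts))
    available = draws < avail_prob[:, None, :]

    # Assumption: an empty choice set falls back to the cheapest alternative
    p_idx, t_idx = np.nonzero(~available.any(axis=2))
    available[p_idx, t_idx, price.argmin(axis=1)[p_idx]] = True

    # Stage 2: utility maximisation over the realised choice set,
    # with i.i.d. Gumbel errors (which yields logit choice probabilities)
    v = beta_price * price + beta_quality * quality
    eps = rng.gumbel(size=(n_people, n_trips, n_alts))
    u = np.where(available, v[:, None, :] + eps, -np.inf)
    choice = u.argmax(axis=2)
    return price, quality, available, choice


price, quality, available, choice = simulate_choices()
print("mean choice set size:", available.sum(axis=2).mean())
print("share choosing alternative 1:", (choice == 0).mean())
```

A full Monte Carlo study along the lines described would call such a function once per replication with a different seed, estimate each candidate model on the simulated choices, and compare the implied welfare measures with those computed from the true parameters.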


We find that when cutoff-based CSF is present in the data generation process (DGP), typical RU models that ignore CSF underestimate the welfare measures by about half. RPL models generate biased welfare measures, but do not represent CSF as unobserved heterogeneity under the assumed DGP. Models that approximate CSF produce unbiased welfare measures in certain specifications, but are poor approximations in other cases. When the CSF process is homogeneous over the population, both the IAL and the CMNL generate unbiased measures of welfare. When the CSF (or cutoff) process is heterogeneous, the IAL and the CMNL continue to generate unbiased measures of welfare, although the performance of the CMNL declines as the threshold of the CSF process (the cutoff) increases.


The contribution of this paper is two-fold. First, it provides clear simulation evidence that ignoring or misspecifying CSF biases welfare measures and that the magnitude of the bias is considerable. Second, it provides practical guidance on the use of CSF models. Specifically, when a cutoff-based CSF exists in the data generation process, the IAL is the best option. But when the IAL model is intractable or its estimation cost is too high (especially when the number of alternatives in a choice set is large), the CMNL model can, in certain cases, serve as a good approximation that provides unbiased welfare measures. We find that the CMNL model, while often appearing to be significantly biased in terms of parameters (relative to the true process), provides a good approximation to welfare measures in many cases. Combined models that use information on choice set probabilities from the CMNL to reduce the estimation cost of the IAL also perform well.


This paper illustrates the importance of applying CSF in environmental valuation and provides practical guidance on the use of CSF models. By focusing on welfare measures, we examine the quantity that is of policy interest in many cases. Because welfare measures in these cases involve a non-linear combination of parameters, the relationship between the welfare measure and the incorporation of CSF into the econometric model is difficult to assess analytically, and our simulations shed light on this issue. The paper also illustrates the importance of accurately capturing the structure of the CSF process.
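As an illustration of why the welfare measure is a non-linear function of the parameters, one common form of the expected compensating variation for a logit model with probabilistic choice sets is written out here. The abstract does not state the formula used, so this is a sketch under stated assumptions: utility is linear in price with no income effects, $M$ is the full set of alternatives, $Q^{s}(C)$ is the probability of choice set $C$ in state $s$ (0 before and 1 after the price increase), $V_j^{s}$ is the deterministic utility of alternative $j$, and $\alpha$ is the marginal utility of income. All of these symbols are introduced here for illustration.

```latex
\[
E[CV] = \frac{1}{\alpha}\left[
  \sum_{\substack{C \subseteq M \\ C \neq \emptyset}} Q^{1}(C)\,\ln \sum_{j \in C} e^{V_j^{1}}
  \;-\;
  \sum_{\substack{C \subseteq M \\ C \neq \emptyset}} Q^{0}(C)\,\ln \sum_{j \in C} e^{V_j^{0}}
\right],
\qquad \alpha = -\beta_{\text{price}} > 0 .
\]
```

With $Q^{s}(M) = 1$ this reduces to the familiar MNL log-sum difference. A price increase enters twice, through $V_j^{1}$ and through $Q^{1}(C)$, so a model that omits the CSF stage captures only the first channel.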


Keywords: choice model, choice set formation, random utility, welfare measures, random parameters logit, independent availability model, multinomial logit model.




Martinez, Francisco, Felipe Aguila, and Ricardo Hurtubia (2009), “The Constrained Multinomial Logit Model: A Semi-Compensatory Choice Model,” Transportation Research Part B, 43(3), 365–377.


Swait, Joffre and Moshe Ben-Akiva (1987a), “Incorporating Random Constraints in Discrete Choice Models of Choice Set Generation,” Transportation Research, 21B, 91–102.


Swait, Joffre and Moshe Ben-Akiva (1987b), “Empirical Test of a Constrained Choice Discrete Model: Mode Choice in Sao Paulo, Brazil,” Transportation Research, 21B, 103–115.
