International Choice Modelling Conference 2015

Empirical Comparison of the Construction of Confidence Intervals for Willingness-to-Pay
Esther Chiew, Ricardo A. Daziano

Last modified: 18 May 2015


Willingness-to-pay (WTP) measures are important when reporting results of discrete choice experiments and in applications to policy analysis, especially when aiming at welfare-improving scenarios. For a linear-in-attributes discrete choice model, WTP is given by the ratio of the coefficient of the attribute in question to the price coefficient. Because these coefficients are random variables, WTP is itself random, and reporting its standard errors or confidence intervals (CIs) is therefore important, though often overlooked in applied work. For example, different policy options are usually evaluated using mean WTP values, potentially resulting in inaccurate conclusions: two WTP values, while numerically different, might have overlapping CIs, indicating that they are not statistically significantly different (cf. Park et al., 1991). Likewise, in cost-benefit analyses, overlapping CIs of WTP and marginal cost indicate that some people would support a proposed policy.
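With a negative price coefficient, the WTP point estimate is simply the negated coefficient ratio; a minimal sketch with hypothetical numbers (illustrative only, not taken from the paper):

```python
# Illustrative coefficient estimates from a linear-in-attributes choice
# model (hypothetical numbers, not from the paper).
beta_attr = 0.8    # coefficient of the attribute of interest
beta_price = -0.2  # price coefficient (negative: higher price lowers utility)

# WTP is the money amount that offsets a one-unit attribute change:
# the negated ratio of the attribute and price coefficients.
wtp = -beta_attr / beta_price
print(wtp)  # 4.0
```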


There are multiple reasons for interest in building such CIs. Parameters estimated by maximum likelihood are asymptotically distributed multivariate normal (Ben-Akiva and Lerman, 1985). Hence WTP, being a ratio of parameters, follows a probability distribution that is unknown a priori. When the parameters are independently distributed standard normal, WTP follows a Cauchy distribution (Arnold and Brockett, 1992), which has no moments. Thus, standard methods of interval estimation cannot be used. Additionally, WTP can behave discontinuously, such as when the price coefficient approaches zero. This creates errors not only when estimating model parameters but also when building CIs (Dufour, 1997).
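The Cauchy behaviour is easy to reproduce: the ratio of two independent standard normals has quartiles near ±1 but no mean or variance, so moment-based summaries never stabilise. A short simulation (assuming only NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ratio of two independent standard normals: standard Cauchy distributed.
z1 = rng.standard_normal(100_000)
z2 = rng.standard_normal(100_000)
ratio = z1 / z2

# Quantiles are well defined (standard Cauchy quartiles sit at +/-1),
# even though the distribution has no mean or variance.
q25, q75 = np.percentile(ratio, [25, 75])
print(q25, q75)  # close to -1 and 1
```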


Recently, interest in WTP interval estimation has increased, yet no consensus exists on the method of construction. Different fields of study often choose one method of building CIs for WTP as the norm without explaining this choice. Various authors have either proposed a method or compared different methods, with inconsistent results. For example, Hole (2007) reports that for the logit model, the Delta, Fieller, Krinsky-Robb and bootstrap methods all yield similar results, except in the case of neglected unobserved heterogeneity. In contrast, Bolduc et al. (2010) find that the Fieller method outperforms the Delta and bootstrap methods, especially when WTP is weakly identified. Finally, Daly et al. (2012) defend the Delta method, stating that derived standard errors have closed forms and are not approximate calculations as often claimed.
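As a reference point for these comparisons, the Delta method linearizes the ratio and reads a symmetric interval off the implied normal approximation; a sketch with hypothetical estimates and covariance (the numbers are ours, not from any of the cited studies):

```python
import numpy as np

# Hypothetical estimates and asymptotic covariance (illustrative only).
b_attr, b_price = 0.8, -0.2
cov = np.array([[0.010, 0.001],
                [0.001, 0.004]])

wtp = -b_attr / b_price
# Gradient of -b_attr/b_price with respect to (b_attr, b_price).
grad = np.array([-1.0 / b_price, b_attr / b_price**2])
se = float(np.sqrt(grad @ cov @ grad))  # delta-method standard error
lo, hi = wtp - 1.96 * se, wtp + 1.96 * se
print(round(lo, 2), round(hi, 2))  # 1.19 6.81
```

The interval is symmetric around the point estimate by construction, which is precisely what the Fieller and simulation-based methods relax.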


This paper aims to consolidate current work by comparing regularly used methods of building CIs. We analyze the appropriate conditions for using each method, including when WTP is weakly identified. We add to the literature by analyzing the Bayesian post-processing method, which, though involving at most as much work as other methods, is seldom used. Additionally, while working in WTP-space (Train and Weeks, 2005) has so far been used exclusively to study heterogeneity distributions, we contribute to the literature by analyzing the CIs directly derived from such a specification. Finally, we also study in greater detail how these methods can be used to build WTP CIs under assumptions for unobserved heterogeneity (cf. Bliemer and Rose, 2012). Using random parameter models adds difficulty to the problem, as WTP is no longer a single ratio of parameter estimates, and hence the methods used must be modified appropriately. Bliemer and Rose (2012) apply the Delta method, and propose that using the median of the WTP standard errors (as opposed to the mean) will give accurate results. In addition, the Krinsky-Robb and Bayesian post-processing methods can be modified to build CIs for this model.


Different methods of constructing CIs for fixed parameter models, namely the Delta, Fieller, Krinsky-Robb, Bayesian post-processing, and WTP-space methods, are compared in this work using a simulation study that extends the work of Bolduc et al. (2010). A binary probit model with three attributes is used, with two fixed attributes and the true coefficient of the third ("price") attribute taking values from 2 down to 0.0001 to account for weak identification. WTP CIs for one of the fixed attributes are built over 1000 trials for five different sample sizes. Separately, we also study methods of building CIs for random parameter models using a simulation that estimates a random parameter logit model. The methods compared there are the Delta (using both the median and the mean of the standard errors), Krinsky-Robb, Bayesian post-processing, and WTP-space methods.
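For the fixed parameter case, the Krinsky-Robb procedure can be sketched as follows: draw parameters from the estimated asymptotic normal distribution, form the WTP ratio for each draw, and take empirical percentiles. The estimates and covariance below are illustrative, not those of the simulation study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative point estimates (attribute, price) and asymptotic covariance.
mean = np.array([0.8, -0.2])
cov = np.array([[0.010, 0.001],
                [0.001, 0.004]])

# Krinsky-Robb: simulate from the asymptotic distribution of the
# estimator, compute WTP for each draw, then use empirical percentiles.
draws = rng.multivariate_normal(mean, cov, size=10_000)
wtp_draws = -draws[:, 0] / draws[:, 1]
lo, hi = np.percentile(wtp_draws, [2.5, 97.5])
```

Unlike the Delta method, the resulting interval need not be symmetric around the point estimate, since the simulated ratio distribution is skewed.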


Results for the fixed parameter model show a general improvement (by width and coverage) in the CIs as sample size increases, with all methods performing equally when sample sizes are large. The Delta method does not appear to give accurate results, and as this finding is consistent with the literature, we continue to study why Daly et al. (2012) defend the Delta method. Additionally, Bayesian post-processing gives results comparable to other methods, and should be considered a viable alternative for constructing CIs. Working in WTP-space appears to generate narrower CIs, but at the expense of poor coverage. For example, with 100 individuals and a price parameter of 0.1, the average width of the WTP-space CIs is about 100 (compared to 150 and 300 for the Fieller and Krinsky-Robb methods, respectively), but the coverage is only 30% (compared to about 95% for both the Fieller and Krinsky-Robb methods). This appears to contradict the conclusions of Train and Weeks (2005), and we are still analyzing the reason for this.
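For context on the Fieller intervals discussed above: instead of linearizing the ratio, Fieller's method inverts a quadratic inequality in the ratio, which yields asymmetric (and, under weak identification, possibly unbounded) intervals. A sketch with hypothetical estimates (illustrative only, not the paper's):

```python
import numpy as np

# Illustrative estimates: WTP = -b_attr / b_price.
b_attr, b_price = 0.8, -0.2
v_aa, v_ap, v_pp = 0.010, 0.001, 0.004  # asymptotic (co)variances
z = 1.96

# Fieller: the interval for the raw ratio r = b_attr / b_price is the set
# of r with (b_attr - r*b_price)^2 <= z^2 * Var(b_attr - r*b_price),
# a quadratic inequality A*r^2 - 2*B*r + C <= 0.
A = b_price**2 - z**2 * v_pp
B = b_attr * b_price - z**2 * v_ap
C = b_attr**2 - z**2 * v_aa
disc = B**2 - A * C
r_lo = (B - np.sqrt(disc)) / A
r_hi = (B + np.sqrt(disc)) / A
# WTP = -r, so negate and swap the endpoints.
lo, hi = -r_hi, -r_lo
print(lo, hi)  # asymmetric around the point estimate of 4.0
```

With A > 0 the interval is bounded; when the price coefficient is weakly identified, A can turn negative and the Fieller set becomes unbounded, which is exactly the regime Bolduc et al. (2010) highlight.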


Preliminary results from the random parameter model show a similar improvement as sample size increases. Using the median of standard errors in the Delta method does result in narrower CIs, and Bayesian and Krinsky-Robb methods again produce similar CIs. Finally, the time taken to build the CIs is comparable across the different methods, countering the suggestion that the Krinsky-Robb method requires additional work (Bliemer and Rose, 2012).


We conclude the analysis with an empirical application that uses the different methods to analyze airline passengers' value of time when purchasing tickets, as well as their WTP for extra leg space. We find that when a fixed parameter model is assumed, the different methods give similar CIs, as the dataset is large and presents no identification issues. We are currently analyzing the CIs built under the random parameter model. These CIs are important for airlines to understand the lower and upper bounds of their consumers' WTP and price their tickets accordingly.
