Last modified: 28 March 2017

#### Abstract

One of the pivotal questions in the discrete choice context is which parameters affect choice making. Researchers are interested in determining the taste parameters that influence choice making and the model specification for the choice probabilities (e.g. nested logit, mixed logit, etc.) that best explains observed behaviour. These questions are answered through statistical inference on estimated parameter values and hypothesis testing. For instance, taste parameters are tested to determine whether they differ from zero (or unity in the case of nesting parameters) in a statistically significant manner.

In the discrete choice literature, t-tests and likelihood ratio tests are by far the most widely used methods for hypothesis testing (Ben-Akiva and Lerman, 1985; Train, 2009). Publicly available and widely used estimation software packages (Biogeme and Train’s software) report t-statistics for each parameter in their outputs and provide additional capabilities for using likelihood ratio tests. Both of these tests, however, rely on specific distributional assumptions on parameter estimates and the work reported in this paper suggests that these distributional assumptions may not hold. Indeed, unlike in the case of linear regression, there is no guarantee that the sampling distribution of parameters in discrete choice models will conform to a specific (e.g., normal) distribution.

In this paper, we show using Monte Carlo simulations that, while the assumption of a normal sampling distribution holds for taste parameters that form part of linear utility specifications, it does not hold for the structural parameters in nested logit and, by extension, other GEV model forms. These analyses can potentially be extended to show that the normal distribution assumption is also problematic for standard deviation parameters in mixed logit models and for the additional structural parameters in cross-nested logit models.

Similar issues arise in the context of generalized linear mixed models when testing for random effects. The challenges associated with inference are well acknowledged when, under the null hypothesis, true parameter values lie on the boundary of their allowed domain (Lee and Braun, 2012; Fitzmaurice et al., 2007). Importantly, the asymptotic null distribution of the widely used test statistics for these parameters, such as the Wald, likelihood ratio, and score statistics, does not follow a chi-squared distribution. Similarly, t-statistics do not have the typical Student’s t-distribution. Structural parameters in nested logit models and standard deviation parameters in mixed logit models are also examples of such problematic parameters.
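The boundary problem can be seen in a deliberately simple toy model, which is not a nested logit and is introduced here only as an illustration. Assume observations are Normal(theta, 1) with the constraint theta >= 0, and test theta = 0. The constrained maximum likelihood estimate is then max(0, sample mean), and the likelihood ratio statistic is n times its square. The function name and default settings below are hypothetical.

```python
import random


def boundary_lr_statistics(n_obs=50, n_reps=4000, seed=0):
    """Simulate LR statistics for H0: theta = 0 under the constraint theta >= 0.

    Toy model (an assumption for illustration only): x_i ~ Normal(theta, 1).
    The constrained MLE is max(0, sample mean), so the LR statistic
    2 * (loglik(theta_hat) - loglik(0)) simplifies to n * theta_hat ** 2.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):
        # Data are generated under the null, i.e. with theta = 0.
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        theta_hat = max(0.0, mean)  # estimate is pushed onto the boundary
        stats.append(n_obs * theta_hat ** 2)
    return stats


stats = boundary_lr_statistics()
share_zero = sum(s == 0.0 for s in stats) / len(stats)
# 3.841 is the 95th percentile of a chi-squared distribution with 1 df.
share_reject = sum(s > 3.841 for s in stats) / len(stats)
print(f"share of LR statistics exactly zero: {share_zero:.3f}")
print(f"rejection rate at the nominal 5% level: {share_reject:.3f}")
```

In this toy setting roughly half of the simulated statistics are exactly zero, so the statistic cannot follow a chi-squared distribution with one degree of freedom, and a test using the usual critical value rejects less often than the nominal 5%.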

Understanding such sources of potential difficulty and ensuring the validity of the assumptions underlying the widely used statistical tests in discrete choice models is crucial. If the assumptions do not hold, the corresponding hypothesis tests will not be valid. This may lead to errors in choosing the best model specification and in drawing conclusions about the underlying choice behaviour. In this paper, we focus on nested logit models and show the potential problems that arise from incorrect distributional assumptions on the structural parameters. First, we show that the estimated nesting parameters do not follow a normal distribution. This suggests that the t-test is not always appropriate for testing hypotheses about the nesting parameters. Using the Wald, LR, and score tests is also problematic. Second, we propose a permutation test as a practical alternative for testing hypotheses about structural parameters. Permutation tests are attractive because they are non-parametric and make minimal assumptions, namely exchangeability. There are, however, some restrictions on how they can be used in nested logit models, which relate to the number of nests and alternatives. We present a discussion of these issues and illustrate the use of our proposed permutation test using simulated data.
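The general logic of a permutation test can be sketched with a generic two-sample example. This is not the procedure proposed in the paper for nesting parameters, whose details are not given in this abstract. It only shows how exchangeability under the null is used to build a reference distribution by reshuffling. The function name, the difference-in-means statistic, and the add-one correction are assumptions made for illustration.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Generic illustration: under the null the group labels are exchangeable,
    so reshuffling them yields the reference distribution of the statistic.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the observations
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1

    # The add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


p_separated = permutation_p_value([1, 2, 3, 4, 5, 6, 7, 8],
                                  [11, 12, 13, 14, 15, 16, 17, 18])
p_identical = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
print(p_separated, p_identical)
```

No distributional form is assumed for the statistic; the only requirement is that the reshuffled units are exchangeable under the null hypothesis.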

The simulation study is set up using a nested logit model with two nests and ten alternatives. The simulation scenarios cover different sample sizes (N = 100, 200, 300, and 1500) and varying magnitudes of the nesting parameter (mu = 1, i.e. the null case, 2, and 5). For each scenario we carried out estimations for 2000 replications. We report and compare results from different statistical tests: our proposed permutation test, the t-test, the LR test, the Wald test, and the score test. For each of these tests we report Type I and Type II error rates. Error rates are computed from the proportion of simulations in which p-values for the structural parameters were less than 0.05: in the null case this proportion is the Type I error rate, and in the non-null cases its complement is the Type II error rate. We also compute precision, recall, and F-score metrics to provide a measure of statistical power for each of these tests. The results of our simulation study favour the proposed permutation test over the tests that are widely used in practice today. Type I error rates of the permutation test are much closer to the 0.05 level. Comparisons of Type II error rates show that the tests commonly used in practice have very high error rates, especially for smaller sample sizes. These results have important implications for practice, as the conclusions drawn from inappropriate testing procedures can be inaccurate.
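The error-rate bookkeeping described above can be sketched as follows. The abstract does not say how precision, recall, and F-score are defined, so the mapping used here is an assumption: a rejection when mu differs from 1 counts as a true positive, and a rejection when mu = 1 counts as a false positive. The function name and the example p-values are hypothetical and are not results from the study.

```python
def summarize_test(null_p_values, alt_p_values, alpha=0.05):
    """Summarize one test from p-values collected over simulation replications.

    null_p_values: p-values from replications generated with mu = 1 (null true).
    alt_p_values:  p-values from replications generated with mu != 1.
    """
    false_pos = sum(p < alpha for p in null_p_values)  # wrong rejections
    true_pos = sum(p < alpha for p in alt_p_values)    # correct rejections
    false_neg = len(alt_p_values) - true_pos           # missed effects

    rejections = true_pos + false_pos
    precision = true_pos / rejections if rejections else 0.0
    recall = true_pos / len(alt_p_values)
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0

    return {
        "type_1_error": false_pos / len(null_p_values),
        "type_2_error": false_neg / len(alt_p_values),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Made-up p-values, used only to show the calculation.
summary = summarize_test(
    null_p_values=[0.01, 0.50, 0.60, 0.70],
    alt_p_values=[0.01, 0.02, 0.20, 0.03],
)
print(summary)
```

Under this mapping recall equals one minus the Type II error rate, which is why it can serve as a measure of statistical power.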