International Choice Modelling Conference 2009

Using finite mixture models to accommodate outliers in discrete choice modelling

Danny Campbell, Stephane Hess, Riccardo Scarpa, John M Rose

Last modified:  7 May 2009

Abstract


Unlike in other areas of econometric analysis, sensitivity to outliers is rarely assessed or even explored in discrete choice analysis. In the context of discrete choice, outliers are observations that exert undue influence on the estimates. If the parameter estimates change significantly after such observations are eliminated, one might conclude that those observations are influential and are actual outliers. The presence of actual outliers is particularly important in the case of discrete choice experiments, as they are likely to have profound implications for choice prediction and welfare estimation and may cause computational problems.

Irrespective of whether outliers are genuine, their presence means that discrete choice models that do not rely on the homogeneity of preferences may be more appropriate. Whilst this heterogeneity can be accommodated by treating the preferences as random and estimating the parameters of their distribution, the presence of outliers may exaggerate the true extent of heterogeneity, especially where the random parameters are specified with unbounded distributions such as the commonly used normal and lognormal, neither of which has an upper bound. For this reason, nonparametric methods may provide a better means of representing the unobserved preference heterogeneity.

Using a stated choice dataset designed to determine the existence value of a number of rare and endangered fish species in Ireland, this paper uses a finite mixture modelling approach to approximate the mixing distribution, with particular emphasis on explaining the extreme lower and upper elements of the distribution (i.e., the outliers). This is achieved using a nonparametric estimation procedure in which the mass points of the distribution are estimated for predefined densities. In estimation, three mass points are specified for each attribute: the first and third are associated with the lower and upper outlying parameter estimates respectively, whilst the middle latent class represents the parameter estimates excluding the lower and upper outliers. As a test of sensitivity, a series of models with different predefined densities is estimated. Results from this analysis reveal a considerable proportion of large and implausible point estimates, particularly for the price attribute. We subsequently show the consequences of this for model performance and welfare estimation.
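The estimation idea described above can be illustrated with a minimal sketch. It is not the authors' specification: it assumes a single attribute (for example, price), a simple multinomial logit kernel, and class shares that are fixed in advance whilst the three mass points are the free parameters. The function names, the shares and the toy panel data are all hypothetical.

```python
import math


def mnl_prob(beta, alternatives, chosen):
    """Logit probability of the chosen alternative, with utility = beta * attribute level."""
    utilities = [beta * x for x in alternatives]
    m = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    return exps[chosen] / sum(exps)


def mixture_loglik(mass_points, shares, panel):
    """Finite mixture log-likelihood with predefined (fixed) class shares.

    mass_points: one coefficient per class (the parameters to be estimated)
    shares:      fixed class probabilities, summing to one
    panel:       list of respondents, each a list of (alternatives, chosen) tasks
    """
    assert len(mass_points) == len(shares)
    assert abs(sum(shares) - 1.0) < 1e-12
    total = 0.0
    for tasks in panel:
        respondent_lik = 0.0
        for beta, share in zip(mass_points, shares):
            # Probability of the respondent's whole choice sequence, given the class
            seq_prob = 1.0
            for alternatives, chosen in tasks:
                seq_prob *= mnl_prob(beta, alternatives, chosen)
            respondent_lik += share * seq_prob
        total += math.log(respondent_lik)
    return total


# Hypothetical toy data: two respondents, two choice tasks each,
# where each alternative is described by one attribute level.
panel = [
    [([1.0, 2.0, 3.0], 0), ([2.0, 1.0, 4.0], 1)],
    [([1.0, 2.0, 3.0], 2), ([2.0, 1.0, 4.0], 2)],
]

# Assumed predefined densities: 5% lower outliers, 90% middle class, 5% upper outliers.
shares = [0.05, 0.90, 0.05]
mass_points = [-5.0, -0.5, 2.0]  # illustrative starting values, not estimates

ll = mixture_loglik(mass_points, shares, panel)
```

In practice the mass points would be chosen by passing the negative of `mixture_loglik` to a numerical optimiser, and the sensitivity test mentioned above would correspond to repeating the estimation with different values of `shares`.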

Whilst deciding on the legitimacy of outliers is a difficult judgement, and ultimately an empirical issue to be evaluated case by case, the fact that outliers are found to be behaviourally inconsistent with a priori expectations and to seriously distort the interpretation of the remaining data, in terms of both choice prediction and welfare estimation, suggests some caution for analysts engaged in discrete choice modelling. Indeed, the findings indicate the importance of testing for the presence of outliers, and assessing their influence is recommended as a course of action in practice.

