International Choice Modelling Conference 2015

The use of Firth penalized-likelihood estimates for individual-level discrete choice modeling and market segmentation
Roselinde Kessels, Bradley Jones, Peter Goos

Last modified: 18 May 2015


Discrete choice models relate respondents’ choices of one of two or more alternatives or profiles to the attributes of the respondents and the attributes of the alternatives. Data for discrete choice models are either collected via discrete choice experiments (DCEs), where respondents state their choices in hypothetical situations, or via observational studies, where respondents have actually made or revealed real-life choices.

Individual-level choice data often exhibit separation, which occurs if the responses can be perfectly classified by a linear combination of the attributes of the alternatives. A commonly used procedure to fit discrete choice models is maximum likelihood (ML) estimation. The resulting ML estimator possesses a number of asymptotic properties, including efficiency and unbiasedness. With finite datasets such as individual-level choice data, however, one can no longer rely on these asymptotic properties. In particular, the probability of separation is always strictly positive. In the event of separation, the ML estimator does not exist. Because separation has nonzero probability, the expectation of the ML estimator does not exist either; that is, the integral defining the expected value does not converge.

In practical applications, when data separation occurs, computer implementations of ML estimation for individuals often report a converged likelihood while at least one parameter estimate grows without bound. The parameter estimate actually reported is then a function of the convergence criterion for the likelihood and has no practical meaning. Moreover, separation occurs so frequently in individual-level data that individual-level ML estimation is infeasible. A lesser problem with ML estimation is that, if the ML estimator exists, it tends to overestimate the utility of strongly preferred attribute levels. Similarly, undesirable attribute levels are modeled as being even less desirable than their true utilities would indicate. This bias can have practical implications for the decisions that practitioners make.
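The divergence described above can be reproduced in a few lines. The following is a minimal sketch, not the authors' code: it assumes a single attribute with levels -1, 0 and 1, four made-up choice sets, and a hypothetical respondent who always picks the profile with the highest level, so the data are perfectly separated. The function name `mnl_loglik` is ours.

```python
import math

def mnl_loglik(beta, choice_sets, chosen):
    """MNL log-likelihood for one respondent and one attribute coefficient."""
    total = 0.0
    for x, c in zip(choice_sets, chosen):
        utilities = [beta * xj for xj in x]
        m = max(utilities)  # subtract the maximum for numerical stability
        log_denominator = m + math.log(sum(math.exp(u - m) for u in utilities))
        total += utilities[c] - log_denominator
    return total

# Hypothetical data: attribute level of each of the three profiles per choice set.
choice_sets = [(1.0, 0.0, -1.0), (0.0, 1.0, -1.0), (-1.0, 0.0, 1.0), (1.0, -1.0, 0.0)]
# The respondent always chooses the profile with the highest level (separation).
chosen = [0, 1, 2, 0]

# Evaluate the log-likelihood at increasingly large coefficient values.
logliks = [mnl_loglik(b, choice_sets, chosen) for b in (0.0, 1.0, 5.0, 25.0)]
for b, ll in zip((0.0, 1.0, 5.0, 25.0), logliks):
    print(f"beta = {b:5.1f}   log-likelihood = {ll:.3e}")
```

In this sketch the log-likelihood keeps rising toward zero as the coefficient grows, so an optimizer stops only when its convergence tolerance is met, and the value it reports depends on that tolerance.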

In this paper, we show how to overcome these two problems using the penalized-likelihood method of Firth, which we apply to the multinomial logit (MNL) model. A major advantage of the method is that it allows fitting an MNL model to individual-level data when the number of choice sets evaluated by each respondent permits, and, subsequently, exploring the heterogeneity in the respondents’ preferences and identifying market segments. Unlike panel mixed logit models, latent class models or hierarchical Bayes approaches, Firth’s approach does not require imposing an a priori preference heterogeneity distribution for each of the model parameters. This is important since it is not at all clear what an appropriate a priori preference heterogeneity distribution would be when markets are segmented.
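As an illustration of the idea, Firth's method adds half the log-determinant of the Fisher information to the log-likelihood. The sketch below is a simplified one-parameter version under our own assumptions, not the authors' implementation: a single attribute, four made-up and perfectly separated choice sets, and a grid search followed by a ternary search that assumes the objective is unimodal near its maximum. All function names are hypothetical.

```python
import math

def probabilities(beta, x):
    """MNL choice probabilities for one choice set with attribute levels x."""
    utilities = [beta * xj for xj in x]
    m = max(utilities)
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def penalized_loglik(beta, choice_sets, chosen):
    """Log-likelihood plus 0.5 * log(Fisher information) for a scalar beta."""
    loglik, information = 0.0, 0.0
    for x, c in zip(choice_sets, chosen):
        p = probabilities(beta, x)
        loglik += math.log(p[c])
        mean = sum(pj * xj for pj, xj in zip(p, x))
        # Each choice set contributes the variance of x under p to the information.
        information += sum(pj * (xj - mean) ** 2 for pj, xj in zip(p, x))
    return loglik + 0.5 * math.log(information)

def firth_estimate(choice_sets, chosen, lower=-10.0, upper=10.0, steps=200):
    """Maximize the penalized log-likelihood on [lower, upper]."""
    objective = lambda b: penalized_loglik(b, choice_sets, chosen)
    width = (upper - lower) / steps
    grid = [lower + i * width for i in range(steps + 1)]
    best = max(grid, key=objective)
    lo, hi = max(lower, best - width), min(upper, best + width)
    for _ in range(100):  # ternary search, assuming one peak inside the bracket
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical separated data: the highest attribute level is always chosen.
choice_sets = [(1.0, 0.0, -1.0), (0.0, 1.0, -1.0), (-1.0, 0.0, 1.0), (1.0, -1.0, 0.0)]
chosen = [0, 1, 2, 0]
beta_hat = firth_estimate(choice_sets, chosen)
print(f"Firth-type estimate: {beta_hat:.3f}")
```

Under separation the information shrinks toward zero as the coefficient grows, so the penalty term pulls the objective down and the maximizer stays finite. A full MNL model with several attributes would replace the scalar information by the determinant of the information matrix.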

Firth’s approach is an example of a “bottom-up” approach that obtains individual-level parameters for the empirical distribution of sample preferences. The opposite approach, which relies on prior distributional assumptions, is called a “top-down” approach. In theory, if one specifies correct preference distributions, and the number of choices per person is sufficiently large, top-down and bottom-up approaches should give the same results. If, however, the assumptions about preference distributions are incorrect, the inferences from top-down models will be biased. Furthermore, bottom-up approaches are more researcher-friendly since they are computationally simpler than current top-down practices.

In a simulation study, we observe that Firth’s method yields MNL model parameter estimates that remain useful under data separation, with relatively little bias and variance, as long as each individual evaluates a sufficient number of choice sets. We can therefore model the preferences of individuals directly and construct empirical distributions of the individuals’ preferences, which allows us to detect preference heterogeneity.

We demonstrate the usefulness of Firth’s method for individual-level preference estimation using a real-life study that investigates preferences for various forms of compensation of employees. This study was done using a DCE involving four three-level attributes: salary increase, bonus, extra vacation and flexible working time. A total of 448 respondents evaluated 12 choice sets, each consisting of three compensation schemes, or profiles, which are combinations of attribute levels with one level for each attribute. For each choice set, respondents had to indicate the profile they preferred. We find that the mean individual-level estimates lie close to the estimates from analysing the aggregate data, although there is no reason to expect this result in general because of Jensen’s inequality. The individual-level estimates are also plausible overall.

In another simulation study, based on the real-life study on employee compensation, we group individuals into two market segments, the “work-life balance” segment and the “money” segment, where the former segment attaches importance to all four attributes and the latter segment only cares about salary increase and bonus. We determine how large the difference between the market segments has to be before the segments can be detected from differences in the individual-level estimates. We illustrate that in the case of well-defined segments, the cluster classification is nearly perfect.
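To make the segmentation step concrete, the sketch below clusters individual-level estimates into two groups. It rests on our own assumptions: the abstract does not name the clustering algorithm, so a plain two-cluster k-means is used purely for illustration, and the six coefficient vectors (salary increase, bonus, extra vacation, flexible working time) are made-up numbers, not results from the study.

```python
import math

def two_means(points, iterations=50):
    """Two-cluster k-means with a deterministic start (illustrative only)."""
    # Start from the first point and the point farthest away from it.
    centers = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min((0, 1), key=lambda k: math.dist(p, centers[k])) for p in points]
        new_centers = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:  # keep the old center if a cluster is empty
                new_centers.append(centers[k])
                continue
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return labels, centers

# Hypothetical individual-level estimates:
# (salary increase, bonus, extra vacation, flexible working time).
estimates = [
    (2.0, 1.5, 0.1, -0.1),  # "money"-like respondent
    (1.0, 0.8, 1.1, 0.9),   # "work-life balance"-like respondent
    (1.8, 1.7, -0.2, 0.0),  # "money"-like respondent
    (1.2, 0.9, 0.9, 1.1),   # "work-life balance"-like respondent
    (2.2, 1.4, 0.0, 0.2),   # "money"-like respondent
    (0.9, 1.0, 1.2, 1.0),   # "work-life balance"-like respondent
]
labels, centers = two_means(estimates)
print(labels)
```

With segments as clearly separated as in these made-up numbers, the two kinds of respondents end up in different clusters. How well such a classification works on noisier estimates depends on the size of the difference between segments, which is what the simulation study examines.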
