Last modified: 28 March 2017

#### Abstract

In stated choice experiments, the experimental design is the process of manipulating attribute levels and allocating them to choice tasks (Hensher, Rose, & Greene, 2015). By means of an experimental design, the number of choice tasks to be presented is reduced from the full factorial, which also reduces the effort required of the respondent (Bliemer & Rose, 2005; Rose, Bliemer, Hensher, & Collins, 2008). The theory of experimental design was born in the framework of linear models with continuous dependent variables, in which orthogonality is an important property of a good design (Louviere & Woodworth, 1983). However, the most recent literature on experimental designs for stated choice experiments indicates that orthogonal experimental designs are not efficient from a statistical point of view (Rose & Bliemer, 2014). Statistical efficiency, in the experimental design context, relates to the precision of the estimated parameters. A statistically efficient design is therefore capable of reducing the standard errors of the parameters (Johnson et al., 2013) and thus of minimizing sample size requirements. The idea behind efficient designs is that, when some *a priori* information about the parameters is available, it is possible to calculate the asymptotic variance-covariance (AVC) matrix for the model and to identify the expected standard errors of the parameters (Huber & Zwerina, 1996). Priors are required in order to calculate the utility derived from each alternative and the resulting choice probabilities. By generating a large number of different designs, it is possible to identify the most efficient one by means of efficiency measures (Bliemer & Rose, 2010a, 2010b; Bliemer, Rose, & Hess, 2008; Rose & Bliemer, 2009) such as the *d*-error and the *a*-error (the former usually being preferred to the latter, which does not account for covariances) and, more recently, the *s*-error statistic (Rose & Bliemer, 2012).
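As a rough illustration of how priors feed into an efficiency measure, the following minimal sketch computes the *d*-error of a design under a multinomial logit model. It assumes the usual definition of the *d*-error as the determinant of the single-respondent AVC matrix raised to the power 1/K, where K is the number of parameters. The function names and the toy two-attribute design are hypothetical and are not taken from the study.

```python
import math

def mnl_probabilities(task, priors):
    """Choice probabilities of the alternatives in one task under MNL."""
    utilities = [sum(b * x for b, x in zip(priors, alt)) for alt in task]
    top = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def fisher_information(design, priors):
    """Single-respondent MNL information matrix, summed over choice tasks."""
    k = len(priors)
    info = [[0.0] * k for _ in range(k)]
    for task in design:
        probs = mnl_probabilities(task, priors)
        # Probability-weighted mean attribute vector of the task
        mean = [sum(p * alt[a] for p, alt in zip(probs, task)) for a in range(k)]
        for p, alt in zip(probs, task):
            dev = [alt[a] - mean[a] for a in range(k)]
            for r in range(k):
                for c in range(k):
                    info[r][c] += p * dev[r] * dev[c]
    return info

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def d_error(design, priors):
    """det(AVC)^(1/K); the AVC matrix is the inverse information matrix."""
    k = len(priors)
    det_info = determinant(fisher_information(design, priors))
    if det_info <= 0.0:
        return math.inf  # singular information: parameters not identified
    return det_info ** (-1.0 / k)

# Hypothetical toy design: 4 tasks, 2 alternatives, 2 attributes each
toy_design = [
    [(0, 1), (1, 0)],
    [(1, 1), (0, 0)],
    [(0, 0), (1, 1)],
    [(1, 0), (0, 1)],
]
d_zero = d_error(toy_design, [0.0, 0.0])     # zero priors
d_prior = d_error(toy_design, [1.0, -1.0])   # hypothetical non-zero priors
```

Comparing `d_zero` with `d_prior` shows that the same design can have a different *d*-error once the priors change, which is why the quality of the priors matters when searching for an efficient design.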

Obtaining reliable information on the parameter distribution, which is necessary to calculate utilities and the AVC matrix, is an important challenge for the analyst, since it affects the precision of the design. Although it has been shown that efficient designs created with mis-specified priors are still generally more robust than orthogonal designs (Rose & Bliemer, 2013), obtaining good priors is advantageous. With good priors the experiment is more efficient, meaning that equivalently precise estimates are attainable with smaller samples (Bliemer & Rose, 2011), thus reducing survey costs in terms of both money and time. In most cases, the priors are informed by a pilot study, carried out before the main survey, in which the questionnaire is administered to and tested on a relatively small sample. In accordance with the law of large numbers, the larger the pilot sample, the closer the priors should be to the true values of the parameters. However, given a fixed total sample size, overly large pilots carry a potentially high opportunity cost, since they leave fewer respondents for the main survey. This poses a dilemma for practitioners: how best to allocate the total sample between the pilot and the main survey. Identifying the right balance will deliver the greatest sampling efficiency and, thus, value for money, which makes it of utmost importance for practitioners designing stated choice experiments.

With this in mind, the aim of this paper is to assess the quality of estimated priors for efficient designs using pilots of different sizes. We assume a fixed total sample of 1,000 respondents, each of whom can be allocated to either the pilot or the main survey. The simulated choice experiment is representative of many studies and consists of four attributes: three non-cost attributes with three levels each and one cost attribute with five levels. For the pilot, the experimental design is generated assuming zero priors. An efficient design (based on minimization of the *d*-error) is then generated using, as priors, the coefficients recovered from a multinomial logit model estimated on the pilot data. Based on this experimental design, choices are simulated for the respondents allocated to the main survey (i.e., the remainder of the sample). This procedure is repeated many times for each pilot size (50, 100, 150, ..., 950 respondents).
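The allocation and choice-simulation steps described above can be sketched as follows. This is only an illustrative outline under stated assumptions: the "true" parameters, the toy design and the function names are hypothetical, and the model estimation and efficient-design search steps are omitted.

```python
import math
import random

TOTAL_SAMPLE = 1000                # fixed total sample size stated in the text
PILOT_SIZES = range(50, 1000, 50)  # 50, 100, ..., 950 respondents

def choice_probabilities(task, betas):
    """MNL choice probabilities for the alternatives in one choice task."""
    utilities = [sum(b * x for b, x in zip(betas, alt)) for alt in task]
    top = max(utilities)
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def simulate_choices(design, betas, n_respondents, rng):
    """Draw one chosen alternative per task for each simulated respondent."""
    data = []
    for _ in range(n_respondents):
        answers = []
        for task in design:
            probs = choice_probabilities(task, betas)
            chosen = rng.choices(range(len(task)), weights=probs)[0]
            answers.append(chosen)
        data.append(answers)
    return data

def split_sample(pilot_size, total=TOTAL_SAMPLE):
    """Allocate the fixed sample between the pilot and the main survey."""
    if not 0 < pilot_size < total:
        raise ValueError("pilot size must lie strictly between 0 and total")
    return pilot_size, total - pilot_size

# Hypothetical inputs, for illustration only
rng = random.Random(2017)
true_betas = [0.5, -0.3]           # assumed "true" parameters
toy_design = [
    [(0, 1), (1, 0)],
    [(1, 1), (0, 0)],
    [(0, 0), (1, 1)],
    [(1, 0), (0, 1)],
]

n_pilot, n_main = split_sample(50)
pilot_data = simulate_choices(toy_design, true_betas, n_pilot, rng)
```

In a full replication, the pilot data would be used to estimate the multinomial logit model, the estimates would serve as priors for the efficient design, and `simulate_choices` would be called again with that design for the `n_main` remaining respondents, looping over every value in `PILOT_SIZES`.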

The results reveal that the typical practice of generating priors from a relatively small pilot sample (perhaps less than 5 percent of the total sample) is inefficient. While this ensures a large main survey, which we show does recover approximately the true parameters, we find that the parameters are associated with larger standard errors and lower statistical significance. Importantly, we illustrate that as the size of the pilot increases, the priors become increasingly precise, which we find leads to more accurate parameter estimates both in the main survey (despite the smaller number of respondents) and when the pilot and main waves are pooled. That said, while we find that very large pilots, such as those with 900-950 respondents, provide very accurate priors, they leave too few remaining respondents for reliable final estimates. Crucially, our findings show the point at which the overall sampling efficiency provided by larger pilots diminishes. In this simulated experiment, the optimum size for a pilot, from a sampling efficiency perspective, is in the range of 40-50% of the total sample. In brief, our results signal what appears to be a ‘Goldilocks’ effect: not too few and not too many.