International Choice Modelling Conference, International Choice Modelling Conference 2015

Imputing Socioeconomic Attributes for Movement Data by Analysing Patterns of Visited Places and Google Places Database: Bridging between Big Data and Behavioural Analysis
Jacek Pawlak, Alireza Zolfaghari, John Polak

Last modified: 18 May 2015

Abstract


The development and proliferation of portable ICT devices such as smartphones and tablet computers, as well as newer wearable technologies such as Internet-capable watches and glasses, mean that eventually (if not already) most people will carry at least one such device. Such devices not only provide their intended services and capabilities to users, but can also offer valuable insights about the users themselves. This property results from the devices' increasing ability to determine location in real time and to either store or transmit that information for further analysis. Location can be inferred from a variety of sources, such as satellite positioning (GPS), the cellular network, or WiFi-based triangulation. Because tracking of this kind practically eliminates the burden placed on individuals of recording their own movements, a burden that has troubled more traditional mobility surveys, it is no surprise that the resulting data have been seen as a tempting alternative to conventional data collection efforts (Wolf et al., 2001).

However, unlike the conventional data used in choice modelling and other travel behaviour research, these new types of data, often included under the ‘Big Data’ umbrella term, are semantically poor. In other words, whilst they offer immense numbers of detailed observations (data points), they often lack meaningful information on locations (e.g., work, home), people (e.g., socioeconomic attributes, choices, attitudes), or activities (e.g., travelling, shopping, eating). This limits their applicability in travel behaviour research, and in choice modelling more generally, to the visualisation of mobility patterns, without necessarily explaining or quantifying the underlying decision-making processes.

Nevertheless, a number of researchers have attempted to enrich such datasets using a variety of methods. Follow-up questionnaires or diaries, which ask respondents to record or clarify some aspects of the investigated behaviour, are the most obvious way of enriching such data, but they are applicable only where the identities of individuals are known and the respondents can be approached. In the majority of cases, however, such data remain anonymised to conform to privacy regulations, and hence other means of enrichment are required. A number of methodologies have been developed to identify and classify the locations of individuals, or their travel modes, from analysis of the patterns of visited locations, e.g. Bohte and Maat (2009) or Andrienko et al. (2011). This has been an important step towards bridging the novel Big Data sources and behavioural models. By contrast, the imputation of information on individual attributes, such as socioeconomic characteristics, has remained largely unexplored.

To address this important gap between the increasingly available Big Data sources and behavioural analysis, we propose a new methodology for inferring characteristics of individuals from their movement tracks. We propose to augment movement data of the kind obtainable from the signatures of portable ICT devices with information on the characteristics of the particular places or venues coincident with the recorded locations, derived from Google's database of places using the Google Places API. The database currently holds detailed information on more than 50 million places, including their geographic coordinates, types, price levels, and crowdsourced user ratings. The proposed imputation methodology involves three stages: (i) matching movement tracks with places using a density-based clustering algorithm; (ii) calibrating a classification model on a sub-sample of data containing both socioeconomic characteristics and a record of visited places and their characteristics derived from the Google Places database; and (iii) using this classification model to impute the socioeconomic attributes of the remaining sample of individuals, for whom only records of visited places are known.
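As an illustration of stage (i), the following is a minimal sketch of matching a movement track to places with a density-based clustering step. It assumes plain DBSCAN, which the abstract does not name as the specific algorithm, and it uses a hypothetical local list of place records (dictionaries with `name`, `location` and `price_level` fields) as a stand-in for data retrieved from the Google Places database; it does not reproduce the Google Places API or its response format. The function names and the parameters `eps_m`, `min_pts` and `max_dist_m` are likewise assumptions made for the example.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_pts):
    """Plain DBSCAN over (lat, lon) fixes; returns one label per point, -1 = noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; the point itself is included.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nb)
    return labels


def match_places(points, labels, places, max_dist_m):
    """Assign each cluster to the nearest place record within max_dist_m of its centroid."""
    matches = {}
    for c in sorted(set(labels) - {-1}):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = (sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members))
        nearest = min(places, key=lambda pl: haversine_m(centroid, pl["location"]))
        if haversine_m(centroid, nearest["location"]) <= max_dist_m:
            matches[c] = nearest["name"]
    return matches
```

In this sketch, clusters of fixes are treated as stops, and each stop is attributed to the closest known place. A real application would need to choose `eps_m` and `max_dist_m` in light of the positioning accuracy of the device, and would likely replace the brute-force neighbourhood search with a spatial index.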

In order to validate the methodology, we conduct a Monte Carlo experiment that attempts to quantify the robustness of such imputation mechanisms. In the experiment, we first assume that people with certain socioeconomic characteristics are more likely to visit particular places, in line with empirical findings from studies elsewhere, such as the higher propensity of more affluent people to visit more expensive restaurants (cf. Frank, 1967; Pan and Zinkhan, 2006), and simulate their movement tracks. The simulated tracks are recorded in formats that would be obtainable from portable ICT devices, including the accuracies characteristic of various positioning techniques. We then assume that the complete information (i.e., socioeconomic attributes and movement tracks) is available only for a sub-sample of the simulated data. In reality, such data could be obtained from follow-up questionnaire surveys. Using this complete sub-sample, we calibrate a classification model using a supervised learning approach. Following that, we attempt to retrieve the socioeconomic classes from the simulated tracks of the remaining individuals using the calibrated classification model.
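The logic of such an experiment, together with stages (ii) and (iii) of the methodology, can be sketched in a strongly simplified form. The example below is not the authors' experiment: it assumes two hypothetical socioeconomic classes, illustrative visit propensities over four price levels (`VISIT_PROBS`), and a naive Bayes classifier as the supervised learner, none of which are specified in the abstract. It also skips the simulation of positional error and the place-matching step, and works directly on the price levels of visited places.

```python
import math
import random

PRICE_LEVELS = [1, 2, 3, 4]
# Assumed (illustrative) probabilities of visiting each price level, by class.
VISIT_PROBS = {"low": [0.4, 0.3, 0.2, 0.1], "high": [0.1, 0.2, 0.3, 0.4]}


def simulate(n_people, n_visits, rng):
    """Draw a class for each person, then the price levels of the places they visit."""
    people = []
    for _ in range(n_people):
        cls = rng.choice(sorted(VISIT_PROBS))
        visits = rng.choices(PRICE_LEVELS, weights=VISIT_PROBS[cls], k=n_visits)
        people.append((cls, visits))
    return people


def calibrate(labelled):
    """Fit a naive Bayes model with add-one smoothing on the labelled sub-sample."""
    class_counts = {c: 0 for c in VISIT_PROBS}
    level_counts = {c: {lvl: 1 for lvl in PRICE_LEVELS} for c in VISIT_PROBS}
    for cls, visits in labelled:
        class_counts[cls] += 1
        for lvl in visits:
            level_counts[cls][lvl] += 1
    model = {}
    for c in class_counts:
        total = sum(level_counts[c].values())
        prior = (class_counts[c] + 1) / (len(labelled) + len(class_counts))
        model[c] = {
            "log_prior": math.log(prior),
            "log_lik": {lvl: math.log(n / total) for lvl, n in level_counts[c].items()},
        }
    return model


def impute(model, visits):
    """Return the class with the highest posterior score for a record of visits."""
    def score(c):
        return model[c]["log_prior"] + sum(model[c]["log_lik"][lvl] for lvl in visits)
    return max(model, key=score)


def run_experiment(n_people=500, n_visits=20, labelled_share=0.2, seed=0):
    """Simulate, calibrate on the labelled share, impute the rest, return accuracy."""
    rng = random.Random(seed)
    people = simulate(n_people, n_visits, rng)
    n_labelled = int(n_people * labelled_share)
    model = calibrate(people[:n_labelled])
    rest = people[n_labelled:]
    hits = sum(impute(model, visits) == cls for cls, visits in rest)
    return hits / len(rest)
```

Calling `run_experiment()` with different seeds, numbers of visits, or labelled shares gives a rough sense of how recovery of the simulated classes depends on the amount of data per person and on the size of the calibration sub-sample. The recovery rate is entirely driven by how far apart the assumed propensities are, so the numbers it produces say nothing about performance on real data.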

In the subsequent analysis, we explore the effects of the different types of uncertainty that are inherent to such an enrichment procedure. We discuss how these can be estimated and possibly accounted for in the estimation procedures in order to avoid biased conclusions. Finally, we discuss contexts in which the methodology could be applied, as well as its implications for modelling practice and privacy considerations. We believe that the proposed approach can unlock further potential of Big Data for behavioural analysis in various disciplines, beyond superficial, graphically rich visualisations of movement data.

References:

Andrienko, G., Andrienko, N., Hurter, C., Rinzivillo, S., & Wrobel, S. (2011). From movement tracks through events to places: Extracting and characterizing significant places from mobility data. Proceedings of the 2011 IEEE Conference on Visual Analytics Science and Technology, 161-170.

Bohte, W., & Maat, K. (2009). Deriving and validating trip purposes and travel modes for multi-day GPS-based travel surveys: A large-scale application in the Netherlands. Transportation Research Part C: Emerging Technologies, 17(3), 285-297.

Frank, R. E. (1967). Correlates of buying behavior for grocery products. The Journal of Marketing, 48-53.

Pan, Y., & Zinkhan, G. M. (2006). Determinants of retail patronage: a meta-analytical perspective. Journal of Retailing, 82(3), 229-243.

Wolf, J., Guensler, R., & Bachman, W. (2001). Elimination of the travel diary: Experiment to derive trip purpose from global positioning system travel data. Transportation Research Record: Journal of the Transportation Research Board, 1768(1), 125-134.
