INVESTIGATING THE POTENTIAL OF ACTIVITY TRACKING APP DATA TO ESTIMATE CYCLE FLOWS IN URBAN AREAS
Keywords: Transport, Cycling, Mobility, Regression, Green Travel, GPS
Abstract. Traffic congestion and its associated environmental effects pose a significant problem for large cities. Consequently, promoting and investing in green travel modes such as cycling is high on the agenda for many transport authorities. In order to target investment in cycling infrastructure and improve the experience of cyclists on the road, it is important to know where they are. Unfortunately, investment in intelligent transportation systems over the years has mainly focussed on monitoring vehicular traffic, and comparatively little is known about where cyclists are on a day-to-day basis. In London, for example, only a limited number of automatic cycle counters are installed on the network, and these provide only part of the picture. They are supplemented by surveys that are carried out infrequently. Activity tracking apps such as Strava, which run on smartphones and GPS devices, have become very popular in recent years. Their intended use is to track physical activity and monitor training. However, many people routinely use such apps to record their daily commutes by bicycle. At the aggregate level, these data provide a potentially rich source of information about the movement and behaviour of cyclists. Before such data can be relied upon, however, it is necessary to examine their representativeness and understand their potential biases. In this study, the flows obtained from Strava Metro (SM) are compared with those obtained during the 2013 London Cycle Census (LCC). A set of linear regression models is constructed to predict LCC flows using SM flows along with a number of dummy variables including road type, hour of day, day of week and presence or absence of a cycle lane. Cross-validation is used to test the fitted models on unseen LCC sites. SM flows are found to be a statistically significant (p<0.0001) predictor of total flows as measured by the LCC, and the models yield R-squared statistics of approximately 0.7 before considering spatio-temporal variation.
The initial results indicate that data collected using fitness tracking apps such as Strava are a promising data source for traffic managers. Future work will incorporate the spatio-temporal structure in the data to better account for the spatial and temporal variation in the ratio of SM flows to LCC flows.
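The modelling approach summarised in the abstract (a linear regression of LCC flows on SM flows plus dummy variables, tested on unseen sites) can be illustrated with a minimal sketch. This is not the study's code: the function names, the number of folds, the choice of reference categories and, above all, the data are assumptions made for illustration. The data are synthetic, generated with arbitrary coefficients, and bear no relation to the LCC or SM figures. The day-of-week dummy is omitted for brevity.

```python
import numpy as np

def design_matrix(sm_flow, road_type, hour, has_lane):
    """Intercept, SM flow, and drop-first dummy columns (hypothetical layout)."""
    cols = [np.ones_like(sm_flow, dtype=float), sm_flow.astype(float)]
    for values in (road_type, hour):
        levels = np.unique(values)
        for level in levels[1:]:  # first level acts as the reference category
            cols.append((values == level).astype(float))
    cols.append(has_lane.astype(float))  # 1 = cycle lane present
    return np.column_stack(cols)

def grouped_cv_r2(X, y, site_ids, n_folds=5, seed=0):
    """Out-of-sample R-squared with whole sites held out in each fold."""
    rng = np.random.default_rng(seed)
    sites = rng.permutation(np.unique(site_ids))
    pred = np.empty_like(y, dtype=float)
    for held_out in np.array_split(sites, n_folds):
        test = np.isin(site_ids, held_out)
        # Ordinary least squares fitted on the remaining sites only
        beta, *_ = np.linalg.lstsq(X[~test], y[~test], rcond=None)
        pred[test] = X[test] @ beta
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data only: 40 made-up sites observed at four made-up hours.
rng = np.random.default_rng(42)
n_sites, hours = 40, np.array([7, 8, 17, 18])
site_ids = np.repeat(np.arange(n_sites), hours.size)
hour = np.tile(hours, n_sites)
road_type = np.repeat(rng.integers(0, 3, n_sites), hours.size)
has_lane = np.repeat(rng.integers(0, 2, n_sites), hours.size)
sm_flow = rng.poisson(20, site_ids.size)
# Arbitrary coefficients chosen for the illustration, not estimates.
lcc_flow = (15.0 * sm_flow + 30.0 * has_lane + 10.0 * road_type
            + rng.normal(0, 40, site_ids.size))

X = design_matrix(sm_flow, road_type, hour, has_lane)
r2 = grouped_cv_r2(X, lcc_flow, site_ids)
print(f"held-out R-squared on synthetic data: {r2:.2f}")
```

Holding out whole sites, rather than individual observations, is what makes the held-out sites genuinely unseen: no hour from a test site contributes to the fitted coefficients.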