Using geo-targeted social media data to detect outdoor air pollution
Keywords: Social media, Outdoor air pollution, Spatiotemporal relationship, Machine learning
Abstract. Outdoor air pollution has become an increasingly serious issue in recent years (He, 2014). Urban air quality is measured at air monitoring stations. Building such stations requires land and incurs costs, and maintaining them requires skilled technicians. Many countries have no monitoring stations and lack any other means of monitoring air quality. In recent years, studies have suggested that social media can be used to monitor air quality dynamically (Wang, 2015; Mei, 2014). However, no studies have investigated the inter-correlations between real space and cyberspace by examining how micro-blogging behavior varies with changes in daily air quality. As a result, existing methods of monitoring the Air Quality Index (AQI) using micro-blogging data show a high degree of error between the real AQI and the air quality inferred from social media messages.
In this paper, we introduce a new geo-targeted social media analytic method to (1) investigate the dynamic relationship between air pollution-related posts on Sina Weibo and daily AQI values, and (2) apply Gradient Tree Boosting, a machine learning method, to monitor the dynamics of AQI using filtered social media messages. Our results expose the spatiotemporal relationships between social media messages and real-world environmental changes and suggest new ways to monitor air pollution using social media.
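To make the second step concrete, the following is a minimal sketch of gradient tree boosting for regression, written in plain Python with depth-1 trees (stumps) and squared-error loss. It is not the implementation used in this paper. The feature names (daily count of pollution-related posts, share of posts containing pollution terms), the AQI values, and all function names are hypothetical and synthetic, chosen only for illustration. In practice a library implementation such as scikit-learn's `GradientBoostingRegressor` would likely be used instead.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: the single split minimising squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted({row[j] for row in xs})
        # Candidate thresholds are midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(xs, residuals) if row[j] <= t]
            right = [r for row, r in zip(xs, residuals) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to a constant prediction
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def fit_gbt(xs, ys, n_rounds=100, learning_rate=0.1):
    """Boosting loop: each stump is fitted to the current residuals, which are
    the negative gradient of the squared-error loss."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, xs)]
    return base, learning_rate, stumps

def predict_gbt(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Synthetic, illustrative data (NOT from the paper):
# each row is [pollution-related post count, share of posts with pollution terms].
xs = [[5, 0.05], [8, 0.08], [12, 0.10], [20, 0.15],
      [35, 0.22], [50, 0.30], [80, 0.41], [120, 0.55]]
ys = [40, 55, 70, 95, 140, 180, 240, 310]  # made-up daily AQI values

model = fit_gbt(xs, ys)
train_mse = mean((y - predict_gbt(model, row)) ** 2 for row, y in zip(xs, ys))
baseline_mse = mean((y - mean(ys)) ** 2 for y in ys)
print(f"baseline MSE: {baseline_mse:.1f}, boosted MSE: {train_mse:.1f}")
```

The sketch reports training error only. Any real evaluation would hold out days or cities for testing, since a boosted model can fit a small training set almost exactly.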