COMPARISON OF THE PERFORMANCE OF GRADIENT BOOSTING, LOGISTIC REGRESSION, AND LINEAR SUPPORT VECTOR CLASSIFIER ALGORITHMS IN CLASSIFYING TRAVEL MODES BASED ON GNSS DATA
Keywords: Gradient boosting, Linear support vector classifier, Logistic regression, Streaming GNSS data, Transportation mode
Abstract. Public transportation system capacity must be compatible with the frequency of daily trips. Smartphones can record positioning data over time, from which the transportation modes people use for their daily commutes can be detected. This information helps the government estimate how many vehicles are needed to meet public transportation demand. This article investigates the performance of three machine learning models, namely Gradient Boosting (GB), Logistic Regression (LR), and linear Support Vector Classifier (SVC), in classifying travel modes. Thirty-nine features, including statistical parameters of velocity, acceleration, and jerk, as well as parameters representing the time of each trip, are given to the models as input. To improve the performance of the models, points corresponding to noise are detected by thresholding and removed from the dataset. Moreover, spline interpolation and a Savitzky-Golay filter are investigated in the feature calculation to fill possible gaps and smooth the trajectories. The results show that the linear models cannot distinguish well between the classes and are overfitted to the classes with more samples. Consequently, GB, with a recall, precision, and F-score of 0.93, was the best of the three models at identifying the transportation mode, outperforming LR and linear SVC.
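To make the feature-extraction step more concrete, the Python sketch below computes summary statistics of speed, acceleration, and jerk for one time-stamped GNSS trip after removing fixes whose implied speed exceeds a threshold. It is an illustration under stated assumptions, not the authors' pipeline. The haversine distance, the 50 m/s noise threshold, the five statistics per series, and all function names are hypothetical choices. The trip-time features, spline interpolation, and Savitzky-Golay smoothing mentioned above are omitted.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# A GNSS fix as (timestamp in seconds, latitude in degrees, longitude in degrees).
Point = Tuple[float, float, float]

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius used by the haversine formula
MAX_SPEED_MPS = 50.0  # hypothetical noise threshold (~180 km/h), not taken from the paper


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def remove_speed_outliers(points: Sequence[Point],
                          max_speed_mps: float = MAX_SPEED_MPS) -> List[Point]:
    """Hypothetical thresholding: drop fixes too fast to reach from the last kept fix."""
    if not points:
        return []
    kept = [points[0]]
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept


def derivative(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Forward finite difference: rate of change between consecutive samples."""
    return [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
            for i in range(len(values) - 1)]


def kinematic_features(points: Sequence[Point]) -> Dict[str, float]:
    """Summary statistics of speed, acceleration, and jerk for one cleaned trip.

    Expects strictly increasing timestamps (as returned by remove_speed_outliers).
    """
    if len(points) < 4:
        raise ValueError("at least 4 fixes are needed to compute jerk")
    times = [p[0] for p in points]
    # Speed of each segment (m/s), assigned to the timestamp at the segment's end.
    speed = [
        haversine_m(points[i][1], points[i][2], points[i + 1][1], points[i + 1][2])
        / (times[i + 1] - times[i])
        for i in range(len(points) - 1)
    ]
    accel = derivative(speed, times[1:])  # m/s^2
    jerk = derivative(accel, times[2:])  # m/s^3
    features: Dict[str, float] = {}
    for name, series in (("speed", speed), ("accel", accel), ("jerk", jerk)):
        features[f"{name}_mean"] = statistics.fmean(series)
        features[f"{name}_std"] = statistics.pstdev(series)
        features[f"{name}_min"] = min(series)
        features[f"{name}_max"] = max(series)
        features[f"{name}_median"] = statistics.median(series)
    return features
```

Applied to each trip, the returned dictionary would form one row of the feature matrix given to the classifiers. Cleaning runs before differentiation because a single noisy fix inflates speed and produces even larger spikes in acceleration and jerk.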