Understanding User Equilibrium States of Road Networks using Big Trajectory Data

Abstract: User equilibrium (UE) has long been regarded as the cornerstone of transport planning studies. Despite its fundamental importance, our understanding of the actual UE state of road networks has remained surprisingly incomplete. Using big datasets of taxi trajectories, this study investigates the UE states of road networks in Wuhan. Effective indicators, namely relative gaps, are introduced to quantify how actual traffic states deviate from theoretical UE states. Advanced machine learning techniques, including XGBoost and SHAP values, are employed to analyze nonlinear relationships between network disequilibrium states and seven influencing factors extracted from trajectory data. The results reveal significant gaps between actual traffic states and the theoretical UE states at various times of the day during both weekdays and weekends. The XGBoost analysis shows that differences in travel distances, travel speeds, and signalized intersection numbers among alternative routes are the primary causes of road network disequilibrium. The results of this study could have several important methodological and policy implications for using UE models in transport applications.


INTRODUCTION
User equilibrium (UE) theory (Wardrop, 1952) has long been regarded as the cornerstone of transport planning studies. It assumes that all travelers have perfect knowledge of traffic conditions and select the routes that minimize their travel times. As a result, the travel times of all used routes between an origin-destination (OD) pair are equal and minimal, and the travel times of all unused routes are greater than or equal to those of the used routes. Under the UE state, no traveler can reduce their travel time by unilaterally changing routes. UE is widely used in transport planning applications such as transport network design and disaster evacuation planning (Sheffi, 1985; Yang and Bell, 1998; Chen et al., 2012; Fu et al., 2022).
Although the UE model is elegant, its two behavioral assumptions have been recognized as too strong (Garcia-Sierra et al., 2015; Havlícková and Zámecník, 2020; Giannotti et al., 2011). It is generally difficult for travelers to acquire complete knowledge of traffic conditions across the road network because of their limited spatial knowledge and reasoning abilities. Numerous empirical studies using survey data have shown that travelers may consider several other criteria in their route choice decisions, such as shorter travel distance, higher reliability, and greater use of major roads.
Technological advancements have made it possible to collect abundant vehicle trajectories, particularly taxi trajectories (Yildirimoglu and Kahraman, 2018; Calabrese et al., 2013; Rayle et al., 2016; Chen et al., 2023). Taxi trajectories provide an excellent opportunity to examine real-world route choice behaviors in large-scale transport networks. Many empirical studies have used taxi trajectory data to analyze individual-level route choice behaviors and have indicated the need to revisit the basics of how routes are chosen. For example, Ma et al. (2020) estimated network disequilibrium levels using big trajectory data in Chengdu and Pittsburgh and introduced a traffic management strategy for optimal routing. Manley et al. (2015) used a large dataset of nearly 700,000 taxi routes in London to observe route choice behaviors. They found that travelers prefer anchor-based routes (i.e., routes along major roads, roads with well-known places, etc.) to the shortest distance routes. In addition, a study based on the GPS traces of 20,000 taxis collected in Shenzhen implied that travelers do not substantially anticipate the existing traffic conditions when making their route choice decisions (Yildirimoglu and Kahraman, 2018). An analysis of 5,535 choices recorded by 496 participants over 41 OD pairs in Lyon (González Ramírez et al., 2021) confirmed that travelers evaluate relative rather than absolute differences in the travel times of different routes. However, little attention has been paid in the literature to using taxi trajectory data to investigate the collective patterns of individual route choice behaviors, i.e., the UE state of road networks.
The research objectives of this study are twofold: to quantify how much the actual traffic states of road networks deviate from the theoretical UE state, and to examine which factors influence the disequilibrium states of road networks. To fulfill these objectives, we collected a large dataset of one month of taxi trajectories in Wuhan, a Chinese megacity. Using this dataset, the travel times of all routes used by taxis between numerous OD pairs are extracted. Effective indicators, namely relative gaps, are introduced to quantify the degree of disequilibrium at two levels, i.e., the OD level and the network level. The evolution of disequilibrium states at different times of the day is examined for both weekdays and weekends. Seven key factors affecting the disequilibrium states of road networks are identified. Advanced machine learning techniques, i.e., XGBoost and Shapley values, are employed to analyze the nonlinear relationships between disequilibrium states and the seven influencing factors. The results will deepen our understanding of UE states in real road networks and provide methodological and policy implications for transport applications.

STUDY AREA AND DATA COLLECTION
Wuhan, the largest city in central China, is selected as the study area. Because numerous lakes and rivers (e.g., the Yangtze River) pass through Wuhan, the road network structure is complex, with many bridges and tunnels. By the end of 2009, the vehicle parc in Wuhan was approximately 0.9 million.
The taxi trajectory dataset from Wuhan offers comprehensive insight into the dynamics and complexity of the city's urban mobility patterns. The characteristics of the Wuhan dataset are summarized in Table 1.

Relative Gap Indicators
The relative gap has been widely used as an indicator of whether a traffic assignment model has converged to an equilibrium solution (Rose et al., 1988; Chen et al., 2011; Patil et al., 2021). In this study, the relative gap is introduced to quantify the UE state of a road network using trip data extracted from taxi trajectories.
Let $r$ and $s$ be the origin and destination nodes, respectively.
Between each OD pair, there is a set of alternative routes $P^{rs} = \{\ldots, p_k, \ldots\}$ used by taxi drivers. Let $t_k^{\tau}$ and $f_k^{\tau}$ be the average travel time and taxi flow of route $p_k$ during the $\tau$-th time interval. Both $t_k^{\tau}$ and $f_k^{\tau}$ are extracted directly from the taxi trajectory datasets. Let $t_{\min}^{\tau}$ be the least travel time between the OD pair during the $\tau$-th time interval. It can be calculated on the road network using a shortest path algorithm. The detailed procedure for calculating $t_k^{\tau}$, $f_k^{\tau}$ and $t_{\min}^{\tau}$ is described in Section 3.2.2. The relative gap at the OD level, denoted by $GAP_{od}^{\tau}$, describes the relative difference between the travel times of the used routes and the least travel time during the $\tau$-th time interval, and can be expressed as:

$$GAP_{od}^{\tau} = \frac{\sum_{p_k \in P^{rs}} f_k^{\tau} \left( t_k^{\tau} - t_{\min}^{\tau} \right)}{q^{\tau} \, t_{\min}^{\tau}} \qquad (1)$$

where $q^{\tau}$ is the total taxi flow between the OD pair during the $\tau$-th time interval. It can be calculated by:

$$q^{\tau} = \sum_{p_k \in P^{rs}} f_k^{\tau} \qquad (2)$$

Given the set of OD pairs, the relative gap for the whole network, denoted by $GAP_{net}^{\tau}$, describes how far the traffic state deviates from the theoretical UE state at snapshot $\tau$, and can be expressed as:

$$GAP_{net}^{\tau} = \frac{\sum_{rs} \sum_{p_k \in P^{rs}} f_k^{\tau} \left( t_k^{\tau} - t_{\min}^{\tau} \right)}{\sum_{rs} q^{\tau} \, t_{\min}^{\tau}} \qquad (3)$$

where the outer sums run over all OD pairs and the route-level quantities are those of the corresponding OD pair. The values of $GAP_{od}^{\tau}$ and $GAP_{net}^{\tau}$ both range in $[0, +\infty)$. A value of 0 indicates a perfect UE state: taxi drivers between all OD pairs choose the least travel time routes, and no taxi driver could improve their travel time by switching routes. A larger value implies that the observed traffic state (i.e., the route choice behaviors of all taxi drivers) deviates further from the theoretical UE state.
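The relative gap calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function names and input layout are hypothetical, and the network-level gap is assumed to pool the excess travel time of all OD pairs and divide it by the pooled least travel time (a flow-weighted mean of OD-level gaps would be an alternative reading).

```python
def od_relative_gap(routes, t_min):
    """OD-level relative gap for one time interval.

    routes: list of (average_travel_time, taxi_flow) per alternative route.
    t_min:  least travel time between the OD pair in the same interval.
    Routes with zero flow are unused and are skipped.
    """
    used = [(t, f) for t, f in routes if f > 0]
    total_flow = sum(f for _, f in used)
    excess = sum(f * (t - t_min) for t, f in used)
    return excess / (total_flow * t_min)


def network_relative_gap(od_pairs):
    """Network-level relative gap for one time interval.

    od_pairs: list of (routes, t_min) tuples, one per OD pair.
    Assumption: excess and least travel times are pooled over OD pairs.
    """
    excess = 0.0
    base = 0.0
    for routes, t_min in od_pairs:
        for t, f in routes:
            if f > 0:
                excess += f * (t - t_min)
                base += f * t_min
    return excess / base


# Hypothetical example: three taxis take the fastest route (10 min),
# one taxi takes a 12-minute route, and a third route is unused.
example_routes = [(10.0, 3), (12.0, 1), (15.0, 0)]
example_gap = od_relative_gap(example_routes, t_min=10.0)
```

In this example the single detouring taxi adds 2 minutes of excess travel time against a total of 40 taxi-minutes at the least travel time, so the OD pair sits right at the 0.05 level.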

Step 1: Extracting Representative Routes Used by Taxis between OD Pairs
This process involves two stages. First, the taxi trajectory data are cleaned: abnormal GPS jumps and erroneous points are removed, and a map matching algorithm (Chen et al., 2014) is applied to correct positioning errors and reconstruct trajectories. The result is a continuous trajectory on the road network.
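As an illustration of the cleaning stage, the sketch below drops GPS points that would imply an implausible speed relative to the last retained point. It is a simplified stand-in, not the procedure used in this study: the 120 km/h threshold, the function names, and the `(timestamp, latitude, longitude)` input layout are all assumptions, and map matching is not covered.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def drop_gps_jumps(points, max_speed_kmh=120.0):
    """Remove points that imply an implausible speed from the last kept point.

    points: list of (timestamp_s, lat, lon), sorted by timestamp.
    max_speed_kmh: hypothetical threshold; tune it for the data at hand.
    Note: the first point is trusted, so an erroneous first fix is not caught.
    """
    if not points:
        return []
    kept = [points[0]]
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed_kmh = haversine_m(lat0, lon0, lat, lon) / dt * 3.6
        if speed_kmh <= max_speed_kmh:
            kept.append((t, lat, lon))
    return kept
```

Comparing each point with the last retained point, rather than with its raw predecessor, keeps a single bad fix from also invalidating the good point that follows it.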
The second stage identifies commonly used taxi routes between OD pairs. Trajectories are divided into OD trips using passenger boarding information. Representative routes are then identified by clustering similar trips between the same OD pair using dynamic time warping and the DBSCAN algorithm (Lima et al., 2016), resulting in valid OD pairs, their routes, and the corresponding trips, as illustrated in Table 2.
The first phase of the relative gap calculation is to determine the route travel time (i.e., $t_k^{\tau}$) and route taxi flow (i.e., $f_k^{\tau}$) for each route $p_k \in P^{rs}$ between the OD pair during each hourly time interval. To determine $t_k^{\tau}$ and $f_k^{\tau}$, we group the taxi trips between the same OD pair into hourly time intervals according to their departure times. After this phase, the route travel time $t_k^{\tau}$ and route taxi flow $f_k^{\tau}$ are known for all routes $\forall p_k \in P^{rs}$ between each OD pair during each hourly time interval. It is worth noting that we set $f_k^{\tau} = 0$ for an unused route $p_k \in P^{rs}$, which is then excluded from the relative gap calculation.
The second phase is to calculate the least travel time (i.e., $t_{\min}^{\tau}$) between every OD pair during the time interval. We first use the map-matched trajectories of all taxis during the hourly time interval to estimate traffic conditions on the road network. The method for estimating hourly link travel times from taxi trajectories is described in Shi et al. (2017). We then employ the shortest path algorithm (Li et al., 2015) to calculate $t_{\min}^{\tau}$ for each OD pair. After this phase, we can calculate the UE state of the road network, i.e., $GAP_{net}^{\tau}$, during the time interval using Eqs. (1)-(3). Consequently, we quantify the UE states of the Wuhan network for 540 hourly time intervals during one month, i.e., 18 hours/day × 30 days.
After extracting these factors, we examine their influences on $GAP_{net}^{\tau}$ at the 540 hourly snapshots as

$$GAP_{net}^{\tau} = F(X^{\tau})$$

where $X = \{N, \bar{D}, D_{CV}, \bar{V}, V_{CV}, S, S_{CV}\}$ is the set of seven factors. Because these factors could be correlated and have nonlinear relationships with $GAP_{net}^{\tau}$, we do not use traditional multivariable linear regression but instead adopt a powerful machine learning technique, namely eXtreme Gradient Boosting (XGBoost) (Chen and Guestrin, 2016; Ma et al., 2017; Li, 2022; Ji et al., 2022). Apart from its high regression accuracy, XGBoost offers strong interpretative power when integrated with SHapley Additive exPlanations (SHAP) (Shapley, 1953).
The SHAP relative importance $RI_j$ of each factor and interaction term is calculated as:

$$RI_j = \frac{|\phi_j|}{\sum_{j'} |\phi_{j'}|}$$

where $\phi_j$ is the SHAP value of factor or interaction term $j$. Thus, the relative importance of each factor or interaction term lies between 0 and 1, and the relative importances of all factors and interaction terms sum to 1. The result represents the percentage of the marginal contribution of each factor or interaction term to the relative gap.
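The two operations in this step can be sketched as follows. The first function computes a plain dynamic time warping (DTW) distance between two trips given as coordinate sequences; a matrix of such pairwise distances could then be passed to a DBSCAN implementation that accepts precomputed distances. The second function groups trips by OD pair, hourly departure interval, and route to obtain the average travel time and taxi flow per route. Both are illustrative sketches with hypothetical names and input layouts, not the code used in this study.

```python
import math
from collections import defaultdict


def dtw_distance(trip_a, trip_b):
    """Dynamic time warping distance between two point sequences.

    trip_a, trip_b: lists of (x, y) coordinates in a projected (metric) system.
    """
    inf = float("inf")
    n, m = len(trip_a), len(trip_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(trip_a[i - 1], trip_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def aggregate_routes(trips, interval_s=3600):
    """Average travel time and taxi flow per (OD pair, interval, route).

    trips: iterable of (od_pair, route_id, departure_time_s, travel_time).
    Returns {(od_pair, interval_index, route_id): (mean_travel_time, flow)}.
    Routes with no trips in an interval are simply absent (flow of zero).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for od_pair, route_id, departure_s, travel_time in trips:
        key = (od_pair, int(departure_s // interval_s), route_id)
        totals[key][0] += travel_time
        totals[key][1] += 1
    return {key: (total / count, count) for key, (total, count) in totals.items()}
```

Trips are assigned to intervals by departure time, matching the grouping rule stated above; a trip that crosses an hour boundary therefore counts entirely toward the interval in which it started.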
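For the least travel time in the second phase, a standard Dijkstra search over hourly link travel times is sufficient as an illustration. The sketch below is a generic stand-in and not the specific shortest path algorithm cited above; the adjacency-list layout and function name are assumptions.

```python
import heapq


def least_travel_time(graph, origin, destination):
    """Least travel time between two nodes using Dijkstra's algorithm.

    graph: {node: [(neighbor, link_travel_time), ...]} with non-negative
           link travel times estimated for one hourly interval.
    Returns float('inf') if the destination cannot be reached.
    """
    best = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, link_time in graph.get(node, []):
            new_cost = cost + link_time
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return float("inf")
```

Because link travel times change from hour to hour, the search would be rerun on a separate graph for each of the hourly snapshots.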
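The normalization behind the relative importance can be illustrated with a small sketch. It assumes that the SHAP value of each factor or interaction term is summarized over all snapshots by its mean absolute value before normalizing; this summary choice, the function name, and the input layout are assumptions rather than details confirmed by the text.

```python
def relative_importance(shap_rows, names):
    """Normalize mean absolute SHAP values so that they sum to 1.

    shap_rows: list of rows, one per snapshot, each holding the SHAP value
               of every factor or interaction term in the order of `names`.
    names:     labels of the factors and interaction terms.
    """
    n_rows = len(shap_rows)
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(names))
    ]
    total = sum(mean_abs)
    if total == 0:
        raise ValueError("all SHAP values are zero; importance is undefined")
    return {name: value / total for name, value in zip(names, mean_abs)}
```

With this normalization, a term whose SHAP values are large in magnitude but alternate in sign still receives a high importance, which matches the idea of measuring contribution rather than direction.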

Disequilibrium States of Road Networks
This section reports the network equilibrium states of Wuhan. We first examined the equilibrium states at the OD level, i.e., $GAP_{od}$, for a selected peak hour, i.e., 18:00-19:00 on the first Monday of the one-month dataset. In Figure 1, red lines connect OD pairs in extreme disequilibrium ($GAP_{od} > 0.5$), blue lines connect OD pairs in moderate disequilibrium ($GAP_{od} \in [0.05, 0.5]$), and green lines connect OD pairs in equilibrium ($GAP_{od} < 0.05$). OD pairs in extreme and moderate disequilibrium are clearly predominant, while only a few OD pairs are in equilibrium. We then investigated the equilibrium states at the network level (i.e., $GAP_{net}$) at different times of the day. Figure 3 reports the temporal patterns of the road network for weekdays and weekends, showing the average $GAP_{net}$ and its 95% confidence interval at each time of the day. As shown in Figure 3(a), the 95% confidence intervals of $GAP_{net}$ on weekdays ranged from 0.287 to 0.378 and fluctuated over the day. The $GAP_{net}$ value peaked at the evening peak hour (17:00-18:00). It rose significantly in the morning and in the afternoon and was relatively stable between these two periods. After 18:00, the $GAP_{net}$ value kept decreasing until midnight. As shown in Figure 3(b), the $GAP_{net}$ value of the Wuhan road network on weekends followed a distinctly different pattern from that on weekdays. Its 95% confidence intervals spanned a wider range, from 0.233 to 0.413. The $GAP_{net}$ value gradually increased from the morning until noon and then fluctuated at a high level of disequilibrium until midnight. This result highlights that the Wuhan road network remained in moderate disequilibrium at various times of the day on both weekdays and weekends.

Factors Influencing Disequilibrium States of Road Networks
Using XGBoost and SHAP techniques, we then examined how the seven factors influence network disequilibrium states. Table 3 presents the relative importance of the seven influencing factors and their interaction terms. The relative importance ($RI$) represents the percentage of a factor's marginal contribution to $GAP_{net}$. The higher the $RI$ value, the larger the contribution to the prediction of $GAP_{net}$.
Overall, XGBoost performed well on the Wuhan dataset, with $R^2$ = 0.559. Among all factors and interaction terms, the CV of travel distances $D_{CV}$ made the largest contribution, with $RI$ = 40.7%. It was followed by the CV of travel speeds $V_{CV}$ with $RI$ = 16.7%, the CV of signalized intersection numbers $S_{CV}$ with $RI$ = 11.7%, the average signalized intersection numbers $S$ with $RI$ = 6.8%, the average travel speeds $\bar{V}$ with $RI$ = 5.9%, the average alternative route numbers $N$ with $RI$ = 5.5%, and the average travel distances $\bar{D}$ with $RI$ = 3.3%. The seven influencing factors together accounted for 90.6% of the prediction of $GAP_{net}$, while the interaction terms accounted for only the remaining 9.4%. In Figure 4, the seven factors and two interaction terms (DistCV*SpeedCV and SpeedMean*SpeedCV) are plotted, since they accounted for about 93% of the total relative importance. In each SHAP dependence plot, the x-axis represents the feature value and the y-axis gives the SHAP value, indicating how much the feature impacts the prediction of $GAP_{net}$. Each dot corresponds to the feature value at one $GAP_{net}$ observation. For interaction terms, the color of each dot represents the value of the other feature.
Based on Figure 4, the following observations can be made:
(a) When the CV of travel distances ($D_{CV}$) is below 0.15, its contribution to $GAP_{net}$ is negative; above this threshold, it contributes positively.
(b) The CV of travel speeds ($V_{CV}$) behaves similarly: below 0.15 it reduces $GAP_{net}$, while above this value it increases it.
(c) The CV of signalized intersection numbers ($S_{CV}$) contributes negatively to $GAP_{net}$ up to 0.24, after which it contributes positively.
(d) The average signalized intersection number ($S$) decreases $GAP_{net}$ below 21 and increases it beyond that value.
(e) Average travel speeds ($\bar{V}$) below 25 km/h reduce $GAP_{net}$, while higher speeds increase it; the contribution rises sharply up to 27 km/h and then fluctuates at a high level.
(f) The average alternative route numbers ($N$) contribute positively to $GAP_{net}$ up to 2.18, after which they contribute negatively.
(h) The interaction of $D_{CV}$ and $V_{CV}$ shows a complex interplay in which similar travel distances and differences in travel speeds jointly affect the disequilibrium states.
(i) The joint influence of $V_{CV}$ and $\bar{V}$ on $GAP_{net}$ depends on whether the average travel speed is below or above 25 km/h, with each combination contributing differently to the disequilibrium states.

CONCLUSION AND DISCUSSION
This study investigated the user equilibrium states of road networks in Wuhan using taxi trajectories. The user equilibrium states were explicitly evaluated by relative gap indicators at both the network and OD levels, i.e., $GAP_{net}$ and $GAP_{od}$. Advanced regression techniques, i.e., XGBoost and SHAP, were employed to investigate the nonlinear relationships between seven factors and $GAP_{net}$ values. The results showed that the road network remained in moderate or extreme disequilibrium, with $GAP_{net}$ > 0.2, at various times of the day on weekdays and weekends. The results also showed that $GAP_{od}$ followed exponential distributions during most time periods. The regression analysis found that the nonlinear relationships between the seven influencing factors and $GAP_{net}$ values can be well established using the XGBoost method. The CV factors ($D_{CV}$, $V_{CV}$, and $S_{CV}$) were the top three contributors to the prediction of $GAP_{net}$ values.
The results on network disequilibrium states in Wuhan offer several new insights into route choice behavior. Firstly, taxi trajectory data mining allowed the quantification of UE states on road networks. The results from Wuhan suggested that road networks tend to remain in moderate or extreme disequilibrium at various times of the day, with different patterns appearing on weekdays and weekends. Therefore, this study provides strong empirical evidence to support the previous assertion that Wardrop's user equilibrium state is difficult to reach in real road networks at various times of the day on both weekdays and weekends (Daganzo and Sheffi, 1977; Sheffi and Powell, 1982; Lam et al., 2008).
Secondly, this work measured user equilibrium states at the finer OD level, i.e., $GAP_{od}$. The findings show that the majority of OD pairs were in slight or moderate disequilibrium, which is consistent with prior studies (Papinski and Scott, 2011; Zhu and Levinson, 2015; Yildirimoglu and Kahraman, 2018). This research extends those studies by exploring the distribution characteristics of $GAP_{od}$ values.
Thirdly, we used XGBoost and SHAP to study the nonlinear relationships between seven influencing factors and $GAP_{net}$ values. XGBoost's performance in Wuhan demonstrates the impact of these factors and their interaction terms in predicting $GAP_{net}$. Interestingly, the CV factors ($D_{CV}$, $V_{CV}$, and $S_{CV}$) were found to be more influential than the mean factors ($\bar{V}$, $\bar{D}$ and $S$) in predicting $GAP_{net}$ (Train and Wilson, 2008; Manley et al., 2015; Yang et al., 2017).
Based on the findings, several methodological and policy implications can be derived. Transport planners and policymakers should be aware that Wardrop's user equilibrium state is hard to achieve in reality. The use of UE-based models can lead to significant bias in policy evaluation, especially during peak hours.
Despite these insights, this study, being one of the first to investigate the actual user disequilibrium states of road networks using taxi trajectory mining techniques, has its limitations. It used trajectories from over 10,000 taxis, which form only a small fraction of total daily travel. In addition, taxi drivers may exhibit different driving behavior from other drivers. More data from private cars (Xiao et al., 2020) and ride-hailing services (Tirachini, 2020) could enhance the evaluation accuracy.
Several future research directions can be proposed. Valuable next steps include extending this study to multi-modal transport networks, investigating the factors contributing to disequilibrium levels in different cities, and exploring the equilibrium state at the OD pair level using the massive data available for individual OD pairs. Examining the consistency between the theoretical equilibrium state and the actual state observed from vehicle trajectory data can provide a more comprehensive understanding of traffic equilibria.
3.2.3 Step 3: Analysis of Factors Affecting User Equilibrium States of Road Networks
This step uses advanced machine learning techniques to examine the factors affecting the UE states of road networks, based on the $GAP_{net}^{\tau}$ values extracted for the 540 hourly snapshots. First, seven factors are extracted at the OD pair level from the collected taxi trajectories at each snapshot $\tau$. Afterwards, the OD-level values of each factor are averaged over the entire network, weighted by the taxi flow $q^{\tau}$ between each OD pair, at each snapshot $\tau$. The resulting factors are (1) average alternative route numbers (denoted by $N$); (2) average travel distance (denoted by $\bar{D}$); (3) coefficient of variation (CV) of travel distances (denoted by $D_{CV}$); (4) average travel speeds (denoted by $\bar{V}$); (5) CV of travel speeds (denoted by $V_{CV}$); (6) average signalized intersection numbers (denoted by $S$); and (7) CV of signalized intersection numbers (denoted by $S_{CV}$).
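The factor extraction in this step combines two simple operations: a coefficient of variation across the alternative routes of an OD pair, and a flow-weighted average of the OD-level values over the network. A minimal sketch is given below; the function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """CV of a route attribute (e.g., travel distance) across alternative routes.

    Assumption: population standard deviation divided by the mean.
    """
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean


def flow_weighted_mean(od_values, od_flows):
    """Network-level factor: OD-level values averaged with taxi flows as weights."""
    weighted_sum = sum(value * flow for value, flow in zip(od_values, od_flows))
    return weighted_sum / sum(od_flows)


# Hypothetical example: two OD pairs with route distances in km.
cv_pair_1 = coefficient_of_variation([8.0, 12.0])   # routes differ in length
cv_pair_2 = coefficient_of_variation([10.0, 10.0])  # routes are equally long
network_dist_cv = flow_weighted_mean([cv_pair_1, cv_pair_2], [3, 1])
```

Weighting by flow means that heavily traveled OD pairs dominate the network-level value, so a few busy corridors with very dissimilar routes can raise the factor even when most OD pairs have similar alternatives.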

Figure 1. Network user equilibrium state in Wuhan

To quantify the $GAP_{od}$ distribution over all OD pairs, we fitted the probability density distribution of $GAP_{od}$ to five predefined types of distributions: exponential, gamma, beta, lognormal and Pareto. The goodness of fit of each distribution type was evaluated using the K-S test. As shown in Figure 2(b), $GAP_{od}$ in the Wuhan road network followed an exponential distribution in the majority of time periods (82.5%). As shown in Figure 2(a), during a typical peak hour, 12.3% of OD pairs reached equilibrium (i.e., $GAP_{od} < 0.05$); 28.6% of OD pairs were in slight disequilibrium ($GAP_{od} \in [0.05, 0.2]$); 32.3% were in moderate disequilibrium ($GAP_{od} \in [0.2, 0.5]$); and 26.8% of OD pairs were in extreme disequilibrium ($GAP_{od} > 0.5$). The overall relative gap for the Wuhan network was $GAP_{net}$ = 0.34, indicating a moderate disequilibrium state.
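The classification of OD pairs and the goodness-of-fit check can be illustrated as follows. The first two functions assign OD-level relative gaps to the four states using the thresholds quoted above (the handling of values exactly at 0.2 and 0.5 is an assumption, since the stated intervals share their endpoints). The last function computes only the K-S statistic against an exponential distribution whose mean is fitted from the sample; it does not return a p-value, and it is a sketch rather than the fitting procedure used in this study.

```python
import math
from collections import Counter


def classify_gap(gap):
    """Map an OD-level relative gap to one of the four states."""
    if gap < 0.05:
        return "equilibrium"
    if gap <= 0.2:
        return "slight"
    if gap <= 0.5:
        return "moderate"
    return "extreme"


def class_shares(gaps):
    """Share of OD pairs in each state."""
    counts = Counter(classify_gap(g) for g in gaps)
    labels = ("equilibrium", "slight", "moderate", "extreme")
    return {label: counts[label] / len(gaps) for label in labels}


def ks_statistic_exponential(gaps):
    """K-S distance between the sample and an exponential with the sample mean."""
    xs = sorted(gaps)
    n = len(xs)
    mean = sum(xs) / n
    distance = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance
```

A smaller K-S distance indicates a closer fit, so the same function could be written for each candidate distribution and the one with the smallest distance retained for a given time period.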

Figure 2. Distribution of $GAP_{od}$ in Wuhan

Figure 3. The relative gap pattern at different times of the day in Wuhan network.

Figure 4. SHAP dependence plots of top nine features in Wuhan network.

Table 1. The characteristics of taxi trajectory dataset in Wuhan.

Table 2. The characteristics of processed taxi trajectory data.

Table 3. Relative importance of influencing factors and interaction terms to $GAP_{net}$.
$N$: average alternative route numbers; $\bar{D}$: average travel distances; $D_{CV}$: CV of travel distances; $\bar{V}$: average travel speeds; $V_{CV}$: CV of travel speeds; $S$: average signalized intersection numbers; $S_{CV}$: CV of signalized intersection numbers.