Interim Department Head, University of Cincinnati, Cincinnati, OH, United States
Abstract: As the nation’s roadways continue to deteriorate, the number of work zones on US highways is expected to increase, underscoring the need to protect both work zone workers and road users. This study combined descriptive statistics with Shapley feature importance analysis to examine work zone crash data in Ohio from 2019 to 2023 across three datasets. The goal was to identify the factors contributing to crash severity with and without work zone workers present. Several machine learning models were used to predict crash outcomes, including K-Nearest Neighbors (KNN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Because crash data are imbalanced, the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) were applied to balance the datasets. LightGBM was the best-performing model, achieving an accuracy of more than 75.71% on the original data, 85.17% on the SMOTE data, and 85.31% on the ADASYN data. Before the models were fitted, XGBoost-based feature importance was used to select relevant features. Shapley values were then used to interpret the factors contributing to crash injury severity. The analysis indicated that shoulder and lap belt use consistently reduced crash severity across all datasets. Multi-vehicle, sideswipe, angle, and rear-end crashes were among the variables that increased crash severity across the three datasets. The partial dependence plots revealed that the mobile work zone type significantly influenced the severity of worker-present crashes, while out-of-state drivers were a significant factor in non-worker-present crashes. The findings of this study are intended to guide transportation practitioners and policymakers in enhancing work zone safety.
Learning Objectives:
Attendees can expect to learn the following from this session:
Conduct a descriptive analysis of the factors contributing to crashes with and without the presence of work zone workers.
Employ machine learning techniques and utilize Shapley values to identify the contributing factors to crashes with and without the presence of work zone workers.
Determine whether the contributing factors differ between crashes with and without work zone workers present.
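
For readers unfamiliar with Shapley values, the attribution idea referred to in the objectives above can be sketched on a toy example. This is a minimal illustration only, not the study's model or tooling: the function `toy_severity`, its coefficients, and the three binary features are hypothetical and were chosen solely so the arithmetic is easy to follow. The exact computation shown here enumerates every feature coalition, which is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features (toy use only).
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        # Model output when only the coalition's features take their real values.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_severity(row):
    """Hypothetical severity score; coefficients are invented for illustration."""
    belt_used, multi_vehicle, mobile_zone = row
    return (0.30
            - 0.15 * belt_used
            + 0.20 * multi_vehicle
            + 0.10 * multi_vehicle * mobile_zone)


names = ["belt_used", "multi_vehicle", "mobile_zone"]
x = [1, 1, 1]          # instance to explain
baseline = [0, 0, 0]   # reference instance
phi = shapley_values(toy_severity, x, baseline)

for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.3f}")
```

In this toy setting a negative value (belt use) pushes the score below the baseline and a positive value pushes it above, and the contributions sum to the difference between the score for `x` and the score for `baseline`. The interaction term is split evenly between the two features involved.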