Student, University of Hawaii at Manoa, Honolulu, HI, United States
Abstract: This paper explores how the outputs of emerging Machine Learning (ML) models can be integrated in mode choice modeling for transportation system analysis. It investigates aggregate and individual choice predictions when the outputs of seven ML models (Random Forest, Support Vector Machine, Extreme Gradient Boosting (XGBoost), AdaBoost, Decision Trees, Naive Bayes, and Neural Networks) are combined, and compares them with a Multinomial Logit (MNL) model as the base model. The MNL model and the seven ML models are built using National Household Travel Survey (NHTS) 2017 data from Florida, and the individual and aggregate mode choice behavior of the Hawaii population is estimated using NHTS-Hawaii-2017 data. We then combine the choice probabilities obtained from the individual ML models in three ways: (CM1) taking the average of the output probabilities for each observation, (CM2) taking the weighted average of the choice probabilities, using each model's accuracy as its weight, and (CM3) using majority voting. The performance of these models is evaluated with metrics including F1 score, model accuracy, Comparative Success Rate (CSR), and Modified Success Rate (MSR). The findings reveal that the MNL model consistently outperforms the individual ML models and their combinations in predicting both individual and aggregate choice behavior. Among the ML models, XGBoost and Random Forest show the highest accuracy in predicting individual and aggregate behavior. Combinations of ML models formed by averaging their probability outputs (with or without weights) have predictive accuracy comparable to that of the best-performing ML models. Notably, combining probability outputs improves the F1 score for minority travel modes (the least used modes). However, the ML models and their combinations are found to outperform the MNL model in terms of Comparative Success Rate and Modified Success Rate.
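The three combination schemes named in the abstract (CM1, CM2, CM3) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy probabilities, and the accuracy values are hypothetical, and two details are assumptions made here for illustration, namely that the accuracy weights are normalized to sum to one and that voting ties go to the lowest mode index.

```python
import numpy as np

def combine_average(probs):
    """CM1: unweighted mean of the per-model choice probabilities.
    probs has shape (n_models, n_obs, n_modes)."""
    return probs.mean(axis=0)

def combine_weighted(probs, accuracies):
    """CM2: accuracy-weighted mean of the choice probabilities.
    Assumption: weights are normalized so each row still sums to 1."""
    w = np.asarray(accuracies, dtype=float)
    w = w / w.sum()
    # Contract the model axis of probs against the weight vector.
    return np.tensordot(w, probs, axes=1)  # shape (n_obs, n_modes)

def combine_majority(probs):
    """CM3: each model votes for its most probable mode.
    Assumption: ties are broken in favor of the lowest mode index."""
    votes = probs.argmax(axis=2)  # shape (n_models, n_obs)
    n_modes = probs.shape[2]
    # Count the votes per mode for every observation.
    counts = (votes[..., None] == np.arange(n_modes)).sum(axis=0)
    return counts.argmax(axis=1)  # shape (n_obs,)

# Hypothetical toy data: 3 models, 2 observations, 3 travel modes.
probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]],
    [[0.2, 0.7, 0.1], [0.3, 0.4, 0.3]],
])
accuracies = [0.8, 0.7, 0.5]  # hypothetical per-model accuracies

cm1 = combine_average(probs)
cm2 = combine_weighted(probs, accuracies)
cm3 = combine_majority(probs)

print("CM1 predicted modes:", cm1.argmax(axis=1))
print("CM2 predicted modes:", cm2.argmax(axis=1))
print("CM3 predicted modes:", cm3)
# Summing combined probabilities over observations gives predicted mode shares.
print("CM1 aggregate mode shares:", cm1.mean(axis=0))
```

On this toy input the three schemes pick different modes for the same observations, which shows why they are evaluated separately. CM1 and CM2 return probabilities, so they can feed both individual predictions (via the most probable mode) and aggregate mode shares, whereas CM3 returns only a predicted mode per observation.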
Learning Objectives:
Attendees can expect to learn the following from this session:
Upon completion, participants will be able to describe the role of Machine Learning models and econometric models in mode choice modeling within transportation analysis.
Upon completion, participants will be able to carry out and analyze the combination of Machine Learning model outputs using methods such as averaging, weighted averaging, and majority voting.
Upon completion, participants will be able to analyze the impact of model combinations on prediction accuracy.