Classification with imbalanced data in customer churn prediction based on Improved Random Forest

Authors

  • Anh Tuan Duong
  • Minh Hòa Đinh

Abstract

In the telecommunications industry, customer churn is of great interest because it can affect a company’s profit. However, the imbalanced data in customer churn prediction makes it difficult to develop a good prediction model. In this work, we propose a random forest-based approach for classification with imbalanced data in telecom customer churn prediction. This approach uses the cost-sensitive weighted random forest (CSWRF), which was originally proposed for credit card fraud detection. We compare the performance of CSWRF against a data resampling method: random forest combined with the SMOTE data sampling technique. Our experiments on two benchmark datasets show that, for telecom churn prediction, which is an imbalanced data problem, CSWRF achieves better classification performance than SMOTE combined with random forest.

Published

30-03-2023

How to Cite

Duong, T. A., & Đinh, M. H. (2023). Classification with imbalanced data in customer churn prediction based on Improved Random Forest. HUFLIT Journal of Science, 7(3), 58. Retrieved from https://hjs.huflit.edu.vn/index.php/hjs/article/view/143

Section

Articles
