Classification with imbalanced data in customer churn prediction based on Improved Random Forest
Abstract
In the telecommunications industry, customer churn is of great interest because it can affect a company's profit. However, the class imbalance in customer churn data makes it difficult to develop a good prediction model. In this work, we propose a random forest-based approach for classification with imbalanced data in telecom customer churn prediction. This approach uses the cost-sensitive weighted random forest (CSWRF), which was originally proposed for credit card fraud detection. We compare the performance of CSWRF against a data resampling approach: random forest combined with the SMOTE (Synthetic Minority Over-sampling Technique) oversampling method. Our experiments on two benchmark datasets show that, for telecom churn prediction, which is an imbalanced data problem, CSWRF achieves better classification performance than SMOTE combined with random forest.
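The two families of approaches compared above handle imbalance differently: a cost-sensitive method reweights the classes so that errors on churners count more, whereas SMOTE creates synthetic minority-class samples before training. The sketch below illustrates both ideas in plain Python under stated assumptions. It is not the CSWRF method itself, whose weighting scheme is not given here. The function `balanced_class_weights` uses the common inverse-frequency heuristic (the same formula as scikit-learn's `class_weight="balanced"`). The function `smote` is a minimal, hypothetical SMOTE-style interpolation over Euclidean nearest neighbours of the minority class.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count(c)).

    A generic cost-sensitive weighting heuristic (an assumption for
    illustration), not the specific CSWRF weighting scheme.
    """
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {label: n / (c * cnt) for label, cnt in counts.items()}


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (excluding itself), by Euclidean distance
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical churn labels: 90 non-churners (0) and 10 churners (1)
    labels = [0] * 90 + [1] * 10
    print(balanced_class_weights(labels))

    # Hypothetical two-feature vectors for a handful of churners
    churners = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [0.9, 1.9]]
    for point in smote(churners, n_new=6, k=3):
        print(point)
```

In the weighting approach, the weights would be passed to the classifier so that misclassifying a churner costs more; in the resampling approach, the synthetic points are appended to the training set and an ordinary random forest is trained on the rebalanced data.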