IJIKM - Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector

Tam-Thanh Luong, Vi-Gia Luong, Anh Hoang Tuan Tran, Tuan Manh Nguyen

Interdisciplinary Journal of Information, Knowledge, and Management • Volume 20 • 2025 • pp. 009

https://doi.org/10.28945/5469

Aim/Purpose: Previous studies have primarily focused on comparing predictive models without considering the impact of data preprocessing on model performance. This study therefore sets two main objectives. The first is to investigate how resampling methods for handling imbalanced data affect model effectiveness. The second is to compare and evaluate machine learning methods to identify the optimal model for each resampling technique, thereby determining the model that achieves the highest performance.

Background: In the highly competitive banking industry, customer attrition is a major challenge for banks seeking to improve retention. While many studies have focused on building and evaluating models to predict customer churn, they often fail to address the problem of imbalanced data, which can significantly affect model accuracy.

Methodology: Following exploratory data analysis (EDA), we apply various techniques to address data imbalance and use a range of machine learning models, including Naïve Bayes, Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Gradient Boosting, XGBoost, and LightGBM, to predict customer churn on the study's dataset.
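As an illustration of the kind of hybrid resampling evaluated here, the following is a minimal sketch of SMOTE followed by edited nearest neighbours (ENN), written with the Python standard library only. It is not the study's implementation. The function names, the toy data, the binary-label assumption, and the majority-vote cleaning rule are all assumptions made for illustration; library implementations may use different defaults, such as a stricter agreement rule.

```python
import random
from collections import Counter


def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding idx itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: _dist2(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority points until both classes have equal size.

    Assumes binary labels and at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(_nearest(i, minority_pts, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        base, neighbour = minority_pts[i], minority_pts[j]
        X_new.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_new.append(minority)
    return X_new, y_new


def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in _nearest(i, X, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE, then clean both classes with ENN."""
    X_over, y_over = smote(X, y, minority, k, seed)
    return edited_nearest_neighbours(X_over, y_over, k)


# Hypothetical toy data: 12 retained customers (label 0), 4 churned (label 1).
_rng = random.Random(42)
X = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(12)]
X += [[_rng.gauss(3, 1), _rng.gauss(3, 1)] for _ in range(4)]
y = [0] * 12 + [1] * 4

X_over, y_over = smote(X, y, minority=1)
X_res, y_res = smote_enn(X, y, minority=1)
print("original:", Counter(y))
print("after SMOTE:", Counter(y_over))
print("after SMOTE-ENN:", Counter(y_res))
```

In a real pipeline, resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.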

Contribution: This research provides a comprehensive evaluation and comparison of techniques for handling imbalanced data in churn prediction models. The study identifies SMOTE-ENN as the most effective method for resampling imbalanced data. Among the models tested, LightGBM (accuracy = 0.979) achieves the highest performance on the evaluation metrics. Additionally, the research shows that tree-based machine learning models generally outperform the other models when trained on imbalanced datasets.
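The abstract reports accuracy but does not list the other evaluation metrics used. As a hedged illustration of why accuracy is usually read alongside other measures on imbalanced data, the sketch below computes accuracy, precision, recall, and F1 from a confusion matrix. The function name and the toy predictions are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test set: 90 retained customers (0) and 10 churners (1).
y_true = [0] * 90 + [1] * 10

# A trivial baseline that never predicts churn.
always_retained = [0] * 100

# A hypothetical model: 5 false alarms, 7 of the 10 churners caught.
some_model = [0] * 85 + [1] * 5 + [1] * 7 + [0] * 3

baseline = binary_metrics(y_true, always_retained)
model = binary_metrics(y_true, some_model)
print("baseline:", baseline)
print("model:", model)
```

On this toy data the baseline reaches high accuracy while identifying no churners at all, which is why recall and F1 on the churn class are typically reported together with accuracy.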

Findings: Tree-based and ensemble models perform better than regression and probability-based methods when dealing with imbalanced data. SMOTE-ENN substantially improves the performance of the machine learning models.

Recommendations for Practitioners: Practitioners can deploy high-performance models, such as XGBoost and LightGBM, combined with effective resampling methods like SMOTE-ENN to predict customer churn in banking, marketing, and human resources.

Recommendation for Researchers: To further optimize the predictive models in this study, researchers can focus on feature selection, dimensionality reduction, or hyperparameter tuning.

Impact on Society: Customer churn reduces revenue and threatens competitive advantage, so businesses need effective retention strategies to maintain sustainable growth. High-performance customer churn prediction models can be an effective solution to this issue.

Future Research: Future work could deploy the model on real-world datasets, further optimize feature selection and hyperparameter tuning, and apply SHAP value analysis to identify the key features that most influence the model’s predictions.

Keywords: churn prediction, machine learning, imbalanced data, classification models, oversampling, undersampling, hybrid method, banking industry
