This thesis analyses machine learning algorithms for predicting P2P loan defaults using data from the Zonky platform covering February 2016 to October 2021. It compares logistic regression, discriminant analysis, classification and regression trees, random forest, Naive Bayes, K-Nearest Neighbors, AdaBoost, and XGBoost, evaluating them with metrics such as the confusion matrix, ROC/AUC, the Gini coefficient, the Kolmogorov-Smirnov statistic, and the Brier score. The results show that XGBoost and AdaBoost are the most effective, with Elastic Net Logistic Regression in third place. This study fills a gap in the research on default probability on the Zonky dataset and highlights payment behaviour as a key factor for predicting credit risk when evaluating P2P loans. It contributes to the existing literature on loan default probability prediction and provides insights for risk management in P2P lending.
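
To make the evaluation metrics named above concrete, the following is a minimal sketch of how AUC, the Gini coefficient, the Kolmogorov-Smirnov statistic, and the Brier score can be computed from true default labels and predicted default probabilities. It is not the thesis's implementation: the function names and the toy data (`y_toy`, `p_toy`) are invented for illustration, and the pairwise AUC formula is chosen for clarity rather than speed.

```python
from typing import List, Sequence, Tuple


def _split(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split scores into defaulters (label 1) and non-defaulters (label 0)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return pos, neg


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly; ties count 0.5."""
    pos, neg = _split(y_true, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient derived from AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = _split(y_true, scores)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Toy data, invented for illustration only (1 = default, 0 = repaid).
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]

if __name__ == "__main__":
    print("AUC:  ", auc(y_toy, p_toy))
    print("Gini: ", gini(y_toy, p_toy))
    print("KS:   ", ks_statistic(y_toy, p_toy))
    print("Brier:", brier_score(y_toy, p_toy))
```

AUC, Gini, and KS measure how well a model ranks defaulters above non-defaulters, and higher values are better. The Brier score measures how well the predicted probabilities are calibrated, and lower values are better, so a model can do well on one kind of metric and poorly on the other.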