Ensemble Undersampling to Handle Unbalanced Class on Cross-Project Defect Prediction
Aries Saifudin (a*, b), Yaya Heryadi (b), Lukas (b)
a) Informatics Engineering, Pamulang University
Jl. Puspitek Raya No. 46, Buaran, Serpong, Tangerang Selatan, Banten, Indonesia
*aries.saifudin[at]gmail.com
b) Computer Science Department, Graduate Program-Doctor of Computer Science, Bina Nusantara University
Jl. Kebon Jeruk Raya No. 27 Kebon Jeruk, Jakarta Barat, DKI Jakarta, Indonesia
Abstract
Many cross-project software defect prediction models have been proposed, but none performs well across datasets in general. Software defect datasets are usually imbalanced because they contain far more non-defective modules than defective ones. Class imbalance in the dataset can reduce the performance of the classifiers used in a software defect prediction model. This study proposes a Random Undersampling algorithm to balance the classes and ensemble techniques to reduce misclassification. The ensemble techniques used are the AdaBoost and Bagging algorithms. The results show that the software defect prediction model that integrates Random Undersampling with AdaBoost performs better and finds more defects than the other models.
Keywords: Cross-project, Software Defect Prediction, Random Undersampling, Ensemble
Topic: Informatic and Information System
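
Illustrative sketch (not the authors' implementation): the abstract combines Random Undersampling with ensemble learning. The Python sketch below shows one plausible reading of that idea under stated assumptions. Each ensemble member is trained on its own randomly undersampled, class-balanced subset, and members vote by simple majority. The base learner (a one-feature threshold stump), the function names, the toy data, and the majority-vote scheme are all assumptions made for illustration. The sketch is a bagging-style variant without bootstrap resampling, and AdaBoost's instance reweighting is not shown.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_stump(X, y):
    """Hypothetical base learner: predict 1 when row[f] >= t, chosen to minimise training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            errors = sum((row[f] >= t) != bool(label) for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return f, t


def fit_undersampled_ensemble(X, y, n_estimators=5, seed=0):
    """Train each stump on an independently undersampled copy of the data."""
    models = []
    for k in range(n_estimators):
        Xb, yb = random_undersample(X, y, seed=seed + k)
        models.append(fit_stump(Xb, yb))
    return models


def predict(models, row):
    """Majority vote over the stumps; 1 means 'defective'."""
    votes = sum(row[f] >= t for f, t in models)
    return int(votes * 2 > len(models))


# Toy data (made up): one metric per module, 16 non-defective vs 4 defective.
X = [[i] for i in range(20)]
y = [1 if i >= 16 else 0 for i in range(20)]

Xb, yb = random_undersample(X, y)
print(Counter(y), Counter(yb))          # class counts before and after balancing
models = fit_undersampled_ensemble(X, y)
print(predict(models, [17]), predict(models, [3]))
```

In practice, a library implementation of undersampling and of AdaBoost or Bagging would replace the hand-written stump and voting code; the sketch is only meant to make the balancing step and the ensemble step concrete.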