RFC: a feature selection algorithm for software defect prediction
Abstract:
Software defect prediction (SDP) performs statistical analysis of historical defect data to discover how defects have been distributed, so that defects in new software can be predicted effectively. However, software defect datasets contain redundant and irrelevant features that degrade the performance of defect predictors. To identify and remove these redundant and irrelevant features, we propose ReliefF-based clustering (RFC), a cluster-based feature selection algorithm. In RFC, the correlation between features is calculated based on symmetric uncertainty. According to the degree of correlation, RFC partitions the features into k clusters using the k-medoids algorithm, and finally selects representative features from each cluster to form the final feature subset. In the experiments, we compare RFC with classical feature selection algorithms on nine National Aeronautics and Space Administration (NASA) software defect prediction datasets in terms of area under the curve (AUC) and F-value. The experimental results show that RFC can effectively improve the performance of SDP.
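The clustering step summarized above (symmetric uncertainty between features, then k-medoids on that correlation) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the features are already discretized (SU is defined on discrete values, so numeric NASA metrics would need binning first), uses 1 - SU as the distance between two features, uses a greedy PAM-style initialization, and takes each cluster's medoid as its representative. The ReliefF component implied by the algorithm's name is not modeled here. All function names (`entropy`, `symmetric_uncertainty`, `k_medoids`, `select_representatives`) are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both features constant: SU is undefined, treated as 0 (assumption)
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mutual_info / (hx + hy)


def k_medoids(dist, k, max_iter=100):
    """Cluster items 0..n-1 into k groups given a symmetric distance matrix.

    Greedy PAM-style initialization, then alternate assignment and medoid
    update until the set of medoids stops changing. Returns {medoid: members}.
    """
    n = len(dist)
    k = min(k, n)

    def total_cost(chosen):
        # Sum of each item's distance to its nearest chosen medoid.
        return sum(min(dist[i][m] for m in chosen) for i in range(n))

    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    clusters = {}
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always belongs to its own cluster, so no cluster is empty.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


def select_representatives(features, k):
    """Return sorted indices of one representative (the medoid) per cluster."""
    n = len(features)
    dist = [[0.0 if i == j else 1.0 - symmetric_uncertainty(features[i], features[j])
             for j in range(n)] for i in range(n)]
    return sorted(k_medoids(dist, k))


if __name__ == "__main__":
    f0 = [0, 0, 1, 1, 0, 0, 1, 1]
    f2 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent of f0, so SU(f0, f2) = 0
    # Features 1 and 3 are exact duplicates (fully redundant) of features 0 and 2.
    print(select_representatives([f0, list(f0), f2, list(f2)], k=2))  # -> [0, 2]
```

Using 1 - SU as the distance places highly correlated (redundant) features close together, so each cluster gathers features that carry similar information. k-medoids suits feature selection better than k-means here because every cluster center is an actual feature rather than an averaged point that corresponds to no real column.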