The Impact of Automated Parameter Optimization on Defect Prediction Models

Authors: Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, Kenichi Matsumoto

Venue: TSE (IEEE Transactions on Software Engineering), Vol. 45, No. 7, pp. 683-711, 2019

Year: 2019

Abstract: Defect prediction models, classifiers that identify defect-prone software modules, have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case study of 18 datasets, we find that automated parameter optimization: (1) improves AUC performance by up to 40 percentage points; (2) yields classifiers that are at least as stable as those trained using default settings; (3) substantially shifts the importance ranking of variables, with as few as 28 percent of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers; (4) yields optimized settings for 17 of the 20 most sensitive parameters that transfer among datasets without a statistically significant drop in performance; and (5) adds less than 30 minutes of additional computation to 12 of the 26 studied classification techniques. While widely-used classification techniques like random forest and support vector machines are not optimization-sensitive, traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely-used techniques after optimization is applied. This highlights the importance of exploring the parameter space when using parameter-sensitive classification techniques.


BibTeX:

    author = "Chakkrit Tantithamthavorn and Shane McIntosh and Ahmed E. Hassan and Kenichi Matsumoto",
    title = "The Impact of Automated Parameter Optimization on Defect Prediction Models",
    year = "2019",
    pages = "683-711",
    journal = "IEEE Transactions on Software Engineering",
    volume = "45",
    number = "7"

Plain Text:

Chakkrit Tantithamthavorn, Shane McIntosh, Ahmed E. Hassan, and Kenichi Matsumoto, "The Impact of Automated Parameter Optimization on Defect Prediction Models," IEEE Transactions on Software Engineering, vol. 45, no. 7, pp. 683-711, 2019.