The Characteristics of False-Negatives in File-level Fault Prediction

Authors: Harold Valdivia-Garcia, Meiyappan Nagappan

Venue: 13th International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 73-82, 2017

Year: 2017

Abstract: Over the years, a plethora of works has proposed more and more sophisticated machine learning techniques to improve fault prediction models. However, past studies using product metrics from closed-source projects found a ceiling effect in the performance of fault prediction models. On the other hand, other studies have shown that process metrics are significantly better than product metrics for fault prediction. In our case study, therefore, we build models that include both product and process metrics taken together. We find that the ceiling effect found in prior studies exists even when we consider process metrics. We then qualitatively investigate the bug reports, source code files, and commit information for the bugs in the files that are false-negative in our fault prediction models trained using product and process metrics. Surprisingly, our qualitative analysis shows that bugs related to false-negative files and true-positive files are similar in terms of root causes, impact, and affected components, and consequently such similarities might be exploited to enhance fault prediction models.

BibTeX:

@inproceedings{haroldvaldivia-garcia2017tcofiffp,
    author = "Harold Valdivia-Garcia and Meiyappan Nagappan",
    title = "The Characteristics of False-Negatives in File-level Fault Prediction",
    year = "2017",
    pages = "73-82",
    booktitle = "Proceedings of the 13th International Conference on Predictive Models and Data Analytics in Software Engineering"
}

Plain Text:

Harold Valdivia-Garcia and Meiyappan Nagappan, "The Characteristics of False-Negatives in File-level Fault Prediction," 13th International Conference on Predictive Models and Data Analytics in Software Engineering, pp. 73-82