Automated topic naming to support cross-project analysis of software maintenance activities

Authors: Abram Hindle, Neil A. Ernst, Michael W. Godfrey, John Mylopoulos

Venue: MSR, 8th Working Conference on Mining Software Repositories, pp. 163–172, 2011

Year: 2011

Abstract: Researchers have employed a variety of techniques to extract underlying topics that relate to software development artifacts. Typically, these techniques use semi-unsupervised machine-learning algorithms to suggest candidate word-lists. However, word-lists are difficult to interpret in the absence of meaningful summary labels. Current topic modeling techniques assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for the developers. We propose a solution: automated labelled topic extraction. Topics are extracted using Latent Dirichlet Allocation (LDA) from commit-log comments recovered from source control systems such as CVS and BitKeeper. These topics are given labels from a generalizable cross-project taxonomy, consisting of non-functional requirements. Our approach was evaluated with experiments and case studies on two large-scale RDBMS projects: MySQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels relevant to these projects, which provides fresh insight into their evolving software development activities.

BibTeX:

@inproceedings{abramhindle2011atntscaosma,
    author = "Abram Hindle and Neil A. Ernst and Michael W. Godfrey and John Mylopoulos",
    title = "Automated topic naming to support cross-project analysis of software maintenance activities",
    year = "2011",
    pages = "163--172",
    booktitle = "Proceedings of the 8th Working Conference on Mining Software Repositories"
}

Plain Text:

Abram Hindle, Neil A. Ernst, Michael W. Godfrey, and John Mylopoulos, "Automated topic naming to support cross-project analysis of software maintenance activities," 8th Working Conference on Mining Software Repositories, pp. 163–172, 2011.