Software bertillonage

Authors: Julius Davies, Daniel M. German, Michael W. Godfrey, Abram Hindle

Venue: MSR (8th Working Conference on Mining Software Repositories), pp. 183–192, 2011

Year: 2011

Abstract: Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components -- such as external libraries or cloned source code -- is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need for the recovery of the provenance of software entities by a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, historical analyses, and other techniques. We liken our provenance goals to those of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th-century France before the advent of fingerprinting. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying library version information within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 150GB collection of open source Java libraries. An exploratory case study using a proprietary e-commerce Java application illustrates that the approach is both feasible and effective.
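
Illustration: the abstract describes signature matching against a database of candidate libraries only at a high level. The Python sketch below shows one plausible shape such matching could take; it is not the authors' implementation. It assumes that each class is reduced to a hash of its fully qualified name plus its sorted method signatures, that an index maps each hash to the artifacts containing it, and that candidates are ranked by Jaccard overlap. The function names, the toy corpus, and the artifact ids (lib-1.0, lib-1.1) are all hypothetical.

```python
import hashlib
from collections import defaultdict


def class_signature(fqn, methods):
    """Hash a class's fully qualified name plus its sorted method signatures.

    Assumption: this canonical form is illustrative only.
    """
    canonical = fqn + "\n" + "\n".join(sorted(methods))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def archive_signatures(classes):
    """classes maps a fully qualified class name to its method signatures."""
    return {class_signature(fqn, methods) for fqn, methods in classes.items()}


def build_index(corpus):
    """Map each class signature to the set of artifacts that contain it."""
    index = defaultdict(set)
    for artifact, classes in corpus.items():
        for sig in archive_signatures(classes):
            index[sig].add(artifact)
    return index


def rank_candidates(subject_classes, corpus, index):
    """Rank artifacts sharing at least one signature with the subject archive."""
    subject = archive_signatures(subject_classes)
    candidates = set()
    for sig in subject:
        candidates |= index.get(sig, set())
    scored = []
    for artifact in candidates:
        other = archive_signatures(corpus[artifact])
        jaccard = len(subject & other) / len(subject | other)
        scored.append((jaccard, artifact))
    # Highest overlap first.
    return sorted(scored, reverse=True)


# Hypothetical toy corpus: two versions of the same made-up library.
CORPUS = {
    "lib-1.0": {
        "org.example.Foo": ["void run()"],
        "org.example.Bar": ["int size()"],
    },
    "lib-1.1": {
        "org.example.Foo": ["void run()", "void stop()"],
        "org.example.Bar": ["int size()"],
    },
}
INDEX = build_index(CORPUS)

# An archive of unknown provenance that happens to match lib-1.1 exactly.
UNKNOWN = {
    "org.example.Foo": ["void stop()", "void run()"],
    "org.example.Bar": ["int size()"],
}
RANKED = rank_candidates(UNKNOWN, CORPUS, INDEX)
```

In this toy run, RANKED lists lib-1.1 first with a score of 1.0, and lib-1.0 second because it shares only the unchanged Bar class. Sorting method signatures before hashing makes the match independent of declaration order, which is one reason a signature-based comparison can be more forgiving than comparing archive bytes.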


BibTeX:

@inproceedings{davies2011bertillonage,
    author = "Julius Davies and Daniel M. German and Michael W. Godfrey and Abram Hindle",
    title = "Software bertillonage",
    year = "2011",
    pages = "183--192",
    booktitle = "Proceedings of the 8th Working Conference on Mining Software Repositories"
}

Plain Text:

Julius Davies, Daniel M. German, Michael W. Godfrey, and Abram Hindle, "Software bertillonage," in Proceedings of the 8th Working Conference on Mining Software Repositories (MSR), 2011, pp. 183–192.