Understanding software artifact provenance

Authors: Michael W. Godfrey

Venue: Science of Computer Programming, Vol. 97, No. P1, pp. 86–90, 2015

Year: 2015

Abstract: In a well-designed software system, units of related functionality are organized into modules and classes, which are in turn arranged into inheritance trees, package hierarchies, components, libraries, frameworks, and services. The trade-offs between simplicity versus flexibility and power are carefully considered, and interfaces are designed that expose the key functional properties of a component while hiding much of the complexity of the implementation details. However, over time the design integrity of a well-engineered system tends to decay as new features are added, as new quality attributes are emphasized, and as old architectural knowledge is lost when experienced development personnel shift to new jobs. Consequently, as developers and as users we often find ourselves looking at a piece of functionality or other design artifact and wondering, "Why is this here?" That is, we would like to examine the provenance of an artifact to understand its history and why it is where it is within the current design of the system. In this brief paper, we sketch some of the dimensions of the broad problem of extracting and reasoning about the provenance of software development artifacts. As a motivating example, we also describe some recent related work that uses hashing to quickly and accurately identify version information of embedded Java libraries. We motivate the need to model software artifact provenance. We sketch the dimensions of the problem space and discuss its analysis. We describe an example, using hashing to identify library version information.
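The abstract only names the hashing technique; the paper's actual method is not reproduced here. As a rough illustration of the general idea, the following minimal Python sketch assumes a hypothetical precomputed index that maps each known (library, version) release to the set of SHA-1 digests of its `.class` files. It hashes the class files found in an embedded archive and reports the release whose digest set overlaps most (Jaccard similarity). The names `class_hashes`, `identify_version`, and the index layout are illustrative assumptions, and archives are modeled as in-memory `name -> bytes` dicts; real JAR entries could be read with Python's `zipfile.ZipFile`.

```python
import hashlib

def class_hashes(archive: dict[str, bytes]) -> set[str]:
    """SHA-1 digest of every .class entry in an archive (modeled as name -> bytes)."""
    return {
        hashlib.sha1(data).hexdigest()
        for name, data in archive.items()
        if name.endswith(".class")
    }

def identify_version(archive: dict[str, bytes],
                     index: dict[tuple[str, str], set[str]]):
    """Return (library, version, score) for the known release whose class-file
    digests best match the archive, scored by Jaccard similarity.
    The index layout is a hypothetical assumption for this sketch."""
    observed = class_hashes(archive)
    best = None
    for (lib, version), known in index.items():
        union = observed | known
        score = len(observed & known) / len(union) if union else 0.0
        if best is None or score > best[2]:
            best = (lib, version, score)
    return best

# Usage with two made-up releases of a hypothetical library.
v1 = {"com/example/A.class": b"A-v1", "com/example/B.class": b"B-v1"}
v2 = {"com/example/A.class": b"A-v1", "com/example/B.class": b"B-v2"}
index = {
    ("example-lib", "1.0"): class_hashes(v1),
    ("example-lib", "2.0"): class_hashes(v2),
}
embedded = {
    "com/example/A.class": b"A-v1",
    "com/example/B.class": b"B-v2",
    "META-INF/MANIFEST.MF": b"Manifest-Version: 1.0",
}
print(identify_version(embedded, index))
```

Scoring by set overlap rather than requiring an exact match tolerates embedded copies that were partially modified or repackaged, at the cost of occasionally ranking closely related releases near each other.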

BibTeX:

    author = "Michael W. Godfrey",
    title = "Understanding software artifact provenance",
    year = "2015",
    pages = "86--90",
    journal = "Science of Computer Programming",
    volume = "97",
    number = "P1"

Plain Text:

Michael W. Godfrey, "Understanding software artifact provenance," Science of Computer Programming, vol. 97, no. P1, pp. 86–90, 2015.