
Discovering Design Features in Computer Programs

The objective of this research is to develop methods for retrieving segments of computer programs stored in a repository. The main purpose is program understanding for software maintenance and re-engineering. This research is part of a network project involving the IBM Toronto Laboratory, the University of Toronto and the University of Victoria. The project of the McGill node involves the development of pattern matching techniques for accessing code segments stored in a repository developed at the University of Toronto and involving analysis tools developed at the University of Victoria. The use of a set of software metrics as features for obtaining a concise representation of a program segment has been investigated. Dynamic programming is used to calculate distances between potentially similar code fragments in terms of insertions, deletions and substitutions of statements and expressions. Moreover, statistical models are used to detect matches between source code and abstract code descriptions written in an abstract language. The statistical models evaluate the similarity between an abstract description and the actual code as the probability that the abstract description can generate this particular code fragment. Finally, algorithms for storing and retrieving data from the University of Toronto repository have been devised, so that the McGill tools can be linked with other reverse engineering tools such as the one developed at the University of Victoria. The whole system runs in a distributed environment and is targeted at large (>1 MLOC) commercial software systems.
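The dynamic programming comparison described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes each code fragment has already been reduced to a sequence of statement-type labels, and the function name, the labels and the unit costs are all hypothetical choices made for illustration.

```python
from typing import Sequence


def statement_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    ins_cost: float = 1.0,   # assumed unit cost, not taken from the project
    del_cost: float = 1.0,   # assumed unit cost, not taken from the project
    sub_cost: float = 1.0,   # assumed unit cost, not taken from the project
) -> float:
    """Minimum total cost of turning fragment a into fragment b using
    insertions, deletions and substitutions of statements."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] holds the distance between a[:i] and b[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost      # delete every statement of a[:i]
    for j in range(1, cols):
        dist[0][j] = j * ins_cost      # insert every statement of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            # Matching statements cost nothing; differing ones are substituted.
            change = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,     # drop a[i-1]
                dist[i][j - 1] + ins_cost,     # add b[j-1]
                dist[i - 1][j - 1] + change,   # keep or substitute
            )
    return dist[-1][-1]


# Hypothetical fragments, abstracted to statement-type labels.
fragment_a = ["assign", "if", "call", "return"]
fragment_b = ["assign", "while", "call", "call", "return"]
distance = statement_edit_distance(fragment_a, fragment_b)
```

A small distance marks the two fragments as candidate matches. In practice the substitution cost would likely depend on how different the two statements are, for example by comparing their metric values, rather than being a fixed constant.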

M. Bernstein, R. DeMori, K. Kontogiannis, S. Lecalvez, M. McLachlan, E. Merlo (Ecole Polytechnique)

Thierry Baron
Mon Nov 13 10:43:02 EST 1995