SemdAnalysis

Short Description

The research investigates the combined use of semantic and structural information of programs to support the comprehension tasks involved in maintaining and reengineering software systems. Here, semantic refers to the domain-specific aspects (of both the problem and the development domains) of a software system. We call analysis that uses this type of information semantic directed analysis (semD analysis). The other dimension, structural, refers to aspects such as the actual syntactic structure of the program, along with the control and data flow that it represents. We call analysis that uses this type of information structure directed analysis (strD analysis).

Project Details

Maintaining and reengineering an existing software system requires a great deal of effort to be spent on understanding the source code, in order to determine the behavior, organization, and architecture of the software that are not reflected in its documentation. Various methods exist for the static analysis of software systems. Most of them focus on the structural information embedded in the source code, derived mostly from the programming language syntax (e.g., data and control flow). However, to extract the information needed to fully understand any part of the system, the software engineer must examine both the structural aspect of the source code and the nature of the problem domain (e.g., as reflected in comments, documentation, and variable names). Static analysis directly supports software comprehension, which is a key aspect of much of the software development and maintenance process. Comprehension is vital for learning to program, debugging, reuse, documentation, verification, and maintenance. With this in mind, we propose a framework that will provide the foundation for developing new static analysis methods, and for enhancing existing ones, by combining the analysis of the structural and semantic information embedded in the software.

Existing measurement and analysis methods are investigated, as well as their use in conjunction with the analysis of semantic information. As a means of analyzing the semantic information extracted from software, we use an advanced information retrieval method, namely Latent Semantic Indexing (LSI). We also investigate variants of LSI implementations, and ways to improve the representation of the semantic space by augmenting it with structural information.
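A minimal sketch of the core LSI computation follows, for illustration only. It assumes that each "document" is a code unit (e.g., a function) represented by a bag of terms taken from its identifiers and comments, uses raw term counts with no weighting, and picks the reduced dimension k by hand. Instead of calling a library SVD, it eigen-decomposes A^T A with the cyclic Jacobi method (A^T A = V S^2 V^T), so the document coordinates in the reduced space are the rows of V_k S_k. The names term_document_matrix, jacobi_eigen, and lsi_document_vectors are hypothetical; this is not the project's LSI implementation.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def term_document_matrix(docs: Dict[str, List[str]]) -> Tuple[List[str], List[str], Matrix]:
    """Raw-count term-by-document matrix A: rows are terms, columns are documents."""
    names = sorted(docs)
    terms = sorted({t for tokens in docs.values() for t in tokens})
    row = {t: i for i, t in enumerate(terms)}
    a = [[0.0] * len(names) for _ in terms]
    for j, name in enumerate(names):
        for t in docs[name]:
            a[row[t]][j] += 1.0
    return terms, names, a


def jacobi_eigen(m: Matrix, sweeps: int = 100, tol: float = 1e-12) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigendecomposition of a symmetric matrix.

    Returns (eigenvalues, V) where column i of V is the eigenvector for eigenvalue i.
    """
    n = len(m)
    a = [r[:] for r in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsi_document_vectors(a: Matrix, k: int) -> Matrix:
    """Coordinates of each document in a k-dimensional LSI space.

    The singular values are the square roots of the eigenvalues of A^T A,
    and document j maps to row j of V_k S_k.
    """
    n_terms, n_docs = len(a), len(a[0])
    ata = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
           for i in range(n_docs)]
    evals, v = jacobi_eigen(ata)
    top = sorted(range(n_docs), key=lambda i: evals[i], reverse=True)[:k]
    sigma = {i: math.sqrt(max(evals[i], 0.0)) for i in top}
    return [[v[d][i] * sigma[i] for i in top] for d in range(n_docs)]


def cosine(x: List[float], y: List[float]) -> float:
    """Cosine similarity, the usual comparison measure in LSI space."""
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    docs = {  # hypothetical code units -> terms from identifiers and comments
        "parse_config": ["parse", "config", "file", "key", "value"],
        "load_settings": ["load", "settings", "file", "key", "value"],
        "draw_button": ["draw", "button", "widget", "screen"],
        "render_widget": ["render", "widget", "screen", "draw"],
    }
    _, names, a = term_document_matrix(docs)
    vecs = dict(zip(names, lsi_document_vectors(a, k=2)))
    print(cosine(vecs["parse_config"], vecs["load_settings"]))
    print(cosine(vecs["parse_config"], vecs["draw_button"]))
```

With k = 2, the two configuration-related units land on one axis of the reduced space and the two drawing units on the other, so the first cosine is close to 1 and the second close to 0 even though the paired units share only some of their terms. Practical LSI setups usually add a term weighting (e.g., tf-idf or log-entropy) and rely on a numerical library's SVD for large matrices.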

A series of experiments is executed to validate the framework. These experiments compare its results with those of existing methods in various application areas. The framework will be used both to develop new analysis methods and to combine existing ones to provide better results. Applications of the framework and of the new analysis methods to support maintenance and understanding tasks (e.g., re-modularization, clone detection, re-documentation, separation of concerns, identification of interleaved code, identification of patterns, and feature location) are being investigated.
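One simple way to combine the two dimensions for a task such as flagging candidate clones or grouping units during re-modularization is a weighted sum of a semantic score and a structural score. The sketch below assumes the semantic vectors have already been computed (for example, by LSI), measures structural similarity as the Jaccard index of the sets of called functions, and uses an arbitrary weight alpha and threshold. The weighting scheme, parameters, and the names combined_similarity and candidate_pairs are hypothetical illustrations, not the framework's actual method.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Unit = Tuple[List[float], Set[str]]  # (semantic vector, names of called functions)


def cosine(x: List[float], y: List[float]) -> float:
    """Semantic similarity between two (e.g., LSI) document vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Structural similarity as the overlap of two call sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(x: Unit, y: Unit, alpha: float = 0.5) -> float:
    """alpha weights the semantic score; (1 - alpha) weights the structural score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * cosine(x[0], y[0]) + (1.0 - alpha) * jaccard(x[1], y[1])


def candidate_pairs(units: Dict[str, Unit], alpha: float = 0.5,
                    threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """All unit pairs whose combined score reaches the threshold, best first."""
    scored = []
    for a, b in combinations(sorted(units), 2):
        score = combined_similarity(units[a], units[b], alpha)
        if score >= threshold:
            scored.append((a, b, score))
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    units = {  # hypothetical semantic vectors and call sets
        "parse_config": ([2.0, 0.0], {"open", "read", "split"}),
        "load_settings": ([1.9, 0.1], {"open", "read", "strip"}),
        "draw_button": ([0.0, 1.8], {"fill_rect", "draw_text"}),
    }
    for a, b, score in candidate_pairs(units):
        print(f"{a} ~ {b}: {score:.3f}")  # load_settings ~ parse_config: 0.749
```

Setting alpha to 1 reduces the score to purely semantic (semD) similarity and setting it to 0 reduces it to purely structural (strD) similarity, which makes it easy to compare the combined measure against each dimension on its own.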

Some applications of the method so far are in:

Finally, the framework will provide a research platform that will help answer research questions such as:

Funding

The project is funded by the National Science Foundation through grant C-CR 02-04175.

People

Papers