SemdAnalysis
Short Description
The research investigates the combined use of semantic and structural information about programs to support the comprehension tasks involved in maintaining and reengineering software systems. Here, semantic refers to the domain-specific aspects (of both the problem and the development domains) of a software system. We call analysis that uses this type of information semantic directed analysis (semD analysis). The other dimension, structural, refers to aspects such as the syntactic structure of the program and the control and data flow it represents. We call analysis that uses this type of information structure directed analysis (strD analysis).
Project Details
Maintaining and reengineering an existing software system requires considerable effort to understand the source code and to determine the behavior, organization, and architecture of the software that are not reflected in its documentation. Various methods exist for the static analysis of software systems. Most of them focus on the structural information embedded in the source code, derived mostly from the syntax of the programming language (e.g., data and control flow). To fully understand any part of the system, however, the software engineer must examine both the structural aspects of the source code and the nature of the problem domain (e.g., comments, documentation, and variable names). Static analysis directly supports software comprehension, which underlies much of software development and maintenance. Comprehension is vital for learning to program, debugging, reuse, documentation, verification, and maintenance. With this in mind, we propose a framework that will provide the foundation for developing new static analysis methods, and for enhancing existing ones, by combining the analysis of the structural and semantic information embedded in the software.
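The central idea of combining semantic and structural information can be illustrated with a simple blend of two similarity scores. The sketch below is hypothetical: it assumes a linear weighting controlled by a made-up parameter `alpha`, and uses the overlap of call sets (Jaccard index) as the structural measure. Neither choice is taken from the project's publications; they only show the shape of such a combination.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Structural similarity as the overlap of two sets of structural facts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(semantic: float, facts_a: set[str], facts_b: set[str],
                        alpha: float = 0.5) -> float:
    """Hypothetical blend of a semantic score (e.g. an LSI cosine) and a structural score.

    alpha = 1 uses only the semantic score, alpha = 0 only the structural one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * semantic + (1.0 - alpha) * jaccard(facts_a, facts_b)


# Hypothetical inputs: a semantic score as an LSI model might report it, and the
# sets of functions each routine calls (a simple structural fact).
push_calls = {"ensureCapacity", "assertValid"}
pop_calls = {"isEmpty", "assertValid"}
print(combined_similarity(0.95, push_calls, pop_calls, alpha=0.6))
```

A linear blend is only the simplest option; the point is that two routines can be scored as related by what they say (identifiers, comments) and by what they do (calls, data flow) at the same time.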
We investigate existing measurement and analysis methods, as well as their use in conjunction with the analysis of semantic information. As a means of analyzing the semantic information extracted from software, we use an advanced information retrieval method, Latent Semantic Indexing (LSI). We also investigate variants of LSI implementations and ways to improve the representation of the semantic space by augmenting it with structural information.
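As a rough illustration of how LSI can be applied to source code, the following Python sketch (using NumPy) treats each file as a document, builds a term-by-document matrix from the words in identifiers and comments, reduces it with a truncated singular value decomposition, and compares documents by cosine similarity in the reduced space. The tokenizer, the stop-word list, the raw term-frequency weighting, and all function names are assumptions made for this example; the project's own preprocessing and weighting may differ.

```python
import re
from collections import Counter

import numpy as np

# Hypothetical stop list: C keywords/types and English words with little domain meaning.
STOP_WORDS = {"an", "the", "for", "int", "void", "char", "const", "return"}


def tokenize(source: str) -> list[str]:
    """Extract lowercase terms from identifiers and comments.

    camelCase identifiers are split into parts ('stackPush' -> 'stack', 'push');
    one-letter parts and stop words are dropped.
    """
    terms = []
    for word in re.findall(r"[A-Za-z]+", source):
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            term = part.lower()
            if len(term) > 1 and term not in STOP_WORDS:
                terms.append(term)
    return terms


def lsi_document_vectors(docs: dict[str, str], k: int) -> dict[str, np.ndarray]:
    """Map each document to its coordinates in a k-dimensional LSI space."""
    names = list(docs)
    counts = [Counter(tokenize(docs[name])) for name in names]
    vocab = sorted(set().union(*counts))
    # Term-by-document matrix: one row per term, one column per document,
    # holding raw term frequencies (no tf-idf or log-entropy weighting here).
    a = np.array([[c[term] for c in counts] for term in vocab], dtype=float)
    # Thin SVD: a = u @ diag(s) @ vt, with singular values in descending order.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))
    # Rows of V_k * S_k are the document coordinates in the reduced space.
    coords = vt[:k].T * s[:k]
    return dict(zip(names, coords))


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


docs = {
    "push.c": "/* push an item onto the stack */ void stackPush(Stack *s, int item) { s->items[s->top++] = item; }",
    "pop.c": "/* pop the top item from the stack */ int stackPop(Stack *s) { return s->items[--s->top]; }",
    "log.c": '/* open the log file for writing */ FILE *openLogFile(const char *path) { return fopen(path, "w"); }',
}
vecs = lsi_document_vectors(docs, k=2)
print(f"push~pop: {cosine(vecs['push.c'], vecs['pop.c']):.2f}")
print(f"push~log: {cosine(vecs['push.c'], vecs['log.c']):.2f}")
```

In this toy corpus, `push.c` and `pop.c` share the stack vocabulary and end up close together in the two-dimensional space, while `log.c` shares no terms with them. The number of retained dimensions `k` is the main tuning knob: too few dimensions merge unrelated concepts, too many keep noise.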
A series of experiments is executed to validate the framework. These experiments compare its results with those of existing methods in various application areas. The framework will be used both to develop new analysis methods and to combine existing ones to provide better results. We also investigate applications of the framework and of the new analysis methods to support maintenance and understanding tasks (e.g., re-modularization, clone detection, re-documentation, separation of concerns, identification of interleaved code, identification of patterns, and feature location).
Applications of the method so far include:
- Identification of Abstract Data Types
- Defining and measuring the Semantic Cohesion of software modules
- Identification of high-level concept clones
- Identification of concerns
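Two of the applications listed above, semantic cohesion and high-level concept clones, can be sketched on top of LSI document vectors. The sketch below assumes that cohesion is the average pairwise cosine similarity of a module's members, and it treats pairs of code units whose similarity reaches a threshold as clone candidates. These definitions, the threshold value, and the vectors are hypothetical and chosen for illustration, not taken from the project's papers.

```python
import math
from itertools import combinations

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_cohesion(members: dict[str, Vector]) -> float:
    """Assumed measure: average pairwise cosine similarity of a module's members."""
    pairs = list(combinations(members.values(), 2))
    if not pairs:
        return 1.0  # assumed convention: a one-member module is trivially cohesive
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def concept_clone_candidates(vectors: dict[str, Vector],
                             threshold: float) -> list[tuple[str, str, float]]:
    """Pairs of code units whose similarity reaches the threshold, most similar first."""
    candidates = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold:
            candidates.append((a, b, sim))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical 2-D LSI vectors for four functions (not real measurements).
vectors = {
    "stackPush": [0.9, 0.1],
    "stackPop": [0.8, 0.2],
    "stackPeek": [0.85, 0.05],
    "openLogFile": [0.1, 0.95],
}
stack_module = {n: vectors[n] for n in ("stackPush", "stackPop", "stackPeek")}
mixed_module = {n: vectors[n] for n in ("stackPush", "openLogFile")}

print(f"cohesion(stack module) = {semantic_cohesion(stack_module):.2f}")
print(f"cohesion(mixed module) = {semantic_cohesion(mixed_module):.2f}")
for a, b, sim in concept_clone_candidates(vectors, threshold=0.985):
    print(f"clone candidate: {a} ~ {b} ({sim:.3f})")
```

A module whose members all talk about the same concept scores close to 1, while a module that mixes stack handling with file logging scores much lower, which is the intuition a semantic cohesion measure is meant to capture.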
Finally, the framework will provide a research platform that will help answer research questions such as:
- How well do information retrieval methods perform in extracting meaningful semantic information from the source code and its associated documentation?
- What is the best way to incorporate the semantic information into the source code?
- Does the combination of semantic and structural information improve the results of static analysis? What are the best ways to perform this combination?
- What new software measures and metrics can be computed using semantic information?
- Which existing measurement and analysis methods yield better results by integrating semantic information?
Funding
The project is funded by the National Science Foundation through grant C-CR 02-04175.
People
Papers
- Marcus, A., Maletic, J.I., "Identification of High-Level Concept Clones in Source Code", in Proceedings of the 16th IEEE International Conference on Automated Software Engineering (ASE 2001), San Diego, CA, USA, November 26-29, 2001, pp. 107-114.
- Maletic, J.I., Marcus, A., "Supporting Program Comprehension Using Semantic and Structural Information", in Proceedings of the 23rd International Conference on Software Engineering (ICSE 2001), Toronto, Ontario, Canada, May 12-19, 2001, pp. 103-112.
- Maletic, J.I., Marcus, A., "Using Latent Semantic Analysis to Identify Similarities in Source Code to Support Program Understanding", in Proceedings of the 12th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2000), Vancouver, British Columbia, Canada, November 13-14, 2000, pp. 46-53.
- Maletic, J.I., Marcus, A., "Support for Software Maintenance Using Latent Semantic Analysis", in Proceedings of the 4th Annual IASTED International Conference on Software Engineering and Applications (SEA 2000), Las Vegas, Nevada, USA, November 6-9, 2000, pp. 250-255.
- Maletic, J.I., Valluri, N., "Automatic Software Clustering via Latent Semantic Analysis", in Proceedings of the 14th IEEE International Conference on Automated Software Engineering (ASE 1999), Cocoa Beach, Florida, USA, October 1999, pp. 251-254.