Context-Aware Models for Automatic Source Code Summarization
Source code summarization is a program comprehension task that consists of writing natural language descriptions of source code. These summaries are important because they are an essential part of software documentation, such as the descriptions in APIs. They are also necessary for the maintenance of legacy software systems. When I started my work, the state of the art for automatic source code summarization consisted of neural networks developed for machine translation. They were designed to accept a snippet of source code, usually a subroutine, as a sequence of tokens and generate an English language description. These techniques were based on sequence-to-sequence learning, i.e., the summary sequence was built one word at a time, using an attention mechanism and the input code sequence. However, some of the information required to summarize the subroutine descriptively is often not inside the subroutine. The necessary information lives in the "context" around the code, such as other subroutines, files, and build files, as well as pre-learnt human knowledge. In this dissertation, I present my research on modeling various types of contextual information for better automatic source code summarization.
History
Date Created
2024-04-12
Date Modified
2024-05-02
Defense Date
2024-04-12
CIP Code
- 14.0901
Research Director(s)
Collin McMillan
Committee Members
Toby Li, Joanna Cecilia da Silva Santos, Yu Huang
Degree
- Doctor of Philosophy
Degree Level
- Doctoral Dissertation
Language
- English
Library Record
006583167
OCLC Number
1432453179
Publisher
University of Notre Dame
Additional Groups
- Computer Science and Engineering
Program Name
- Computer Science and Engineering