posted on 2024-05-04, 12:08 authored by Aakash Bansal
<p>Source code summarization is a program comprehension task that consists of writing natural language descriptions of source code. These summaries are important because they are an essential part of software documentation, such as the descriptions in APIs. They are also necessary for the maintenance of legacy software systems. When I started my work, the state of the art in automatic source code summarization was neural networks developed for machine translation. They were designed to accept a snippet of source code, usually a subroutine, as a sequence of tokens and generate an English-language description. These techniques were based on sequence-to-sequence learning, i.e., the summary was built one word at a time, using an attention mechanism over the code sequence. However, some of the information required to summarize a subroutine descriptively is often not inside the subroutine itself. The necessary information lives in the "context" around the code, such as other subroutines, files, and build files, as well as in pre-learned human knowledge. In this dissertation, I present my research on modeling various types of contextual information for better automatic source code summarization.</p>