University of Notre Dame
JiangS042018D.pdf (3.18 MB)

Improving Program Comprehension Using Neural Machine Translation

thesis
posted on 2018-04-09, 00:00 authored by Siyuan Jiang

Program comprehension is the task of understanding software projects. Documentation is one effective way to help programmers comprehend code, but writing it manually costs significant time and effort, so software documents are often incomplete and out of date. Automatic documentation generation aims to address this lack of documentation and to improve documentation quality.

In this dissertation, I summarize my work on improving program comprehension by using neural machine translation to generate documentation. First, I conduct two user studies of how programmers comprehend code before and after they make changes. Specifically, I study how programmers perform change impact analysis, which is the task of finding the source code affected by a change. The studies show that programmers do more change impact analysis before they make changes than after. When programmers need to understand a change in a software repository, a summary of the change is an important part of the comprehension process. My first documentation-generation project therefore generates short summaries of software changes using neural machine translation (NMT), a neural-network approach to translating between natural languages. This project demonstrates that NMT can also translate diffs (the output of text differencing techniques) into English text. My second project focuses on topic labeling to generate descriptions of key functionalities in software projects. Topic labeling is the task of labeling the hidden topics in topic models, which program comprehension tools and research often use to find key functionalities. These tools and studies assume that the key functionalities of a software project correspond to the hidden topics of a topic model. However, the topics are represented as lists of words with probabilities, which are difficult to interpret, and labeling them is often a manual task. My project uses NMT to translate topics represented as lists of words into English text. The results show that NMT-generated descriptions help programmers understand software projects more than the lists of words do.

History

Date Created

2018-04-09

Date Modified

2018-10-05

Defense Date

2018-03-07

Research Director(s)

Collin McMillan

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Program Name

  • Computer Science and Engineering
