Improving Program Comprehension Using Neural Machine Translation

Jiang, Siyuan

doi:10.7274/5h73pv6635q

thesis

posted on 2018-04-09, 00:00 authored by Siyuan Jiang

Program comprehension is the task of understanding software projects. Documentation is one effective way to help programmers comprehend code. However, writing documentation manually takes significant time and effort. As a result, software documentation is often incomplete and out of date. Automatic documentation generation aims to address this lack of documentation and to improve its quality.

In this dissertation, I summarize my work on improving program comprehension by using neural machine translation to generate documentation. First, I conduct two user studies of how programmers comprehend code before and after they make changes. Specifically, I study how programmers perform change impact analysis, the task of finding source code that is affected by a change. The studies show that programmers do more change impact analysis before they make changes than after.

When programmers need to understand a change in a software repository, a summary of the change is an important part of the comprehension process. My first documentation-generation project produces short summaries of software changes using neural machine translation. Neural machine translation (NMT) is a neural network approach to translating between natural languages. This project demonstrates that NMT can also translate diffs (the output of text differencing techniques) into English text.

My second project focuses on topic labeling to generate descriptions of key functionalities in software projects. Topic labeling is the task of labeling the hidden topics in topic models, which program comprehension tools and research often use to find key functionalities. These tools and studies assume that the key functionalities of a software project correspond to the hidden topics in a topic model. However, each topic is represented as a list of words with probabilities, which is difficult to interpret, and labeling topics is often done manually. My project uses NMT to translate topics represented as word lists into English text. The results show that the NMT-generated descriptions are more helpful than the word lists for programmers trying to understand software projects.
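The abstract says the first project translates a diff into English, but it does not describe how the diff is presented to the network. The sketch below is a minimal illustration, assuming the source side of the translation is a whitespace-tokenized sequence of changed lines built with Python's standard `difflib`. The function name `diff_to_source_tokens` and this tokenization scheme are hypothetical and are not the dissertation's actual preprocessing; the NMT model itself is out of scope.

```python
import difflib


def diff_to_source_tokens(old: str, new: str) -> list[str]:
    """Hypothetical preprocessing: flatten the changed lines of a diff into
    one token sequence that a sequence-to-sequence (NMT) model could read."""
    tokens: list[str] = []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        tag, body = line[:2], line[2:]
        # ndiff prefixes: "- " removed, "+ " added, "  " unchanged, "? " hint.
        # Keep only removed and added lines.
        if tag not in ("- ", "+ "):
            continue
        # Emit the +/- marker as its own token so added and removed code
        # stay distinguishable, then split the code on whitespace.
        tokens.append(tag.strip())
        tokens.extend(body.split())
    return tokens


if __name__ == "__main__":
    before = "x = 1\nprint(x)\n"
    after = "x = 2\nprint(x)\n"
    print(diff_to_source_tokens(before, after))
    # prints ['-', 'x', '=', '1', '+', 'x', '=', '2']
```

Unchanged lines are dropped in this sketch so the source sequence stays short; a real system might instead keep some context lines or use a finer-grained code tokenizer.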

History

Date Created

2018-04-09

Date Modified

2018-10-05

Defense Date

2018-03-07

Research Director(s)

Collin McMillan

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Program Name

Computer Science and Engineering
