Low Data Dialogue Act Classification for Virtual Agents during Debugging

Master's Thesis

Abstract

A “dialogue act” is a written or spoken action during a conversation. Dialogue acts are usually only a few words long, and researchers divide them into a relatively small set (often fewer than 10) of dialogue act types, such as eliciting information, expressing an opinion, or making a greeting. Research interest in automatic classification of dialogue acts has grown recently due to the proliferation of Virtual Agents (VAs) such as Siri, Cortana, and Alexa. Unfortunately, gains made in VA development in one domain generally do not transfer to other domains, since the composition of dialogue acts differs across conversations. In this thesis, I target the problem of dialogue act classification, in a low data setting, for a VA that assists software engineers repairing bugs. A problem in the software engineering (SE) domain is that very little sample data exists. Therefore, I present a transfer-learning approach that learns from a much larger dataset of general business conversations and applies that knowledge to a manually created corpus of debugging conversations, collected from 30 professional developers in a “Wizard of Oz” experiment and manually annotated with a predetermined dialogue act set. In experiments, I observe improvements of between 8% and 20% over two key baselines. Additionally, I present a separate dialogue act classifier on the manually collected dataset that uses a manually discovered SE-specific dialogue act set and achieves on average 69% precision and 50% recall over 5-fold cross validation.

Attributes

Attribute Name  Values
Author Andrew Wood
Contributor Jane Cleland-Huang, Committee Member
Contributor David Chiang, Committee Member
Contributor Collin McMillan, Research Director
Degree Level Master's Thesis
Degree Discipline Computer Science and Engineering
Degree Name Master of Science in Computer Science and Engineering
Banner Code MSCSE
Defense Date 2019-03-19
Submission Date 2019-04-08
Subject
  • Virtual Assistants
  • Machine Learning
  • Dialogue Act Classification
  • Natural Language Processing
  • Virtual Agents
Language English
Record Visibility and Access Public
