University of Notre Dame
WoodAE042019T.pdf (590.73 kB)

Low Data Dialogue Act Classification for Virtual Agents during Debugging

thesis
posted on 2019-04-08, 00:00 authored by Andrew Wood

A 'dialogue act' is a written or spoken action during a conversation. Dialogue acts are usually only a few words long, and researchers divide them into a relatively small set (often fewer than 10) of dialogue act types, such as eliciting information, expressing an opinion, or making a greeting. Research interest in automatic classification of dialogue acts has grown recently due to the proliferation of Virtual Agents (VAs) such as Siri, Cortana, and Alexa. Unfortunately, gains made in VA development in one domain are generally not applicable to other domains, since the composition of dialogue acts differs across conversations. In this thesis, I target the problem of dialogue act classification for a VA that assists software engineers in repairing bugs, in a low data setting. A problem in the software engineering (SE) domain is that very little sample data exists. Therefore, I present a transfer-learning approach that learns from a much larger dataset of general business conversations and applies that knowledge to a manually created corpus of debugging conversations. The corpus was collected from 30 professional developers in a 'Wizard of Oz' experiment and manually annotated with a predetermined dialogue act set. In experiments, I observe improvements of between 8% and 20% over two key baselines. Additionally, I present a separate dialogue act classifier on the manually collected dataset that uses a manually discovered SE-specific dialogue act set and achieves on average 69% precision and 50% recall over 5-fold cross-validation.

History

Date Modified

2019-06-08

CIP Code

  • 14.0901

Research Director(s)

Collin McMillan

Committee Members

Jane Cleland-Huang
David Chiang

Degree

  • Master of Science in Computer Science and Engineering

Degree Level

  • Master's Thesis

Language

  • English

Alternate Identifier

1103924345

Library Record

5105886

OCLC Number

1103924345

Program Name

  • Computer Science and Engineering
