University of Notre Dame
LevinboimT042017D.pdf (1.86 MB)

Invertibility and Transitivity in Low-Resource Machine Translation

thesis
posted on 2017-04-13, 00:00 authored by Tomer Levinboim

Translation is a process that maps sentences in a source language to sentences in a target language while preserving the meaning of the original text. Viewed as a mathematical relation, translation could be expected to exhibit two abstract properties – transitivity and invertibility. Transitivity implies that translation through an intermediate (third) language should not differ much from direct source-to-target translation. Invertibility implies that translating a source sentence into the target language and then back into the source language should likely result in the original sentence. However, we notice that these two properties are generally ignored by traditional statistical techniques for machine translation, and furthermore, that earlier research that does take them into account fails to fully utilize them.

In this dissertation, we describe novel machine learning algorithms and techniques that promote or better employ transitivity and invertibility. We integrate our techniques into the phrase-based machine translation pipeline and carry out translation experiments in the “low-resource” data scenario, which assumes the amount of training data is limited (a realistic assumption for most language pairs), and that syntactic analyzers of the source language are unavailable. Our experiments demonstrate that when faced with limited training data, phrase-based machine translation quality can significantly benefit from invertibility and transitivity considerations.

History

Date Created

2017-04-13

Date Modified

2018-10-30

Defense Date

2017-04-07

Research Director(s)

David Chiang

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Program Name

  • Computer Science and Engineering
