Invertibility and Transitivity in Low-Resource Machine Translation

Doctoral Dissertation

Abstract

Translation is a process that maps sentences in a source language to sentences in a target language while preserving the meaning of the original text. If we view translation as a mathematical relation, we might expect it to exhibit two abstract properties: transitivity and invertibility. Transitivity implies that translating through an intermediate (third) language should not differ much from translating directly from source to target. Invertibility implies that translating a source sentence into a target language and then back into the source language should likely recover the original sentence. However, we observe that these two properties are generally ignored by traditional statistical machine translation techniques, and that earlier research that does take them into account fails to fully exploit them.
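To make the two properties concrete, the following is a minimal sketch on toy probabilistic word-translation tables, where `table[x][y]` stands for p(y | x). It chains two tables through an intermediate language by summing over the intermediate word, p(z | x) = Σ_y p(y | x) p(z | y), which is the standard pivot "triangulation" idea. It is not necessarily the technique developed in this dissertation. All words, probabilities, and names (`FR_EN`, `EN_DE`, `EN_FR`, `FR_DE_DIRECT`, `compose`, `total_variation`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy tables: table[x][y] = p(y | x). Invented numbers, illustration only.
FR_EN = {"maison": {"house": 0.8, "home": 0.2}}
EN_DE = {"house": {"Haus": 0.9, "Heim": 0.1}, "home": {"Heim": 0.7, "Haus": 0.3}}
EN_FR = {"house": {"maison": 0.9, "domicile": 0.1}, "home": {"maison": 0.6, "foyer": 0.4}}
FR_DE_DIRECT = {"maison": {"Haus": 0.75, "Heim": 0.25}}


def compose(first, second):
    """Chain two tables through an intermediate language:
    p(z | x) = sum over y of p(y | x) * p(z | y)."""
    composed = {}
    for x, dist in first.items():
        out = defaultdict(float)
        for y, p_y in dist.items():
            # Words missing from the second table contribute nothing.
            for z, p_z in second.get(y, {}).items():
                out[z] += p_y * p_z
        composed[x] = dict(out)
    return composed


def total_variation(p, q):
    """Half the L1 distance between two distributions (0 means identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Transitivity: pivoting FR -> EN -> DE should roughly agree with direct FR -> DE.
pivoted = compose(FR_EN, EN_DE)["maison"]
print(pivoted)
print(total_variation(pivoted, FR_DE_DIRECT["maison"]))

# Invertibility: FR -> EN -> FR should return most probability mass to the original word.
round_trip = compose(FR_EN, EN_FR)["maison"]
print(round_trip.get("maison", 0.0))
```

With these made-up numbers, the pivoted distribution is roughly Haus ≈ 0.78 and Heim ≈ 0.22, about 0.03 in total variation from the direct table. The round trip returns about 0.84 of the probability mass to "maison". A real system would estimate such tables from parallel data and work with phrases rather than single words.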

In this dissertation, we describe novel machine learning algorithms and techniques that promote or better exploit transitivity and invertibility. We integrate our techniques into the phrase-based machine translation pipeline and carry out translation experiments in the "low-resource" data scenario, which assumes that the amount of training data is limited (a realistic assumption for most language pairs) and that syntactic analyzers for the source language are unavailable. Our experiments demonstrate that when training data is limited, phrase-based machine translation quality can benefit significantly from invertibility and transitivity considerations.

Attributes

Author: Tomer Levinboim
Contributor: David Chiang, Research Director
Degree Level: Doctoral Dissertation
Degree Discipline: Computer Science and Engineering
Degree Name: Doctor of Philosophy
Submission Date: 2017-04-13
Subject:
  • Low-resource machine translation: building automatic translation systems with limited amounts of data
Language:
  • English
Access Rights: Open Access
Content License:
