Invertibility and Transitivity in Low-Resource Machine Translation

Levinboim, Tomer

doi:10.7274/m900ns08m2r

thesis

posted on 2017-04-13, authored by Tomer Levinboim

Translation is a process that maps sentences in a source language to sentences in a target language while preserving the meaning of the original text. Viewed as a mathematical relation, translation could be expected to exhibit two abstract properties: transitivity and invertibility. Transitivity implies that translating through an intermediate (third) language should not differ much from translating directly from source to target. Invertibility implies that translating a source sentence into the target language and then back into the source language should likely recover the original sentence. However, we notice that these two properties are generally ignored by traditional statistical techniques for machine translation, and furthermore, that earlier research that does take them into account fails to fully utilize them.
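To make the two properties concrete, here is a minimal Python sketch that treats word translation as a conditional probability table, table[x][y] = p(y | x). The vocabularies, the probabilities, and the helper names (`compose`, `total_variation`, `round_trip`) are hypothetical and chosen only for illustration; this is not the dissertation's method, just the standard idea of marginalizing over a pivot language (transitivity) and measuring how much probability returns to the input after a round trip (invertibility).

```python
# Translation modeled as a conditional table: table[x][y] = p(y | x).
# Hypothetical type alias; needs Python 3.9+ for built-in generic syntax.
Table = dict[str, dict[str, float]]


def compose(first: Table, second: Table) -> Table:
    """Pivot through an intermediate language by marginalizing it out:
    p(z | x) = sum over y of p(y | x) * p(z | y)."""
    result: Table = {}
    for x, pivots in first.items():
        dist: dict[str, float] = {}
        for y, p_y_given_x in pivots.items():
            for z, p_z_given_y in second.get(y, {}).items():
                dist[z] = dist.get(z, 0.0) + p_y_given_x * p_z_given_y
        result[x] = dist
    return result


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two distributions; 0 means identical."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def round_trip(forward: Table, backward: Table, word: str) -> float:
    """Probability that translating `word` out and back returns `word` itself."""
    return compose(forward, backward).get(word, {}).get(word, 0.0)


# Invented toy tables (German -> English -> French), not real data.
de_en: Table = {"haus": {"house": 0.8, "home": 0.2}}
en_fr: Table = {
    "house": {"maison": 0.9, "domicile": 0.1},
    "home": {"maison": 0.5, "domicile": 0.5},
}
de_fr_direct: Table = {"haus": {"maison": 0.85, "domicile": 0.15}}
en_de: Table = {
    "house": {"haus": 0.9, "gebäude": 0.1},
    "home": {"haus": 0.6, "heim": 0.4},
}

# Transitivity: the pivoted distribution should be close to the direct one.
de_fr_pivot = compose(de_en, en_fr)
print(de_fr_pivot["haus"])  # maison ≈ 0.82, domicile ≈ 0.18
print(total_variation(de_fr_pivot["haus"], de_fr_direct["haus"]))  # ≈ 0.03

# Invertibility: a round trip should usually recover the source word.
print(round_trip(de_en, en_de, "haus"))  # ≈ 0.84
```

In this toy setting the pivoted German-to-French distribution differs from the direct one by a total variation distance of about 0.03, and roughly 84% of the round-trip probability returns to `haus`. Phrase-based systems work with phrase tables rather than single words, so this only illustrates the shape of the idea.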

In this dissertation, we describe novel machine learning algorithms and techniques that promote or better employ transitivity and invertibility. We integrate our techniques into the phrase-based machine translation pipeline and carry out translation experiments in the “low-resource” data scenario, which assumes the amount of training data is limited (a realistic assumption for most language pairs), and that syntactic analyzers of the source language are unavailable. Our experiments demonstrate that when faced with limited training data, phrase-based machine translation quality can significantly benefit from invertibility and transitivity considerations.

History

Date Created

2017-04-13

Date Modified

2018-10-30

Defense Date

2017-04-07

Research Director(s)

David Chiang

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Language

English

Program Name

Computer Science and Engineering

Keywords

Low resource machine translation, building automatic translation systems with limited amounts of data