Translation is a process that maps sentences in a source language to sentences in a target language while preserving the meaning of the original text. Viewed as a mathematical relation, translation could be expected to exhibit two abstract properties: transitivity and invertibility. Transitivity implies that translation through an intermediate (third) language should not differ much from direct source-to-target translation. Invertibility implies that translating a source sentence into the target language and then back into the source language should likely yield the original sentence. However, we observe that traditional statistical techniques for machine translation generally ignore these two properties, and furthermore, that earlier research that does take them into account fails to fully utilize them.
In this dissertation, we describe novel machine learning algorithms and techniques that promote or better employ transitivity and invertibility. We integrate our techniques into the phrase-based machine translation pipeline and carry out translation experiments in the “low-resource” data scenario, which assumes that the amount of training data is limited (a realistic assumption for most language pairs) and that syntactic analyzers of the source language are unavailable. Our experiments demonstrate that, when training data is limited, phrase-based machine translation quality can benefit significantly from invertibility and transitivity considerations.