Learning Hyperparameters for Neural Machine Translation

Doctoral Dissertation


Machine Translation, the subfield of Computer Science that focuses on translating between two human languages, has benefited greatly from neural networks. However, neural machine translation systems have complicated architectures with many hyperparameters that must be chosen manually. These are frequently selected either through a grid search over candidate values or by reusing values common in the literature. Such choices are not theoretically justified, and the same values are not optimal for all language pairs and datasets.

Fortunately, the inherent structure of the problem allows these hyperparameters to be optimized during training. Traditionally, the hyperparameters of a system are chosen first, and a learning algorithm then optimizes all of the parameters within the model. In this work, I propose three methods that learn the optimal hyperparameters while the model is being trained, collapsing these two steps into one. First, I propose using group regularizers to learn the number and size of the hidden neural network layers. Second, I demonstrate how a perceptron-like tuning method can address the known problems of undertranslation and label bias. Finally, I propose an Expectation-Maximization-based method to learn the optimal vocabulary size and granularity. Drawing on techniques from machine learning and numerical optimization, this dissertation shows how to learn the hyperparameters of a Neural Machine Translation system while training the model itself.
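To illustrate how a group regularizer can shrink a layer during training, here is a minimal sketch of one common choice: the proximal operator of an ℓ2,1 (group lasso) penalty, applied after an ordinary gradient step. It assumes each row of a weight matrix holds the weights of one hidden unit; the function names `prox_group_l21` and `active_units`, the row layout, and the choice of the ℓ2,1 norm are illustrative assumptions, not the dissertation's exact formulation. Rows whose norm falls below the threshold become exactly zero, which effectively removes that unit.

```python
import math
from typing import List

# Assumed layout (hypothetical): row i holds all weights belonging to hidden unit i.
Matrix = List[List[float]]


def prox_group_l21(weights: Matrix, step: float, lam: float) -> Matrix:
    """Proximal operator of lam * sum_i ||w_i||_2 (group lasso over rows).

    Intended to be applied after a gradient step with learning rate `step`.
    Each row is shrunk toward zero by step * lam in l2 norm; rows whose norm
    is at or below that threshold are set exactly to zero.
    """
    threshold = step * lam
    result: Matrix = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        if norm <= threshold:
            # Whole group zeroed: this hidden unit is pruned.
            result.append([0.0] * len(row))
        else:
            # Shrink the group's norm by `threshold`, keeping its direction.
            scale = 1.0 - threshold / norm
            result.append([w * scale for w in row])
    return result


def active_units(weights: Matrix) -> int:
    """Count hidden units whose weight group is not entirely zero."""
    return sum(1 for row in weights if any(w != 0.0 for w in row))


if __name__ == "__main__":
    W = [[3.0, 4.0], [0.1, 0.1], [0.0, 0.8]]
    W_new = prox_group_l21(W, step=0.5, lam=1.0)
    print(W_new)
    print(active_units(W_new))
```

With a threshold of 0.5, the second unit (norm about 0.14) is zeroed out, while the other two are only shrunk, so two of the three units remain active. Other group norms (for example ℓ∞,1) lead to different proximal operators.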


Author: Kenton Murray
Contributor: David Chiang, Research Director
Degree Level: Doctoral Dissertation
Degree Discipline: Computer Science and Engineering
Degree Name: Doctor of Philosophy
Defense Date: 2019-12-18
Submission Date: 2020-04-21
Keywords: machine translation, neural machine translation, NMT, hyperparameters, optimal, learning, beam search, translate, neural networks, auto-sizing, BPE, EM, Proximal
Record Visibility: Public
Content License: All rights reserved


