Ensemble methods for flux calculation

Doctoral Dissertation


Molecular Dynamics (MD) simulation is a numerical tool for simulating the movements of molecules. It generates a sequence of coordinates, called a trajectory, that represents the Brownian motion of the molecules. Biological studies such as drug design usually involve reaction mechanisms on long time scales, and simulating them may cost several years of CPU time. Ensemble algorithms have therefore been developed to accelerate MD simulations significantly using distributed computing systems.
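
To make "trajectory" concrete, the following is a minimal sketch that generates one with overdamped Langevin (Brownian) dynamics integrated by the Euler-Maruyama scheme. The one-dimensional double-well potential U(x) = (x^2 - 1)^2, the parameters kT, gamma and dt, and the function names are all illustrative assumptions. Production MD codes integrate full atomistic force fields instead.

```python
import math
import random
from typing import List


def double_well_force(x: float) -> float:
    """Force -dU/dx for the hypothetical toy potential U(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x * x - 1.0)


def brownian_trajectory(x0: float, n_steps: int, dt: float = 1e-3,
                        kT: float = 0.5, gamma: float = 1.0,
                        seed: int = 0) -> List[float]:
    """Overdamped Langevin dynamics via Euler-Maruyama (illustrative only).

    x_{n+1} = x_n + (dt / gamma) * F(x_n) + sqrt(2 kT dt / gamma) * xi_n,
    with xi_n ~ N(0, 1). Returns the trajectory [x_0, x_1, ..., x_{n_steps}].
    """
    rng = random.Random(seed)  # seeded so the trajectory is reproducible
    noise_scale = math.sqrt(2.0 * kT * dt / gamma)
    traj = [x0]
    x = x0
    for _ in range(n_steps):
        x = x + (dt / gamma) * double_well_force(x) + noise_scale * rng.gauss(0.0, 1.0)
        traj.append(x)
    return traj


traj = brownian_trajectory(x0=-1.0, n_steps=10_000, seed=42)
```

Lowering kT relative to the barrier at x = 0 makes hops between the wells at x = -1 and x = +1 much rarer. This is a toy version of the long-time-scale problem that ensemble methods target.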

A complication of ensemble algorithms is that they usually require a one-dimensional reaction coordinate (RC), and extracting an RC from a high-dimensional conformational space is challenging. Two algorithms that overcome this complication have attracted attention over the past few years: Weighted Ensemble (WE) and Markov State Models (MSMs). Instead of an RC, both require clustering microscopic configurations into a network of “macro-states”. However, defining macro-states is still a complicated procedure that relies on sufficient sampling of the conformational space and on the design of the clustering algorithm.
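
Once each frame of a trajectory has been assigned to a macro-state, the macro-state network can be summarized by a transition matrix estimated from lagged transition counts. The sketch below uses the simplest count-and-row-normalize estimator. It assumes the frames are already clustered into integer labels, and the function name and `lag` parameter are illustrative. Practical MSM software typically also enforces reversibility and validates the lag time, which this sketch omits.

```python
from typing import List, Sequence


def transition_matrix(labels: Sequence[int], n_states: int,
                      lag: int = 1) -> List[List[float]]:
    """Row-normalized transition matrix between macro-states at a given lag.

    labels[t] is the macro-state index of frame t of a discretized trajectory.
    Rows of states with no observed outgoing transitions are left as zeros.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(labels) - lag):
        counts[labels[t]][labels[t + lag]] += 1  # one observed transition
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

For example, `transition_matrix([0, 0, 1, 1, 0], 2)` counts the transitions 0→0, 0→1, 1→1 and 1→0, giving `[[0.5, 0.5], [0.5, 0.5]]`.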

In Chapters 1, 2, and 3, I show that WE rate predictions are less sensitive than MSM predictions to the definition of the macro-states. MSMs introduce significant biases into computed reaction rates, and these biases depend on where the macro-state boundaries are drawn. In contrast, the Accelerated Weighted Ensemble (AWE) gives reliable flux estimates across varying definitions of macro-states. AWE is a formulation of Weighted Ensemble that uses the notion of colors to compute fluxes, together with a different algorithm for killing and splitting walkers (weighted copies of the simulated system). Rigorous numerical experiments on alanine dipeptide and penta-alanine support the analyses. The results suggest that an MSM gives a good picture of the metastable sets and a useful visualization of the overall dynamics. However, dynamical quantities are in general computed with less bias using AWE.
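
In colored-walker schemes, each walker carries the label (color) of the last end state, A or B, that it visited. The weight of walkers switching from color A to color B per unit time is the A→B flux. At steady state, dividing this flux by the total A-colored weight gives a rate estimate. The following is a minimal sketch of that bookkeeping for one propagation step. It assumes the end states A and B are given and that walkers are propagated elsewhere. `Walker`, `colored_step` and `tau` are hypothetical names, and this is not the AWE implementation itself.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Walker:
    weight: float  # statistical weight of this copy of the system
    color: str     # "A" or "B": the last end state this walker visited


def colored_step(walkers: List[Walker], regions: List[Optional[str]],
                 tau: float) -> Tuple[float, float]:
    """Recolor walkers after one propagation step of length tau.

    regions[i] is "A", "B", or None (in neither end state) for walker i at the
    end of the step. Returns (flux_ab, k_ab): the weight that switched from
    color A to color B, and the rate estimate flux_ab / (tau * P_A), where P_A
    is the total A-colored weight measured before recoloring.
    """
    weight_a = sum(w.weight for w in walkers if w.color == "A")
    flux_ab = 0.0
    for walker, region in zip(walkers, regions):
        if region is None or region == walker.color:
            continue  # color only changes on entering the other end state
        if walker.color == "A":
            flux_ab += walker.weight  # an A-colored walker just reached B
        walker.color = region
    k_ab = flux_ab / (tau * weight_a) if weight_a > 0 else float("nan")
    return flux_ab, k_ab
```

In practice, the per-step estimates would be averaged over many steps once the walker weights have reached steady state.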

Although the accuracy of AWE is insensitive to the underlying partition, its efficiency can be affected. Current WE algorithms partition conformational space into Voronoi bins, but this can give a poor partition along the reaction coordinate. I further argue that a metastable-state partition, which defines states with maximal internal kinetic connectivity, is a better partition for AWE. Numerical results on alanine dipeptide show a significant improvement in the efficiency of AWE with a metastable-state partition over AWE with a Voronoi-bin partition, especially when the underlying partition has a small number of states.
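
A WE iteration assigns each walker to a bin, here the Voronoi cell of the nearest center. It then splits or merges walkers within each bin to keep a fixed number per bin. The sketch below uses a generic split/merge scheme of the kind used in early WE methods. It is not AWE's kill-and-split algorithm. The centers, the per-bin target and the function names are assumptions for illustration, and walker weights are assumed positive.

```python
import math
import random
from typing import List, Sequence, Tuple

WeightedWalker = Tuple[float, object]  # (weight, configuration); configuration is opaque


def voronoi_bin(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest center, i.e. the Voronoi bin containing x."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def resample_bin(walkers: List[WeightedWalker], target: int,
                 rng: random.Random) -> List[WeightedWalker]:
    """Split or merge the walkers of one bin until it holds `target` walkers.

    Splitting replaces the heaviest walker by two copies with half its weight.
    Merging combines the two lightest walkers: the survivor keeps one parent's
    configuration, chosen with probability proportional to weight, and carries
    their summed weight. Total weight in the bin is preserved.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    ws = sorted(walkers, key=lambda w: w[0])
    while 0 < len(ws) < target:
        weight, conf = ws.pop()  # heaviest walker
        ws += [(weight / 2, conf), (weight / 2, conf)]
        ws.sort(key=lambda w: w[0])
    while len(ws) > target:
        (wa, ca), (wb, cb) = ws.pop(0), ws.pop(0)  # two lightest walkers
        survivor = ca if rng.random() < wa / (wa + wb) else cb
        ws.append((wa + wb, survivor))
        ws.sort(key=lambda w: w[0])
    return ws
```

Because resampling acts only within fixed bins, the partition determines how well walkers are spread along slow directions. This is one way the choice of partition can affect efficiency.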

To further accelerate AWE, I worked on improving the efficiency of the algorithm that discovers metastable states from MD trajectories. In existing studies, Monte Carlo simulated annealing (MCSA) has been widely applied to define metastable states that optimize the metastability of the dynamical system. Chapter 6 proposes two greedy algorithms, G1 and G2, which differ in how they define the local search space, to improve on the efficiency and scalability of MCSA on distributed computing systems. Numerical experiments on two biological systems, alanine dipeptide and the WW domain, demonstrate that G1 is the most efficient of the three algorithms on both a single-core machine and a distributed computing system. The sequential version of G2 is the slowest, but G2 gains the most speedup on distributed computing systems.
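
MCSA and greedy searches both optimize an objective over partitions of microstates into macro-states. A common choice of objective is the metastability: the sum of the diagonal entries T_II of the coarse-grained transition matrix, weighted by the stationary distribution pi. The following is a generic best-improvement hill climb on that objective. It is a sketch only, not G1 or G2, whose local search spaces are defined in Chapter 6. It assumes a row-stochastic microstate transition matrix `P` with stationary distribution `pi`, and all names are hypothetical.

```python
from typing import List, Sequence


def metastability(P: Sequence[Sequence[float]], pi: Sequence[float],
                  labels: Sequence[int], m: int) -> float:
    """Sum over macro-states I of T_II, where
    T_II = sum_{i in I} pi_i * sum_{j in I} P_ij / sum_{i in I} pi_i."""
    mass = [0.0] * m
    stay = [0.0] * m
    n = len(pi)
    for i in range(n):
        macro = labels[i]
        mass[macro] += pi[i]
        stay[macro] += pi[i] * sum(P[i][j] for j in range(n) if labels[j] == macro)
    return sum(s / w for s, w in zip(stay, mass) if w > 0)


def greedy_partition(P: Sequence[Sequence[float]], pi: Sequence[float],
                     labels: Sequence[int], m: int, tol: float = 1e-12) -> List[int]:
    """Hill-climb on metastability by reassigning one microstate at a time.

    Each microstate moves to the macro-state that most increases the
    metastability; moves that would empty a macro-state are skipped.
    Stops at a local optimum, when a full pass makes no improving move.
    """
    labels = list(labels)
    best = metastability(P, pi, labels, m)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            old = labels[i]
            if labels.count(old) == 1:
                continue  # moving i would leave macro-state `old` empty
            best_target = old
            for target in range(m):
                if target == old:
                    continue
                labels[i] = target
                q = metastability(P, pi, labels, m)
                if q > best + tol:
                    best, best_target = q, target
            labels[i] = best_target
            improved = improved or best_target != old
    return labels
```

On a four-state chain where states 0 and 1 exchange frequently, states 2 and 3 exchange frequently, and the two pairs rarely exchange, starting from the poor split `[0, 1, 0, 1]` the climb groups {0, 1} and {2, 3} together.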


Identifier etd-04032015-092240

Author Haoyun Feng
Advisor Jesus A. Izaguirre
Contributor Jesus A. Izaguirre, Committee Chair
Contributor Christopher Sweet, Committee Member
Contributor Eric Darve, Committee Member
Contributor Zoltan Toroczkai, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name PhD
Defense Date 2014-12-04
Submission Date 2015-04-03
Country United States of America
Keywords
  • Markov state models
  • Ensemble methods
  • Weighted ensemble
  • Molecular dynamics simulation
Institution University of Notre Dame
Language English
Record Visibility Public
Content License All rights reserved
