Ensemble methods for flux calculation

Feng, Haoyun

doi:10.7274/3j333199s3g

FengH042015D.pdf (3.23 MB)

thesis

posted on 2015-04-03, 00:00 authored by Haoyun Feng

Molecular dynamics (MD) simulation is a numerical tool for simulating the motion of molecules. It generates a sequence of coordinates, called a trajectory, that represents how the molecules move over time. Biological applications such as drug design usually involve reaction mechanisms on long time scales, and simulating them may require several years of CPU time. Ensemble algorithms have therefore been developed to significantly accelerate MD simulations using distributed computing systems.

A complication of ensemble algorithms is that they usually require a one-dimensional reaction coordinate (RC), and extracting an RC from a high-dimensional conformational space is challenging. Two algorithms that overcome this complication have attracted attention over the past few years: Weighted Ensemble (WE) and Markov State Models (MSMs). Instead of an RC, both algorithms require clustering microscopic configurations into a network of 'macro-states'. However, defining macro-states is still a complicated procedure that relies on sufficient sampling of the conformational space and on the design of the clustering algorithm.
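To make the role of the macro-state definition concrete, the following minimal sketch (an illustration, not code from the thesis) estimates an MSM transition matrix by lumping a microstate trajectory into macro-states and counting transitions at a fixed lag time. The function names `lump` and `transition_matrix`, the micro-to-macro map, and the toy trajectory are all hypothetical; real MSM software additionally handles reversibility, symmetrization, and states with no outgoing transitions.

```python
def lump(micro_traj, macro_of):
    """Relabel a microstate trajectory with macro-state labels (the clustering step)."""
    return [macro_of[m] for m in micro_traj]


def transition_matrix(dtraj, lag=1):
    """Estimate a row-stochastic MSM transition matrix from a discrete trajectory.

    dtraj is a list of macro-state labels, one per saved frame; lag is the
    lag time measured in frames.
    """
    states = sorted(set(dtraj))
    index = {s: k for k, s in enumerate(states)}
    n = len(states)

    # Count observed transitions i -> j separated by `lag` frames.
    counts = [[0] * n for _ in range(n)]
    for t in range(len(dtraj) - lag):
        counts[index[dtraj[t]]][index[dtraj[t + lag]]] += 1

    # Row-normalize; a state with no outgoing transitions keeps an all-zero row.
    T = []
    for row in counts:
        total = sum(row)
        T.append([c / total if total else 0.0 for c in row])
    return states, T


if __name__ == "__main__":
    macro_of = {0: "A", 1: "A", 2: "B", 3: "B"}  # hypothetical micro -> macro map
    states, T = transition_matrix(lump([0, 1, 2, 3, 2, 1, 0, 3], macro_of))
    print(states, T)
```

For this toy trajectory the row for A is [0.5, 0.5] and the row for B is [1/3, 2/3]. Editing `macro_of` changes the lumped trajectory and therefore every entry of the matrix, which is the dependence on macro-state boundaries that the abstract refers to.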

In Chapters 1, 2, and 3, I show that WE rate predictions are less sensitive than MSM predictions to the definition of the macro-states. MSMs introduce significant biases into computed reaction rates, and these biases depend on the boundaries of the macro-states. In contrast, AWE (Accelerated Weighted Ensemble), a formulation of Weighted Ensemble that uses the notion of colors to compute fluxes together with a different algorithm to kill and split walkers, gives reliable flux estimates across varying definitions of the macro-states. Rigorous numerical experiments on alanine dipeptide and penta-alanine support these analyses. The results suggest that whereas an MSM gives a good picture of the metastable sets and a useful visualization of the overall dynamics, dynamical quantities are in general computed with less bias using AWE.
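The color-based flux estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: each walker is assumed to carry the color of the last core set (A or B) it visited, the flux J_AB is taken as the weight that changes color from A to B per unit time, and the rate is estimated as k_AB = J_AB / P_A, where P_A is the total weight of A-colored walkers. The `Walker` class, the `core_of` map, and the function names are hypothetical, and the kill/split resampling step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Walker:
    weight: float  # statistical weight carried by the walker
    color: str     # "A" or "B": the core set this walker visited most recently
    state: str     # macro-state the walker occupies after the latest MD segment


def color_flux_step(walkers, core_of, dt):
    """Recolor walkers after one propagation step and return (flux, populations).

    core_of maps a macro-state to "A" or "B" if it belongs to a core set; states
    outside both core sets are simply absent. Populations are the total weights
    of each color, measured before recoloring.
    """
    populations = {"A": 0.0, "B": 0.0}
    for w in walkers:
        populations[w.color] += w.weight

    crossed = {"AB": 0.0, "BA": 0.0}
    for w in walkers:
        core = core_of.get(w.state)
        if core is not None and core != w.color:
            # The walker reached the opposite core set: its weight is reactive flux.
            crossed[w.color + core] += w.weight
            w.color = core
    flux = {key: value / dt for key, value in crossed.items()}
    return flux, populations


def rates(flux, populations):
    """Estimate k_AB = J_AB / P_A and k_BA = J_BA / P_B."""
    return {
        "AB": flux["AB"] / populations["A"] if populations["A"] > 0 else 0.0,
        "BA": flux["BA"] / populations["B"] if populations["B"] > 0 else 0.0,
    }
```

Because a walker's color changes only when it actually enters the other core set, the flux tally in this sketch does not depend on where the intermediate macro-state boundaries are drawn; those boundaries would matter only for how walkers are killed and split. That is one plausible reading of why the AWE flux estimates are robust to the macro-state definition.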

Although the accuracy of AWE is not sensitive to the underlying partition, its efficiency can be affected. Current WE algorithms partition the conformational space into Voronoi bins, which leads to a poor partition along the reaction coordinate. I further argue that a metastable-state partition, in which each state has maximal kinetic connectivity within it, provides a better partition for AWE. Numerical results on alanine dipeptide show a significant improvement in the efficiency of AWE with a metastable-state partition over AWE with a Voronoi-bin partition, especially when the underlying partition has a small number of states.
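One common way to quantify how metastable a partition is, used here as an assumption because the abstract does not give the thesis's exact objective, is the metastability Q: the sum over macro-states of the probability of remaining in the same macro-state after one lag time, computed from a microstate transition-count matrix. The sketch below, with hypothetical function names and a toy four-microstate count matrix, compares a partition that groups kinetically connected microstates with one that splits them.

```python
def lump_counts(C, assign, n_macro):
    """Sum microstate counts C[i][j] into macro-state counts using assign[i]."""
    M = [[0.0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(C):
        for j, c in enumerate(row):
            M[assign[i]][assign[j]] += c
    return M


def metastability(C, assign, n_macro):
    """Q = sum over macro-states I of T_II, the probability of staying in I."""
    M = lump_counts(C, assign, n_macro)
    q = 0.0
    for I in range(n_macro):
        total = sum(M[I])
        if total > 0:  # skip empty macro-states
            q += M[I][I] / total
    return q


# Toy counts: microstates 0 and 1 exchange quickly, as do 2 and 3;
# transitions between the two pairs are rare.
C = [
    [80, 15, 5, 0],
    [15, 80, 0, 5],
    [5, 0, 80, 15],
    [0, 5, 15, 80],
]

if __name__ == "__main__":
    print(metastability(C, [0, 0, 1, 1], 2))  # groups the fast-exchanging pairs
    print(metastability(C, [0, 1, 0, 1], 2))  # cuts through the fast exchanges
```

The first partition gives Q of about 1.9 and the second about 1.7. A Voronoi partition built from geometric distance alone has no particular reason to maximize Q, which is one way to see why a metastable-state partition may suit AWE better.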

To accelerate AWE further, I worked on improving the efficiency of the algorithm that discovers metastable states from MD trajectories. In existing studies, Monte Carlo simulated annealing (MCSA) has been widely applied to define metastable states that optimize the metastability of the dynamical system. Chapter 6 proposes two greedy algorithms, G1 and G2, based on different definitions of the local search space, to improve the efficiency and scalability of MCSA on distributed computing systems. Numerical experiments are conducted on two biological systems, alanine dipeptide and the WW domain. They demonstrate that G1 is the most efficient of the three algorithms (MCSA, G1, and G2) on both a single-core machine and a distributed computing system. The sequential version of G2 is the slowest, but G2 gains the most speedup on distributed computing systems.
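To illustrate what a greedy alternative to MCSA looks like, the sketch below runs a generic hill climb on the metastability Q of a microstate count matrix: it repeatedly moves a single microstate to whichever macro-state raises Q the most and stops when no single move helps. This is not G1 or G2, whose local search spaces the abstract does not specify; the single-microstate move neighborhood, the objective Q, and all names here are assumptions for illustration.

```python
def metastability(C, assign, n_macro):
    """Q = sum over macro-states of the probability of staying in that macro-state."""
    M = [[0.0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(C):
        for j, c in enumerate(row):
            M[assign[i]][assign[j]] += c
    q = 0.0
    for I in range(n_macro):
        total = sum(M[I])
        if total > 0:  # skip empty macro-states
            q += M[I][I] / total
    return q


def greedy_partition(C, assign, n_macro, max_sweeps=100, tol=1e-12):
    """Greedy local search: accept only single-microstate moves that raise Q."""
    assign = list(assign)
    best = metastability(C, assign, n_macro)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(assign)):
            best_label = assign[i]
            for label in range(n_macro):
                assign[i] = label
                q = metastability(C, assign, n_macro)
                if q > best + tol:
                    best, best_label, improved = q, label, True
            assign[i] = best_label  # keep the best label found for microstate i
        if not improved:
            break  # local optimum: no single move increases Q
    return assign, best


if __name__ == "__main__":
    # Hypothetical counts: pairs (0, 1) and (2, 3) exchange quickly.
    C = [[80, 15, 5, 0], [15, 80, 0, 5], [5, 0, 80, 15], [0, 5, 15, 80]]
    print(greedy_partition(C, [0, 1, 0, 1], 2))
```

Starting from the poor partition [0, 1, 0, 1], the search ends with microstates 0 and 1 in one macro-state and 2 and 3 in the other (Q about 1.9). Unlike MCSA, which also accepts moves that lower Q with a temperature-dependent probability in order to escape local optima, this hill climb accepts only improving moves, so it is cheaper per step but can stop at a local optimum. Recomputing Q from scratch for every trial move is wasteful; a practical implementation would update the lumped counts incrementally, and the trial moves within a sweep could be evaluated in parallel.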

History

Date Modified: 2017-06-05
Defense Date: 2014-12-04
Research Director(s): Jesus A. Izaguirre
Committee Members: Christopher Sweet, Eric Darve, Zoltan Toroczkai
Degree: Doctor of Philosophy
Degree Level: Doctoral Dissertation
Language: English
Alternate Identifier: etd-04032015-092240
Publisher: University of Notre Dame
Program Name: Computer Science and Engineering

Keywords: Markov state models, Ensemble methods, Weighted ensemble, Molecular dynamics simulation
