University of Notre Dame

Polymer Design via Data-Driven Approaches

thesis
posted on 2021-07-08, 00:00 authored by Ruimin Ma

Polymers are ubiquitous in real-world applications because of their many useful properties. As demand for functional polymers grows across fields, designing polymers for specific purposes becomes increasingly important. Traditionally, polymer design has relied heavily on time-consuming approaches such as experimental measurement and computational simulation, which can no longer keep pace with this demand. As more polymer data (both experimental and computational) accumulate, data-driven approaches offer a promising route to accelerating polymer design; this field is known as polymer informatics.

Currently, the core of polymer informatics research is quantifying structure-property relationships via data-driven approaches. Because these approaches require numerical inputs, representing polymers numerically is the first step. I study two general polymer representations: Morgan fingerprint and molecular embedding. After conducting a series of machine learning experiments on diverse datasets, I find that molecular embedding outperforms Morgan fingerprint as a polymer representation. This advantage can be attributed to the accurate estimation of substructure similarity in the embedding, which is examined in detail in this dissertation.

Another important aspect of polymer informatics research is a benchmark database on which algorithms can be developed, tested, and compared. The polymers in such a database can also be synthesized for practical use. To this end, I build a benchmark database called PI1M, which contains ~1 million polymer structures for polymer informatics. I manually collect 12,777 polymers from the well-known online polymer database PolyInfo, which is not accessible to researchers on a large scale. A generative model is then trained on the collected polymers, and around 1 million polymers are sampled from it to form PI1M. PI1M is compared with PolyInfo and with other popular organic databases, such as ZINC, from several perspectives; the results show that PI1M can serve as a good benchmark database for polymer informatics.

Although PI1M is powerful as a benchmark database, it does not contain polymer properties, i.e., the labels that researchers care about most. To produce polymer labels at the large scale that polymer informatics requires, I implement a fully automated high-throughput molecular dynamics simulation workflow for polymer labeling.

Combining molecular embedding, PI1M, and high-throughput molecular dynamics simulation with machine learning, I develop a state-of-the-art approach to a previously underexplored task: finding amorphous polymers with high thermal conductivity. However, polymer candidates screened from an existing database cannot always meet the design goal. To mitigate this problem, an inverse polymer design algorithm is developed that generates new polymer structures to meet the design goal. The synthesizability of the generated polymer structures is also investigated to maximize their utility in real-world scenarios.

History

Date Modified

2021-09-08

Defense Date

2021-03-26

CIP Code

  • 14.1901

Research Director(s)

Tengfei Luo

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1262585527

Library Record

6102799

OCLC Number

1262585527

Program Name

  • Aerospace and Mechanical Engineering
