University of Notre Dame
File(s) under embargo: 3 month(s), 12 day(s) until file(s) become available

Advancing chemical synthesis with machine learning: opportunities and limitations

dataset
posted on 2024-09-04, 15:28 authored by Bozhao Nan
With advancements in computational power and increased data availability, machine learning (ML) has been applied to predicting chemical reactions and proposing synthetic pathways. This thesis contributes to advancing chemical reaction discovery through ML across three primary domains. First, computational methods were used to analyze transition states in reaction routes generated by the Sarpong group using Synthia™, in order to evaluate their computational feasibility. Then, industrial electronic lab notebook (ELN) data, supported by AZ, were processed and featurized. Various ML techniques, including Random Forests (RF), k-Nearest Neighbors (KNN), Neural Networks (NN), and Graph Neural Networks (GNN), were applied to predict reaction yields. Imbalanced regression methods were used to address yield imbalances in the high-throughput experimentation (HTE) and ELN data and to improve yield prediction in critical regions. Large Language Models (LLMs) were integrated for data extraction, for resolving inconsistencies in USPTO datasets from multiple sources, and for investigating the intricate information space of reaction procedures through a specific study on t-butyl ester deprotection. In the second part, substantial advancements were achieved in Molecular Representation Learning (MRL) to accurately capture molecular structures and physical behavior. By evaluating 3D GNNs and conformer ensemble-based models, this research extends beyond traditional SMILES, fingerprints, and 2D molecular graphs, enhancing the precision of predictions for molecule-level and reaction-level properties. These improvements are crucial for tasks such as enantiomeric excess (ee) selectivity prediction and binding energy (BE) prediction.

History

Date Created

2024-08-23

Date Modified

2024-08-27

Defense Date

2024-08-15

CIP Code

  • 40.0501

Research Director(s)

Olaf Wiest, Paul Helquist

Committee Members

Richard Taylor, Xiangliang Zhang, Brandon Ashfeld

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Publisher

University of Notre Dame

Additional Groups

  • Chemistry and Biochemistry

Program Name

  • Chemistry and Biochemistry
