University of Notre Dame

File(s) under embargo

Thermodynamics-Informed Machine Learning for the Design of Sustainable Materials: The Dawn of Digital Molecular Discovery

dataset
Posted on 2024-04-30, 16:10; authored by João Dinis Oliveira Abranches
Scientists have traditionally designed novel materials by trial and error, often guided by simple heuristic rules or chemical intuition (e.g., “like dissolves like”). To date, this approach has led to the discovery and characterization of only a small fraction of all synthesizable compounds. Data-driven approaches such as machine learning offer a promising alternative to these trial-and-error methodologies. Unfortunately, most machine learning models proposed so far do not embed chemical or thermodynamic information in their architectures or molecular descriptors. As a result, these models tend to be overly complex and require a very large volume of experimental data to be trained properly. At the interface between artificial intelligence and green chemistry, this dissertation uses thermodynamics-informed machine learning to bridge the gap between small, scarce datasets and data-driven approaches. This is accomplished along two major avenues.

The first is the development of active learning workflows, based on Gaussian process models, that target the description of activity coefficients. This approach was directed specifically at capturing the physicochemical properties of mixtures, namely deep eutectic solvents. Active learning efficiently guided the acquisition of experimental data; in many cases, a single data point was sufficient to accurately describe mixture properties (namely phase equilibrium diagrams), dramatically reducing the effort and cost needed to characterize novel sustainable materials.

The second avenue is the development of a digital molecular space based on sigma profiles. These molecular descriptors, derived from quantum chemistry, proved to be a powerful feature set for neural networks, enabling accurate prediction of assorted physicochemical properties (e.g., boiling points and aqueous solubilities) of organic and inorganic molecules. A graph neural network was also developed to predict sigma profiles directly, bypassing the need for expensive quantum chemistry calculations. Finally, sigma profiles were shown to behave as a digital molecular space in which optimization tasks can be performed. A notable example is the use of Bayesian optimization to maximize the normal boiling temperature. Holding no chemical knowledge except for the sigma profile and normal boiling temperature of carbon monoxide (the worst possible initial guess), Bayesian optimization found the global maximum of the available normal boiling temperature dataset (over 1000 molecules spanning more than 40 families of organic and inorganic compounds) in just fifteen iterations (i.e., fifteen property measurements). This result establishes sigma profiles as an ideal digital chemical space for molecular optimization and discovery, particularly when little experimental data is available.
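The abstract does not specify the surrogate kernel or acquisition function behind the boiling-point experiment. The following is a minimal sketch, assuming a Gaussian-process surrogate with an RBF kernel and an expected-improvement acquisition evaluated over a finite pool of candidates, of the kind of pool-based Bayesian optimization loop described above. The descriptor matrix `X`, the synthetic target `y`, and every function name are hypothetical stand-ins, not the dissertation's sigma-profile data or code.

```python
import math
import numpy as np


def rbf_kernel(A, B, length_scale=0.5):
    # Squared-exponential (RBF) kernel between the rows of A and the rows of B.
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_query, length_scale=0.5, noise=1e-6):
    # Zero-mean GP fitted to standardized targets; returns mean and std at X_query.
    y_mean = y_train.mean()
    y_std = y_train.std() if y_train.std() > 0 else 1.0
    y_norm = (y_train - y_mean) / y_std
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_train, X_query, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_norm))
    v = np.linalg.solve(L, K_star)
    mean = K_star.T @ alpha
    var = np.maximum(1.0 - np.sum(v**2, axis=0), 1e-12)
    return mean * y_std + y_mean, np.sqrt(var) * y_std


def expected_improvement(mean, std, best, xi=0.01):
    # EI for maximization: E[max(f - best - xi, 0)] under a Gaussian posterior.
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def bayesian_optimization(X_pool, measure, start_index, n_iter=15):
    # Pool-based BO: each iteration "measures" the unmeasured candidate with max EI.
    measured = [start_index]
    values = [measure(start_index)]
    for _ in range(n_iter):
        candidates = [i for i in range(len(X_pool)) if i not in measured]
        mean, std = gp_posterior(X_pool[measured], np.array(values), X_pool[candidates])
        ei = expected_improvement(mean, std, max(values))
        nxt = candidates[int(np.argmax(ei))]
        measured.append(nxt)
        values.append(measure(nxt))
    best = int(np.argmax(values))
    return measured, values, measured[best], values[best]


# Synthetic stand-in for a descriptor/property table (NOT real sigma profiles).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = -np.sum((X - 0.7) ** 2, axis=1)  # hypothetical property to maximize
start = int(np.argmin(y))  # deliberately start from the worst candidate
order, vals, best_i, best_y = bayesian_optimization(X, lambda i: y[i], start)
print(best_i, best_y, y.max())
```

Starting from the arg-min of `y` mirrors the "worst possible initial guess" setup, and each loop iteration corresponds to one property measurement. Replacing expected improvement with the posterior standard deviation alone turns the same machinery into a simple uncertainty-sampling active-learning rule; whether either matches the workflows in the dissertation is an assumption.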

History

Date Created

2024-04-04

Date Modified

2024-04-30

Defense Date

2024-03-26

CIP Code

  • 14.0701

Research Director(s)

Edward Maginn, Yamil Colon

Committee Members

William Schneider

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Library Record

006582878

OCLC Number

1432159676

Publisher

University of Notre Dame

Program Name

  • Chemical and Biomolecular Engineering