Artificial Intelligent Approaches for Navigating Thermodynamic, Molecular, and Material Design Space in Porous Materials
Dataset posted on 2025-04-23, 14:19, authored by Etinosa James Osaro
The rapid discovery and optimization of metal-organic frameworks (MOFs) for adsorption, diffusion, and gas separation applications require computational strategies that balance predictive accuracy against computational cost. Traditional simulation techniques, such as classical and quantum chemistry methods, provide accurate insights but are computationally prohibitive for large-scale materials screening. Machine learning (ML) has emerged as a powerful tool for accelerating MOF discovery, but its effectiveness depends on the availability of large, high-quality training datasets. This dissertation integrates Active Learning (AL), Reinforcement Learning (RL), and Inducing Points (IPs) to systematically explore the thermodynamic, molecular, and materials design spaces of MOFs, substantially improving the efficiency of computational screening workflows.
AL is employed to optimize gas adsorption predictions in MOFs while minimizing the number of required simulations. Gaussian Process Regression (GPR) models, combined with various acquisition functions, guide iterative data selection for adsorption modeling and reach high predictive accuracy with a fraction of the data typically required. Extending this approach, an alchemical molecule-based AL strategy is introduced that predicts real-molecule adsorption using surrogate (alchemical) molecular interactions, reducing the size of the training dataset while maintaining accuracy. Furthermore, AL is applied to selectivity predictions for gas separations by integrating adsorption and diffusion modeling into an end-to-end (E2E) framework, which improves data-acquisition efficiency and reduces redundant sampling across the adsorption and diffusion models.
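A minimal sketch of a GPR-based active-learning loop of the kind described above, assuming a one-dimensional thermodynamic design space (log10 pressure), a fixed squared-exponential kernel, and the simplest acquisition function, maximum posterior standard deviation (uncertainty sampling). The function `langmuir_loading` is a hypothetical stand-in for an expensive adsorption simulation, and every name and hyperparameter here is an illustrative assumption, not the dissertation's code.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential (RBF) kernel with unit variance between 1-D arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """GP posterior mean and std at x_test; prior mean = sample mean of y_train."""
    y_mean = y_train.mean()
    k_train = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    var = 1.0 - np.sum(v ** 2, axis=0)  # prior variance is 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))

def langmuir_loading(log10_p, q_max=10.0, k=1.0):
    """Hypothetical stand-in for an expensive adsorption simulation."""
    p = 10.0 ** log10_p
    return q_max * k * p / (1.0 + k * p)

def run_active_learning(n_iter, n_pool=200):
    """Uncertainty-sampling AL; returns queried pool indices and max abs error."""
    pool = np.linspace(-3.0, 3.0, n_pool)   # candidate log10 pressures
    chosen = [0, n_pool - 1]                 # seed with the two pressure extremes
    for _ in range(n_iter):
        x = pool[chosen]
        _, std = gp_posterior(x, langmuir_loading(x), pool)
        std[chosen] = -np.inf                # never re-query a labeled point
        chosen.append(int(np.argmax(std)))   # query the most uncertain pressure
    x = pool[chosen]
    mean, _ = gp_posterior(x, langmuir_loading(x), pool)
    max_error = float(np.max(np.abs(mean - langmuir_loading(pool))))
    return chosen, max_error

chosen, err = run_active_learning(n_iter=10)
```

Because the posterior standard deviation of a stationary GP depends only on where data have been collected, pure uncertainty sampling behaves like space filling; acquisition functions that also use the predicted mean, such as EI or PI, trade this exploration off against exploitation.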
To refine training-dataset selection, Bayesian Optimization (BO) acquisition functions such as Expected Improvement (EI) and Probability of Improvement (PI) are integrated within an AL framework so that the most informative MOFs are selected based on key structural properties.
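For reference, a minimal sketch of the standard closed-form EI and PI acquisition functions for a Gaussian surrogate. It assumes the target property is to be maximized, that the surrogate returns a predictive mean `mu` and standard deviation `sigma` for each candidate MOF, and an exploration margin `xi` (0.01 is a common but arbitrary default). With $z = (\mu - f^* - \xi)/\sigma$, PI is $\Phi(z)$ and EI is $(\mu - f^* - \xi)\Phi(z) + \sigma\phi(z)$. The candidate names and numbers are hypothetical, and how the dissertation links these scores to structural properties is not reproduced here.

```python
import math

def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, best, xi=0.01):
    """P(f > best + xi) under a Gaussian prediction N(mu, sigma^2)."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return 1.0 if improvement > 0.0 else 0.0
    return _norm_cdf(improvement / sigma)

def expected_improvement(mu, sigma, best, xi=0.01):
    """E[max(f - best - xi, 0)] under a Gaussian prediction N(mu, sigma^2)."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * _norm_cdf(z) + sigma * _norm_pdf(z)

# Hypothetical GP predictions (mean, std) for three unlabeled MOFs.
candidates = {"MOF-A": (5.0, 0.2), "MOF-B": (4.5, 1.5), "MOF-C": (3.0, 0.1)}
best_observed = 4.8
ei = {name: expected_improvement(m, s, best_observed) for name, (m, s) in candidates.items()}
pi = {name: probability_of_improvement(m, s, best_observed) for name, (m, s) in candidates.items()}
next_by_ei = max(ei, key=ei.get)  # "MOF-B"
next_by_pi = max(pi, key=pi.get)  # "MOF-A"
```

With these inputs the two criteria disagree: PI favors the confident, slightly-better MOF-A, while EI rewards MOF-B's larger uncertainty because it weighs how much improvement is possible, not only how likely it is.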
IPs serve as a complementary strategy that further improves model efficiency by selecting a representative subset of MOFs that captures the diversity of the full dataset. Using kernel-based methods and principal component analysis (PCA), IPs reduce training-data requirements while preserving model generalizability. A comparative analysis of AL-, BO-, and IP-based approaches shows that combining these strategies significantly improves model robustness while reducing computational expense.
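A minimal sketch of one way to pick a diverse, representative subset in the spirit of the IP approach, assuming each MOF is described by a numeric vector of structural descriptors (hypothetical here). Descriptors are standardized and projected onto the leading principal components, and a greedy farthest-point (max-min distance) rule then selects the subset. The farthest-point rule is a swapped-in heuristic chosen for brevity; it is not the kernel-based criterion used in the dissertation.

```python
import numpy as np

def pca_scores(features, n_components=2):
    """Standardize descriptors and project them onto the leading principal components."""
    std = features.std(axis=0)
    z = (features - features.mean(axis=0)) / np.where(std > 0, std, 1.0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def select_inducing_points(features, n_points, n_components=2, seed=0):
    """Greedy farthest-point selection of n_points MOF indices in PCA space."""
    scores = pca_scores(features, n_components)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(scores)))]  # random starting MOF
    # Distance from every MOF to its nearest already-selected MOF.
    min_dist = np.linalg.norm(scores - scores[selected[0]], axis=1)
    for _ in range(n_points - 1):
        nxt = int(np.argmax(min_dist))  # the least-represented MOF so far
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(scores - scores[nxt], axis=1))
    return selected

# Hypothetical descriptor matrix: 500 MOFs x 6 structural descriptors.
descriptors = np.random.default_rng(1).normal(size=(500, 6))
subset = select_inducing_points(descriptors, n_points=20)
```

Standardizing first keeps descriptors with large numeric ranges (for example, surface areas versus void fractions) from dominating the principal components.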
Finally, RL is introduced to actively guide data selection for adsorption modeling. Using Q-learning within a Gaussian Process framework, RL learns which MOFs to select for gas adsorption studies, improving predictive convergence while reducing computational cost relative to standard AL strategies.
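A toy sketch of the tabular, epsilon-greedy Q-learning update under heavily simplified, hypothetical assumptions: the state is the index of the current acquisition step within a fixed budget, the two actions are acquisition rules ("uncertainty" or "random" sampling), and `toy_reward` is a made-up stub standing in for the error reduction a GP surrogate would report after each acquisition. The dissertation's actual states, actions, and rewards are not specified here; only the Q-learning update itself is standard.

```python
import numpy as np

N_STEPS = 5                           # hypothetical acquisition budget per episode
ACTIONS = ("uncertainty", "random")   # hypothetical acquisition rules

def toy_reward(step, action):
    """Hypothetical reward: uncertainty sampling pays off early, random sampling later."""
    return 0.5 ** step if action == 0 else 0.4

def train_q_table(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STEPS, len(ACTIONS)))
    for _ in range(episodes):
        for step in range(N_STEPS):
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = int(rng.integers(len(ACTIONS)))
            else:
                action = int(np.argmax(q[step]))
            reward = toy_reward(step, action)
            # Bootstrapped value of the next state (0 after the last acquisition).
            next_value = q[step + 1].max() if step + 1 < N_STEPS else 0.0
            # Q-learning (off-policy temporal-difference) update.
            q[step, action] += alpha * (reward + gamma * next_value - q[step, action])
    return q

q_table = train_q_table()
policy = [ACTIONS[a] for a in q_table.argmax(axis=1)]
```

With these made-up rewards the learned greedy policy uses uncertainty sampling for the first two acquisitions and random sampling afterwards. In a GP-based workflow, the reward might instead be computed from the change in the surrogate's validation error after each newly simulated MOF (an assumption, not the dissertation's stated design).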
By integrating AL, BO, RL, and IP methodologies, this research establishes a scalable, computationally efficient framework for MOF screening and materials discovery. The findings contribute to the broader field of AI-assisted materials informatics and facilitate the rapid identification of MOFs for energy storage, carbon capture, and industrial gas separations. Overall, this work advances the use of artificial intelligence for exploring porous materials, narrowing the gap between computational efficiency and predictive accuracy.