University of Notre Dame
Machine Learning Methods for High-Dimensional and Multimodal Single-Cell Data

dataset
posted on 2025-06-09, 17:04 authored by Ouyang Zhu
Recent advances in single-cell and multi-omics technologies have enabled high-resolution profiling of cellular states but have also introduced new computational challenges. This dissertation presents machine learning methods that improve data quality and extract insights from high-dimensional, multimodal single-cell datasets. First, we propose Decaf K-means, a clustering algorithm that accounts for cluster-specific confounding effects, such as batch variation, directly during clustering. This approach improves clustering accuracy on both synthetic and real data. Second, we develop scPDA, a denoising method for droplet-based single-cell protein data that eliminates the need for empty droplets or null controls. scPDA models protein-protein relationships to enhance denoising accuracy and significantly improves cell-type identification. Third, we introduce Scouter, a model that predicts transcriptional outcomes of unseen gene perturbations. Scouter combines neural networks with large language models to generalize across perturbations, reducing prediction error by over 50% compared with existing methods. Finally, we extend Scouter to TranScouter, which predicts transcriptional responses under new biological conditions without direct perturbation data. Using a tailored encoder-decoder architecture, TranScouter achieves accurate cross-condition predictions, paving the way for more generalizable models in perturbation biology.

History

Date Created

2025-05-29

Date Modified

2025-06-09

Defense Date

2025-03-28

CIP Code

  • 27.9999

Research Director(s)

Jun Li

Committee Members

Xiufan Yu, Tiffany Tang

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Library Record

006714593

OCLC Number

1522965215

Publisher

University of Notre Dame

Additional Groups

  • Applied and Computational Mathematics and Statistics

Program Name

  • Applied and Computational Mathematics and Statistics
