University of Notre Dame
GUOZ42024D.pdf (5.82 MB)

Empowering Graph Neural Networks for Real-World Tasks

posted on 2024-05-09, 16:49 authored by Zhichun Guo
Many types of real-world data can be naturally represented as graphs, such as social networks, trading networks, and biological molecules, which creates a need for effective graph representations to support a variety of tasks. In recent years, graph neural networks (GNNs) have demonstrated remarkable success in extracting information from graphs and enabling graph-related tasks. However, they still face a series of challenges in solving real-world problems, including the scarcity of labeled data, scalability issues, and potential bias. These challenges stem from both domain-specific issues and inherent limitations of GNNs. This thesis introduces strategies to tackle these challenges and empower GNNs on real-world tasks.
For the domain-specific challenges, this thesis focuses on the chemistry domain, which plays a pivotal role in the drug discovery process. Because labeling through wet lab experiments requires significant resources, AI for chemistry struggles with a scarcity of labeled datasets. To address this, we present a comprehensive set of strategies spanning model-based and data-based approaches alongside a hybrid method. These methods use the diversity of data, models, and molecular representations to compensate for the lack of labels in individual datasets.
For the inherent challenges, this thesis introduces strategies to overcome two main issues, scalability and degree bias, especially in the context of link prediction tasks. Both originate from the mechanism of GNNs, which iteratively aggregates information from neighboring nodes to update each central node. For the scalability issue, our work preserves GNNs' prediction performance while significantly boosting inference speed. Regarding degree bias, our work substantially improves the effectiveness of GNNs for underrepresented nodes at very little additional computational cost.
These contributions not only address critical gaps in applying GNNs to specific domains but also lay the groundwork for future exploration in the broader field of graph-based real-world tasks.
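The aggregation mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's method: it assumes a plain mean aggregator with no learned weights, and the names `aggregate_once`, `propagate`, and the toy graph are hypothetical. It only shows why the work per node grows with the number of neighbors and why a low-degree node depends on very few sources of information.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]        # node id -> list of neighbor ids
Features = Dict[int, List[float]]   # node id -> feature vector


def aggregate_once(graph: Graph, feats: Features) -> Features:
    """One round of mean aggregation (assumed aggregator, no learned weights).

    Each node's new vector is the average of its own vector and the
    vectors of its neighbors.
    """
    updated: Features = {}
    for node, neighbors in graph.items():
        group = [feats[node]] + [feats[n] for n in neighbors]
        dim = len(feats[node])
        updated[node] = [
            sum(vec[i] for vec in group) / len(group) for i in range(dim)
        ]
    return updated


def propagate(graph: Graph, feats: Features, rounds: int) -> Features:
    """Apply the aggregation step repeatedly, as stacked GNN layers would."""
    for _ in range(rounds):
        feats = aggregate_once(graph, feats)
    return feats


# Hypothetical toy graph: node 0 has three neighbors, node 3 has only one.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
feats = {0: [1.0], 1: [0.0], 2: [0.0], 3: [0.0]}
out = propagate(graph, feats, rounds=1)
```

In this toy setting, node 3 has a single neighbor, so its updated vector is determined entirely by itself and node 0, while node 0 must read three neighbors per round. Real GNN layers add learned transformations and nonlinearities on top of a step like this.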

History

Date Created

2024-04-15

Date Modified

2024-05-08

Defense Date

2024-03-26

CIP Code

  • 14.0901

Research Director(s)

Nitesh V. Chawla

Committee Members

Meng Jiang, Xiangliang Zhang, Wei Wang, Neil Shah

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Library Record

006584505

OCLC Number

1433026831

Publisher

University of Notre Dame

Program Name

  • Computer Science and Engineering
