University of Notre Dame

Novel Computational Approaches for Network-Based Protein Structural Classification

thesis
posted on 2019-07-06, 00:00 authored by Mahboobeh Ghalehnovi

Experimental determination of protein function is resource-intensive, so computational prediction of protein function has received attention as an alternative. In this context, protein structural classification (PSC) can help: it determines the structural classes of currently unclassified proteins from their features, and then relies on the fact that proteins with similar structures have similar functions. Existing PSC approaches rely on sequence-based or direct 3-dimensional (3D) structure-based protein features. In this thesis, we instead first model protein 3D structures as protein structure networks (PSNs) and then use network-based features for PSC. We propose the use of graphlets, state-of-the-art features in many research areas of network science, in the task of PSC. Moreover, because graphlets can deal only with unweighted PSNs, and because accounting for edge weights when constructing PSNs could improve PSC accuracy, we also propose a deep learning framework that automatically learns network features from weighted PSNs.
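The idea of turning a 3D structure into an unweighted PSN and describing it with graphlet counts can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: nodes are taken to be residues, two residues are linked when their coordinates lie within a hypothetical `cutoff` distance, the helper names `build_psn` and `count_3node_graphlets` are invented, and only the two 3-node graphlets (induced path and triangle) are counted, whereas graphlet-based features typically cover larger graphlets as well.

```python
from itertools import combinations
from math import dist


def build_psn(coords, cutoff=6.0):
    """Build an unweighted network as adjacency sets.

    Assumption: nodes are residues and an edge links two residues whose
    coordinates are at most `cutoff` apart (hypothetical threshold).
    """
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_3node_graphlets(adj):
    """Count induced 3-node paths and triangles in the network."""
    triangles = 0
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                triangles += 1
    triangles //= 3  # each triangle is found once from each of its 3 nodes
    # Every pair of neighbors of a node forms a wedge; a triangle holds 3 wedges.
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return {"path": wedges - 3 * triangles, "triangle": triangles}


# Toy coordinates for four residues (made-up values, not real protein data).
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 3.0, 0.0), (9.0, 0.0, 0.0)]
psn = build_psn(coords, cutoff=6.0)
features = count_3node_graphlets(psn)
print(features)
```

In this toy network, residues 0, 1, and 2 form a triangle and residue 3 attaches only to residue 1, so the resulting count vector could serve as a small feature vector for a classifier.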

At a higher scale of cellular organization lies another biological network type: the protein-protein interaction (PPI) network of a species. In a PPI network, nodes are proteins (i.e., PSNs themselves) and edges correspond to physical bindings between the proteins. We therefore also use proteins' features from the PPI network in the task of PSC. Importantly, we evaluate whether integrating PSN and PPI features of proteins improves PSC accuracy compared to using PSN features alone or PPI features alone. In the process, we compare a traditional machine learning approach (based on user-predefined graphlet features) against a deep learning approach (based on features learned automatically by a graph convolutional network method called GraphSAGE). We also propose an approach that integrates graphlet features and GraphSAGE features, and we find that this integrative approach improves accuracy compared to using only graphlet features or only GraphSAGE features.
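A rough sketch of what combining predefined and learned features could look like follows. It is not the thesis's method: the abstract does not say how the two feature types are integrated, so plain concatenation is assumed here, and `mean_aggregate` shows only the neighbor-averaging idea behind a GraphSAGE-style mean aggregator, with no learned weight matrix, nonlinearity, or training. All names and toy values are hypothetical.

```python
def mean_aggregate(adj, feats):
    """One neighbor-averaging step in the spirit of a GraphSAGE mean aggregator.

    Each node's output is its own feature vector followed by the mean of its
    neighbors' feature vectors. Learned weights and the nonlinearity of the
    real method are deliberately omitted.
    """
    out = {}
    for node, neighbors in adj.items():
        dim = len(feats[node])
        if neighbors:
            mean = [sum(feats[n][k] for n in neighbors) / len(neighbors)
                    for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor information
        out[node] = list(feats[node]) + mean
    return out


def integrate(graphlet_feats, learned_feats):
    """Assumed integration scheme: concatenate the two vectors per protein."""
    return {p: list(graphlet_feats[p]) + list(learned_feats[p])
            for p in graphlet_feats}


# Toy PPI network over three hypothetical proteins, with made-up features.
ppi = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
node_feats = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [4.0, 2.0]}
graphlet_feats = {"A": [2, 1], "B": [0, 0], "C": [1, 0]}

learned = mean_aggregate(ppi, node_feats)
combined = integrate(graphlet_feats, learned)
print(combined["A"])
```

The combined vector for each protein could then be passed to any standard classifier, which would allow a comparison against using either feature set alone.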

History

Date Modified

2019-08-25

CIP Code

  • 14.0901

Research Director(s)

Tijana Milenković

Degree

  • Master of Science

Degree Level

  • Master's Thesis

Alternate Identifier

1112606575

Library Record

5192211

OCLC Number

1112606575

Additional Groups

  • Computer Science and Engineering

Program Name

  • Computer Science and Engineering
