University of Notre Dame

File(s) under permanent embargo

Creation of Breast Cancer Subtypes: A Consensus-Based Network Approach

thesis
posted on 2020-04-04, 00:00 authored by Christina Horr

Breast cancer is a heterogeneous disease composed of multiple subtypes, each with its own distinct biological characteristics, therapy, and clinical outcomes. Five subtypes are widely recognized, known as the “intrinsic” subtypes: luminal A (Lum A), luminal B (Lum B), HER2-enriched, basal-like, and normal-like. These subtypes, now also referred to as PAM50, were introduced in 2001 by unsupervised clustering on whole-genome expression data. Recently, some studies have suggested that the PAM50 subtypes depend on normalization, gene-centering techniques, or other transformations of the gene expression data, which leaves the subtype classification of a large percentage of patients uncertain. Despite this uncertainty, no other subtyping system or study currently challenges the use of PAM50. Hence, these classification inconsistencies and uncertainties need to be addressed.

An existing subtyping method is applied to breast cancer gene expression data. In this method, multiple different clustering approaches run on a common set of samples are combined and formed into a network, which ultimately yields consensus-based classifications of the samples. The advantage of this consensus-based subtyping method is that it does not produce subtypes based on a single unsupervised clustering method, as PAM50 does, but rather subtypes based on multiple clustering strategies that are integrated together. As a result, the method is not biased towards any one unsupervised clustering algorithm.
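The consensus idea described above can be illustrated with a minimal sketch. It is not the thesis's actual method, which the abstract does not specify in detail. The sketch assumes that each clustering is given as a list of labels over the same samples, that the network links two samples when they share a cluster in more than a chosen fraction of clusterings, and that consensus clusters are read off as connected components. The function names, the threshold, and the toy labels are all hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    weights = {}
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in labelings if labels[i] == labels[j])
        weights[(i, j)] = agree / len(labelings)
    return weights


def consensus_clusters(labelings, threshold=0.5):
    """Link samples whose co-association exceeds the threshold (an assumed
    rule), then return the connected components of that network."""
    n = len(labelings[0])
    neighbors = {i: set() for i in range(n)}
    for (i, j), w in co_association(labelings).items():
        if w > threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    clusters, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Three hypothetical clusterings of the same six samples (toy labels, not thesis data).
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
print(consensus_clusters(labelings))
```

Note that the three toy clusterings use different label names and disagree about samples 2 and 3, yet the pairwise agreement still recovers two stable groups. A real analysis would likely replace connected components with a proper community detection step on the weighted network.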

The consensus-based subtyping method is applied to two different datasets: one containing both ER+ and ER− samples, and one containing only ER− samples. A single-sample classifier for the subtype system is applied to expression data from different technologies, resulting in clusters that have the same biological features across numerous cohorts. This suggests that the clusters are biologically driven, as their characteristics are conserved no matter which dataset is inspected. The clustering produced by this method identifies clusters that are not among the PAM50 subtypes. One cluster, C5, from the dataset containing both ER+ and ER− samples, exhibits characteristics associated with epithelial-mesenchymal transition (EMT), along with increased abundance of endothelial cells and fibroblasts. Clustering of the dataset that contains only ER− samples identified another novel cluster, C3, which is associated with immune-like characteristics.

These two novel subtypes exhibit characteristics that have not been identified in previous breast cancer subtype studies. They could have an important effect on the future of breast cancer treatment and could promote different types of treatment for patients whose tumors belong to these new subtypes. To gain deeper insight into these potentially novel clusters, further analyses focusing on these two subtypes will be needed.

History

Date Modified

2020-05-13

Defense Date

2020-03-19

Research Director(s)

Steven A. Buechler

Committee Members

Fan Liu, Jun Li

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1154015048

Library Record

5501936

OCLC Number

1154015048

Program Name

  • Applied and Computational Mathematics and Statistics
