University of Notre Dame

Scalable Learning With Thread-Level Parallelism

thesis
posted on 2007-04-19, authored by Karsten J.K. Steinhaeuser
A significant increase in the ability to collect and store diverse information over the past decade has led to an outright data explosion, providing larger and richer datasets than ever before. This growth in dataset size brings with it the challenge of mining the data successfully to discover patterns of interest. At the same time, extreme dataset sizes place unprecedented demands on high-performance computing infrastructures, and a gap has developed between the available real-world datasets and our ability to process them. Dataset sizes are quickly approaching terabytes and petabytes. This rate of growth also challenges the subsampling paradigm, as even a subsample of the data can run to gigabytes. Our goal is to exploit recent advances in multithreaded processor technology for scalable data mining. In this work, we explore one such architecture: the Cray MTA-2. We conjecture that its architectural design is well suited to applying machine learning to massive datasets. To that end, we present a thorough complexity analysis and experimental evaluation of five popular learning algorithms. We use a diverse body of datasets whose sizes vary in both dimensions (number of instances and number of attributes). Our results lead to an analysis of whether the architectural design of the Cray MTA-2 is an appropriate platform for massively parallel, highly scalable implementations of learning algorithms.

History

Date Modified

2017-06-05

Research Director(s)

Nitesh V. Chawla

Committee Members

Jay B. Brockman, Peter M. Kogge

Degree

  • Master of Science in Computer Science and Engineering

Degree Level

  • Master's Thesis

Language

  • English

Alternate Identifier

etd-04192007-125153

Publisher

University of Notre Dame

Program Name

  • Computer Science and Engineering
