Scalable Learning With Thread-Level Parallelism

Steinhaeuser, Karsten J.K.

doi:10.7274/dn39x061h09

thesis

Posted on 2007-04-19, authored by Karsten J.K. Steinhaeuser

A significant increase in the ability to collect and store diverse information over the past decade has led to an outright data explosion, providing larger and richer datasets than ever before. This growth in dataset sizes brings with it the challenge of successfully mining the data to discover patterns of interest. Extreme dataset sizes place unprecedented demands on high-performance computing infrastructures, and a gap has developed between the available real-world datasets and our ability to process them. Dataset sizes are quickly approaching terabytes and petabytes. This rate of increase also challenges the subsampling paradigm, as even a subsample of the data can run into gigabytes. Our goal is to exploit recent advances in multi-threaded processor technology for scalable data mining. In this work, we explore one such architecture, the Cray MTA-2, and conjecture that its architectural design is well suited to applying machine learning to massive datasets. To that end, we present a thorough complexity analysis and experimental evaluation of five popular learning algorithms, using a diverse body of datasets whose sizes vary in both dimensions (instances and attributes). Our results inform an analysis of whether the architectural design of the Cray MTA-2 makes it an appropriate platform for massively parallel, highly scalable implementations of learning algorithms.

History

Date Modified

2017-06-05

Research Director(s)

Nitesh V. Chawla

Committee Members

Jay B. Brockman, Peter M. Kogge

Degree

Master of Science in Computer Science and Engineering

Degree Level

Master's Thesis

Language

English

Alternate Identifier

etd-04192007-125153

Publisher

University of Notre Dame

Additional Groups

Computer Science and Engineering

Program Name

Computer Science and Engineering

Keywords

scalable machine learning implementations; high-dimensional data; high-performance data mining; parallel computing