A Rich Metadata Filesystem for Scientific Data

Doctoral Dissertation

Abstract

As scientific research becomes more data-intensive, there is an increasing need for scalable, reliable, and high-performance storage systems. Such data repositories must provide both data archival services and rich metadata, and must integrate cleanly with large-scale computing resources. ROARS is a hybrid approach to distributed storage that provides large, robust, and scalable storage together with efficient rich-metadata queries for scientific applications. This dissertation presents the design and implementation of ROARS, focusing primarily on the challenges of maintaining data integrity and achieving data scalability. We evaluate the performance of ROARS on a storage cluster and compare it to the Hadoop Distributed File System (HDFS). We observe that ROARS has read and write performance that scales with the number of storage nodes. We show that ROARS functions correctly through multiple system failures and reconfigurations. We prove that ROARS is reliable not only for daily data access but also for long-term data preservation. We also demonstrate how to integrate ROARS with existing distributed frameworks to drive large-scale distributed scientific experiments. ROARS has been in production use for over three years as the primary data repository for a biometrics research lab at the University of Notre Dame.

Attributes

Attribute Name    Values
URN
  • etd-05242012-151339

Author Hoang Bui
Advisor Douglas Thain
Contributor Scott Emrich, Committee Member
Contributor Brian Blake, Committee Member
Contributor Patrick Flynn, Committee Member
Contributor Douglas Thain, Committee Chair
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name PhD
Defense Date
  • 2012-05-24

Submission Date 2012-05-24
Country
  • United States of America

Subject
  • big data
  • workflow
  • scientific data
  • biometrics
  • distributed system

Publisher
  • University of Notre Dame

Language
  • English

Record Visibility and Access Public
Content License
  • All rights reserved
