Design of a Data Repository for a Long-Running Physics Experiment

Master's Thesis

Abstract

Dataset sizes for scientific experiments are expanding at a prodigious rate; even small-scale laboratories can produce terabytes of raw data each year. This data must not only be stored but also analyzed, or it is little more than a waste of space. Furthermore, in fields such as physics, scientists frequently search for interesting events or trends amid a sea of uninteresting data, making visualization and large-scale analysis especially important.

One experiment that follows this pattern is the Gamma Ray Astrophysics at Notre Dame (GRAND) experiment. In this work I discuss the needs and constraints of data repositories for data-intensive scientific experiments in the context of developing such a system for GRAND. Challenges such as storing large datasets, interface design, fast data analysis, and large-scale data visualization are examined, and solutions are presented in the form of distributed storage and parallel computation.

Attributes

URN: etd-12142010-161930
Author: Michael Albrecht
Advisor: Douglas Thain
Contributor: Scott Emrich, Committee Member
Contributor: Greg Madey, Committee Member
Contributor: Douglas Thain, Committee Chair
Degree Level: Master's Thesis
Degree Discipline: Computer Science and Engineering
Degree Name: MSCSE
Defense Date: 2010-09-24
Submission Date: 2010-12-14
Country: United States of America
Subject:
  • distributed computing
  • active storage
  • scientific computing
Publisher: University of Notre Dame
Language: English
Record Visibility and Access: Public
Content License: All rights reserved
