Design of a Data Repository for a Long-Running Physics Experiment

Master's Thesis


Dataset sizes for scientific experiments are expanding at a prodigious rate. Even small-scale laboratories can produce terabytes of raw data each year. This data must not only be stored but also analyzed; otherwise it is little more than a waste of space. Furthermore, in fields like physics, scientists frequently search for interesting events or trends amid a sea of routine data, making visualization and mass analysis especially important.

One experiment that follows this pattern is the Gamma Ray Astrophysics at Notre Dame (GRAND) experiment. In this work I discuss the needs and constraints of data repositories for data-intensive scientific experiments in the context of developing such a system for GRAND. Challenges such as storing large datasets, interface design, fast data analysis, and large-scale data visualization are examined, and solutions are presented in the form of distributed storage and parallel computation.
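The abstract's proposed solution of parallel computation for mass analysis can be illustrated with a minimal sketch. This is not the thesis's actual implementation; the event records, the `find_interesting` filter, and the energy threshold are all hypothetical stand-ins, and the partitioning here uses Python's standard `multiprocessing.Pool` rather than the distributed system the thesis describes.

```python
# Hypothetical sketch: scanning a dataset of event records in parallel
# and keeping only the "interesting" ones. All names and values here
# (EVENTS, THRESHOLD, find_interesting) are illustrative, not from the thesis.
from multiprocessing import Pool

# Toy stand-in for a raw event dataset.
EVENTS = [{"id": i, "energy": e}
          for i, e in enumerate([0.5, 3.2, 0.1, 7.8, 2.9, 9.4])]
THRESHOLD = 3.0

def find_interesting(event):
    # Stand-in filter: flag events whose energy exceeds a threshold.
    return event if event["energy"] > THRESHOLD else None

def parallel_scan(events, workers=4):
    # Partition the events across worker processes, apply the filter,
    # and collect only the flagged events.
    with Pool(workers) as pool:
        return [e for e in pool.map(find_interesting, events) if e is not None]

if __name__ == "__main__":
    print(parallel_scan(EVENTS))
```

In a real deployment the partitioning would follow the data's physical placement on distributed storage, so that each worker analyzes the chunks it holds locally rather than pulling the whole dataset over the network.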


Attribute Name | Values
  • etd-12142010-161930

Author Michael Albrecht
Advisor Douglas Thain
Contributor Scott Emrich, Committee Member
Contributor Greg Madey, Committee Member
Contributor Douglas Thain, Committee Chair
Degree Level Master's Thesis
Degree Discipline Computer Science and Engineering
Degree Name MSCSE
Defense Date 2010-09-24

Submission Date 2010-12-14
  • United States of America
  • distributed computing
  • active storage
  • scientific computing
  • University of Notre Dame
  • English

Record Visibility and Access Public
Content License All rights reserved

Departments and Units

