University of Notre Dame

GPU-Accelerated Summarization and Reconstruction Techniques for Big Data Analysis and Visualization

thesis
posted on 2020-04-22, 00:00 authored by Martin Imre

With the ever-growing amount of data we are now capable of collecting, there is an increasing need for methods to analyze big data and generate insight from it. Data visualization offers a powerful tool to help humans better understand the data. However, large-scale data generated through simulation or collected from real-world scenarios amount to more than humans can sift through, even when the data are visualized in a concise format. To combat this, it is necessary to simplify the data using summarization techniques. While summarization provides an overview that is easier for humans to digest, it comes with the drawback of omitting parts of the data. To overcome this drawback, data reconstruction techniques allow for a level-of-detail analysis of the underlying data. They further make it possible to synthesize missing data. For both data summarization and reconstruction, it is important to handle each kind of data in a way suited to it. In this dissertation, I describe several ways to summarize and reconstruct time-varying multivariate data, vector field data, and graph data.

Time-varying multivariate volumetric data typically stem from scientific simulations that describe physical or chemical processes. Such data usually span two or three spatial dimensions and a temporal dimension, and contain multiple variables. Volume visualization techniques are usually used to visualize and analyze them. In this dissertation, I focus on data analysis using isosurface rendering, a commonly used volume visualization technique.

While isosurface rendering allows detailed insight into time-varying multivariate data sets, their sheer complexity often makes it impossible to analyze and visualize all the data at once. A major challenge is the selection of interesting or important parts to bring to the attention of the analyst. Previous works have presented algorithms to select salient values; however, these algorithms do not scale well enough for big data analysis. In this dissertation, I present an acceleration and analysis framework to efficiently analyze and visualize complete large-scale time-varying multivariate data sets using isosurfaces.

A different type of data produced by scientific simulation is vector field data. At the core of describing fluid dynamics, these data sets are composed of vector fields revealing the underlying development of the flow. When it comes to the simulation of unsteady flow, scientists often face the challenge of only being able to generate output at a coarse temporal granularity. This leads to the problem of reconstructing the data at missing time steps with high quality for postprocessing and analysis. In this dissertation, I study three different deep-learning approaches to reconstruct missing vector fields from a small set of stored ones.

Another way to navigate through data is by using a graph representation. Graph representations are commonly used to show the relationships among entities. A typical way to depict a graph is a node-link diagram, with entities being the nodes and their connections the links. Similar to isosurface rendering, graph visualization suffers from the data overload issue. Further, computing a pleasing and information-revealing layout for large graphs can take a long time. To combat these problems, I present a framework to efficiently summarize a graph into sparse levels, compute a layout, and then reconstruct this information for denser levels of the graph.

History

Date Modified

2020-05-29

Defense Date

2020-04-16

CIP Code

  • 40.0501

Research Director(s)

Chaoli Wang

Committee Members

  • Hanqi Guo
  • Collin McMillan
  • Ronald Metoyer

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Alternate Identifier

1155639838

Library Record

5503968

OCLC Number

1155639838

Additional Groups

  • Computer Science and Engineering

Program Name

  • Computer Science and Engineering
