University of Notre Dame

Accelerating Memory Intensive Algorithms and Applications using In-Memory Computing

thesis
posted on 2022-04-07, 00:00 authored by Ann Franchesca Laguna

Data-intensive applications do not fully utilize the compute capabilities of Von Neumann architectures because of the memory bandwidth bottleneck. These memory-bandwidth-limited applications can be accelerated by minimizing data movement between the memory and the compute units through in-memory computing (IMC). Using IMC, this work accelerates four different types of applications and algorithms.

The first part focuses on accelerating the attention mechanism of few-shot learning algorithms such as Memory Augmented Neural Networks and Prototypical Networks by utilizing different distance metrics and in-memory computing circuits. L∞ distance is implemented using content-addressable memories (CAMs) via range encoding. A combined L∞ + L1 distance is implemented using a combination of CAMs and general-purpose computing-in-memory. CAMs are also used to implement locality-sensitive hashing. Multi-bit CAMs, with their corresponding distance metric, are also utilized.
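The link between L∞ distance and range matching can be illustrated in software. The sketch below is an assumption-laden emulation, not the thesis's circuit or encoding: a stored row "matches" a query at tolerance t when every dimension lies within [stored - t, stored + t], so the smallest t that produces a match equals the L∞ distance to the nearest stored row. The function names and the example data are hypothetical.

```python
import numpy as np

def linf_distance(a, b):
    """L-infinity (Chebyshev) distance: the largest per-dimension gap."""
    return int(np.max(np.abs(np.asarray(a) - np.asarray(b))))

def range_match(stored, query, t):
    """Emulate one CAM-style search: a row matches if every query value
    lies in [stored - t, stored + t]. Returns one boolean per stored row."""
    stored = np.asarray(stored)
    query = np.asarray(query)
    return np.all(np.abs(stored - query) <= t, axis=1)

def nearest_by_range_search(stored, query, max_t):
    """Widen the tolerance step by step; the first t that yields a match
    is the L-infinity distance to the nearest stored row."""
    for t in range(max_t + 1):
        hits = np.flatnonzero(range_match(stored, query, t))
        if hits.size:
            return int(hits[0]), t  # ties resolve to the lowest row index
    return None, None

# Hypothetical support-set memory (3 rows, 3 features) and a query.
memory = np.array([[3, 7, 1], [5, 5, 5], [0, 9, 4]])
query = np.array([4, 6, 3])
row, t = nearest_by_range_search(memory, query, max_t=15)
```

In hardware, each range search would compare all stored rows in parallel; the loop over rows inside NumPy is only a software stand-in for that behavior.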

Transformer networks have outperformed other deep neural networks (DNNs) in various sequential tasks. However, memory and compute bottlenecks prevent transformer networks from scaling to long sequences due to their high execution time and energy consumption. We propose an in-memory transformer network accelerator (iMTransformer) that uses a combination of crossbars and CAMs to accelerate transformer networks.

Deep random forests (DRF) offer classification accuracy comparable to that of deep neural networks (DNNs), along with easier interpretability and lower memory and computational requirements. However, the development of efficient hardware to accelerate DRF lags behind that for DNNs. The key to hardware acceleration of DRF lies in efficiently realizing the branch-split operation at decision nodes when traversing a decision tree. We propose implementing DRF through simple associative searches using ferroelectric analog CAMs (ACAMs).
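One way to see how tree traversal can become an associative search is to flatten each root-to-leaf path into a row of per-feature intervals, so that a sample matches exactly one row. The sketch below assumes that mapping for illustration only; the tree, the tuple layout, and the function names are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical two-feature tree written as nested tuples:
#   ("split", feature_index, threshold, left_subtree, right_subtree)
#   ("leaf", label)
# Samples with x[feature_index] < threshold go left.
TREE = ("split", 0, 5.0,
        ("leaf", "A"),
        ("split", 1, 2.0, ("leaf", "B"), ("leaf", "C")))

def tree_to_rows(node, n_features, bounds=None):
    """Flatten each root-to-leaf path into one row of [low, high) intervals."""
    if bounds is None:
        bounds = [(-math.inf, math.inf)] * n_features
    if node[0] == "leaf":
        return [(list(bounds), node[1])]
    _, feature, threshold, left, right = node
    low, high = bounds[feature]
    left_bounds = list(bounds)
    left_bounds[feature] = (low, min(high, threshold))
    right_bounds = list(bounds)
    right_bounds[feature] = (max(low, threshold), high)
    return (tree_to_rows(left, n_features, left_bounds)
            + tree_to_rows(right, n_features, right_bounds))

def search(rows, x):
    """Emulate one associative search: return the label of the row whose
    intervals contain every feature of x."""
    for intervals, label in rows:
        if all(low <= value < high
               for value, (low, high) in zip(x, intervals)):
            return label
    return None

ROWS = tree_to_rows(TREE, n_features=2)
```

Here the three leaves yield three rows, and `search(ROWS, [6.0, 1.0])` returns `"B"`. An analog CAM would evaluate every row at once, whereas this loop checks them one at a time.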

Genome analysis is becoming more important in forensic science, medicine, and history. Sequencing technologies such as High Throughput Sequencing and Third Generation Sequencing have greatly accelerated genome sequencing. However, genome read mapping remains significantly slower than sequencing. This research presents a genome read mapping accelerator that uses ternary CAMs (TCAMs) to execute the Fast Seed and Vote algorithm, which can map both short and long reads.
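The general seed-and-vote idea can be sketched as follows. This is a generic illustration under stated assumptions, not the Fast Seed and Vote variant from the thesis: short seeds are extracted from the read, each seed is looked up in an index of the reference, and every hit votes for the read start position it implies. The dictionary lookup stands in for the TCAM search, and the reference string, `k`, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, index, k, step=1):
    """Vote for candidate start positions; return (position, votes)."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            # A seed found at reference position pos implies the read
            # starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]

# Hypothetical reference and a read copied from position 6 with its
# last base changed (G -> T) to mimic a sequencing error.
REFERENCE = "TTGACCGTAAGCTAGGCATCGA"
READ = "GTAAGCTAGT"
INDEX = build_index(REFERENCE, k=4)
position, vote_count = map_read(READ, INDEX, k=4)
```

With these inputs, six of the seven seeds agree on start position 6, and the seed covering the altered base finds no hit, so the read is still placed correctly despite the mismatch.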

This research demonstrates a hardware-software co-design of data-intensive algorithms and applications, particularly few-shot learning, transformer networks, deep random forests, and DNA read mapping. Each accelerator is evaluated in terms of accuracy, latency, and energy improvements.

History

Date Modified

2022-05-04

Defense Date

2022-03-30

CIP Code

  • 40.0501

Research Director(s)

X. Sharon Hu

Committee Members

  • Meng Jiang
  • Siddharth Joshi
  • Sri Parameswaran
  • Xunzhao Yin

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Alternate Identifier

1313829071

Library Record

6208960

OCLC Number

1313829071

Program Name

  • Computer Science and Engineering
