
Hardware and Algorithm Co-Exploration for Efficient On-Device Personalization of Large Language Models

Thesis, posted on 2025-05-07, 19:10, authored by Ruiyang Qin
The deployment of large AI models on edge devices has emerged as a compelling research area, promising enhanced personalization, privacy, and offline functionality. This dissertation examines the transition of large AI from servers to edge devices, analyzing the associated constraints and challenges and proposing solutions that leverage emerging technologies. The overarching goal is to improve the efficiency and viability of large AI on edge, positioning it as a cornerstone of future intelligent, personalized devices.

The research begins by delineating the concept and benefits of deploying large AI models on edge devices. This approach has the potential to revolutionize how intelligent devices operate by enabling personalized, offline functionality while preserving user privacy and fairness. However, edge deployment faces significant constraints, including limited memory, computational power, learning efficiency, and energy budgets. These limitations necessitate algorithm and hardware co-design strategies tailored to edge environments. The research is motivated by the potential to harness these edge advantages in creating a new era of efficient, intelligent, and personalized devices.

Addressing the critical challenge of accelerating large AI on edge, the second part of the research proposes methods to reduce the volume of learning data while maintaining high-quality, in-situ data for efficient learning. High-speed search via matrix multiplication in Compute-in-Memory (CiM) technology is explored to accelerate large language model (LLM)-user interaction, and DNN-backed gradient estimators implemented on CiM are investigated to enhance LLM/LVLM learning. Together, these approaches optimize training and inference within the resource constraints of edge devices.

In addition, the dissertation examines the integration of emerging technologies to further accelerate large AI on edge. Recognizing that on-device acceleration alone may not suffice, it explores complementary solutions such as data selection and synthesis for on-device LLMs, retrieval-augmented generation, and prompt tuning for LLMs/LVLMs on CiM. A comprehensive toolbox for deploying large AI models on edge devices is proposed, incorporating both AI and emerging hardware technologies. This holistic approach bridges the gap between large AI and edge computing, facilitating seamless deployment and use of advanced AI models on edge devices.

In conclusion, this dissertation provides a comprehensive roadmap for deploying and accelerating large AI models on edge devices. By addressing the inherent challenges and proposing solutions built on emerging technologies, particularly CiM, the research enhances the efficiency and performance of large AI on edge. The findings pave the way for intelligent, personalized devices that operate effectively within the constraints of edge environments, setting the stage for future advances in AI and edge computing.
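
Two brief, generic sketches may help situate the techniques named above. They illustrate the general ideas only, not the methods developed in the dissertation; every identifier in them (spsa_gradient, embed, retrieve, and so on) is illustrative.

First, CiM hardware excels at forward passes (matrix multiplications) but does not naturally support backpropagation, which motivates forward-only gradient estimation for on-device learning. A minimal zeroth-order estimator in Python with NumPy, in the spirit of SPSA, could look like the following; the dissertation's DNN-backed estimators are more sophisticated:

    import numpy as np

    def spsa_gradient(loss_fn, theta, eps=1e-3, n_samples=8, seed=0):
        """Estimate grad(loss_fn) at theta using only forward evaluations,
        i.e., no backpropagation, via simultaneous random perturbation."""
        rng = np.random.default_rng(seed)
        grad = np.zeros_like(theta)
        for _ in range(n_samples):
            delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher directions
            finite_diff = (loss_fn(theta + eps * delta)
                           - loss_fn(theta - eps * delta)) / (2 * eps)
            grad += finite_diff * delta
        return grad / n_samples

    # Sanity check on a quadratic: the true gradient at 0 is [-2, -2, -2, -2].
    g = spsa_gradient(lambda t: np.sum((t - 1.0) ** 2), np.zeros(4))

Second, retrieval-augmented generation personalizes an LLM without updating its weights: relevant user data is retrieved by vector similarity and prepended to the prompt. In the toy sketch below, embed is a hypothetical stand-in for a small on-device encoder (here a fixed pseudo-random unit vector per string, purely for demonstration):

    import numpy as np

    def embed(text, dim=64):
        # Stand-in embedding: deterministic pseudo-random unit vector per string.
        rng = np.random.default_rng(sum(text.encode()) % (2**32))
        v = rng.standard_normal(dim)
        return v / np.linalg.norm(v)

    def retrieve(query, corpus, k=2):
        q = embed(query)
        sims = np.array([embed(doc) @ q for doc in corpus])  # cosine similarity of unit vectors
        return [corpus[i] for i in np.argsort(-sims)[:k]]

    # Retrieved passages are prepended to the prompt before generation.
    notes = ["prefers metric units", "lives in Chicago", "allergic to peanuts"]
    context = retrieve("What units should I use?", notes)
    prompt = "\n".join(context) + "\nQuestion: What units should I use?"

The similarity computation inside retrieve, a matrix-vector product against the stored corpus embeddings, is plausibly the kind of operation the abstract describes accelerating with in-memory matrix multiplication.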

History

Date Created

2025-04-09

Date Modified

2025-05-07

Defense Date

2025-03-24

CIP Code

  • 14.0901

Research Director(s)

Yiyu Shi

Committee Members

X. Sharon Hu, Kai Ni, Siddharth Joshi, Jinjun Xiong

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Library Record

006700747

OCLC Number

1518697961

Publisher

University of Notre Dame

Additional Groups

  • Computer Science and Engineering

Program Name

  • Computer Science and Engineering
