The deployment of large AI models on edge devices has emerged as a compelling research area, promising enhanced personalization, privacy, and offline functionality. This dissertation explores the transition from server-based large AI to edge devices, examining the associated constraints and challenges while proposing innovative solutions leveraging emerging technologies. The overarching goal is to improve the efficiency and viability of large AI on edge, positioning it as a cornerstone for future intelligent, personalized devices.
The research begins by delineating the concept and benefits of deploying large AI models on edge devices. This approach has the potential to revolutionize intelligent device operation by enabling personalized, offline functionality while preserving user privacy and fairness. However, edge deployment faces significant constraints, including limited memory and computational power, low learning efficiency, and tight energy budgets. These limitations necessitate a strategic algorithm-hardware co-design approach tailored to edge environments. The research is motivated by the potential to harness edge advantages in creating a new era of efficient, intelligent, and personalized devices.
Addressing the critical challenge of accelerating large AI on edge, the second part of the research proposes methods to reduce the volume of learning data while retaining high-quality, in-situ data for efficient learning. High-speed search via Compute-in-Memory (CiM) matrix multiplication is explored to accelerate large language model (LLM)-user interaction, and high-performance DNN-backed gradient estimators on CiM are investigated to enhance LLM/LVLM learning. Together, these approaches aim to optimize training and inference within the resource constraints of edge devices.
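To make the CiM acceleration idea concrete, the following minimal Python sketch models a crossbar-style matrix multiplication in which weights are quantized to a limited number of conductance levels and the analog accumulation picks up read noise. The bit width, noise level, and all function names are illustrative assumptions for this sketch, not parameters of the dissertation's hardware.

```python
import numpy as np

def cim_matmul(x, w, bits=4, noise_std=0.02, rng=None):
    """Toy model of a Compute-in-Memory (CiM) crossbar matmul.

    Weights are quantized to mimic limited conductance states, and the
    analog accumulation adds Gaussian read noise. Illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Quantize weights to 2^(bits-1)-1 signed levels per polarity.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    w_q = np.round(w / scale) * scale
    # Ideal analog dot product performed inside the memory array.
    y = x @ w_q
    # Device and readout non-idealities modeled as additive noise.
    y += noise_std * np.abs(y).max() * rng.standard_normal(y.shape)
    return y

# Example: one projection of a small attention layer.
x = np.random.randn(8, 64)    # token activations
w = np.random.randn(64, 64)   # projection weights stored in the array
y_ref = x @ w
y_cim = cim_matmul(x, w)
print("relative error:", np.linalg.norm(y_cim - y_ref) / np.linalg.norm(y_ref))
```

The relative error printed at the end illustrates the core trade-off the dissertation targets: CiM performs the multiply-accumulate where the weights are stored, avoiding data movement, at the cost of quantization and analog noise that algorithms must tolerate.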
In addition, this dissertation examines the integration of emerging technologies to further accelerate large AI on edge. Recognizing the limits of on-device acceleration alone, it explores complementary solutions such as data selection and synthesis for on-device LLMs, retrieval-augmented generation, and prompt tuning for LLMs/LVLMs on CiM. A comprehensive toolbox for deploying large AI models on edge devices is proposed, incorporating both AI techniques and emerging hardware technologies. This holistic approach seeks to bridge the gap between large AI and edge computing, enabling seamless deployment and use of advanced AI models on edge devices.
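As one concrete instance of these complementary techniques, the sketch below outlines retrieval-augmented generation for an on-device LLM: local passages are embedded, the closest matches for a query are retrieved, and a prompt is assembled from them. The hashed bag-of-words embedding and all names here are stand-ins chosen only to keep the example self-contained; a real system would use a learned encoder and the device's actual LLM.

```python
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy embedding: each token seeds a random projection via CRC32.
    A stand-in for a learned sentence encoder."""
    vec = np.zeros(dim)
    for tok in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        vec += rng.standard_normal(dim)
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def retrieve(query, corpus, k=2):
    """Return the k passages whose embeddings best match the query."""
    q = embed(query)
    return sorted(corpus, key=lambda p: -float(embed(p) @ q))[:k]

def build_prompt(query, corpus):
    """Prepend retrieved local context so a small on-device LLM can
    answer from user-specific data instead of parametric memory."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The thermostat schedule lowers heat at 10pm.",
    "User prefers podcasts on weekday mornings.",
    "The front-door camera stores clips for 7 days.",
]
print(build_prompt("When does the heat turn down?", corpus))
```

The retrieval step is itself a batch of inner products over stored vectors, which is why it pairs naturally with the CiM matrix-multiplication search discussed above.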
In conclusion, this dissertation provides a comprehensive roadmap for deploying and accelerating large AI models on edge devices. By addressing inherent challenges and proposing innovative solutions through emerging technologies, particularly CiM, this research aims to enhance the efficiency and performance of large AI on edge. The findings pave the way for intelligent, personalized devices that operate effectively within the constraints of edge environments, setting the stage for future advances in AI and edge computing.