Deep Neural Networks (DNNs) have shown excellent results in various perception tasks, but their deployment on edge devices with limited computational resources and power budgets is challenging. Non-volatile Computing-in-Memory (nvCiM) DNN accelerators, which employ emerging non-volatile memory (NVM) devices, offer an efficient solution for edge applications by minimizing data movement, enhancing energy efficiency, and improving memory density. However, these accelerators face two key obstacles: energy and latency overheads from analog-to-digital conversion, and degraded DNN accuracy caused by device variations. To overcome these issues, we propose software-hardware (SW/HW) co-design, which enables cross-layer multi-objective optimization to exploit the potential of nvCiM accelerators while mitigating their inherent challenges.
This dissertation covers optimizations at three levels: (1) Novel SW/HW co-design techniques tailored for nvCiM accelerators. We demonstrate the effectiveness of reinforcement learning and differentiable neural architecture search in SW/HW co-design. (2) Device-variation- and quantization-aware training methods for more robust DNN models. We introduce methods to identify worst-case scenarios under device variations, as well as a novel noise-aware training technique that improves worst-case DNN performance (see the sketch below). (3) Novel device programming methods that mitigate the effects of device variation during the programming phase. We propose methods to identify important DNN weights that require precise RRAM device programming, extend this approach to other device types, and validate it on fabricated nvCiM platforms.
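To make the noise-aware training idea in item (2) concrete, the following is a minimal PyTorch sketch. The multiplicative Gaussian noise model, the NoisyLinear class, and the sigma value are illustrative assumptions for exposition, not the dissertation's actual implementation; the core idea is to sample a fresh device-variation instance on each forward pass so the trained weights remain accurate after imprecise programming.

```python
import torch
import torch.nn as nn


class NoisyLinear(nn.Linear):
    """Linear layer that emulates NVM device variation during training.

    Assumption: variation is modeled as multiplicative Gaussian noise on
    the weights, with relative standard deviation `sigma` (hypothetical
    value; a real model would be calibrated to measured device data).
    """

    def __init__(self, in_features, out_features, sigma=0.1):
        super().__init__(in_features, out_features)
        self.sigma = sigma

    def forward(self, x):
        if self.training:
            # Sample a new variation instance each forward pass so the
            # model learns weights robust to programming noise.
            noise = 1.0 + self.sigma * torch.randn_like(self.weight)
            return nn.functional.linear(x, self.weight * noise, self.bias)
        # At inference, use the nominal (programmed) weights.
        return super().forward(x)


# Usage: swap NoisyLinear into a model and train with a standard loop.
model = nn.Sequential(NoisyLinear(784, 256), nn.ReLU(), NoisyLinear(256, 10))
x = torch.randn(8, 784)
logits = model(x)  # forward pass with variation injected
```

Because the injected noise is resampled every iteration, gradient descent is pushed toward weight configurations whose accuracy is insensitive to perturbation, which is the intuition behind variation-aware training.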