Deep neural networks (DNNs) can perform cognitive tasks such as speech recognition and object detection with high accuracy, but are limited by their high computational cost. For this reason, resistive crossbar arrays have been proposed to minimize data movement and perform matrix-vector multiplications in the analog domain. One of the main challenges of these architectures is the limited resolution and nonlinearity of the resistive memories available today. In this thesis, this limitation is addressed in two ways:
First, ferroelectrics are studied for multilevel memory devices in resistive crossbar arrays. In their polycrystalline form, these materials are composed of a multitude of grains with independent polarization states, allowing for dense, nonvolatile, multilevel memories compatible with standard semiconductor fabrication processes. However, modeling the dynamics of polycrystalline ferroelectrics is challenging due to the statistical variations in the composition of their grains. To address this, a model to extract the statistical properties of a ferroelectric film was developed, together with a Monte Carlo simulation that can describe and predict the film's polarization dynamics and variability. Together, these provide the tools to characterize and optimize ferroelectric materials, and to design and evaluate devices, circuits and architectures for deep learning and other applications.
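As a rough illustration of why a Monte Carlo treatment suits polycrystalline films, the sketch below models a film as a set of independent grains, each with its own switching time. It is not the model developed in this thesis: the log-normal distribution of switching times, the parameters `median_s` and `sigma`, and the function names are all assumptions chosen for illustration only.

```python
import math
import random
import statistics

def switched_fraction(n_grains, pulse_s, median_s=1e-6, sigma=1.0, rng=None):
    """Fraction of grains switched by a single pulse of duration pulse_s.

    Assumption (illustrative only): each grain switches independently once
    the pulse is at least as long as its own switching time, and switching
    times are log-normally distributed with median median_s.
    """
    rng = rng or random.Random()
    mu = math.log(median_s)
    switched = sum(
        1 for _ in range(n_grains)
        if rng.lognormvariate(mu, sigma) <= pulse_s
    )
    return switched / n_grains

def device_spread(n_grains, pulse_s, n_devices=200, seed=0):
    """Mean and standard deviation of the switched fraction across devices."""
    rng = random.Random(seed)
    fractions = [
        switched_fraction(n_grains, pulse_s, rng=rng)
        for _ in range(n_devices)
    ]
    return statistics.mean(fractions), statistics.pstdev(fractions)

if __name__ == "__main__":
    # Fewer grains per device means fewer samples of the distribution,
    # so the device-to-device spread is expected to be larger.
    for n in (20, 200, 2000):
        mean, spread = device_spread(n, pulse_s=1e-6)
        print(f"{n:5d} grains: mean={mean:.3f} spread={spread:.3f}")
```

Under these assumptions, the intermediate polarization state is set by the pulse duration, and the spread across devices shrinks as the number of grains grows, which is the kind of variability a statistical film model is meant to capture.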
Second, architecture improvements to train DNN models in resistive crossbar arrays are presented. An accurate scheme for parallel weight update in resistive crossbar arrays is proposed and evaluated. By using pulse-width- and frequency-modulated signals, the values of the resistive elements in a crossbar array can be updated in parallel with higher accuracy than that of existing techniques based on stochastic multiplication. Finally, the mapping of DNN models to resistive crossbar arrays is analyzed by decomposing a general vector-matrix multiplication into a multiplication with nonnegative weights performed in a crossbar array, followed by a limited set of addition and subtraction operations described by a connection matrix. Based on this analysis, an efficient mapping scheme is designed, which mitigates the effect of weight nonlinearity and limited resolution.
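The decomposition described above can be sketched with the familiar differential-pair mapping, in which each signed weight is split across two nonnegative columns. This is only one simple instance of a connection matrix, shown for illustration; it is not claimed to be the efficient mapping scheme designed in this thesis, and the function names are hypothetical.

```python
def vecmat(x, M):
    """Row vector x (length n) times matrix M (n rows, m columns)."""
    return [
        sum(x[i] * M[i][j] for i in range(len(x)))
        for j in range(len(M[0]))
    ]

def differential_mapping(W):
    """Split signed W (n x m) into nonnegative G (n x 2m) and connection C (2m x m).

    G holds the positive parts in its first m columns and the magnitudes of
    the negative parts in its last m columns. C has entries in {-1, 0, +1}
    and subtracts column m + j from column j, so that x W == (x G) C.
    """
    m = len(W[0])
    G = [
        [max(w, 0.0) for w in row] + [max(-w, 0.0) for w in row]
        for row in W
    ]
    C = [[0] * m for _ in range(2 * m)]
    for j in range(m):
        C[j][j] = 1        # add the "positive" column
        C[m + j][j] = -1   # subtract the "negative" column
    return G, C

if __name__ == "__main__":
    W = [[0.5, -1.0], [-0.25, 2.0]]
    x = [1.0, 2.0]
    G, C = differential_mapping(W)
    crossbar_out = vecmat(x, G)   # nonnegative-weight multiply (the crossbar part)
    y = vecmat(crossbar_out, C)   # additions and subtractions only
    print(y, vecmat(x, W))
```

Here the nonnegative multiplication stands in for the analog crossbar operation, while the connection matrix captures the small set of additions and subtractions applied to the column outputs. Other connection matrices, for example ones that share a reference column, fit the same form.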