This thesis describes CACTI-D, a memory modeling tool that supports both SRAM and DRAM circuits and technologies. With CACTI-D it is possible to project area, access time, cycle time, dynamic read and write energies per access, and standby leakage power of memories and caches for technology nodes between 90nm and 32nm. CACTI-D supports SRAM, logic-process-based DRAM (LP-DRAM), and commodity DRAM (COMM-DRAM) technologies. CACTI-D also supports the modeling of main memory DRAM chips. Thus, CACTI-D makes it possible to model the complete memory hierarchy with consistent models, from SRAM-based L1 caches through main memory DRAMs on DIMMs.
CACTI-D is based on the well-known memory and cache modeling tool CACTI. We borrow key circuit models from CACTI but build CACTI-D from the ground up with a complete rewrite of the source code. We incorporate a new technology foundation into CACTI-D, with device data based on the ITRS roadmap for different ITRS device types: high performance (HP), low standby power (LSTP), and low operating power (LOP). We also incorporate data for interconnect technology based on well-documented models and data from the literature.
We have validated CACTI-D by comparing its projections against real designs. For SRAM validation, we compare against two prominent 90nm and 65nm SRAM caches, and for DRAM validation we compare against a 78nm DDR3 DRAM chip. Given the extremely generic nature of CACTI-D, there is good agreement between the projections produced by CACTI-D and the published data.
We illustrate the potential applicability of CACTI-D in the design and analysis of future memory hierarchies by carrying out a last-level cache study for a multicore multithreaded architecture at the 32nm technology node. In this study we use CACTI-D to model all components of the memory hierarchy: L1 and L2 caches; last-level L3 caches based on SRAM, LP-DRAM, or COMM-DRAM; and main memory DRAM chips. We carry out architectural simulation using benchmarks with large data sets and present results for execution time, the breakdown of power in the memory hierarchy, and system energy-delay product for the different system configurations. We find that COMM-DRAM technology is the most attractive for stacked last-level caches, with significantly lower energy-delay products.