An Integrated Power, Area, and Timing Modeling Framework for the Design of Multithreaded and Multi/Manycore Architectures

Doctoral Dissertation

Abstract

Multithreaded and multi/manycore processors have become an important research direction and have demonstrated substantial performance and efficiency advantages. This dissertation presents McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. McPAT includes models for the components of a complete chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, and integrated memory controllers. McPAT models timing, area, and dynamic, short-circuit, and leakage power for each of the device types forecast in the ITRS roadmap, including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics such as the energy-delay-area² product (EDA2P) and the energy-delay-area product (EDAP).
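The metrics named above can be read as simple products: energy times delay (EDP), energy times delay times area (EDAP), and energy times delay times area squared (EDA2P). The following is a minimal sketch of that arithmetic, not code from McPAT itself. The function names, the units, and the two example designs are assumptions made purely for illustration; the numbers are placeholders, not results from the dissertation.

```python
# Illustrative sketch only: the helper names and the sample values are
# hypothetical and are not taken from McPAT or from the dissertation.

def edp(energy_j, delay_s):
    """Energy-delay product: lower is better."""
    return energy_j * delay_s

def edap(energy_j, delay_s, area_mm2):
    """Energy-delay-area product: also penalizes die area (a proxy for cost)."""
    return energy_j * delay_s * area_mm2

def eda2p(energy_j, delay_s, area_mm2):
    """Energy-delay-area^2 product: penalizes die area more heavily."""
    return energy_j * delay_s * area_mm2 ** 2

def best_design(designs, metric):
    """Return the name of the design that minimizes the given metric.

    `designs` maps a name to a tuple of the metric's positional arguments.
    """
    return min(designs, key=lambda name: metric(*designs[name]))

# Placeholder designs: (energy in J, delay in s, area in mm^2).
designs = {
    "design_a": (10.0, 1.0, 100.0),  # more energy, smaller die
    "design_b": (8.0, 1.0, 150.0),   # less energy, larger die
}

# EDP ignores area, so only energy and delay are passed to it.
edp_inputs = {name: values[:2] for name, values in designs.items()}

best_without_area = best_design(edp_inputs, edp)
best_with_area = best_design(designs, edap)
best_with_area_squared = best_design(designs, eda2p)
```

With these placeholder values, the design preferred under EDP differs from the one preferred under EDAP and EDA2P, which illustrates how accounting for area can change the ranking of candidate architectures.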

This dissertation examines several new architecture ideas. We study the scaling trends of a multithreaded chip multiprocessor across technology generations from 90nm to 22nm. We also explore the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering brings interesting tradeoffs between area and performance: the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining the power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node, for both common in-order and out-of-order manycore designs, shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA2P and EDAP.

This dissertation also proposes a Lightweight Chip Multi-Threaded (LCMT) architecture targeting parallel irregular and dynamic applications. The LCMT is implemented by extending techniques previously used in supercomputing frameworks to mainstream general-purpose processors. The LCMT architecture is implemented atop a mainstream architecture with minimal extra hardware and leverages existing legacy software environments. We evaluate the proposed LCMT architecture using McPAT and a performance simulator. Comparisons of the proposed LCMT architecture with a Niagara-like baseline architecture show that LCMT achieves up to 1.74X better performance per Watt when running irregular and dynamic benchmarks.

Attributes

Attribute Name  Values
URN
  • etd-04142010-233629

Author Sheng Li
Advisor Dr. Norm Jouppi
Contributor Dr. Greg Snider, Committee Member
Contributor Dr. Jay Brockman, Committee Member
Contributor Dr. Peter Kogge, Committee Member
Contributor Dr. Norm Jouppi, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Electrical Engineering
Degree Name Doctor of Philosophy
Defense Date
  • 2010-03-30

Submission Date 2010-04-14
Country
  • United States of America

Subject
  • modeling

  • power

  • multicore processor

  • area

  • timing

Publisher
  • University of Notre Dame

Language
  • English

Record Visibility Public
Content License
  • All rights reserved

Digital Object Identifier

doi:10.7274/02870v8510q

This DOI is the best way to cite this doctoral dissertation.
