Please use this identifier to cite or link to this item: http://hdl.handle.net/2440/96726
Type: Thesis
Title: A study of low power and high performance cache hierarchy for multi-core processor.
Author: Tian, Geng
Issue Date: 2015
School/Discipline: School of Electrical and Electronic Engineering
Abstract: Increasing transistor density has enabled the integration of a growing number of cores and cache resources on a single chip. However, power, as a first-order design constraint, may bring this trend to a dead end. Recently, the primary design objective has shifted from pursuing higher speed to achieving higher power-performance efficiency. This is also reflected in the transition of design preference from fast superscalar architectures to slower multi-core architectures. Tiled chip multiprocessors (CMPs) have shown unmatched advantages in recent years, and they are very likely to become mainstream in the future. Meanwhile, the increasing number of cores will exert higher pressure on the cache system. Expanding cache storage can ease the pressure but will incur higher static power consumption. More importantly, very large caches in future multi-core systems may not be fully utilised, and under-utilised caches consume static power without contributing to performance. Off-line profiling of applications to determine the optimal cache size and configuration is not practical. This thesis describes dynamic cache reallocation techniques for tiled multi-core architectures. We propose the idea of the Break Even number of Misses (BEM). BEM defines, for a given cache configuration and time interval, the maximum number of misses that can be tolerated without increasing the energy-delay product. We use BEM as the upper bound to determine a set of thresholds that are used to periodically evaluate the utility of the cache. Based on this scheme, we then propose a conservative increase-only resizing method to tune the cache size at tile-level granularity. The increase-only method can be further extended to a dynamic downsizing scheme. In simulations, our tuning scheme reduces static power significantly with only a very minor degradation of IPC (Instructions Per Cycle). One aspect of our resizing scheme that can be improved is the replacement policy.
The estimation of cache utility is based on stack distance, which relies on the recency position of a real LRU (least recently used) replacement policy. As is commonly known, LRU is not easy to implement, especially in high-associativity caches, which are becoming more common. LRU also suffers from “cache thrashing”, “scan” and “inter-thread interference”. To solve these three problems, we further propose a novel replacement policy, the Effectiveness-Based Replacement policy (EBR), and a refinement, Dynamic EBR (D-EBR), which combine measures of recency and frequency to form a rank sequence inside each set and evict the blocks with the lowest rank. To evaluate our design, we simulated all 30 applications from SPEC CPU2006 for a uni-core system and a set of combinations for 4-core systems, for different cache sizes. The results show that both policies can achieve higher performance with only half the hardware overhead of real LRU. With the help of EBR, we further extend our last-level cache resizing scheme. We discuss how to estimate the equivalent utility of a cache using the EBR replacement policy rather than LRU, and introduce an EBR-based resizing scheme. Since the EBR replacement policy is hardware-economical and protected against cache thrashing, it is more suitable for utility estimation. Finally, to shorten the average cache access latency, we propose using a private replica region to store useful data replicas. Keeping replicas close to the requester can significantly reduce average access latency; however, it also reduces the effective storage size, causing higher cache miss rates. We leverage our EBR-based cache utility estimation method to dynamically change the partition based on cache access patterns to achieve a near-optimal result.
Advisor: Liebelt, Michael J.
Phillips, Braden Jace
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Electrical and Electronic Engineering, 2015
Keywords: CMPs; cache; low power; replacement policy
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
Appears in Collections: Research Theses

Files in This Item:
File          Description                 Size        Format
01front.pdf                               116.45 kB   Adobe PDF
02whole.pdf                               4.27 MB     Adobe PDF
Permissions   Library staff access only   241.22 kB   Adobe PDF
Restricted    Library staff access only   4.31 MB     Adobe PDF

