Browsing by Author "Hilton, Andrew D"
Item Open Access Coordinating Software and Hardware for Performance Under Power Constraints (2019) Huang, Ziqiang
For more than 50 years since its birth in 1965, Moore's Law has been a self-fulfilling prophecy driving computing forward. However, as Dennard scaling ends, chip power density presents a challenge that grows more severe with every process generation. Consequently, a growing subset of transistors on a chip must be powered off in order to operate within a sustainable thermal envelope, a design constraint commonly referred to as ``dark silicon''.
Although dark silicon poses a major challenge to consistently delivering higher performance, it also inspires researchers to rethink how a chip should be designed and managed in a power- and thermal-constrained environment. The historical way of extracting performance, whether single-thread or multi-thread, by throwing complicated and power-hungry hardware at the problem is no longer applicable. Instead, we need to rely more on software, with the hardware providing new mechanisms. In this thesis, we present three pieces of work on software and hardware co-design that demonstrate how coordinating software, such as compilers and runtimes, with underlying hardware support can boost performance in a power-efficient way.
First, out-of-order (OoO) processors achieve higher performance than in-order (IO) ones by aggressively scheduling instructions out of program order during execution. However, dynamic scheduling requires sophisticated control and numerous bookkeeping structures---e.g., the reorder buffer, load-store queue, and register alias table---that increase complexity, area, and, most importantly, power. Observing that a compiler produces better static schedules when the instruction set defines simple operations, we propose an ISA extension that decouples the data access and register write operations of a load instruction. We show that with modest system and hardware support, we can improve compilers' instruction scheduling by hoisting a decoupled load's data access above may-alias stores and branches. We find that decoupled loads improve performance with a geometric mean speedup of 8.4% across a wide range of applications, bringing IO designs a step closer to OoO performance.
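The intuition behind hoisting can be illustrated with a back-of-the-envelope latency model. The sketch below is not from the thesis; all latencies and the cost model are illustrative assumptions. It shows why starting a decoupled load's data access early, above a may-alias store, hides load latency that a conventional load must expose on the critical path.

```python
# Hypothetical latency model for a decoupled load (all numbers assumed).
# Conventional ISA: the whole load (address gen + data access + register
# write) must stay below a may-alias store, so its full latency is exposed.
# Decoupled ISA: the data access is hoisted above the store and overlaps
# with independent work; only the cheap register write stays in order.

LOAD_LATENCY = 10      # cycles for the data access (assumed)
WRITE_LATENCY = 1      # cycles for the decoupled register write (assumed)
INDEPENDENT_WORK = 6   # cycles of other work between store and load use

def conventional_cycles():
    # independent work, then the full load latency on the critical path
    return INDEPENDENT_WORK + LOAD_LATENCY

def decoupled_cycles():
    # data access overlaps the independent work; only the un-hidden
    # portion plus the register write remains exposed
    exposed = max(LOAD_LATENCY - INDEPENDENT_WORK, 0)
    return INDEPENDENT_WORK + exposed + WRITE_LATENCY

print(conventional_cycles())  # 16
print(decoupled_cycles())     # 11
```

Under these assumed numbers the decoupled schedule hides six of the ten load cycles, which is the kind of static-schedule improvement the abstract attributes to the ISA extension.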
Second, sprinting is a class of computational mechanisms that provide a short but significant performance boost while temporarily exceeding the thermal design point. Using phase change material to buffer heat, sprinting is a promising way to deliver high performance in future chip designs that are likely to be power- and thermal-constrained. However, because sprints cannot be sustained, the system needs a mechanism to decide when to start and stop a sprint. We propose UTAR, a software runtime framework that manages sprints by dynamically predicting utility and modeling thermal headroom. Moreover, we propose a new sprint mechanism for caches, briefly increasing capacity for enhanced performance. For a system that extends last-level cache capacity from 2MB to 4MB per core and can absorb 10J of heat with phase change material, UTAR-guided cache sprinting improves performance by 17% on average and by up to 40% over a non-sprinting system. These performance outcomes, within 95% of an oracular policy, are possible because UTAR accurately predicts phase behavior and sprint utility.
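A UTAR-style controller can be sketched as a loop that sprints only when the predicted utility of the next phase is high and the phase change material still has thermal headroom. This is a minimal sketch, not the thesis's implementation: the 10J budget comes from the abstract, but the per-phase heat numbers, the utility threshold, and the cooling model are all illustrative assumptions.

```python
# Minimal sketch of a UTAR-style sprint policy (assumptions throughout).
HEAT_CAPACITY_J = 10.0        # PCM heat budget, from the abstract
SPRINT_HEAT_J = 2.0           # assumed extra heat stored per sprinted phase
COOLING_J = 1.0               # assumed heat dissipated per non-sprint phase

def should_sprint(predicted_utility, heat_stored, utility_threshold=0.2):
    """Sprint if the phase benefits enough and the PCM can absorb the heat."""
    headroom = HEAT_CAPACITY_J - heat_stored
    return (predicted_utility >= utility_threshold and
            headroom >= SPRINT_HEAT_J)

def run(phase_utilities):
    """Return one sprint/no-sprint decision per predicted phase utility."""
    heat, decisions = 0.0, []
    for utility in phase_utilities:
        if should_sprint(utility, heat):
            decisions.append(True)
            heat += SPRINT_HEAT_J
        else:
            decisions.append(False)
            heat = max(heat - COOLING_J, 0.0)
    return decisions

# Five useful phases drain the budget; then the policy pauses to cool,
# and a low-utility phase is skipped regardless of headroom.
print(run([0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.1]))
```

The two conditions in `should_sprint` mirror the abstract's description of UTAR as combining utility prediction with a thermal-headroom model: either one alone would sprint too eagerly.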
Finally, applications often exhibit phase behaviors that demand different types of system resources. As a result, management frameworks need to coordinate between different sprinting mechanisms to realize the full performance potential. We propose UTAR+, an extended version of UTAR that determines not only when to sprint, but also which resource to sprint and at what intensity. Building on UTAR's phase predictor and utility- and thermal-aware policy, UTAR+ quickly identifies the most profitable sprinting option to maximize performance per watt. For a system that offers multiple sprinting options, UTAR+-guided multi-resource sprinting improves performance by 22% on average and by up to 83% over a non-sprinting system, outperforming UTAR+-guided single-resource sprinting for a variety of applications.
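The multi-resource selection step can be sketched as ranking candidate sprint mechanisms by predicted performance per watt. This is a hypothetical illustration, not the UTAR+ algorithm: the option names, speedup numbers, and the simple perf/watt model are all invented for the example.

```python
# Hypothetical sketch of UTAR+-style option selection (all values assumed).
def pick_sprint(options, baseline_perf_per_watt=1.0):
    """options: list of (name, predicted_speedup, extra_power_fraction).

    Returns the option with the best predicted perf/watt, or None when no
    option beats the non-sprinting baseline.
    """
    best_name, best_ppw = None, baseline_perf_per_watt
    for name, speedup, extra_power in options:
        ppw = speedup / (1.0 + extra_power)  # assumed normalized perf/watt model
        if ppw > best_ppw:
            best_name, best_ppw = name, ppw
    return best_name

# A cache sprint with a large speedup at modest extra power wins here.
choice = pick_sprint([("cache", 1.4, 0.2),
                      ("frequency", 1.3, 0.5),
                      ("cores", 1.2, 0.4)])
print(choice)  # cache
```

Falling back to `None` when nothing beats the baseline captures the point that a coordinator must also be willing not to sprint at all.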
Item Open Access Design Strategies for Efficient and Secure Memory (2019) Lehman, Tamara Silbergleit
Recent computing trends force users to relinquish physical control to unknown parties, making systems vulnerable to physical attacks. Software alone is not well equipped to protect against physical attacks; instead, software and hardware have to enforce security in collaboration. Many secure processor implementations have surfaced over the last two decades (e.g., Intel SGX, ARM TrustZone), but inefficiencies are hindering their adoption.
Secure processors use secure memory to detect and guard against physical attacks. Secure memory assumes that everything within the chip boundary is trusted, and it provides confidentiality and integrity verification for data in memory. Both of these features require large metadata structures that are stored in memory. When a system equipped with secure memory misses in the last-level cache (LLC), the memory controller has to issue additional memory requests to fetch the corresponding metadata from memory. These additional memory requests increase delay and energy. The main goal of this dissertation is to reduce the overheads of secure memory along two dimensions: delay and energy.
First, to reduce the delay overhead we propose the first safe speculative integrity verification mechanism, PoisonIvy, which effectively hides the integrity verification latency while maintaining security guarantees. Secure memory has high delay overheads due to the long integrity verification latency. Speculation allows the system to return decrypted data to the processor before integrity verification completes, effectively removing the verification latency from the critical path of a memory access. However, speculation without any other mechanism to safeguard security is unsafe. PoisonIvy safeguards security guarantees by preventing any effect of unverified data from leaving the trusted boundary, realizing all the benefits of speculative integrity verification while maintaining the same security guarantees as a non-speculative system.
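The safety invariant can be illustrated with a toy poison-tracking model. This is a sketch of the general idea only, not PoisonIvy's hardware design: the class names and the propagation rule are illustrative assumptions.

```python
# Toy model of safe speculation (illustrative, not the PoisonIvy design):
# speculatively returned data is tagged "poisoned" until its integrity is
# verified; poison propagates through computation; nothing poisoned may
# leave the trusted boundary.

class Value:
    def __init__(self, data, poisoned=False):
        self.data = data
        self.poisoned = poisoned

def speculative_load(data, verified):
    # return data to the "processor" immediately; tag it until verified
    return Value(data, poisoned=not verified)

def compute(a, b):
    # any result derived from unverified data is itself unverified
    return Value(a.data + b.data, a.poisoned or b.poisoned)

def externalize(value):
    # e.g., an off-chip store: must stall while the value is poisoned
    if value.poisoned:
        raise RuntimeError("unverified data may not leave the trusted boundary")
    return value.data

x = speculative_load(5, verified=False)   # returned before verification
y = compute(x, Value(3))                  # poison propagates into y
# externalize(y) would raise here; once verification succeeds, the poison
# clears and the value may leave the chip:
y.poisoned = False
print(externalize(y))  # 8
```

The key property is that speculation changes *when* data is used internally, but never *whether* unverified effects become externally visible.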
Speculation is effective in reducing the delay overhead, but it does nothing to reduce the number of additional memory accesses, which cause a large energy overhead. Secure memory metadata has unique memory access patterns that are not compatible with traditional cache designs. In the second part of this work, we provide the first in-depth study of metadata access patterns, MAPS, to help architects design more efficient cache architectures customized for secure memory metadata. Based on the unique characteristics of secure memory metadata observed in that analysis, in the third part of this work we explore the design space of efficient cache designs. We describe one possible design, Metadata Cache eXtension (MCX), which exploits the bimodal reuse distance distribution of metadata blocks to improve cache efficiency, thereby reducing the number of additional memory accesses. We also explore an LLC eviction policy suitable for handling multiple types of blocks, to further improve the efficiency of caching metadata blocks on-chip.
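How a bimodal reuse-distance distribution can be exploited is sketched below. This is an illustrative LRU-stack policy, not the MCX design: blocks predicted to be reused soon are inserted at the MRU position as usual, while blocks predicted to have distant reuse are inserted at the LRU position so they are evicted quickly instead of displacing useful metadata.

```python
# Illustrative bimodal-insertion cache (not the MCX design itself).
from collections import OrderedDict

class BimodalCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # front = LRU, back = MRU

    def access(self, addr, short_reuse):
        """short_reuse: assumed predictor output for this block."""
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)            # promote on hit
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)      # evict LRU line
            self.lines[addr] = True                 # insert at MRU...
            if not short_reuse:
                self.lines.move_to_end(addr, last=False)  # ...or demote to LRU
        return hit

# Two hot metadata blocks plus a streaming block predicted distant-reuse:
cache = BimodalCache(capacity=2)
cache.access("meta_A", short_reuse=True)
cache.access("meta_B", short_reuse=True)
cache.access("stream", short_reuse=False)  # lands at LRU, evicted next miss
print(cache.access("meta_B", short_reuse=True))  # True: hot block survived
```

Under plain LRU insertion the streaming block would have displaced a hot block on every miss; the bimodal insertion keeps the cache's capacity for the short-reuse-distance mode of the distribution.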