Decoupled store completion/silent deterministic replay: Enabling scalable data memory for CPR/CFP processors
(Proceedings - International Symposium on Computer Architecture, 2009-11-30)
CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the ...
iCFP: Tolerating all-level cache misses in in-order processors
(Proceedings - International Symposium on High-Performance Computer Architecture, 2009-04-24)
Growing concerns about power have revived interest in in-order pipelines. However, in-order pipelines sacrifice single-thread performance: specifically, they do not allow execution to flow freely around data cache misses. As a result, ...
Ginger: Control independence using tag rewriting
(Proceedings - International Symposium on Computer Architecture, 2007-10-22)
The negative performance impact of branch mispredictions can be reduced by exploiting control independence (CI). When a branch mispredicts, the wrong-path instructions up to the point where control converges with the correct ...
Flexible register management using reference counting
(Proceedings - International Symposium on High-Performance Computer Architecture, 2012-05-03)
Conventional out-of-order processors that use a unified physical register file allocate and reclaim registers explicitly using a free list that operates as a circular queue. We describe and evaluate a more flexible register ...
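The contrast the abstract draws is between a FIFO free list and reference-counted reclamation, where a physical register is freed only when every reference to it (map-table entry, checkpoint, etc.) is gone. A minimal sketch of the reference-counting side, with illustrative names not taken from the paper:

```python
class RefCountRegFile:
    """Sketch: physical registers reclaimed when their reference count
    drops to zero, instead of via a circular FIFO free list.
    All names here are illustrative, not from the paper."""

    def __init__(self, num_regs):
        self.refcount = [0] * num_regs
        self.free = set(range(num_regs))

    def allocate(self):
        # Pick any free register; a real design might scan a bit vector.
        reg = self.free.pop()
        self.refcount[reg] = 1       # reference from the map table
        return reg

    def add_ref(self, reg):
        # E.g., a checkpoint also names this register.
        self.refcount[reg] += 1

    def release(self, reg):
        # A reference (map entry, checkpoint) goes away; the register is
        # reclaimed out of order, unlike with a FIFO free list.
        self.refcount[reg] -= 1
        if self.refcount[reg] == 0:
            self.free.add(reg)
```

The flexibility the abstract alludes to comes from the last point: registers can be freed in any order, and multiple holders (maps, checkpoints) can share one register safely.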
CPROB: Checkpoint processing with opportunistic minimal recovery
(Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT, 2009-11-23)
CPR (Checkpoint Processing and Recovery) is a physical register management scheme that supports a larger instruction window and higher average IPC than conventional ROB-style register management. It does so by restricting ...
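The core CPR idea the abstract names is that rename state is saved only at selected checkpoints, so recovery rolls back to the nearest earlier checkpoint rather than to a per-instruction ROB entry. A toy sketch under that assumption (class and method names are ours, not the paper's):

```python
class CheckpointMap:
    """Sketch of checkpoint-style recovery: the rename map is snapshotted
    only at chosen points; a misprediction restores the nearest older
    snapshot instead of walking per-instruction ROB state."""

    def __init__(self, rename_map):
        self.map = dict(rename_map)
        self.checkpoints = []        # list of (seq_no, saved_map)

    def take_checkpoint(self, seq_no):
        self.checkpoints.append((seq_no, dict(self.map)))

    def rename(self, arch_reg, phys_reg):
        self.map[arch_reg] = phys_reg

    def recover(self, bad_seq_no):
        # Discard checkpoints younger than the faulting instruction,
        # then restore the nearest surviving one.
        while self.checkpoints and self.checkpoints[-1][0] > bad_seq_no:
            self.checkpoints.pop()
        seq, saved = self.checkpoints[-1]
        self.map = dict(saved)
        return seq                   # execution restarts from here
```

The cost, which CPROB's "opportunistic minimal recovery" targets, is that everything after the restored checkpoint re-executes, even instructions that were not actually wrong.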
BOLT: Energy-efficient out-of-order latency-tolerant execution
(Proceedings - International Symposium on High-Performance Computer Architecture, 2010-05-27)
LT (latency tolerant) execution is an attractive candidate technique for future out-of-order cores. LT defers the forward slices of LLC (last-level cache) misses to a slice buffer and re-executes them when the misses return. ...
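The defer-and-replay loop the abstract describes can be sketched as follows; the representation (a register is "pending" if its value is not yet available, and the forward slice is tracked by poisoning destinations) is our simplification, not BOLT's mechanism:

```python
def execute(instrs, ready_vals):
    """Sketch of latency-tolerant execution: instructions in the forward
    slice of a pending miss are deferred to a slice buffer; independent
    instructions execute normally. Each instruction is (dest, srcs, op)."""
    slice_buffer = []
    poisoned = set()                 # dests whose producer was deferred
    for dest, srcs, op in instrs:
        if any(s in poisoned or s not in ready_vals for s in srcs):
            slice_buffer.append((dest, srcs, op))   # part of a miss's slice
            poisoned.add(dest)
        else:
            ready_vals[dest] = op(*(ready_vals[s] for s in srcs))
    return slice_buffer

def replay(slice_buffer, ready_vals):
    """Re-execute the deferred slice once the miss data has returned."""
    for dest, srcs, op in slice_buffer:
        ready_vals[dest] = op(*(ready_vals[s] for s in srcs))
```

In this toy version the slice re-executes in program order once all inputs are present; the energy question BOLT addresses is how to hold and re-rename that deferred slice cheaply in real hardware.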