Decoupled store completion/silent deterministic replay: Enabling scalable data memory for CPR/CFP processors

dc.contributor.author

Hilton, A

dc.contributor.author

Roth, A

dc.date.accessioned

2016-02-24T19:34:27Z

dc.date.issued

2009-11-30

dc.description.abstract

CPR/CFP (Checkpoint Processing and Recovery/Continual Flow Pipeline) support an adaptive instruction window that scales to tolerate last-level cache misses. CPR/CFP scale the register file by aggressively reclaiming the destination registers of many in-flight instructions. However, an analogous mechanism does not exist for stores and loads. As the window expands, CPR/CFP processors must track all in-flight stores and loads to support forwarding and detect memory ordering violations. The previously-described SVW (Store Vulnerability Window) and SQIP (Store Queue Index Prediction) schemes provide scalable, non-associative load and store queues, respectively. However, they don't work smoothly in a CPR/CFP context. SVW/SQIP rely on the ability to dynamically stall some loads until a specific older store writes to the cache. Enforcing this serialization in CPR/CFP is expensive if the load and store are in the same checkpoint. We introduce two complementary procedures that implement this serialization efficiently. Decoupled Store Completion (DSC) allows stores to write to the cache before the enclosing checkpoint completes execution. Silent Deterministic Replay (SDR) supports mis-speculation recovery in the presence of DSC by replaying loads older than completed stores using values from the load queue. The combination of DSC and SDR enables an SVW/SQIP based CPR/CFP memory system that outperforms previous designs while occupying less area. Copyright 2009 ACM.

dc.identifier.isbn

9781605585260

dc.identifier.issn

1063-6897

dc.identifier.uri

https://hdl.handle.net/10161/11637

dc.publisher

ACM Press

dc.relation.ispartof

Proceedings - International Symposium on Computer Architecture

dc.relation.isversionof

10.1145/1555754.1555786

dc.title

Decoupled store completion/silent deterministic replay: Enabling scalable data memory for CPR/CFP processors

dc.type

Conference

pubs.begin-page

245

pubs.end-page

254

pubs.organisational-group

Computer Science

pubs.organisational-group

Duke

pubs.organisational-group

Electrical and Computer Engineering

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.publication-status

Published

Files

Original bundle

Name:
dsc-isca09.pdf
Size:
158.73 KB
Format:
Adobe Portable Document Format
Description:
Accepted version