Show simple item record

dc.contributor.advisor Sorin, Daniel J en_US
dc.contributor.author Meixner, Albert en_US
dc.date.accessioned 2008-05-14T16:28:57Z
dc.date.available 2008-05-14T16:28:57Z
dc.date.issued 2008-04-10 en_US
dc.identifier.uri http://hdl.handle.net/10161/599
dc.description Dissertation en_US
dc.description.abstract There is broad consensus among academic and industrial researchers in computer architecture that hardware faults, both transient and permanent, will become significantly more frequent as CMOS feature sizes continue to shrink. Circuit-level techniques alone are insufficient to overcome this problem, and therefore system designers have begun to add fault tolerance features to processor micro-architectures and memory systems. Many of the techniques used today were developed in a time when fault coverage was the primary optimization target; hardware, power, and performance costs were only secondary concerns. These priorities do not accurately reflect the needs of today's commodity systems, which are very sensitive to manufacturing and performance costs and can trade off some amount of fault coverage to reduce these costs. In my dissertation work, I have developed novel error detection techniques with significantly lower area and performance costs than those traditionally used in high-availability designs. These savings were made possible by a guiding principle of verifying high-level system tasks rather than checking correct operation of specific low-level components. This high-level, end-to-end approach to error detection has distinct advantages over checking low-level components in terms of applicability to a wide range of systems, coverage of complex component interactions, and implementation cost. The major challenge in developing end-to-end checkers is to find high-level tasks that are both relevant and verifiable at runtime. I approached this problem by decomposing system-level tasks into sub-tasks that are more easily verifiable and, when combined, are sufficient to ensure correctness of a high-level task.
Such a decomposition is a step back from a full end-to-end design and requires additional assumptions about the underlying system, but I found the resulting cost and complexity benefits to outweigh the loss in flexibility that comes with them. I have applied the ideas of task decomposition and high-level checking to processor cores, memory systems, and the I/O system, in order to develop low-cost checkers for each of these subsystems. The checking mechanisms resulting from this work are highly effective in detecting errors and incur lower hardware and performance costs than mechanisms with comparable error coverage proposed in the past. en_US
dc.format.extent 1312976 bytes
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.subject Computer Science en_US
dc.title Low-cost Methods for Error Detection in Multi-core Systems en_US
dc.type Dissertation en_US
dc.department Computer Science en_US

