Abstract
An RRAM-based computing system (RCS) provides an energy-efficient hardware implementation
of vector-matrix multiplication for machine-learning hardware. However, it is vulnerable
to faults due to the immature RRAM fabrication process. We propose an efficient
fault-tolerance method for RCS; the proposed method, referred to as extended-ABFT (X-ABFT),
is inspired by algorithm-based fault tolerance (ABFT). We utilize row checksums and
test-input vectors to extract signatures for fault detection and error correction.
We present a solution to alleviate the overflow problem caused by the limited number
of voltage levels for the test-input signals. Simulation results show that for a Hopfield
classifier with faults in 5% of its RRAM cells, X-ABFT allows us to achieve nearly
the same classification accuracy as in the fault-free case.
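
The row-checksum idea behind classic ABFT can be illustrated with a small sketch. This is not the X-ABFT method itself, whose details are not given in this abstract; it only shows how a checksum column and one-hot test-input vectors can yield a signature that flags and localizes a faulty row. All function names and the integer weight values are hypothetical, and the sketch assumes exact integer arithmetic, whereas a real RRAM crossbar computes with analog conductances and would need a comparison tolerance.

```python
def add_row_checksums(matrix):
    """Append one checksum column holding the sum of each row."""
    return [row + [sum(row)] for row in matrix]


def vmm(vector, matrix):
    """Vector-matrix product: out[j] = sum over i of vector[i] * matrix[i][j]."""
    cols = len(matrix[0])
    return [sum(v * row[j] for v, row in zip(vector, matrix)) for j in range(cols)]


def signature(vector, encoded):
    """Sum of the data outputs minus the checksum output; zero when consistent."""
    out = vmm(vector, encoded)
    return sum(out[:-1]) - out[-1]


def faulty_rows(encoded):
    """Apply one-hot test vectors and collect rows with a nonzero signature."""
    n = len(encoded)
    rows = []
    for k in range(n):
        test = [1 if i == k else 0 for i in range(n)]
        if signature(test, encoded) != 0:
            rows.append(k)
    return rows


# Hypothetical 3x3 weight matrix, encoded with a checksum column.
weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
encoded = add_row_checksums(weights)
clean_signature = signature([1, 1, 1], encoded)

# Inject a fault: the cell at row 1, column 2 is stuck at 0.
faulty = [row[:] for row in encoded]
faulty[1][2] = 0
fault_signature = signature([1, 1, 1], faulty)
located = faulty_rows(faulty)
```

In this sketch the fault-free matrix gives a zero signature for any input, while the stuck cell produces a nonzero signature whose magnitude equals the lost weight, and the one-hot test vectors narrow the fault down to a single row.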