Micro Rollback in Systems with Multiple Modules

CELSY PHILLIPS
Fault Tolerance in VLSI Circuits
Jun 7, 2021

In a system that consists of several modules, a rollback signal initiated by a module may affect other modules connected to it. Following a rollback of one of the modules, its state may be inconsistent with the state of other modules. If at time T module M1 is rolled back t time units, its new state is consistent with the state of another module M2 if, and only if, one of the following conditions is met:

(a) there were no interactions between M1 and M2 in the interval [T-t, T], or

(b) M2 is rolled back to its state prior to any interactions with M1 during the interval [T-t, T].

In case (a) there is no need to roll back module M2. Since M2 has not interacted with M1 since time T-t, if the states of the two modules were consistent before M1 was rolled back, they remain consistent following the rollback without requiring further action by M2. In case (b) both M1 and M2 must be rolled back. To determine which case applies, as well as the "distance" that M2 may have to roll back, interactions between modules must be monitored. An interaction, or transaction, is any transfer of information between two modules, such as a data transfer or a control signal.
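The two cases can be illustrated with a small sketch. It assumes each interaction between M1 and M2 is logged with a timestamp; the function name `rollback_target_for_m2` and the log format are hypothetical and only serve to illustrate the condition.

```python
from typing import List, Optional


def rollback_target_for_m2(interaction_times: List[float],
                           T: float, t: float) -> Optional[float]:
    """Decide how M2 must react when M1 is rolled back t time units at time T.

    interaction_times: times at which M1 and M2 interacted (hypothetical log).
    Returns None in case (a): no interaction in [T - t, T], so M2 stays as is.
    Returns the time of the earliest interaction in [T - t, T] in case (b):
    M2 must restore the state it had just before that interaction.
    """
    in_window = [x for x in interaction_times if T - t <= x <= T]
    if not in_window:
        return None           # case (a): states remain consistent
    return min(in_window)     # case (b): roll M2 back to before this point


# M1 and M2 interacted at times 1, 5 and 9; M1 is rolled back at T = 10.
print(rollback_target_for_m2([1, 5, 9], T=10, t=3))   # 9
print(rollback_target_for_m2([1, 5], T=10, t=3))      # None
```

Monitoring interactions is what makes this decision possible: without the log, M2 could not tell which case applies or how far to roll back.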

In a synchronous system all inter-module interactions are synchronous with a common clock. If one module, M1, rolls back C cycles, the simplest way to maintain consistency in the system is to roll back all other modules C cycles. This means that some modules roll back unnecessarily, even though they have not interacted with module M1 in the past C cycles. In some cases, performance can be improved if modules are rolled back selectively, depending on their recent interactions with M1.
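As a rough sketch of selective rollback in the synchronous case, the following assumes a log of (cycle, module, module) interaction records; the function `modules_to_roll_back` and the log format are illustrative assumptions, not part of the referenced design. The sketch also follows indirect interactions: a module that interacted, within the window, with a module that is being rolled back must roll back as well.

```python
from typing import Iterable, Set, Tuple


def modules_to_roll_back(initiator: str,
                         interactions: Iterable[Tuple[int, str, str]],
                         now: int, c: int) -> Set[str]:
    """Return the modules that must roll back c cycles with the initiator.

    interactions: (cycle, module_a, module_b) records (hypothetical log).
    Only interactions in the window [now - c, now] matter.
    """
    window = [(a, b) for cycle, a, b in interactions if now - c <= cycle <= now]
    affected = {initiator}
    changed = True
    while changed:                      # repeat until no new module is added
        changed = False
        for a, b in window:
            # exactly one side is already affected: pull in the other side
            if (a in affected) != (b in affected):
                affected.update((a, b))
                changed = True
    return affected


log = [(90, "M1", "M2"), (97, "M1", "M3"), (98, "M3", "M4"), (99, "M5", "M6")]
print(sorted(modules_to_roll_back("M1", log, now=100, c=5)))
# ['M1', 'M3', 'M4']
```

Here M2 is spared because its only interaction with M1 lies outside the last 5 cycles, and M5 and M6 are spared because they never interacted with a rolled-back module.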

Many systems consist of modules that operate with different clocks and interact asynchronously. For example, the Motorola 68020 processor can operate at 25 MHz together with a Motorola 68881 floating-point unit operating with a 16.7 MHz clock. Since there is no common clock, rollback in an asynchronous system cannot be coordinated based on the number of cycles to roll back. If two modules, M1 and M2, each roll back C cycles of their internal clock, their states following the rollbacks may be inconsistent. If module M1 rolls back C1 cycles internally and during the last C1 cycles it has participated in T transactions with another module M2 (T here is a transaction count, not the time instant used earlier), then module M2 must roll back to the state it had prior to the last T transactions with M1. For M2, T transactions may correspond to a different number of internal cycles, C2.
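A quick calculation shows why a cycle count alone cannot be shared between such modules. The helper `cycles_to_ns` below is a hypothetical name used only for this illustration; the clock rates are the ones quoted above.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Duration of a number of clock cycles, in nanoseconds."""
    return cycles * 1000.0 / clock_mhz


C = 10
cpu_ns = cycles_to_ns(C, 25.0)    # processor clock: 25 MHz
fpu_ns = cycles_to_ns(C, 16.7)    # floating-point unit clock: 16.7 MHz

print(f"{C} cycles at 25 MHz   = {cpu_ns:.1f} ns")
print(f"{C} cycles at 16.7 MHz = {fpu_ns:.1f} ns")
```

Ten cycles span 400 ns on the faster clock but roughly 599 ns on the slower one, so rolling both modules back by the same number of cycles returns them to different points in time.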

To coordinate a rollback that results in consistent states, a module that initiates a rollback of a specified number of internal cycles must be able to translate that number into the number of transactions that have occurred with other modules during that time, and send this number of transactions to the other modules. To participate in a rollback initiated by another module, each module must be able to receive the number of transactions to roll back and translate it into internal cycles. For this purpose, a circuit called a transactions-to-cycles/cycles-to-transactions transducer is implemented to perform the mapping between internal cycles and transactions (see figure below).

Figure: Synchronization Through Transducers
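The mapping performed by the transducer can be sketched in software. This is a minimal model under stated assumptions, not the circuit from the referenced work: the class `Transducer`, its method names, and the helper `replay` are hypothetical, each module keeps one transducer per peer, and the history is unbounded here, whereas a hardware version would only keep as much history as the maximum rollback distance.

```python
from bisect import bisect_left
from typing import Iterable, List


class Transducer:
    """Software model of a cycles <-> transactions mapping (illustrative)."""

    def __init__(self) -> None:
        self.cycle = 0                      # internal cycles completed so far
        self.txn_cycles: List[int] = []     # cycle of each transaction, ascending

    def tick(self, transaction: bool = False) -> None:
        """Advance one internal cycle, noting whether a transaction occurred."""
        self.cycle += 1
        if transaction:
            self.txn_cycles.append(self.cycle)

    def cycles_to_transactions(self, c: int) -> int:
        """Number of transactions that occurred in the last c internal cycles."""
        first_undone = self.cycle - c + 1
        return len(self.txn_cycles) - bisect_left(self.txn_cycles, first_undone)

    def transactions_to_cycles(self, n: int) -> int:
        """Cycles to roll back to reach the state before the last n transactions."""
        if n == 0:
            return 0
        if n > len(self.txn_cycles):
            raise ValueError("not enough recorded transactions")
        return self.cycle - self.txn_cycles[-n] + 1


def replay(total_cycles: int, txn_at: Iterable[int]) -> Transducer:
    """Build a transducer by replaying cycles, with transactions at txn_at."""
    txn_set = set(txn_at)
    tr = Transducer()
    for cycle in range(1, total_cycles + 1):
        tr.tick(transaction=cycle in txn_set)
    return tr


# The same three transactions, seen on two different internal clocks.
m1 = replay(10, [3, 7, 9])     # initiator
m2 = replay(6, [2, 4, 5])      # peer with a slower clock

n = m1.cycles_to_transactions(4)        # M1 rolls back 4 cycles
print(n)                                # 2
print(m2.transactions_to_cycles(n))     # 3
```

In this example M1 rolls back 4 of its cycles, which covers 2 transactions; M2 receives the count 2 and translates it into 3 of its own cycles, so both modules end up in the state they had before the same transaction.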

References

  1. Tremblay, Marc & Tamir, Yuval. (2001). Fault-Tolerance for High-Performance Multi-Module VLSI Systems Using Micro Rollback.
