1. Introduction
In mobile computing, checkpointing and rollback recovery are well-known backward error recovery techniques that minimize the loss of computation in the presence of process faults (Kuang et al., 2014; Meroufel et al., 2014). Transparent fault-tolerant schemes, which require no user interaction, fall into two categories: checkpoint-based and log-based rollback recovery (Islam et al., 2014; Mendizabal et al., 2014; Awasthi et al., 2014). In a log-based rollback recovery scheme, each process typically records both the content and the causal delivery relations of all the messages it has delivered in a location (called a message log) that survives the failure of the process (Chen et al., 2005). Each saved state of the process is called a checkpoint; checkpoints reduce the number of logged events that must be replayed during the recovery phase (Elnozahy et al., 2002). Upon a process failure, a rollback recovery mechanism brings the failed process back to normal operation by reloading a checkpoint and replaying the message log from that point (Elnozahy et al., 2002). Log-based rollback recovery schemes commonly require that, once the set of failed processes has recovered, their states are consistent with the states of the failure-free processes (Chen et al., 2005; Elnozahy et al., 2002). This consistency requirement is usually expressed in terms of orphan processes: an orphan is a process whose state is inconsistent with the recovered state of another process (Alvisi et al., 1998).
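The interplay of checkpoint, message log, and replay described above can be illustrated with a minimal single-process sketch. It is not taken from any of the cited schemes: the names `StableStorage` and `LoggedProcess` are hypothetical, the process is assumed to be deterministic, and only message contents (not causal delivery relations) are logged.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StableStorage:
    """Hypothetical storage assumed to survive a failure of the process."""
    # (saved state, number of messages delivered when the state was saved)
    checkpoint: Tuple[int, int] = (0, 0)
    # Message contents in delivery order
    message_log: List[int] = field(default_factory=list)


class LoggedProcess:
    """Toy deterministic process whose state depends on delivery order."""

    def __init__(self, storage: StableStorage) -> None:
        self.storage = storage
        self.state = 0
        self.delivered = 0

    def _apply(self, payload: int) -> None:
        # Order-sensitive update, so replay must preserve delivery order.
        self.state = self.state * 31 + payload
        self.delivered += 1

    def deliver(self, payload: int) -> None:
        # Log the message before applying it to the volatile state.
        self.storage.message_log.append(payload)
        self._apply(payload)

    def take_checkpoint(self) -> None:
        self.storage.checkpoint = (self.state, self.delivered)

    @classmethod
    def recover(cls, storage: StableStorage) -> "LoggedProcess":
        # Reload the checkpoint, then replay only the messages logged after it.
        proc = cls(storage)
        proc.state, proc.delivered = storage.checkpoint
        for payload in storage.message_log[proc.delivered:]:
            proc._apply(payload)
        return proc


storage = StableStorage()
p = LoggedProcess(storage)
p.deliver(3)
p.deliver(5)
p.take_checkpoint()
p.deliver(7)
p.deliver(11)
state_before_failure = p.state

# Simulate a failure: the volatile object is lost, stable storage remains.
del p
recovered = LoggedProcess.recover(storage)
```

In this sketch the checkpoint lets recovery replay only the two messages delivered after it rather than all four, which is the reduction in replayed events that checkpointing is meant to provide.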
For traditional wired distributed computing systems, two orphan-free consistency conditions have already been proposed for consistent recovery: the No-Orphans Consistency Condition (NOCC) (Alvisi et al., 1998) and the Orphan-free Consistency Condition (OCC) (Xu et al., 2013). Because the topology of a mobile computing system differs from that of a traditional distributed computing system, neither NOCC nor OCC is suitable for mobile computing, as neither considers the events related to the mobile computing stations. PCRD is described in the general form of state intervals for mobile computing (Park et al., 2002). However, it may still lead to an orphan-inconsistent recovery when the recovery involves rollback propagation to failure-free processes, since the definition only specifies the lost state interval of the failed process. Furthermore, PCRD does not specify the message logging requirements for an orphan-free consistent recovery (Park et al., 2002).
Mobile computing introduces many new characteristics, such as mobility, disconnections, a finite power source, vulnerability to physical damage, and a lack of stable storage (Park et al., 2003; Gupta et al., 2008). As a result, wireless network connections are more fragile and mobile hosts are much less reliable than their counterparts in traditional wired distributed computing. Mobile hosts may disconnect from the rest of the network because of doze mode, abrupt power-off, or permanent damage. It is therefore all the more desirable for mobile computing to be equipped with an appropriate rollback recovery scheme that minimizes the loss of computation caused by process faults. Research on fault-tolerant rollback recovery schemes for mobile computing systems has received considerable interest in recent years, and various schemes have been presented to accommodate the characteristics of mobile computing (Agbaria et al., 2004; Brzezinsk et al., 2006; Li et al., 2005).