ckp_util - Checkpointing Utilities
The CKP_UTIL package provides an application with the ability to save and restore its internal state. A long-running application can checkpoint intermediate states; should the application crash, it can start up again, restore the most recent state, and continue from there.
The checkpointed states are saved in a checkpoint file, which is physically two files: the checkpoint file itself and an index file. The checkpoint "file" is created as follows:
#include "ckp_util.h"    /* Checkpoint utilities. */
CheckpointFile file ;
...
ckpOpen ("pathname", NULL, &file) ;
Named regions of memory containing an application's internal state must then be registered as checkpoints associated with the file:
Checkpoint checkpoint ;
struct { ... whatever ... } internalState ;
...
ckpRegister (file, "name", &internalState, sizeof internalState, &checkpoint) ;
The information stored in the internalState structure can then be saved, at any time, to the checkpoint file:
ckpSave (checkpoint) ;
If an application has multiple memory regions registered as checkpoints, the current contents of all of them can be saved with a single call:
ckpSaveAll (file) ;
When a memory region is checkpointed, the contents of the region are appended to the checkpoint file, thereby creating a new version of that checkpoint's state. The most recent version or earlier versions of a checkpoint may be retrieved from the checkpoint file and restored to the memory region:
ckpRestore (checkpoint, version) ;
Because the restore may take place in a different run of the application, the memory region being restored may be at a different address from the one that was saved; hence, you should be wary of storing pointers in checkpointed regions.
The version argument can be an absolute version number (1..N) or a relative version number (0 for the most recent, -1 for the next most recent, ..., -N+1 for the oldest). All of an application's registered checkpoints may be restored at one time:
ckpRestoreAll (file, version) ;
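The absolute and relative version numbers accepted by the restore calls can be illustrated with a small helper. This is only a sketch: resolveVersion() is a hypothetical function, not part of CKP_UTIL, and returning 0 for an out-of-range request is an assumption made for the example, not documented behavior of the package.

```c
#include <assert.h>

/* Hypothetical helper (not part of CKP_UTIL): map a version argument to an
   absolute version number in the range 1..numVersions.  Positive arguments
   are taken as absolute; 0 and negative arguments count back from the most
   recent version.  Returns 0 if the requested version does not exist. */
int resolveVersion (int version, int numVersions)
{
    int absolute ;

    assert (numVersions >= 0) ;         /* A count cannot be negative. */

    /* 0 maps to N, -1 to N-1, ..., -N+1 to 1. */
    absolute = (version > 0) ? version : numVersions + version ;

    return ((absolute >= 1 && absolute <= numVersions) ? absolute : 0) ;
}
```

With five saved versions, an argument of 0 resolves to version 5, -1 to version 4, and -4 to version 1; both 6 and -5 fall outside the range.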
When an application is done, it should close the checkpoint file(s):
ckpClose (file) ;
This automatically unregisters the file's registered checkpoints.
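The idea of appending each saved state to a data file and locating versions through a separate index file can be sketched as follows. This is not the CKP_UTIL implementation or its file format: sketchSave() and sketchRestore() are hypothetical names, the index is assumed to be a plain sequence of file offsets, and only a single fixed-size region is handled (no checkpoint names).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: append the region's current contents to the data
   stream and record the starting offset in the index stream.  Returns the
   new version number (1..N), or 0 on error. */
int sketchSave (FILE *data, FILE *index, const void *region, size_t size)
{
    long offset, indexEnd ;

    if (fseek (data, 0L, SEEK_END) != 0)  return (0) ;
    offset = ftell (data) ;             /* Where this version begins. */
    if (offset < 0)  return (0) ;
    if (fwrite (region, 1, size, data) != size)  return (0) ;

    if (fseek (index, 0L, SEEK_END) != 0)  return (0) ;
    if (fwrite (&offset, sizeof offset, 1, index) != 1)  return (0) ;
    indexEnd = ftell (index) ;
    if (indexEnd < 0)  return (0) ;

    /* One index entry per version, so the entry count is the version. */
    return ((int) (indexEnd / (long) sizeof offset)) ;
}

/* Hypothetical sketch: copy an absolute version (1..N) from the data
   stream back into the region.  Returns 1 on success, 0 on error. */
int sketchRestore (FILE *data, FILE *index, int version,
                   void *region, size_t size)
{
    long offset ;

    if (version < 1)  return (0) ;
    if (fseek (index, (long) (version - 1) * (long) sizeof offset,
               SEEK_SET) != 0)  return (0) ;
    if (fread (&offset, sizeof offset, 1, index) != 1)  return (0) ;
    if (fseek (data, offset, SEEK_SET) != 0)  return (0) ;
    return (fread (region, 1, size, data) == size) ;
}
```

Both streams are assumed to be open for binary update, for example as returned by tmpfile(). Because earlier versions are never overwritten, any of them remains available for a later restore.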
The writing of this package was inspired by the checkpointing capabilities found in AT&T's fault-tolerant library, described in "A Software Fault Tolerance Platform" by Yennun Huang and Chandra Kintala, in Practical Reusable Unix Software, edited by Balachander Krishnamurthy. I haven't read the book or that chapter; my knowledge of the checkpoint library was gathered from the man(1) page for the library. My package is simpler (and more understandable, I hope) than theirs, plus mine comes with the source code!
ckpClose()
- closes a checkpoint file.
ckpLocate()
- locates a checkpoint by name in a file's list of checkpoints.
ckpOpen()
- opens a checkpoint file.
ckpRegister()
- registers a named region of memory as a checkpoint.
ckpRestore()
- restores a specified checkpoint from a checkpoint file.
ckpRestoreAll()
- restores all of the registered checkpoints from the file.
ckpSave()
- saves a specified checkpoint to a checkpoint file.
ckpSaveAll()
- saves all of the registered checkpoints to the file.
ckpUnregister()
- removes the registration of a checkpoint.
ckp_util.c
ckp_util.h