ckp_util - Checkpointing Utilities

The CKP_UTIL package lets an application save and restore its internal state. A long-running application can checkpoint intermediate states; should the application crash, it can start up again, restore the most recent state, and continue from there.

The checkpointed states are saved in a checkpoint file, which is physically two files: the checkpoint file itself and an index file. The checkpoint "file" is created as follows:

    #include  "ckp_util.h"			/* Checkpoint utilities. */
    CheckpointFile  file ;
    ckpOpen ("pathname", NULL, &file) ;
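To picture how a data file and an index file can work together, here is a minimal sketch. It is only an illustration under my own assumptions, not the package's actual on-disk format: the IndexEntry structure and the appendVersion() and readVersion() functions are hypothetical names. Each save appends the raw bytes to the data file and appends one fixed-size entry to the index file recording where those bytes live.

```c
#include <stdio.h>

/* Hypothetical index entry: where one saved version lives in the data file. */
typedef struct {
    long  offset ;		/* Byte offset of the record in the data file. */
    long  length ;		/* Number of bytes in the record. */
} IndexEntry ;

/* Append a region's bytes to the data file and log its location in the index. */
static int appendVersion (FILE *dataFile, FILE *indexFile,
                          const void *region, long length)
{
    IndexEntry  entry ;

    if (fseek (dataFile, 0L, SEEK_END) != 0)  return (-1) ;
    entry.offset = ftell (dataFile) ;
    entry.length = length ;
    if (fwrite (region, 1, (size_t) length, dataFile) != (size_t) length)
        return (-1) ;
    if (fseek (indexFile, 0L, SEEK_END) != 0)  return (-1) ;
    if (fwrite (&entry, sizeof entry, 1, indexFile) != 1)  return (-1) ;
    return (0) ;
}

/* Read version N (1 = oldest) back into the caller's buffer. */
static int readVersion (FILE *dataFile, FILE *indexFile, int version,
                        void *region, long length)
{
    IndexEntry  entry ;

    if (version < 1)  return (-1) ;
    if (fseek (indexFile, (long) (version - 1) * (long) sizeof entry, SEEK_SET) != 0)
        return (-1) ;
    if (fread (&entry, sizeof entry, 1, indexFile) != 1)  return (-1) ;
    if (entry.length != length)  return (-1) ;	/* Size mismatch. */
    if (fseek (dataFile, entry.offset, SEEK_SET) != 0)  return (-1) ;
    if (fread (region, 1, (size_t) length, dataFile) != (size_t) length)
        return (-1) ;
    return (0) ;
}
```

Because the index entries are fixed-size, finding a given version is a single seek in the index followed by a single seek in the data file; nothing already written is ever modified.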

Named regions of memory containing an application's internal state must then be registered as checkpoints associated with the file:

    Checkpoint  checkpoint ;
    struct {
        ... whatever ...
    }  internalState ;
    ckpRegister (file, "name", &internalState, sizeof internalState, &checkpoint) ;
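Since a region is registered by address and size, it seems safe to assume its contents are saved as raw bytes. A structure that refers to its own data by index rather than by pointer survives that kind of copy. The following sketch is only an illustration: the InternalState layout and the copyState() and currentItem() helpers are hypothetical, and memcpy() merely stands in for a save followed by a restore into a region at a different address.

```c
#include <string.h>

#define  MAX_ITEMS  4

/* Hypothetical application state that is safe to copy byte-for-byte:
   the "current item" is an array index, not a pointer into the array. */
typedef struct {
    int  items[MAX_ITEMS] ;
    int  numItems ;
    int  current ;		/* Index into items[]; survives a change of address. */
} InternalState ;

/* Stand-in for a save followed by a restore into a region at a different address. */
static void copyState (const InternalState *saved, InternalState *restored)
{
    memcpy (restored, saved, sizeof *restored) ;
}

/* Return the item selected by the index stored in the state. */
static int currentItem (const InternalState *state)
{
    return (state->items[state->current]) ;
}
```

Had the structure held an int pointer to the current item instead, the copy would still point into the original structure, which may no longer exist in a later run.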

The information stored in the internalState structure can then be saved, at any time, to the checkpoint file:

    ckpSave (checkpoint) ;

If an application has multiple memory regions registered as checkpoints, the current contents of all of them can be saved with a single call:

    ckpSaveAll (file) ;

When a memory region is checkpointed, the contents of the region are appended to the checkpoint file, thereby creating a new version of that checkpoint's state. The most recent version or earlier versions of a checkpoint may be retrieved from the checkpoint file and restored to the memory region:

    ckpRestore (checkpoint, version) ;

Since the restore may take place in a different run of the application, the memory region being restored may be at a different address from the region that was saved; hence, you should be wary of storing pointers in checkpointed regions. The version argument can be an absolute version number (1..N) or a relative version number (0 for the most recent, -1 for the next most recent, ..., -N+1 for the oldest).

All of an application's registered checkpoints may be restored at one time:

    ckpRestoreAll (file, version) ;
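The absolute and relative version numbering accepted by the restore calls can be reduced to a small piece of arithmetic. The sketch below is my own illustration, not code from the package; resolveVersion() is a hypothetical helper, and returning 0 for a nonexistent version is an assumption made for the example.

```c
/* Map a version argument to an absolute version number (1..N).
   Positive values are absolute; zero and negative values count back from
   the most recent version.  Returns 0 if the version does not exist. */
static int resolveVersion (int version, int numVersions)
{
    int  absolute = (version > 0) ? version : (numVersions + version) ;

    return ((absolute >= 1 && absolute <= numVersions) ? absolute : 0) ;
}
```

With five saved versions, a version argument of 0 resolves to 5, -1 to 4, and -4 (that is, -N+1) to 1; -5 and 6 name versions that do not exist.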

When an application is done, it should close each checkpoint file it opened:

    ckpClose (file) ;

This automatically unregisters the file's registered checkpoints.


The writing of this package was inspired by the checkpointing capabilities found in AT&T's fault-tolerant library, described in "A Software Fault Tolerance Platform" by Yennun Huang and Chandra Kintala, in Practical Reusable Unix Software, edited by Balachander Krishnamurthy. I haven't read the book or that chapter; my knowledge of the checkpoint library was gathered from the man(1) page for the library. My package is simpler (and more understandable, I hope) than theirs, plus mine comes with the source code!

Public Procedures

ckpClose() - closes a checkpoint file.
ckpLocate() - locates a checkpoint by name in a file's list of checkpoints.
ckpOpen() - opens a checkpoint file.
ckpRegister() - registers a named region of memory as a checkpoint.
ckpRestore() - restores a specified checkpoint from a checkpoint file.
ckpRestoreAll() - restores all of the registered checkpoints from the file.
ckpSave() - saves a specified checkpoint to a checkpoint file.
ckpSaveAll() - saves all of the registered checkpoints to the file.
ckpUnregister() - removes the registration of a checkpoint.

Source Files


Alex Measday