Generic Disk I/O Interface

March 14, 1994

After an informal discussion of the LZP disk interface, I mailed Stan a message proposing a generic I/O interface for LZP data streams. The applications interface I proposed has several strong points:

Applications are presented with a simple, common, read/write/seek interface to any of a number of data streams.
The calculations and data structures currently used to locate information on disk could be preserved in a layer beneath the generic I/O interface, thus allowing the current data layout on disk to be retained. This would, in turn, allow applications to be upgraded to use the new, generic I/O routines while remaining compatible with the existing software.
Since the applications would now be shielded from the actual disk interface, the physical location and layout of the data could be changed without affecting the applications. For example, new disk types (e.g., HIPPI) could be substituted, a real file system could be implemented, and so on.

The weak points? I'm hoping you'll tell me. Stan never read my message, so I never got any feedback, one way or another!

The Extended I/O Utilities

The proposed package, the Extended I/O (EIO) Utilities, would provide a common application interface to LZP logical data streams. In our case, these streams are files, but the EIO utilities could just as easily be used to access other types of streams, such as network connections to TPCE, shared memory objects, etc. A UNIX-like, hierarchical naming scheme is used to identify different streams.

The EIO utilities would include the following public functions:

    status = eio_open (name, options, timeout, &stream) ;
    status = eio_read (stream, buffer, length, timeout) ;
    status = eio_write (stream, buffer, length, timeout) ;
    status = eio_seek (stream, position) ;
    status = eio_poll (stream, &length) ;
    status = eio_fileno (stream, &fd) ;
    status = eio_fork (stream1, &stream2) ;
    status = eio_close (stream, options) ;

and one semi-private function:

    status = eio_parse (name, argc, argv, timeout, stream) ;

An application accessing a data stream would call eio_open() to open the stream, eio_read() and eio_write() to perform I/O, and, if non-sequential access is desired, eio_seek() to reposition the point of next access within the stream. The other functions, except for eio_close(), are intended for non-file stream types.

Opening a Data Stream

eio_open() opens the logical data stream identified by the name argument and returns a handle that is used in calls to the other functions. The stream handle is the address of an EIO stream structure:

    typedef  struct  _eioStream {
        char  *name ;             /* Stream name. */
        int  debug ;              /* Debug switch. */
        int  (*read)() ;          /* Pointer to READ function. */
        int  (*write)() ;         /* Pointer to WRITE function. */
        int  (*seek)() ;          /* Pointer to SEEK function. */
        int  (*poll)() ;          /* Pointer to POLL function. */
        int  (*fileno)() ;        /* Pointer to FILENO function. */
        int  (*fork)() ;          /* Pointer to FORK function. */
        int  (*close)() ;         /* Pointer to CLOSE function. */
        void  *private ;          /* Pointer to private data. */
    }  _eioStream, *eioStream ;
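
The empty parentheses in the declarations above are old-style (unprototyped) declarations, so a C compiler cannot check the arguments passed through these pointers. A sketch of the same structure with full ANSI prototypes follows; the parameter types for the seek position (long), the poll and fileno results (int *), and the close options (a string) are assumptions inferred from the calling sequences listed earlier, not part of the original proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Declare the handle types first so the function pointers below can take
   the stream handle as their first argument. */
typedef struct _eioStream  _eioStream, *eioStream;

struct _eioStream {
    char  *name;                                          /* Stream name. */
    int  debug;                                           /* Debug switch. */
    int  (*read) (eioStream, void *, int, double);        /* READ function. */
    int  (*write) (eioStream, const void *, int, double); /* WRITE function. */
    int  (*seek) (eioStream, long);                       /* SEEK (assumed long offset). */
    int  (*poll) (eioStream, int *);                      /* POLL function. */
    int  (*fileno) (eioStream, int *);                    /* FILENO function. */
    int  (*fork) (eioStream, eioStream *);                /* FORK function. */
    int  (*close) (eioStream, const char *);              /* CLOSE (assumed string options). */
    void  *private;                                       /* Private data. */
};
```

Forward-declaring the typedef lets each function pointer name eioStream before the structure body is complete.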

eio_open() is a very simple function: it allocates the stream structure, stores the stream name in the structure, converts the options string into an ARGC/ARGV array, and calls eio_parse() to parse the stream name. Without error handling, the function would look as follows:

    #include  <stdlib.h>                 /* calloc(3) */

    int  eio_open (const char *name,
                   const char *options,
                   double timeout,
                   eioStream *stream)
    {
        char  **argv ;
        int  argc, status ;

        *stream = calloc (1, sizeof (_eioStream)) ;   /* Other fields zero/NULL. */
        (*stream)->name = str_dupl (name, -1) ;
        opt_create_argv ("eio_open", options, &argc, &argv) ;
        status = eio_parse (name, argc, (const char **) argv, timeout, *stream) ;
        opt_delete_argv (argc, argv) ;
        return (status) ;
    }

A data stream name is a UNIX-like pathname consisting of a sequence of "directory" specifications followed by a file name; e.g., /ap/session_999/source_999.anno. The algorithm for parsing the data stream name is very simple: eio_parse() takes the first component of the pathname and calls the open function for that "directory", passing it the remainder of the pathname. In our example, eio_parse() would extract /ap and call ap_open(), passing it /session_999/source_999.anno and the stream structure created by eio_open().
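
One way to carry out the first two steps of that algorithm is sketched below without strtok(3), because strtok(3) writes NUL characters into the string it scans and so cannot be applied directly to a const char * name. The helper split_stream_name() and its buffer-size arguments are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split "/ap/session_999/source_999.anno" into the first component ("ap")
   and the rest ("/session_999/source_999.anno"), and build the name of the
   component's open function ("ap_open").  The input string is not modified.
   Returns 0 on success, -1 if there is no component or a buffer is too small. */
int  split_stream_name (const char *name,
                        char *component, size_t component_size,
                        const char **rest,
                        char *function, size_t function_size)
{
    size_t  length;
    int  n;

    name += strspn (name, "/");              /* Skip the leading slash(es). */
    length = strcspn (name, "/");            /* Length of the first component. */
    if ((length == 0) || (length >= component_size))  return (-1);

    memcpy (component, name, length);
    component[length] = '\0';
    *rest = name + length;                   /* Empty if nothing follows. */

    n = snprintf (function, function_size, "%s_open", component);
    if ((n < 0) || ((size_t) n >= function_size))  return (-1);
    return (0);
}
```

The rest of the name keeps its leading slash, so the next level can apply exactly the same split if it passes the open down again.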

The coding of eio_parse() is fairly trivial:

    int  eio_parse (const char *name,
                    int argc,
                    const char *argv[],
                    double timeout,
                    eioStream stream)
    {
        const  char  *rest_of_name ;
        int  (*open_function)() ;      /* Pointer to open function. */

        /* ... Call strtok(3) on a copy of the name (strtok(3) modifies its
               argument) to get the first component of the pathname; e.g., "ap". */
        /* ... Call sprintf(3) to construct the name of that component's open
               function; e.g., "ap_open". */
        /* ... Call symFindByName(2) to look up the open function's entry
               point in the system symbol table. */
        /* ... Determine the start of the rest of the pathname; e.g.,
               "/session_999/source_999.anno". */

                                       /* Call the open function. */
        return (open_function (rest_of_name, argc, argv, timeout, stream)) ;

    }

By using my dynamic object module loading (DYMPL) functions instead of symFindByName(2), eio_parse() could be written in an operating system-independent way, making it usable under both VxWorks and UNIX.
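
The DYMPL functions are not shown here. As a different, purely illustrative alternative (not part of the proposal), eio_parse() could consult a table compiled into the program that maps component names to open functions; this works with any C compiler, but adding a stream type then requires relinking. On UNIX systems, dlsym(3) is another way to look up a symbol by name at run time. The names eio_lookup() and open_table are hypothetical, and the stub open functions only return distinct values so the dispatch can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct _eioStream  *eioStream;   /* Structure contents not needed here. */

/* Open functions share eio_parse()'s calling sequence. */
typedef int  (*eioOpenFunc) (const char *rest, int argc, const char *argv[],
                             double timeout, eioStream stream);

/* Hypothetical stubs: each returns a distinct value so the dispatch is visible. */
static int  ap_open (const char *rest, int argc, const char *argv[],
                     double timeout, eioStream stream)
{
    (void) rest; (void) argc; (void) argv; (void) timeout; (void) stream;
    return (1);
}

static int  meds_open (const char *rest, int argc, const char *argv[],
                       double timeout, eioStream stream)
{
    (void) rest; (void) argc; (void) argv; (void) timeout; (void) stream;
    return (2);
}

/* Hypothetical registry of known stream types. */
static const struct {
    const char  *component;              /* First pathname component, e.g., "ap". */
    eioOpenFunc  open;                   /* Its open function. */
} open_table[] = {
    { "ap", ap_open },
    { "meds", meds_open },
};

/* Return the open function for a pathname component, or NULL if unknown. */
eioOpenFunc  eio_lookup (const char *component)
{
    size_t  i;

    for (i = 0; i < sizeof (open_table) / sizeof (open_table[0]); i++)
        if (strcmp (component, open_table[i].component) == 0)
            return (open_table[i].open);
    return (NULL);
}
```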

Stream-Specific I/O Functions

Associated with each LZP stream type would be a package of functions that perform I/O on that type of stream:

    int  type_open () ;

    static  int  type_read () ;
    static  int  type_write () ;
    static  int  type_seek () ;
    static  int  type_close () ;
    ...

The open function is dynamically loaded and called by eio_parse(), so it must be public; the remaining functions need only be known to the open function and can thus be private (static). The open function has the same interface as eio_parse() and can do one of two things:

it can call eio_parse() recursively in order to pass the open operation down another level, or
it can open the data stream.

In a detailed example I gave to Stan, opening /meds/tpce/status (a network-based message stream between the MEDS controller and TPCE) resulted in an intermediate call to meds_open() and a final call to tpce_open().
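
A compressed sketch of that two-level open follows, assuming the argc/argv arguments are dropped for brevity and a fixed name comparison stands in for the symbol-table lookup. meds_open() passes the open down by calling eio_parse() again on the rest of the name; tpce_open() is the function that actually opens the stream. The cut-down structure and its trace and target fields are hypothetical, used only to show the call sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct _eioStream {          /* Hypothetical, cut-down stream. */
    char  trace[64];                 /* Which open functions ran, in order. */
    const char  *target;             /* Name the final open function opened. */
} _eioStream, *eioStream;

int  eio_parse (const char *name, double timeout, eioStream stream);

/* Intermediate level: pass the open down another level. */
int  meds_open (const char *rest, double timeout, eioStream stream)
{
    strcat (stream->trace, "meds ");
    return (eio_parse (rest, timeout, stream));
}

/* Final level: the function that actually opens the stream.  A real
   tpce_open() would also fill in the I/O function pointers and private data. */
int  tpce_open (const char *rest, double timeout, eioStream stream)
{
    (void) timeout;
    strcat (stream->trace, "tpce ");
    stream->target = rest;
    return (0);
}

/* Dispatch on the first pathname component. */
int  eio_parse (const char *name, double timeout, eioStream stream)
{
    size_t  length;

    name += strspn (name, "/");                  /* Skip leading slash. */
    length = strcspn (name, "/");                /* First component. */
    if ((length == 4) && (strncmp (name, "meds", 4) == 0))
        return (meds_open (name + length, timeout, stream));
    if ((length == 4) && (strncmp (name, "tpce", 4) == 0))
        return (tpce_open (name + length, timeout, stream));
    return (-1);                                 /* Unknown stream type. */
}
```

Opening /meds/tpce/status runs meds_open() and then tpce_open(), which receives /status as the remainder of the name.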

The open function that actually opens the data stream is responsible for initializing the remaining fields in the data stream structure created by eio_open(). In the TPCE example above, tpce_open() would store the addresses of the tpce_read(), tpce_write(), tpce_seek(), and tpce_close() functions in the corresponding fields of the data stream structure. If necessary, the open function also allocates a private data structure containing "device"-specific information. In the case of the MEDS-TPCE message stream, the private data could consist solely of the message stream handle returned by stan_open() (part of a function package I wrote for Stan that sends and receives MEDS messages across a network link).

In the case of the AP's annotation and timecode "files", the private data would be more extensive and include pointers to global structures (e.g., the segment directory table), the current "seek" position, and so on.
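
To make the division of labor concrete, here is a sketch of a complete stream type built on a hypothetical RAM-backed "device". Its open function follows the calling sequence of eio_parse(), fills in the function pointers of a stream structure the caller has already allocated (as eio_open() does), and allocates private data holding the backing store and the current "seek" position. The names mem_open(), memPrivate, and MEM_SIZE, the simplified close signature, and the structure trimmed to the fields used here are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define  MEM_SIZE  4096             /* Hypothetical capacity of each RAM stream. */

typedef struct _eioStream  _eioStream, *eioStream;
struct _eioStream {                 /* Trimmed to the fields used here. */
    int  (*read) (eioStream, void *, int, double);
    int  (*write) (eioStream, const void *, int, double);
    int  (*seek) (eioStream, long);
    int  (*close) (eioStream);      /* Options argument omitted (assumption). */
    void  *private;
};

typedef struct {                    /* "Device"-specific private data. */
    char  *data;                    /* Backing store. */
    long  length;                   /* Bytes of valid data. */
    long  position;                 /* Current "seek" position. */
} memPrivate;

static int  mem_read (eioStream stream, void *buffer, int length, double timeout)
{
    memPrivate  *p = stream->private;
    (void) timeout;                 /* Memory never blocks. */
    if (length < 0)  return (-1);
    if (length > p->length - p->position)
        length = (int) (p->length - p->position);
    memcpy (buffer, p->data + p->position, (size_t) length);
    p->position += length;
    return (length);                /* Bytes read; 0 at end of data. */
}

static int  mem_write (eioStream stream, const void *buffer, int length,
                       double timeout)
{
    memPrivate  *p = stream->private;
    (void) timeout;
    if ((length < 0) || (length > MEM_SIZE - p->position))  return (-1);
    memcpy (p->data + p->position, buffer, (size_t) length);
    p->position += length;
    if (p->position > p->length)  p->length = p->position;
    return (length);
}

static int  mem_seek (eioStream stream, long position)
{
    memPrivate  *p = stream->private;
    if ((position < 0) || (position > p->length))  return (-1);
    p->position = position;
    return (0);
}

static int  mem_close (eioStream stream)
{
    memPrivate  *p = stream->private;
    free (p->data);
    free (p);
    stream->private = NULL;         /* eio_close() would free the stream itself. */
    return (0);
}

/* Open function: same calling sequence as eio_parse(). */
int  mem_open (const char *rest, int argc, const char *argv[],
               double timeout, eioStream stream)
{
    memPrivate  *p = calloc (1, sizeof (*p));
    (void) rest; (void) argc; (void) argv; (void) timeout;
    if (p == NULL)  return (-1);
    if ((p->data = malloc (MEM_SIZE)) == NULL)  { free (p);  return (-1); }
    stream->read = mem_read;
    stream->write = mem_write;
    stream->seek = mem_seek;
    stream->close = mem_close;
    stream->private = p;
    return (0);
}
```

An application would reach these functions only through eio_read(), eio_write(), and eio_seek(), so replacing the RAM device with a disk device changes nothing above the open function.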

Reading, Seeking, Writing, ...

The remaining EIO functions simply call the functions whose entry points were stored in the data stream handle. eio_read() is a simple one-liner, for example:

    int  eio_read (eioStream stream, void *buffer,
                   int length, double timeout)
    {
        return (stream->read (stream, buffer, length, timeout)) ;
    }

If the extra function call is too much overhead, eio_read() and the others could be defined as macros.
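
If the macro route is taken, the dispatchers might look like the sketch below. Because stream appears twice in each expansion, an argument with side effects (for example, streams[i++]) would be evaluated twice, which is the usual price of function-like macros. The trimmed structure and the stub device stub_read() are hypothetical and exist only to show the expansion at work.

```c
#include <assert.h>
#include <stddef.h>

typedef struct _eioStream  _eioStream, *eioStream;
struct _eioStream {                   /* Trimmed to the fields used here. */
    int  (*read) (eioStream, void *, int, double);
    int  (*write) (eioStream, const void *, int, double);
    void  *private;
};

/* Dispatch directly through the handle: no extra call frame.
   Note that "stream" is evaluated twice. */
#define  eio_read(stream, buffer, length, timeout) \
    ((stream)->read ((stream), (buffer), (length), (timeout)))
#define  eio_write(stream, buffer, length, timeout) \
    ((stream)->write ((stream), (buffer), (length), (timeout)))

/* Hypothetical stub device: counts calls and pretends to read everything. */
static int  stub_calls = 0;

static int  stub_read (eioStream stream, void *buffer, int length, double timeout)
{
    (void) stream; (void) buffer; (void) timeout;
    stub_calls++;
    return (length);
}
```

C also allows a real function and a function-like macro of the same name to coexist, as C libraries commonly do for getc(); writing (eio_read) (stream, ...) suppresses the macro and calls the function.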

The I/O functions for different "devices" would vary in complexity. In the case of the MEDS message streams, tpce_read() and tpce_write() could simply call stan_read() and stan_write(), respectively. In the case of the AP data streams, reading data would involve navigating the various tables to locate the given source's data at the desired "seek" position; cached information about the previous read might improve performance.
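
The idea of caching information about the previous read can be sketched with a hypothetical extent table that maps logical byte offsets to physical blocks. The lookup checks the extent used last time before scanning the whole table, which pays off when reads are mostly sequential. The extent layout, BLOCK_SIZE, and map_offset() are assumptions for illustration, not the AP's actual segment directory format.

```c
#include <assert.h>
#include <stddef.h>

#define  BLOCK_SIZE  512             /* Hypothetical disk block size in bytes. */

typedef struct {                     /* A run of contiguous disk blocks. */
    long  offset;                    /* Logical byte offset where it starts. */
    long  start_block;               /* First physical block. */
    long  num_blocks;                /* Length in blocks. */
} extent;

typedef struct {
    const extent  *table;            /* Extents, in any order. */
    size_t  count;                   /* Number of extents. */
    size_t  last;                    /* Cached index of the last extent used. */
    unsigned long  hits;             /* Lookups satisfied by the cache. */
} extentMap;

static int  in_extent (const extent *e, long offset)
{
    return ((offset >= e->offset) &&
            (offset < e->offset + e->num_blocks * BLOCK_SIZE));
}

/* Map a logical byte offset to a physical block and the byte offset within
   that block.  Returns 0 on success, -1 if no extent covers the offset. */
int  map_offset (extentMap *map, long offset, long *block, long *within)
{
    size_t  i;

    if ((map->last < map->count) && in_extent (&map->table[map->last], offset)) {
        map->hits++;                 /* Same extent as last time. */
    } else {
        for (i = 0; i < map->count; i++)     /* Cache miss: full scan. */
            if (in_extent (&map->table[i], offset))  break;
        if (i == map->count)  return (-1);
        map->last = i;
    }
    offset -= map->table[map->last].offset;
    *block = map->table[map->last].start_block + offset / BLOCK_SIZE;
    *within = offset % BLOCK_SIZE;
    return (0);
}
```

For example, with extents {0, 100, 2} and {1024, 500, 4}, logical offset 2000 lies 976 bytes into the second extent, i.e., block 501 at byte 464.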

An Example: APDSKIN

To further illustrate the EIO utilities, consider the APDSKIN task which (in my admittedly limited understanding) receives annotation and timecode messages from the service processor and saves them sequentially to disk. Rewritten to use the EIO utilities, APDSKIN would call eio_open() to open two output files: /ap/annotation and /ap/timecodes. For each file, ap_open() would map to the global segment directory header and allocate a private data structure containing a pointer to the segment directory as well as other, AP-specific information about the data stream. This information might include a flag indicating the stream type (annotation or timecode), the type-specific record size, the next free location on disk, etc. (Alternatively, this information could be accessed directly in the segment directory header.)

When APDSKIN receives a message from the service processor, it calls eio_write() to log the annotation or timecode record on disk. eio_write() calls ap_write(), which retrieves the location of the next free block from the segment directory header and writes the record out to disk at that location ("broadcasting" it to multiple disks if necessary). ap_write() then increments the location of the next free block by the record size in blocks before returning to the application.
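
The next-free-block bookkeeping described above can be sketched as follows, assuming a single in-memory "disk", a 512-byte block, and a header that holds only the next free block; broadcasting to several disks is left out. ap_write_record() and segmentHeader are hypothetical names. The key step is rounding the record size up to whole blocks before advancing the pointer.

```c
#include <assert.h>
#include <string.h>

#define  BLOCK_SIZE   512          /* Hypothetical disk block size in bytes. */
#define  DISK_BLOCKS  64           /* Hypothetical (tiny) disk size in blocks. */

static unsigned char  disk[DISK_BLOCKS * BLOCK_SIZE];   /* Simulated disk. */

typedef struct {                   /* Stand-in for the segment directory header. */
    long  next_free;               /* Next free block on disk. */
} segmentHeader;

/* Write a record at the next free block and advance next_free by the
   record size rounded up to whole blocks.  Returns the block written,
   or -1 if the record is empty or would not fit. */
long  ap_write_record (segmentHeader *header, const void *record, int length)
{
    long  blocks, block = header->next_free;

    if (length <= 0)  return (-1);
    blocks = (length + BLOCK_SIZE - 1) / BLOCK_SIZE;     /* Round up. */
    if ((block < 0) || (block + blocks > DISK_BLOCKS))  return (-1);
    memcpy (&disk[block * BLOCK_SIZE], record, (size_t) length);
    header->next_free = block + blocks;
    return (block);
}
```

A 600-byte record starting at block 10 occupies blocks 10 and 11, so the next record goes to block 12.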

Note that the new APDSKIN simply calls eio_open() to open the annotation and timecode "files" and eio_write() to write the incoming messages to disk. All the details of accessing the segment directory, of determining where to write the next record on disk, of advancing the file "cursor", and so on, are buried in the ap_xxxx() routines, where they can be (i) reused by other applications and (ii) changed without affecting APDSKIN.

Note also that the EIO utilities can be used to provide multiple views of the same data. For example, APDSKIN opens a file, /ap/annotation, to write a mixed stream of annotation records from all sources sequentially to disk. The other AP tasks could open files such as /ap/source_999.anno in order to access annotation records from a single source. ap_read() would then have to be able to locate the next record from a specified source within the original, sequential file of mixed sources. Again, the AP software already has this capability, but the logic could now be isolated in the ap_xxxx() functions and reused by other applications.
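
A minimal sketch of the single-source view follows, assuming every record in the mixed file carries its source ID. The view's private data holds the selected source and a cursor into the mixed file; reading simply skips records from other sources. annoRecord, sourceView, and next_from_source() are hypothetical, and a real implementation would more likely use the AP's existing tables than a linear scan.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {                   /* Hypothetical record in the mixed file. */
    int  source;                   /* ID of the source that produced it. */
    int  value;                    /* Stand-in for the annotation payload. */
} annoRecord;

typedef struct {                   /* Private data for "/ap/source_NNN.anno". */
    int  source;                   /* Source selected by the stream name. */
    size_t  cursor;                /* Index of the next record to examine. */
} sourceView;

/* Return the next record from the selected source, or NULL at end of file.
   Records from other sources are skipped. */
const annoRecord  *next_from_source (sourceView *view,
                                     const annoRecord *file, size_t count)
{
    while (view->cursor < count) {
        const annoRecord  *record = &file[view->cursor++];
        if (record->source == view->source)  return (record);
    }
    return (NULL);
}
```

The writer (APDSKIN) and the readers share the same underlying file; only the view's private data differs.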

Building Our Tower on Shifting Sands

The layered approach to I/O used by the EIO package has several benefits:

Reusability - was discussed in the previous section.
Performance - enhancements (e.g., buffering, different disk layouts, etc.) could be implemented and tested without changes to the applications.
Flexibility - the current layouts of data on disk could be used - or - a real file system could be used, a different disk type could be used (e.g., HIPPI rather than SCSI), a different medium could be used (e.g., RAM disk rather than magnetic disk), etc. All without changes to the applications. Configuration decisions could even be deferred until run-time. In one of Dave's scenarios, for example, different data types were stored on different disk types (e.g., annotation data on SCSI and timecode data on HIPPI).
Portability - is probably not a high priority, although the experience of porting from PDOS to VxWorks should give one pause for thought. In my original MEDS-TPCE example for Stan, I showed TPCE opening its side of the communications link using the EIO functions. And a production version of the AP software could conceivably be run on a UNIX workstation, with the EIO functions accessing the annotation and timecode data via NFS or a VxWorks-based, custom disk server.

Naming Conventions

The file and function names used in the examples presented earlier were just that: examples. Some thought must be given to file names so that (i) they are aesthetically pleasing (i.e., they don't degenerate into numeric identifiers) and (ii) they minimize the number of I/O packages that need to be written. For example, the single- and multi-source access of annotation and timecode data by the AP could be handled by the same I/O package, given an appropriate naming scheme (and an appropriate I/O package!). The names of the stream-specific I/O functions should have a prefix of some kind to prevent the EIO utilities from infringing on the applications' name spaces; e.g., eio_ap_open() instead of ap_open().

Conclusion

The EIO utilities present applications with a generic, high-level, byte-stream interface to data. The mapping of logical byte offsets into physical disk block locations is hidden in a lower I/O level. The use of the EIO facility would appear beneficial, but application experts would be the best judges of how well the EIO interface fits their disk access patterns.

If you have any questions, concerns, or suggestions, please let me know!

Alex Measday