After an informal discussion of the LZP disk interface, I mailed Stan a message proposing a generic I/O interface for LZP data streams. The applications interface I proposed has several strong points:
The weak points? I'm hoping you'll tell me. Stan never read my message, so I never got any feedback, one way or another!
The proposed package, the Extended I/O (EIO) Utilities, would provide a common application interface to LZP logical data streams. In our case, these streams are files, but the EIO utilities could just as easily be used to access other types of streams, such as network connections to TPCE, shared memory objects, etc. A UNIX-like, hierarchical naming scheme is used to identify different streams.
The EIO utilities would include the following public functions:
    status = eio_open (name, options, timeout, &stream) ;
    status = eio_read (stream, buffer, length, timeout) ;
    status = eio_write (stream, buffer, length, timeout) ;
    status = eio_seek (stream, position) ;
    status = eio_poll (stream, &length) ;
    status = eio_fileno (stream, &fd) ;
    status = eio_fork (stream1, &stream2) ;
    status = eio_close (stream, options) ;
and one semi-private function:
    status = eio_parse (name, argc, argv, timeout, stream) ;
An application accessing a data stream would call eio_open() to open the stream, eio_read() and eio_write() to perform I/O, and, if non-sequential access is desired, eio_seek() to reposition the point of next access within the stream. The other functions, except for eio_close(), are intended for non-file stream types.
eio_open() opens the logical data stream identified by the name argument and returns a handle that is used in calls to the other functions. The stream handle is the address of an EIO stream structure:
    typedef struct _eioStream {
        char *name ;            /* Stream name. */
        int debug ;             /* Debug switch. */
        int (*read)() ;         /* Pointer to READ function. */
        int (*write)() ;        /* Pointer to WRITE function. */
        int (*seek)() ;         /* Pointer to SEEK function. */
        int (*poll)() ;         /* Pointer to POLL function. */
        int (*fileno)() ;       /* Pointer to FILENO function. */
        int (*fork)() ;         /* Pointer to FORK function. */
        int (*close)() ;        /* Pointer to CLOSE function. */
        void *private ;         /* Pointer to private data. */
    } _eioStream, *eioStream ;
eio_open() is a very simple function: it allocates the stream structure, stores the stream name in the structure, converts the options string into an ARGC/ARGV array, and calls eio_parse() to parse the stream name. Without error handling, the function would look as follows:
    int eio_open (const char *name, const char *options,
                  double timeout, eioStream *stream)
    {
        char **argv ;
        int argc, status ;

        *stream = malloc (sizeof (_eioStream)) ;
        (*stream)->name = str_dupl (name, -1) ;
        ... Initialize other fields to zero or NULL ...

        opt_create_argv ("eio_open", options, &argc, &argv) ;
        status = eio_parse (name, argc, argv, timeout, *stream) ;
        opt_delete_argv (argc, argv) ;

        return (status) ;
    }
A data stream name is a UNIX-like pathname consisting of a sequence of "directory" specifications followed by a file name; e.g., /ap/session_999/source_999.anno. The algorithm for parsing the data stream name is very simple. eio_parse() simply takes the first component of the pathname and calls that "directory"'s open function, passing it the remainder of the pathname. In our example, eio_parse() would extract /ap and call ap_open(), passing it /session_999/source_999.anno and the stream structure created by eio_open().
The coding of eio_parse() is fairly trivial:
    int eio_parse (const char *name, int argc, const char *argv[],
                   double timeout, eioStream stream)
    {
        char *rest_of_name ;
        int (*open_function)() ;    /* Pointer to open function. */

        ... Call strtok(3) to get the first component of the pathname; e.g., "ap".
        ... Call sprintf(3) to construct the name of that component's open function; e.g., "ap_open".
        ... Call symFindByName(2) to look up the open function's entry point in the system symbol table.
        ... Determine the start of the rest of the pathname; e.g., "/session_999/source_999.anno".

        /* Call the open function. */
        return (open_function (rest_of_name, argc, argv, timeout, stream)) ;
    }
By using my dynamic object module loading (DYMPL) functions instead of symFindByName(2), eio_parse() could be written in an operating system-independent way, making it usable under both VxWorks and UNIX.
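The dispatch step can also be illustrated without any symbol-table lookup at all. The following is a minimal, portable sketch that stands in for symFindByName(2) and DYMPL (whose interfaces are not shown here) with a small static registry. The names eio_register(), registry, demo_open(), and last_rest are hypothetical, the stream structure is trimmed to two fields, and the private field is renamed private_data purely for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down stream structure; the proposed one has more fields. */
typedef struct _eioStream {
    const char *name;        /* Stream name. */
    void *private_data;      /* Pointer to private data. */
} _eioStream, *eioStream;

typedef int (*eioOpenFunc)(const char *rest, int argc, char **argv,
                           double timeout, eioStream stream);

/* Hypothetical registry standing in for the system symbol table. */
#define MAX_TYPES 8
static struct { char prefix[16]; eioOpenFunc open; } registry[MAX_TYPES];
static int num_types = 0;

/* Associate a first pathname component (e.g., "ap") with an open function. */
int eio_register(const char *prefix, eioOpenFunc open)
{
    assert(prefix != NULL && open != NULL);
    if (num_types >= MAX_TYPES || strlen(prefix) >= sizeof registry[0].prefix)
        return -1;
    strcpy(registry[num_types].prefix, prefix);
    registry[num_types].open = open;
    num_types++;
    return 0;
}

/* Split off the first component and call the matching open function
   with the remainder of the pathname. */
int eio_parse(const char *name, int argc, char **argv,
              double timeout, eioStream stream)
{
    const char *first, *rest;
    size_t length;
    int i;

    if (name == NULL || name[0] != '/')
        return -1;
    first = name + 1;                    /* Skip the leading slash. */
    length = strcspn(first, "/");        /* Length of first component. */
    rest = first + length;               /* E.g., "/session_999/...". */

    for (i = 0; i < num_types; i++) {
        if (strlen(registry[i].prefix) == length &&
            strncmp(registry[i].prefix, first, length) == 0)
            return registry[i].open(rest, argc, argv, timeout, stream);
    }
    return -1;                           /* Unknown stream type. */
}

/* Hypothetical open function that only records what it was passed. */
static const char *last_rest = NULL;

int demo_open(const char *rest, int argc, char **argv,
              double timeout, eioStream stream)
{
    (void) argc; (void) argv; (void) timeout; (void) stream;
    last_rest = rest;
    return 0;
}
```

After eio_register("ap", demo_open), a call to eio_parse() on /ap/session_999/source_999.anno would hand demo_open() the remainder /session_999/source_999.anno. A static table gives up the "no relinking" property of a symbol-table lookup, so it is only a stand-in for showing the control flow.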
Associated with each LZP stream type would be a package of functions that perform I/O on that type of stream:
    int type_open () ;
    static int type_read () ;
    static int type_write () ;
    static int type_seek () ;
    static int type_close () ;
    ...
The open function is dynamically loaded and called by eio_parse() and it must be public; the remaining functions need only be known to the open function and can thus be private (static). The open function has the same interface as eio_parse() and can do one of two things:
(i) call eio_parse() recursively in order to pass the open operation down another level, or (ii) open the data stream itself.
In a detailed example I gave to Stan, opening /meds/tpce/status (a network-based message stream between the MEDS controller and TPCE) resulted in an intermediate call to meds_open() and a final call to tpce_open().
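The /meds/tpce/status call chain can be traced with a toy version of the dispatcher. This is only a sketch: the open functions take just the remaining pathname (the real interface also carries argc, argv, the timeout, and the stream), eio_parse() uses a hard-wired comparison instead of a symbol lookup, and the calls and final_rest variables are hypothetical bookkeeping added so the order of calls can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bookkeeping: which open functions ran, and in what order. */
static const char *calls[4];
static int num_calls = 0;
static const char *final_rest = NULL;

static int eio_parse(const char *name);

/* Final level: this is where the stream would actually be opened. */
static int tpce_open(const char *rest)
{
    if (num_calls < 4) calls[num_calls++] = "tpce_open";
    final_rest = rest;
    return 0;
}

/* Intermediate level: pass the open operation down another level. */
static int meds_open(const char *rest)
{
    if (num_calls < 4) calls[num_calls++] = "meds_open";
    return eio_parse(rest);
}

/* Toy dispatcher: peel off the first component and call its open function. */
static int eio_parse(const char *name)
{
    const char *first, *rest;
    size_t length;

    if (name == NULL || name[0] != '/')
        return -1;
    first = name + 1;
    length = strcspn(first, "/");
    rest = first + length;

    if (length == 4 && strncmp(first, "meds", 4) == 0)
        return meds_open(rest);
    if (length == 4 && strncmp(first, "tpce", 4) == 0)
        return tpce_open(rest);
    return -1;
}
```

Calling eio_parse("/meds/tpce/status") runs meds_open() with /tpce/status, which re-enters eio_parse() and ends in tpce_open() with /status.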
The open function that actually opens the data stream is responsible for initializing the remaining fields in the data stream structure created by eio_open(). In the TPCE example above, tpce_open() would store the addresses of the tpce_read(), tpce_write(), tpce_seek(), and tpce_close() functions in the corresponding fields of the data stream structure. If necessary, the open function also allocates a private data structure containing "device"-specific information. In the case of the MEDS-TPCE message stream, the private data could consist solely of the message stream handle returned by stan_open() (part of a function package I wrote for Stan that sends and receives MEDS messages across a network link).
In the case of the AP's annotation and timecode "files", the private data would be more extensive and include pointers to global structures (e.g., the segment directory table), the current "seek" position, and so on.
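To make the division of labor concrete, here is a minimal sketch of an open function that fills in the function pointers and allocates its private data. Everything "device"-specific is an assumption: the stream type is an in-memory buffer (mem_open(), mem_read(), mem_close(), and MemData are hypothetical names), and the stream's contents are simply the text passed as the remainder of the name, chosen only so the example has something to read. The stream structure is trimmed to the fields used, with private renamed private_data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct _eioStream *eioStream;
typedef struct _eioStream {
    int (*read)(eioStream stream, void *buffer, int length, double timeout);
    int (*close)(eioStream stream, const char *options);
    void *private_data;          /* Pointer to private data. */
} _eioStream;

/* Hypothetical "device"-specific data for an in-memory stream. */
typedef struct MemData {
    const char *data;            /* Bytes available for reading. */
    int size;                    /* Total number of bytes. */
    int position;                /* Current "seek" position. */
} MemData;

/* Copy up to LENGTH bytes from the current position; return bytes read. */
static int mem_read(eioStream stream, void *buffer, int length, double timeout)
{
    MemData *mem = stream->private_data;
    int available = mem->size - mem->position;

    (void) timeout;
    if (length < 0)
        return -1;
    if (length > available)
        length = available;
    memcpy(buffer, mem->data + mem->position, (size_t) length);
    mem->position += length;
    return length;
}

/* Release the private data allocated by the open function. */
static int mem_close(eioStream stream, const char *options)
{
    (void) options;
    free(stream->private_data);
    stream->private_data = NULL;
    return 0;
}

/* Public open function: same interface as eio_parse(). */
int mem_open(const char *rest, int argc, char **argv,
             double timeout, eioStream stream)
{
    MemData *mem;

    (void) argc; (void) argv; (void) timeout;
    assert(rest != NULL && stream != NULL);
    mem = malloc(sizeof *mem);
    if (mem == NULL)
        return -1;
    mem->data = rest;            /* Assumption: the name text is the data. */
    mem->size = (int) strlen(rest);
    mem->position = 0;

    stream->read = mem_read;     /* Install the type-specific functions. */
    stream->close = mem_close;
    stream->private_data = mem;
    return 0;
}
```

Only mem_open() is public; the read and close functions are reachable solely through the pointers stored in the stream structure, which is what lets them stay static.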
The remaining EIO functions simply call the functions whose entry points were stored in the data stream handle. eio_read() is a simple one-liner, for example:
    int eio_read (eioStream stream, void *buffer, int length, double timeout)
    {
        return (stream->read (stream, buffer, length, timeout)) ;
    }
If the extra function call is too much overhead, eio_read() and the others could be defined as macros.
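A macro version might look like the sketch below. The stream structure is reduced to its read pointer, and zero_read() is a hypothetical read function included only to give the macro something to call. One caveat that applies to any such macro: the stream argument is evaluated twice, so it should not be an expression with side effects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct _eioStream *eioStream;
typedef struct _eioStream {
    int (*read)(eioStream stream, void *buffer, int length, double timeout);
} _eioStream;

/* Macro form of eio_read(); note that STREAM is evaluated twice. */
#define eio_read(stream, buffer, length, timeout) \
    ((stream)->read ((stream), (buffer), (length), (timeout)))

/* Hypothetical read function: fills the buffer with zero bytes. */
static int zero_read(eioStream stream, void *buffer, int length, double timeout)
{
    (void) stream; (void) timeout;
    assert(buffer != NULL && length >= 0);
    memset(buffer, 0, (size_t) length);
    return length;
}
```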
The I/O functions for different "devices" would vary in complexity. In the case of the MEDS message streams, tpce_read() and tpce_write() could simply call stan_read() and stan_write(), respectively. In the case of the AP data streams, reading data would involve navigating the various tables to locate the given source's data at the desired "seek" position; cached information about the previous read might improve performance.
To further illustrate the EIO utilities, consider the APDSKIN task which (in my admittedly limited understanding) receives annotation and timecode messages from the service processor and saves them sequentially to disk. Rewritten to use the EIO utilities, APDSKIN would call eio_open() to open two output files: /ap/annotation and /ap/timecodes. For each file, ap_open() would map to the global segment directory header and allocate a private data structure containing a pointer to the segment directory as well as other, AP-specific information about the data stream. This information might include a flag indicating the stream type (annotation or timecode), the type-specific record size, the next free location on disk, etc. (Alternatively, this information could be accessed directly in the segment directory header.)
When APDSKIN receives a message from the service processor, it calls eio_write() to log the annotation or timecode record on disk. eio_write() calls ap_write(), which retrieves the location of the next free block from the segment directory header and writes the record out to disk at that location ("broadcasting" it to multiple disks if necessary). ap_write() then increments the location of the next free block by the record size in blocks before returning to the application.
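The bookkeeping described for ap_write() can be sketched as follows. The layout is entirely assumed for illustration: the "disk" is an in-memory array, BLOCK_SIZE and NUM_BLOCKS are made-up values, SegmentDirectory holds only a next-free-block counter, and the function takes just a buffer and length rather than the full stream interface. Broadcasting to multiple disks is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16            /* Assumed block size in bytes. */
#define NUM_BLOCKS 8             /* Assumed size of the toy "disk". */

/* Hypothetical stand-in for the segment directory header. */
typedef struct SegmentDirectory {
    int next_free_block;         /* Location of the next free block. */
} SegmentDirectory;

static unsigned char disk[NUM_BLOCKS * BLOCK_SIZE];
static SegmentDirectory directory = { 0 };

/* Write one record at the next free block, then advance the free-block
   location by the record size in blocks. Returns the number of bytes
   written, or -1 if the record is empty or does not fit. */
int ap_write(const void *record, int length)
{
    int blocks;
    size_t offset;

    assert(record != NULL);
    if (length <= 0)
        return -1;
    blocks = (length + BLOCK_SIZE - 1) / BLOCK_SIZE;   /* Round up. */
    if (directory.next_free_block + blocks > NUM_BLOCKS)
        return -1;

    offset = (size_t) directory.next_free_block * BLOCK_SIZE;
    memset(disk + offset, 0, (size_t) blocks * BLOCK_SIZE);   /* Pad. */
    memcpy(disk + offset, record, (size_t) length);
    directory.next_free_block += blocks;
    return length;
}
```

A 22-byte record occupies two 16-byte blocks here, so the next record starts at block 2. The application sees none of this; it only calls eio_write().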
Note that the new APDSKIN simply calls eio_open() to open the annotation and timecode "files" and eio_write() to write the incoming messages to disk. All the details of accessing the segment directory, of determining where to write the next record on disk, of advancing the file "cursor", and so on, are buried in the ap_xxxx() routines, where they can be (i) reused by other applications and (ii) changed without affecting APDSKIN.
Note also that the EIO utilities can be used to provide multiple views of the same data. For example, APDSKIN opens a file, /ap/annotation, to write a mixed stream of annotation records from all sources sequentially to disk. The other AP tasks could open files such as /ap/source_999.anno in order to access annotation records from a single source. ap_read() would then have to be able to locate the next record from a specified source within the original, sequential file of mixed sources. Again, the AP software already has this capability, but the logic could now be isolated in the ap_xxxx() functions and reused by other applications.
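A single-source view over a mixed file could work roughly as in the sketch below, which uses a plain linear scan. The Record layout and the SourceView structure (which would live in the stream's private data) are hypothetical, and the existing AP software presumably uses its tables to locate records more efficiently than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical annotation record tagged with its source number. */
typedef struct Record {
    int source;
    char text[12];
} Record;

/* Hypothetical private data for a single-source view of a mixed file. */
typedef struct SourceView {
    const Record *records;       /* The sequential file of mixed sources. */
    int count;                   /* Number of records in the file. */
    int source;                  /* Source selected by the stream name. */
    int cursor;                  /* Next record to examine. */
} SourceView;

/* Return the next record from the view's source: 1 if a record was
   copied to OUT, 0 at end of file. */
int ap_read(SourceView *view, Record *out)
{
    assert(view != NULL && out != NULL);
    while (view->cursor < view->count) {
        const Record *record = &view->records[view->cursor++];
        if (record->source == view->source) {
            *out = *record;
            return 1;
        }
    }
    return 0;
}
```

Two views opened on the same array with different source numbers would each keep their own cursor, which is the "multiple views of the same data" idea in miniature.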
The layered approach to I/O used by the EIO package has several benefits:
The file and function names used in the examples presented earlier were just that: examples. Some thought must be given to file names so that (i) they are aesthetically pleasing (i.e., they don't degenerate into numeric identifiers) and (ii) they minimize the number of I/O packages that need to be written. For example, the single- and multi-source access of annotation and timecode data by the AP could be handled by the same I/O package, given an appropriate naming scheme (and an appropriate I/O package!). The names of the stream-specific I/O functions should have a prefix of some kind to prevent the EIO utilities from infringing on the applications' name spaces; e.g., eio_ap_open() instead of ap_open().
The EIO utilities present applications with a generic, high-level, byte-stream interface to data. The mapping of logical byte offsets into physical disk block locations is hidden in a lower I/O level. The use of the EIO facility would appear beneficial, but application experts would be the best judges of how well the EIO interface fits their disk access patterns.
If you have any questions, concerns, or suggestions, please let me know!