
Non-Blocking I/O, One Way or Another

October 28, 1991

This memo discusses some problems (and solutions) related to network I/O:

Broken pipe signals - this section should be read by everyone who writes programs that perform network I/O.
Non-blocking I/O - this section describes a new TPOCC library package, QIO_UTIL, that makes asynchronous, non-blocking, network I/O relatively painless. The QIO_UTIL package allows a program to queue output requests to a network socket without having to wait for the actual writes to complete. Most TPOCC programs and TPOCC-based applications need this capability, whether they know it or not. For example, momentary bottlenecks in the events subsystem should not block a real-time, EKG-monitoring program in the middle of logging an event message. This section should at least be read by everyone who interfaces with the data server.


Broken Pipes

"I don't feel like telling you everything because I don't feel like telling you anything."
- My 5-year-old's response to the question, "How was school today?"

If, in the midst of your debugging, you come across a problem that could conceivably occur in programs other than your own, and, better yet, if you come up with a solution, please share it with everyone else.

For some time now, Steve has been having problems with the data server mysteriously disappearing while he was debugging the display program. Steve always figured it was the data server's fault; I always figured it was Steve's fault. Just my luck - Steve gets the gold star this time! We eventually figured out that, if the data server was in the middle of writing data to the display program when the display program was terminated (by exiting the debugger), the data server would exit because of a SIGPIPE signal.

More generally, attempting to write to a network connection that has gone down (e.g., because the process on the other side broke the connection) causes a SIGPIPE (broken pipe) signal to be generated. SIGPIPE is one of those signals that, if not handled, silently aborts a program. If your server program must survive broken connections to clients, it should install a handler function to catch SIGPIPE signals. Installing a signal handler is accomplished by a call to signal(2) in your main routine:

    #ifdef VXWORKS
    #    include  <sigLib.h>		/* Signal definitions. */
    #else
    #    include  <signal.h>		/* Signal definitions. */
    #endif
    extern  void  my_handler () ;	/* External functions. */

    ...

    main () {

        ...

    /* Install a handler function to field broken pipe signals. */

        signal (SIGPIPE, my_handler) ;

        ...

    }

The signal handler function itself can be very simple:

    #ifdef VXWORKS
    #    include  <sigLib.h>		/* Signal definitions. */
    #else
    #    include  <signal.h>		/* Signal definitions. */
    #endif

    void  my_handler (sig, code, scp, addr)
        int  sig, code ;
        struct  sigcontext  *scp ;
        char  *addr ;
    {
    #ifdef SYSV				/* Reinstall the signal handler. */
        signal (sig, my_handler) ;
    #endif
    }

Note that, under System V UNIX (e.g., HP/UX), a signal handler must be reinstated each time its signal is raised.

You shouldn't depend upon SIGPIPE signals to detect broken connections. SIGPIPEs are only generated when a write(2) is attempted on a broken connection. Trying to read(2) a broken connection doesn't generate a signal; it simply returns zero bytes of input. This information is currently used by TPOCC and TPOCC-based programs to detect broken network connections: if a select(2) call indicates the connection has data to read, but read() can't find anything, the connection must have gone down.
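For the curious, this detection scheme is easy to try out using a socketpair(2) to stand in for a network connection. The function below is just an illustrative sketch - connection_is_up() is not a TPOCC routine - but it performs the same select()/read() test:

```c
#include  <assert.h>
#include  <sys/select.h>
#include  <sys/socket.h>
#include  <unistd.h>

/* Returns 1 if the connection on fd still appears to be up, 0 if the
   peer has broken it.  Note that this sketch consumes any data it
   happens to read while performing the test. */
int  connection_is_up (int fd)
{
    fd_set  read_mask ;
    struct  timeval  timeout = { 0, 100000 } ;	/* 1/10 second. */
    char  buffer[128] ;

    FD_ZERO (&read_mask) ;
    FD_SET (fd, &read_mask) ;
    if (select (fd + 1, &read_mask, NULL, NULL, &timeout) <= 0)
        return (1) ;		/* Nothing to read - assume still up. */
    /* Readable, but zero bytes delivered: the connection went down. */
    return (read (fd, buffer, sizeof buffer) > 0) ;
}
```

Closing one end of a socketpair makes the other end select()-readable with zero bytes available - exactly the signature TPOCC programs look for.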

Why hasn't the broken pipe problem been more evident? First, programs typically spend most of their time waiting for input; consequently, broken connections are more likely to be detected during a select()/read() sequence than during a write(). Second, the problem has probably been occurring more frequently than we realize. TSTOL has been known to quietly exit on occasion; in retrospect, the symptoms point to broken pipe signals.

The subject of broken pipes rang a bell in my head - Steve thinks I have lots of bells ringing in my head - so I ran grep(1) on the TPOCC source directory tree, looking for SIGPIPE. Sure enough, I found it. The programs in the events subsystem all have broken pipe signal handlers. A lot of hair-pulling and hand-wringing on Steve's part could have been avoided (Pete: "Can we vote on this?") had this information been more widely publicized.


Non-Blocking I/O

"The seaweed's always greener in someone else's lake."
- Sebastian the Crab, in The Little Mermaid.

One of the benefits of working with UNIX is that you find out that the seaweed really is greener in the VMS lake. (For those of you who don't know any better, VMS is the operating system of choice for VAX computers.) We recently discovered that the display program and the data server are prone to deadlock when several pages are up on the screen and Display tries to bring up another one. The data server is too busy outputting DATA packets to read ADD requests, while Display is too busy outputting ADD requests to read DATA packets. They both block on write(2)s when their network buffers fill up:

                                    DATA
                      Display   <----------    Data
                      Process    ---------->  Server
                                    ADD

The smug grin spreading across my face was cut short by the realization that knowing that VMS provides excellent support for asynchronous, non-blocking I/O doesn't make up for UNIX's shortcomings. To remedy the situation, a new package of routines, qio_util.c, has been added to TPOCC's libutilgen library. This package simulates the VMS queued I/O (QIO) facility. Although the functions in TPOCC's QIO package perform raw network I/O, they are easily layered underneath XDR for applications that communicate using that protocol.


VAX/VMS Queued I/O

Under VMS, writing to a channel (e.g., a file or a network connection) results in a write request being queued up to the device driver:

     Device           I/O      I/O           I/O          Application
     Driver  <----  Request  Request  ...  Request  <----   Program
                       ^                      ^
                       |                      |
                    Front of               Rear of
                     Queue                  Queue

User-level function calls add read and write requests to the end of the queue. When finished with an I/O request, the device driver pops the next one from the front of the queue and begins processing it.
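The queue pictured above is just a singly-linked list with front and rear pointers. Here is a minimal sketch in C; the structure and function names are illustrative only - they are not the actual VMS (or QIO_UTIL) internals:

```c
#include  <assert.h>
#include  <stdlib.h>
#include  <string.h>

typedef  struct  io_request {
    void  *data ;			/* Copy of the caller's data. */
    size_t  length ;
    struct  io_request  *next ;
}  io_request ;

typedef  struct {
    io_request  *front, *rear ;
    int  count ;
}  io_queue ;

/* Add a request at the rear of the queue; returns 0 on success. */
int  enqueue_request (io_queue *q, const void *data, size_t length)
{
    io_request  *req = malloc (sizeof *req) ;
    if (req == NULL)  return (-1) ;
    req->data = malloc (length) ;
    if (req->data == NULL)  { free (req) ;  return (-1) ; }
    memcpy (req->data, data, length) ;
    req->length = length ;
    req->next = NULL ;
    if (q->rear == NULL)  q->front = req ;
    else  q->rear->next = req ;
    q->rear = req ;
    q->count++ ;
    return (0) ;
}

/* Pop the next request from the front of the queue; NULL if empty. */
io_request  *dequeue_request (io_queue *q)
{
    io_request  *req = q->front ;
    if (req == NULL)  return (NULL) ;
    q->front = req->next ;
    if (q->front == NULL)  q->rear = NULL ;
    q->count-- ;
    return (req) ;
}
```

The driver pops from the front while the application pushes onto the rear, so requests complete in the order they were issued.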

Once an I/O request is issued by a program (through a FORTRAN WRITE statement, for example), the program can do one of two things:

Synchronous I/O - wait for the I/O request to complete before continuing with processing.
Asynchronous I/O - continue processing immediately.

In the case of asynchronous I/O, a program may be notified of the completion of an I/O request by the setting of an event flag (similar to a UNIX semaphore) or by the invocation of a user-specified asynchronous trap (AST) function. An AST is like a UNIX signal; an AST function for asynchronous I/O is like a UNIX SIGIO handler. AST functions are specified on a per-request basis, so a program can have different AST functions for different channels (or even for different requests queued to the same channel). UNIX's SIGIO signal, on the other hand, is raised when I/O on any channel completes; a SIGIO handler must typically poll all of a program's open channels to find out which one generated the signal.


Queued I/O on TPOCC

Okay, now that you know more about VMS than both Steve and Paul put together, how does this apply to TPOCC? Suppose the data server could queue up output requests for DATA packets and go on about its business, such as reading ADD commands from Display. Likewise, suppose Display could queue up ADD commands and get back to what it has to do: read and display DATA packets received from the data server. The result: no blocking and no deadlock in the scenario described at the beginning of this section.

The new TPOCC QIO utilities provide some of the aforementioned VMS capabilities and are implemented so that they can be incorporated into existing programs with only minimal changes to the target programs. For example, the data server was modified to use the QIO utilities by:

  1. Adding a call to qio_init() to the main routine.
  2. Adding a call to qio_flush() in the server's select(2) loop.
  3. Adding a call to qio_configure() after a new client's connection request is answered.
  4. Replacing the call to xll_write() in the data server's low-level XDR output routine by a call to qio_write().

Fifteen minutes' work for someone on a VT 320; it may take longer on an X Windows console.

Now, when a client of the data server (e.g., Display) cannot immediately read what the data server is sending, the sampled data packets pile up in the data server's output queue. Once the client begins reading again, the pent-up packets are flushed to the network.

Since the TPOCC QIO utilities are new code, they probably need to be subjected to a formal walkthrough. Just one problem - who is qualified to pass judgement on this VMS-inspired code? Don "DEC? Bleecch!@#?!" Slater and Steve "if not Gould then HP" Gibson are misfits - oops, I mean - unfit for this task. Linda, Miriam, and Pete are possible candidates; Pete would qualify on his Amiga experience alone.


Using the TPOCC QIO Package

Making use of TPOCC's new QIO utilities is very easy. You need to:

qio_init()
Initialize the QIO package's internal data structures. This step is optional, since these structures are pre-initialized, static variables.
qio_configure()
After you open a connection (via net_answer() or net_call(), for instance) that is to use QIOs, configure the connection for buffered, non-blocking I/O.
qio_write()
Rather than directly writing to a configured connection, queue up a write request on that connection.
qio_flush()
Periodically check for and attempt to complete I/O requests that haven't been completed yet.

For example, the following program queues up and writes 100 messages to the tpocc_display network server:

    #include  <stdio.h>           	/* Standard I/O definitions. */

    main ()

    {    /* Local variables. */
        char  buffer[64] ;
        int  connection, i ;

    /* Contact the network server and configure the connection for
       buffered, non-blocking I/O. */

        net_call (NULL, "tpocc_display", &connection) ;
        qio_configure (connection, 1, 1, -1) ;

    /* Queue up 100 messages for output. */

        for (i = 0 ;  i < 100 ;  i++) {
            sprintf (buffer, "%d Motifs do not an Open Look make.", i) ;
            qio_write (connection, buffer, strlen (buffer), NULL, NULL) ;
        }

    /* Wait until all the messages have been output to the network. */

        while (qio_pend () > 0)
            qio_flush () ;

    }

qio_pend() is a function that returns the number of pending I/O requests. Error checking is not shown in the example above, but most of the QIO functions (as well as net_call()) return a function value of zero if no errors occurred and an error code (errno) if one did.

Note that qio_flush() must be called periodically in order to flush any uncompleted I/O requests. This was no problem for the data server, which has to wake up every tenth of a second anyway; the call to qio_flush() was just added to the data server's select() loop. Other programs may find it a little more difficult.

When you use the QIO functions, be sure to read their prologs (in qio_util.c in TPOCC's libutilgen library). qio_configure() deserves special mention here. It has four arguments:

channel
is the UNIX file descriptor for the output device (e.g., a network connection).
is_nonblocking
specifies whether or not I/O on this channel will block if a read or write is attempted and the channel is not ready.
is_buffered
specifies whether or not qio_write() should buffer the data for your requests.
max_outstanding
specifies the maximum number of outstanding I/O requests the channel's queue will hold. -1 means there is no limit.

If you designate a channel as non-blocking, qio_configure() will automatically configure the UNIX file descriptor for non-blocking I/O via an ioctl(2) system call. The QIO package also works with blocking I/O, although there is a performance penalty, since qio_flush() must call select() to see if a connection is ready before attempting a read() or write(). (Incidentally, VxWorks supports non-blocking I/O; furthermore, the QIO utilities have been tested under VxWorks, HP/UX, and SunOS.)
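To see concretely what the non-blocking configuration buys you, here is a sketch of the ioctl(2) call involved and of what happens when a non-blocking write hits a full network buffer. make_nonblocking() and fill_until_would_block() are illustrative names, not QIO_UTIL routines:

```c
#include  <assert.h>
#include  <errno.h>
#include  <string.h>
#include  <sys/ioctl.h>
#include  <sys/socket.h>
#include  <unistd.h>

/* Mark a channel as non-blocking, roughly as qio_configure() is
   described as doing; FIONBIO is the classic BSD ioctl for this. */
int  make_nonblocking (int channel)
{
    int  on = 1 ;
    return (ioctl (channel, FIONBIO, &on)) ;
}

/* Write to a channel until its buffer fills up.  On a blocking channel
   this would suspend the program (the Display/data server deadlock); on
   a non-blocking channel, the write instead fails with EWOULDBLOCK (or
   EAGAIN) and control returns to the caller at once. */
int  fill_until_would_block (int channel)
{
    char  chunk[1024] ;
    memset (chunk, 'x', sizeof chunk) ;
    for ( ; ; ) {
        if (write (channel, chunk, sizeof chunk) < 0)
            return ((errno == EWOULDBLOCK || errno == EAGAIN) ? 0 : -1) ;
    }
}
```

A queued-I/O package catches that EWOULDBLOCK result, leaves the unwritten data on the request queue, and retries on the next flush.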

If a write request is queued to a channel marked as "buffered", qio_write() will malloc(3) a "system" buffer for the user's data; the caller can then reuse the buffer passed into qio_write(), without having to wait for the QIO to complete. If a connection is "unbuffered", the user must perform his or her own buffering. If the dynamics of your system are such that buffering could lead to excessive memory usage, the max_outstanding parameter can be used to limit the number of uncompleted I/O requests in a queue.


QIOs and XDR

The sample program shown on the previous page performed raw write()s to the network. Most of the TPOCC-based programs, however, use the XDR protocol to exchange data. Fortunately, the QIO package can be "layered in" underneath the XDR calls our programs make. To show how this is accomplished, the changes to the data server are shown below.

First, whenever a new network connection is opened, you must call qio_configure() to configure the socket for buffered, non-blocking I/O. In the data server, this is done immediately after the net_answer() in new_client.c:

		/* Answer connection request from client. */

    if (net_answer (server, 99, &sock, &clnt_sock)) {
        vsend_event (SCKT_ANSW_ERR, "answering", NULL) ;
        return (NULL) ;
    }
		/* Configure connection for queued I/O. */

    if (qio_configure (clnt_sock, 1, 1, -1)) {
        vsend_event (SCKT_ANSW_ERR, "configuring", NULL) ;
        close (clnt_sock) ;
        return (NULL) ;
    }

Next, the low-level write routine called by the XDR functions must be changed to issue QIOs instead of writing directly to the network. The data server's low-level write function, writedstcp() (defined in new_client.c), originally looked as follows:

    static  int  writedstcp (client, buf, len)
        struct  client_data  *client ;
        char  *buf ;
        int  len ;
    {
        return (xll_write (client->client_sock, buf, len, 0,
                           &client->client_error.re_status,
                           &client->client_error.re_errno)) ;
    }

In the new version of the function, the call to xll_write() was replaced by a call to qio_write():

    static  int  writedstcp (client, buf, len)
        struct  client_data  *client ;
        char  *buf ;
        int  len ;
    {

        if (qio_write (client->client_sock, buf, len, NULL, NULL)) {
            vperror ("(writedstcp) Error queueing write request for %d bytes to channel %d.\nqio_write: ",
                     len, client->client_sock) ;
            client->client_error.re_status = RPC_CANTSEND ;
            client->client_error.re_errno = errno ;
            return (-1) ;
        }

        client->client_error.re_status = RPC_SUCCESS ;
        client->client_error.re_errno = 0 ;

        return (len) ;

    }

Now, whenever an XDR record is ready to be sent, QIOs are issued for the data. As mentioned earlier, the data server periodically calls qio_flush() to actually write the data to the network. The call to qio_flush() occurs in the data server's select() loop (in data_server.c):

    while (TRUE) {
        ... construct read mask and set timeout for 1/10 second ...
        switch (select (FD_SETSIZE, &read_mask, ...)) {
        case 0:
            ... sample and send any data that is ready to be sent ...
            qio_flush () ;
            continue ;
        case -1:
            ... error ...
        }
        ... check for, read, and process commands from clients ...
    }


The QIO Functions

In addition to the QIO functions shown in the previous sections, the QIO package has a number of other public functions; a complete list is presented below. qio_read() and qio_seek() were added for the sake of completeness; they are untested and I'm not sure how or if qio_read() could be used in conjunction with XDR. qio_termc() should be called when a connection is closed.

qio_init()
initializes QIO_UTIL's internal data structures.
qio_configure()
configures a previously-opened channel for buffered/unbuffered, blocking/non-blocking I/O.
qio_read()
adds a read request to a channel's I/O queue.
qio_seek()
adds a seek request to a channel's I/O queue.
qio_write()
adds a write request to a channel's I/O queue.
qio_flush()
attempts to complete QIOs pending on any channel.
qio_flushc()
attempts to complete pending QIOs in a specific channel's I/O queue.
qio_pend()
returns the number of QIOs pending on all channels.
qio_pendc()
returns the number of QIOs pending on a specific channel.
qio_term()
deletes all of the I/O queues and any pending QIOs in those queues.
qio_termc()
deletes a specific channel's I/O queue and any pending QIOs in that queue.
qio_ast()
is a sample AST function, invoked when an I/O request completes.
qio_dump()
dumps the contents of the I/O queues to standard output.

A global debug flag, qio_util_debug, can be set to enable debug output from these functions, including a dump of all data read from or written to the network. qio_init() resets the debug flag, so be sure to enable it after calling qio_init().


Asynchronous Trap (AST) Functions

Under VAX/VMS, when an I/O request completes, either successfully or unsuccessfully, the program can be notified in one of two ways: by an asynchronous trap (AST) or by the setting of an event flag. An AST is basically a software interrupt that invokes an interrupt handler. The TPOCC QIO package supports an AST-like mechanism by allowing a program to specify an AST function when qio_read(), qio_write(), or qio_seek() are called. For example, a write request is issued by the following call:

    extern  void  AST_function () ;
    ...
    qio_write (channel, buffer, length, AST_function, AST_argument) ;

AST_function is a pointer to an AST function; it can be NULL if no completion notification is desired.

AST_argument is an arbitrary, user-specified argument, cast as a void * pointer, which will be passed to the AST function. When the write requested by the call above is completed, qio_flush() calls AST_function, passing it AST_argument, as well as several other arguments. An AST function should be defined as follows:

    void  AST_function (error_code, channel, operation, buffer, length, AST_argument)
        char  *buffer ;
        int  channel, error_code, length, operation ;
        void  *AST_argument ;
    {
        ...
    }

Descriptions of the arguments can be found in the prolog for qio_ast(), a sample AST function in qio_util.c which prints out a debug message when an I/O operation completes. An AST function is free to do pretty nearly anything it wants to do. "Event flags" can be simulated by specifying an AST function that signals a semaphore.
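For example, the following sketch "sets an event flag" when an I/O request completes. The flag is a plain int here; a multi-tasking program could signal a semaphore instead. The argument list follows the AST function definition shown above, but flag_setting_ast() is an illustrative name, not part of the QIO package:

```c
#include  <assert.h>

/* An AST function that sets a caller-supplied "event flag" when an
   I/O request completes successfully.  The flag's address is passed
   to qio_write() as the AST_argument. */
void  flag_setting_ast (int error_code, int channel, int operation,
                        char *buffer, int length, void *AST_argument)
{
    int  *event_flag = (int *) AST_argument ;

    if (error_code == 0)
        *event_flag = 1 ;		/* I/O completed successfully. */
}
```

Queueing a write with qio_write (channel, buffer, length, flag_setting_ast, (void *) &flag) would then set flag once qio_flush() completes the request, and the main loop can test the flag at its leisure.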


The Data Services Library, Display, and Non-Blocking I/O

Clients such as the TPOCC display program interface with the data server through TPOCC's data services library, libds. submit_cmd() is called to send data requests to the data server; request_data() is called to receive sampled data from the data server. Although the data server has been modified to use queued I/O for outputting data, it might be a good idea if data clients themselves used QIOs for sending commands to the data server. For example, the display program will block trying to write ADD commands if the data server is not reading them fast enough; this bodes ill for user responsiveness.

Fortunately, there is a solution. The data services library has been upgraded to allow an application to choose between:

blocking I/O - as in earlier versions of the library, and
non-blocking I/O - queued and buffered.

Selecting one or the other is as simple as setting a global flag, libds_use_qios, to true (non-zero) or false (zero). Of course, if an application decides to use QIOs for data services, it must remember to periodically call qio_flush().

To illustrate the modifications that need to be made to a program, the following discussion shows how TPOCC's display program could be modified to use QIOs when sending commands to the data server. First, in xtpdsp.c (Display's main routine), the following lines need to be added to set the "use QIOs" flag:

    extern  int  libds_use_qios ;
    ...
    libds_use_qios = 1 ;	/* Tell the data services library to use QIOs. */

Next, qio_flush() must be called periodically. Periodic(), in Display's DataComm library, is the obvious place for this call, nestled in between the calls to XtpPollData() and XFlush():

    void  Periodic (display, id)
        ...
    {
        XtpPollData () ;
        qio_flush () ;
        ...
        XFlush (display) ;
        ...
    }

Done! And in less time than it takes an X Windows programmer to tell you how busy he is! Note that the call to qio_flush() in the periodic function will not affect display programs that don't use QIOs; since there would be no I/O queues, qio_flush() would return immediately.


QIOs to Files

They don't work. Actually, the QIO utilities work fine with files; it's just that non-blocking I/O doesn't seem to mean anything with respect to files. If you queue up many writes to a file, a single call to qio_flush() will flush them all. Sorry! (The new HP/UX adds an asynchronous file I/O capability.)


Future Directions

What else can be done with the QIO utilities? The event logging interface (vsend_event(), etc.) ought to be converted to use QIOs for writing messages to the event logger. There are only two ways of doing this, as far as I can see:

  1. Require all programs that use the event logging facility to periodically call qio_flush().
  2. Have vsend_event() call qio_flush() after "sending" an event message.

Lots of programs would have to be modified to support Option #1. Option #2 would not affect existing programs, but, if an event message could not be sent on one call to vsend_event(), it would not get transmitted until the next call.

Since stream_svr(), in TPOCC's libutilgen library, is so widely used, I haven't yet modified it to use QIOs. This function already has a "non-blocking?" argument which could be used to enable queued I/O. Any objections? (stream_svr()'s non-blocking feature is #ifdefed for SunOS only; I don't think anyone has ever used it.)

Bill Stratton wrote some non-blocking XDR functions (xnb_util.c) that, unlike the QIO functions, throw away XDR records that can't be written without blocking. This capability could possibly be simulated by a function, qio_prune(), which simply deletes all I/O requests except the current one from a queue. As with the XNB utilities, there is no way to guarantee that a write request covers a whole record and nothing but the record. By the way, Bill, we're glad to see you doing something useful again - it's been a long time since we've seen the likes of your flat list and hash routines!
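Here's a rough sketch of what such a qio_prune() might look like. The queue structure below is illustrative only - it is not the real QIO_UTIL internals - and the function itself is hypothetical, not part of the package:

```c
#include  <assert.h>
#include  <stdlib.h>

typedef  struct  io_request {
    struct  io_request  *next ;
}  io_request ;

typedef  struct {
    io_request  *front, *rear ;
    int  count ;
}  io_queue ;

/* Hypothetical qio_prune(): delete every queued request except the
   one currently in progress at the front of the queue.  Returns the
   number of requests discarded. */
int  qio_prune_sketch (io_queue *q)
{
    io_request  *req, *next ;
    int  pruned = 0 ;

    if (q->front == NULL)  return (0) ;
    for (req = q->front->next ;  req != NULL ;  req = next) {
        next = req->next ;
        free (req) ;
        pruned++ ;
    }
    q->front->next = NULL ;
    q->rear = q->front ;
    q->count = 1 ;
    return (pruned) ;
}
```

The front request is spared because part of it may already have been written to the network; discarding it mid-record would corrupt the stream.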

The QIO utility package was written kind of off the cuff, so if you have any suggestions, comments, or questions, please let me know.


Alex Measday