This article is adapted from my 1991 "Naming Conventions" memo. The techniques suggested in this article were gradually refined into a simple, but successful, form of object-based programming in C, as described briefly in my software library's design and coding conventions. Nothing original on my part, just practices shaped by seeing others' good code (especially Meng Lin's event logging library, as mentioned in my memo) and reading books like Bertrand Meyer's Object-Oriented Software Construction (see my OOP tutorial). The article is also available at the C/C++ Users Journal web site.
In an effort to bring NASA into the space age, our company was asked to build a generic satellite control center, readily adaptable to new missions. The POCC (Payload Operations Control Center) we designed and built makes use of X Windows to provide an up-to-date operator interface and networked UNIX workstations to provide some measure of hardware and software portability. Naturally, the POCC's software is written almost exclusively in C.
Naming conventions are a necessity on a large software project and, as part of the quality assurance program they set up for the customer, the main contractor (we're a subcontractor) established a set of standards for file and function names in our software. The standards can be boiled down into two basic rules: (i) each function has a 3-character prefix that identifies the subsystem of which it is a part and (ii) the name of a source file should match the name of the function within. The standards document, while acknowledging the common C practice of grouping related functions in a single file, encouraged programmers to store only one function per file.
There are some very good reasons, however, for placing more than one C function in a source file, reasons that touch on the "nature of C", as Art Shipman calls it (The C Users Journal, "Questions & Answers", February 1991).
Encapsulation and data hiding are important techniques in software engineering for decreasing the coupling between modules in a program. The weaker the coupling between two modules, the less one will be affected by changes to the other. These techniques are exemplified in Ada, which hides the implementation of a capability in a package, thus shielding clients of the capability from changes to the implementation. An Ada package consists of two parts, a package specification whose declarations constitute the public interface to the package and a package body which hides the package's actual implementation.
A C source file is analogous to an Ada package body. Static, non-local variables (i.e., those not declared in the scope of a function) in a C source file are like variables declared in the body of an Ada package: client modules have no knowledge of such variables and no access to them, except through declared procedures. Static functions in a C source file are like Ada procedures which are defined in the body of a package but not in the package specification: internal to the package, these procedures cannot be called by client modules.
For example, Listing 1 shows an Ada package for managing a symbol table. The table is implemented as a simple list of name/value pairs; an internal variable, SYMBOL_LIST, points to the list. The representation of the list is not important; it could be a fixed-size array, a dynamically-allocated array, a linear linked list, a binary tree, a hash table, or even a skip list! SYM_ADD() is a procedure that adds a symbol to the symbol table; SYM_DELETE() deletes a symbol from the table. SYM_LOOKUP() is a function that returns the value assigned to a symbol. All 3 functions call an internal function, SYM_LOCATE(), that locates a symbol in the list and returns a pointer to the symbol's list node. SYM_UTIL_DEBUG is a global debug switch that a program can set to turn on debug output in the SYM_UTIL functions.
A comparable C "package", stored in a single source file, is shown in Listing 2. The symbol_list pointer and the sym_locate() function are declared static, so they are unknown outside of this file. The remaining functions and the global debug flag are all accessible to the public.
Clients (users) of the symbol table package (in either language) cannot reference the symbol_list variable, have no knowledge of the structure of list nodes, and cannot call the internal procedure, sym_locate(). These restrictions are not just a matter of the design methodology you follow; they are enforced by the compiler. Breaking the C functions out into separate files would require that the static variables be made global, that the structure of list nodes be made common knowledge, and that the sym_locate() function become callable from anywhere in a program. You can, of course, trust to people's good intentions and ignore the potential for malicious access to these "hidden" variables and functions, but what about programs that access them out of necessity or as a shortcut? Any changes to the implementation of the symbol table could have a major impact on such closely-coupled software.
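As a rough illustration, the skeleton in Listing 2 might be fleshed out along the following lines. This is only a sketch in ANSI C, not the actual package: the linked-list representation, the fields of symbol_node, the update-on-duplicate behavior of sym_add(), and the -1 "not found" value returned by sym_lookup() are all assumptions made for the example.

```c
/* sym_util.c - sketch of a symbol table "package" (assumed details). */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int sym_util_debug = 0;                 /* Global debug switch (public). */

/* Internal: the node layout is an assumption; clients never see it. */
typedef struct symbol_node {
    char *name;
    int value;
    struct symbol_node *next;
} symbol_node;

static symbol_node *symbol_list = NULL; /* Hidden head of the list. */

/* Internal: return NAME's node, or NULL if NAME is not in the list. */
static symbol_node *sym_locate(const char *name)
{
    symbol_node *node;
    for (node = symbol_list; node != NULL; node = node->next)
        if (strcmp(node->name, name) == 0)
            return node;
    return NULL;
}

/* Public: add NAME/VALUE, or update VALUE if NAME already exists. */
void sym_add(const char *name, int value)
{
    symbol_node *node;
    assert(name != NULL);
    node = sym_locate(name);
    if (node == NULL) {
        node = malloc(sizeof *node);
        if (node == NULL) return;
        node->name = malloc(strlen(name) + 1);
        if (node->name == NULL) { free(node); return; }
        strcpy(node->name, name);
        node->next = symbol_list;       /* Push onto the front. */
        symbol_list = node;
    }
    node->value = value;
    if (sym_util_debug) printf("sym_add: %s = %d\n", name, value);
}

/* Public: delete NAME from the table; a missing NAME is ignored. */
void sym_delete(const char *name)
{
    symbol_node **link, *node;
    assert(name != NULL);
    node = sym_locate(name);
    if (node == NULL) return;
    /* Find the link that points at the node and unhook it. */
    for (link = &symbol_list; *link != node; link = &(*link)->next)
        ;
    *link = node->next;
    free(node->name);
    free(node);
}

/* Public: return NAME's value, or -1 if not found (assumed convention). */
int sym_lookup(const char *name)
{
    symbol_node *node;
    assert(name != NULL);
    node = sym_locate(name);
    return (node != NULL) ? node->value : -1;
}
```

A client file would see only extern declarations of sym_util_debug, sym_add(), sym_delete(), and sym_lookup(). A reference to symbol_list or sym_locate() from another file fails at link time, because static names have internal linkage.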
Particularly effective use of the C package concept is exhibited in our POCC's event logging utilities. An application program's access to the event logging facility is only possible through two routines, evt_init() and evt_send(), found in one file, evt_util.c. evt_init() initializes the interface to the event logger. The fact that evt_init() loads event message information from 3 database files and establishes a network connection to the event logger is immaterial to the application program. evt_init() could just as well be opening a disk file for the event log and the texts of event messages could be hard-coded in a string array. The details of how evt_send() looks up the text of an event message, formats the message arguments, and writes the event packet out on the network are also of no consequence to the application program. The internal implementation of the event logging utilities could be completely revamped without affecting any of the applications software; the applications would have to be relinked to the updated library, but recompilation would not be necessary.
Our data services library took the opposite approach. This library, which manages network connections to multiple data servers, stores each of its functions in a different source file. Although the functions' calling sequences shield client applications from implementation details to a certain extent, the "internal" data structures are all global. The lack of a function for building an I/O selection mask for the managed connections forced application programs themselves to scan the library's list of connected servers. This kludge produced an unhealthy dependence of an application on the internals of the data services library. A new function, ds_mask(), was added that obviated the need for the kludge, but tracking down and eliminating the use of such kludges could be a major maintenance headache in some cases.
C packages are, in general, a good thing. However, several caveats should be noted. First, don't lock the door and throw away the key. It's possible for a package to be closed up too tightly. For example, as our project progressed, we found the need for certain applications to switch to a different event logger (on another computer) in mid-stream. Doing so requires the application program to close the network connection to the current event logger and to open a connection to the new event logger. Hidden inside the events package, the file descriptor for the event logger connection is inaccessible to application programs. Adding a new function, evt_reconnect(), easily solved the problem, but software changes may not, for organizational or configuration reasons, always be an available option.
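The shape of that fix can be sketched as follows. To keep the example self-contained, a disk file stands in for the network connection, and the calling sequences of evt_init(), evt_send(), and evt_reconnect() are assumptions made for illustration; the real routines take different arguments and do considerably more.

```c
/* evt_util.c - sketch of a hidden logger connection (assumed details). */
#include <assert.h>
#include <stdio.h>

/* Internal: the logger "connection". A disk file stands in for the
   network socket here; clients cannot see or touch this variable. */
static FILE *evt_log = NULL;

/* Open the connection to LOGGER. Returns 0 on success, -1 on error. */
int evt_init(const char *logger)
{
    assert(logger != NULL);
    evt_log = fopen(logger, "a");
    return (evt_log != NULL) ? 0 : -1;
}

/* Send one event message. Returns 0 on success, -1 on error. */
int evt_send(const char *text)
{
    assert(text != NULL);
    if (evt_log == NULL) return -1;     /* Not connected. */
    if (fprintf(evt_log, "%s\n", text) < 0) return -1;
    return (fflush(evt_log) == 0) ? 0 : -1;
}

/* Close the current connection, if any, and connect to LOGGER.
   The descriptor never leaves the package. */
int evt_reconnect(const char *logger)
{
    if (evt_log != NULL) {
        fclose(evt_log);
        evt_log = NULL;
    }
    return evt_init(logger);
}
```

The point of the sketch is that the switch is made by a new entry point into the package rather than by exposing the connection variable to applications.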
Another word to the wise: don't hide what shouldn't be hidden. The UNIX hashing functions, hsearch(3), for example, manage a single hash table. While the details of the hash table are commendably hidden from the calling program, the calling program cannot have multiple tables in use simultaneously. The program must destroy one table before creating another. Rather than storing the hash table within the package, as it does, hcreate() would be better off returning an opaque, void * pointer to each hash table it creates. This "handle" could then be passed into hsearch() for adding and recalling entries from that particular table. Different hash tables would have different handles and could coexist peacefully.
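A handle-based interface of that kind might look like the sketch below. The names htab_create(), htab_add(), htab_find(), and htab_destroy() are hypothetical and chosen to avoid clashing with the real hsearch(3) family; the chained-bucket representation and the hash function are likewise just assumptions for the example.

```c
/* htab.c - sketch of a hash table package with opaque handles. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HTAB_BUCKETS 64         /* Hypothetical fixed bucket count. */

/* Internal: clients only ever hold a void * to one of these. */
typedef struct htab_entry {
    char *key;
    void *data;
    struct htab_entry *next;
} htab_entry;

typedef struct { htab_entry *bucket[HTAB_BUCKETS]; } htab;

/* Internal: simple multiplicative string hash. */
static unsigned hash(const char *key)
{
    unsigned h = 0;
    while (*key != '\0') h = h * 31u + (unsigned char) *key++;
    return h % HTAB_BUCKETS;
}

/* Create an empty table; returns its handle, or NULL on failure. */
void *htab_create(void)
{
    int i;
    htab *table = malloc(sizeof *table);
    if (table == NULL) return NULL;
    for (i = 0; i < HTAB_BUCKETS; i++) table->bucket[i] = NULL;
    return table;
}

/* Add KEY/DATA to the table, replacing DATA if KEY already exists.
   Returns 0 on success, -1 if memory runs out. */
int htab_add(void *handle, const char *key, void *data)
{
    htab *table = handle;
    htab_entry *entry;
    unsigned h;
    assert(table != NULL && key != NULL);
    h = hash(key);
    for (entry = table->bucket[h]; entry != NULL; entry = entry->next)
        if (strcmp(entry->key, key) == 0) { entry->data = data; return 0; }
    entry = malloc(sizeof *entry);
    if (entry == NULL) return -1;
    entry->key = malloc(strlen(key) + 1);
    if (entry->key == NULL) { free(entry); return -1; }
    strcpy(entry->key, key);
    entry->data = data;
    entry->next = table->bucket[h];
    table->bucket[h] = entry;
    return 0;
}

/* Return the data stored under KEY, or NULL if KEY is absent. */
void *htab_find(void *handle, const char *key)
{
    htab *table = handle;
    htab_entry *entry;
    assert(table != NULL && key != NULL);
    for (entry = table->bucket[hash(key)]; entry != NULL; entry = entry->next)
        if (strcmp(entry->key, key) == 0) return entry->data;
    return NULL;
}

/* Free the table and its entries (but not the caller's data). */
void htab_destroy(void *handle)
{
    htab *table = handle;
    htab_entry *entry, *next;
    int i;
    if (table == NULL) return;
    for (i = 0; i < HTAB_BUCKETS; i++)
        for (entry = table->bucket[i]; entry != NULL; entry = next) {
            next = entry->next;
            free(entry->key);
            free(entry);
        }
    free(table);
}
```

Because all state hangs off the handle rather than off a static variable, two tables created by separate htab_create() calls can hold the same key with different data, which the single-table hsearch() interface cannot do.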
Using C packages can be viewed as a primitive form of object-oriented programming. (Ken Pugh alluded to this in his "Questions & Answers" column, CUJ, February 1991.) In the object-oriented approach, a program is composed of objects. An object consists of instance variables, which represent the state of the object, and methods, which are functions used to modify or query the object's state. Objects communicate by passing messages to each other; a message specifies a method to be executed by the receiving object and arguments, if any, for the method.
In a C "object", the static, non-local variables are instance variables and the functions are the methods. Calling a function is logically equivalent to sending a message to a method. Figure 1 illustrates this object-oriented view of our symbol table package. The global debug switch, not shown in the figure, might be considered a class variable; i.e., a variable common to all instances of a particular object type (class).
The preceding discussion has shown how the C language supports good software design principles. Rather than being encouraged to put separate functions in separate files, C programmers should be encouraged to encapsulate functionality and data in C function "packages".
About me: I am a programmer/analyst at Integral Systems, Inc. (Lanham, MD), which builds satellite ground systems for NASA, NOAA, and the Air Force. I've been a professional programmer for about 9 years, developing satellite image processing systems in VAX/VMS FORTRAN, automated test equipment for satellite components in PL/M-286 (I wish C had "based" variables!), and, currently, satellite control center software in C under UNIX. I can be reached at 1100 West Street, Laurel MD 20707. (301) 497-2413.
Listing 1: Symbol table package in Ada

    package SYM_UTIL is

        -- Global debug switch.
        SYM_UTIL_DEBUG: BOOLEAN := FALSE ;

        -- Add symbol to table.
        procedure SYM_ADD (NAME: STRING; VALUE: INTEGER) ;

        -- Delete symbol from table.
        procedure SYM_DELETE (NAME: STRING) ;

        -- Lookup a symbol.
        function SYM_LOOKUP (NAME: STRING) return INTEGER ;

    end SYM_UTIL ;


    package body SYM_UTIL is

        -- Internal variables.
        type SYMBOL_NODE is record ... end record ;
        SYMBOL_LIST: access SYMBOL_NODE := null ;

        -- Public functions.
        procedure SYM_ADD (NAME: STRING; VALUE: INTEGER) is
        begin
            ... adds NAME/VALUE pair to the symbol table ...
        end SYM_ADD ;

        procedure SYM_DELETE (NAME: STRING) is
        begin
            ... deletes NAME from the symbol table ...
        end SYM_DELETE ;

        function SYM_LOOKUP (NAME: STRING) return INTEGER is
        begin
            ... returns NAME's value from the symbol table ...
        end SYM_LOOKUP ;

        -- Internal function called
        -- by the other functions.
        function SYM_LOCATE (NAME: STRING) return access SYMBOL_NODE is
        begin
            ... locates NAME's node in the symbol list ...
        end SYM_LOCATE ;

    end SYM_UTIL ;
Listing 2: Symbol table package in C

    int sym_util_debug = 0 ;            /* Global debug switch. */

    /* Internal variables. */
    typedef struct symbol_node { ... } symbol_node ;
    static symbol_node *symbol_list = NULL ;

    /* Public functions. */
    void sym_add (), sym_delete () ;
    int sym_lookup () ;

    /* Internal functions. */
    static symbol_node *sym_locate () ;


    void sym_add (name, value)
        char *name ;
        int value ;
    {
        ... adds NAME/VALUE pair to the symbol table ...
    }

    void sym_delete (name)
        char *name ;
    {
        ... deletes NAME from the symbol table ...
    }

    int sym_lookup (name)
        char *name ;
    {
        ... returns NAME's value from the symbol table ...
    }

    /* Internal function called by the other functions. */
    static symbol_node *sym_locate (name)
        char *name ;
    {
        ... locates NAME's node in the symbol list ...
    }