TPOCC began in 1987 as a small project to build a UNIX workstation-based satellite control center for NASA. Originally there were 4 programmers and our manager. Within about a year, TPOCC was taken under the umbrella of a large software contract that Computer Sciences Corporation had with NASA. After a couple more years, CSC decided to use TPOCC as the basis for a number of satellite control centers it was building for NASA and they ramped up the number of people who had to become familiar with and work with TPOCC.
One employee, new to C and Unix, didn't like the look of our software and he wrote a memo proposing naming conventions and some other coding rules. Submitted to the high-level CSC manager overseeing all the projects, the memo was duly passed on to me. The naming conventions were cosmetic; while not aesthetically pleasing, the conventions were not burdensome. (And they were loose enough to have a little fun with!) In my response, below, I focused on the functionally more important one-function/one-file recommendation. I explained why there were good software engineering reasons for grouping multiple, related C functions in a single file. I tied the whole thing in with Ada, first, because there were parallels in this instance between the languages and, second, because I figured that including an Ada example might carry a little more weight with CSC than relying solely on a C example.
My article in the June 1992 issue of The C Users Journal, "C Packages: Implementing Ada-style Packages in C", was adapted from this memo.
The memo on naming conventions discussed the grouping of related functions in a single C source file. The following statements:
"There is no prohibition against putting individual C functions into separate files ..."
"To strike a balance ... while maintaining a manageable file size."
"... a source file should not exceed N lines ..."
seem to imply that multiple functions are put into a single file as a matter of convenience only. This is a misconception that, unfortunately, seems to be spreading; most of the newly-written software I've seen on TPOCC and the TPOCC-based systems is following the one-function/one-file rule. There happens to be a very good reason for putting more than one function in a C source file, as the following discussion should make clear.
Encapsulation and data hiding are important techniques in software engineering for decreasing the coupling between modules in a program. The weaker the coupling between two modules, the less one will be affected by changes to the other. Ada hides the implementation of a capability in a package, thus shielding clients of the capability from changes to the implementation.
A C source file is analogous to an Ada package body. Static, non-local variables (i.e., those not declared in the scope of a function) in a C source file are like variables declared in the body of an Ada package: client modules have no knowledge of such variables and no access to them, except through declared procedures. Static functions in a C source file are like Ada procedures which are defined in the body of a package but not in the package specification: internal to the package, these procedures cannot be called by client modules.
For example, a package for accessing system variables might appear as follows in Ada:
    package SYV_UTIL is

        -- Global debug switch.
        SYV_UTIL_DEBUG: BOOLEAN := FALSE ;

        -- Get list of discrete states.
        function SYV_GET_STATES (MNEMONIC: STRING) return STATE_LIST ;

        -- Load system variables.
        procedure SYV_LOAD (MISSION: STRING) ;

        -- Lookup a system variable.
        function SYV_LOOKUP (MNEMONIC: STRING) return ADDRESS ;

    end SYV_UTIL ;


    package body SYV_UTIL is

        -- Internal variables.
        SHMEM_ADDRESS: ADDRESS := null ;
        type LIST_NODE is record ... end record ;
        SYSVAR_LIST: access LIST_NODE := null ;

        -- Public functions.
        function SYV_GET_STATES (MNEMONIC: STRING) return STATE_LIST is
        begin
            ... returns list of discrete states defined for mnemonic ...
        end ;

        procedure SYV_LOAD (MISSION: STRING) is
        begin
            ... loads system variable database ...
        end SYV_LOAD ;

        function SYV_LOOKUP (MNEMONIC: STRING) return ADDRESS is
        begin
            ... returns address of system variable ...
        end SYV_LOOKUP ;

        -- Internal function called by
        -- SYV_GET_STATES and SYV_LOOKUP.
        function SYV_LOCATE (MNEMONIC: STRING) return access LIST_NODE is
        begin
            ... locates mnemonic's node in system variable list ...
        end SYV_LOCATE ;

    end SYV_UTIL ;
(Disclaimer: My Ada knowledge has been collecting cobwebs, so please excuse any bloopers in the code above.) A comparable package in C, stored in a single file, would look as follows:
    int  syv_util_debug = 0 ;               /* Global debug switch. */

                                            /* Internal variables. */
    static  void  *shmem_address = NULL ;
    typedef  struct  list_node { ... }  list_node ;
    static  list_node  *sysvar_list = NULL ;

                                            /* Public functions. */
    state_list  syv_get_states () ;
    void  syv_load (), *syv_lookup () ;
                                            /* Internal functions. */
    static  list_node  *syv_locate () ;


    state_list  syv_get_states (mnemonic)
        char  *mnemonic ;
    {
        ... returns list of discrete states defined for mnemonic ...
    }

    void  syv_load (mission)
        char  *mission ;
    {
        ... loads system variable database ...
    }

    void  *syv_lookup (mnemonic)
        char  *mnemonic ;
    {
        ... returns address of system variable ...
    }

    /* Internal function called by SYV_GET_STATES and SYV_LOOKUP. */
    static  list_node  *syv_locate (mnemonic)
        char  *mnemonic ;
    {
        ... locates mnemonic's node in system variable list ...
    }
Clients (users) of the system variable package (in either language) cannot reference the shmem_address or sysvar_list variables, have no knowledge of the structure of list nodes, and cannot call the internal procedure, syv_locate(). These restrictions are not just a matter of the design methodology you follow - they are enforced by the compiler. Breaking the C functions out into separate files would require that the static variables be made global, that the structure of list nodes be made common knowledge, and that the syv_locate() function become callable from anywhere in a program. You can, of course, trust to people's good intentions and ignore the potential for malicious access to these "hidden" variables and functions, but what about programs that access them out of necessity or as a shortcut? Any changes to the implementation of system variables could have a major impact on such closely-coupled software.
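To make the compiler-enforced hiding concrete, here is a minimal, compilable sketch of a one-file C package written in ANSI C. It is not TPOCC code: the names var_define(), var_lookup(), var_locate(), var_table, and MAX_VARS are hypothetical, and the fixed-size table is an assumption chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Public interface (what a header file would declare). ---- */
/* All names in this file are hypothetical, for illustration only. */
int var_define(const char *name, int value);   /* 0 on success, -1 if full or duplicate */
int var_lookup(const char *name, int *value);  /* 0 if found, -1 otherwise */

/* ---- Internal state: file scope, not nameable from other files. ---- */
#define MAX_VARS 8

typedef struct var_node {
    const char *name;
    int value;
} var_node;

static var_node var_table[MAX_VARS];
static size_t var_count = 0;

/* Internal helper. "static" gives it internal linkage, so code in
 * other source files cannot call it by name. */
static var_node *var_locate(const char *name)
{
    size_t i;

    for (i = 0; i < var_count; i++) {
        if (strcmp(var_table[i].name, name) == 0)
            return &var_table[i];
    }
    return NULL;
}

int var_define(const char *name, int value)
{
    assert(name != NULL);
    if (var_count == MAX_VARS || var_locate(name) != NULL)
        return -1;
    var_table[var_count].name = name;    /* caller keeps the string alive */
    var_table[var_count].value = value;
    var_count++;
    return 0;
}

int var_lookup(const char *name, int *value)
{
    var_node *node;

    assert(name != NULL && value != NULL);
    node = var_locate(name);
    if (node == NULL)
        return -1;
    *value = node->value;
    return 0;
}
```

A client file that declares only var_define() and var_lookup() can use the package, but an attempt to reference var_table or call var_locate() from that file fails at link time with an undefined symbol. Splitting the three functions into three files would force the static qualifiers to be dropped.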
Particularly effective use of the C package concept is exhibited in the TPOCC event logging utilities written by Meng Lin. An application program's access to the event logging facility is only possible through 2 routines, evt_init() and vsend_event(), found in one file, events_util.c. evt_init() initializes the interface to the event logger. The fact that evt_init() loads 3 database files and establishes a network connection to the event logger is immaterial to the application program. evt_init() could just as well be opening a disk file for the event log and the texts of event messages could be hard-coded in a string array. The details of how vsend_event() looks up the text of an event message, formats the message arguments, and writes the event packet out on the network are also of no consequence to the application program. The internal implementation of the event logging utilities could be completely revamped without affecting any of the applications software; the applications would have to be relinked to the TPOCC library, but no recompiles would be required.
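The alternative implementation mentioned above (message texts hard-coded in a string array, output going to a file rather than the network) can be sketched behind an equally narrow two-routine interface. This is an illustration only and does not reproduce events_util.c: demo_evt_init(), demo_evt_send(), their signatures, and the message texts are all assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Public interface: the only two names a client ever sees.
 * Both functions are hypothetical, not the TPOCC routines. */
int demo_evt_init(FILE *sink);
int demo_evt_send(int event_id, ...);

/* Internal: message texts hard-coded in a string array (made-up texts). */
static const char *const event_text[] = {
    "startup complete",
    "telemetry frame %d dropped",
    "command %s rejected"
};
#define NUM_EVENTS (sizeof event_text / sizeof event_text[0])

/* Internal: where events are written; NULL until initialized. */
static FILE *event_sink = NULL;

/* Remember the output stream. Always returns 0. */
int demo_evt_init(FILE *sink)
{
    assert(sink != NULL);
    event_sink = sink;
    return 0;
}

/* Format event number event_id with its arguments and write it,
 * followed by a newline. Returns the number of characters in the
 * formatted message, or -1 on error. */
int demo_evt_send(int event_id, ...)
{
    va_list args;
    int written;

    if (event_sink == NULL || event_id < 0 || (size_t) event_id >= NUM_EVENTS)
        return -1;
    va_start(args, event_id);
    written = vfprintf(event_sink, event_text[event_id], args);
    va_end(args);
    if (written < 0)
        return -1;
    fputc('\n', event_sink);
    return written;
}
```

Because event_text and event_sink are static, the file could later be rewritten to load its texts from a database and send packets over a socket. Client code that calls only the two public routines would need relinking but no source changes, provided the two declarations stay the same.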
The data services library, on the other hand, stores each of its functions in a different file. Although the functions' calling sequences shield client applications from implementation details to a certain extent, the "internal" data structures are all global. The lack of a function for building an I/O selection mask for data server connections forced application programs themselves to scan the library's list of connected servers. This kludge, documented in the Release 1 edition of the TPOCC Implementation Guide and currently used in the TPOCC-based report generators, produces an unhealthy dependence of an application on the internals of the data services library. A new (Release 2) function, ds_mask(), obviates the need for the kludge, but tracking down and eliminating the use of such kludges could be a major maintenance headache in some cases.
I hope the preceding discussion has shown how the C language supports good software design principles. Rather than being encouraged to put separate functions in separate files, C programmers should be encouraged to encapsulate functionality and data in C function "packages".