Naming Conventions

February 5, 1991

TPOCC began in 1987 as a small project to build a UNIX workstation-based satellite control center for NASA. Originally there were 4 programmers and our manager. Within about a year, TPOCC was taken under the umbrella of a large software contract that Computer Sciences Corporation had with NASA. After a couple more years, CSC decided to use TPOCC as the basis for a number of satellite control centers it was building for NASA, and it ramped up the number of people who had to become familiar with and work with TPOCC.

One employee, new to C and Unix, didn't like the look of our software and he wrote a memo proposing naming conventions and some other coding rules. Submitted to the high-level CSC manager overseeing all the projects, the memo was duly passed on to me. The naming conventions were cosmetic; while not aesthetically pleasing, the conventions were not burdensome. (And they were loose enough to have a little fun with!) In my response, below, I focused on the functionally more important one-function/one-file recommendation. I explained why there were good software engineering reasons for grouping multiple, related C functions in a single file. I tied the whole thing in with Ada, first, because there were parallels in this instance between the languages and, second, because I figured that including an Ada example might carry a little more weight with CSC than relying solely on a C example.

My article in the June 1992 issue of The C Users Journal, "C Packages: Implementing Ada-style Packages in C", was adapted from this memo.

The memo on naming conventions discussed the grouping of related functions in a single C source file. The following statements:

"There is no prohibition against putting individual C functions into separate files ..."
"To strike a balance ... while maintaining a manageable file size."
"... a source file should not exceed N lines ..."

seem to imply that multiple functions are put into a single file as a matter of convenience only. This is a misconception that, unfortunately, seems to be spreading; most of the newly-written software I've seen on TPOCC and the TPOCC-based systems is following the one-function/one-file rule. There happens to be a very good reason for putting more than one function in a C source file, as the following discussion should make clear.

Encapsulation and data hiding are important techniques in software engineering for decreasing the coupling between modules in a program. The weaker the coupling between two modules, the less one will be affected by changes to the other. Ada hides the implementation of a capability in a package, thus shielding clients of the capability from changes to the implementation.

A C source file is analogous to an Ada package body. Static, non-local variables (i.e., those not declared in the scope of a function) in a C source file are like variables declared in the body of an Ada package: client modules have no knowledge of such variables and no access to them, except through declared procedures. Static functions in a C source file are like Ada procedures which are defined in the body of a package but not in the package specification: internal to the package, these procedures cannot be called by client modules.

For example, a package for accessing system variables might appear as follows in Ada:

    package SYV_UTIL is
                                -- Global debug switch.
        SYV_UTIL_DEBUG: INTEGER := 0 ;
                                -- Get list of discrete states.
        function SYV_GET_STATES (MNEMONIC: STRING)
            return STATE_LIST ;
                                -- Load system variables.
        procedure SYV_LOAD (MISSION: STRING) ;
                                -- Lookup a system variable.
        function SYV_LOOKUP (MNEMONIC: STRING) return ADDRESS ;
    end SYV_UTIL ;

    package body SYV_UTIL is
                                -- Internal variables.
        SHMEM_ADDRESS: ADDRESS := null ;
        type LIST_NODE is record
            ...
        end record ;
        SYSVAR_LIST: access LIST_NODE := null ;
                                -- Public functions.
        function SYV_GET_STATES (MNEMONIC: STRING)
            return STATE_LIST is
            ... returns list of discrete states defined for mnemonic ...
        end ;

        procedure SYV_LOAD (MISSION: STRING) is
            ... loads system variable database ...
        end SYV_LOAD ;

        function SYV_LOOKUP (MNEMONIC: STRING) return ADDRESS is
            ... returns address of system variable ...
        end SYV_LOOKUP ;
                                -- Internal function called by
                                -- SYV_GET_STATES and SYV_LOOKUP.
        function SYV_LOCATE (MNEMONIC: STRING)
            return access LIST_NODE is
            ... locates mnemonic's node in system variable list ...
        end SYV_LOCATE ;

    end SYV_UTIL ;

(Disclaimer: My Ada knowledge has been collecting cobwebs, so please excuse any bloopers in the code above.) A comparable package in C, stored in a single file, would look as follows:

    int  syv_util_debug = 0 ;   /* Global debug switch. */

                                /* Internal variables. */
    static  void  *shmem_address = NULL ;
    typedef  struct  list_node {
        ...
    } list_node ;
    static  list_node  *sysvar_list = NULL ;

                                /* Public functions. */
    state_list  syv_get_states () ;
    void  syv_load (), *syv_lookup () ;
                                /* Internal functions. */
    static  list_node  *syv_locate () ;

    state_list  syv_get_states (mnemonic)
        char  *mnemonic ;
    {
        ... returns list of discrete states defined for mnemonic ...
    }

    void  syv_load (mission)
        char  *mission ;
    {
        ... loads system variable database ...
    }

    void  *syv_lookup (mnemonic)
        char  *mnemonic ;
    {
        ... returns address of system variable ...
    }
                                /* Internal function called by
                                   SYV_GET_STATES and SYV_LOOKUP. */
    static  list_node  *syv_locate (mnemonic)
        char  *mnemonic ;
    {
        ... locates mnemonic's node in system variable list ...
    }

Clients (users) of the system variable package (in either language) cannot reference the shmem_address or sysvar_list variables, have no knowledge of the structure of list nodes, and cannot call the internal procedure, syv_locate(). These restrictions are not just a matter of the design methodology you follow - they are enforced by the compiler. Breaking the C functions out into separate files would require that the static variables be made global, that the structure of list nodes be made common knowledge, and that the syv_locate() function become callable from anywhere in a program. You can, of course, trust to people's good intentions and ignore the potential for malicious access to these "hidden" variables and functions, but what about programs that access them out of necessity or as a shortcut? Any changes to the implementation of system variables could have a major impact on such closely-coupled software.
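
As a minimal sketch of the same single-file package in ANSI C, with function prototypes in place of the older declaration style used above, the elided bodies can be filled in with stand-ins. The fields of list_node, the two hard-coded mnemonics, and the idea of linking a static array into a list are assumptions made purely for illustration; they are not the TPOCC implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Public interface (what a header file would declare) ---- */
int   syv_util_debug = 0;               /* Global debug switch. */
void  syv_load(const char *mission);
void *syv_lookup(const char *mnemonic);

/* ---- Internal to this file: static names have internal linkage ---- */
typedef struct list_node {
    const char *mnemonic;               /* Hypothetical field. */
    double value;                       /* Hypothetical field. */
    struct list_node *next;
} list_node;

static list_node *sysvar_list = NULL;
static list_node *syv_locate(const char *mnemonic);

/* Stand-in for loading a database: links two hard-coded nodes. */
void syv_load(const char *mission)
{
    static list_node nodes[2] = {
        { "BATT_VOLT", 28.0, NULL },    /* Hypothetical mnemonics. */
        { "BUS_TEMP",  21.5, NULL }
    };

    (void) mission;                     /* Unused in this sketch. */
    nodes[0].next = &nodes[1];
    nodes[1].next = NULL;
    sysvar_list = &nodes[0];
}

/* Returns the address of the variable's value, or NULL if not found. */
void *syv_lookup(const char *mnemonic)
{
    list_node *node = syv_locate(mnemonic);

    return (node == NULL) ? NULL : (void *) &node->value;
}

/* Internal function: locates the mnemonic's node in the list. */
static list_node *syv_locate(const char *mnemonic)
{
    list_node *node;

    assert(mnemonic != NULL);
    for (node = sysvar_list; node != NULL; node = node->next) {
        if (strcmp(node->mnemonic, mnemonic) == 0)
            return node;
    }
    return NULL;
}
```

A client file that declares only syv_load() and syv_lookup() can link against this file, whereas a reference to sysvar_list or syv_locate() from another file fails at link time, because both names have internal linkage.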

Particularly effective use of the C package concept is exhibited in the TPOCC event logging utilities written by Meng Lin. An application program's access to the event logging facility is only possible through 2 routines, evt_init() and vsend_event(), found in one file, events_util.c. evt_init() initializes the interface to the event logger. The fact that evt_init() loads 3 database files and establishes a network connection to the event logger is immaterial to the application program. evt_init() could just as well be opening a disk file for the event log and the texts of event messages could be hard-coded in a string array. The details of how vsend_event() looks up the text of an event message, formats the message arguments, and writes the event packet out on the network are also of no consequence to the application program. The internal implementation of the event logging utilities could be completely revamped without affecting any of the applications software; the applications would have to be relinked to the TPOCC library, but no recompiles would be required.
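
The shape of such a two-routine interface can be sketched as follows. This is not the code in events_util.c: the names demo_evt_init(), demo_send_event(), and demo_evt_last(), their signatures, the message table, and the in-memory buffer standing in for the network connection are all hypothetical, chosen only to show that the backend stays hidden behind the public calls.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ---- Public interface: every name here is hypothetical ---- */
int         demo_evt_init(void);
int         demo_send_event(int event_id, ...);
const char *demo_evt_last(void);        /* Test hook: last message sent. */

/* ---- Hidden implementation: may change without touching clients ---- */
static const char *message_text[] = {   /* Could be loaded from files. */
    "Process %s started",
    "Link %s dropped after %d retries"
};
#define NUM_MESSAGES  (sizeof message_text / sizeof message_text[0])

static int  initialized = 0;
static char last_message[128];          /* Could be a network socket. */

/* Prepares the hidden backend; returns 0 on success. */
int demo_evt_init(void)
{
    memset(last_message, 0, sizeof last_message);
    initialized = 1;
    return 0;
}

/* Looks up the event's text, formats its arguments, and "sends" it.
   Returns 0 on success, -1 if uninitialized or the ID is unknown. */
int demo_send_event(int event_id, ...)
{
    va_list args;

    if (!initialized || event_id < 0 || (size_t) event_id >= NUM_MESSAGES)
        return -1;
    assert(message_text[event_id] != NULL);
    va_start(args, event_id);
    vsnprintf(last_message, sizeof last_message,
              message_text[event_id], args);
    va_end(args);
    return 0;
}

const char *demo_evt_last(void)
{
    return last_message;
}
```

Replacing the static table with database files, or the buffer with a socket, would change only this file; callers of the two public routines would need relinking but no source changes.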

The data services library, on the other hand, stores each of its functions in a different file. Although the functions' calling sequences shield client applications from implementation details to a certain extent, the "internal" data structures are all global. The lack of a function for building an I/O selection mask for data server connections forced application programs themselves to scan the library's list of connected servers. This kludge, documented in the Release 1 edition of the TPOCC Implementation Guide and currently used in the TPOCC-based report generators, produces an unhealthy dependence of an application on the internals of the data services library. A new (Release 2) function, ds_mask(), obviates the need for the kludge, but tracking down and eliminating the use of such kludges could be a major maintenance headache in some cases.
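
A mask-building function of this kind might look like the sketch below. The real ds_mask() calling sequence is not given here, so dsx_add_server() and dsx_mask(), the fixed-size descriptor table, and the return convention are assumptions for illustration only. The point is that the list of connected servers stays static while the package itself does the scanning.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

#define MAX_SERVERS  8                  /* Arbitrary limit for the sketch. */

/* ---- Hidden state: clients never see the connection list ---- */
static int server_fd[MAX_SERVERS];
static int num_servers = 0;

/* Hypothetical: records a connected server's socket descriptor.
   Returns 0 on success, -1 if the descriptor is invalid or the
   table is full. */
int dsx_add_server(int fd)
{
    if (fd < 0 || fd >= FD_SETSIZE || num_servers >= MAX_SERVERS)
        return -1;
    server_fd[num_servers++] = fd;
    return 0;
}

/* Hypothetical mask builder: sets the bit for each connected server
   in *mask and returns the highest server descriptor plus one (the
   width argument select() expects), or 0 if no servers are connected.
   The caller clears the mask first and may add its own descriptors. */
int dsx_mask(fd_set *mask)
{
    int i, width = 0;

    assert(mask != NULL);
    for (i = 0; i < num_servers; i++) {
        FD_SET(server_fd[i], mask);
        if (server_fd[i] >= width)
            width = server_fd[i] + 1;
    }
    return width;
}
```

An application would call FD_ZERO() on its own mask, pass it to dsx_mask(), and hand the result to select(), never touching server_fd[] directly.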

I hope the preceding discussion has shown how the C language supports good software design principles. Rather than being encouraged to put separate functions in separate files, C programmers should be encouraged to encapsulate functionality and data in C function "packages".

Alex Measday