GEONius.com

14-Apr-2023

`comx_util` - CORBA-Lite Marshaling Utilities

The CORBA-Lite COMX utilities are used to convert the primitive and some basic constructed data types to and from the Common Data Representation (CDR) encodings defined for the General Inter-ORB Protocol (GIOP) in Chapter 15 of the CORBA specification.

As discussed in the "History and Caveats" section below, the COMX, GIMX, and IIOP utilities were developed as a lightweight, low-level implementation of CORBA TCP/IP communications. As such, these packages have provided a means of (i) quickly writing test clients and servers and of (ii) linking legacy applications to CORBA services - without the complexity and overhead of a full-blown CORBA implementation.

Marshaling Channels

The COMX package is patterned after Sun's eXternal Data Representation (XDR) package. A marshaling channel is created for a memory buffer containing (or that will contain) the CDR-encoded data. Three operations can be performed on a marshaling channel:

MxDECODE - decode CDR-encoded data in the channel's buffer and store it in host CPU variables or structures. In some cases (e.g., strings), memory is dynamically allocated for host CPU values.

MxENCODE - encode host CPU variables or structures in CDR format and store the encoded data in the channel's buffer.

MxERASE - deallocate host CPU values that were dynamically allocated by the MxDECODE operation.

Decoding Input Data

After reading a CORBA message, an application creates a marshaling channel for the buffer containing the received message body; the channel operation defaults to MxDECODE:

    bool  byteOrder ;
    ComxChannel  channel ;
    octet  *body ;
    unsigned  long  size ;
    Version  version ;

    ... read message header and body;
        grab GIOP version, byte order, and message size from header ...

    comxCreate (version, byteOrder, 12, body, size, &channel) ;

    ... decode the data in the message body ...

Subsequent calls to comxDataType() functions will step through the message body, decoding the specified data types.
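To make the role of the byte-order flag concrete, here is a minimal sketch of what decoding a single 4-byte unsigned value involves. It is not COMX code: sketchGetULong() is a hypothetical helper invented for illustration, and it relies only on the GIOP convention that a false (0) flag means big-endian data and a true (1) flag means little-endian data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, not part of COMX: assemble a 32-bit unsigned value
// from four CDR octets.  The littleEndian argument plays the role of the
// byte-order flag taken from the message header.
uint32_t  sketchGetULong (const unsigned char *octets, bool littleEndian)
{
    assert (octets != NULL) ;
    if (littleEndian) {
        // Least significant octet arrives first.
        return (((uint32_t) octets[0])       | ((uint32_t) octets[1] << 8) |
                ((uint32_t) octets[2] << 16) | ((uint32_t) octets[3] << 24)) ;
    } else {
        // Most significant octet arrives first.
        return (((uint32_t) octets[0] << 24) | ((uint32_t) octets[1] << 16) |
                ((uint32_t) octets[2] << 8)  | ((uint32_t) octets[3])) ;
    }
}
```

Assembling the value with shifts, rather than copying the octets and swapping them, gives the same result regardless of the host CPU's own byte order.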

Encoding Output Data

Before writing a CORBA message, an application creates a marshaling channel for a buffer in which to store the CDR-encoded message body. By specifying a NULL buffer, the application can let the COMX functions take care of allocating a buffer and extending its size when necessary; in this case, the channel operation is automatically set to MxENCODE:

    ... set GIOP version explicitly ...

    comxCreate (version, false, 12, NULL, 0, &channel) ;

    ... encode the data into the message body ...

    body = comxBuffer (channel, false) ;
    size = comxSkip (channel, 0) ;

    ... write message header and body ...

The byte-order flag is ignored in MxENCODE mode; the host CPU byte order is assumed. The comxBuffer() call retrieves the address of the encoded data before writing the message. Calling comxSkip() with a skip count of zero leaves the current location unchanged and returns the number of bytes that have been encoded; i.e., the message size.
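The idea of a buffer that is allocated and extended on demand can be sketched as follows. This is only an illustration under assumed names (GrowBuffer and sketchAppend() are hypothetical) and an assumed growth policy (start at 64 bytes, then double); the policy actually used by the COMX functions is not described here and may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable buffer; not the COMX channel structure.
typedef  struct  GrowBuffer {
    unsigned  char  *data ;		// Allocated storage (NULL if none yet).
    size_t  capacity ;			// Number of bytes allocated.
    size_t  used ;			// Number of bytes encoded so far.
}  GrowBuffer ;

// Append bytes, enlarging the buffer first if it is too small.
// Returns zero on success and an ERRNO code on failure.
int  sketchAppend (GrowBuffer *buffer, const void *bytes, size_t count)
{
    assert ((buffer != NULL) && (bytes != NULL)) ;
    if (buffer->used + count > buffer->capacity) {
        size_t  newCapacity = (buffer->capacity == 0) ? 64 : buffer->capacity ;
        while (newCapacity < buffer->used + count)
            newCapacity *= 2 ;		// Assumed policy: double until it fits.
        unsigned  char  *newData = realloc (buffer->data, newCapacity) ;
        if (newData == NULL)  return (ENOMEM) ;	// Old buffer is still valid.
        buffer->data = newData ;
        buffer->capacity = newCapacity ;
    }
    memcpy (buffer->data + buffer->used, bytes, count) ;
    buffer->used += count ;
    return (0) ;
}
```

Because realloc(3) may move the storage, the address of the encoded data should only be fetched after the last value has been encoded, which is why the example above calls comxBuffer() just before writing the message.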

CDR Data Alignment

In both examples above, a "virtual" buffer offset of 12 bytes was specified when a channel was created. This is the size of the IIOP message header that precedes the message body when the message is transferred over a network connection. CDR requires that primitive data values be aligned on "even" boundaries of their size (i.e., at offsets that are multiples of the size); e.g., 2 bytes for a short integer, 4 bytes for a long integer, and 8 bytes for a double. Even boundaries are determined relative to the beginning of the message header; hence, when skipping (MxDECODE) or inserting (MxENCODE) padding to achieve CDR alignment, the COMX utilities must take into account the "virtual" 12-byte header. When decoding or encoding a message body, always specify an offset of 12. The COMX utilities can't simply assume an offset of 12, since encapsulated CDR data uses an offset of 0 - see comxEncapsule() for more information.
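The alignment arithmetic can be written down in a few lines. The function below is a sketch, not the COMX implementation; sketchPadding() and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, not part of COMX: the number of padding octets
// needed before a primitive value of the given size (1, 2, 4, or 8) that
// is about to be decoded or encoded at "location" in the body buffer.
// "offset" is the virtual offset of the buffer within the message:
// 12 for a message body, 0 for an encapsulation.
size_t  sketchPadding (size_t offset, size_t location, size_t size)
{
    assert (size > 0) ;
    size_t  misalignment = (offset + location) % size ;
    return ((misalignment == 0) ? 0 : size - misalignment) ;
}
```

For example, an 8-byte value at the very start of a message body (offset 12, location 0) needs 4 octets of padding, since 12 is not a multiple of 8, whereas the same value at the start of an encapsulation (offset 0) needs none.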

Conversion Functions

The actual conversion functions have a regular calling sequence:

    ComxChannel  channel ;
    int  status ;
    DataType  value ;

    status = comxDataType (channel, &value) ;

The address of the value is always passed in. When decoding data, the CDR-encoded data in the channel's buffer is converted to host CPU format and stored in the value. When encoding data, the value is converted from host CPU format to CDR and stored in the channel's buffer. The status returned by the conversion function is zero if the marshaling operation succeeded and an ERRNO code if the operation failed.

The conversion functions for the CDR primitive data types automatically insert/skip the padding required for data alignment, extend the size of a full buffer when encoding, and advance an internal pointer past the decoded/encoded value in the buffer when the operation is complete. The next call to a conversion function will begin at the new location of the pointer. (Alignment continues to be determined relative to the beginning of the buffer minus the virtual offset.)
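The shape of such a conversion function can be illustrated with a toy stand-in. Everything in this sketch is hypothetical (ToyChannel, ToyOp, toyULong()); it handles only a fixed-size buffer in host byte order and returns EINVAL when the buffer is too small, whereas the real COMX functions also extend a full buffer when encoding and honor the byte-order flag when decoding.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef  enum  ToyOp { ToyDECODE, ToyENCODE }  ToyOp ;

// Hypothetical stand-in for a marshaling channel.
typedef  struct  ToyChannel {
    ToyOp  operation ;			// Decode or encode?
    size_t  offset ;			// Virtual offset (12 for a message body).
    unsigned  char  *buffer ;		// CDR-encoded data.
    size_t  size ;			// Size of buffer in bytes.
    size_t  location ;			// Current position in buffer.
}  ToyChannel ;

// One function serves both directions; the channel operation decides.
int  toyULong (ToyChannel *channel, uint32_t *value)
{
    assert ((channel != NULL) && (value != NULL)) ;
    // Work out the padding needed to reach a 4-byte boundary.
    size_t  misalignment = (channel->offset + channel->location) % 4 ;
    size_t  padding = (misalignment == 0) ? 0 : 4 - misalignment ;
    if (channel->location + padding + 4 > channel->size)  return (EINVAL) ;
    // Insert zeroed padding when encoding; simply skip it when decoding.
    if (channel->operation == ToyENCODE)
        memset (channel->buffer + channel->location, 0, padding) ;
    channel->location += padding ;
    // Copy the value in the direction given by the channel operation.
    if (channel->operation == ToyENCODE)
        memcpy (channel->buffer + channel->location, value, 4) ;
    else
        memcpy (value, channel->buffer + channel->location, 4) ;
    channel->location += 4 ;		// Advance past the value.
    return (0) ;
}
```

Because the caller passes the same arguments in either mode, a function built on top of toyULong() never needs to ask which way the data is flowing.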

Constructed Data Types

Conversion functions for more complex data types (constructed data types in CDR parlance) are easily implemented using the primitive conversion functions (or other complex conversion functions) without regard to the channel operation, padding, etc. For example, a GIOP 1.2 ReplyHeader structure contains an unsigned long request ID, an enumerated reply status, and a service context list. A ReplyHeader marshaling function is very simple:

    typedef  enum  ReplyStatusType {		// Enumeration declaration.
        ...
    }  ReplyStatusType ;

    typedef  struct  ReplyHeader {		// Structure declaration.
        unsigned  long  request_id ;
        ReplyStatusType  reply_status ;
        ServiceContextList  service_context ;
    }  ReplyHeader ;

    int  gimxReplyHeader (			// Conversion function.
        ComxChannel  channel,
        ReplyHeader  *value)
    {
        comxULong (channel, &value->request_id) ;
        comxEnum (channel, &value->reply_status) ;
        gimxServiceContextList (channel, &value->service_context) ;
        return (0) ;
    }

[The actual gimxReplyHeader() function also checks for error returns from the marshaling functions it calls. The field references are enclosed in NULL_OR() macros that check if the value pointer is NULL. By passing in a NULL value pointer, a value can be decoded and discarded; i.e., not returned to the caller. I originally thought this capability might be useful, but I've never had occasion to use it.]

Marshaling functions for unions must be explicitly coded, although doing so is trivial. I had hoped to write a comxUnion() function similar to XDR's xdr_union(), but the discriminant of a CORBA union can be of any data type and a general-purpose marshaling function would probably require the caller to jump through hoops (e.g., supplying marshaling and comparison functions for the discriminant). If only discriminants were limited to enumerated types ...
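A hand-coded union marshaling function typically converts the discriminant and then switches on it to convert the active member. The sketch below shows that pattern with hypothetical stand-ins (MiniChannel, miniLong(), miniDouble(), and the Reading union are all invented for illustration); with COMX, the same structure would be built from the real primitive conversion functions. Returning EINVAL for an unrecognized discriminant is a choice made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for a marshaling channel (virtual offset of 0).
typedef  struct  MiniChannel {
    int  encoding ;			// Non-zero to encode, zero to decode.
    unsigned  char  *buffer ;
    size_t  size, location ;
}  MiniChannel ;

static  int  miniBytes (MiniChannel *channel, void *value, size_t length)
{
    // Align on the next multiple of the value's length.
    size_t  start = (channel->location + length - 1) / length * length ;
    if (start + length > channel->size)  return (EINVAL) ;
    if (channel->encoding)
        memcpy (channel->buffer + start, value, length) ;
    else
        memcpy (value, channel->buffer + start, length) ;
    channel->location = start + length ;
    return (0) ;
}

int  miniLong (MiniChannel *channel, int32_t *value)
{  return (miniBytes (channel, value, sizeof *value)) ;  }

int  miniDouble (MiniChannel *channel, double *value)
{  return (miniBytes (channel, value, sizeof *value)) ;  }

// Host representation of a hypothetical IDL union:
//     union Reading switch (long) { case 1: long count ; case 2: double level ; } ;
typedef  struct  Reading {
    int32_t  which ;			// Discriminant.
    union { int32_t  count ;  double  level ; }  data ;
}  Reading ;

int  miniReading (MiniChannel *channel, Reading *value)
{
    assert ((channel != NULL) && (value != NULL)) ;
    // The discriminant comes first, in both directions.
    int  status = miniLong (channel, &value->which) ;
    if (status != 0)  return (status) ;
    // Then the one member selected by the discriminant.
    switch (value->which) {
    case 1:   return (miniLong (channel, &value->data.count)) ;
    case 2:   return (miniDouble (channel, &value->data.level)) ;
    default:  return (EINVAL) ;		// Sketch only: reject unknown cases.
    }
}
```

When decoding, the discriminant has already been stored in the host structure by the time the switch is reached, so the same code path works for both operations.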

Special CDR Data Types

The COMX package includes marshaling functions for some common constructed data types:

String is a NUL-terminated (char *) character string. When decoding a string, the comxString() function dynamically allocates space for the string using malloc(3). A subsequent call to free(3) or comxErase() is necessary to deallocate the string.

WString is a NUL-terminated (wchar_t *) wide-character string. When decoding a wide-character string, the comxWString() function dynamically allocates space for the string using malloc(3). A subsequent call to free(3) or comxErase() is necessary to deallocate the string.

OctetSeq is a sequence of octets; i.e., a byte array. When decoding an octet sequence, the comxOctetSeq() function dynamically allocates the array using malloc(3). A subsequent call to comxErase() is necessary to deallocate the sequence.

Version is a GIOP version structure, containing a major version number and a minor version number.

Three other data types require special-case marshaling functions:

Array is an array of some data type. Unlike the sequence below, a CDR array does not include information about the number of elements in the array; the applications encoding and decoding the array are assumed to know the number of elements a priori.

Encapsulation is encoded data encapsulated in an octet sequence. The comxEncapsule() function is passed a list of conversion functions and addresses of values. In MxDECODE mode, the values are decoded from an octet sequence (previously decoded from a message body). In MxENCODE mode, the values are encoded into an octet sequence (to be encoded into a message body). Data alignment is relative to the beginning of the octet sequence.

Sequence is an array of some data type. A sequence is represented in host CPU format as a structure with two fields: the number of elements in the array and a pointer to the array of elements. When decoding a sequence, the array is dynamically allocated; a subsequent call to comxErase() is necessary to deallocate the sequence. The comxSequence() function is passed the conversion function for the given data type.

Erasing Data Values

When decoding strings, sequences, etc., the COMX functions dynamically allocate space for the multi-element values using malloc(3). Once used, the values can be deallocated the correct way or the easy way. The correct way is to change the marshaling channel operation from MxDECODE to MxERASE and re-call the conversion function(s):

    OctetSeq  object ;
    ... create marshaling channel ...
    comxOctetSeq (channel, &object) ;		// Decode octet sequence.
    ... use object value ...
    comxSetOp (channel, MxERASE) ;
    comxOctetSeq (channel, &object) ;		// Erase octet sequence.

The easy way is to ignore the original marshaling channel and call comxErase():

    comxErase ((ComxFunc) comxOctetSeq, &object) ;

The second method may silently have problems with values that have GIOP version-specific memory allocation; I haven't encountered this problem yet.
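The decode-then-erase life cycle can be pictured with a small stand-alone sketch. The names here (ToyOctetSeq, its fields, and toyOctetSeq()) are hypothetical and the real OctetSeq layout may differ; the sketch also assumes that erasing resets the host value to an empty state, which is a convenient convention but not something the text above guarantees for COMX.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef  enum  ToyMode { ToyDecode, ToyErase }  ToyMode ;

// Hypothetical host representation of an octet sequence.
typedef  struct  ToyOctetSeq {
    size_t  length ;			// Number of octets.
    unsigned  char  *elements ;		// Dynamically allocated array.
}  ToyOctetSeq ;

// The same function decodes or erases, depending on the mode.  The
// source and count arguments are only examined when decoding.
int  toyOctetSeq (ToyMode mode, const unsigned char *source,
                  size_t count, ToyOctetSeq *value)
{
    assert (value != NULL) ;
    if (mode == ToyErase) {
        free (value->elements) ;	// Release what the decode allocated.
        value->elements = NULL ;
        value->length = 0 ;
        return (0) ;
    }
    // Decoding: allocate the array (at least 1 byte) and copy the octets.
    value->elements = malloc ((count > 0) ? count : 1) ;
    if (value->elements == NULL)  return (ENOMEM) ;
    if (count > 0)  memcpy (value->elements, source, count) ;
    value->length = count ;
    return (0) ;
}
```

For a constructed type, erasing by re-calling the type's own conversion function means every nested allocation is released by the same code that created it, with no separate cleanup routine to keep in sync.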

History and Caveats

The COMX, GIMX, and IIOP packages were written out of desperation one weekend and refined with use over subsequent weeks. I had to interface a legacy application to some CORBA servers. TAO was too heavy-weight and ORBit's wide-character string functions wouldn't compile. MICO worked, but the documentation was out-of-date. Some MICO-based test clients (correctly) threw exceptions when trying to talk to our TAO-based servers. I put this code together quickly to intercept, dump and analyze what our TAO-based clients were sending to the servers.

The code has been compiled and tested under Linux (RedHat 6.2/7.2), under various versions of Solaris, and under Windows 98/NT4.0 (in a slightly modified C++ version).

Some known shortcomings:

Host CPUs are assumed to be little-endian or big-endian. PDP-11s were "mixed-endian" and I understand that some ARM or MIPS processors store 64-bit IEEE floats with the two 32-bit words in one byte order and the bytes within each word in the other.
The standard C integer types are used as the host representation of the CDR integer types. The host CPU types are assumed to be at least as wide as the corresponding CDR types (2, 4, and 8 bytes). If the host CPU types are wider, decoded values are sign- or zero-extended as necessary and encoded values truncate the most significant bytes (of the host CPU representation).
The fixed-point decimal type is not implemented.
The host CPU is assumed to represent floating-point numbers in IEEE 754 format and in the same byte order as integers. It is assumed that the host CPU supports 32-bit floats and 64-bit doubles. The host CPU's long doubles are assumed to be full or abbreviated versions of CDR's 128-bit long doubles with the same or fewer bits in the mantissa; this may not be a valid assumption, so test before using!
The host CPU's wchar_t wide-character type is assumed to hold UNICODE characters, an assumption that is not necessarily true and that is at odds with the C standard and the UNICODE standard.
Wide characters are partially supported. Our existing CORBA interfaces use wide strings, but not individual wide characters, so this area needs more work and testing. For GIOP versions 1.0 and 1.1, the Transmission Code Set (TCS-W) is assumed to be UTF-16: 16-bit characters transmitted as unsigned shorts. For GIOP version 1.2 and later, the TCS-W is again assumed to be UTF-16, but each character is encoded in 3 octets (a length octet followed by the 2 octets of the character). The byte-order marker (BOM) allowed in GIOP 1.2 is not currently recognized.
Wide strings are partially supported, enough for my needs. For all GIOP versions, the TCS-W is assumed to be UTF-16: 16-bit characters transmitted as unsigned shorts. Surrogate pairs are supported in accordance with RFC 2781, although they obviously will not work on a system whose wide characters are themselves 16 bits wide (e.g., Windows) or less.
... more if I think of anything else ...
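The surrogate-pair arithmetic referred to in the wide-string item above follows RFC 2781 and can be sketched as shown below. The function names are hypothetical and this is not the COMX code; the sketch only illustrates why a 16-bit wchar_t cannot hold the result, since every combined code point is U+10000 or greater.

```c
#include <assert.h>
#include <stdint.h>

// Combine a UTF-16 high/low surrogate pair into one code point.
uint32_t  sketchFromSurrogates (uint16_t high, uint16_t low)
{
    assert ((high >= 0xD800) && (high <= 0xDBFF)) ;
    assert ((low >= 0xDC00) && (low <= 0xDFFF)) ;
    // Each surrogate carries 10 bits of the value minus 0x10000.
    return (0x10000u + (((uint32_t) (high - 0xD800)) << 10)
                     + (uint32_t) (low - 0xDC00)) ;
}

// Split a code point above U+FFFF into a high/low surrogate pair.
void  sketchToSurrogates (uint32_t codePoint, uint16_t *high, uint16_t *low)
{
    assert ((codePoint >= 0x10000u) && (codePoint <= 0x10FFFFu)) ;
    assert ((high != 0) && (low != 0)) ;
    codePoint -= 0x10000u ;		// Leaves a 20-bit value.
    *high = (uint16_t) (0xD800u + (codePoint >> 10)) ;
    *low = (uint16_t) (0xDC00u + (codePoint & 0x3FFu)) ;
}
```

For instance, the code point U+12345 splits into the pair 0xD808, 0xDF45, and combining that pair gives back 0x12345.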

On a lighter note, I was a better speller in third grade than I am now. Spelling "marshaling" and "marshaled" with one "l" (as the CORBA specification and, more importantly, Henning and Vinoski do) bothers me; I prefer two "l"s and the dictionary lists both forms. However, I came across a rule that says you double the final consonant in a multi-syllable base word if the stress is on the last syllable. MAR-shal: I humbly admit defeat by the forces marshaled against me and I will henceforth use one "l"! (Of course, Google searches for "marshaling" and "marshalling" [back in the early 2000s] returned 47,400 and 101,000 hits, respectively ...)

NOTE: As a big fan of Victorian novels at the time I wrote the codswallop above (and still a big fan), I suppose I must have gotten so used to British English spelling that I failed to recognize American English spelling! Ahh, the good old days — when one wrote of "to-day", "to-morrow", and "e-mail"; when "just deserts" were just that; when "careen" was not a synonym for "career"; and when you could "discuss" a nice hot cup of cocoa and a digestive biscuit ...

Public Procedures

comxBuffer() - gets a marshaling channel's buffer.

comxCreate() - creates a marshaling channel.

comxDestroy() - destroys a marshaling channel.

comxErase() - erases a decoded data structure.

comxExtend() - extends a marshaling channel's buffer.

comxGetOp() - gets the current marshaling mode.

comxGetVersion() - gets the GIOP version number.

comxReset() - resets the current location to the beginning of the buffer.

comxSetOp() - configures a channel for decoding, encoding, or erasing.

comxSkip() - advances the current location in the channel's buffer.

comxToHost() - converts numbers to host-byte order.

comxBoolean() - decode/encode CDR primitive types.

comxChar()

comxDouble()

comxEnum()

comxFloat()

comxLong()

comxLongDouble()

comxLongLong()

comxOctet()

comxShort()

comxULong()

comxULongLong()

comxUShort()

comxWChar()

comxArray() - decodes/encodes/erases a CDR array.

comxEncapsule() - decodes/encodes/erases a CDR encapsulation.

comxSequence() - decodes/encodes/erases a GIOP sequence.

comxString() - decodes/encodes/erases a GIOP string.

comxWString() - decodes/encodes/erases a GIOP wstring.

comxOctetSeq() - decodes/encodes/erases a GIOP octet sequence.

comxVersion() - decodes/encodes/erases a GIOP version number.

comxBooleanSeq() - decode/encode/erase sequences of CDR primitive types.

comxCharSeq()

comxDoubleSeq()

comxEnumSeq()

comxFloatSeq()

comxLongSeq()

comxLongLongSeq()

comxShortSeq()

comxStringSeq()

comxULongSeq()

comxULongLongSeq()

comxUShortSeq()

comxWCharSeq()

comxWStringSeq()

Source Files

comx_util.c

comx_util.h

Alex Measday / E-mail
