comx_util - CORBA-Lite Marshaling Utilities
The CORBA-Lite COMX utilities are used to convert the primitive and some basic constructed data types to and from the Common Data Representation (CDR) encodings defined for the General Inter-ORB Protocol (GIOP) in Chapter 15 of the CORBA specification.
As discussed in the "History and Caveats" section below, the COMX, GIMX, and IIOP utilities were developed as a lightweight, low-level implementation of CORBA TCP/IP communications. As such, these packages have provided a means of (i) quickly writing test clients and servers and of (ii) linking legacy applications to CORBA services - without the complexity and overhead of a full-blown CORBA implementation.
The COMX package is patterned after Sun's eXternal Data Representation (XDR) package. A marshaling channel is created for a memory buffer containing (or that will contain) the CDR-encoded data. Three operations can be performed on a marshaling channel:
MxDECODE - decode CDR-encoded data in the channel's buffer and store it in host CPU variables or structures. In some cases (e.g., strings), memory is dynamically allocated for host CPU values.
MxENCODE - encode host CPU variables or structures in CDR format and store the encoded data in the channel's buffer.
MxERASE - deallocate host CPU values that were dynamically allocated by the MxDECODE operation.
After reading a CORBA message, an application creates a marshaling channel for the buffer containing the received message body; the channel operation defaults to MxDECODE:
    bool  byteOrder ;
    ComxChannel  channel ;
    octet  *body ;
    unsigned  long  size ;
    Version  version ;
    ...
    ... read message header and body; grab GIOP version,
        byte order, and message size from header ...
    comxCreate (version, byteOrder, 12, body, size, &channel) ;
    ... decode the data in the message body ...
Subsequent calls to comxDataType() functions will step through the message body, decoding the specified data types.
Before writing a CORBA message, an application creates a marshaling channel for a buffer in which to store the CDR-encoded message body. By specifying a NULL buffer, the application can let the COMX functions take care of allocating a buffer and extending its size when necessary; in this case, the channel operation is automatically set to MxENCODE:
    ... set GIOP version explicitly ...
    comxCreate (version, false, 12, NULL, 0, &channel) ;
    ... encode the data into the message body ...
    body = comxBuffer (channel, false) ;
    size = comxSkip (channel, 0) ;
    ... write message header and body ...
The byte-order flag is ignored in MxENCODE mode; the host CPU byte order is assumed. The comxBuffer() call retrieves the address of the encoded data before writing the message. The comxSkip(zero) call retrieves the number of bytes that have been encoded; i.e., the message size.
In both examples above, a "virtual" buffer offset of 12 bytes was specified when a channel was created. This is the size of the IIOP message header that precedes the message body when the message is transferred over a network connection. CDR requires that primitive data values be aligned on "even" boundaries of their size; e.g., 2 bytes for a short integer, 4 bytes for a long integer, and 8 bytes for a double. Even boundaries are determined relative to the beginning of the message header; hence, when skipping (MxDECODE) or inserting (MxENCODE) padding to achieve CDR alignment, the COMX utilities must take into account the "virtual" 12-byte header. When decoding or encoding a message body, always specify an offset of 12. The COMX utilities can't simply assume an offset of 12, since encapsulated CDR data uses an offset of 0 - see comxEncapsule() for more information.
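To make the alignment rule concrete, here is a minimal sketch of the padding calculation. The function name cdrPadding() and its parameters are hypothetical and not part of the COMX API; the sketch only assumes that alignment is measured from the start of the (virtual) header, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of padding bytes needed before a primitive of the given size
   located bodyOffset bytes into the message body, when the body is
   preceded by a (virtual) header of virtualOffset bytes. */
size_t cdrPadding(size_t virtualOffset, size_t bodyOffset, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    size_t absolute = virtualOffset + bodyOffset;   /* Offset from header start. */
    return (size - (absolute % size)) % size;
}
```

With a virtual offset of 12, an 8-byte value at the very start of the body needs 4 bytes of padding, since 12 is not a multiple of 8; with an offset of 0, as for encapsulated data, the same value needs none.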
The actual conversion functions have a regular calling sequence:
    ComxChannel  channel ;
    int  status ;
    DataType  value ;
    status = comxDataType (channel, &value) ;
The address of the value is always passed in. When decoding data, the CDR-encoded data in the channel's buffer is converted to host CPU format and stored in the value. When encoding data, the value is converted from host CPU format to CDR and stored in the channel's buffer. The status returned by the conversion function is zero if the marshaling operation succeeded and an ERRNO code if the operation failed.
The conversion functions for the CDR primitive data types automatically insert/skip the padding required for data alignment, extend the size of a full buffer when encoding, and advance an internal pointer past the decoded/encoded value in the buffer when the operation is complete. The next call to a conversion function will begin at the new location of the pointer. (Alignment continues to be determined relative to the beginning of the buffer minus the virtual offset.)
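The behavior described above can be sketched with a stripped-down channel. Everything in this example (MiniChannel, MiniOp, miniULong) is hypothetical and is not the COMX implementation: the sketch assumes a fixed-size, caller-supplied buffer (so there is no automatic extension), handles only a 4-byte unsigned long, and always encodes in host byte order.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef enum { MINI_DECODE, MINI_ENCODE } MiniOp;

typedef struct {
    MiniOp op;            /* Current operation. */
    int swap;             /* Decode only: nonzero if bytes must be reversed. */
    size_t offset;        /* Virtual header size (12 for a message body). */
    unsigned char *buf;   /* Caller-supplied buffer. */
    size_t size;          /* Buffer capacity in bytes. */
    size_t pos;           /* Current location within the buffer. */
} MiniChannel;

/* Decode or encode one 4-byte unsigned long; returns 0 or an errno code. */
int miniULong(MiniChannel *ch, uint32_t *value)
{
    assert(ch != NULL && value != NULL);
    /* Padding is computed relative to the start of the virtual header. */
    size_t pad = (4 - (ch->offset + ch->pos) % 4) % 4;
    if (ch->pos + pad + 4 > ch->size) return EINVAL;   /* Out of room. */
    unsigned char *p = ch->buf + ch->pos + pad;
    if (ch->op == MINI_ENCODE) {
        memset(ch->buf + ch->pos, 0, pad);   /* Insert the padding. */
        memcpy(p, value, 4);                 /* Host byte order. */
    } else {
        unsigned char b[4], t;
        memcpy(b, p, 4);                     /* Padding is simply skipped. */
        if (ch->swap) {                      /* Sender used the other order. */
            t = b[0]; b[0] = b[3]; b[3] = t;
            t = b[1]; b[1] = b[2]; b[2] = t;
        }
        memcpy(value, b, 4);
    }
    ch->pos += pad + 4;   /* The next call starts after this value. */
    return 0;
}
```

The same function body serves both directions, which is why the caller always passes the address of the value.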
Conversion functions for more complex data types (constructed data types in CDR parlance) are easily implemented using the primitive conversion functions (or other complex conversion functions) without regard to the channel operation, padding, etc. For example, a GIOP 1.2 ReplyHeader structure contains an unsigned long request ID, an enumerated reply status, and a service context list. A ReplyHeader marshaling function is very simple:
    typedef  enum  ReplyStatusType {        // Enumeration declaration.
        ...
    }  ReplyStatusType ;

    typedef  struct  ReplyHeader {          // Structure declaration.
        unsigned  long  request_id ;
        ReplyStatusType  reply_status ;
        ServiceContextList  service_context ;
    }  ReplyHeader ;

    int  gimxReplyHeader (                  // Conversion function.
        ComxChannel  channel,
        ReplyHeader  *value)
    {
        comxULong (channel, &value->request_id) ;
        comxEnum (channel, &value->reply_status) ;
        gimxServiceContextList (channel, &value->service_context) ;
        return (0) ;
    }
[The actual gimxReplyHeader() function also checks for error returns from the marshaling functions it calls. The field references are enclosed in NULL_OR() macros that check if the value pointer is NULL. By passing in a NULL value pointer, a value can be decoded and discarded; i.e., not returned to the caller. I originally thought this capability might be useful, but I've never had occasion to use it.]
Marshaling functions for unions must be explicitly coded, although doing so is trivial. I had hoped to write a comxUnion() function similar to XDR's xdr_union(), but the discriminant of a CORBA union can be of any data type and a general-purpose marshaling function would probably require the caller to jump through hoops (e.g., supplying marshaling and comparison functions for the discriminant). If only discriminants were limited to enumerated types ...
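As an illustration of how little code an explicitly coded union needs, here is a minimal sketch for a union with an enumerated discriminant. The Chan type, the prim() helper, and the Shape union are all hypothetical stand-ins invented for this example (they are not COMX types or functions); the sketch assumes host byte order, a fixed 64-byte buffer, and a virtual offset of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int encode;              /* Nonzero to encode, zero to decode. */
    unsigned char buf[64];   /* Fixed-size buffer, host byte order only. */
    size_t pos;              /* Current location in the buffer. */
} Chan;

/* Hypothetical primitive: align to n bytes, then copy n bytes in or out. */
static int prim(Chan *ch, void *value, size_t n)
{
    size_t pad = (n - ch->pos % n) % n;
    if (ch->pos + pad + n > sizeof ch->buf) return EINVAL;
    if (ch->encode) {
        memset(ch->buf + ch->pos, 0, pad);
        memcpy(ch->buf + ch->pos + pad, value, n);
    } else {
        memcpy(value, ch->buf + ch->pos + pad, n);
    }
    ch->pos += pad + n;
    return 0;
}

typedef enum { SHAPE_CIRCLE, SHAPE_RECT } ShapeKind;

typedef struct {
    ShapeKind which;                     /* Discriminant. */
    union {
        double radius;                   /* SHAPE_CIRCLE */
        struct { int32_t w, h; } rect;   /* SHAPE_RECT */
    } data;
} Shape;

/* Marshal the discriminant first, then only the member it selects. */
int marshalShape(Chan *ch, Shape *value)
{
    assert(ch != NULL && value != NULL);
    /* CDR transmits enumerations as 4-byte unsigned longs. */
    uint32_t disc = ch->encode ? (uint32_t) value->which : 0;
    int status = prim(ch, &disc, 4);
    if (status) return status;
    value->which = (ShapeKind) disc;
    switch (value->which) {
    case SHAPE_CIRCLE:
        return prim(ch, &value->data.radius, 8);
    case SHAPE_RECT:
        status = prim(ch, &value->data.rect.w, 4);
        return status ? status : prim(ch, &value->data.rect.h, 4);
    default:
        return EINVAL;   /* Unknown discriminant. */
    }
}
```

The switch on the discriminant is the part a general-purpose function would have to parameterize, which is where the hoops would come in.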
String is a NUL-terminated (char *) character string. When decoding a string, the comxString() function dynamically allocates space for the string using malloc(3). A subsequent call to free(3) or comxErase() is necessary to deallocate the string.
WString is a NUL-terminated (wchar_t *) wide-character string. When decoding a wide-character string, the comxWString() function dynamically allocates space for the string using malloc(3). A subsequent call to free(3) or comxErase() is necessary to deallocate the string.
OctetSeq is a sequence of octets; i.e., a byte array. When decoding an octet sequence, the comxOctetSeq() function dynamically allocates the array using malloc(3). A subsequent call to comxErase() is necessary to deallocate the sequence.
Version is a GIOP version structure, containing a major version number and a minor version number.
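The allocate-on-decode behavior for strings can be sketched as follows. This is not comxString(); decodeString() and its parameters are hypothetical, and the sketch assumes host byte order, a virtual offset of 0, and the CDR string layout of a 4-byte length (which counts the terminating NUL) followed by the characters.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decode a CDR string starting at *pos.  On success, *out is a malloc(3)ed
   copy that the caller must free(3), and *pos is advanced past the string. */
int decodeString(const unsigned char *buf, size_t size, size_t *pos, char **out)
{
    assert(buf != NULL && pos != NULL && out != NULL);
    size_t at = *pos + (4 - *pos % 4) % 4;          /* Align the length field. */
    uint32_t length;
    if (at > size || size - at < 4) return EINVAL;
    memcpy(&length, buf + at, 4);
    at += 4;
    /* The length includes the NUL, so this sketch rejects a length of 0. */
    if (length == 0 || length > size - at) return EINVAL;
    if (buf[at + length - 1] != '\0') return EINVAL;
    char *text = malloc(length);
    if (text == NULL) return ENOMEM;
    memcpy(text, buf + at, length);
    *out = text;
    *pos = at + length;
    return 0;
}
```

The caller owns the returned string, which is why a matching free(3) (or an erase operation) is needed once the value has been used.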
Three other data types require special-case marshaling functions:
Array is an array of some data type. Unlike the sequence below, a CDR array does not include information about the number of elements in the array; the applications encoding and decoding the array are assumed to know the number of elements a priori.
Encapsulation is encoded data encapsulated in an octet sequence. The comxEncapsule() function is passed a list of conversion functions and addresses of values. In MxDECODE mode, the values are decoded from an octet sequence (previously decoded from a message body). In MxENCODE mode, the values are encoded into an octet sequence (to be encoded into a message body). Data alignment is relative to the beginning of the octet sequence.
Sequence is an array of some data type. A sequence is represented in host CPU format as a structure with two fields: the number of elements in the array and a pointer to the array of elements. When decoding a sequence, the array is dynamically allocated; a subsequent call to comxErase() is necessary to deallocate the sequence. The comxSequence() function is passed the conversion function for the given data type.
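The two-field host representation of a sequence might look like the following sketch, shown for a sequence of unsigned longs. ULongSeq, decodeULongSeq(), and eraseULongSeq() are hypothetical names chosen for this example and are not the COMX declarations; the sketch assumes host byte order and a virtual offset of 0, and it copies the elements directly rather than calling a per-element conversion function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint32_t count;       /* Number of elements. */
    uint32_t *elements;   /* Dynamically allocated array, or NULL if empty. */
} ULongSeq;

/* Decode a 4-byte element count followed by that many 4-byte elements. */
int decodeULongSeq(const unsigned char *buf, size_t size, size_t *pos,
                   ULongSeq *seq)
{
    assert(buf != NULL && pos != NULL && seq != NULL);
    size_t at = *pos + (4 - *pos % 4) % 4;          /* Align the count. */
    uint32_t count;
    if (at > size || size - at < 4) return EINVAL;
    memcpy(&count, buf + at, 4);
    at += 4;
    if (count > (size - at) / 4) return EINVAL;     /* Not enough data. */
    uint32_t *elements = NULL;
    if (count > 0) {
        elements = malloc(count * sizeof *elements);
        if (elements == NULL) return ENOMEM;
        memcpy(elements, buf + at, (size_t) count * 4);
    }
    seq->count = count;
    seq->elements = elements;
    *pos = at + (size_t) count * 4;
    return 0;
}

/* Release the array allocated by decodeULongSeq(). */
void eraseULongSeq(ULongSeq *seq)
{
    assert(seq != NULL);
    free(seq->elements);
    seq->elements = NULL;
    seq->count = 0;
}
```

Checking the count against the remaining buffer size before allocating keeps a corrupt length from triggering a huge allocation.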
When decoding strings, sequences, etc., the COMX functions dynamically allocate space for the multi-element values using malloc(3). Once used, the values can be deallocated the correct way or the easy way. The correct way is to change the marshaling channel operation from MxDECODE to MxERASE and re-call the conversion function(s):
    OctetSeq  object ;
    ... create marshaling channel ...
    comxOctetSeq (channel, &object) ;       // Decode octet sequence.
    ... use object value ...
    comxSetOp (channel, MxERASE) ;
    comxOctetSeq (channel, &object) ;       // Erase octet sequence.
The easy way is to ignore the original marshaling channel and call comxErase():
    comxErase ((ComxFunc) comxOctetSeq, &object) ;
The second method may silently have problems with values that have GIOP version-specific memory allocation; I haven't encountered this problem yet.
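One plausible way to picture the "easy way" is an erase helper that builds a throwaway channel in erase mode and hands it to the conversion function. This is only a guess at the general shape, not the actual comxErase() implementation; Chan, Op, MarshalFunc, Bytes, marshalBytes(), and eraseValue() are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { OP_DECODE, OP_ENCODE, OP_ERASE } Op;
typedef struct { Op op; } Chan;               /* Buffer fields omitted. */
typedef int (*MarshalFunc)(Chan *ch, void *value);

typedef struct {
    size_t length;             /* Number of octets. */
    unsigned char *elements;   /* Dynamically allocated octets. */
} Bytes;

/* Hypothetical conversion function; only the erase branch is filled in. */
int marshalBytes(Chan *ch, void *value)
{
    Bytes *bytes = value;
    assert(ch != NULL && bytes != NULL);
    if (ch->op == OP_ERASE) {
        free(bytes->elements);
        bytes->elements = NULL;
        bytes->length = 0;
        return 0;
    }
    return -1;   /* Decoding and encoding are out of scope for this sketch. */
}

/* Erase a value through a temporary channel set to the erase operation. */
int eraseValue(MarshalFunc func, void *value)
{
    Chan temp = { OP_ERASE };
    return func(&temp, value);
}
```

In a design like this, the temporary channel knows nothing about the GIOP version of the channel that decoded the value, which would explain why version-specific allocations are the risky case.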
History and Caveats
The COMX, GIMX, and IIOP packages were written out of desperation one weekend and refined with use over subsequent weeks. I had to interface a legacy application to some CORBA servers. TAO was too heavy-weight and ORBit's wide-character string functions wouldn't compile. MICO worked, but the documentation was out-of-date. Some MICO-based test clients (correctly) threw exceptions when trying to talk to our TAO-based servers. I put this code together quickly to intercept, dump and analyze what our TAO-based clients were sending to the servers.
The code has been compiled and tested under Linux (RedHat 6.2/7.2), under various versions of Solaris, and under Windows 98/NT4.0 (in a slightly modified C++ version).
Some known shortcomings:
Host CPUs are assumed to be little-endian or big-endian. PDP-11s were "mixed-endian" and I understand that the ARM or MIPS processor stores 64-bit IEEE floats with 32-bit words one-endian and the bytes within the words other-endian.
The standard C integer types are used as the host representation of the CDR integer types. The host CPU types are assumed to be at least as wide as the corresponding CDR types (2, 4, and 8 bytes). If the host CPU types are wider, decoded values are sign- or zero-extended as necessary and encoded values truncate the most significant bytes (of the host CPU representation).
The fixed-point decimal type is not implemented.
The host CPU is assumed to represent floating-point numbers in IEEE 754 format and in the same byte order as integers. It is assumed that the host CPU supports 32-bit floats and 64-bit doubles.
The host CPU's long doubles are assumed to be full or abbreviated versions of CDR's 128-bit long doubles with the same or fewer bits in the mantissa; this may not be a valid assumption, so test before using!
The host CPU's wchar_t wide-character type is assumed to hold UNICODE characters, an assumption that is not necessarily true and that is at odds with the C standard and the UNICODE standard.
Wide characters are partially supported. Our existing CORBA interfaces use wide strings, but not individual wide characters, so this area needs more work and testing. For GIOP versions 1.0 and 1.1, the Transmission Code Set (TCS-W) is assumed to be UTF-16: 16-bit characters transmitted as unsigned shorts. For GIOP version 1.2 and later, the TCS-W is again assumed to be UTF-16, but each character is encoded in 3 octets. The byte-order marker (BOM) allowed in GIOP 1.2 is not currently recognized.
Wide strings are partially supported, enough for my needs. For all GIOP versions, the TCS-W is assumed to be UTF-16: 16-bit characters transmitted as unsigned shorts. Surrogate pairs are supported in accordance with RFC 2781, although they obviously will not work on a system whose wide characters are themselves 16 bits wide (e.g., Windows) or less.
... more if I think of anything else ...
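For readers unfamiliar with the surrogate pairs mentioned in the wide-string caveat above, here is a small sketch of the RFC 2781 arithmetic. The function names toSurrogates() and fromSurrogates() are hypothetical and are not taken from the COMX source.

```c
#include <assert.h>
#include <stdint.h>

/* Split a code point above U+FFFF into a UTF-16 surrogate pair (RFC 2781). */
void toSurrogates(uint32_t codePoint, uint16_t *high, uint16_t *low)
{
    assert(high != 0 && low != 0);
    assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    uint32_t v = codePoint - 0x10000;            /* 20 significant bits. */
    *high = (uint16_t) (0xD800 | (v >> 10));     /* Upper 10 bits. */
    *low = (uint16_t) (0xDC00 | (v & 0x3FF));    /* Lower 10 bits. */
}

/* Combine a surrogate pair back into a single code point. */
uint32_t fromSurrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + (((uint32_t) (high - 0xD800) << 10)
                      | (uint32_t) (low - 0xDC00));
}
```

The combined code point always needs more than 16 bits, so it cannot be stored in a single 16-bit wide character; that is the limitation noted for systems such as Windows.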
On a lighter note, I was a better speller in third grade than I am now. Spelling "marshaling" and "marshaled" with one "l" (as the CORBA specification and, more importantly, Henning and Vinoski do) bothers me; I prefer two "l"s and the dictionary lists both forms. However, I came across a rule that says you double the final consonant in a multi-syllable base word if the stress is on the last syllable. MAR-shal: I humbly admit defeat by the forces marshaled against me and I will henceforth use one "l"! (Of course, Google searches for "marshaling" and "marshalling" [back in the early 2000s] returned 47,400 and 101,000 hits, respectively ...)
NOTE: As a big fan of Victorian novels at the time I wrote the codswallop above (and still a big fan), I suppose I must have gotten so used to British English spelling that I failed to recognize American English spelling! Ahh, the good old days — when one wrote of "to-day", "to-morrow", and "e-mail"; when "just deserts" were just that; when "careen" was not a synonym for "career"; and when you could "discuss" a nice hot cup of cocoa and a digestive biscuit ...
comxBuffer() - gets a marshaling channel's buffer.
comxCreate() - creates a marshaling channel.
comxDestroy() - destroys a marshaling channel.
comxErase() - erases a decoded data structure.
comxExtend() - extends a marshaling channel's buffer.
comxGetOp() - gets the current marshaling mode.
comxGetVersion() - gets the GIOP version number.
comxReset() - resets the current location to the beginning of the buffer.
comxSetOp() - configures a channel for decoding, encoding, or erasing.
comxSkip() - advances the current location in the channel's buffer.
comxToHost() - converts numbers to host-byte order.
comxBoolean(), comxChar(), comxDouble(), comxEnum(), comxFloat(), comxLong(), comxLongDouble(), comxLongLong(), comxOctet(), comxShort(), comxULong(), comxULongLong(), comxUShort(), comxWChar() - decode/encode CDR primitive types.
comxArray() - decodes/encodes/erases a CDR array.
comxEncapsule() - decodes/encodes/erases a CDR encapsulation.
comxSequence() - decodes/encodes/erases a GIOP sequence.
comxString() - decodes/encodes/erases a GIOP string.
comxWString() - decodes/encodes/erases a GIOP wstring.
comxOctetSeq() - decodes/encodes/erases a GIOP octet sequence.
comxVersion() - decodes/encodes/erases a GIOP version number.
comxBooleanSeq(), comxCharSeq(), comxDoubleSeq(), comxEnumSeq(), comxFloatSeq(), comxLongSeq(), comxLongLongSeq(), comxShortSeq(), comxStringSeq(), comxULongSeq(), comxULongLongSeq(), comxUShortSeq(), comxWCharSeq(), comxWStringSeq() - decode/encode/erase sequences of CDR primitive types.
comx_util.c
comx_util.h