CARDFILE DATA FILE FORMAT

WHAT IS CARDFILE

CARDFILE.EXE is a small, simple database program that Microsoft included in Windows 3. It has a fixed database format consisting of a 39 character title, an optional text field of 440 characters, and a single optional OLE object. The title is also used as the sort key. The OLE object can be pretty much anything that can be accessed via OLE -- a picture (BMP format only?), a sound, a program link, etc.

The cardfile program was ignored by most people, but a few have found uses for it. Although CARDFILE shipped only with Windows 3 (and Windows NT rel 3.51?), it runs fine under Windows 9x, Me, 2000, and XP. It runs after a fashion on WINE under Linux, but there are serious font problems and the OLE functions do not work. The WINE problems may or may not be insurmountable. (I'm looking into it. Note, Aug 2010 -- never found a satisfactory answer.)

The cardfile data format is extraordinarily complex for a rather simple program. For a wonder, Microsoft has actually documented the format in Support Doc 99340 (http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B99340). Regrettably, the description document, while possibly correct, is quite incomprehensible. I needed to convert a cardfile database to another format. I tried to use the MS documentation to do so and found that decoding the document was not really possible. At least I can't do it. So I wrote my own description document.

WHAT IS THE FORMAT OF THE CARDFILE DATA FILE?

The cardfile file can be viewed as having three sections. There is a short header. The header is followed by a section containing the index data as a set of 52 byte records. The index data is followed by a set of variable length records containing object and/or text data for each index entry. The Windows 3 version of cardfile creates and uses eight bit ASCII encoding of text. The MS documentation implies that there is also a 16 bit Unicode variant of the file, but gives no real clue as to which fields are expanded to 16 bits per character, or whether there are other changes to the file format.
Section | Content
Header  | Signature, Last Object ID, Number of Cards
Indices | 39 character titles, pointer into Data section, and some trash
Data    | One Object and/or ASCII text field for each Index. In same order as Indices

In the following discussion, all numbers describing the document are decimal, counting from 1, unless otherwise stated. Pointers within the file are, of course, binary and presumably count from zero. Thus, for example, there is a data pointer (which will be binary, zero-based) in bytes 7-10 (decimal, counting from byte 1) of each index section entry.

Technically, ASCII is a seven bit code with an eighth, undefined, leading bit. It is not clear whether the 8th (leading) bit of characters -- often used to encode non-ASCII characters -- is preserved. The MS documentation implies that it might not be, but it would take explicit effort not to preserve it, so it possibly is carried along. A quick (and not very thorough) test indicates that all 8 bits are in fact preserved in both the title and text fields. The extra 128 characters may have to be input using right-Alt and the keypad. The exact character displayed will depend on the code page used on the target PC.
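The numbering convention above maps onto zero-based slices in most programming languages. A minimal Python sketch follows; the helper name `field` and the sample bytes are made up for illustration. It shows the conversion, and also how one and the same high-bit byte comes out as a different character under two code pages:

```python
def field(record: bytes, first: int, last: int) -> bytes:
    """Return document bytes first..last (decimal, counting from 1, inclusive)."""
    if first < 1 or last < first:
        raise ValueError("byte ranges count from 1 and must not be empty")
    return record[first - 1:last]

# Hypothetical 12 byte sample; its bytes 7-10 hold the values 7, 8, 9, 10.
sample = bytes(range(1, 13))
pointer_bytes = field(sample, 7, 10)

# One 8 bit character decoded with two different code pages.
high_bit_char = b"\xe9"
as_windows = high_bit_char.decode("cp1252")  # Windows Western European
as_dos = high_bit_char.decode("cp437")       # original IBM PC code page

print(list(pointer_bytes), as_windows, as_dos)
```

Which code page is appropriate for a given file is not recorded in the file itself, so any decoder has to take it as a parameter or guess.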

Non-ASCII data storage appears to be in the usual, unpleasant-to-work-with Wintel Least Significant Byte First (little-endian) format.
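As a small illustration of that byte order, Python's standard `struct` module decodes such fields directly. The sample bytes here are invented, not taken from a real file:

```python
import struct

# Hypothetical sample: the value 1234 hex stored least significant byte first.
raw = bytes([0x34, 0x12, 0x00, 0x00])

# "<I" is a little-endian unsigned 32 bit integer,
# "<H" is a little-endian unsigned 16 bit integer.
(value32,) = struct.unpack("<I", raw)
(value16,) = struct.unpack("<H", raw[:2])

print(hex(value32), hex(value16))
```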

Cardfile data files normally have the extension .CRD.

Known limits on cardfile. Most limits result from the format.
Value | Maximum
Index Text Size | 39 bytes
Data Text Size | 440 bytes (End of Line can be embedded, but each EOL consumes two bytes)
Number of cards in file | 65535
Maximum length of the file | About 4.29GB, but the OS will very likely limit it to less
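The last two limits are consistent with a 16 bit card count and a 32 bit file pointer. Assuming that is where they come from, the arithmetic is:

```python
# Assumption: the card limit comes from an unsigned 16 bit count and the
# file size limit from an unsigned 32 bit byte offset.
max_cards = 2**16 - 1
max_offset_bytes = 2**32

print(max_cards)                         # 65535
print(round(max_offset_bytes / 1e9, 2))  # 4.29 (decimal gigabytes)
```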

THE CARDFILE HEADER

The Cardfile header begins with a three character signature. This is normally the characters RRG, but MS implies that it could be DKO or MGC. It is unclear whether the signature has anything to do with the possible use of Unicode characters. The RRG signature files seem to contain one more four byte word in the header than do the MGC files. (Aug 2010: see the Afternote at the end of this document for more information on MGC files.)
CARDFILE HEADER RECORD
Bytes | Contents | Notes
1 thru 3 | The ASCII characters 'RRG' ('525247' hex) | Might be 'DKO' or 'MGC'. It's not clear which signatures the format described here applies to.
4 thru 7 | ID of Last Object | It's not clear what, if anything, this is good for. I suspect that this might be used to create the next "Unique Object ID" in the Object data record by adding one to the current value. It appears not to be present in the MGC signature file format.
8 and 9 and (10 and 11) | Number of 'cards' (entries) in the file | This is binary (of course) and appears to count from 1. And it's four bytes, not two.
(12-15) | Four bytes. Always 0? | Not documented
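A minimal Python sketch of reading these header fields, assuming the RRG layout exactly as tabulated above. The names `CardfileHeader` and `parse_rrg_header` and the sample bytes are hypothetical, and the undocumented bytes 12-15 are deliberately left alone:

```python
import struct
from typing import NamedTuple

class CardfileHeader(NamedTuple):
    signature: bytes
    last_object_id: int
    card_count: int

def parse_rrg_header(data: bytes) -> CardfileHeader:
    """Decode document bytes 1-11 of an RRG file (layout assumed from the table)."""
    if len(data) < 11:
        raise ValueError("need at least 11 bytes of header")
    # Document bytes 1-3, 4-7 and 8-11 are Python offsets 0-2, 3-6 and 7-10.
    # The count is read as four bytes, as observed above; the Microsoft
    # description allots only two.
    signature, last_object_id, card_count = struct.unpack_from("<3sII", data, 0)
    if signature != b"RRG":
        raise ValueError(f"unexpected signature {signature!r}")
    return CardfileHeader(signature, last_object_id, card_count)

# Hypothetical header: last object ID 7, two cards, then four zero bytes.
sample_header = b"RRG" + struct.pack("<II", 7, 2) + b"\x00" * 4
header = parse_rrg_header(sample_header)
print(header)
```

The `<` prefix in the format string matters twice over: it selects least significant byte first, and it switches off alignment padding, so the three byte signature is followed immediately by the first integer.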

THE INDEX SECTION

The Index section contains one fixed length record (52 bytes) for each card (entry) in the file. An Index entry consists of a text description of the item of up to 39 characters, a 32 bit pointer to the rest of the data, and some trash. The cards presented to the user are ordered by the text description, which is sorted as an ASCII string. The text descriptions are zero terminated strings and do not have to fill the whole field. The remainder of the field will be filled with whatever was there previously -- a small security risk. The risk is small, not because failing to blank or zero fill the unused data space is a good idea, but because it isn't very likely that the inadvertently exposed data will happen to be safe combinations, passwords, etc.
INDEX SECTION RECORD FORMAT
Bytes | Contents | Notes
1 thru 6 | Unused bytes ('Reserved') | Should be all zero ('000000000000' hex)
7 thru 10 | Pointer to Card's Data (i.e. object and/or data) | This appears to be binary relative to the start of the file.
11 | 'Flag Byte' | This is always 0. I infer that the byte is probably not used
12-51 | Index Text | ASCII Text -- Zero terminated
52 | Null Byte (always 0) | I infer this is here in order to make sure that an improperly terminated Index string doesn't cause trouble
(53) | Null Byte? (always 0?) | There appears to be one additional undocumented zero byte here
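The record layout above can be decoded along these lines. This is a sketch only: the function name `parse_index_record`, the choice of `cp1252` as the default code page, and the sample record are all assumptions, and the caller is left to work out where in the file each 52 byte record starts.

```python
import struct

INDEX_RECORD_SIZE = 52

def parse_index_record(record: bytes, encoding: str = "cp1252"):
    """Return (data_offset, flag, title) from one 52 byte index record."""
    if len(record) != INDEX_RECORD_SIZE:
        raise ValueError("an index record is exactly 52 bytes")
    # Document bytes 7-10: pointer to the card's data, LSB first.
    (data_offset,) = struct.unpack_from("<I", record, 6)
    # Document byte 11: the flag byte.
    flag = record[10]
    # Document bytes 12-51: title, cut at the first zero byte so that
    # leftover junk after the terminator is ignored.
    raw_title = record[11:51].split(b"\x00", 1)[0]
    return data_offset, flag, raw_title.decode(encoding, errors="replace")

# Hypothetical record with stale bytes after the terminated title.
title_field = b"Smith, John\x00leftover".ljust(40, b"?")
sample_record = (b"\x00" * 6 + struct.pack("<I", 1234) + b"\x00"
                 + title_field + b"\x00")
print(parse_index_record(sample_record))
```

Cutting at the first zero byte, rather than stripping trailing zeros, is what keeps the unblanked remainder of the field out of the result.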

THE DATA SECTION

The cardfile data section consists of variable length records in the same order as the Index section. These records may contain an object and/or a text field (up to 440 bytes) or neither. The object, if it is present, is stored first. The object formats are apparently in some standard Microsoft format that can be passed to the appropriate Windows API logic simply by passing a pointer to the first byte. There are three different object formats. To further add to the fun, it is necessary to skip over the object (if it is present) in order to get to the text data. And none of the three object formats contains anything as mundane as an explicit indication of the length of the object.

Needless to say the "format" of the Data Section is tedious to describe and frustrating to deal with. I'm going to treat it as a series of records each of which contains three subrecords. e.g.
Data Section Record 1 | Data Section SubRecord 1 - Header
                      | Data Section SubRecord 1 - Object (Optional)
                      | Data Section SubRecord 1 - Text
Data Section Record 2 | Data Section SubRecord 2 - Header
                      | Data Section SubRecord 2 - Object (Optional)
                      | Data Section SubRecord 2 - Text
....                  | ....

DATA SECTION SUBRECORD HEADER

The header for each data section record consists of a two byte flag which will be non-zero if an object record is present. A text record is assumed to exist. If no text is present, the text record will consist of two zero bytes.
DATA SECTION SUBRECORD HEADER
Bytes | Contents | Notes
1 and 2 | Object flag | 00 hex = No Object. Anything other than 00 hex = Object Present. (So far, the only values I have seen here are 0 and 1)

DATA SECTION SUBRECORD -- OBJECT

The optional object is an Embedded, Linked, or Static OLE object whose format is purportedly fully described in Appendix C to the "Object Linking and Embedding Programmer's Reference (Version 1.0)". The format is also purportedly described in the Windows SDK "Programmer's Reference, Volume 1: Overview, Chapter 6, Object Storage Format". Since I, like most users, do not intend to use or alter the object contents, I did not verify these references. However, I believe that the Windows SDK is available from Microsoft as a half gigabyte (compressed!) download.

What is presented here is what Microsoft talks about in their card file description modified by what I can observe in file dumps. This is just enough information to skip over, read, or restore an unaltered OLE object. It is possible that a properly formed call to OleLoadFromStream in the Windows API passing the appropriate pointer to the Object will read (?) and/or invoke the object.

The layout format of the object data given in MS99340 appears to contradict the format implied by the Algorithms section of the same document. I suspect that the two are not actually contradictory, but that in order to make them consistent, one needs to look at the world from some very unobvious perspective. Not being clairvoyant, I am using the layout implied by the algorithm, which I (perhaps mistakenly) believe that I can understand. Strictly speaking, it's correct, but somewhat unhelpful because it fails to explicitly define a few minor things like the existence and layout of the first twelve bytes of the object entry.

DATA SECTION RECORD -- OBJECT
Bytes | Contents | Notes
1 thru 4 | Unique Object ID | This is described as the "Unique Object ID". It seems to be set to a small 32 bit integer. I don't really know what this is good for.
5 thru 8 | OLE Version ID | Per MS99340 this is Version 1.0 -- '01 00'. In practice, it seems always to be 01 05 00 00. (I may have combined two 16 bit fields into one 32 bit number here)
9 thru 12 | Object Format | So far, I have been unable to create any format other than 2=Embedded Object. I suspect that in practice, Cardfile really can't create Linked or Static Objects. Therefore, I have described the Embedded Object format first, then described the differences that can be expected in the other two formats.

Object Format:

  • 1 = Linked Object
  • 2 = Embedded Object
  • 3 = Static Object

This seems inconsistent with the encoding in the layout section (0 = embedded, 1 = linked, 2 = static). Look folks -- I didn't design this. I'm only the messenger here.

13 thru 16 | Length of the 'Class String' | This turns out to be the number of characters in the string + 1
17 thru n (where n = 16 + the length of the Class String, counting its terminating zero byte) | The Class String -- e.g. "Package" | The Class String is not only "counted" with a prefixed 32 bit byte count, but is also terminated by a byte containing 0. The zero byte is included in the prefixed count
n+1 thru n+4 | Four Bytes of 00 | No clue what this is
n+5 thru n+8 | Four Bytes of 00 | No clue what this is
n+9 thru n+12 | Four Bytes of Object data size | Size of the Object Data in bytes
n+13 thru p | Object Data | This is specific to the object type. (i.e. an embedded link package is laid out quite differently internally from an embedded bitmap object, but we know the length so we can skip over the content and its details.)
'Presentation Object' -- It's not 100% clear what a Presentation Object is, but it is a separate object that immediately follows each data object (Perhaps it is the icon displayed on the card for inserted objects)
p+1 thru p+4 | OLE Version ID | Per MS99340 this is Version 1.0 -- '01 00'. In practice, it seems always to be 01 05 00 00. (I may have combined two 16 bit fields into one 32 bit number here) OTOH, this field may not exist.
p+5 thru p+8 | 05 00 00 00 | At a guess this is the object type for a Presentation Object
p+9 thru p+12 | Length of the 'Class String' | This turns out to be the number of characters in the string + 1
p+13 thru q (where q = p+12 + the length of the Class String, counting its terminating zero byte) | The Class String -- e.g. "Package" | The Class String is not only "counted" with a prefixed 32 bit byte count, but is also terminated by a byte containing 0. The zero byte is included in the prefixed count
q+1 thru q+4 | Four Bytes of Object data size | Size of the Object Data in bytes
q+5 thru r | Object Data | This is specific to the object type. (i.e. an embedded link package is laid out quite differently internally from an embedded bitmap object, but we know the length so we can skip over the content and its details.)
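To make the observed layout concrete, here is a hedged Python sketch that walks the first (non-presentation) part of an embedded object, following the table above and nothing else. The function name `parse_embedded_native` and the sample bytes are hypothetical. The sketch stops at the end of the Object Data and does not attempt the Presentation Object, whose first field is in doubt.

```python
import struct

def _u32(data: bytes, pos: int) -> int:
    """Read an unsigned 32 bit integer stored LSB first."""
    (value,) = struct.unpack_from("<I", data, pos)
    return value

def parse_embedded_native(data: bytes, start: int):
    """Return (class_name, object_data, next_offset) for an embedded object.

    `start` is the zero-based offset of the Unique Object ID. The layout
    is assumed from the table above, not from the OLE documentation.
    """
    object_format = _u32(data, start + 8)       # document bytes 9-12
    if object_format != 2:
        raise NotImplementedError("only Embedded Objects (format 2) handled")
    class_len = _u32(data, start + 12)          # count includes the zero byte
    pos = start + 16
    class_name = data[pos:pos + class_len].split(b"\x00", 1)[0]
    pos += class_len
    pos += 8                                    # two unexplained zero words
    size = _u32(data, pos)                      # Object Data size in bytes
    pos += 4
    object_data = data[pos:pos + size]
    if len(object_data) != size:
        raise ValueError("object data runs past the end of the buffer")
    return class_name.decode("ascii", errors="replace"), object_data, pos + size

# Hypothetical object: ID 1, version bytes 01 05 00 00, format 2, "Package".
sample_object = (struct.pack("<III", 1, 0x0501, 2)
                 + struct.pack("<I", 8) + b"Package\x00"
                 + b"\x00" * 8
                 + struct.pack("<I", 5) + b"HELLO"
                 + b"tail")
name, payload, next_offset = parse_embedded_native(sample_object, 0)
print(name, payload, next_offset)
```

The returned `next_offset` is where the Presentation Object would begin if one follows, so a caller that only wants the card text still has more skipping to do.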



5 thru n (n=2 + the length of the Class String text) | Class String | This appears to be a 'counted string' -- two bytes of length (LSB first) followed by ASCII text that is not zero terminated. The Class String seems to be a formalized description of the object: BITMAP, METAFILEPICT, etc.

At this point, I'm going to break out the three formats as separate tables. There is some additional logic embedded in the tables based on presentation object types that I did not use separate tables for. That's logically inconsistent, but I feared that the number of tables I'd end up with if I did that would add even more confusion to this horrendous data structure.


For Linked Object (Format = 1)
n+1 thru m (where m=n+2+the length of the Network Name) | Network Name | This appears to be a 'counted string' -- two bytes of length (LSB first) followed by ASCII text that is not zero terminated.
m+1 to m+2 | Network Type and Network Driver Version | Encoding of the information into the 16 bit value allocated is not specified in MS99340.
m+3 to m+4 | Link Update Options | Encoding of the information into the 16 bit value allocated is not specified in MS99340.
m+5 to p (where p=m+4+length of the presentation object) | Presentation Object | This appears to be a second object tacked onto the end of Linked and Embedded (but not Static) OLE objects.
m+5 to m+8 | Presentation Object 'Unique ID' | The Presentation Object Version ID ('0100 hex') and Format (which, fortuitously, is ignorable when skipping the object). See bytes 1-4 description (above) for further information on the Unique ID.
m+9 thru q (where q=m+11+length of this Class String) | Class String | This appears to be a 'counted string' -- two bytes of length (LSB first) followed by ASCII text that is not zero terminated. Unlike the Class String in bytes 5-n, we need to evaluate this value
For Presentation Object (Embedded in Link or Embedded Object) with Class String = METAFILEPICT, BITMAP or DIB
q+1 to q+2 | Character Width | Character width in MM_HIMETRIC units (0.01 mm each)
q+3 to q+4 | Character Height | Character height in MM_HIMETRIC units
q+5 to p (where p=q+7+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit byte count and the object itself
For Presentation Object (Embedded in Link or Embedded Object) with Class String other than METAFILEPICT, BITMAP or DIB
q+1 to q+2 | Clipboard Format | A 16 bit integer. Encoding is not given in MS99340, but I infer that 0=NULL
For Presentation Object (Embedded in Link or Embedded Object) with Class String other than METAFILEPICT, BITMAP or DIB and Clipboard Format = NULL
q+3 to p (where p=q+7+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit byte count and the object itself
For Presentation Object (Embedded in Link or Embedded Object) with Class String other than METAFILEPICT, BITMAP or DIB and Clipboard Format not equal to NULL
q+3 to r (where r=q+3+length of Clipboard Format Name) | Clipboard Format Name | A 'counted string' consisting of a 16 bit byte count and the string, which is not zero terminated.
r to p (where p=r+2+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit byte count and the object itself

For Embedded Object (Format = 2)
n+1 thru m (where m=n+2+the length of the Native Data) | Native Data | This appears to be a 'counted variable' -- two bytes of length (LSB first) followed by a block of data.
m to p (where p=m+4+length of the presentation object) | Presentation Object | This appears to be a second object tacked onto the end of Linked and Embedded (but not Static) OLE objects.
m to m+4 | Presentation Object 'Unique ID' | The Presentation Object Version ID ('0100 hex') and Format (which, fortuitously, is ignorable when skipping the object). See bytes 1-4 description (above) for further information on the Unique ID.
m+5 thru q (where q=m+7+length of this Class String) | Class String | This appears to be a 'counted string' -- two bytes of length (LSB first) followed by ASCII text that is not zero terminated. Unlike the Class String in bytes 5-n, we need to evaluate this value
For Presentation Object (Embedded in Link or Embedded Object) with Class String = METAFILEPICT, BITMAP or DIB
q+1 to q+2 | Character Width | Character width in MM_HIMETRIC units (0.01 mm each)
q+3 to q+4 | Character Height | Character height in MM_HIMETRIC units
q+5 to p (where p=q+7+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit byte count and the object itself
For Presentation Object (Embedded in Link or Embedded Object) with Class String other than METAFILEPICT, BITMAP or DIB
q+1 to q+2 | Clipboard Format | A 16 bit integer. Encoding is not given in MS99340, but I infer that 0=NULL
For Presentation Object (Embedded in Link or Embedded Object) with Class String other than METAFILEPICT, BITMAP or DIB and Clipboard Format = NULL
q+3 to p (where p=q+7+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit byte count and the object itself
For Presentation Object (Embedded in Link or Embedded Object) with Class String other than METAFILEPICT, BITMAP or DIB and Clipboard Format not equal to NULL
q+3 to r (where r=q+3+length of Clipboard Format Name) | Clipboard Format Name | A 'counted string' consisting of a 16 bit byte count and the string, which is not zero terminated.
r to p (where p=r+2+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit byte count and the object itself

For Static Object (Format = 3)
n+1 thru m (where m=n+2+the length of the Native Data) | Native Data | This appears to be a 'counted variable' -- two bytes of length (LSB first) followed by a block of data.
m to p (where p=m+4+length of the presentation object) | Presentation Object | This appears to be a second object tacked onto the end of Linked and Embedded (but not Static) OLE objects.
m to m+4 | Presentation Object 'Unique ID' | The Presentation Object Version ID ('0100 hex') and Format (which, fortuitously, is ignorable when skipping the object). See bytes 1-4 description (above) for further information on the Unique ID.
m+5 thru q (where q=m+7+length of this Class String) | Class String | This appears to be a 'counted string' -- two bytes of length (LSB first) followed by ASCII text that is not zero terminated. Unlike the embedded and link formats, we can ignore this value because we seem to know that a presentation value follows
q+1 to q+2 | Character Width | Character width in MM_HIMETRIC units (0.01 mm each)
q+3 to q+4 | Character Height | Character height in MM_HIMETRIC units
q+5 to p (where p=q+7+length of Presentation Object) | Presentation Object | A 'counted variable' consisting of a 16 bit count and the object itself

DATA SECTION SUBRECORD -- TEXT

The data section text subrecords consist of a byte count and the text which is not zero terminated. If no text record exists, the byte count is set to 0.
DATA SECTION SUBRECORD -- TEXT
Bytes | Contents | Notes
1 and 2 | Text Byte Count | 0 to 440. 0 indicates no text
3 to x (maximum 442) | Text | The card's text field. Up to 440 bytes. End Of Lines (CR-LF) can be embedded in the text.
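For the common case of a card with no object, the header and text subrecords can be read together. A minimal Python sketch under that assumption; the function name `read_text_only_record`, the default code page and the sample bytes are illustrative, and cards that do carry an object are refused rather than guessed at:

```python
import struct

def read_text_only_record(data: bytes, offset: int,
                          encoding: str = "cp1252") -> str:
    """Read the text of a data record that carries no object.

    `offset` is the zero-based pointer taken from the card's index entry.
    """
    # Subrecord header: two byte object flag, LSB first.
    (object_flag,) = struct.unpack_from("<H", data, offset)
    if object_flag != 0:
        raise NotImplementedError("object present; it must be skipped first")
    # Text subrecord: two byte count, then that many bytes, no terminator.
    (text_len,) = struct.unpack_from("<H", data, offset + 2)
    start = offset + 4
    raw = data[start:start + text_len]
    if len(raw) != text_len:
        raise ValueError("text runs past the end of the buffer")
    return raw.decode(encoding, errors="replace")

# Hypothetical buffer: three filler bytes, then one record with 12 text bytes.
sample_data = b"\xff" * 3 + struct.pack("<HH", 0, 12) + b"line1\r\nline2"
card_text = read_text_only_record(sample_data, 3)
print(repr(card_text))
```

Note that the embedded CR-LF counts as two of the twelve bytes, which is the same accounting that applies to the 440 byte limit.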

AFTERNOTE ON MGC FORMAT CARDFILE

The following is the text from a message I received in August 2010 from Steve Metter in Dayton, OH regarding MGC format card files:
* This describes the "MGC" style of .CRD file. All values are little
endian.
 * File contains three sections: header, index, and data.
 * Header is 11 bytes, Index is 52 bytes times number of entries in file,
 * appearing in physical sort order, and Data is remainder of file.
 *
 * 0 - 2        MGC signature
 * 3 - 6        Number of cards in file (1250 / 4E2 (4 226) in Chuck's case)
 * 7 - 10       Four bytes of indeterminate meaning, 0 in my single example.
 * 11           Beginning of Index Entry table: Each entry is 52 bytes long:
 *                              +0 - +3 Absolute byte offset to data entry
                                        for this index entry.
 *                              +4 - +51 Null terminated string used for sort.
                                         (seems like a lot of wasted space),
                                         but I don't know what it is.
 *                      Each Data Entry:
 *                              +0 - +1 No idea what these bytes mean - both
                                        are null in my single example.
 *                              +2 - +3 Length of data in bytes - my example
                                        contains ascii characters, where each
                                        line is terminated by 13 10.
 *                              +4 - +n Data
 *

From Steve Metter, Dayton, OH, August 2010
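Taken at face value, the offsets in the message are enough to pull the text out of every card in an MGC file. The Python sketch below does only that, using the header size, index entry size, offset field and data entry layout as the message states them. It has been run only against the synthetic sample built here, not against a real MGC file; the function name `read_mgc_texts` and the default code page are assumptions. The sort string is left out because the message itself is unsure about that part of the index entry.

```python
import struct

MGC_HEADER_SIZE = 11       # per the message: bytes 0-10
MGC_INDEX_ENTRY_SIZE = 52  # per the message

def read_mgc_texts(data: bytes, encoding: str = "cp1252") -> list:
    """Return the data text of each card, in index order."""
    if data[:3] != b"MGC":
        raise ValueError("not an MGC signature file")
    (count,) = struct.unpack_from("<I", data, 3)      # bytes 3-6
    texts = []
    for i in range(count):
        entry = MGC_HEADER_SIZE + i * MGC_INDEX_ENTRY_SIZE
        (offset,) = struct.unpack_from("<I", data, entry)      # +0 - +3
        (length,) = struct.unpack_from("<H", data, offset + 2)  # +2 - +3
        raw = data[offset + 4:offset + 4 + length]              # +4 - +n
        texts.append(raw.decode(encoding, errors="replace"))
    return texts

# Synthetic two card file built to the layout in the message.
body0 = b"\x00\x00" + struct.pack("<H", 3) + b"one"
body1 = b"\x00\x00" + struct.pack("<H", 3) + b"two"
first_data = MGC_HEADER_SIZE + 2 * MGC_INDEX_ENTRY_SIZE
index = (struct.pack("<I", first_data).ljust(52, b"\x00")
         + struct.pack("<I", first_data + len(body0)).ljust(52, b"\x00"))
sample_mgc = b"MGC" + struct.pack("<I", 2) + b"\x00" * 4 + index + body0 + body1
print(read_mgc_texts(sample_mgc))
```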