Introduction
Fast Infoset, an international standard (ITU-T Rec. X.891 and ISO/IEC 24824-1), is a binary representation of the information contained in an XML document, that is, the XML Information Set. Fast Infoset documents are generally smaller and faster to read and write than equivalent XML documents. The information contained in any well-formed XML document can be represented in a Fast Infoset document. While all information is preserved when converting an XML document into a Fast Infoset document, the exact formatting of the XML document (such as the ordering of attributes within an element) is lost. Depending on the content of an XML document, the equivalent Fast Infoset document can be one quarter the size of the original XML document, or even less.
How Fast Infoset Works
Fast Infoset uses the following measures to compress the XML information set and to improve parsing performance:
- Element, attribute, and namespace names are indexed, and only the first occurrence of such a name is stored in the document. For further occurrences of the name, only a numerical ID referencing the name is stored.
- Only start tags are stored, not end tags. End tags are represented by a generic end tag marker.
- Character data (attribute values and element content) can be indexed as well. Whether indexing makes sense depends on the actual character data content.
- Certain character data, such as lists of integers and floating point numbers, can be transformed into an equivalent binary representation, using so-called encoding algorithms.
- Binary data can be directly stored in a Fast Infoset document, without the need for text-based encodings like Base64 or HexBinary.
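The name indexing described in the first item of the list above can be illustrated with a small, self-contained C++ sketch. It is not part of the FastInfoset library and does not reproduce the bit-level encoding defined by X.891; the class NameTable and the function literalNameBytes are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy vocabulary table: maps each distinct name to a small integer index.
// This mirrors the idea of name indexing only; the actual Fast Infoset
// encoding (bit layout, index ranges, qualified names) is defined by X.891.
class NameTable
{
public:
    // Returns the index for the given name, adding it on first occurrence.
    // isNew is set to true if the name had to be stored literally.
    std::size_t intern(const std::string& name, bool& isNew)
    {
        auto it = _indexByName.find(name);
        if (it != _indexByName.end())
        {
            isNew = false;
            return it->second;
        }
        isNew = true;
        std::size_t index = _names.size();
        _names.push_back(name);
        _indexByName.emplace(name, index);
        return index;
    }

    // Resolves a previously assigned index back to its name,
    // as a reader of the indexed document would do.
    const std::string& lookup(std::size_t index) const
    {
        assert(index < _names.size());
        return _names[index];
    }

    std::size_t size() const
    {
        return _names.size();
    }

private:
    std::vector<std::string> _names;
    std::unordered_map<std::string, std::size_t> _indexByName;
};

// Counts the name bytes that must be written literally for a sequence
// of element names, assuming each name is stored only on first occurrence.
std::size_t literalNameBytes(const std::vector<std::string>& names)
{
    NameTable table;
    std::size_t bytes = 0;
    for (const auto& name : names)
    {
        bool isNew = false;
        table.intern(name, isNew);
        if (isNew) bytes += name.size();
    }
    return bytes;
}
```

In this sketch, a writer calls intern() for every name it encounters and emits the literal string only when isNew is true; a reader rebuilds the same table in the same order and resolves later references with lookup(). For the name sequence item, price, item, price, item, only 9 name bytes are stored literally, instead of the 22 bytes needed if every occurrence were written out.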
The Applied Informatics FastInfoset library uses the same programming interfaces as the XML library from the POCO C++ Libraries. Therefore, it is easy to add Fast Infoset support to an existing XML application using the POCO XML Library.
Fast Infoset Performance
Fast Infoset parsing is typically two to four times faster than XML parsing. The exact improvement depends on a number of factors, though. Most important are the content and size of the XML information set. For documents containing lots of character data and only a few elements and attributes, the improvements will be minimal. However, for documents containing lots of elements with short, repeating character data strings in between, the improvements in both Fast Infoset document size and parsing speed will be substantial.
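Why the document shape matters so much can be seen from a rough back-of-the-envelope model, sketched below in C++. The model is an assumption made for illustration, not a measurement and not the real encoding: it considers only size (not parsing speed), assumes all elements share one name, charges one byte for a name index and one byte for the end marker, and ignores length prefixes, attributes and character data indexing. The names DocShape, xmlBytes, indexedBytes and sizeRatio are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Describes a simplified document: a flat run of elements that all
// share one name and carry the same amount of character data.
struct DocShape
{
    std::size_t elements;        // number of elements
    std::size_t nameLength;      // length of the shared element name in bytes
    std::size_t textPerElement;  // bytes of character data per element
};

// Size of the textual form: "<name>" + text + "</name>" per element,
// i.e. (nameLength + 2) + text + (nameLength + 3) bytes.
std::size_t xmlBytes(const DocShape& d)
{
    return d.elements * (2 * d.nameLength + 5 + d.textPerElement);
}

// Size under the toy indexed model: the name is stored once, and each
// element costs one index byte, one end marker byte, and its text.
std::size_t indexedBytes(const DocShape& d)
{
    if (d.elements == 0) return 0;
    return d.nameLength + d.elements * (2 + d.textPerElement);
}

// Ratio of textual size to indexed size; larger means more savings.
double sizeRatio(const DocShape& d)
{
    std::size_t compact = indexedBytes(d);
    assert(compact > 0); // the ratio is undefined for an empty document
    return static_cast<double>(xmlBytes(d)) / static_cast<double>(compact);
}
```

Under these assumptions, a markup-heavy document with 1000 elements, an 8-byte name and 2 bytes of text per element shrinks from 23000 to 4008 bytes, a ratio of about 5.7. A text-heavy document with 10 elements and 2000 bytes of text each shrinks only from 20210 to 20028 bytes, a ratio of about 1.01, which matches the qualitative statement above.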