Class XMLObjectSerializer

java.lang.Object
io.permazen.util.AbstractXMLStreaming
io.permazen.core.util.XMLObjectSerializer

public class XMLObjectSerializer extends AbstractXMLStreaming
Utility methods for serializing and deserializing Database objects in a Transaction to/from XML.

XML Structure

The overall XML format looks like this:


  <database>
      <schemas>
          ...
      </schemas>
      <objects>
          ...
      </objects>
  </database>
 
The <schemas> tag contains the database's schema definitions. Providing these definitions makes it possible to import into a completely empty Database, i.e., one with no prior knowledge of the objects' schema(s).

The <objects> tag contains the actual object data. Each object may belong to a different schema.

Object XML Format

There are two supported XML formats for <object> tags, plain and custom.

The plain object format uses standardized XML element names and identifies object types and fields with a "name" attribute. Because Permazen requires object type and field names to contain only characters that are valid in XML attributes, this format supports all possible database object and field names. Example:


  <object type="Person" id="64a971e1aef01cc8">
      <field name="name">George Washington</field>
      <field name="wasPresident">true</field>
      <field name="attributes">
          <entry>
              <key>teeth</key>
              <value>wooden</value>
          </entry>
      </field>
      <field name="spouse">c8b84a08e5c2b1a2</field>
  </object>
 

The custom object format uses the object type and field names as XML element names, and is therefore more readable. However, it doesn't support object type and field names that are not valid XML element names. Equivalent example:


  <Person id="64a971e1aef01cc8">
      <name>George Washington</name>
      <wasPresident>true</wasPresident>
      <attributes>
          <entry>
              <key>teeth</key>
              <value>wooden</value>
          </entry>
      </attributes>
      <spouse>c8b84a08e5c2b1a2</spouse>
  </Person>
 
When parsing input, the format is auto-detected on a per-XML element basis, depending on whether or not there is a type="..." attribute (for objects) or a name="..." attribute (for fields). When generating an output element in the custom format, the plain format is used for any element which would otherwise result in invalid XML. You can also configure all elements to be generated in the plain format for uniformity.
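
The detection rule for object elements can be illustrated with plain StAX, independent of Permazen. This is only a sketch of the rule as described above, not the library's implementation; the class and method names (FormatDetector, describe) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Permazen: applies the per-element detection rule
final class FormatDetector {

    // Returns "plain:TYPE" or "custom:TYPE" for the first element in the given XML fragment
    static String describe(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT)
                    continue;
                // Plain format: the type name is carried by a type="..." attribute
                // (passing null as the namespace URI skips the namespace check)
                String type = reader.getAttributeValue(null, "type");
                if (type != null)
                    return "plain:" + type;
                // Custom format: the element name itself is the object type name
                return "custom:" + reader.getLocalName();
            }
            throw new IllegalArgumentException("no element found");
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("invalid XML", e);
        }
    }
}
```

Applied to the two examples above, the first opening tag (with type="Person") would be classified as plain and the second (a Person element) as custom. The same check, using the name attribute, would apply to field elements.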

Schema Determination

On input, the schema against which each <object> element is interpreted is determined as follows:

  • If the <object> tag has a schema attribute, use the previously defined SchemaModel having the specified SchemaId.
  • Otherwise, if the containing <objects> tag has a schema attribute, use that SchemaId.
  • Otherwise no explicit schema is specified, so use the schema associated with the Transaction being written into.
In all cases, the selected schema must either be defined in the XML in the <schemas> section, or already exist in the target Transaction.
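
The precedence described above amounts to a three-step fallback. The following sketch only illustrates that ordering; it is not Permazen code, the names (SchemaChoice, select) are hypothetical, and representing schema IDs as plain strings is an assumption made for brevity.

```java
// Hypothetical illustration of the schema precedence rules; not part of Permazen
final class SchemaChoice {

    // Each of the first two arguments is the value of a "schema" attribute,
    // or null if that attribute is absent; schema IDs are modeled as strings here
    static String select(String objectSchemaAttr, String objectsSchemaAttr, String transactionSchemaId) {
        if (objectSchemaAttr != null)
            return objectSchemaAttr;        // 1. the <object> tag's own attribute wins
        if (objectsSchemaAttr != null)
            return objectsSchemaAttr;       // 2. otherwise the containing <objects> tag's attribute
        return transactionSchemaId;         // 3. otherwise the Transaction's schema
    }
}
```

Whichever schema is selected must still be resolvable, either from the <schemas> section or from the target Transaction; that validation step is omitted from the sketch.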

Object ID Generation

Any object ID (including the "id" attribute) may have the special form generated:TYPE:SUFFIX, where TYPE is the object type name and SUFFIX is an arbitrary string. In this case, a random, unassigned object ID is generated on the first occurrence, and on subsequent occurrences the previously generated ID is recalled. This facilitates automatically generated input (e.g., using XSL's generate-id() function), including forward references.

When using object ID generation, the configured GeneratedIdCache keeps track of generated IDs.
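
The "generate on first occurrence, recall afterwards" behavior can be sketched with an ordinary map. The class below is a hypothetical stand-in that illustrates the idea; it is not the GeneratedIdCache class, and it assumes that IDs are modeled as long values and that the full generated:TYPE:SUFFIX string is the lookup key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the idea behind generated IDs; not the Permazen class
final class GeneratedIdSketch {

    private static final String PREFIX = "generated:";

    private final Map<String, Long> ids = new HashMap<>();
    private final Function<String, Long> allocator;

    // allocator: given a type name, returns a new, unassigned ID (random in real use)
    GeneratedIdSketch(Function<String, Long> allocator) {
        this.allocator = allocator;
    }

    // Resolve "generated:TYPE:SUFFIX": allocate on first occurrence, recall afterwards
    long resolve(String text) {
        if (!text.startsWith(PREFIX))
            throw new IllegalArgumentException("not a generated ID: " + text);
        int colon = text.indexOf(':', PREFIX.length());
        if (colon == -1)
            throw new IllegalArgumentException("missing suffix: " + text);
        String type = text.substring(PREFIX.length(), colon);
        return ids.computeIfAbsent(text, key -> allocator.apply(type));
    }
}
```

Because the mapping is remembered, a reference field may mention generated:Person:x2 before the element that declares that ID appears, and both resolve to the same object ID.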

Storage ID's and Portability

Permazen assigns storage ID's to schema elements (object types, fields, indexes, etc.) dynamically when a schema is first registered in the database. When exporting schemas and objects from an existing database, you have a choice whether to include or exclude these storage ID assignments in the export.

If you include them, then the XML file is able to exactly reproduce the keys and values in the underlying key/value database. In particular, each object will be assigned the same object ID it had when it was originally exported, and on import if an object with a given object ID already exists, it will be replaced. However, the XML file will be incompatible with (i.e., fail to import into) any database that has different storage ID assignments for any of the object types or fields being imported.

You may instead omit storage ID assignments from the export. This means the data can be imported freely into any database, but it will no longer create the original object ID's, keys and values. In this scenario, object ID's are exported in the form generated:TYPE:SUFFIX, and so a new object is created as each <object> is imported.

In general you should include storage ID's if you are exporting data that will later be imported back into the same database (an "edit" operation) or an empty database (a "copy" operation), and exclude storage ID's if you are exporting data that will later be imported into some other, existing database (a "merge" operation).

Other Details

  • The "id" attribute may be omitted from an <object> tag; in this case, a random, unassigned ID is generated, and the object will not be referenced by any other object.
  • Simple fields that are equal to their default values and complex fields that are empty may be omitted.
  • XML element and attribute names are in the null XML namespace; elements and attributes in other namespaces are ignored.
  • A custom XML tag may have a redundant name attribute, as long as the name matches.

  • Field Details

    • NS_URI

      public static final String NS_URI
      The supported XML namespace URI.

      Currently this is XMLConstants.NULL_NS_URI, i.e., the null/default namespace.

      XML tags and attributes whose names are in other namespaces are ignored.

    • ELEMENT_TAG

      public static final QName ELEMENT_TAG
    • ENTRY_TAG

      public static final QName ENTRY_TAG
    • FIELD_TAG

      public static final QName FIELD_TAG
    • KEY_TAG

      public static final QName KEY_TAG
    • OBJECTS_TAG

      public static final QName OBJECTS_TAG
    • OBJECT_TAG

      public static final QName OBJECT_TAG
    • DATABASE_TAG

      public static final QName DATABASE_TAG
    • SCHEMAS_TAG

      public static final QName SCHEMAS_TAG
    • VALUE_TAG

      public static final QName VALUE_TAG
    • ID_ATTR

      public static final QName ID_ATTR
    • NAME_ATTR

      public static final QName NAME_ATTR
    • NULL_ATTR

      public static final QName NULL_ATTR
    • SCHEMA_ATTR

      public static final QName SCHEMA_ATTR
    • TYPE_ATTR

      public static final QName TYPE_ATTR
  • Constructor Details

  • Method Details

    • getFieldTruncationLength

      public int getFieldTruncationLength()
      Get the maximum length (number of characters) of any written simple field.

      By default, this value is set to -1, i.e., truncation is disabled.

      Returns:
      maximum simple field length, or zero for empty simple fields, or -1 if truncation is disabled
    • setFieldTruncationLength

      public void setFieldTruncationLength(int length)
      Set the maximum length (number of characters) of any written simple field.

      Simple field values longer than this will be truncated. If set to zero, all simple field values are written as empty tags. If set to -1, truncation is disabled.

      Truncation is mainly useful for generating human-readable output without very long lines. Obviously, when truncation is enabled, the resulting output, although still valid XML, will be missing some information and therefore cannot successfully be read back in by this class.

      Parameters:
      length - maximum simple field length, or zero for empty simple fields, or -1 to disable truncation
      Throws:
      IllegalArgumentException - if length < -1
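
The three cases for the length parameter can be summarized in a small sketch. It is not the library's code: the names are hypothetical, and it assumes a plain cut with no truncation marker appended, which the text above does not specify.

```java
// Hypothetical illustration of the truncation rules; not part of Permazen
final class TruncationSketch {

    // length == -1 disables truncation, 0 yields an empty value, otherwise cut to length
    static String truncate(String value, int length) {
        if (length < -1)
            throw new IllegalArgumentException("length < -1");
        if (length == -1 || value.length() <= length)
            return value;                       // disabled, or already short enough
        return value.substring(0, length);      // information is lost here
    }
}
```

The last branch is why truncated output cannot be read back in: the discarded characters are not recoverable from the XML.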
    • isOmitDefaultValueFields

      public boolean isOmitDefaultValueFields()
      Get whether to omit fields whose value equals the default value for the field's type.

      Default true.

      Returns:
      whether to omit fields with default values
    • setOmitDefaultValueFields

      public void setOmitDefaultValueFields(boolean omitDefaultValueFields)
      Set whether to omit fields whose value equals the default value for the field's type.

      Default true.

      Parameters:
      omitDefaultValueFields - true to omit fields with default values
    • getUnresolvedReferences

      public ObjIdMap<ReferenceField> getUnresolvedReferences()
      Get all unresolved forward object references.

      When read() is invoked with allowUnresolvedReferences = true, unresolved forward object references do not trigger an exception; this allows forward references to span multiple invocations. Instead, these references are collected and made available to the caller in the returned map. Callers may also modify the returned map as desired between invocations.

      Returns:
      mapping from unresolved forward object reference to some referring field
    • getGeneratedIdCache

      public GeneratedIdCache getGeneratedIdCache()
      Get the GeneratedIdCache associated with this instance.
      Returns:
      the associated GeneratedIdCache
    • setGeneratedIdCache

      public void setGeneratedIdCache(GeneratedIdCache generatedIdCache)
      Set the GeneratedIdCache associated with this instance.
      Parameters:
      generatedIdCache - the GeneratedIdCache for this instance to use
      Throws:
      IllegalArgumentException - if generatedIdCache is null
    • read

      public int read(InputStream input) throws XMLStreamException
      Import objects into the Transaction associated with this instance from the given XML input.

      This is a convenience method, equivalent to:

       read(input, false)
       
      Parameters:
      input - XML input
      Returns:
      the number of objects read
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if input is null
    • read

      public int read(InputStream input, boolean allowUnresolvedReferences) throws XMLStreamException
      Import objects into the Transaction associated with this instance from the given XML input.

      The input format is auto-detected for each <object> based on the presence of the "type" attribute.

      This method can optionally check for unresolved object references after reading is complete. If this checking is enabled (i.e., allowUnresolvedReferences is false), an exception is thrown if any unresolved references remain. In either case, the unresolved references are available via getUnresolvedReferences().

      Parameters:
      input - XML input
      allowUnresolvedReferences - true to allow unresolved references, false to throw an exception
      Returns:
      the number of objects read
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if input is null
      DeletedObjectException - if allowUnresolvedReferences is false and any unresolved references remain when loading is complete
    • read

      public int read(XMLStreamReader reader) throws XMLStreamException
      Import objects into the Transaction associated with this instance from the given XML input. This method expects to see an opening <objects> as the next event (not counting whitespace, comments, etc.), which is then consumed up through the closing </objects> event. Therefore this tag could be part of a larger XML document.

      The input format is auto-detected for each <object> based on the presence of the "type" attribute.

      Parameters:
      reader - XML reader
      Returns:
      the number of objects read
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if reader is null
    • write

      public long write(OutputStream output) throws XMLStreamException
      Export all objects from the Transaction associated with this instance to the given output using the default configuration.

      Equivalent to: write(output, OutputOptions.builder().build()).

      Parameters:
      output - XML output; will not be closed by this method
      Returns:
      the number of objects written
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if output is null
    • write

      public long write(Writer writer) throws XMLStreamException
      Export all objects from the Transaction associated with this instance to the given writer using the default configuration.

      Equivalent to: write(writer, OutputOptions.builder().build()).

      Parameters:
      writer - XML output; will not be closed by this method
      Returns:
      the number of objects written
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if writer is null
    • write

      public long write(OutputStream output, XMLObjectSerializer.OutputOptions options) throws XMLStreamException
      Export all objects from the Transaction associated with this instance to the given output.
      Parameters:
      output - XML output; will not be closed by this method
      options - output options
      Returns:
      the number of objects written
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if either parameter is null
    • write

      public long write(Writer writer, XMLObjectSerializer.OutputOptions options) throws XMLStreamException
      Export all objects from the Transaction associated with this instance to the given writer.
      Parameters:
      writer - XML output; will not be closed by this method
      options - output options
      Returns:
      the number of objects written
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if either parameter is null
    • writeObjects

      public long writeObjects(XMLStreamWriter writer, XMLObjectSerializer.OutputOptions options, Stream<? extends ObjId> objIds) throws XMLStreamException
      Export the specified objects from the Transaction associated with this instance to the given XML output.

      This method writes a start element as its first action, allowing the output to be embedded into a larger XML document. Callers not embedding the output may wish to precede invocation of this method with a call to writer.writeStartDocument().

      Parameters:
      writer - XML writer; will not be closed by this method
      options - output options
      objIds - object IDs
      Returns:
      the number of objects written
      Throws:
      XMLStreamException - if an error occurs
      IllegalArgumentException - if any parameter is null
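
The embedding pattern can be sketched with the standard StAX writer alone. In the sketch below, writeFragment() is a hypothetical stand-in for writeObjects() that emits only an empty element, and the enclosing backup element is likewise invented for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration of embedding a fragment in a larger document; not Permazen code
final class EmbeddingSketch {

    // Stand-in for writeObjects(): begins directly with a start element, no XML declaration
    static void writeFragment(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartElement("objects");
        writer.writeEndElement();
    }

    static String embed() {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            writer.writeStartDocument();            // the caller, not the fragment, starts the document
            writer.writeStartElement("backup");     // enclosing element owned by the caller
            writeFragment(writer);                  // fragment lands inside the larger document
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();                         // the caller also owns closing the writer
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller that is not embedding the output would simply drop the enclosing element and keep the writeStartDocument() and writeEndDocument() calls around the fragment.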
    • next

      protected QName next(XMLStreamReader reader) throws XMLStreamException
      Skip forward until either the next opening tag is reached, or the currently open tag is closed. This override ignores XML tags that are not in our namespace.
      Overrides:
      next in class AbstractXMLStreaming
      Parameters:
      reader - XML input
      Returns:
      the XML opening tag found, or null if a closing tag was seen first
      Throws:
      XMLStreamException - if an error occurs while reading the XML input
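
The skipping behavior can be approximated with plain StAX. The sketch below is not the actual override: the class name is hypothetical, and it assumes that an element in a foreign namespace is skipped together with everything nested inside it, which the description above does not spell out.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical re-creation of the described behavior; not the Permazen implementation
final class NextSketch {

    // Returns the next opening tag in the null namespace, or null if the current tag closes first
    static QName next(XMLStreamReader reader) throws XMLStreamException {
        int skipDepth = 0;      // greater than zero while inside a foreign-namespace element
        while (true) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String ns = reader.getNamespaceURI();
                boolean foreign = ns != null && !ns.isEmpty();
                if (skipDepth > 0 || foreign) {
                    skipDepth++;            // assumption: skip the whole foreign subtree
                    continue;
                }
                return reader.getName();
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (skipDepth > 0) {
                    skipDepth--;            // leaving a skipped element
                    continue;
                }
                return null;                // the currently open tag was closed
            } else if (event == XMLStreamConstants.END_DOCUMENT) {
                throw new XMLStreamException("unexpected end of document");
            }
            // whitespace, comments, etc. are ignored
        }
    }
}
```

Given an <objects> element containing a foreign-namespace child followed by an <object/> child, successive calls would yield objects, then object, then null.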