Candle Data Model Reference

Version : Candle 0.10
Published date : Nov 12, 2011

1. Introduction

The Candle data model is closely based on the XQuery and XPath Data Model (XDM) and XML Schema.

Data types in Candle can be divided into 4 major categories:

2. Candle Data Type Hierarchy

Unlike XML, which is only semi-structured when there is no schema, Candle's literal data is always strongly typed. The type information of an item can be derived from its literal syntax. The detailed syntax of Candle literal values is specified in the Candle Markup Reference.

Most of the Candle types are based on XML Schema types. Below is the type hierarchy of the Candle data model:

Figure 1: Candle Type Hierarchy

(Note that some of the data types, including error, duration, type, directory and map, are not supported in the current beta release.)

2.1 Data Type Categories

Candle data types can be divided into 4 major categories:

3. Data Type Characteristics and Components

3.1 Data Type Characteristics

Where the data model is concerned, data types should be differentiated by the unique characteristics of their data values rather than by their syntax. Some of the important characteristics of the data types are: cardinality, node identity, node reference and document order.

Cardinality and Sequences

The cardinality of a data instance is the count of data items in the instance. If the instance is an item, either atomic or a node, then its cardinality is 1. If the instance is a sequence, then the cardinality is the count of the items in the sequence.

A sequence in Candle must have at least 2 items; otherwise, it is an item, not a sequence. Sequences never contain other sequences; if sequences are combined, the result is always a flattened sequence. In other words, appending (d e) to (a b c) produces a sequence of cardinality 5: (a b c d e). It does not produce a sequence of cardinality 4: (a b c (d e)); such a nested sequence never occurs.
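The flattening rule can be sketched in Python. This is only an illustration of the rule described above, not Candle code: the names combine and cardinality are hypothetical, a sequence is modelled as a tuple, and None stands in for "no items", a case the text above does not define.

```python
def _flatten(parts):
    """Yield the items of parts, splicing in any nested sequences."""
    for part in parts:
        if isinstance(part, tuple):
            yield from _flatten(part)   # splice the items in, never nest
        elif part is not None:
            yield part


def combine(*parts):
    """Combine items and sequences into a single flattened result.

    A result with exactly one item is returned as the bare item,
    because a sequence must have at least 2 items.
    """
    flat = tuple(_flatten(parts))
    if not flat:
        return None          # assumption: no items at all
    if len(flat) == 1:
        return flat[0]       # an item, not a sequence
    return flat


def cardinality(value):
    """Count the items in an item or a sequence."""
    if value is None:
        return 0
    return len(value) if isinstance(value, tuple) else 1


combined = combine(("a", "b", "c"), ("d", "e"))
single = combine("a")
```

Here combined holds five items and single is the bare item "a" rather than a one-item sequence.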

Node Identity and Node Reference

The most important characteristic that tells atomic types apart from node types is node identity.

Atomic values do not have identity. Every instance of the value 5 as an integer is identical to every other instance of the value 5 as an integer.

Each node has a unique identity, whether it is constructed dynamically or loaded from some data source. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node.

A node reference is a value used to refer to a node. In Candle, a node reference is always expressed as a URI, following schemes like IDREF or XPointer. Different node references may refer to the same node.

Only nodes loaded from data sources can be addressed through node reference. Atomic values and nodes constructed dynamically cannot be addressed through any node reference.
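The difference between value equality, node identity and node reference can be sketched in Python. The classes Node and DataSource and the reference strings below are hypothetical names chosen for illustration; they are not part of Candle.

```python
class Node:
    """A node has identity: it is the same only as itself."""

    def __init__(self, name):
        self.name = name


class DataSource:
    """Maps node references (URI strings) to loaded nodes."""

    def __init__(self):
        self._by_reference = {}

    def load(self, node, *references):
        # Several references may address the same node.
        for reference in references:
            self._by_reference[reference] = node
        return node

    def resolve(self, reference):
        # Returns None when no loaded node has this reference.
        return self._by_reference.get(reference)


source = DataSource()
loaded = source.load(Node("item"), "doc.xml#item1", "doc.xml#xpointer(/root/item[1])")
constructed = Node("item")   # built dynamically, so it has no reference

atomic_equal = (5 == 5)                      # atomic values compare by value
distinct_nodes = loaded is not constructed   # same name, different identity
same_target = (source.resolve("doc.xml#item1")
               is source.resolve("doc.xml#xpointer(/root/item[1])"))
```

Two nodes with the same content remain distinct, while two different references can resolve to one and the same node.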

Document Order

[Definition: A document order is defined among all the nodes accessible during a given query or transformation. Document order is a total ordering, although the relative order of some nodes is implementation-dependent. Informally, document order is the order in which nodes appear in the serialization of a document.] [Definition: Document order is stable, which means that the relative order of two nodes will not change during the processing of a given query or transformation, even if this order is implementation-dependent.]

Within a tree, document order satisfies the following constraints:

  1. The root node is the first node.

  2. Every node occurs before all of its children and descendants.

  3. Attribute Nodes associated with an element immediately follow that element. The relative order of Attribute Nodes is stable but implementation-dependent.

  4. The relative order of siblings is the order in which they occur in the children property of their parent node.

  5. Children and descendants occur before following siblings.

The relative order of nodes in distinct trees is stable but implementation-dependent, subject to the following constraint: If any node in a given tree, T1, occurs before any node in a different tree, T2, then all nodes in T1 are before all nodes in T2.
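The constraints on document order within a tree amount to a pre-order traversal in which an element's attributes are visited right after the element itself. A minimal Python sketch follows; the Element class is a hypothetical stand-in for a node, and attributes are emitted in list order only as an assumption, since their relative order is implementation-dependent.

```python
class Element:
    """A simplified element node with attribute names and child elements."""

    def __init__(self, name, attributes=(), children=()):
        self.name = name
        self.attributes = list(attributes)
        self.children = list(children)


def document_order(node):
    """Yield labels for a tree's nodes in document order."""
    yield node.name                      # a node precedes its descendants
    for attribute in node.attributes:    # attributes follow their element
        yield "@" + attribute
    for child in node.children:          # siblings keep their children order
        yield from document_order(child)


tree = Element("root", children=[
    Element("a", attributes=["id"], children=[Element("b")]),
    Element("c"),
])
order = list(document_order(tree))
```

For this tree, order is root, a, @id, b, c: the descendant b comes before c, the following sibling of a.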

3.2 Data Type Components and Accessors

Besides the important characteristics of the data types, data types also have components. Data components are values stored in a data type. Accessors are defined to retrieve various components stored in a data type.

Even an atomic type can have multiple data components. When we say a type is atomic, we actually mean that it is non-hierarchical, or flat, in contrast to a node type; it does not mean that there is only one value in it. For example, the datetime instance 't:2003-01-02T11:30:00-05:00' has several value components: (2003, 1, 2, 11, 30, 0.0, -PT05H00M).
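The same idea can be seen with Python's standard datetime type, used here only as an analogy. The sketch parses the ISO 8601 form of the example value, assuming the year 2003 given in the component list, and reads out its components.

```python
from datetime import datetime, timedelta

# ISO 8601 text of the example value; this is not Candle literal syntax.
value = datetime.fromisoformat("2003-01-02T11:30:00-05:00")

# One flat (non-hierarchical) value that still has several components.
components = (
    value.year,
    value.month,
    value.day,
    value.hour,
    value.minute,
    float(value.second),
    value.utcoffset(),   # the timezone offset as a timedelta
)
```

The last component is timedelta(hours=-5), which corresponds to the duration -PT05H00M in the example.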

The accessors defined in Candle are:

Accessor: Prototype - Semantics
string-value accessor: ?string as string - returns the string value of the context item;
typed-value accessor: ?value as atomic* - returns the atomized value(s) of the context item;
qualified-name accessor: ?qname as qname - returns the qualified name of the context item;
namespace accessor: ?namespace as uri - returns the namespace URI of the context item as uri;
kind accessor: ?kind as qname - returns the built-in type name of the context item;
type-name accessor: ?type as qname - returns the type name of the context item;
root node accessor: ?root as node - returns the root node containing the context item;
parent node accessor: ?parent as node - returns the parent node containing the context item;
previous node accessor: ?previous as node - returns the previous node of the context item;
next node accessor: ?next as node - returns the next node of the context item;
attributes accessor: ?attributes as node* - returns the attribute nodes of the context item;
children accessor: ?children as node* - returns the child nodes of the context item;
URI accessor: ?uri as uri - returns the URI that can be used to address the context item;
number accessor: ?number as number - returns the numeric value if the type is a numeric type or a measure type;
unit accessor: ?unit as qname - returns the unit of a measure as qname;
datetime components accessor: ?year, ?month, ?day, ?week-day, ?hour, ?minute, ?second, ?millisecond as integer - returns the corresponding component of the datetime type as int;
color components accessor: ?r, ?g, ?b as integer - returns the r, g, or b component of a color as int;
source accessor: ?source as string - returns the serialized source text of the context item;

(Some accessors, including ?root, ?parent, ?attributes, ?children and ?uri, are not implemented in the current beta release.)
For consistency, the accessors are defined on all node types, although they may return an empty value on certain data types.
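The idea of accessors that are defined everywhere but return an empty value where they do not apply can be sketched in Python. The Measure and Color classes and the function names are hypothetical; None stands in for the empty value, and the functions only mirror the behaviour described for ?number, ?unit and the color components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    number: float
    unit: str


@dataclass(frozen=True)
class Color:
    r: int
    g: int
    b: int


def number_of(item):
    """Numeric value of a numeric or measure item, else empty (None)."""
    if isinstance(item, Measure):
        return item.number
    if isinstance(item, (int, float)) and not isinstance(item, bool):
        return item
    return None


def unit_of(item):
    """Unit of a measure, else empty (None)."""
    return item.unit if isinstance(item, Measure) else None


def rgb_of(item):
    """The (r, g, b) components of a color, else empty (None)."""
    return (item.r, item.g, item.b) if isinstance(item, Color) else None


length = Measure(12.5, "cm")
orange = Color(255, 165, 0)
```

Calling unit_of(42) or rgb_of(length) returns None instead of raising an error, so a caller can apply any accessor to any item.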

4. Data Model Serialization

Documents in various formats (XML, XHTML, HTML, JSON, MIME, CSV, plain text, etc.) are mapped into and processed under the unified data model during script evaluation. Candle's data model can be seen as the superset of all these source formats. The input format differences only matter during serialization and deserialization.

An input document is serialized back into its original format by default. Candle ensures that no data in the data model is lost during serialization and deserialization. However, non-significant syntactic information (such as whitespace or the case of HTML element names) might be normalized during the process.
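A round trip of this kind can be illustrated with Python's standard json module, used here as an analogy rather than as Candle's serializer. The sample document is hypothetical; the point is that the data survives the round trip while non-significant whitespace is normalized.

```python
import json

# Source text with irregular, non-significant whitespace.
source_text = '{ "title" :   "Candle",\n  "tags": ["xml",  "json"] }'

model = json.loads(source_text)      # deserialize into an in-memory model
round_tripped = json.dumps(model)    # serialize back to the same format

data_preserved = json.loads(round_tripped) == model
text_changed = round_tripped != source_text
```

After the round trip the text differs from the input, but parsing it again yields an identical model.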

Candle supports transformation of one document format to another, but there might be data loss in such a transformation, due to the limitations of the target format. For example, namespace information is lost when you serialize into HTML format.

Some of the Candle data types cannot be serialized into a markup document:


A. Candle Data Types vs. XQuery Data Types

While Candle data types are closely based on XQuery and XML Schema data types, there are substantial differences between them. After the comparison, you'll see that Candle's data model is cleaner and more general. For your convenience, the diagram of the XQuery Type Hierarchy is included here:

Figure 2. XQuery Type Hierarchy

Types that have direct correspondence between Candle and XQuery are:
Though these types have the same data model, their literal syntax in Candle and XQuery can be different. You should refer to the Candle Markup Reference for the detailed syntax.

There are a few new types introduced by Candle that do not have corresponding XQuery and XML Schema data types:

Some data types defined in XQuery and XML Schema are excluded by Candle: