Candle Markup Reference

Version : Candle 0.13
Published date : Jun 13, 2013
Table of contents
1. Introduction
2. Namespace Declaration
3. Literal Value
4. Text, Data and Comment Node
5. Element and Attribute Node
6. Object Notation
7. Markup Document
8. Summary
Appendices
A. References
B. Candle Markup vs. MicroXML
C. Candle Markup vs. HTML5
D. Candle Markup vs. JSON
E. Candle Markup vs. YAML
F. Candle Markup vs. Several Alternatives
G. A List of Candle Markup Examples

1. Introduction

Candle Markup is a subset of the Candle language that is used as a document format for static data. The syntax of Candle Markup is based on XML but differs from it in many ways, which are documented in detail below.

The major advantages of Candle Markup over XML are:
Candle Markup is an ideal format for general-purpose data serialization. It works well for both structured object data and mixed text content. It has a terse and readable syntax as well as a clean, strongly typed data model, which makes it much better than many existing textual serialization formats such as XML, JSON and YAML.

2. Namespace Declaration

namespace-declaration  =  "namespace", (default-namespace-declaration | prefixed-namespace-declaration), (",", prefixed-namespace-declaration)*, ";" ;
default-namespace-declaration  =  fully-expanded-namespace | candle-prefixed-namespace | uri;
prefixed-namespace-declaration  =  name, "=", (fully-expanded-namespace | prefixed-namespace | uri);
fully-expanded-namespace  =  "ns", (":", name)+;
candle-prefixed-namespace  =  "candle", (":", name)+;
prefixed-namespace  =  name, (":", name)+;
name  = ("_" | letter | name-escape), ("_" | "-" | "." | letter | digit | name-escape)*;
name-escape  = ("%", hex-digit, hex-digit)+;    !! similar to URI escape sequence

Here's an example of a namespace declaration statement:
namespace ns:default:name:space, sty=candle:style, svg='http://www.w3.org/2000/svg';

The default namespace declaration does not have a prefix, and must precede any prefixed namespace declaration. A prefixed namespace declaration has two parts, the namespace prefix and the namespace value, separated by the '=' token.

A namespace value in Candle can take two syntax forms. The recommended form has a syntax similar to Java package names: a hierarchy of names, starting with ns and separated by the ':' token. Candle does not mandate that domain names be used after the top-level name ns, but you should use them so that the names are globally unique. The top-level name ns stands for namespace; it makes it easy to distinguish a fully expanded name from a prefixed name. The other accepted form is the URI syntax, which is kept for backward compatibility with XML. The top-level name candle is a reserved prefix that expands to ns:org:candlescript.

Qualified names in Candle are hierarchical as well. Besides the prefixed syntax, Candle Qname can also use the fully expanded syntax, starting with ns.

qname  =  fully-expanded-qname | prefixed-qname | name;
fully-expanded-qname  = "ns", (":", name)*, ":", name;
prefixed-qname = prefix, (":", name)*, ":", name;
prefix  = name - ("ns" | "xmlns");

Here's an example element in the same document containing the above namespace declaration:
<foo bar=nil:baz svg:d=(M,100,50,L,75,110,Z) sty:fee:foe=123>
ns:org:candlescript:style:qux
</foo>

Qualified names in a markup document are resolved based on the following simple rules:
The above rules apply to any Qname in the document, whether it is an element name, an attribute name, or a Qname in a literal value.

When a Candle Markup document is exported as XML document, the Qname in the new namespace notation can be easily mapped back into XML URI syntax:
So a Qname like ns:org:candlescript:style:qux in Candle is mapped into namespace URI 'ns:org:candlescript:style' and local name qux in XML. The top level name ns can be considered as an unregistered URI scheme name.
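The mapping described above can be sketched in Python. This is only an illustration, not part of any Candle implementation: the function names expand_qname and to_xml_name are hypothetical, and the prefix-expansion step assumes that a declared prefix is simply replaced by its namespace value. URI-valued namespaces and unprefixed names are not handled here.

```python
# Expansion of the reserved "candle" prefix, as stated in the text.
CANDLE_NS = "ns:org:candlescript"


def expand_qname(qname, prefixes=None):
    """Expand a prefixed Qname into its fully expanded ns:... form.

    Assumption: a declared prefix is replaced by its namespace value.
    Names without a prefix are returned unchanged.
    """
    prefixes = dict(prefixes or {})
    prefixes.setdefault("candle", CANDLE_NS)
    head, _, rest = qname.partition(":")
    if head == "ns" or not rest:
        return qname  # already fully expanded, or a plain name
    if head not in prefixes:
        raise KeyError("undeclared namespace prefix: " + head)
    ns = prefixes[head]
    # A namespace value may itself start with the reserved "candle" prefix.
    ns_head, _, ns_rest = ns.partition(":")
    if ns_head == "candle":
        ns = CANDLE_NS + (":" + ns_rest if ns_rest else "")
    return ns + ":" + rest


def to_xml_name(expanded_qname):
    """Split a fully expanded Qname into (namespace URI, local name)."""
    namespace, _, local = expanded_qname.rpartition(":")
    return namespace, local
```

With these definitions, to_xml_name("ns:org:candlescript:style:qux") yields the namespace 'ns:org:candlescript:style' and the local name qux, matching the example in the text.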

Several namespace prefixes are reserved in Candle, and you cannot use them as general namespace prefixes. They are:

Candle namespace declaration differs from XML in a few ways:

3. Literal Value

Whitespace and lexical comment:

space  =  ("&sp;" | "&tb;" | "&cr;" | "&lf;" | line-comment | block-comment)+;
line-comment  =  "!!", (char - ("&cr;" | "&lf;"))*, ("&cr;" | "&lf;");
block-comment  =  "!*", ((char - ("!" | "*")) | ("!", char - "*") | ("*", char - "!") | block-comment)*, ("!" | "*")?, "*!";

Besides normal whitespace, Candle Markup also supports lexical comments similar to the comments in most scripting languages. These lexical comments are stripped during parsing and are not preserved in the data model. They are allowed anywhere normal whitespace characters are allowed, e.g.: <element !*comment*! attribute=value/>. This can be useful, for example, to comment out an attribute.

There are two types of lexical comment, one is line comment that starts with !! and extends to the end of the line, and the other is the block comment that starts with !* and ends with *!. And block comments can be nested.
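To make the nesting rule concrete, here is a minimal Python sketch of comment stripping. It is an illustration under simplifying assumptions, not the Candle parser: the function name strip_lexical_comments is hypothetical, and the sketch does not tokenize string, URI, datetime or binary literals, so comment markers inside such literals would be treated as comments too.

```python
def strip_lexical_comments(src):
    """Remove !! line comments and nested !* ... *! block comments.

    Each block comment is replaced by a single space, since a lexical
    comment is allowed wherever whitespace is allowed.
    """
    out = []
    i, depth, n = 0, 0, len(src)
    while i < n:
        two = src[i:i + 2]
        if two == "!*":
            depth += 1  # open a (possibly nested) block comment
            i += 2
        elif two == "*!" and depth > 0:
            depth -= 1  # close the innermost block comment
            i += 2
            if depth == 0:
                out.append(" ")
        elif depth > 0:
            i += 1  # skip a character inside a block comment
        elif two == "!!":
            # Line comment: skip up to, but not including, the line end.
            while i < n and src[i] not in "\r\n":
                i += 1
        else:
            out.append(src[i])
            i += 1
    if depth:
        raise ValueError("unterminated block comment")
    return "".join(out)
```

A counter is enough to track nesting because block comments have a single opening and a single closing delimiter.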

The literal values in Candle are a unification of those defined in XML Schema and CSS. They are:
literal = empty | boolean | integer | decimal | double | float | measure | string |
literal-qname | uri | color | id | datetime | binary;
empty    :=    ("(", ")") | ("'", "'");
boolean = "true" | "false";
integer = ("+" | "-")?, ("0" | ((digit - "0"), digit*)), integer-type-indicator?;
integer-type-indicator = "B" | "UB" | "S" | "US" | "I" | "UI" | "L" | "UL";
decimal = ("+" | "-")?, ( (digits, ".", digit* ) | (".", digits) );
double = "nan" | "infinity" | (decimal, ("e" | "E"), ("+" | "-")?, digits);
float = double, float-type-indicator;
float-type-indicator = "F";
string = "&dq;", (char-span | entity)*, "&dq;";
char-span = (char - ("&amp;" | "&dq;"))+;
entity = "&amp;", (name | ("#", digits) | ("#x", hex-digits)), ";";
uri = "'", (char - "'")*, "'";
literal-qname = qname;
measure = (integer | decimal | double), unit;
unit = "px" | "pt" | "pc" | "em" | "ex" | "cm" | "mm" | "in" |
"deg" | "grad" | "rad" |
"ms" | "s" | "min" | "h" | "d" | "wk" | "mo" | "yr" |
"%";  !! token "%" is for type percentage
color = "#", hex-digits;
!! 3, 4, 6 or 8 hex-digits
datetime = "!", (date-time | date | time | year-month | year | month-day | month | day), "!";
!! datetime syntax is closely based on XML Schema dateTime
date-time = "-"?, yyyy, "-", mo, "-", dd, ("T"|"t"|" "), time;
!! we allow T, t and single space ' ' between date and time
date = "-"?, yyyy, "-", mo, "-", dd, timezone?;
time = hh, ":", mm, (":", ss)?, (".", ms)?, timezone?;
!! seconds in time is optional
year-month = yyyy, "-", mo;
year = yyyy, ("y" | "Y" | timezone);
month-day = "--", mo, "-", dd, timezone?;
month = "--", mo, timezone?;
day = "---", dd, timezone?;
timezone = (("+" | "-"), hh, ":", mm) | "Z" | "z";
!! we allow Z and z
yyyy = digit, digit, digit, digit;
mo = digit, digit;               !! month range 1~12
dd = digit, digit;               !! day range 1~31
hh = digit, digit;               !! hour range 0~24
mm = digit, digit;               !! minute range 0~59
ss = digit, digit;               !! second range 0~59
ms = digit, digit, digit;        
!! we support only milliseconds in this beta release
binary = "!", s?, hex-digit*, s?, "!";

Some literal value examples:
Suffixes are used to denote different integer and floating-point types:
B - byte, i.e. 8-bit integer;
UB - unsigned byte;
S - short, i.e. 16-bit integer;
US - unsigned short;
I - int, i.e. 32-bit integer;
UI - unsigned int;
L - long, i.e. 64-bit integer;
UL - unsigned long;
F - 32-bit floating point
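The suffix table above can be sketched as a small Python validator. This is an illustration only: the names INTEGER_TYPES, integer_range and parse_integer are hypothetical, and the value ranges assume the usual two's-complement limits for each bit width, which the text does not state explicitly. An integer without a suffix is returned without a range check, since its default type is not specified here.

```python
import re

# Suffix -> (bit width, signed?), taken from the list above.
INTEGER_TYPES = {
    "B": (8, True), "UB": (8, False),
    "S": (16, True), "US": (16, False),
    "I": (32, True), "UI": (32, False),
    "L": (64, True), "UL": (64, False),
}

# Follows the integer production: optional sign, no leading zeros,
# optional type indicator.
_INTEGER = re.compile(r"([+-]?(?:0|[1-9][0-9]*))(UB|US|UI|UL|B|S|I|L)?")


def integer_range(suffix):
    """Return (min, max) for a suffix, assuming two's-complement ranges."""
    bits, signed = INTEGER_TYPES[suffix]
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def parse_integer(literal):
    """Parse an integer literal into (value, suffix or None)."""
    match = _INTEGER.fullmatch(literal)
    if match is None:
        raise ValueError("not an integer literal: " + literal)
    value = int(match.group(1))
    suffix = match.group(2)
    if suffix is not None:
        low, high = integer_range(suffix)
        if not low <= value <= high:
            raise ValueError(literal + " is out of range for type " + suffix)
    return value, suffix
```

For example, parse_integer("255UB") is accepted as an unsigned byte, while parse_integer("256UB") raises an error under the assumed range.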

Named Character Entities

Candle supports 3 groups of predefined named character entities:

4. Text, Data and Comment Node

text  =  "&dq;", (char-span | entity)*, "&dq;";
cdata = "<<<", ( cdata )*, ">>>";        !! any char span excluding ">>>" and not ending with ">"
comment = "<!--", ( comment )*, "-->";     !! any char span excluding "--"
data = literal;

Candle has 5 types of nodes: text, data, comment, attribute and element node.

Text Node

Text nodes in Candle must be explicitly quoted with double quote character ". In this way, Candle does not suffer from the whitespace-ambiguity problem in XML.

Consecutive text nodes in Candle are merged as one text node as in XML. For example, the element below contains only one text node:
<element> "this is a text node" "this is the second span of the same text node" </element>

A CData text node in Candle uses a triple-delimiter syntax, similar in spirit to Python's triple-quoted strings. The equivalent of the XML CData section <![CDATA[text]]> in Candle is <<<text>>>.

Data Node

Data nodes are atomic values wrapped in an element.

A text node is actually a special data node that holds a string literal value.

Comment Node

Unlike a lexical comment, a Candle comment node is preserved as a node in the Candle data model after the document is parsed.

Processing instructions in XML documents are treated as comment nodes in Candle.

5. Element and Attribute Node

element  :=  start-tag, id-attr?, attribute*, ("/>" | (">", element-content, end-tag, ">"));
start-tag  =  "<", qname;
end-tag = "</", qname;
attribute := qname, "=", attribute-value; 
id-attr = "#", name;
attribute-value = literal | literal-sequence | literal-array | element | object | anonymous-object;
literal-sequence := "(", (literal | literal-sequence), (",", (literal | literal-sequence))*, ")";
literal-array := "[", (literal | literal-sequence), (",", (literal | literal-sequence))*, "]";
element-content := (element | object | text | cdata | comment | data)*;

The element and attribute syntax in Candle is similar to XML, except that:
Take note that the underlying data models for attributes and elements are slightly different. An attribute holding a list of literal values is still considered one node, whereas in a list of literal values contained in an element, each literal value is a data node by itself. For example, the element below contains 7 data nodes, instead of one data node:
<path> M 100 50 L 75 110 Z </path>

To avoid syntax ambiguity, the anonymous object syntax is allowed only as an attribute value, not as a child node. You have to write the object name when an object is used as a child node, e.g. element { candle:core:object { "..." } }.

Array as Attribute Value

The Candle 0.13 release introduced built-in array support for attribute values. The difference between a sequence and an array is that empty items are collapsed in a sequence but not in an array. For example, the sequence (1, (), 2) contains 2 items, but the array [1, (), 2] contains 3 items. An array can be useful for passing a list of values as parameters to a function, where collapsing of empty items is not desirable. When an array is used as an attribute value, it cannot contain a nested array.
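The difference between the two can be modeled with a short Python sketch. This is only an analogy, not Candle's implementation: the names EMPTY, make_sequence and make_array are hypothetical, the empty value () is represented by an empty Python tuple, and only the collapsing of empty items and the no-nested-array rule are modeled.

```python
EMPTY = ()  # stands in for the Candle empty value ()


def make_sequence(*items):
    """Build a sequence: empty items are collapsed (dropped)."""
    return tuple(item for item in items if item != EMPTY)


def make_array(*items):
    """Build an array: empty items are kept; nested arrays are rejected."""
    if any(isinstance(item, list) for item in items):
        raise ValueError("an array used as attribute value cannot contain a nested array")
    return list(items)
```

Here len(make_sequence(1, EMPTY, 2)) is 2 while len(make_array(1, EMPTY, 2)) is 3, mirroring the (1, (), 2) and [1, (), 2] examples above.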

ID Attribute and ID Shorthand

In Candle, an ID on an element or object has to be defined using the reserved attribute candle:core:id, e.g. <element candle:core:id=ident /> or object { candle:core:id=ident }. The value of the attribute should be a qname, and is normally under the nil namespace.

A syntax shorthand is supported to define ID on an element or object, e.g. <element #ident /> or object #ident {}. Its semantics is the same as the full ID attribute.

6. Object Notation

In the latest release of Candle, a new object notation is introduced:
object  =  normal-object;
normal-object  :=  qname, id-attr?, ("{", attribute*, element-content, "}") - ("{", "}");

Here's an example of the new object notation:
New Object Notation:
    Customer {
        firstName = "John"
        lastName = "Doe"
        phoneNum = "9555-0101"
        address = Address {
            street = "1 Main Street"
            city = "Santa Clara"
            state = "CA"
            zip = "95050"
        }
        "a text node"
    }

Equivalent Element Notation:
    <Customer
        firstName = "John"
        lastName = "Doe"
        phoneNum = "9555-0101"
        address = <Address
            street = "1 Main Street"
            city = "Santa Clara"
            state = "CA"
            zip = "95050" />
    >
        "a text node"
    </Customer>

Similar JavaFX Object Notation:
    Customer {
        firstName: "John";
        lastName: "Doe";
        phoneNum: "9555-0101";
        address: Address {
            street: "1 Main Street";
            city: "Santa Clara";
            state: "CA";
            zip: "95050";
        }
        /* text node not supported*/
    }

This new object notation is just an alternative syntax to the element notation. Its advantage over element notation is that it is terser. It can therefore be more readable for use cases such as configuration files or passing options to functions, where there is only structured data and no mixed text content.

In terms of the data model, an object is treated exactly the same as an element.

The new object notation is also similar to some existing object notations, like JavaFX literal object and Groovy Markup.

Complex Attribute Content

With this new release, an attribute can directly hold an element or object as its content. You can consider such an attribute to be an element node with an extra attribute name attached to it.

One of the limitations of XML is that an attribute cannot hold complex content, which forces XML applications to use an abstract syntax or fall back on elements as a workaround. Such workarounds can be unnatural and inefficient. XML Schema specifies a convention for using lists of values as attribute and element values. However, it has severe syntax ambiguity and is not usable in a general manner.

With Candle's extended model, application designers have the flexibility to choose between attribute and element and can use the most appropriate language construct during modeling.

In current XML applications, you can easily find many instances where the data can be better expressed using this extended attribute notation. For example, the inline CSS style attribute in HTML, the path, transform, fill and stroke attributes in SVG, and many attributes in X3D and scene description languages. Generally, if the application is about structured data rather than mixed text content, then complex attribute content is almost inevitable.
As you can see, in XML syntax, the d or style attribute uses an abstract syntax that can only be understood by the SVG or CSS application, but not by general XQuery or XML Schema processors. In Candle, by contrast, the d and style attributes use a general syntax and have a well-defined data model that can be processed by general Candle query and schema processors.

Anonymous Object

anonymous-object  :=  ("{", attribute*, element-content, "}") - ("{", "}");

An anonymous object is syntactic sugar that allows you to write an object without a name. This can be convenient, especially when it is used as attribute content. It can also be used as a map to hold a list of name-value pairs. Such a data structure is also called an associative array or dictionary in other programming languages.

Note that an anonymous object is only 'anonymous' in syntax. In the data model, it is still given a default object name as candle:core:object.

Unified Data Model

Candle's new data model cleanly unifies the markup data model in XML and the object data model in OOP. It can actually be seen as the superset of the two data models. Compared to the XML data model, it allows an attribute to have complex content; compared to the OOP object model, it allows child nodes to immediately follow the attribute nodes.

Object notations like JSON and JavaFX literal objects can be considered subsets of Candle Object Notation. Every JSON object can be directly mapped into an object in Candle. You can refer to the appendix on the advantages of Candle Object Notation over JSON.

7. Markup Document

root  :=  markup-signature, namespace-declaration?, element-content;
markup-signature = "<?cmk1.0?>";

A Candle Markup document has an opening signature <?cmk1.0?>. Candle Markup documents are normally stored with the extension ".cmk", but this is not required. The MIME type to be used for Candle Markup should be 'text/x-candle'.

Unlike XML, which requires a root element, Candle allows a sequence of nodes (except attribute node) to appear at the root level.

8. Summary

In general, we can say that Candle Markup is an ideal format for general-purpose data serialization, whether it is structured object data or mixed text content. It has a terse and readable syntax as well as a clean, strongly typed data model, making it much better than many textual serialization formats such as XML, HTML, JSON and YAML.

Appendices

A. References


B. Candle Markup vs. MicroXML

James Clark's blog triggered a tsunami of discussion on the XML format. And people are now seriously rethinking XML.

The 3 approaches suggested by James Clark - XML 2.0, XML.next, MicroXML put forward a solid framework to approach the issue. Candle is definitely along the XML.next line. Candle is not compatible with XML; Candle can be considered as a superset of XML; Candle goes beyond XML to unify it with JSON and OOP object data model.

MicroXML shares many common features with Candle on measures to clean up the mess of XML:
Critical things that have been sorted out by Candle Markup, but not by MicroXML are:
Of course some of these features might not be the design goals of MicroXML at all, as it is just a subset of XML.

C. Candle Markup vs. HTML5

Candle Markup has been designed with HTML in mind from the beginning. HTML character entities are directly supported in Candle. CSS basic data types are built-in data types in Candle. Literal values in Candle Markup are not quoted, making it look more like HTML than XML.

Candle Markup is actually better than both the HTML and XML syntax of the HTML5 language. Compared to HTML syntax, Candle's advantages are:

D. Candle Markup vs. JSON

JSON can be considered a poor man's object notation to serialize simple structured data. JavaFX's literal object notation is probably much better than JSON for more formal data representation.

JSON and JavaFX literal objects are just subsets of Candle Object Notation. Every JSON object can be directly mapped into an object in Candle. The advantages of Candle Object Notation are:
You can refer to one of the examples for a comparison:
Example 3: XML vs. JSON vs. Candle Object Notation

E. Candle Markup vs. YAML

YAML's syntax is similar to MIME. Its strength is in its readability. Candle Markup's syntax is based on XML and some prior object notations.

Advantages of Candle Markup over YAML are:

F. Candle Markup vs. Several Alternatives

The following is a feature comparison of Candle Markup against other textual serialization formats, including XML, JSON, YAML and JavaFX object notation. The selected alternatives are not exhaustive, but they are sufficiently representative:

Feature | XML | JSON | YAML | JavaFX Literal Object | Candle

Specific Features
Unicode support | yes | yes | yes | yes | yes
Whitespace non-ambiguity | no | yes | yes | yes | yes
Strongly typed literal values | needs Schema | yes | yes | yes | yes
Extended literal values (like datetime, uri, qname) | needs Schema | no | needs type annotation | needs to use object constructor | yes
Namespace support | yes (but messy) | no | partial (only the type annotation is namespaced) | yes (clean hierarchical ns) | yes (clean hierarchical ns)
Complex attribute content | no (XML Schema defines a general value list syntax, but it is highly ambiguous and not usable at all) | yes | yes | yes | yes
Child node support | yes | no (no direct support) | no (no direct support) | no (no direct support) | yes
Formal data model | yes (but messy) | yes | yes | yes | yes
Schema language | yes (XML Schema is over-complicated; RELAX NG is cleaner, but less used) | no | no | yes (attribute only, no child content model support) | yes (similar to RELAX NG)
Embeddable in programming languages (as structured nodes, not as quoted string) | yes (.Net, Scala, etc.) | yes (JavaScript) | no | yes (JavaFX) | yes (Candle)
Advanced processing (path language, query and update language) | yes (but with overlapping and conflicting features) | no | no | limited (not as high-level as XPath, XQuery) | yes (unified query language)

General Features
Readability | good for mixed text content, but verbose for structured data | good for structured data | good for structured data | good for structured data | good for both (you have object notation; and literal values do not need to be quoted)
Cross platform | yes | yes | yes | yes | yes
Open source | yes | yes | yes | promised (but not delivered yet) | yes
Lightweight runtime | yes (if you only use XML); no (if you start to use XML Schema, XSLT, XQuery, WS, etc.) | yes | yes | no | yes (entire runtime is only 2MB when compressed)
Standards status | W3C standard | RFC standard | no | Oracle only (might become Java standard in future) | not yet
Good for structured data | no | yes | yes | yes | yes
Good for mixed text content | yes | no | no | no | yes

Generally, YAML can be seen as a superset of JSON, and JavaFX literal object can be seen as a superset of YAML. These 3 formats are good for structured data exchange. Candle can be seen as a superset of XML (excluding DTD) and the 3 object formats.

G. A List of Candle Markup Examples

Below are examples showing how Candle Markup can effectively express different kinds of structured data, and how it compares with other formats like XML, HTML, JSON, etc.

Example 1: A Simple Note
Example 2: An iCalendar Record
Example 3: XML vs. JSON vs. Candle Object Notation
Example 4: A Configuration File in Lua
Example 5: An HTML5 SVG Example
Example 6: A MathML Example
Example 7: An Example in POV-Ray SDL
Example 8: An Example in RDF Turtle Notation