Candle Pattern Reference

Version : Candle 0.10
Published date : Nov 12, 2011

1. Introduction

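One unique feature of Candle is that it provides a unified pattern language that can match three major kinds of data model. The three types of patterns in Candle are:
- string patterns, which match strings of characters;
- node patterns, which match nodes;
- sequence patterns, which match sequences of items.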
The commonality among these three types of patterns is that they all share the same context-free grammar. The grammar has only a few easy-to-understand constructs: rule reference, choice ("|"), concatenation (","), exclusion ("-") and repetition ("*" for zero or more, "+" for one or more, "?" for optional). But when these constructs are combined, they can define very complex patterns.

The only difference among these three types of patterns is the terminals of the grammar: the terminals in string patterns are characters, the terminals in node patterns are nodes, and the terminals in sequence patterns are items.
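The idea that only the terminals differ can be illustrated outside Candle with a minimal Python sketch of pattern combinators. This is not Candle's implementation: it assumes PEG-style semantics (ordered choice, greedy repetition, no backtracking), and all function names are hypothetical. The same choice, concatenation, exclusion and zero-or-more repetition combinators are applied once to characters and once to arbitrary items.

```python
from typing import Callable, Optional, Sequence

# A matcher takes an input sequence and a start position and returns the end
# position of its match, or None. The input may hold any kind of terminal.
Matcher = Callable[[Sequence, int], Optional[int]]

def term(pred: Callable[[object], bool]) -> Matcher:
    """One terminal that satisfies pred."""
    def m(seq, i):
        return i + 1 if i < len(seq) and pred(seq[i]) else None
    return m

def concat(a: Matcher, b: Matcher) -> Matcher:
    """a, b"""
    def m(seq, i):
        j = a(seq, i)
        return None if j is None else b(seq, j)
    return m

def choice(a: Matcher, b: Matcher) -> Matcher:
    """a | b, tried in order without backtracking."""
    def m(seq, i):
        j = a(seq, i)
        return j if j is not None else b(seq, i)
    return m

def exclude(a: Matcher, b: Matcher) -> Matcher:
    """a - b: a's match is rejected if b matches exactly the same span."""
    def m(seq, i):
        j = a(seq, i)
        if j is None or b(seq[i:j], 0) == j - i:
            return None
        return j
    return m

def many(a: Matcher) -> Matcher:
    """a* (greedy); stops if a makes no progress."""
    def m(seq, i):
        while True:
            j = a(seq, i)
            if j is None or j == i:
                return i
            i = j
    return m

def full_match(p: Matcher, seq: Sequence) -> bool:
    return p(seq, 0) == len(seq)

# String pattern ("_" | letter), ("_" | letter | digit)*; terminals: characters
underscore = term(lambda c: c == "_")
ident = concat(choice(underscore, term(str.isalpha)),
               many(choice(underscore, term(str.isalnum))))

# Sequence pattern int, string*; terminals: items
ints_then_strings = concat(term(lambda x: type(x) is int),
                           many(term(lambda x: isinstance(x, str))))

# Exclusion: "'", (char - "'")*, "'"
quote = term(lambda c: c == "'")
quoted = concat(quote, concat(many(exclude(term(lambda c: True), quote)), quote))
```

Only `term` looks at the terminals; the other combinators work unchanged whether the input is a string or a list.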

2. String Pattern and Grammar

Candle's grammar notation is self-hosted: it is defined by a grammar written in the notation itself. The notation is closely based on EBNF.

There are two types of text grammar rules: lexical rules and syntax rules. Each rule starts with the qualified name of the rule, followed by the token '=' (for a lexical rule) or ':=' (for a syntax rule), then the production of the rule, and is finally terminated by the token ';'. For example, Name below is a lexical rule and Prologs is a syntax rule:
Name = ("_" | letter), ("_" | "-" | "." | letter | digit)*;
Prologs := (NamespaceProlog | ImportProlog | ExternalRoutine | NativeRoutine)*,
        (Grammar | Schema | Structure | Class | Function | Template | Method | GlobalVarDeclaration)*;
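Telling the two rule types apart only requires looking at the token after the rule name. The following Python sketch, with hypothetical helpers split_rules and classify, splits a grammar body into rules at each ';' and classifies every rule. It assumes comments have already been removed and that string literals are double-quoted, so that a ';' or '=' inside quotes is skipped.

```python
import re

# A qualified name (optional "prefix:" part) followed by '=' or ':='.
RULE_HEAD = re.compile(r'\s*([A-Za-z_][\w.-]*(?::[A-Za-z_][\w.-]*)?)\s*(:=|=)')

def split_rules(body: str) -> list:
    """Split a grammar body at every ';' that is not inside a "..." literal."""
    rules, start, in_string = [], 0, False
    for i, ch in enumerate(body):
        if ch == '"':
            in_string = not in_string
        elif ch == ';' and not in_string:
            rules.append(body[start:i + 1].strip())
            start = i + 1
    return rules

def classify(rule: str) -> tuple:
    """Return (rule name, 'lexical' for '=' or 'syntax' for ':=')."""
    m = RULE_HEAD.match(rule)
    if m is None:
        raise ValueError(f"not a rule: {rule!r}")
    return m.group(1), "lexical" if m.group(2) == "=" else "syntax"

rules = split_rules('Name = ("_" | letter), ("_" | "-" | "." | letter | digit)*;\n'
                    'Prologs := (NamespaceProlog | ImportProlog)*;')
print([classify(r) for r in rules])
# [('Name', 'lexical'), ('Prologs', 'syntax')]
```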


Below is the grammar of the Candle grammar notation, written in that notation:

<?csp1.0?>
!! the grammar rules of Candle grammar
grammar candle-grammar {
    root = grammar-root;
    space = (s | line-comment | block-comment)*;
    grammar-root := "grammar", qname, "{", (grammar-rule)+, "}" ;
    grammar-rule := lexical-rule | syntax-rule;
    lexical-rule := qname, "=", string-pattern, ";" ;
    syntax-rule := qname, ":=", string-pattern, ";" ;
    string-pattern := (choice-pattern | concatenation-pattern | exclusion-pattern)*;
    choice-pattern := repetition-pattern, "|", repetition-pattern;
    concatenation-pattern := repetition-pattern, ",", repetition-pattern;
    exclusion-pattern := repetition-pattern, "-", repetition-pattern;
    repetition-pattern := pattern-term, ("*" | "?" | "+")?;
    pattern-term := string | rule-reference | pattern-group;
    rule-reference := qname;
    pattern-group := "(", string-pattern, ")";
    !! rule productions like string, line-comment and block-comment are omitted
}

If you understand EBNF or RELAX NG, the grammar above should be self-explanatory. There are a few things to take note of:
- A line comment starts with "!!" and runs to the end of the line. In the rule below, "&cr;" and "&lf;" denote the carriage-return and line-feed characters:
      LineComment = "!!", (char - ("&cr;" | "&lf;"))*;
- Parts of the Candle language that are not implemented in this release are marked in gray in the grammar.
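The exclusion in the LineComment rule, any character except carriage return or line feed, corresponds to a negated character class in a regular expression. A minimal Python sketch with a hypothetical strip_line_comments helper, assuming "!!" never appears inside a string literal:

```python
import re

# LineComment = "!!", (char - ("&cr;" | "&lf;"))*;
# "!!" followed by any run of characters other than CR and LF.
LINE_COMMENT = re.compile(r"!![^\r\n]*")

def strip_line_comments(source: str) -> str:
    """Remove every line comment, keeping the line breaks that end them."""
    return LINE_COMMENT.sub("", source)

print(repr(strip_line_comments("root = grammar-root; !! entry point\nspace = s*;")))
# 'root = grammar-root; \nspace = s*;'
```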

3. Node Pattern and Schema

A schema is a collection of node pattern rules, with the following grammar:

schema := "schema", qname, "{", (schema-rule)+, "}" ;
schema-rule := qname, "=", node-pattern, ";" ;
node-pattern := (node-choice-pattern | node-concatenation-pattern | node-exclusion-pattern)*;
node-choice-pattern := node-repetition-pattern, "|", node-repetition-pattern;
node-concatenation-pattern := node-repetition-pattern, ",", node-repetition-pattern;
node-exclusion-pattern := node-repetition-pattern, "-", node-repetition-pattern;
node-repetition-pattern := node-pattern-term, ("*" | "?" | "+")?;
node-pattern-term := text-term | comment-term | data-term | attribute-term | element-term | node-rule-reference | node-pattern-group;
text-term := "text", ("{", string-pattern, "}")?;
comment-term := "comment", ("{", string-pattern, "}")?;
data-term := "data", ("{", sequence-pattern, "}")?;
attribute-term := "attribute", (qname, ("{", sequence-pattern, "}")? )?;
element-term := "element", (qname, ("{", node-pattern, "}")? )?;
node-rule-reference := qname;
node-pattern-group := "(", node-pattern, ")";

An example schema is shown below:
<?csp1.0?>
namespace c = 'go:candle.style';
schema c:style-document {
    root = c:element-folio | c:element-window | c:element-scene;
    c:element-folio = element sty:folio { c:element-window | c:element-scene };
    c:element-window = element sty:window { c:block-content };
    c:element-scene = element sty:scene { c:block-content };
    c:block-content = (element sty:div { text? } | element sty:br)*;
}
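To show what the schema accepts, here is a hand-written Python sketch with one function per rule. It is not how Candle validates documents. Assumptions: the Element class is hypothetical, namespaces are ignored (only local names are compared), text nodes are plain strings, and an element term without a body (sty:br) is read as accepting any content.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Element:
    name: str  # local name only; namespaces are ignored in this sketch
    children: List[Union["Element", str]] = field(default_factory=list)

def is_block_content(nodes) -> bool:
    # c:block-content = (element sty:div { text? } | element sty:br)*
    for n in nodes:
        if not isinstance(n, Element):
            return False  # bare text is not allowed here
        if n.name == "div":
            # text? : at most one text node and nothing else
            if len(n.children) > 1 or any(not isinstance(c, str) for c in n.children):
                return False
        elif n.name != "br":
            return False  # br's content is left unconstrained (assumption)
    return True

def is_window(n) -> bool:
    return isinstance(n, Element) and n.name == "window" and is_block_content(n.children)

def is_scene(n) -> bool:
    return isinstance(n, Element) and n.name == "scene" and is_block_content(n.children)

def is_folio(n) -> bool:
    # element sty:folio { c:element-window | c:element-scene }: exactly one child
    return (isinstance(n, Element) and n.name == "folio" and len(n.children) == 1
            and (is_window(n.children[0]) or is_scene(n.children[0])))

def is_style_document(root) -> bool:
    # root = c:element-folio | c:element-window | c:element-scene
    return is_folio(root) or is_window(root) or is_scene(root)

doc = Element("folio", [Element("window", [Element("div", ["hello"]), Element("br")])])
print(is_style_document(doc))  # True
```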

4. Sequence Pattern

A sequence pattern is a pattern defined over a sequence of items. Its grammar is:

seq-pattern := (seq-choice-pattern | seq-concatenation-pattern | seq-exclusion-pattern)*;
seq-choice-pattern := seq-repetition-pattern, "|", seq-repetition-pattern;
seq-concatenation-pattern := seq-repetition-pattern, ",", seq-repetition-pattern;
seq-exclusion-pattern := seq-repetition-pattern, "-", seq-repetition-pattern;
seq-repetition-pattern := seq-pattern-term, ("*" | "?" | "+")?;
seq-pattern-term := type-term | seq-rule-reference | seq-pattern-group;
type-term := "empty" | "error" | "boolean" | "byte" | "ubyte" | "short" | "int" | "uint" | "long" | "ulong" | "float" | "double" | "measure" | "datetime" | string-term | "binary" | "qname" | "id" | "uri" | "atomic" | "sequence" | text-term | comment-term | data-term | attribute-term | element-term;
string-term := "string", ("{", string-pattern, "}")?;
seq-rule-reference := qname;
seq-pattern-group := "(", seq-pattern, ")";

Sequence patterns are used to match the content of some nodes (such as data and attribute nodes) and in match expressions.
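One way to sketch sequence matching outside Candle is to map each item to a one-letter type code and run an ordinary regular expression over the resulting code string. In this Python sketch the TYPE_CODES table is hypothetical and covers only boolean, int, double (Python float) and string; the sequence pattern `int, string*, double?` becomes the regex `is*d?`.

```python
import re

# Hypothetical one-letter codes for a few Candle type terms. Looking up the
# exact type keeps bool apart from int (bool is a subclass of int in Python).
TYPE_CODES = {bool: "b", int: "i", float: "d", str: "s"}

def type_code(item) -> str:
    return TYPE_CODES.get(type(item), "?")

def seq_match(items, code_regex: str) -> bool:
    """Match a whole item sequence against a regex over type codes."""
    codes = "".join(type_code(x) for x in items)
    return re.fullmatch(code_regex, codes) is not None

# int, string*, double?
INT_STRINGS_DOUBLE = r"is*d?"

print(seq_match([1, "a", "b", 2.5], INT_STRINGS_DOUBLE))  # True
print(seq_match(["a", 1], INT_STRINGS_DOUBLE))            # False
```

This trick works only because each type term matches exactly one item; a string body such as `string { c:uri }` would need an extra check on the item's characters.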

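In this release, string is the only atomic type that can have an optional body; the body is a string pattern that matches the characters of the string. The example below defines a small grammar and uses its rules in match expressions: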

<?csp1.0?>
namespace c = '';
grammar c:sample-grammar {
    root = s?, (c:float | c:integer | c:uri), s?;
    c:integer = ("+" | "-")?, ("0" | ((digit - "0"), digit*));
    c:float = ("+" | "-")?, ( (digit+, ".", digit*) | (".", digit+) );
    c:uri = "'", (char - "'")*, "'";
}
function main() {
    { "'uri'" match string { c:uri } } !! true
    { "uri" match string { c:uri } } !! false
    { "+3.57" match string { c:float } } !! true
    { ".57pt" match string { c:float } } !! false
}
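The rules of c:sample-grammar can be cross-checked with Python regular expressions. This is a translation sketch, not Candle's matcher. It assumes that digit is an ASCII digit, that digits means one or more digits, and that s is a whitespace character. re.fullmatch plays the role of `match string { ... }`, which, as the ".57pt" case shows, must consume the whole string.

```python
import re

# Regex translations of the rules in c:sample-grammar.
INTEGER = r"[+-]?(?:0|[1-9][0-9]*)"           # c:integer
FLOAT = r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)"    # c:float
URI = r"'[^']*'"                              # c:uri: quote, non-quotes, quote
ROOT = rf"\s*(?:{FLOAT}|{INTEGER}|{URI})\s*"   # root

def matches(text: str, pattern: str) -> bool:
    """Whole-string match, like `text match string { ... }`."""
    return re.fullmatch(pattern, text) is not None

# The four match expressions from main():
for text, pattern in [("'uri'", URI), ("uri", URI), ("+3.57", FLOAT), (".57pt", FLOAT)]:
    print(text, matches(text, pattern))
# 'uri' True
# uri False
# +3.57 True
# .57pt False
```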

Appendices

A. References