AJS's Software Development Blog: 2014

Sand is a programming language that I introduced in a previous article. Sand's rules are not yet completely fleshed out, but here are the primary design goals:

To be as similar to Perl 6 rules as possible
To reduce complexity where that complexity does not yield substantial benefit, and where the reduction does not come at a tremendous cost to compatibility
To be fast to parse and execute

Specifically, these are the top-level elements:

A regex is a sequence of atomic assertions about patterns, very similar to a Perl 5 regular expression.
An assertion is embedded code within a regex that returns true or false; the assertion matches only when it returns true.
A subexpression is a reference from one named regex to another.
A token is a regex that matches in a single pass (no backtracking) or fails.
A rule is a regex that treats whitespace as significant by default.
Tokens, rules and bare regexes are all optionally named, allowing subexpressions to refer to them by those names.
All of the above may be referred to as elements of a grammar, and all of them are technically methods. They must either be declared as part of a class that implements (does) the role "grammar", or be implicitly associated with the global "_regex" grammar (e.g. this code implicitly defines a method on _regex and invokes it: if "foo" ~~ regex{f} {...}).
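
The practical difference between a token and a plain regex is backtracking. The Python sketch below is hand-written for illustration only. It is not Sand code and makes no claim about how Sand's engine is implemented. It matches the pattern `a* a` (any number of a's followed by one more a) in two modes. The `backtrack` flag and the function name `match_run_then_char` are hypothetical. Turning backtracking off is similar in spirit to Perl 6's `:ratchet` behavior for tokens.

```python
def match_run_then_char(text, ch, backtrack):
    """Match the pattern  ch* ch  at the start of text.

    The ch* loop is greedy. With backtrack=True it behaves like a plain
    regex and gives characters back until the trailing ch can match; with
    backtrack=False it behaves like a token and fails instead.
    Returns the length of the match, or None on failure.
    """
    # Greedy phase: consume as many copies of ch as possible.
    n = 0
    while n < len(text) and text[n] == ch:
        n += 1
    # Try the trailing ch at position n, then (only if backtracking is
    # allowed) at n-1, n-2, ... down to 0.
    for k in range(n, -1, -1):
        if k < len(text) and text[k] == ch:
            return k + 1
        if not backtrack:
            return None
    return None


print(match_run_then_char("aaa", "a", backtrack=True))   # 3
print(match_run_then_char("aaa", "a", backtrack=False))  # None
```

With backtracking, "aaa" matches in full because the greedy loop gives one "a" back to the trailing `a`. As a token, the same pattern can never match, because the loop has already consumed every "a" and is not allowed to return any. Python's own `re` module backtracks, so `re.match(r"a*a", "aaa")` also matches all three characters.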

Here are some examples:

class Sand :does(grammar) {
    # Records the first category seen in $category, then asserts
    # that every later letter belongs to that same category.
    method same_category($letter, $category by reference) {
        my $letter_cat = $letter.unicode_block_category();
        if $category {
            assert($letter_cat == $category);
        } else {
            $category = $letter_cat;
        }
    }
    # All digits in the matched text must share one category,
    # and all non-digits must share one category.
    method is_token_unicode() -> (!bool) {
        my $number = undef;
        my $other = undef;
        for self.match(0).str().list() -> ($letter) {
            if $letter.isdigit() {
                self.same_category($letter, $number);
            } else {
                self.same_category($letter, $other);
            }
        }
        return True;
    }
    token identifier {
        [ <alpha> | '_' ] <alphanum>* { .is_token_unicode() }
    }
    rule scalar { '$'<identifier> }
}

This code defines the "identifier" token and the "scalar" rule, both part of Sand's grammar. Notice that a grammar class can define methods like any other class, and those methods can be invoked from within the body of a regex.
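
For readers who want to experiment, here is a rough Python analogue of the Sand grammar class above. It is a sketch under several assumptions. The post does not define `unicode_block_category()`, so a small, deliberately partial table of real Unicode block ranges stands in for it. The character class `[^\W\d]` approximates `[ <alpha> | '_' ]`. The names `SandGrammar` and `unicode_block` are hypothetical. Unlike the Sand version, `is_token_unicode` returns False instead of calling `assert`, which is how an embedded assertion signals a failed match.

```python
import re

# Real Unicode block ranges, but only a handful of them: a partial stand-in
# for Sand's unicode_block_category(), whose exact behavior is unspecified.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0600, 0x06FF, "Arabic"),
]


def unicode_block(ch):
    """Return the block name for ch, or "Other" if it is outside the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


class SandGrammar:
    # [^\W\d] means "word character that is not a digit": a letter or '_'.
    # Together with \w* this approximates [ <alpha> | '_' ] <alphanum>*.
    _IDENTIFIER = re.compile(r"[^\W\d]\w*")

    def is_token_unicode(self, word):
        """Digits must share one block, and so must all non-digits."""
        digit_blocks = {unicode_block(c) for c in word if c.isdigit()}
        other_blocks = {unicode_block(c) for c in word if not c.isdigit()}
        return len(digit_blocks) <= 1 and len(other_blocks) <= 1

    def identifier(self, text, pos=0):
        """Token: match once, run the assertion, never retry a shorter match.

        Returns the end position on success, or None on failure.
        """
        m = self._IDENTIFIER.match(text, pos)
        if m is None or not self.is_token_unicode(m.group(0)):
            return None
        return m.end()

    def scalar(self, text, pos=0):
        """Rule: a literal '$' followed by the identifier subexpression."""
        if not text.startswith("$", pos):
            return None
        return self.identifier(text, pos + 1)


g = SandGrammar()
print(g.scalar("$x1 = 5"))          # 3 (end of "$x1")
print(g.identifier("p\u0430ypal"))  # None: Latin "p" mixed with Cyrillic "\u0430"
```

The mixed-script check is the point of the embedded assertion: an identifier that looks like "paypal" but contains a Cyrillic letter is rejected. Because identifier behaves as a token, a failed assertion fails the whole match rather than retrying with a shorter identifier.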


Tuesday, January 21, 2014

Sand: Rules
