- To be as similar to Perl 6 rules as possible
- To reduce complexity where it yields no substantial benefit and where removing it does not come at a great cost to compatibility
- Speed of parsing and execution
- A regex is a sequence of atomic assertions about patterns, very similar to a Perl 5 regular expression.
- An assertion is code embedded within a regex that returns true or false; the regex continues to match only when it returns true.
- A subexpression is a reference from one named regex to another.
- A token is a regex which matches in a single pass (no backtracking) or fails.
- A rule is a regex which defaults to considering whitespace significant.
- Tokens, rules and bare regexes are all optionally named, allowing subexpressions to use those names for reference.
- All of the above may be referred to as elements of a grammar and are technically methods. They must either be declared as part of a class that implements (does) the "grammar" role, or be implicitly associated with the global "_regex" grammar. For example, this code implicitly defines a method on _regex and invokes it: if "foo" ~~ regex{f} {...}
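The difference between a bare regex and a token (backtracking versus a single pass) can be illustrated outside Sand. The following is a minimal Python sketch, not Sand's implementation: the pattern representation, the names match_regex and match_token, and the requirement that the whole string be consumed are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical pattern form: (character, starred); ("a", True) stands for a*.
Pattern = List[Tuple[str, bool]]


def match_regex(pattern: Pattern, text: str, i: int = 0, p: int = 0) -> bool:
    """Backtracking match: a starred element may give back characters."""
    if p == len(pattern):
        return i == len(text)
    ch, starred = pattern[p]
    if not starred:
        return i < len(text) and text[i] == ch and match_regex(pattern, text, i + 1, p + 1)
    # Greedy star: take the longest run, then retry with shorter runs.
    end = i
    while end < len(text) and text[end] == ch:
        end += 1
    for stop in range(end, i - 1, -1):
        if match_regex(pattern, text, stop, p + 1):
            return True
    return False


def match_token(pattern: Pattern, text: str) -> bool:
    """Single-pass match: a starred element never gives characters back."""
    i = 0
    for ch, starred in pattern:
        if starred:
            while i < len(text) and text[i] == ch:
                i += 1
        elif i < len(text) and text[i] == ch:
            i += 1
        else:
            return False
    return i == len(text)


# The pattern a*ab applied to "aaab".
PATTERN = [("a", True), ("a", False), ("b", False)]
print(match_regex(PATTERN, "aaab"))  # True: a* backs off by one character
print(match_token(PATTERN, "aaab"))  # False: a* consumed every "a"
```

The token version is cheaper because it never revisits input, but it rejects strings that only match when an earlier element gives something back.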
class Sand :does(grammar) {
    method same_category($letter, $category by reference) {
        my $letter_cat = $letter.unicode_block_category();
        if $category {
            assert($letter_cat == $category);
        } else {
            $category = $letter_cat;
        }
    }
    method is_token_unicode() -> (!bool) {
        my $number = undef;
        my $other = undef;
        for self.match(0).str().list() -> ($letter) {
            if $letter.isdigit() {
                self.same_category($letter, $number);
            } else {
                self.same_category($letter, $other);
            }
        }
        return True;
    }
    token identifier {
        [ <alpha> | '_' ] <alphanum>* { .is_token_unicode() }
    }
    rule scalar { '$'<identifier> }
}
This code defines the "identifier" and "scalar" regexes that are part of Sand's grammar. Notice that a grammar class can define methods like any other class and those methods can be invoked from within the body of a regex.
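For readers who want to try the idea behind is_token_unicode, here is a rough Python sketch of the same check: all digits must share one category and all other characters must share another. It rests on several assumptions. Python's standard library has no Unicode block lookup, so rough_category uses the first word of the character's Unicode name as a stand-in for unicode_block_category(). The underscore is skipped as neutral. The function returns False where the Sand code would fail an assert.

```python
import unicodedata
from typing import Optional


def rough_category(ch: str) -> str:
    """Hypothetical stand-in for unicode_block_category():
    the first word of the Unicode name, e.g. LATIN, GREEK, DIGIT."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_token_unicode(identifier: str) -> bool:
    """True if all digits share one category and all other characters share one."""
    digit_cat: Optional[str] = None
    other_cat: Optional[str] = None
    for ch in identifier:
        if ch == "_":
            continue  # assumption of this sketch: underscore fits any category
        cat = rough_category(ch)
        if ch.isdigit():
            if digit_cat is None:
                digit_cat = cat      # first digit fixes the digit category
            elif cat != digit_cat:
                return False         # mixed digit categories
        else:
            if other_cat is None:
                other_cat = cat      # first non-digit fixes the other category
            elif cat != other_cat:
                return False         # mixed letter categories
    return True


print(is_token_unicode("foo_bar42"))   # True: Latin letters, ASCII digits
print(is_token_unicode("a\u03b1"))     # False: Latin "a" mixed with Greek alpha
```

Tracking digits and non-digits separately mirrors the two variables $number and $other in the Sand method.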