Tuesday, January 21, 2014

Sand: Rules

Sand is a programming language that I introduced in a previous article. Sand rules are not completely fleshed out, but here are the primary design goals:

  • To be as similar to Perl 6 rules as possible
  • To reduce complexity where it does not yield substantial benefit and where reduction in complexity does not come at tremendous cost to compatibility
  • To parse and execute quickly

Specifically, these are the top-level elements:

  • A regex is a sequence of atomic assertions about patterns, very similar to a Perl 5 regular expression.
  • An assertion is embedded code within a regex which returns true or false and matches on true.
  • A subexpression is a reference from one named regex to another.
  • A token is a regex which matches in a single pass (no backtracking) or fails.
  • A rule is a regex which defaults to considering whitespace significant.
  • Tokens, rules and bare regexes are all optionally named, allowing subexpressions to use those names for reference.
  • All of the above may be referred to as elements of a grammar and are technically methods. They must either be declared as part of a class that implements (does) the "grammar" role or be implicitly associated with the global "_regex" grammar. For example, this code implicitly defines a method on _regex and invokes it: if "foo" ~~ regex{f} {...}
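
The difference between a bare regex and a token can be sketched outside of Sand. The following Python is only an illustration, assuming that a Sand token behaves like a Perl 6 token (each piece matches greedily once and never gives characters back). The helper names and the tiny "steps" pattern format are hypothetical, invented for this example, and both matchers require the whole string to match for simplicity.

```python
import re


def match_with_backtracking(text):
    """Ordinary regex: 'a*' can give back characters so the final 'a' matches."""
    return re.fullmatch(r"a*a", text) is not None


def match_as_token(steps, text):
    """Single-pass matcher: each step consumes greedily and is never revisited.

    `steps` is a hypothetical mini-format: a list of (char, starred) pairs,
    where starred=True means "zero or more of char".
    """
    pos = 0
    for char, starred in steps:
        if starred:
            # Consume every matching character; nothing is given back later.
            while pos < len(text) and text[pos] == char:
                pos += 1
        elif pos < len(text) and text[pos] == char:
            pos += 1
        else:
            return False  # no backtracking: the whole token fails here
    return pos == len(text)


A_STAR_A = [("a", True), ("a", False)]  # the pattern a* a
A_STAR_B = [("a", True), ("b", False)]  # the pattern a* b
```

With these definitions, match_with_backtracking("aaa") succeeds, while match_as_token(A_STAR_A, "aaa") fails because a* has already consumed every "a". A pattern such as a* b never needs to give characters back, so it matches either way, which is why tokens suit lexical elements like identifiers.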

Here are some examples:


class Sand :does(grammar) {
    method same_category($letter, $category by reference) {
        my $letter_cat = $letter.unicode_block_category();
        if $category {
            assert($letter_cat == $category);
        } else {
            $category = $letter_cat;
        }
    }
    method is_token_unicode() -> (!bool) {
        my $number = undef;
        my $other = undef;
        for self.match(0).str().list() -> ($letter) {
            if $letter.isdigit() {
                self.same_category($letter, $number);
            } else {
                self.same_category($letter, $other);
            }
        }
        return True;
    }
    token identifier {
        [ <alpha> | '_' ] <alphanum>* { .is_token_unicode() }
    }
    rule scalar { '$'<identifier> }
}

This code defines the "identifier" and "scalar" regexes that are part of Sand's grammar. Notice that a grammar class can define methods like any other class and those methods can be invoked from within the body of a regex.
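
To make the intent of is_token_unicode concrete, here is a rough Python analogue. It is a sketch under several assumptions, not a port: Python's standard library has no Unicode block lookup, so the hypothetical helper rough_script uses the first word of each character's Unicode name as a stand-in for unicode_block_category(); the underscore is skipped, which the Sand code does not do; and a mismatch returns False instead of tripping an assertion.

```python
import unicodedata


def rough_script(ch):
    """Hypothetical stand-in for Sand's unicode_block_category().

    Uses the first word of the Unicode character name, e.g. "LATIN" or
    "GREEK". This approximates a script, not a true Unicode block.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_consistent_identifier(identifier):
    """Check that all digits share one group and all other characters another."""
    digit_group = None
    other_group = None
    for ch in identifier:
        if ch == "_":
            continue  # assumption: treat the underscore as neutral
        group = rough_script(ch)
        if ch.isdigit():
            if digit_group is None:
                digit_group = group      # first digit fixes the group
            elif group != digit_group:
                return False             # digits from mixed groups
        else:
            if other_group is None:
                other_group = group      # first letter fixes the group
            elif group != other_group:
                return False             # letters from mixed groups
    return True
```

Under these assumptions, is_consistent_identifier("foo_42") is accepted, while "f\u03bfo" (a Greek omicron between two Latin letters) and "x1\u0661" (an ASCII digit followed by an Arabic-Indic digit) are rejected. Digits and letters are tracked separately, as in the Sand method, so ASCII digits can still follow non-Latin letters.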

The Sand programming language

The following is a direct cut-and-paste from my specification of the Sand programming language on my Wiki (which is now down and may never be revived, depending on how much free time I have). This copy was fetched from Google's cache.