Tuesday, July 5, 2016

Perl 6: What's that you've got there? On stringification.

Perl 5 and Python are the two languages I'm most familiar with outside of Perl 6 at this point, and in both of them stringification is a pretty simple affair. When you have an object and you want to print it, here's the Perl 5:

    use feature 'say';  # say is available from Perl 5.10 on
    say $thing;
And here's the Python:

    print thing
Notice that no hint is given to the compiler about what this "thing" is or how to render it to the output. In Perl 5, this is an implicit feature of all scalar values: they have a string representation that can always be fetched. It might not be useful (and in many cases isn't; a hash reference, for instance, stringifies to something like HASH(0x...)), but it'll be there.

In Python, print automatically converts its parameters to strings as if you had written:

    print str(thing)
though it can be more complex than that, due to encoding issues. By default, an object's __str__ falls back to the __repr__ method on the object's class, which is (as in Perl 5) not terribly useful, but good object authors know to override it and/or the stringification method (__str__) with something more useful.
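
As a small illustration of the Python side, here is a minimal sketch of a class that overrides both methods. Point and Bare are hypothetical names invented for this example, and the code uses Python 3's print() function rather than the Python 2 print statement shown above.

```python
class Point:
    """Hypothetical example class, not taken from any library."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Unambiguous, developer-oriented form
        return "Point({!r}, {!r})".format(self.x, self.y)

    def __str__(self):
        # Readable form, used by print() and str()
        return "({}, {})".format(self.x, self.y)


class Bare:
    """Overrides nothing, so str() falls back to the default repr()."""


p = Point(1, 2)
b = Bare()

print(str(p))             # (1, 2)
print(repr(p))            # Point(1, 2)
print(str(b) == repr(b))  # True: the default __str__ delegates to __repr__
```

If only __repr__ is defined, str() and print() use it as well, which is why overriding __repr__ alone is often enough for simple classes.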

In Perl 6, the picture is both more complicated and more functional out of the box.

There are three kinds of stringification:
  1. Coercing to Str (usually via the ~ operator, either as a unary prefix or as binary concatenation)
  2. The .gist method
  3. The .perl method
The first is a straightforward "what are you as a string?" It makes no promise to preserve every aspect of an object. So, for example, when matching a regular expression and getting back a match object, you might want to print the whole match. This is done by coercing the match to a string:

    given "fool" {
        say ~ m/fo+/
    }
This prints "foo". Notice how much information this operation throws away: the match object also holds the position within the original string, any sub-matches, and so on, but stringification discards all of that and prints only the "most salient" string element.

The second form of stringification is called a "gist" and it's accessed by using the .gist method. Here's an example:

    given "fool" {
        say m/f(o+)/.gist
    }
prints:

    「foo」
     0 => 「oo」
In a gist, the stringified version of an object tries to strike a balance between preserving internal information necessary for debugging or other analysis and readable text. The general rule appears to be:

  • Throw away internal state that's not relevant to the textual understanding of the object (e.g. the "from" and "to" attributes of a regular expression match)
  • Use the corner-brackets (「 」) from Asian scripts to enclose internal string value(s)
  • Recursively descend data structures, calling each of their gist methods
Finally, there's the third option, the .perl method. It is much like Perl 5's Data::Dumper and a bit like Python's repr, except that it aggressively tries to represent the object as executable Perl 6 code. Unless there's ephemeral state involved, evaluating the output of the .perl method on an object should yield a copy of the object.
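
The same round-trip idea can be sketched with Python's repr, which is the comparison drawn above. This is only an analogy under stated assumptions, not a rendering of what .perl does internally: the data dictionary is made up for the example, and ast.literal_eval is used in place of a full eval because it only accepts literals and is therefore safe on untrusted text.

```python
import ast

# Hypothetical sample data, loosely modelled on the match example
data = {"matched": "foo", "groups": ["oo"], "span": (0, 3)}

dumped = repr(data)                  # source-like text for the structure
restored = ast.literal_eval(dumped)  # parse that text back into objects

print(dumped)
print(restored == data)   # True: equal contents
print(restored is data)   # False: a copy, not the same object
```

The round trip works here because every value is a built-in literal. For arbitrary classes it holds only if the author writes a __repr__ that is valid constructor syntax, and then a full eval would be needed rather than literal_eval.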

Languages in general are moving towards this sort of "structured intuition" about the representation of objects, rather than a more ad hoc, on-demand way of representing data. It allows authors to carefully control how their objects are represented, but also gives simple objects all of the tools that they need out of the box.