Tuesday, April 30, 2013

The frustration of Unicode in the Perl command-line

It's a small nit, but Perl is such an amazing command-line tool that I find it frustrating: you can't use Unicode brace-like tokens as delimiters for quote-like operators on the Perl command-line. Let me back up and explain...

"qq" is one of Perl's "quote-like operators". It does the same thing as double quotes, with two differences:

  • Any non-whitespace character can follow it as the delimiter, and the string runs up to the next occurrence of that character.
  • If the delimiter is an opening bracket such as {, (, [ or <, the string runs to the matching close bracket instead, and balanced pairs nest, so qq{you can put {} inside} works.
I use qq a lot on the command-line because it doesn't conflict with the shell's quoting. For example:

perl -le 'eval qq{print qq{You should type: perl -le "print qq{Hello, world}"}}'

There's also q (single-quote, with no interpolation, much like ordinary Python string literals), qr (a compiled regular expression, similar to re.compile(r'...') in Python) and qx (run a shell command and capture its output, like $(...) in shell or subprocess.Popen in Python).

Okay, so back to my gripe:

You can use Unicode matching brace-like tokens in Perl, à la qq《foo》, but the program must have "use utf8" on a line before the Unicode program text. That rules out the command-line, where even with multiple -e flags the whole program is read at once.

It's a minor loss, since you can nest qq{qq{qq{}}} and it works just fine, plus you have many balanced tokens to work with: qq{qq(qq[qq<>])}, so it's mostly just annoying.
