Tuesday, June 19, 2012

An ORM for Perl 6

I've been using sqlalchemy in Python quite a bit of late, but as I play around more and more with Perl 6, it seems to me that transplanting the sqlalchemy model of ORM into Perl 6 would be a mistake. Certainly, the basic concept that you tie objects to database tables and rows still makes sense, but the way you gain access to those objects will probably want to be very different in the two languages.

Here's a bit of sqlalchemy from an online tutorial:

mary = session.query(User).selectfirst(users.c.name=='Mary')

So, we have a session object. I think that makes sense for both languages. Then there's a query to which we pass our User table spec. Perfectly sane. Then there's this "selectfirst" thing, which is really shorthand for "select" with the given arguments and then the "first" method called on that result.

In this example, we're not actually comparing "users.c.name" to "Mary". Instead, users.c.name overloads the comparison operator and returns an object that tells selectfirst to perform an equality comparison in the WHERE clause between the "name" column and the string "Mary".
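To make the mechanism concrete, here's a toy sketch of the trick in plain Python. These are not SQLAlchemy's real classes, just an illustration of the idea: the overloaded == doesn't compare anything, it builds a description of a comparison.

```python
# Toy sketch (not SQLAlchemy's actual classes): comparing a column
# yields an expression object, not True/False.
class Column:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Overloaded: build an expression instead of comparing.
        return BinaryExpression(self, "=", other)

class BinaryExpression:
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def to_sql(self):
        # Real libraries bind the right-hand value as a parameter.
        return f"{self.left.name} {self.op} :param"

expr = Column("name") == "Mary"
print(expr.to_sql())   # name = :param
```

The query builder can then walk objects like this to emit the WHERE clause, with the right-hand value ("Mary") passed separately as a bound parameter.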

This is about as close as Python can get to a mini-language because you can't override the parser. However, Perl 6 lets you do just that...

method mary($session) is parsed(::ORM::first) {
  select($session)
  from self
  where name == "Mary"  
}

But, isn't the point of an ORM to remove all of that ugly SQL?! Well, no. The point of an ORM is to give you native language controls over databases in a portable way. Our mini-language might look like SQL, but is actually just Perl with a different parsing filter, essentially the same as the filter provided to the "selectfirst" command in the Python example.

If you were to write this out in Perl 6, it might then turn into something like:

method mary($session) {
  select($session, :source(self),
    :filter(:lhs(self!coldef<name>), :rhs("Mary"), :op("=="))).first;
}


But if we're talking to an SQL database, why not use something that looks more like SQL? I'm still noodling with this and thinking about what makes sense, but when I sit down to write an ORM for Perl 6 (if someone else hasn't already), this will definitely be my starting point.

Monday, April 16, 2012

Python list comprehension: so close

Perl has some very nice list management features, but Python's list comprehension is clearly a more powerful approach for simple list transformation. For example, in Perl, when you want to get the list of the "apple" key from a list of hashes:

  @apples = map { $_->{apple} } @fruitstuff;

In Python that looks so much cleaner as a list comprehension:

  apples = [ d['apple'] for d in fruitstuff ]

However, Python starts to get messy when you begin to contemplate pruning the result. Again, the Perl for trimming out undefined values:

  @apples = grep { defined $_ }
            map { $_->{apple} } @fruitstuff;

and the Python:

  apples = [ d['apple'] for d in fruitstuff
             if d['apple'] is not None ]

That's one way to do it, but really, it's not what the Perl is doing. This code re-indexes fruitstuff to get the value after first indexing it to check that it's defined. The Perl, on the other hand, indexes the hash only once, getting all mapped values and then trimming out the undefined entries. In this example, either approach might work, but the difference matters in cases where indexing might be expensive or have side-effects. So, what does the actually-equivalent Python look like?

  apples = [ x for x in [ d['apple'] for d in fruitstuff ]
                   if x is not None ]
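The difference is observable, not just theoretical. Here's a hypothetical dict subclass that counts lookups (my own illustration, not anything from the standard library): the single-comprehension version indexes every passing entry twice, while the nested version indexes each entry exactly once.

```python
# Hypothetical dict subclass that counts __getitem__ calls.
class CountingDict(dict):
    lookups = 0

    def __getitem__(self, key):
        CountingDict.lookups += 1
        return dict.__getitem__(self, key)

fruitstuff = [CountingDict(apple=1),
              CountingDict(apple=None),
              CountingDict(apple=3)]

CountingDict.lookups = 0
apples = [d['apple'] for d in fruitstuff if d['apple'] is not None]
single = CountingDict.lookups   # 5: three tests plus two re-fetches

CountingDict.lookups = 0
apples = [x for x in [d['apple'] for d in fruitstuff] if x is not None]
nested = CountingDict.lookups   # 3: each dict indexed exactly once
```

With a lookup that had side-effects, the two spellings would diverge in behavior, not just cost.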

Now, functionally these two approaches are identical, but when I read the Perl, I get a certain clear-headedness from observing that the first function has a name ("grep"), a block (functor, closure, or whatever you want to call it), and then a parameter which is itself a function ("map") with its own block and a list.

When I look at the Python, I have to unpeel it visually and transition in and out of the inner block. Nesting the two list comprehensions makes them quite difficult to read because of the postfix logic operator.

Python's list comprehensions are superior in most ways to Perl's handling, mind you, and I would be doing a disservice to Python to suggest otherwise. But it's these minor sticking points that make me wonder why Python chooses to be an almost-functional (in the computer science sense) language. Sure, you can use Python's limited lambda here with its map function, but it doesn't gain you anything, since you still need a list comprehension for the reduction, so it just ends up complicating the inner loop. (Note that in Perl, you can use a single map statement to do all of the work, since map can reduce the size of the returned list, but that wasn't the point of the example.)
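For what it's worth, a generator expression gets you the nested version's one-index-per-entry behavior without materializing the intermediate list (same hypothetical fruitstuff data as above):

```python
fruitstuff = [{'apple': 1}, {'apple': None}, {'apple': 3}]

# The inner generator is consumed lazily by the outer comprehension,
# so each dict is indexed once and no intermediate list is built.
apples = [x for x in (d['apple'] for d in fruitstuff) if x is not None]
print(apples)   # [1, 3]
```

It doesn't fix the readability complaint, but it does fix the intermediate-list cost.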

I like both Perl and Python. They both have their places in my toolbox, but I'll always long for a language that can be all of the promise of both languages without the warts of either...

Ideally, we would combine Python's ability to name and scope the iterator variable (not to be confused with the iterable parameter) with Perl's chained "Schwartzian transforms".

In pseudo-code this would be:

  resultlist = filter variable: functor, input_iterable

So, in Python, something like:

  apples = filter x: x is not None, \
     [ d['apple'] for d in fruitstuff ]

where filter has pretty much exactly the same syntax as lambda, but with an additional, positional parameter after the body.
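As it happens, Python's existing filter() builtin already comes close to this shape today: the lambda's parameter plays the role of the named iterator variable, and the iterable follows as a positional argument. A sketch with the same hypothetical data:

```python
fruitstuff = [{'apple': 1}, {'apple': None}, {'apple': 3}]

# filter(function, iterable): the lambda parameter x is the named
# iterator variable; list() forces the lazy filter object.
apples = list(filter(lambda x: x is not None,
                     [d['apple'] for d in fruitstuff]))
print(apples)   # [1, 3]
```

The difference from the proposal is mostly cosmetic: the body lives inside a lambda rather than standing on its own, and the result is lazy until you force it.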

In Perl, we would need to add the named iterator variable:

  @apples = grep $x { defined $x }
     map $d { $d->{apple} } @fruitstuff;

Much cleaner on both counts, but of course, the two communities are too busy treating each other like Baptists and Lutherans at a PTA meeting to see the value in each other's approaches...

Friday, March 9, 2012

Why case-insensitive filesystems need to go away

It happens all the time. Someone using a Mac for the first time, but who is used to Linux or other Unix systems, comes across a problem caused by the mix of case-sensitive and case-insensitive filesystem handling. It also happens in reverse. So the lines are drawn predictably: the people most comfortable with MacOS defend case insensitivity, and those most familiar with traditional Unix and Unix-like systems defend case sensitivity.

The reality is that case sensitivity is the only sane option, but it has nothing to do with tradition or Unix history. It has to do with the idea of upper and lower case and what they mean.

In the modern day, most systems support not just the Western (specifically American) subset of characters called ASCII in filesystems, but very nearly every character that is in use around the world. These expanded character sets exist within a framework called Unicode, and in the Unicode world, case is rather a lot more complex. For example, on my Mac, I just created a file with the name "一". This is the Japanese and Chinese character for the word "one". In fact, it can be used interchangeably with the numeral "1". So why is it that, on my Mac, I can create a file named "一" and another file in the same directory named "1"?

Oh, but that's just the start.

There are full-width versions of all of the ASCII characters, like "Ｄ", the full-width "D". On a Mac, you can create a file whose name is "Ｄ" and another whose name is "D" in the same directory. Not only are these the same conceptual letter, but they look almost exactly the same in a directory listing. So why? Because the Apple filesystem people rightly determined that trying to fold every variation of every "glyph" onto every other variant of that same glyph was not only prohibitively complicated, but guaranteed to be wrong in many circumstances (in some cases, for example, 一 doesn't mean the same thing as 1, and you could reasonably use 1 as a way to resolve ambiguity). The "wrong" behavior of mapping upper- and lower-case variants to each other is no less wrong, but it was Apple tradition, and breaking with it would have created problems for Apple users. So they kept it, but they weren't foolish enough to try to expand it to every one of the possible glyph-folding permutations.

So, the next time something breaks because a user checked a file in from a Linux system with a name that conflicted with an existing, but upcased filename, before you blame that user or the Linux filesystem semantics, consider that the OS you're using is preserving part of a historical glitch that should never have been perpetuated in the first place.

For a more complete treatment of the complexities of case-folding, let me direct you to the Wikipedia "Letter case" article, which contains a section on Unicode case folding and further points out the complexities of certain edge cases:

  • The German letter ß exists only in lowercase (but see Capital ß), and is capitalized as "SS".
  • The Greek letter Σ has two different lowercase forms: "ς" in word-final position and "σ" elsewhere.
  • The Cyrillic letter Ӏ usually has only a capital form, which is also used in lowercase text.
  • Unlike most Latin-script languages that use uppercase "I" and lowercase "i", Turkish has dotted and dotless I independent of case.

I wasn't even aware of a couple of these. I can't imagine how you would try to handle the German case. That's just ugly.
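A few of these edge cases are easy to poke at with Python's Unicode-aware string handling (the file-naming behavior itself is up to the filesystem, of course; this just shows why folding is lossy):

```python
import unicodedata

# German ß uppercases to the two-character "SS", so the round trip
# is lossy: the ß never comes back.
assert "Straße".upper() == "STRASSE"
assert "STRASSE".lower() == "strasse"

# Full-width "Ｄ" (U+FF24) is a distinct code point from ASCII "D";
# only compatibility normalization (NFKC) maps one onto the other.
assert "\uFF24" != "D"
assert unicodedata.normalize("NFKC", "\uFF24") == "D"

# CJK "一" ("one") and "1" are unrelated code points; neither case
# folding nor normalization maps between them.
assert unicodedata.normalize("NFKC", "一") != "1"
```

Any filesystem that promises case insensitivity has to pick answers to all of these, and every answer is wrong for somebody.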