Saturday, July 7, 2007

A tangent

Everybody else is doing it, so why not me?

Take this hypothetical piece of code:

int maxSalary = 0;
for (Employee employee : employees) {
    if (Role.PROGRAMMER.equals(employee.getRole())) {
        int salary = employee.getSalary();
        if (salary > maxSalary) {
            maxSalary = salary;
        }
    }
}

Now translate it into Factor (assuming the employees sequence is on the stack):

[ employee-role programmer = ] subset [ employee-salary ] map supremum

Note: this will fail if there are no programmers, because that's how supremum works. A simple workaround would be to precede supremum with 0 add, which puts a 0 on the end of the sequence returned by map.
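For comparison, here is a rough sketch of the same filter, map, max pipeline in Java, assuming a version with the Stream API (Java 8 or later). The Employee and Role types are hypothetical stand-ins, since the loop above never defines them, and orElse(0) plays the part of the 0 add workaround by supplying a default when there are no programmers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MaxProgrammerSalary {
    // Hypothetical stand-ins for the types used in the loop above.
    enum Role { PROGRAMMER, MANAGER }

    static final class Employee {
        private final Role role;
        private final int salary;

        Employee(Role role, int salary) {
            this.role = role;
            this.salary = salary;
        }

        Role getRole() { return role; }
        int getSalary() { return salary; }
    }

    // Keep the programmers, take their salaries, find the largest.
    // An empty selection yields 0 instead of failing.
    static int maxProgrammerSalary(List<Employee> employees) {
        return employees.stream()
                .filter(e -> Role.PROGRAMMER.equals(e.getRole()))
                .mapToInt(Employee::getSalary)
                .max()
                .orElse(0);
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
                new Employee(Role.PROGRAMMER, 90),
                new Employee(Role.MANAGER, 150),
                new Employee(Role.PROGRAMMER, 120));
        System.out.println(maxProgrammerSalary(employees)); // prints 120
        System.out.println(maxProgrammerSalary(Collections.<Employee>emptyList())); // prints 0
    }
}
```

Each stage hands its result straight to the next, so no intermediate value needs a name, which is the same shape as the Factor one-liner.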

Whoa! Functional programming! (By which I mean programming using functions as first class values, Steve Dekorte, which is what the name means.) I've never seen that in my life before! Let's make a big deal out of this, and submit at least three articles to Reddit about it! I need to find something better to do than read Reddit, don't I? I promise, I'll do a substantive post on Unicode normalization forms soon...

Well, something I did notice here. The Factor version looks remarkably similar to the Io version:

employees select(role == "programmer") max(salary)

In Io, certain methods like select, max, map, and others, allow a number of forms, for convenience. One form looks like list(1, 2, 3) map(elt, elt+1), but for cases where the map is simply a message passed to each object, map(+1) will suffice. There is no currying going on here; +1 is merely a message which can be passed to an object. So something like select(role == "programmer") is equivalent to select(p, p role == "programmer"), and max(salary) is equivalent to max(p, p salary).

Messages can easily be composed by tacking them on in sequence; in Io, they are separated only by whitespace. It often works out well to chain a whole bunch of messages on to one object, where each message is sent to the object returned by the last method call. After a while, these chained postfix method calls start to look like Forth or Factor. You don't actually need as many named variables when you can do this.

I learned Io before I learned Factor. I got very comfortable with the system of composing messages, but felt like there was something missing. Imagine you have a List created by foo bar baz and you want to make that list, but appending 0 on the end. So, you can have foo bar baz append(0). This is just an expression, and doesn't affect the variables in the environment at all. But, say you decide it shouldn't add 0 to the end; instead it should add the last element. To do this, your code would have to look more like (x := foo bar baz) append(x last). You need to introduce a new variable into the environment just because it's referenced more than once. There's a good chance that that variable will never need to be referenced again, yet it needs to be named.

I felt like there should be a way to reference something twice without having to name it. And the only way to do that is if it's possible to store more than one object on the postfix chain. This is how I learned to understand stacks and reverse Polish notation: it's like composing operations the way Io does, but there can be more than one object. You can't do this in Io, but in Factor, you could always do foo bar baz dup peek add (where peek is like Io's last and add is like Io's append).
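To make the stack idea a bit more concrete, here is a toy sketch in Java of how dup peek add can use a value twice without naming it. Everything in it is hypothetical: StackSketch and its methods are made up for illustration and only imitate the three Factor words on a list of integers. This is not how Factor is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class StackSketch {
    // The data stack: each "word" pops its inputs and pushes its outputs.
    private final Deque<Object> stack = new ArrayDeque<>();

    StackSketch push(Object value) {
        stack.push(value);
        return this;
    }

    // dup ( x -- x x ): a second reference to the top value, with no name.
    StackSketch dup() {
        stack.push(stack.peek());
        return this;
    }

    // Stand-in for Factor's peek ( seq -- elt ): the last element of the sequence.
    @SuppressWarnings("unchecked")
    StackSketch lastOf() {
        List<Integer> seq = (List<Integer>) stack.pop();
        stack.push(seq.get(seq.size() - 1));
        return this;
    }

    // Stand-in for Factor's add ( seq elt -- newseq ): a copy with elt appended.
    @SuppressWarnings("unchecked")
    StackSketch add() {
        Integer elt = (Integer) stack.pop();
        List<Integer> seq = (List<Integer>) stack.pop();
        List<Integer> result = new ArrayList<>(seq);
        result.add(elt);
        stack.push(result);
        return this;
    }

    Object top() {
        return stack.peek();
    }

    // Plays the role of "dup peek add" applied to an existing sequence.
    @SuppressWarnings("unchecked")
    static List<Integer> appendLast(List<Integer> seq) {
        return (List<Integer>) new StackSketch().push(seq).dup().lastOf().add().top();
    }

    public static void main(String[] args) {
        System.out.println(appendLast(Arrays.asList(1, 2, 3))); // prints [1, 2, 3, 3]
    }
}
```

The list is used twice, once as the thing to append to and once as the source of the last element, yet the only name that appears is the parameter of the helper method. The chain itself never introduces a variable, because dup keeps the second copy on the stack.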

I don't mean to criticize Io, especially as this property of dataflow applies to all languages with application- or mutation-style dataflow. I have some issues with it, but I have no interest in starting any sort of flame war. Still, the correspondence here is really interesting. But I like being able to program without having to name my values, whether I use them once or more than once, no matter where the data flows. It's not very important, though, and occasionally causes problems, particularly in combinators. But it's still good.
