Saturday, April 26, 2008

Scala's Pattern Matching = Visitor Pattern on Steroids

Since I had my first adventure in Scala a month and a half ago, I've been delving more deeply into Scala and functional programming in general. One of the things I ran across is Pattern Matching. Apparently, it's a key-feature in functional programming languages because it facilitates definition of recursive functions in a way that maps closely to functions in lambda calculus.

Let's take Fibonacci for an example:

f(0) = 0
f(1) = 1
f(n) = f(n-1) + f(n-2)

This is how you'd define it with Scala's Pattern Matching:

def f(n: Int) : Int = n match {
  case 0 => 0
  case 1 => 1
  case n => f(n-1) + f(n-2)
}

Here is another example of Pattern Matching that is a bit more advanced:

abstract class Expr
case class Num(n: Int) extends Expr
case class Sum(l: Expr, r: Expr) extends Expr
case class Prod(l: Expr, r: Expr) extends Expr

def evalExpr(e: Expr): Int = e match {
  case Num(n) => n
  case Sum(l, r) => evalExpr(l) + evalExpr(r)
  case Prod(l, r) => evalExpr(l) * evalExpr(r)
}

def printExpr(e: Expr) = e match {
  case Num(n) => print(" " + n + " ")
  case Sum(l, r) => printExpr(l); print("+"); printExpr(r)
  case Prod(l, r) => printExpr(l); print("x"); printExpr(r)
}

What we have above is an abstract class to represent math expressions, and three implementations to represent numbers, sum operation, and product operation.

For example, the math expression "1 + (2 x 3)" can be composed using the expression classes as follows:

var expr1 = new Sum(new Num(1), new Prod(new Num(2), new Num(3)))

The syntax may seem verbose, but Scala's operator overloading can simplify it. I won't go over that right now. Let's focus instead on pattern matching.

So, we have two functions to evaluate expressions and print expressions.

Here are examples of using them on expr1:
evalExpr(expr1) => 7
printExpr(expr1) outputs 1 + 2 x 3

Now, what does the above example remind you of in the Object-Oriented world?

You guessed it! The Visitor Design Pattern by the Gang of Four. At first glance, I thought pattern matching was heresy by the OO standards as it looks like the dreaded switch statement that often breaks the open/closed principle and causes maintenance hell. After pondering it for a while though, I changed my mind and decided to see pattern matching as its own separate thing, which happens to provide another way of accomplishing the goals of the Visitor Pattern.

Now in Java, how would this have been implemented with the Visitor Pattern?

abstract class Expr {
  public <T> T accept(ExprVisitor<T> visitor) {visitor.visit(this);}
}

class Num extends Expr {
  private int value;

  public Num(int value) {this.value = value;}

  public int getValue() {return value;}
}

abstract class BiCompositeExpr extends Expr {
  private Expr left;
  private Expr right;

  protected BiCompositeExpr(Expr left, Expr right) {
    this.left = left;
    this.right = right;
  }

  public Expr getLeft() {return left;}
  public Expr getRight() {return right;}
}

class Sum extends BiCompositeExpr {
  protected Sum(Expr left, Expr right) {super(left, right);}
}

class Prod extends BiCompositeExpr {
  protected Prod(Expr left, Expr right) {super(left, right);}
}

interface ExprVisitor<T> {
  T visit(Num num);
  T visit(Sum sum);
  T visit(Prod prod);
}

class EvalExpr implements ExprVisitor<Integer> {
  public Integer visit(Num num) {
    return num.getValue();
  }

  public Integer visit(Sum sum) {
    return sum.getLeft().accept(this) + sum.getRight().accept(this);
  }

  public Integer visit(Prod prod) {
    return prod.getLeft().accept(this) * prod.getRight().accept(this);
  }
}

class PrintExpr implements ExprVisitor<Void> {
  public Void visit(Num num) {
    print(" " + num.getValue() + " ");
    return null;
  }

  public Void visit(Sum sum) {
    sum.getLeft().accept(this); print("+"); sum.getRight().accept(this);
    return null;
  }

  public Void visit(Prod prod) {
    prod.getLeft().accept(this); print("*"); prod.getRight().accept(this);
    return null;
  }
}

As you can see, each case statement in the Scala example maps to a visit method in the Java example. So, another way of looking at the Visitor Pattern is that it's simply a special case of pattern matching, which matches only on the type of method parameters.

Of course, if we add a new expression type in Java, such as Div, we have to add a new visit method for that type in every visitor. Likewise, to handle Div in Scala, we would also have to add a new case statement in every Scala function that works with expressions. This is a famous trade-off for using the Visitor Pattern as it enables you to define new operations easily, but makes it difficult to add new types to visit.

Now, I am sure everyone noticed how the amount of code written with Scala's Pattern Matching was a tiny fraction of that written with Visitor Pattern in Java!!!

Why?
  • Scala's case classes provide a shortcut for creating classes with a constructor and getters, saving you from all the manual labor provided you just define the signature of the constructor
  • Scala's use of recursive functions instead of visitors accomplishes the same goals while saving you from creating all the boiler-plate code for the Visitor Pattern (e.g. Visitor superclass and double-dispatch accept method on expressions)
  • Scala's use of pattern matching with case classes and recursion allows you to evaluate values from expression objects without having to call getters or visitor accept methods, thus yielding a much more concise syntax
So as you can see; in a way, Scala's Pattern Matching is Visitor Pattern on steroids!!!

3 comments:

murphee said...

I think you'll find that the Visitor pattern is really just an implementation of double dispatch, which is a special case of multi dispatch, ie. deciding which function to call by considering _all_ arguments.
Also, if you go in the other direction, you'll find that OOP's virtual methods are really single dispatch, ie. deciding which method to call based on the first argument (the "self" or "this" argument).

I go into more depth on this in "Hell hath no fury like Polymorphism scorned...":
http://www.jroller.com/murphee/date/20080211

Annas "Andy" Maleh said...

Thanks for the link Murphee. That was an enjoyable read.

Of course, the famous trade-off is that while single-dispatch makes it easier to add new data types, yet harder to add new operations; double-dispatch makes it easier to add new operations, yet harder to add new data types.

wtc said...

What about the run-time costs? The visitor-based approach has a fixed cost of two virtual function calls. The match-based approach has a variable cost proportional to the number of subject classes, right?