From Ruby to Haskell, Part 3: Lazy Evaluation

One of the defining characteristics of Haskell— that you often see listed in what sets it apart from other languages— is that it is the only common language that’s lazy. In fact, laziness has been cited as the feature that keeps Haskell functionally pure. This is because if you can’t tell when things are going to be evaluated, you can’t cause side effects (because you don’t know when they’ll happen either).

What is laziness, what does it mean?

Each programming language uses an evaluation strategy to process code. For instance, Ruby is a “pass-by-value strict” language. This means that parameters to a method/function are evaluated first, and then their results are bound to the parameter names within the body of the method. Haskell uses an opposite strategy from Ruby— you can imagine arguments being sort of pasted-into the body of the function as needed. If the argument values are never used, then they are never evaluated.

Why is lazy evaluation desirable?

Let’s look at one oft-used Ruby library— the included testing framework Minitest— to gain some understanding into the utility of laziness. A test case in Minitest (written in spec syntax) might take either of the following forms:

describe "with strict setup" do
  before do
    @cheap = my_cheap.calculation
    @expensive = my_expensive.calculation
  end

  it "must be present" do
    @cheap.must_be :present
  end

  it "must frob cheap" do
    @expensive.frob(@cheap).must == true
  end
end

describe "with lazy setup" do
  let(:cheap) { my_cheap.calculation }
  let(:expensive) { my_expensive.calculation }

  it "must be present" do
    cheap.must_be :present
  end

  it "must frob cheap" do
    expensive.frob(cheap).must == true
  end
end

The two specs, "with strict setup" and "with lazy setup", accomplish the same tests, with one important difference: the let blocks in "with lazy setup" aren’t evaluated until the first time they are called— the expression inside the let block is lazily evaluated.

In contrast, the instance variables in "with strict setup" force the evaluation of the expressions to the right of= on assignment, before the values to which they resolve are required. This eager evaluation has two immediate caveats:

We can see that the spec example contains two assertions, but only one of them depends upon the value of expensive, which we can imagine is an expensive calculation. In "with strict setup", the expensive calculation is evaluated for every spec, regardless of whether it is used; this is not the case in "with lazy setup", meaning the lazy test suite will run faster. In a well-tested application with hundreds of specs (such as a Rails application with many tested database calls), strict setups can result in a significant amount of lost time during test runs. What if the expensive calculation being tested raises an error? In the lazily-setup spec, this results in one spec failure: the spec that depends on expensive — a very logical outcome! In the strictly-setup spec, however, two test failures result because the error is evaluated each time. Think about that for a minute: due to excessive eagerness, you get the wrong answer.

This is why eager evaluation can be said to violate the separation of concerns. If a particular value doesn’t depend on the result of some arbitrary expression, why should that expression potentially modify the resulting value? Haskell takes this many steps further, though. In Haskell, not only are functions first-class citizens, but they are (almost) all lazily-evaluated. This leads to some more-interesting examples, such as the comparable expressions:

  • in Ruby: first_element = [1, 2, raise("bad element")].first
  • in Haskell: firstElement = head [1, 2, error "bad element"]

Both seek to accomplish the same goal: retrieve the first element in the array or list. The difference lies in their outcomes. The Ruby variable assignment will raise a "bad element" exception, because every expression in the array is evaluated on assignment. The Haskell function definition will succeed, because even the list elements themselves aren’t evaluated until required.

That allows for an expression like firstElement = last $ reverse [1, 2, error "bad element"] to evaluate successfully even though it requires two list traversals to achieve the result. Though the intermediate elements are passed through the functions, they are never evaluated and no error is thrown. Haskell’s laziness goes far beyond simply not evaluating the remaining list arguments, though. The entire function remains unevaluated (living as a “thunk” in memory) until the value is finally called upon, such as when it will be printed to the screen.

Divers Examples: If/Else

But wait there’s more. By separating the evaluation of code from the implementation of it you allow code to be more modular, more transportable, and even allows better reasoning.

Say I wanted my own if construct in Ruby. We could imagine defining a Ruby method called my_if:

def my_if(bool, my_true, my_false)
  if bool
    my_true.call
  else
    my_false.call
  end
end

and calling it would look like this:

my_if(true, lambda { puts "one"; 1 }, lambda { puts "two"; 2 })
one
# => 1

If you like, we just wrote a tiny little embedded DSL for if/else statements. In the definition of the function my_if, we had to do a little extra work to explicitly control evaluation. We know that we have to evaluate bool (because otherwise we can’t decide which branch to take) but after that, we only wanted to evaluate one branch or the other. Here’s a similar function that gets it slightly wrong:

def wrong_if(bool, eager_true, eager_false)
  if_branch = eager_true.call
  else_branch = eager_false.call

  if bool
    if_branch
  else
    else_branch
  end
end

wrong_if(true, lambda { puts "one"; 1 }, lambda { puts "two"; 2 })
one
two # you didn't actually want this printed!
# => 1

And there you see that both branches got evaluated! Lazy evaluation would let you implement this the naïve way:

def lazy_if(bool, lazy_true, lazy_false)
  if bool
    lazy_true
  else
    lazy_false
  end
end

lazy_if(true, ( puts "one"; 1 ), ( puts "two"; 2 ))

# note this is NOT the output you'll get:
one
# => 1

The parens above are meant to group code without using it, sort of like a free-floating block. Speaking of blocks…

Blocks, Procs, Lambdas… Oh My!

Have you now come to desire lazy evaluation? Fear not, gentle Rubyist, for Ruby gives us at least some ability to suspend strict evaluation in the form of blocksprocs, and lambdas. Each are ways of passing chunks of code around— that is, of defining code without immediately using it. Again, there’s the theme of separation of concerns.

Of these, lambda is probably the most flexible because it is more modular (that is, its effects are local to thelambda). Where you can clearly see this is in how it handles return:

def foo
  x = lambda { return 1 }
  x.call
  return 2
end

foo
# => 2

whereas Proc behaves a little bit differently:

def bar
  x = Proc.new { return 1 }
  x.call
  return 2
end

bar
# => 1

Proc immediately exits from not only the block containing the Proc.new { return 1 } but also the enclosingdef bar method. Both are, however ways to invert control— the method handles high-level details but the code you pass in controls the details. This is a really fundamental and flexible aspect of functional programming that we can also leverage in Ruby. We use functions to create behavior and we use higher-order functions to create modular, tweak-able functions (or methods, in the case of Ruby):

A great example of such a function is map!

square = lambda { |x| x * x }
[1, 2, 3].map(&square)
# => [1, 4, 9]

The high-level pattern is “do this [mumble] on each element of the array”, and then you get to tweak it with the code you pass in. In this case, we pass in a lambda that says “give me a number and I’ll square it”, making the whole phrase “square each element of the array.” We could subsequently pass in lambda { |x| puts x } to print each number to the screen.

This allows us to preserve the separation of concerns. The concern of “do something to each element” is neatly separated from what specifically we want to do. Controlling when and where things happen are little wins that help us to write better, more modular, and clearer code. Lazy evaluation gives us the optimal “when”— only when actually required and Haskell gives us this as the default.

The greater challenge for the budding Haskell-er, then, is learning to think in terms of lazy evaluation, but this is a subject for a future blog post.

Special thanks to Jon for co-authoring this blog post! It wouldn’t have happened otherwise.