Transducers: Clojure's Next Big Idea
In Clojure, we often work with collections: sequences of various kinds (lazy or not) and core.async channels. The workhorse functions in Clojure, such as map and filter, all operate on collections. Composing several of these functions gives us more power, and doing all the work in a single pass can also improve performance. However, implementing those composed functions ourselves locks them into a particular kind of input and output collection. For example, core.async needed its own implementation of map that behaved like Clojure’s map but worked only with core.async channels.
Transducers were introduced to free functions from implementation-specific knowledge of collection types. Instead, transducers operate on any type of collection and focus on the operation they wish to perform on that collection. Before we go further into transducers, however, let’s take a look at their predecessors.
Reducers introduced Clojure to the idea that several functions operating over a collection could be combined into one function and then applied to the entire collection in one go. Reducers decoupled the implementation of the inputs from the operation you wish to perform on them. However, reducers can only be evaluated eagerly: not lazily, and not over a core.async channel. As core.async grows more popular in Clojure code, reducers get left behind. Eager evaluation means all the work is done at once, so the output can no longer be a lazy sequence. Additionally, reducers use macros to perform their magic instead of plain function composition, which means we have to repeat our logic to handle different abstractions.
Clojure’s core collection functions have been reworked around this new concept: as of Clojure 1.7, calling map, filter, and many of their siblings without a collection argument returns a transducer. By digging into how transducers work, we can learn to use their full power. I’ve been reading about these recent changes to Clojure’s functions, and this post details what I’ve learned so far.
In our code, we want to ignore the implementation of our inputs and outputs and instead focus on the operations to perform on a collection. To get a mental model of how this looks, consider a worker in a chocolate factory. The workers in the factory have many different tasks to perform, including decorating the candies, removing misshapen candies, and sorting the candies into bins by color. The candies could come continuously from a conveyor belt or arrive in bulk: the worker doesn’t care how they show up, only that they do show up. Once the worker is done with their task, the candies leave the worker’s area and the worker has no further interaction with them.
How would an eager-loaded collection look to our factory worker? The candies would show up in a big bin, all ready to be processed. The worker might perform a filtering operation—throw out the bad candies—and place the good candies into a new bin. In this example, the worker isn’t so concerned about where that bin comes from or where the output bin goes after the work is finished.
For a lazily loaded collection, we can imagine that a tube comes down next to the worker’s station on the assembly line. The tube fills up with candies coming from somewhere else. Every time the worker takes one to process, there’s another candy waiting. Again, the input and output don’t matter. The worker just does their filtering job of removing the bad candies.
In both examples, the worker doesn’t have to worry about how the candy gets to their spot on the assembly line or where the candies go afterwards. The assembly line itself is composed of many different workers doing many different jobs. Because their inputs and outputs are interchangeable with other steps in the assembly line, they can be combined to build complex candy-making operations.
Our last type of input and output is a core.async channel. We might imagine the worker has many conveyor belts with irregularly arriving candies coming from all over the factory. But the worker must still determine which candies are bad. That’s it. The implementation of the rest of the assembly line is outside of the worker’s awareness. The worker still fits into our assembly line paradigm.
These three candy-making scenarios describe the three types of input collections that transducers handle out of the box. (Transducers should also allow us to handle new types of collections, including types we haven’t thought of yet.) The Clojure team implemented transducers in both Clojure and ClojureScript, and their addition across the board means that we can begin to ignore the differences between collection implementations and focus on the work we want our functions to perform. Most importantly, we can combine transducers with ordinary function composition to build more powerful operations.
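To make the candy scenarios concrete, here is a minimal sketch in which one transducer is applied three ways: eagerly with into (the bin), lazily with sequence (the tube), and in a single reducing pass with transduce. The candy maps, the :misshapen? key, and the name remove-bad-candies are all hypothetical. With core.async on the classpath, the same value could also be passed as the transducer argument of (chan buf-or-n xform). That part is left out so the sketch runs on plain Clojure.

```clojure
;; Hypothetical candy data: each candy is a map with an assumed :misshapen? flag.
(def candies
  [{:color :red   :misshapen? false}
   {:color :blue  :misshapen? true}
   {:color :green :misshapen? false}])

;; One transducer, defined once, with no knowledge of where candies come from.
(def remove-bad-candies
  (remove :misshapen?))

;; Eager: pour the good candies into a new vector (the "bin").
(def good-bin (into [] remove-bad-candies candies))

;; Lazy: good candies are produced as they are consumed (the "tube").
(def good-tube (sequence remove-bad-candies candies))

;; Single reducing pass: count the good candies without building a collection.
;; completing adds the one-argument arity that transduce calls at the end.
(def good-count
  (transduce remove-bad-candies (completing (fn [n _] (inc n))) 0 candies))
```

The worker (remove-bad-candies) never changes. Only the surrounding process decides whether the output is a vector, a lazy sequence, or a single number.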
Let’s take a look at the generic layout of a transducer. The basic structure is two nested anonymous functions, an outer function and an inner function:
(fn [reduction-function]
  (fn
    ([] (reduction-function))
    ([result] (reduction-function result))
    ([result input] (reduction-function result input))))
The outer function takes the next reduction function in the chain and returns a new reduction function. This is what lets transducers be combined with ordinary function composition. The inner function performs the actual work.
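Filling in the template with real work gives a working transducer. The sketch below defines a hypothetical my-map (it is not part of clojure.core, just an illustration) whose step arity transforms each item before handing it on. It then applies that one transducer with different reduction functions. Note that transduce, given no initial value, gets one by calling + with no arguments, which returns 0.

```clojure
;; A hand-rolled mapping transducer following the two-function template.
;; my-map is a hypothetical name used for illustration only.
(defn my-map [mapping-function]
  ;; Outer function: receives the next reduction function in the chain.
  (fn [reduction-function]
    ;; Inner function: a new reduction function that does the work.
    (fn
      ([] (reduction-function))                 ; init: delegate
      ([result] (reduction-function result))    ; completion: delegate
      ([result input]                           ; step: transform, then pass on
       (reduction-function result (mapping-function input))))))

;; The same transducer with different reduction functions and contexts.
(def incremented (into [] (my-map inc) [1 2 3]))
(def total (transduce (my-map inc) + [1 2 3]))
(def lazy-version (sequence (my-map inc) [1 2 3]))
```

Here incremented is [2 3 4], total is 9, and lazy-version is the lazy sequence (2 3 4).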
Notice that the inner function has three arities. The transducer’s inner function must handle three cases:
- When called with no arguments, it handles the “init” case: it delegates to reduction-function to produce an initial value. A process such as transduce calls the reduction function with no arguments when you don’t supply an initial value yourself. For example, if the reduction function is +, calling it with no arguments returns 0. Clojure core’s conj has been changed to work with transducers: calling (conj) with no arguments now returns an empty PersistentVector.
- When called with the single result argument, the inner function handles the “completion” case. This arity is called once, after the input is exhausted or the process has been halted, so the transducer can perform any cleanup or flush any state it has been holding. When done, it calls reduction-function on result. This arity is mostly internal plumbing, and you shouldn’t need to worry about it unless you’re digging into transducers that keep state between steps.
- When called with both result and input, the inner function performs the actual work. The input argument is a single item given to the function. We’re not concerned with where it came from, whether it’s from a lazy sequence or a core.async channel or a conveyor belt full of candies. Here are some examples of how this arity works, adapted from Clojure’s source code with the variables renamed for readability:
map applies mapping-function to its input, then uses reduction-function to add the transformed value to result:
([result input] (reduction-function result (mapping-function input)))
filter uses filter-predicate to determine whether reduction-function should add the input to result:
([result input] (if (filter-predicate input) (reduction-function result input) result))
keep transforms the input and passes the transformed value to reduction-function only if it is not nil:
([result input] (let [keep-value (keep-transformation input)] (if (nil? keep-value) result (reduction-function result keep-value))))
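The one-argument completion arity described above earns its keep in transducers that hold state. As a hedged sketch, the hypothetical pairs transducer below groups items into vectors of two. It is similar in spirit to the core partition-all, but it is not that function’s source. It buffers items between steps and uses the completion arity to flush a leftover item when the input runs out.

```clojure
;; Hypothetical stateful transducer: groups items into vectors of two.
(defn pairs []
  (fn [reduction-function]
    ;; Fresh buffer each time the transducer is applied to a reduction function.
    (let [buffer (volatile! [])]
      (fn
        ([] (reduction-function))
        ([result]
         ;; Completion: flush a lone leftover item (if any), then let the
         ;; next reduction function complete.
         (let [leftover @buffer
               result (if (seq leftover)
                        (do (vreset! buffer [])
                            (unreduced (reduction-function result leftover)))
                        result)]
           (reduction-function result)))
        ([result input]
         ;; Step: buffer the item and emit a pair once two have arrived.
         (let [buf (vswap! buffer conj input)]
           (if (= 2 (count buf))
             (do (vreset! buffer [])
                 (reduction-function result buf))
             result)))))))

(def odd-batch (into [] (pairs) [1 2 3 4 5]))
(def even-batch (into [] (pairs) [1 2 3 4]))
```

Without the completion arity, the 5 in odd-batch would be silently lost. With it, odd-batch is [[1 2] [3 4] [5]].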
All of these examples process every item of the input. What about when you want to halt midway, say after you have found a truffle in a batch of peanut brittle? You can send a halt signal by wrapping the result in reduced. Here is an example from take-while:
([result input] (if (take-while-predicate input) (reduction-function result input) (reduced result))) ;;Halts the transducer operation
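Because reduced stops the reduction itself, a halting transducer works even on an infinite input. The sketch below assumes a hypothetical take-until-truffle transducer and keyword-valued candies. It keeps candies up to (but not including) the first :truffle, and then uses the core take-while on the infinite (range).

```clojure
;; Hypothetical transducer: pass candies along until a truffle shows up.
(defn take-until-truffle []
  (fn [reduction-function]
    (fn
      ([] (reduction-function))
      ([result] (reduction-function result))
      ([result input]
       (if (= input :truffle)
         (reduced result)  ; halt: no further inputs will be processed
         (reduction-function result input))))))

(def brittle-batch [:brittle :brittle :truffle :brittle])

(def before-truffle (into [] (take-until-truffle) brittle-batch))

;; reduced also lets the core take-while stop an infinite input.
(def small-numbers (into [] (take-while #(< % 5)) (range)))
```

before-truffle is [:brittle :brittle], and small-numbers is [0 1 2 3 4] even though (range) never ends on its own.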
You might wonder: why can’t we just use partial to compose these collection functions and skip all this new transducer hullabaloo?
For example, why not just write code like (comp (partial filter even?) (partial map inc))? Applying partial here gives you the ability to chain functions (just like ->>), but the logistics of managing the inputs and outputs are still tied up in the collection functions. The map function still returns a lazy sequence. You can’t take this composed function and use it with core.async channels or reducers. To do that, you’d have to re-implement the same logic with those abstractions’ functions.
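A small sketch makes the contrast concrete, using only core functions and an assumed input of (range 10). One subtlety is worth knowing. Composing partially applied sequence functions runs right to left, like any comp. Composing transducers applies the steps to each item left to right. The composed transducer value (xf below) could also be handed to sequence, transduce, or a core.async channel unchanged.

```clojure
(def coll (range 10))

;; Composing partially applied sequence functions: comp runs right to left,
;; so map inc runs first, and each step builds a lazy sequence.
(def seq-pipeline (comp (partial filter even?) (partial map inc)))
(def via-partial (seq-pipeline coll))

;; Composing transducers: each item goes through filter, then map,
;; and no intermediate sequences are built.
(def xf (comp (filter even?) (map inc)))
(def via-transducer (into [] xf coll))

;; To match the partial version's order, put map first.
(def xf-same-order (comp (map inc) (filter even?)))
(def via-transducer-same-order (into [] xf-same-order coll))
```

Here via-partial is (2 4 6 8 10), via-transducer is [1 3 5 7 9], and via-transducer-same-order is [2 4 6 8 10].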
core.async is becoming an increasingly popular way to add concurrency to existing Clojure code. You might find yourself replacing some part of your code with core.async channels. If your code was implemented with transducers, it will work in the new setting without a rewrite.
To further drive home why transducers beat reducers: reducers live in a separate namespace, clojure.core.reducers, so using them means requiring that namespace and aliasing it carefully (conventionally as r), because you don’t want to clobber your existing usage of map. Because transducers are built into the core functions, they don’t present the same namespace-clobbering issues.
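For comparison, here is a sketch of the same pipeline written both ways, assuming an input of (range 10). The reducers version needs the extra namespace and its aliased functions. The transducers version uses the core filter and map directly.

```clojure
;; Reducers live in their own namespace, conventionally aliased as r,
;; so r/map and r/filter don't clobber clojure.core/map and filter.
(require '[clojure.core.reducers :as r])

(def coll (range 10))

;; Reducers: nested calls build a reducible; into performs the eager reduce.
(def via-reducers (into [] (r/map inc (r/filter even? coll))))

;; Transducers: core filter and map, no extra namespace needed.
(def via-transducers (into [] (comp (filter even?) (map inc)) coll))
```

Both produce [1 3 5 7 9]. Only the transducer version can also be reused lazily with sequence or on a core.async channel.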
With transducers, we don’t have to juggle reducers’ functions and Clojure core’s functions at the same time. The separate implementations collapse into one map function that works regardless of context.
Using the same logic with multiple types of collections is the real power of transducers. Clojure has transducers built in (since version 1.7), meaning that you can begin using them immediately wherever you need to compose functions that operate over collections. There’s no separate namespace to juggle in your code. And transducers play well with core.async, which is increasingly finding its way into our code bases.
Have any questions about transducers? Reach out to us @bendyworks on Twitter.