Sequence and Traverse
One problem that appears from time to time is that we we have some kind of
collection (I use list
here) and we want to map
every element with a
monadic function 'a -> M<'b>
. This then returns a list<M<'a>>
. But
often we want a M<list<'a>>
.
To be more concrete. Let’s assume we want to turn a list of strings into integers. We could
write a tryParseInt
function that does string -> option<int>
. But if we map
our function with a list<string>
we get a list<option<int>>
back.
Sometimes that is what we want, but very often it is not. Usually what we want is
a option<list<int>>
instead. So we expect the types to be switched.
The idea behind it in this example is that when we use map
over a list of strings it is
either successful as a whole and all elements are integers or as soon a single element
is not an integer we get None
back as a whole.
We could just write a function that somehow does that kind of transformation just for
tryParseInt
, but instead of doing that again and again for every function we try to
generalize that problem, so we can turn every list<option<'a>>
into option<list<'a>>
.
Not only that, we want to generalize the problem that it works for any type, not
just option
.
This generalization is what we think of the sequence
and traverse
functions.
Sequence
We first start with the monadic tryParseInt
function we already mentioned.
|
|
Next, we have some kind of input from a file, user or somewhere else.
|
|
In our example we want to do the following:
- Parse every
string
to anint
- If all inputs are valid, we want to
sum
the results - If one input is invalid, we want to print an error message
The first step is easy, as we could just map
the list.
|
|
We now have a list containing:
|
|
The problem starts in how we determine that every element is valid. We sure could
use fold
to loop through our list. Starting with a bool
set to true
and as soon
we encounter a None
we set the bool
to false
.
But we already have Option
for this kind of purpose. With Option
we still can
return the idea of true
(Some) and false
(None), but additional we also can return
a value.
Instead of just getting a boolean flag like true
and false
we just return Some
with a new list that has all the option
values stripped instead. For the valid
input case we just expect:
|
|
for the invalid input list we just expect:
|
|
As we need to loop through every element and build a new list, this is just a task for
List.foldBack
.
- Because we want a
option<list<'a>>
as a result, we start withSome []
- Then we check if
acc
andx
are bothSome
- If that is the case we add
x
toacc
- Otherwise we return
None
We name this operation sequence
.
|
|
When we test our function it returns the right result
|
|
Nice, it works! But this implementation still has a problem. Our folder
function
is basically a duplicate of the apply
function! The whole idea to check two
option
and only execute some code if both are Some
is exactly what apply
does.
Let’s look again at apply
.
|
|
There is also another problem here. It doesn’t matter which type we use. We always have
to lift an empty list. In this case we did Some []
for the accumulator. But in a
Async
case we just want an empty list inside an Async
. So we always just want to
return
a list for the context. It just means: We always can create a sequence
function as long our type provides a return
and apply
function. Or in other words,
our type is an Applicative Functor.
Let’s think about how we can implement sequence
with return
and apply
.
- The code we executed when we had two
Some
values wasx :: list
- So we just create a function for this operation
- And
apply
this function - Then we use this function as our
folder
function
|
|
We still get our expected results
|
|
Nice, everything works. So why is rewriting sequence
in such a way better?
What we basically have here is a Design Pattern (or what every Design Pattern is –
A Copy & Paste Pattern).
The sequence
operation for a list
is always the same. It just depends solely that
a type supports a retn
and a apply
function. It probably opens up the question
that when the implementation is always the same if we cannot just have a single
implementation?
Yes and no. Currently in F# retn
and <*>
are not Polymorphic, they are specific functions
we define ourself. We could fix it with Higher-Kinded Polymorphism
but F# don’t support this nicely, but there are ways around it. But i will not cover
this topic here.
Traverse
So far we discussed sequence
but in practice you will less likely implement sequence
at all. Instead we will implement traverse
. So how is traverse
different from sequence
?
As you have seen so far. Even with sequence
there is one pattern that is always the same.
You first map
a list, then you use sequence
on it. traverse
is just the idea to combine
both operations into a single operation.
If that sounds complicated, it isn’t at all! Just think for a moment. map
just means we apply
a function to every element before we use sequence
. So the only thing we need to
implement traverse
is to make sure we call a function that transforms every element
before we pass it to our lifted function.
|
|
The difference is so minimal that it can even be overlooked easily. We added the function
call between the <*>
operators: ... <*> f x <*> xs
. Instead of map
and then calling
sequence
, we now just can use traverse
instead.
|
|
If the logic seems still hard to follow. We just can think of traverse
as a map
function
for monadic functions that additionally swaps the layer when it finishes. When we use
|
|
We get a list<option<'b>>
. But when we use
|
|
we get a option<list<'b>>
.
Sequence defined through traverse
The primary reason why you less likely implement sequence
is because traverse
is
basically the same implementation. You can enhance a sequence
implementation easily
by just adding a function call to the element before you apply
it.
Once you have a traverse
function, you can very easily create sequence
by just
using the id
function with traverse
.
|
|
You could even come to the conclusion to not provide a sequence
implementation at all.
Finishing the Example
With traverse
we now can easily finish our example that we started.
|
|
This code now produces:
Sum: 5902
Error: Some inputs were not numbers!
Not limited to Option
It is in general important to understand that this concept is not limited to list
and option
. The only thing we need is a data-structure that has a fold
function,
and another type that is an Applicative Functor (has return
and a apply
function).
After we have those the general idea is to swap both layers.
For example when we have a monadic function download
that has the signature
Uri -> Async<string>
we expect that we can use this function on a list<Uri>
.
With List.map
we would get a
list<Async<string>>
But when we use traverse
we get a
Async<list<string>>
We don’t need to write such a function for the Async
type as we can use Async.Parallel
.
Async.Parallel
is basically the sequence
function. It takes a seq<Async<'a>>
and turns
it into an Async containing an array Async<'b []>
. But anyway to see how it works we could
extend the Async
module with the needed functions.
|
|
We now can use Async.traverse
in the following way
|
|
With Async.traverse
we get a single Async that only completes once all Uris are downloaded.
As you can see from the implementation. Async.traverse
is identical to the version that we
wrote for the option
type. We just have to set retn
and <*>
to functions that work
with Async
.