List programming
- First we will do a few more recursive functions over lists
- Then we will show how the
Core.List
library functions allow a great many (most?) operations to be written without recursion - This is combinator programming, solve a task by composing operations on a few standard combinators (“combiners”)
- (Note for Assignment 1 Part I you will not be able to use the library combinators, they are for Part II)
Reversing a list
- Let us write a somewhat more interesting function, reversing a list.
- Lists are immutable so it is going to create a completely new list, not change the original.
- This style of programming is called “Data structure corresponds to control flow” - the program needs to touch and reconstruct the whole data structure as it runs.
let rec rev l =
match l with
| [] -> []
| hd :: tl -> rev tl @ [hd]
;;
rev [1;2;3];; (* recall input list is the tree 1 :: ( 2 :: ( 3 :: [])) *)
- Correctness of a recursive function by induction: assume recursive call does what you expect in arguing it is overall correct.
- For this example, can assume
rev tl
always reverses the tail of the list,- (e.g. in computing
rev [1;2;3]
we matchhd
=1
andtl
=[2;3]
and can assumerev [2;3]
=[3;2]
)
- (e.g. in computing
- Given that fact,
rev tl @ [hd]
should clearly reverse the whole list- (e.g.
[3;2] @ [1]
=[3;2;1]
for the example)
- (e.g.
- QED, the function is proved correct! (actually partially correct, this induction argument does not rule out infinite loops)
Of course rev
is also in Core.List
since it is a common operation:
# List.rev [1;2;3];;
- : int list = [3; 2; 1]
Another Example: zero out all the negative elements in a list of numbers
- C solution:
for
-loop over it and mutate all negatives to 0 - OCaml immutable list solution: recurse on list structure, build the new list as we go
let rec zero_negs l =
match l with
| [] -> []
| hd :: tl -> (if hd < 0 then 0 else hd) :: zero_negs tl
in
zero_negs [1;-2;3];;
Core.List library functions
- We already saw a few of these previously, e.g.
List.rev
andList.nth
. List
is a module, think fancy package. It contains functions plus values plus types plus even other modulesList
is itself in the moduleCore
so the full name forrev
isCore.List.rev
- but we put an
open Core
in our.ocamlinit
(and in the template for A1) so you can just write e.g.List.rev
- but we put an
- (Note that
List.hd
is also available, but you should nearly always be pattern matching to take apart lists; don’t useList.hd
on the homework.) - Let us peek at the documentation
Core.List
to see what is available; we will cover a few of them now.
Some simple but very handy List
library functions
List.length ["d";"ss";"qwqw"];;
List.is_empty [];;
List.last_exn [1;2;3];; (* get last element; raises an exception if list is empty *)
List.join [[1;2];[22;33];[444;5555]];;
List.append [1;2] [3;4];; (* Usually the infix @ syntax is used for append *)
… And their types
- The types of the functions are additional hints to their purpose, get used to reading them
- Much of the time when you mis-use a function you will get a type error
- Recall that
'a list
etc is a polymorphic aka generic type,'a
can be any type
# List.length;;
- : 'a list -> int = <fun>
# List.is_empty;;
- : 'a list -> bool = <fun>
# List.last_exn;;
- : 'a list -> 'a = <fun>
# List.join;;
- : 'a list list -> 'a list = <fun>
# List.append;;
- : 'a list -> 'a list -> 'a list = <fun>
# List.map;; (* We will do this one below *)
- : 'a list -> f:('a -> 'b) -> 'b list = <fun>
- We coded
nth
andrev
before, here is one more,join
:
let rec join (l: 'a list list) = match l with
| [] -> [] (* "joining together a list of no-lists is an empty list" *)
| l :: ls -> l @ join ls (* "by induction assume (join ls) will turn list-of-lists to single list" *)
OCaml tuples and some List
library functions using tuples
- Along with lists
[1;2;3]
OCaml has tuples,(1,2.,"3")
- It is like a fixed-length list, but tuple elements can have different types
- You can also pattern match on tuples
# (1,2.,"3");;
- : int * float * string = (1, 2., "3")
# [1,2,3];; (* a common error, parens not always needed so this is a singleton list of a 3-tuple, not a list of ints *)
- : (int * int * int) list = [(1, 2, 3)]
- Here is a simple function to break a list in half using the
List.split_n
function- a pair of lists is returned by
split_n
, dividing it at the nth position
- a pair of lists is returned by
let split_in_half l = List.split_n l (List.length l / 2);;
split_in_half [2;3;4;5;99];;
- Now, using the
List.cartesian_product
function we can make all possible pairs of (front,back) elements- (Also observe how these OCaml combinators have overlap with math combinators we already knew)
let all_front_back_pairs l =
let front, back = split_in_half l in
List.cartesian_product front back;; (* observe how let can itself pattern match pairs *)
val all_front_back_pairs : 'a list -> ('a * 'a) list = <fun>
# all_front_back_pairs [1;2;3;4;5;6];;
- : (int * int) list =
[(1, 4); (1, 5); (1, 6); (2, 4); (2, 5); (2, 6); (3, 4); (3, 5); (3, 6)]
- Fact: lists of pairs are isomorphic to pairs of lists (of the same length)
- zipping and unzipping library functions can convert between these two equivalent forms.
List.unzip @@ all_front_back_pairs [1;2;3;4;5;6];;
- Note the use of
@@
here, recall it is function application but with “loosest binding”, avoids need for parens - Here is an even cooler way to write the same thing, with pipe operation
|>
(based on shell pipe|
)
[1;2;3;4;5;6] |> all_front_back_pairs |> List.unzip;;
- In a series of pipes, the leftmost argument is data, and all the others are functions
- The data is fed into first function, output of first function fed as input to second, etc
- This is exactly what the shell
|
does with standard input / standard output. -
Please use pipes as much as possible on Assignment 2 - will make the code more readable
List.zip
is the opposite of unzip: take two lists and make a single list pairing elements
List.zip [1;2;3] [4;5;6];;
- : (int * int) list List.Or_unequal_lengths.t =
Core.List.Or_unequal_lengths.Ok [(1, 4); (2, 5); (3, 6)]
- The strange result type is dealing with the case where the lists supplied may not be same length
- This type and value are hard to read, let us take a crack at it.
((int * int) list) List.Or_unequal_lengths.t
is the proper parentheses.List.Or_unequal_lengths.t
is referring to the typet
found in theList.Or_unequal_lengths
module (a moduleOr_unequal_lengths
inside theList
module)- one of the great things about modules is they can also contain types (like C’s
.h
files but more principled)
- one of the great things about modules is they can also contain types (like C’s
- We can use the
#show_type
directive in the top loop to see whatt
actually is:
# #show_type List.Or_unequal_lengths.t;;
type 'a t = 'a List.Or_unequal_lengths.t = Ok of 'a | Unequal_lengths
- This means the value is either
Ok(..)
orUnequal_lenghts
, very similar toresult
oroption
- (Why don’t they just use one of those two here instead?? No idea!)
- The
'a
here is the type parameter, more on those later so don’t sweat it now - The latter case is for zipping lists of different lengths:
List.zip [1;2;3] [4;5];;
- : (int * int) list List.Or_unequal_lengths.t =
Core.List.Or_unequal_lengths.Unequal_lengths
- In the original same-length case we got the result from the first clause in this type,
Core.List.Or_unequal_lengths.Ok [(1, 4); (2, 5); (3, 6)]
. - Note
List.zip_exn
will just raise an exception for unequal-length lists, avoiding all of this wrapper ugliness- but in larger programs we want to avoid exceptions at a distance so it is often worth the suffering
zip/unzip and Currying
We should be able to zip and then unzip as a no-op, one should undo the other (we will use the _exn
version to avoid the above error wrapper issue).
List.unzip @@ List.zip_exn [1;2] [3;4];;
And the reverse should also work as it is an isomorphism:
List.zip_exn @@ List.unzip [(1, 3); (2, 4)];;
Line 1, characters 16-43:
Error: This expression has type int list * int list
but an expression was expected of type 'a list
- Oops! It fails. What happened here?
List.zip_exn
takes two curried arguments, lists to zip (its type is'a list -> 'b list -> ('a * 'b) list
), whereasList.unzip
returns a pair of lists.- No worries, we can write a wrapper (an adapter) turning
List.zip_exn
into a version taking a pair of lists:
let zip_pair (l,r) = List.zip_exn l r in
zip_pair @@ List.unzip [(1, 3); (2, 4)];;
[(1, 3); (2, 4)] |> List.unzip|> zip_pair ;; (* Pipe equivalent form *)
- Congratulations, we just wrote a fancy no-op function 😁
- The general principle here is a curried 2-argument function like
int -> int -> int
is isomorphic toint * int -> int
- The latter form looks more like a standard function taking multiple arguments and is the uncurried form.
- And we sometimes need to interconvert between the two representations
- This conversion is called uncurrying (curried to pair/triple/etc form) or currying (putting it into curried form)
Curry/Uncurry are themselves functions
- We can even write combinators which generically convert between these two forms - !
curry
- takes in uncurried 2-arg function and returns a curried versionuncurry
- takes in curried 2-arg function and returns an non-curried version
let curry f = fun x -> fun y -> f (x, y);;
let uncurry f = fun (x, y) -> f x y;;
Observe the types themselves in fact fully define their behavior:
curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c
uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c
We can now use our new combinator to build zip_pair
directly:
let zip_pair = uncurry @@ List.zip_exn;;
One last higher-order function: compose
Composition function g o f: take two functions, return their composition
let compose g f = (fun x -> g (f x));;
compose (fun x -> x+3) (fun x -> x*2) 10;;
- The type says it all again,
('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
- Equivalent ways to code
compose
in OCaml:
let compose g f x = g (f x);;
let compose g f = (fun x -> g(f x));; (* this equivalent form reads more how you think of the "o" operation in math *)
let compose = (fun g -> (fun f -> (fun x -> g(f x))));;
let compose g f x = x |> f |> g;; (* This is the readability winner: feed x into f and f's result into g *)
- We can express the Zip/unzip composition explicitly with
compose
:
# (compose zip_pair List.unzip) [(1, 3); (2, 4)];;
- : (int * int) list = [(1, 3); (2, 4)]
List
module functions which take function arguments
- So far we have done the “easier” functions in
List
; the real meat are the functions taking other functions - Think of these as “recursion patterns”, they will recurse over the list so you don’t have to
let rec
- makes functional code a lot easier to read once you are familiar with these “recursion combinators”
- Lets warm up with
List.filter
: remove all elements not meeting a condition which we supply a function to check
List.filter [1;-1;2;-2;0] (fun x -> x >= 0);;
- Cool, we can “glue in” any checking function (boolean-returning, i.e. a predicate) and
List.filter
will do the rest - Note though that we got a strange warning on the above, “label f was omitted” - ??
- This is because
List.filter
has type'a list -> f:('a -> bool) -> 'a list
– thef:
is declaring a named argument - OCaml gives warnings if you leave off a name so please always use them
- We can put args out of order if we give name via
~f:
syntax:
List.filter ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
- And, since OCaml functions are Curried we can leave off the list argument to make a generic remove-negatives function.
let remove_negatives = List.filter ~f:(fun x -> x >= 0);;
remove_negatives [1;-1;2;-2;0];;
Note that you can either inline the function as a fun
or can declare it in advance:
let gtz x = x >= 0;;
List.filter ~f:gtz [1;-1;2;-2;0];;
Usually for short functions it is better to inline them, it makes the code more readable.
Let us use filter
to write a function determining if a list has any negative elements:
let has_negs l = l |> List.filter ~f:(fun x -> x < 0) |> List.is_empty |> not;;
- The example shows the power of pipelining, it is easier to see the processing order with
|>
- This is a common operation so there is a library function for it as well: does there exist any element in the list where predicate holds?
let has_negs l = List.exists ~f:(fun x -> x < 0) l;;
Similarly, List.for_all
checks if it holds for all elements.
List.map
List.map
is super powerful, apply some operation we supply to every element of a list:
# List.map ~f:(fun x -> x + 1) [1;-1;2;-2;0];;
- : int list = [2; 0; 3; -1; 1]
# List.map ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
- : bool list = [true; false; true; false; true]
List.map ~f:(fun (x,y) -> x + y) [(1,2);(3,4)];; (* turns list of number pairs into list of their sums *)
Folding
- OK, so far we have been cruising along on impulse power; its now time for warp speed!
- Some of the most powerful combinators are the folds:
fold_right
, andfold_left
aka simplyfold
. - They “fold together” list data using an operator.
- Think of fold as something you feed just the “base case” and the “recursive case” code to and it makes a recursive function for you.
- this recursive function will make one recursive call on the tail of the list
- Here for example is how we can turn a list of characters into a string with
fold_right
List.fold_right ['a';'b';'c'] ~init:"" ~f:(fun elt -> fun accum -> (Char.to_string elt)^accum);; (* computes "a"^("b"^("c"^"")) *)
- The base case is
~init
, the empty string ~f
is plugging in the code for the recursive callelt
is the current element of the listaccum
is the result of recursing on the tail of the list
Before showing potential code for List.fold_right
let’s manually implement the above use with let rec
to compare.
let rec char_list_to_string l =
match l with
| [] -> "" (* ~init above is "", plug it in as the base case *)
| elt :: elts -> (* as in the above we are calling the current list element `elt` *)
let accum = char_list_to_string elts in (* this is what `accum` is, the result of recursing on a shorter list *)
(Char.to_string elt)^accum (* now plug in the body of ~f as the calculation done on accum and elt *)
OK now lets show some potential code for List.fold_right
to show how we can pull out this base-case and recursion code as parameters ~init
and ~f
.
let rec fold_right l ~f ~init =
match l with
| [] -> init
| elt :: elts ->
let accum = fold_right elts ~f ~init in
f elt accum
- If we now plug in
""
for~init
and(fun elt -> fun accum -> (Char.to_string elt)^accum)
for~f
we get exactlychar_list_to_string
above.
fold_right ['a';'b';'c'] ~init:"" ~f:(fun elt -> fun accum -> (Char.to_string elt)^accum);;
- observe how we have to keep forwarding
~f
and~init
down the calls to make them available; we could have instead made an aux function without those:
let fold_right l ~f ~init =
let rec folder_aux l =
match l with
| [] -> init
| elt :: elts ->
let accum = folder_aux elts in
f elt accum in
folder_aux l
Here is another simple right fold to summate an integer list
List.fold_right ~f:(fun elt accum -> elt + accum) ~init:0 [3; 5; 7];; (* this computes 3 + (5 + (7 + 0)) *)
which more concisely could be written as
List.fold_right ~f:(+) ~init:0 [3; 5; 7];;
Left folding
- There is another way to fold: left fold!
- Notice in the above summate example the zero is on the right; that is why that is a right fold
- We could have instead summated as
((0 + 3) + 5) + 7
, with the zero on the left which is a fold left. List.fold_left
is the function, and it has synonymList.fold
which is because by default you should fold left for efficiency.
List.fold ~f:(fun accum elt -> accum + elt) ~init:0 [3; 5; 7];; (* this is ((0 + 3) + 5) + 7 *)
- Here since it is a fold left the accumulator is on the left (compare with folding right above)
- the arguments to
f
are swapped to make that more clear
- the arguments to
- Note that for
~f
being addition, folding left or right gives the same answer;- but, that is only because
+
happens to be commutative and associative.
- but, that is only because
- For example,
List.fold ~f:(-) ~init:0 [1;2]
is(0 - 1) - 2
is-3
andList.fold_right ~f:(-) ~init:0 [1;2]
is1 - (2 - 0)
is-1
Let us understand how left folding differs by again looking at an implementation for the char list to string function.
let rec char_list_to_string l accum =
match l with
| [] -> accum (* we are totally done at this point, `accum`` is the final result and just pop pop pop *)
| elt :: elts ->
char_list_to_string elts (accum^(Char.to_string elt));; (* we are computing the `~f` to accumulate result on the way *down* the recursion now *)
char_list_to_string ['a';'d'] "";; (* we need to prime the accum pump with "" here *)
Here is the general fold
, pulling out the ~f
in the above as a parameter.
Note that the accum
we call ~init
here since that is the exterior interface.
let rec fold l ~init ~f =
match l with
| [] -> init
| elt::elts -> fold elts ~init:(f init elt) ~f (*observe f is invoked **before** the call -- accumulating left-first *)
Summarize by looking at the types
- The type of
List.fold
is'a list -> init:'acc -> f:('acc -> 'a -> 'acc) -> 'acc
- The
'a
here is the type of the list elements - The
'acc
is the type of the result being accumulated
- The
- The type of
List.fold_right
is'a list -> f:('a -> 'acc -> 'acc) -> init:'acc -> 'acc
- Notice that the arguments to
f
are swapped here compared to thefold
left version
- Notice that the arguments to
More fold examples
- Folding can encapsulate most simple recursions over lists, so it can implement many of the library functions.
- We can for example implement
List.exists
above with map and fold:
let exists l ~f = (* Note: ~f is **declaring** a named argument f; ~f is shorthand for ~f:f *)
l
|> List.map ~f (* ~f alone as an argument is again shorthand for ~f:f *)
|> List.fold ~f:(||) ~init:false;; (* the List.map output is a list of booleans, just fold them up here *)
# exists ~f:(fun x -> x >= 0) [-1;-2];;
- : bool = false
# exists ~f:(fun x -> x >= 0) [1;-2];;
- : bool = true
In fact we can do this in one pass with just a fold:
let exists l ~f =
List.fold l ~f:(fun accum elt -> accum || f elt) ~init:false;;
Which hints that map
itself is easily definable with a fold
:
let map l ~f = List.fold ~f:(fun accum elt -> accum @ [f elt]) ~init:[] l
If you wanted to use fold_right
to build map it would be somewhat similar:
let map_right l ~f = List.fold_right ~f:(fun elt accum -> (f elt) :: accum) ~init:[] l;;
Note that map_right
is much more efficient, ::
takes unit time and @
is linear in size of left list.
Folding and efficiency
Let us review left vs right folding and reflect on efficiency. Here are two implementations similar to the above ones:
let rec fold_right ~f l ~init =
match l with
| [] -> init
| hd::tl -> f hd (fold_right ~f tl ~init) (* observe it is invoking f **after** the recursive call *)
vs
let rec fold_left l ~init ~f =
match l with
| [] -> init
| hd::tl -> fold tl ~init:(f init hd) ~f (*observe f is invoked **before** the call -- accumulating left-first *)
- Note that the first parameter to f in
fold_left
is the accumulated value passed down and the second parameter is the current list value - In
fold_right
on the other hand thef
computation happens after the recursive call is complete. - Fold left/right are good example contrasts of how you can accumulate a value up (
fold_right
) vs down (fold_left
) the recursion
fold_left
is in fact more efficient than fold_right
so it is preferred all things being equal:
- Observe how the value of the
fold_left
function above is what is directly returned from the base case, it bubbles all the way out - Such a function is tail recursive
- The compiler doesn’t need to use a call stack for such functions since nothing happens upon return
- so it replaces push/pop with jumps in and one jump out when done – its just a loop.
- Important when lists get really long that you don’t use stack unless required.
- Observe that
fold_right
is not tail recursive, so it needs the stack and will be slower
fold_until
- Let us end on perhaps the most powerful
List
combinator of all,fold_until
. - This is an extention to
fold
adding the functionality ofbreak
of C etc looping but in a functional style.
let summate_til_zero l =
List.fold_until l ~init:0
~f:(fun acc i -> match i, acc with
| 0, sum -> Stop sum
| _, sum -> Continue (i + sum))
~finish:Fn.id
let stz_example = summate_til_zero [1;2;3;4;0;5;6;7;8;9;10]
- The
Stop
variant is like break, here takesum
as the final value Continue
wraps the continue-folding case, which addsi
to runningsum
here.~finish
can post-process the result if theStop
case was not hit;Fn.id
isfun x -> x
, no additional processing here.