List programming
Let’s warm up by writing a few more recursive functions on lists.
Reversing a list
- Since lists are immutable, reverse will create a completely new list.
- This style of programming is called “Data structure corresponds to control flow” - the program needs to touch and reconstruct the whole data structure as it runs.
let rec rev l =
match l with
| [] -> []
| hd :: tl -> rev tl @ [hd]
;;
rev [1;2;3];; (* recall this list is 1 :: [2;3] which is the tree 1 :: ( 2 :: ( 3 :: [])) *)
- Correctness of a recursive function follows by induction: assume recursive call does what you expect in arguing it is overall correct.
- For this example, can assume
rev tl
always reverses the tail of the list,- (e.g. in computing
rev [1;2;3]
we matchhd
=1
andtl
=[2;3]
and can assumerev [2;3]
=[3;2]
)
- (e.g. in computing
- Given that fact, the code
rev tl @ [hd]
should clearly reverse the whole list- (e.g.
rev [2;3] @ [1] = [3;2] @ [1]
=[3;2;1]
)
- (e.g.
- QED, the function is proved correct! (actually partially correct, this induction argument does not rule out infinite loops)
rev
is also in Core.List
since it is a common operation:
# List.rev [1;2;3];;
- : int list = [3; 2; 1]
Core.List library functions
- We already saw a few of these previously, e.g.
List.rev
andList.nth
. List
is a module, think fancy package. It contains functions plus values plus types plus even other modulesList
is itself in the moduleCore
so the full name forrev
isCore.List.rev
- but we put an
open Core
in our.ocamlinit
(and in the template for A1) so you can just write e.g.List.rev
- but we put an
- (Note that
List.hd
is also available, but you should nearly always be pattern matching to take apart lists; don’t useList.hd
on the homework.) - (Also, read the homeworks carefully, on A1 you cannot use
List...
functions and on some questions of A2 you must use theList...
functions.) - Let us peek at the documentation
Core.List
to see what is available; we will cover a few of them now.
Some handy List
library functions
List.length ["d";"ss";"qwqw"];;
List.is_empty [];;
List.last_exn [1;2;3];; (* gets last element; raises an exception if list is empty *)
List.last [1;2;3];; (* alternate to previous which returns an option type, `None` on empty list *)
List.join [[1;2];[22;33];[444;5555]];; (* squiiiiish! *)
List.append [1;2] [3;4];; (* Note you should use the more convenient infix @ syntax for listappend *)
… And their types
- Types of functions are additional hints to their purpose, get used to reading them
- Much of the time when you mis-use a function you will get a type error
-
Recall that
'a list
etc is a polymorphic aka generic type,'a
can be any type# List.length;; - : 'a list -> int = <fun> (* "for ANY type 'a, List.length will take a list of 'a and return an integer" *) # List.is_empty;; - : 'a list -> bool = <fun> # List.last_exn;; - : 'a list -> 'a = <fun> # List.last;; - : 'a list -> 'a option = <fun> # List.join;; - : 'a list list -> 'a list = <fun> # List.append;; - : 'a list -> 'a list -> 'a list = <fun> # List.map;; (* Foreshadowing; we will review this function below *) - : 'a list -> f:('a -> 'b) -> 'b list = <fun> (* takes in a function; "f:" is a named argument, more below *)
-
We coded
nth
andrev
previously; here is one more,join
:let rec join (l: 'a list list) = match l with | [] -> [] (* "joining together a list of no-lists is an empty list" *) | l :: ls -> l @ join ls (* by induction assume (join ls) will turn a list-of-lists into a single list *)
OCaml tuples and some List
library functions using tuples
- Along with lists
[1;2;3]
OCaml has tuples,(1,2.,"3")
- It is like a fixed-length list, but tuple elements can have different types
-
You can also pattern match on tuples
# (1,2.,"3");; - : int * float * string = (1, 2., "3") # [1,2,3];; (* a common error, parens not always needed so this is a singleton list of a 3-tuple, not a list of ints *) - : (int * int * int) list = [(1, 2, 3)]
- Here is a simple function to break a list in half using the
List.split_n
function- a pair of lists is returned by
split_n
, dividing it at the nth position
let split_in_half l = List.split_n l (List.length l / 2);; split_in_half [2;3;4;5;99];;
- a pair of lists is returned by
- Now, using the
List.cartesian_product
function we can make all possible pairs of (front,back) elements- (Also observe how these OCaml combinators have overlap with math combinators we already knew)
let all_front_back_pairs l = let front, back = split_in_half l in List.cartesian_product front back;; (* observe how let can itself pattern match pairs *) val all_front_back_pairs : 'a list -> ('a * 'a) list = <fun> (* return type is a *list of pairs* *) # all_front_back_pairs [1;2;3;4;5;6];; - : (int * int) list = [(1, 4); (1, 5); (1, 6); (2, 4); (2, 5); (2, 6); (3, 4); (3, 5); (3, 6)]
- Fact: lists-of-pairs are isomorphic to pairs-of-lists (of the same length)
-
zipping and unzipping library functions can convert between these two equivalent forms.
List.unzip @@ all_front_back_pairs [1;2;3;4;5;6];; (* returns a pair-of-lists *)
- Note the use of
@@
here, recall it is function application but with “loosest binding”, avoids need for parens -
Here is a cooler way to write the same thing, with pipe operation
|>
(based on shell pipe|
)[1;2;3;4;5;6] |> all_front_back_pairs |> List.unzip;;
- In a series of pipes, the leftmost argument is data, and all the others are functions
- The data is fed into first function, output of first function fed as input to second, etc
- its like an assembly line for building the result
- This is exactly what the shell
|
does with standard input / standard output. -
Please use pipes as much as possible on Assignment 2 - it will make the code more readable
-
List.zip
is the opposite of unzip: take two lists and make a single list pairing elementsList.zip [1;2;3] [4;5;6];; - : (int * int) list List.Or_unequal_lengths.t = Core.List.Or_unequal_lengths.Ok [(1, 4); (2, 5); (3, 6)]
- The strange result type is dealing with the case where the lists supplied may not be same length
- This type is the same as an
option
type, but withOk
replacingSome
andUnequal_lengths
replacingNone
- We will figure out why the type is a
List.Or_unequal_lengths.t
soon, you can ignore that mess for now. - Note
List.zip_exn
will just raise an exception for unequal-length lists, avoiding the wrapper ugliness- But, in larger programs we want to avoid exceptions at a distance so it is often worth wrapping
Zip/unzip and Currying
We should be able to zip and then unzip as a no-op, one should undo the other (we will use the _exn
version here to avoid having to unwrap wrappers).
List.unzip @@ List.zip_exn [1;2] [3;4];;
And the reverse should also work as it is an isomorphism:
List.zip_exn @@ List.unzip [(1, 3); (2, 4)];;
Line 1, characters 16-43:
Error: This expression has type int list * int list
but an expression was expected of type 'a list
- Oops! It fails. What happened here?
List.zip_exn
takes two curried arguments, lists to zip (its type is'a list -> 'b list -> ('a * 'b) list
), whereasList.unzip
returns a pair of lists.-
No worries, we can write a wrapper (an adapter) turning
List.zip_exn
into a version taking a pair of lists:let zip_pair (l,r) = List.zip_exn l r in (* Notice there is ONE argument here, a PAIR pattern (l,r) *) zip_pair @@ List.unzip [(1, 3); (2, 4)];; (* should be a no-op *) [(1, 3); (2, 4)] |> List.unzip|> zip_pair ;; (* Pipe equivalent form *)
- Congratulations, we just wrote a fancy no-op function 😁
- The general principle here is a curried 2-argument function like
int -> int -> int
is isomorphic toint * int -> int
- The latter form looks more like a standard function taking multiple arguments and is the uncurried form.
- And we sometimes need to interconvert between the two representations
- This conversion is called uncurrying (curried to pair/triple/etc form) or currying (putting it into curried form)
Curry/Uncurry are themselves functions
- We can even write combinators which generically convert between these two forms - !
curry
- takes in uncurried 2-arg function and returns a curried versionuncurry
- takes in curried 2-arg function and returns an non-curried version
let curry f = fun x -> fun y -> f (x, y);;
let uncurry f = fun (x, y) -> f x y;;
Observe the types themselves in fact fully define their behavior:
curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c
uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c
We can now use our new combinator to build zip_pair
directly:
let zip_pair = uncurry List.zip_exn;;
One last higher-order function: compose
Composition function g o f: take two functions, return their composition
let compose g f = (fun x -> g (f x));;
compose (fun x -> x + 3) (fun x -> x * 2) 10;;
- The type says it all again,
('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
- Equivalent ways to code
compose
in OCaml:
let compose g f x = g (f x);;
let compose g f = (fun x -> g(f x));; (* this equivalent form reads more how you think of the "o" operation in math *)
let compose = (fun g -> (fun f -> (fun x -> g(f x))));;
let compose g f x = x |> f |> g;; (* This is the readability winner: feed x into f and f's result into g *)
- We can express the Zip/unzip composition with
compose
:
# (compose zip_pair List.unzip) [(1, 3); (2, 4)];;
- : (int * int) list = [(1, 3); (2, 4)]
List
functions which take function arguments
- So far we have done the “easier” functions in
List
; the real meat are the functions taking other functions - Think of these as “recursion patterns”, they will recurse over the list so you don’t have to
let rec
- makes functional code a lot easier to read once you are familiar with these “recursion combinators”
- Lets warm up with
List.filter
: remove all list elements not meeting a condition which we supply a function to check
List.filter [1;-1;2;-2;0] (fun x -> x >= 0);;
- Cool, we can “glue in” any checking function (boolean-returning, i.e. a predicate) and
List.filter
will do the rest - Note though that we got a strange warning on the above, “label f was omitted” - ??
- This is because
List.filter
has type'a list -> f:('a -> bool) -> 'a list
– thef:
is declaring a named argument - The top loop gives warnings if you leave off a name (and, the compiler will error) so please always use them
- We can also put args out of order when we give name via
~f:
syntax:
List.filter ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
- And, since OCaml functions are Curried we can leave off the list argument to make a generic remove-negatives function.
let remove_negatives = List.filter ~f:(fun x -> x >= 0);;
remove_negatives [1;-1;2;-2;0];;
Note that you can either inline the function as an anonymous fun
or can declare it in advance:
let gtz x = x >= 0;;
List.filter ~f:gtz [1;-1;2;-2;0];;
Usually for short functions it is better to inline them as it is more concise
Let us use filter
to write a function determining if a list has any negative elements:
let has_negs l = l |> List.filter ~f:(fun x -> x < 0) |> List.is_empty |> not;;
- The example shows the power of pipelining, it is easier to see the processing order with
|>
- This is a common operation so there is a library function for it as well: does there exist any element in the list where predicate holds?
let has_negs l = List.exists ~f:(fun x -> x < 0) l;;
Similarly, List.for_all
checks if it holds for all elements.
List.map
List.map
is very powerful, apply some operation we supply to every element of a list making a new list:
# List.map ~f:(fun x -> x + 1) [1;-1;2;-2;0];;
- : int list = [2; 0; 3; -1; 1]
# List.map ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
- : bool list = [true; false; true; false; true]
List.map ~f:(fun (x,y) -> x + y) [(1,2);(3,4)];; (* turns list of number pairs into list of their sums *)
List.map ~f:(uncurry (+)) [(1,2);(3,4)];; (* equivalent: its an uncurried add function that is needed *)
Folding
- OK, so far we have been cruising along on impulse power; its now time for warp speed!
- The most powerful list combinators are the folds:
fold_right
, andfold_left
aka simplyfold
. - They “fold together” list data using an operator.
- Think of
fold
as something you feed a “base case” and “recursive case” code to, and it makes a recursive function for you.- this recursive function will make one recursive call on the tail of the list
-
Here for example is how we can turn a list of characters into a string with
fold_right
List.fold_right ['a';'b';'c'] ~init:"" ~f:(fun elt -> fun accum -> (Char.to_string elt)^accum);; (* computes "a"^("b"^("c"^"")) *)
- The base case is argument
~init
, the empty string here ~f
is how we plug in the code for the recursive callelt
is the current element of the listaccum
is going to be the result of recursing on the tail of the list (a string here)
Before showing potential code for List.fold_right
let’s manually implement char_list_to_string
with let rec
to compare.
let rec char_list_to_string l =
match l with
| [] -> "" (* ~init above is this "", plug it in as the base case *)
| elt :: elts -> (* as in the above we are calling the current list element `elt` *)
let accum = char_list_to_string elts in (* this is also what `accum` is above, the result of recursing on the tail *)
(Char.to_string elt)^accum (* same as the body of ~f above, the calculation done on accum and elt *)
- OK now lets code
fold_right
by taking the above code and making the""
and(Char.to_string elt)^accum
explicit parameters~init
and~f
respectively. - (since the code
(Char.to_string elt)^accum
refers toelt
andaccum
we also need to make them parameters,~f
will befun elt accum -> (Char.to_string elt)^accum
)
let rec fold_right l ~f ~init =
match l with
| [] -> init
| elt :: elts ->
let accum = fold_right elts ~f ~init in
f elt accum
- If we now plug in
""
for~init
and(fun elt accum -> (Char.to_string elt)^accum)
for~f
we getchar_list_to_string
above.
fold_right ['a';'b';'c'] ~init:"" ~f:(fun elt -> fun accum -> (Char.to_string elt)^accum);;
- Aside: observe how we have to keep forwarding
~f
and~init
down the recursion to keep them available; we could have instead made an aux function without those:
let fold_right l ~f ~init =
let rec folder_aux l =
match l with
| [] -> init
| elt :: elts ->
let accum = folder_aux elts in
f elt accum in
folder_aux l
Here is another simple right fold to summate an integer list
List.fold_right ~f:(fun elt accum -> elt + accum) ~init:0 [3; 5; 7];; (* this computes 3 + (5 + (7 + 0)) *)
which more concisely could be written as
List.fold_right ~f:(+) ~init:0 [3; 5; 7];;
Left folding
- There is another way to fold: left fold!
- Notice in the above summate example the zero is on the right; that is why that is a right fold
- We could have instead summated as
((0 + 3) + 5) + 7
, with the zero on the left which is a fold left. List.fold_left
is the function, and it has synonymList.fold
which is because by default you should fold left for efficiency.
List.fold ~f:(fun accum elt -> accum + elt) ~init:0 [3; 5; 7];; (* this is ((0 + 3) + 5) + 7 *)
- Here since it is a fold left the accumulator is on the left (compare with folding right above)
- the arguments to
f
are swapped to make that more clear
- the arguments to
- Note that for
~f
being addition, folding left or right gives the same answer;- but, that is only because
+
happens to be commutative and associative.
- but, that is only because
- For example,
List.fold ~f:(-) ~init:0 [1;2]
is(0 - 1) - 2
is-3
andList.fold_right ~f:(-) ~init:0 [1;2]
is1 - (2 - 0)
is-1
Let us understand how left folding differs by again looking at an implementation for the char list to string function.
let rec char_list_to_string l accum = (* invariant: accum is the accumulated result thus far *)
match l with
| [] -> accum (* we are totally done at this point, `accum`` is the final result and just pop pop pop *)
| elt :: elts ->
char_list_to_string elts (accum^(Char.to_string elt));; (* we are computing the `~f` to accumulate result on the way *down* the recursion *)
char_list_to_string ['a';'d'] "";; (* we need to prime the accum pump with "" here *)
Here is the general fold
, pulling out the ~f
in the above as a parameter.
Note that the accum
we call ~init
here since that is the exterior interface.
let rec fold l ~init ~f = (* Note: e.g. ~f is **declaring** a named argument f; ~f is shorthand for ~f:f *)
match l with
| [] -> init
| elt::elts -> fold elts ~init:(f init elt) ~f (*observe f is invoked **before** the call -- accumulating left-first *)
Summarize by looking at the types
- The type of
List.fold
is'a list -> init:'acc -> f:('acc -> 'a -> 'acc) -> 'acc
- The
'a
here is the type of the list elements - The
'acc
is the type of the result being accumulated
- The
- The type of
List.fold_right
is'a list -> f:('a -> 'acc -> 'acc) -> init:'acc -> 'acc
- (Notice that the arguments to
f
are swapped here compared to thefold
left version)
- (Notice that the arguments to
More fold examples
- Folding can encapsulate most simple recursions over lists, so it can implement many of the library functions.
- We can for example implement
List.exists
above with map and fold:
let exists l ~f =
l
|> List.map ~f
|> List.fold ~f:(||) ~init:false;; (* the List.map output is a list of booleans, just fold them up here *)
# exists ~f:(fun x -> x >= 0) [-1;-2];;
- : bool = false
# exists ~f:(fun x -> x >= 0) [1;-2];;
- : bool = true
In fact we can do this in one pass with just a fold:
let exists l ~f =
List.fold l ~f:(fun accum elt -> accum || f elt) ~init:false;;
Which hints that map
itself is definable with a fold
; we accumulate a new list here:
let map l ~f = List.fold ~f:(fun accum elt -> accum @ [f elt]) ~init:[] l
If you wanted to use fold_right
to build map it would be similar:
let map_right l ~f = List.fold_right ~f:(fun elt accum -> (f elt) :: accum) ~init:[] l;;
Note that map_right
is much more efficient, ::
takes unit time and @
is linear in size of left list.
Folding and efficiency
Let us review left vs right folding and reflect on efficiency. Here are two implementations similar to the above ones:
let rec fold_right ~f l ~init =
match l with
| [] -> init
| hd::tl -> f hd (fold_right ~f tl ~init) (* observe it is invoking f **after** the recursive call *)
vs
let rec fold_left l ~init ~f =
match l with
| [] -> init
| hd::tl -> fold tl ~init:(f init hd) ~f (*observe f is invoked **before** the call -- accumulating left-first *)
- Note that the first parameter to f in
fold_left
is the accumulated value passed down and the second parameter is the current list value - In
fold_right
on the other hand thef
computation happens after the recursive call is complete. - Fold left/right are good example contrasts of how you can accumulate a value up (
fold_right
) vs down (fold_left
) the recursion
fold_left
is in fact more efficient than fold_right
so it is preferred all things being equal:
- Observe how the value of the
fold_left
function above is what is directly returned from the base case, it bubbles all the way out - Such a function is tail recursive: there is no work to do after the (sole) recursive call finishes
- The compiler doesn’t need to use a call stack for such functions since nothing happens upon return
- there is nothing it needs to mark as a point to go back to
- so, it replaces push/pop with jumps in and one jump out when done – its just a loop.
- Important when lists get really long that you don’t use stack unless required.
- Observe that
fold_right
is not tail recursive, so it needs the stack and will be slower
fold_until
- Let us end on perhaps the most powerful
List
combinator of all,fold_until
. - This is an extention to
fold
adding the functionality ofbreak
of C etc looping but in a functional style.
let summate_til_zero l =
List.fold_until l ~init:0
~f:(fun acc i -> match i, acc with
| 0, sum -> Stop sum
| _, sum -> Continue (i + sum))
~finish:Fn.id
let stz_example = summate_til_zero [1;2;3;4;0;5;6;7;8;9;10]
- The
Stop
variant is like break, here takesum
as the final value Continue
wraps the continue-folding case, which addsi
to runningsum
here.~finish
can post-process the result if theStop
case was not hit;Fn.id
isfun x -> x
, no additional processing here.- The
~finish
exists so that theStop
andContinue
cases can hold different types:Continue
holds an'acc
, andStop
holds a'final
. - If we continued until the very end of the list, we need to tell it how to turn an
'acc
into a'final
. - In
summate_til_zero
, both wereint
, so we can do nothing.
- The