List programming

  • First we will do a few more recursive functions over lists
  • Then we will show how the Core.List library functions allow a great many (most?) operations to be written without recursion

Reversing a list

  • Let us write a somewhat more interesting function, reversing a list.
  • Lists are immutable so it is going to create a completely new list, not change the original.
  • This style of programming is called “Data structure corresponds to control flow” - the program needs to touch and reconstruct the whole data structure as it runs.
let rec rev l =
  match l with
  |  [] -> []
  |  x :: xs -> rev xs @ [x]
;;
rev [1;2;3];; (* recall input list is the tree 1 :: ( 2 :: ( 3 :: [])) *)
  • Correctness of a recursive function by induction: assume recursive call does what you expect in arguing it is overall correct.
  • For this example, can assume rev xs always reverses the tail of the list,
    • (e.g. in computing rev [1;2;3] we match x = 1 and xs = [2;3] and can assume rev [2;3] = [3;2] )
  • Given that fact, rev xs @ [x] should clearly reverse the whole list
    • (e.g. [3;2] @ [1] = [3;2;1] for the example)
  • QED, the function is proved correct! (actually partially correct, this induction argument does not rule out infinite loops)

Of course rev is also in Core.List since it is a common operation:

# List.rev [1;2;3];;
- : int list = [3; 2; 1]

Another Example: zero out all the negative elements in a list of numbers

  • C solution: for-loop over it and mutate all negatives to 0
  • OCaml immutable list solution: recurse on list structure, building the new list as we go
let rec zero_negs l =
  match l with
  |  [] -> []
  |  x :: xs -> (if x < 0 then 0 else x) :: zero_negs xs
in
zero_negs [1;-2;3];;

Core.List library functions

  • We already saw a few of these previously, e.g. List.rev and List.nth.
  • List is a module, think fancy package. It contains functions plus values plus types plus even other modules
  • List is itself in the module Core so the full name for rev is Core.List.rev
    • but we put an open Core in our .ocamlinit (and in the template for A1) so you can just write e.g. List.rev
  • Note that the Core module extends (think subclasses) Base, look in Base.List for the List documentation.
  • (Note that List.hd is also available, but you should nearly always be pattern matching to take apart lists; don’t use List.hd on the homework.)
  • Let us peek at the documentation for Base.List to see what is available; we will cover a few of them now.

Some simple but very handy List library functions

List.length ["d";"ss";"qwqw"];;
List.is_empty [];;
List.last_exn [1;2;3];; (* get last element; raises an exception if list is empty *)
List.join [[1;2];[22;33];[444;5555]];;
List.append [1;2] [3;4];; (* Usually the infix @ syntax is used for append *)

… And their types

  • The types of the functions are additional hints to their purpose, get used to reading them
  • Much of the time when you mis-use a function you will get a type error
  • Recall that 'a list etc is a polymorhic aka generic type, 'a can be any type
    ```ocaml

    List.length;;

  • : ‘a list -> int = <fun>

    List.is_empty;;

  • : ‘a list -> bool = <fun>

    List.last_exn;;

  • : ‘a list -> ‘a = <fun>

    List.join;;

  • : ‘a list list -> ‘a list = <fun>

    List.append;;

  • : ‘a list -> ‘a list -> ‘a list = <fun>

    List.map;; (* We will do this one below, but type gives it away *)

  • : ‘a list -> f:(‘a -> ‘b) -> ‘b list = <fun>
    ```

  • We coded nth and rev before, here is one more, join:
let rec join l = match l with
  | [] -> [] (* "joining together a lists of no-lists is an empty list" *)
  | l :: ls -> l @ join ls (* " by induction assume (join ls) will turn list-of-lists to single list" *)

OCaml tuples and some List library functions using tuples

  • Along with lists [1;2;3] OCaml has tuples, (1,2.,"3")
  • It is like a fixed-length list, but tuple elements can have different types
  • You can also pattern match on tuples
# (1,2.,"3");;
- : int * float * string = (1, 2., "3")
# [1,2,3];; (* a common error, parens not always needed so this is a singleton list of a 3-tuple, not a list of ints *)
- : (int * int * int) list = [(1, 2, 3)]
  • Here is a simple function to break a list in half using the List.split_n function
    • a pair of lists is returned by split_n, dividing it at the nth position
let split_in_half l = List.split_n l (List.length l / 2);;
split_in_half [2;3;4;5;99];;
  • Now, using the List.cartesian_product function we can make all possible pairs of (front,back) elements
let all_front_back_pairs l = 
  let front, back = split_in_half l in 
    List.cartesian_product front back;; (* observe how let can itself pattern match pairs *)
val all_front_back_pairs : 'a list -> ('a * 'a) list = &lt;fun>
# all_front_back_pairs [1;2;3;4;5;6];;
- : (int * int) list =
[(1, 4); (1, 5); (1, 6); (2, 4); (2, 5); (2, 6); (3, 4); (3, 5); (3, 6)]
  • Fact: lists of pairs are isomorphic to pairs of lists (of the same length)
  • zipping and unzipping library functions can convert between these two equivalent forms.
List.unzip @@ all_front_back_pairs [1;2;3;4;5;6];;
  • Note the use of @@ here, recall it is function application but with “loosest binding”, avoids need for parens
  • Here is an even cooler way to write the same thing, with pipe operation |> (based on shell pipe |)
[1;2;3;4;5;6] |> all_front_back_pairs |> List.unzip;;
  • In a series of pipes, the leftmost argument is data, and all the others are functions
  • The data is fed into first function, output of first function fed as input to second, etc
  • This is exactly what the shell | does with standard input / standard output.

  • List.zip is the opposite of unzip:
List.zip [1;2;3] [4;5;6];;
- : (int * int) list List.Or_unequal_lengths.t =
Core.List.Or_unequal_lengths.Ok [(1, 4); (2, 5); (3, 6)]
  • The strange result is dealing with the case where the lists supplied may not be same length
  • This type and value are hard to read, let us take a crack at it.
  • ((int * int) list) List.Or_unequal_lengths.t is the proper parentheses.
  • List.Or_unequal_lengths.t is referring to the type t found in the List.Or_unequal_lengths module (a small module within the List module)
  • We can use the #show_type directive in the top loop to see what t actually is:
    # #show_type List.Or_unequal_lengths.t;;
    type nonrec 'a t = 'a List.Or_unequal_lengths.t = Ok of 'a | Unequal_lengths
    
  • This means the value is either Ok(..) or Unequal_lenghts, very similar to result
  • The latter case is for zipping lists of different lengths:
List.zip [1;2;3] [4;5];;
- : (int * int) list List.Or_unequal_lengths.t =
Core.List.Or_unequal_lengths.Unequal_lengths
  • In the original same-length case we got the result from the first clause in this type, Core.List.Or_unequal_lengths.Ok [(1, 4); (2, 5); (3, 6)].
  • They should have just used the result type here, these values and types are ugly!!
  • Note List.zip_exn will just raise an exception for unequal-length lists, avoiding all of this wrapper ugliness
    • but in larger programs we really want to avoid exceptions at a distance so it is often worth the suffering

zip/unzip and Currying

We should be able to zip and then unzip as a no-op, one should undo the other (we will use the _exn version to avoid the above error wrapper issue).

List.unzip @@ List.zip_exn [1;2] [3;4];;

And the reverse should also work as it is an isomorphism:

List.zip_exn @@ List.unzip [(1, 3); (2, 4)];;
Line 1, characters 16-43:
Error: This expression has type int list * int list
       but an expression was expected of type 'a list
  • Oops! It fails. What happened here?
  • List.zip_exn takes two curried arguments, lists to zip (its type is 'a list -> 'b list -> ('a * 'b) list ), whereas List.unzip returns a pair of lists.
  • No worries, we can write a wrapper (an adapter) turning List.zip_exn into a version taking a pair of lists:
let zip_pair (l,r) = List.zip_exn l r in 
zip_pair @@ List.unzip [(1, 3); (2, 4)];;
[(1, 3); (2, 4)] |> List.unzip|> zip_pair ;; (* Pipe equivalent form *)
  • Congratulations, we just wrote a fancy no-op function :smile:
  • The general principle here is a curried 2-argument function like int -> int -> int is isomorphic to int * int -> int
  • The latter form looks more like a standard function taking multiple arguments and is the uncurried form.
  • And we sometimes need to interconvert between the two representations
  • This conversion is called uncurrying (curried to pair/triple/etc form) or currying (putting it into curried form)

Curry/Uncurry are themselves functions

  • We can even write functions which generically convert between these two forms - !
  • curry - takes in uncurried 2-arg function and returns a curried version
  • uncurry - takes in curried 2-arg function and returns an non-curried version
let curry f = fun x -> fun y -> f (x, y);;
let uncurry f = fun (x, y) -> f x y;;

Observe the types themselves in fact fully define their behavior:

curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c
uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c

We can now use them to build zip_pair directly:

let zip_pair  = uncurry @@ List.zip_exn;;

One last higher-order function: compose

Composition function g o f: take two functions, return their composition

let compose g f = (fun x -> g (f x));;
compose (fun x -> x+3) (fun x -> x*2) 10;;
  • The type says it all again, ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
  • Equivalent ways to code compose in OCaml:
let compose g f x =  g (f x);;
let compose g f x =  x |> f |> g;; (* This is the readability winner: feed x into f and f's result into g *)
let compose = (fun g -> (fun f -> (fun x -> g(f x))));;
  • We can express the Zip/unzip composition explicitly with compose:
# (compose zip_pair List.unzip) [(1, 3); (2, 4)];;
- : (int * int) list = [(1, 3); (2, 4)]

List functions which take function arguments

  • So far we have done the “easier” functions in List; the real meat are the functions taking other functions
  • Lets warm up with List.filter: remove all elements not meeting a condition which we supply a function to check
List.filter [1;-1;2;-2;0] (fun x -> x >= 0);;
  • Cool, we can “glue in” any checking function (boolean-returning, i.e. a predicate) and List.filter will do the rest
  • Observe List.filter has type 'a list -> f:('a -> bool) -> 'a list – the f is a named argument, we can put args out of order if we give name via ~f: syntax:
    List.filter ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
    
  • And, since OCaml functions are Curried we can leave off the list argument to make a generic remove-negatives function.
    let remove_negatives = List.filter ~f:(fun x -> x >= 0);;
    remove_negatives  [1;-1;2;-2;0];;
    

Let us use filter to write a function determining if a list has any negative elements:

let has_negs l = not (l |> List.filter ~f:(fun x -> x &lt; 0) |> List.is_empty);;
  • The example shows the power of pipelining, it is easier to see the processing order with |>
  • This is a common operation so there is a library function for it as well: does there exist any element in the list where predicate holds?
let has_negs l = List.exists ~f:(fun x -> x &lt; 0) l;;

Similarly, List.for_all checks if it holds for all elements.

List.map

  • List.map is super cool, apply some operation we supply to every element of a list:
# List.map ~f:(fun x -> x + 1) [1;-1;2;-2;0];;
- : int list = [2; 0; 3; -1; 1]
# List.map ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
- : bool list = [true; false; true; false; true]
List.map ~f:(fun (x,y) -> x + y) [(1,2);(3,4)];; (* turns list of number pairs into list of their sums *)

Folding

  • Observe the for_all and exists functions can be viewed as just mapping over the predicate like in the previous, and inserting an “and” (for all) or an “or” (exists) between each list element.
  • The fold library functions do exactly that. Here for example is List.fold_right at work
let exists ~f l =  (* Note the ~f is **declaring** a named argument f, we were only using pre-declared ones above *)
  let bool_result_list = List.map ~f:f l in
  List.fold_right bool_result_list ~f:(||) ~init:false;;
# exists ~f:(fun x -> x >= 0) [-1;-2];;
- : bool = false
# exists ~f:(fun x -> x >= 0) [1;-2];;
- : bool = true
  • The ~f parameter is the operation to put between list elements, disjunction || in this example;
  • The ~init is needed because || is a binary operator so an initial value is needed to fold
  • For fold_right the ~init is on the right, that is why it is called a “fold right”:
# List.fold_right ~f:(||) ~init:false [true; false];; (* this is true || (false || (false)), the final false the ~init *)
- : bool = true
  • List.fold_left aka List.fold (use the latter syntax generally) puts the ~init on the left:
    ```ocaml

    List.fold ~f:(||) ~init:false [true; false];; (* this is false || (true || false), the FIRST false is the ~init *)

  • : bool = true
    ```

  • Note that in this case folding left or right gives the same answer;
    • that is because || is commutative and associative, so e.g. true || (false || (false) = false || (true || false).
  • But e.g. List.fold ~f:(-) ~init:0 [1;2;3] is -6 and List.fold_right ~f:(-) ~init:0 [1;2;3] is 2
  • Folding left is preferred, it is tail recursive and can be optimized (more on this later)

fold_until

  • Let us end on perhaps the most powerful List combinator of all, fold_until.
  • This is an extention to fold adding the functionality of break of C etc looping but in a functional style.
let summate_til_zero l =
  List.fold_until l ~init:0
    ~f:(fun acc i -> match i, acc with
        | 0, sum -> Stop sum
        | _, sum -> Continue (i + sum))
    ~finish:Fn.id
let stz_example = summate_til_zero [1;2;3;4;0;5;6;7;8;9;10]
  • The Stop variant is like break, here take sum as the final value
  • Continue wraps the continue-folding case, which adds i to running sum here.
  • ~finish can post-process the result if the Stop case was not hit; Fn.id is fun x -> x, no additional processing here.