Functional Programming and Software Engineering

List programming

Let’s warm up by writing a few more recursive functions on lists.

Reversing a list

Since lists are immutable, reverse will create a completely new list.
This style of programming is called “Data structure corresponds to control flow” - the program needs to touch and reconstruct the whole data structure as it runs.

let rec rev l =
  match l with
  |  [] -> []
  |  hd :: tl -> rev tl @ [hd]
;;
rev [1;2;3];; (* recall this list is 1 :: [2;3] which is the tree 1 :: ( 2 :: ( 3 :: [])) *)

Correctness of a recursive function follows by induction: assume recursive call does what you expect in arguing it is overall correct.
For this example, can assume rev tl always reverses the tail of the list,
- (e.g. in computing rev [1;2;3] we match hd = 1 and tl = [2;3] and can assume rev [2;3] = [3;2] )
Given that fact, the code rev tl @ [hd] should clearly reverse the whole list
- (e.g. rev [2;3] @ [1] = [3;2] @ [1] = [3;2;1])
QED, the function is proved correct! (actually partially correct, this induction argument does not rule out infinite loops)

rev is also in Core.List since it is a common operation:

# List.rev [1;2;3];;
- : int list = [3; 2; 1]

Core.List library functions

We already saw a few of these previously, e.g. List.rev and List.nth.
List is a module, think fancy package. It contains functions plus values plus types plus even other modules
List is itself in the module Core so the full name for rev is Core.List.rev
- but we put an open Core in our .ocamlinit (and in the template for A1) so you can just write e.g. List.rev
(Note that List.hd is also available, but you should nearly always be pattern matching to take apart lists; don’t use List.hd on the homework.)
(Also, read the homeworks carefully, on A1 you cannot use List... functions and on some questions of A2 you must use the List... functions.)
Let us peek at the documentation Core.List to see what is available; we will cover a few of them now.

Some handy `List` library functions

List.length ["d";"ss";"qwqw"];;
List.is_empty [];;
List.last_exn [1;2;3];; (* gets last element; raises an exception if list is empty *)
List.last [1;2;3];; (* alternate to previous which returns an option type, `None` on empty list *)
List.join [[1;2];[22;33];[444;5555]];; (* squiiiiish! *)
List.append [1;2] [3;4];; (* Note you should use the more convenient infix @ syntax for listappend *)

… And their types

Types of functions are additional hints to their purpose, get used to reading them
Much of the time when you mis-use a function you will get a type error

Recall that 'a list etc is a polymorphic aka generic type, 'a can be any type

# List.length;;
- : 'a list -> int = <fun> (* "for ANY type 'a, List.length will take a list of 'a and return an integer" *)
# List.is_empty;;
- : 'a list -> bool = <fun>
# List.last_exn;;
- : 'a list -> 'a = <fun>
# List.last;;
- : 'a list -> 'a option = <fun>
# List.join;;
- : 'a list list -> 'a list = <fun>
# List.append;;
- : 'a list -> 'a list -> 'a list = <fun>
# List.map;;  (* Foreshadowing; we will review this function below *)
- : 'a list -> f:('a -> 'b) -> 'b list = <fun> (* takes in a function; "f:" is a named argument, more below *)

We coded nth and rev previously; here is one more, join:

let rec join (l: 'a list list) = match l with
  | [] -> [] (* "joining together a list of no-lists is an empty list" *)
  | l :: ls -> l @ join ls (* by induction assume (join ls) will turn a list-of-lists into a single list *)

OCaml tuples and some `List` library functions using tuples

Along with lists [1;2;3] OCaml has tuples, (1,2.,"3")
It is like a fixed-length list, but tuple elements can have different types

You can also pattern match on tuples

# (1,2.,"3");;
- : int * float * string = (1, 2., "3")
# [1,2,3];; (* a common error, parens not always needed so this is a singleton list of a 3-tuple, not a list of ints *)
- : (int * int * int) list = [(1, 2, 3)]

Here is a simple function to break a list in half using the List.split_n function
- a pair of lists is returned by split_n, dividing it at the nth position
```
let split_in_half l = List.split_n l (List.length l / 2);;
split_in_half [2;3;4;5;99];;
```

Now, using the List.cartesian_product function we can make all possible pairs of (front,back) elements

(Also observe how these OCaml combinators have overlap with math combinators we already knew)

let all_front_back_pairs l = 
  let front, back = split_in_half l in 
  List.cartesian_product front back;; (* observe how let can itself pattern match pairs *)
val all_front_back_pairs : 'a list -> ('a * 'a) list = <fun> (* return type is a *list of pairs* *)
# all_front_back_pairs [1;2;3;4;5;6];;
- : (int * int) list =
[(1, 4); (1, 5); (1, 6); (2, 4); (2, 5); (2, 6); (3, 4); (3, 5); (3, 6)]

Fact: lists-of-pairs are isomorphic to pairs-of-lists (of the same length)
zipping and unzipping library functions can convert between these two equivalent forms.
```
List.unzip @@ all_front_back_pairs [1;2;3;4;5;6];; (* returns a pair-of-lists *)
```
Note the use of @@ here, recall it is function application but with “loosest binding”, avoids need for parens
Here is a cooler way to write the same thing, with pipe operation |> (based on shell pipe |)
```
[1;2;3;4;5;6] |> all_front_back_pairs |> List.unzip;;
```
In a series of pipes, the leftmost argument is data, and all the others are functions
The data is fed into first function, output of first function fed as input to second, etc
- its like an assembly line for building the result
This is exactly what the shell | does with standard input / standard output.
Please use pipes as much as possible on Assignment 2 - it will make the code more readable

List.zip is the opposite of unzip: take two lists and make a single list pairing elements

List.zip [1;2;3] [4;5;6];;
- : (int * int) list List.Or_unequal_lengths.t =
Core.List.Or_unequal_lengths.Ok [(1, 4); (2, 5); (3, 6)]

The strange result type is dealing with the case where the lists supplied may not be same length
This type is the same as an option type, but with Ok replacing Some and Unequal_lengths replacing None
We will figure out why the type is a List.Or_unequal_lengths.t soon, you can ignore that mess for now.
Note List.zip_exn will just raise an exception for unequal-length lists, avoiding the wrapper ugliness
- But, in larger programs we want to avoid exceptions at a distance so it is often worth wrapping

Zip/unzip and Currying

We should be able to zip and then unzip as a no-op, one should undo the other (we will use the _exn version here to avoid having to unwrap wrappers).

List.unzip @@ List.zip_exn [1;2] [3;4];;

And the reverse should also work as it is an isomorphism:

List.zip_exn @@ List.unzip [(1, 3); (2, 4)];;
Line 1, characters 16-43:
Error: This expression has type int list * int list
       but an expression was expected of type 'a list

Oops! It fails. What happened here?
List.zip_exn takes two curried arguments, lists to zip (its type is 'a list -> 'b list -> ('a * 'b) list ), whereas List.unzip returns a pair of lists.

No worries, we can write a wrapper (an adapter) turning List.zip_exn into a version taking a pair of lists:

let zip_pair (l,r) = List.zip_exn l r in (* Notice there is ONE argument here, a PAIR pattern (l,r) *)
zip_pair @@ List.unzip [(1, 3); (2, 4)];; (* should be a no-op *)
[(1, 3); (2, 4)] |> List.unzip|> zip_pair ;; (* Pipe equivalent form *)

Congratulations, we just wrote a fancy no-op function 😁
The general principle here is a curried 2-argument function like int -> int -> int is isomorphic to int * int -> int
The latter form looks more like a standard function taking multiple arguments and is the uncurried form.
And we sometimes need to interconvert between the two representations
This conversion is called uncurrying (curried to pair/triple/etc form) or currying (putting it into curried form)

Curry/Uncurry are themselves functions

We can even write combinators which generically convert between these two forms - !
curry - takes in uncurried 2-arg function and returns a curried version
uncurry - takes in curried 2-arg function and returns an non-curried version

let curry f = fun x -> fun y -> f (x, y);;
let uncurry f = fun (x, y) -> f x y;;

Observe the types themselves in fact fully define their behavior:

curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c
uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c

We can now use our new combinator to build zip_pair directly:

let zip_pair  = uncurry List.zip_exn;;

One last higher-order function: compose

Composition function g o f: take two functions, return their composition

let compose g f = (fun x -> g (f x));;
compose (fun x -> x + 3) (fun x -> x * 2) 10;;

The type says it all again, ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
Equivalent ways to code compose in OCaml:

let compose g f x =  g (f x);;
let compose g f = (fun x -> g(f x));; (* this equivalent form reads more how you think of the "o" operation in math *)
let compose = (fun g -> (fun f -> (fun x -> g(f x))));;
let compose g f x =  x |> f |> g;; (* This is the readability winner: feed x into f and f's result into g *)

We can express the Zip/unzip composition with compose:

# (compose zip_pair List.unzip) [(1, 3); (2, 4)];;
- : (int * int) list = [(1, 3); (2, 4)]

`List` functions which take function arguments

So far we have done the “easier” functions in List; the real meat are the functions taking other functions
Think of these as “recursion patterns”, they will recurse over the list so you don’t have to let rec
- makes functional code a lot easier to read once you are familiar with these “recursion combinators”
Lets warm up with List.filter: remove all list elements not meeting a condition which we supply a function to check

List.filter [1;-1;2;-2;0] (fun x -> x >= 0);;

Cool, we can “glue in” any checking function (boolean-returning, i.e. a predicate) and List.filter will do the rest
Note though that we got a strange warning on the above, “label f was omitted” - ??
This is because List.filter has type 'a list -> f:('a -> bool) -> 'a list – the f: is declaring a named argument
The top loop gives warnings if you leave off a name (and, the compiler will error) so please always use them
We can also put args out of order when we give name via ~f: syntax:

List.filter ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;

And, since OCaml functions are Curried we can leave off the list argument to make a generic remove-negatives function.

let remove_negatives = List.filter ~f:(fun x -> x >= 0);;
remove_negatives  [1;-1;2;-2;0];;

Note that you can either inline the function as an anonymous fun or can declare it in advance:

let gtz x = x >= 0;;
List.filter ~f:gtz [1;-1;2;-2;0];;

Usually for short functions it is better to inline them as it is more concise

Let us use filter to write a function determining if a list has any negative elements:

let has_negs l = l |> List.filter ~f:(fun x -> x < 0) |> List.is_empty |> not;;

The example shows the power of pipelining, it is easier to see the processing order with |>
This is a common operation so there is a library function for it as well: does there exist any element in the list where predicate holds?

let has_negs l = List.exists ~f:(fun x -> x < 0) l;;

Similarly, List.for_all checks if it holds for all elements.

List.map

List.map is very powerful, apply some operation we supply to every element of a list making a new list:

# List.map ~f:(fun x -> x + 1) [1;-1;2;-2;0];;
- : int list = [2; 0; 3; -1; 1]
# List.map ~f:(fun x -> x >= 0) [1;-1;2;-2;0];;
- : bool list = [true; false; true; false; true]
List.map ~f:(fun (x,y) -> x + y) [(1,2);(3,4)];; (* turns list of number pairs into list of their sums *)
List.map ~f:(uncurry (+)) [(1,2);(3,4)];; (* equivalent: its an uncurried add function that is needed *)

Folding

OK, so far we have been cruising along on impulse power; its now time for warp speed!
The most powerful list combinators are the folds: fold_right, and fold_left aka simply fold.
They “fold together” list data using an operator.
Think of fold as something you feed a “base case” and “recursive case” code to, and it makes a recursive function for you.
- this recursive function will make one recursive call on the tail of the list

Here for example is how we can turn a list of characters into a string with fold_right

List.fold_right ['a';'b';'c'] ~init:"" ~f:(fun elt -> fun accum -> (Char.to_string elt)^accum);; (* computes "a"^("b"^("c"^"")) *)

The base case is argument ~init, the empty string here
~f is how we plug in the code for the recursive call
- elt is the current element of the list
- accum is going to be the result of recursing on the tail of the list (a string here)

Before showing potential code for List.fold_right let’s manually implement char_list_to_string with let rec to compare.

let rec char_list_to_string l =
  match l with 
  | [] -> "" (* ~init above is this "", plug it in as the base case *)
  | elt :: elts ->  (* as in the above we are calling the current list element `elt` *)
    let accum = char_list_to_string elts in (* this is also what `accum` is above, the result of recursing on the tail *)
    (Char.to_string elt)^accum (* same as the body of ~f above, the calculation done on accum and elt *)

OK now lets code fold_right by taking the above code and making the "" and (Char.to_string elt)^accum explicit parameters ~init and ~f respectively.
(since the code (Char.to_string elt)^accum refers to elt and accum we also need to make them parameters, ~f will be fun elt accum -> (Char.to_string elt)^accum)

let rec fold_right l ~f ~init =
  match l with
  | [] -> init
  | elt :: elts -> 
    let accum = fold_right elts ~f ~init in 
    f elt accum

If we now plug in "" for ~init and (fun elt accum -> (Char.to_string elt)^accum) for ~f we get char_list_to_string above.

fold_right ['a';'b';'c'] ~init:"" ~f:(fun elt -> fun accum -> (Char.to_string elt)^accum);;

Aside: observe how we have to keep forwarding ~f and ~init down the recursion to keep them available; we could have instead made an aux function without those:

let fold_right l ~f ~init =
  let rec folder_aux l = 
    match l with
    | [] -> init
    | elt :: elts -> 
      let accum = folder_aux elts in 
        f elt accum in
  folder_aux l

Here is another simple right fold to summate an integer list

List.fold_right ~f:(fun elt accum -> elt + accum) ~init:0 [3; 5; 7];;  (* this computes 3 + (5 + (7 + 0))  *)

which more concisely could be written as

List.fold_right ~f:(+) ~init:0 [3; 5; 7];;

Left folding

There is another way to fold: left fold!
Notice in the above summate example the zero is on the right; that is why that is a right fold
We could have instead summated as ((0 + 3) + 5) + 7, with the zero on the left which is a fold left.
List.fold_left is the function, and it has synonym List.fold which is because by default you should fold left for efficiency.

List.fold ~f:(fun accum elt -> accum + elt) ~init:0 [3; 5; 7];; (* this is ((0 + 3) + 5) + 7 *)

Here since it is a fold left the accumulator is on the left (compare with folding right above)
- the arguments to f are swapped to make that more clear
Note that for ~f being addition, folding left or right gives the same answer;
- but, that is only because + happens to be commutative and associative.
For example, List.fold ~f:(-) ~init:0 [1;2] is (0 - 1) - 2 is -3 and List.fold_right ~f:(-) ~init:0 [1;2] is 1 - (2 - 0) is -1

Let us understand how left folding differs by again looking at an implementation for the char list to string function.

let rec char_list_to_string l accum = (* invariant: accum is the accumulated result thus far *)
  match l with 
  | [] -> accum (* we are totally done at this point, `accum`` is the final result and just pop pop pop *)
  | elt :: elts -> 
    char_list_to_string elts (accum^(Char.to_string elt));;  (* we are computing the `~f` to accumulate result on the way *down* the recursion *)
char_list_to_string ['a';'d'] "";; (* we need to prime the accum pump with "" here *)

Here is the general fold, pulling out the ~f in the above as a parameter.
Note that the accum we call ~init here since that is the exterior interface.

let rec fold l ~init ~f = (* Note: e.g. ~f is **declaring** a named argument f; ~f is shorthand for ~f:f *)
  match l with
  | [] -> init
  | elt::elts -> fold elts ~init:(f init elt) ~f (*observe f is invoked **before** the call -- accumulating left-first *)

Summarize by looking at the types

The type of List.fold is 'a list -> init:'acc -> f:('acc -> 'a -> 'acc) -> 'acc
- The 'a here is the type of the list elements
- The 'acc is the type of the result being accumulated
The type of List.fold_right is 'a list -> f:('a -> 'acc -> 'acc) -> init:'acc -> 'acc
- (Notice that the arguments to f are swapped here compared to the fold left version)

More fold examples

Folding can encapsulate most simple recursions over lists, so it can implement many of the library functions.
We can for example implement List.exists above with map and fold:

let exists l ~f =
  l
  |> List.map ~f
  |> List.fold ~f:(||) ~init:false;; (* the List.map output is a list of booleans, just fold them up here *)
# exists ~f:(fun x -> x >= 0) [-1;-2];;
- : bool = false
# exists ~f:(fun x -> x >= 0) [1;-2];;
- : bool = true

In fact we can do this in one pass with just a fold:

let exists l ~f = 
  List.fold l ~f:(fun accum elt -> accum || f elt) ~init:false;;

Which hints that map itself is definable with a fold; we accumulate a new list here:

let map l ~f = List.fold ~f:(fun accum elt -> accum @ [f elt]) ~init:[] l

If you wanted to use fold_right to build map it would be similar:

let map_right l ~f = List.fold_right ~f:(fun elt accum -> (f elt) :: accum) ~init:[] l;;

Note that map_right is much more efficient, :: takes unit time and @ is linear in size of left list.

Folding and efficiency

Let us review left vs right folding and reflect on efficiency. Here are two implementations similar to the above ones:

let rec fold_right ~f l ~init =
  match l with
  | [] -> init
  | hd::tl -> f hd (fold_right ~f tl ~init) (* observe it is invoking f **after** the recursive call *)

let rec fold_left l ~init ~f =
  match l with
  | [] -> init
  | hd::tl -> fold tl ~init:(f init hd) ~f (*observe f is invoked **before** the call -- accumulating left-first *)

Note that the first parameter to f in fold_left is the accumulated value passed down and the second parameter is the current list value
In fold_right on the other hand the f computation happens after the recursive call is complete.
Fold left/right are good example contrasts of how you can accumulate a value up (fold_right) vs down (fold_left) the recursion

fold_left is in fact more efficient than fold_right so it is preferred all things being equal:

Observe how the value of the fold_left function above is what is directly returned from the base case, it bubbles all the way out
Such a function is tail recursive: there is no work to do after the (sole) recursive call finishes
The compiler doesn’t need to use a call stack for such functions since nothing happens upon return
- there is nothing it needs to mark as a point to go back to
- so, it replaces push/pop with jumps in and one jump out when done – its just a loop.
Important when lists get really long that you don’t use stack unless required.
Observe that fold_right is not tail recursive, so it needs the stack and will be slower

`fold_until`

Let us end on perhaps the most powerful List combinator of all, fold_until.
This is an extention to fold adding the functionality of break of C etc looping but in a functional style.

let summate_til_zero l =
  List.fold_until l ~init:0
    ~f:(fun acc i -> match i, acc with
        | 0, sum -> Stop sum
        | _, sum -> Continue (i + sum))
    ~finish:Fn.id
let stz_example = summate_til_zero [1;2;3;4;0;5;6;7;8;9;10]

The Stop variant is like break, here take sum as the final value
Continue wraps the continue-folding case, which adds i to running sum here.
~finish can post-process the result if the Stop case was not hit; Fn.id is fun x -> x, no additional processing here.
- The ~finish exists so that the Stop and Continue cases can hold different types: Continue holds an 'acc, and Stop holds a 'final.
- If we continued until the very end of the list, we need to tell it how to turn an 'acc into a 'final.
- In summate_til_zero, both were int, so we can do nothing.

FPSE