Variants
- Variants build or-data (this-or-this-or-this); records build and-data (this-and-this-and-this)
- They are the fundamental data constructors
- We start with variants
Variants
- The
option
andresult
types we have been using are simple forms of variant types - Variants let your data be one of several forms (either-or), with a label wrapping the data indicating the specific form
- They are related to
union
types in C orenums
in Java, but are more safe than C and more general than Java - Like OCaml lists and tuples they are by default immutable
Example variant type for doing mixed arithmetic (integers and floats)
type ff_num = Fixed of int | Floating of float;; (* read "|" as "or" *)
Fixed 5;; (* tag 5 as a Fixed *)
Floating 4.0;; (* tag 4.0 as a Floating *)
- Each case of the variant is wrapped with a constructor which serves for both
- constructing values of the variant type
- inspecting them by pattern matching
- Constructors must start with a Capital Letter to distinguish from variables
- Variants must be declared but once declared type inference can infer them.
- The
of
indicates what type is under the wrapper - Note constructors look like functions but they are not – you always need to give the argument
let ff_as_int x =
match x with
| Fixed n -> n (* variants fit well into pattern matching syntax *)
| Floating z -> int_of_float z;;
ff_as_int (Fixed 5);;
A non-trivial function using the above variant type
let ff_add n1 n2 =
match n1, n2 with (* note use of pair here to parallel-match on two variables *)
| Fixed i1, Fixed i2 -> Fixed (i1 + i2)
| Fixed i1, Floating f2 -> Floating(float i1 +. f2) (* need to coerce *)
| Floating f1, Fixed i2 -> Floating(f1 +. float i2) (* ditto *)
| Floating f1, Floating f2 -> Floating(f1 +. f2)
;;
ff_add (Fixed 123) (Floating 3.14159);;
- No data item? leave off the
of
. - Multiple data items in a single variant clause? Wrap with a tuple (or record once we cover them):
type complex = CZero | Nonzero of float * float;;
let com = Nonzero(3.2,11.2);;
let zer = CZero;;
let ocaml_annoyance = Fn.id Nonzero(3.2,11.2);; (* this is a parsing error, it views as (Fn.id Nonzero)(3.2,11.2) *)
let ocaml_annoyance = Fn.id @@ Nonzero(3.2,11.2);; (* so use @@ instead of " " *)
An Example of Variants plus List. libraries
- Here is a small Hamming distance calculator for DNA.
- Observe the
[@@deriving equal]
, this is a macro (called a “ppx extension” in OCaml) - It automatically generates a function
equal_nucleotide
(equal_the-types-name-here
in general) - You will need to use this with
Core
since regular=
will not work onnucleotide
s.
(* Example derived from
https://exercism.io/tracks/ocaml/exercises/hamming/solutions/afce117bfacb41cebe5c6ebb5e07e7ca
This code needs a #require "ppx_jane";; in top loop to load ppx extension for @@deriving equal
Or, in a dune file it will need (preprocess (pps ppx_deriving.eq)) added to the library decl *)
type nucleotide = A | C | G | T [@@deriving equal]
let hamming_distance (left : nucleotide list) (right : nucleotide list) : ((int, string) result)=
match List.zip left right with (* recall this returns Ok(list) or Unequal_lengths, another variant *)
| List.Or_unequal_lengths.Unequal_lengths -> Error "left and right strands must be of equal length"
| List.Or_unequal_lengths.Ok l ->
l
|> List.filter ~f:(fun (a,b) -> not (equal_nucleotide a b))
|> List.length
|> fun x -> Ok(x) (* Unfortunately we can't just pipe to `Ok` since `Ok` is not a function in OCaml - make it one here *)
let hamm_example = hamming_distance [A;A;C;A;T;T] [A;A;G;A;C;T]
Now let’s use fold
instead of filter
/length
let hamming_distance (left : nucleotide list) (right : nucleotide list) : ((int, string) result)=
match List.zip left right with
| List.Or_unequal_lengths.Unequal_lengths -> Error "left and right strands must be of equal length"
| List.Or_unequal_lengths.Ok (l) ->
l
|> List.fold ~init:0 ~f:(fun accum (a,b) -> accum + if (equal_nucleotide a b) then 0 else 1)
|> fun x -> Ok(x)
Parametric variant types
We have used several of these but just have not looked at the type so carefully.
Here is the system’s declaration of the option
type – the #show_type
top loop directive (or just #show
) will print it:
# #show_type option;;
type 'a option = None | Some of 'a
- The
'a
here is a parameter, which gets filled in by a concrete type to make an actual type. - e.g.
Some("hello") : string option
– thestring
fills in the parameter'a
None : 'a option
type means instantiate the'a
parameter with polymorphic/generic'a
- This may have been more clear if OCaml used function notation for these types, e.g.
type option('a) = None | Some of 'a
andSome("hello") : option(string)
And here is result
:
# #show_type result;;
type ('a, 'b) result = ('a, 'b) result = Ok of 'a | Error of 'b
- Same idea but a pair of type parameters;
'b
is the type of theError
. - Observe
Ok(4) : (int, 'a) result
andError("bad") : ('a, string) result
Lastly, List.zip
has a special type for its return value which is very similar to option
:
# #show_type List.Or_unequal_lengths.t;;
type 'a t = 'a List.Or_unequal_lengths.t = Ok of 'a | Unequal_lengths
Recursive data structures
- A common use of variant types is to build self-referential data structures
- Functional programming is fantastic for computing over tree-structured data
- Recursive types can refer to themselves in their own definition
- similar in spirit to how C structs can be recursive (but, no pointers needed here)
-
Unlike with functions, no need for
rec
- Homebrew lists as a warm-up - the built-in
list
type is in fact not needed- Note you will never need to do this, just use the built-in ones! This example is just for understanding.
type 'a homebrew_list = Mt | Cons of 'a * 'a homebrew_list;;
let hb_eg = Cons(3,Cons(5,Cons(7,Mt)));; (* analogous to 3 :: 5 :: 7 :: [] = [3;5;7] *)
Coding over homebrew lists is nearly identical to built-in lists.
let rec homebrew_map (ml : 'a homebrew_list) ~(f : 'a -> 'b) : ('b homebrew_list) =
match ml with
| Mt -> Mt
| Cons(hd,tl) -> Cons(f hd,homebrew_map tl ~f)
let map_eg = homebrew_map (Cons(3,Cons(5,Cons(7,Mt)))) ~f:(fun x -> x - 1)
Lets look at the built-in list
type:
# #show_type list;;
type 'a list = [] | (::) of 'a * 'a list
Looks just like our homebrew one other than names for the components
Binary trees
- Binary trees are like lists but with two self-referential sub-structures instead of one
- Binary trees also show how arbitrary recursive variants work; same idea but more variants.
- Here is a tree with data in the intermediate nodes but not in the leaves.
type 'a bin_tree = Leaf | Node of 'a * 'a bin_tree * 'a bin_tree
Here are some simple example trees
let bt0 = Node("whack!",Leaf, Leaf);;
let bt1 = Node("fiddly ",
Node("backer ",
Leaf,
Node("crack ",
Leaf,
Leaf)),
bt0);;
let bt2 = Node("fiddly ",
Node("backer ",
Leaf,
Node("crack ",
Leaf,
Leaf)),
bt0);;
(* Type error, like list, must have uniform type: *)
Node("fiddly",Node(0,Leaf,Leaf),Leaf);;
Combinators for Binary Trees
- Since lists are built-in we get a massive library of functions on them.
- For these binary trees (and in general for whatever variant types you roll yourself) there is no such luxury.
- This is because there is no single canonical form of tree, there are many different kinds of trees used
- Still, that doesn’t mean you should just code everything by recursing over the tree. Instead
- Define the combinators you need (maps, folds, node counts, etc.) using
let rec
- Use your combinators without needing
let rec
- Define the combinators you need (maps, folds, node counts, etc.) using
- Here is a simple recursive function over binary trees for example:
let rec add_gobble binstringtree = match binstringtree with | Leaf -> Leaf | Node(y, left, right) -> Node(y^"gobble",add_gobble left,add_gobble right)
- (Remember, as with lists this is not mutating the tree, its building a new one)
- Observe: this is an instance of the general operation of building a tree with same structure but applying an operation on each node value
- i.e. it is a map operation over a tree. Let us code
map
and use it to add gobbles.
let rec map (tree : 'a bin_tree) ~(f : 'a -> 'b) : ('b bin_tree) =
match tree with
| Leaf -> Leaf
| Node(y, left, right) ->
Node(f y,map ~f left,map ~f right)
(* using tree map to make a non-recursive add_gobble *)
let add_gobble tree = map ~f:(fun s -> s ^ "gobble") tree
- Fold is also natural on binary trees, apply operation f to node value and each subtree result.
- This is a fold right (post-processed), folding left on a tree isn’t so useful because there are two subtrees to go down in to.
let rec fold (tree : 'a bin_tree) ~(f : 'a -> 'acc -> 'acc -> 'acc) ~(leaf : 'acc) : 'acc =
match tree with
| Leaf -> leaf
| Node(y, left, right) ->
f y (fold ~f ~leaf left) (fold ~f ~leaf right)
(* using tree fold *)
let int_summate tree = fold ~f:(fun elt laccum raccum -> elt + laccum + raccum) ~leaf:0 tree;;
int_summate @@ Node(3,Node(1,Leaf,Node(2,Leaf,Leaf)),Leaf);;
(* fold can also do map-like operations - the folder can return a tree *)
let inc_nodes tree = fold ~f:(fun elt la ra -> Node(elt+1,la,ra)) ~leaf:Leaf tree;;
- Many of the other
List
functions have analogues on binary trees and recursive variants in generallength
(size
ordepth
for a tree),forall
,exists
,filter
(filter out a subtree), etc etc.
- For some operations we need to know how to compare the tree elements,
- e.g. if it is a binary (sorted) tree an insertion requires comparison
- For integers at least this is easy as we have
<=
:
let rec insert_int (x : int) (bt : int bin_tree) : (int bin_tree) =
match bt with
| Leaf -> Node(x, Leaf, Leaf)
| Node(y, left, right) ->
if x <= y then Node(y, insert_int x left, right)
else Node(y, left, insert_int x right)
;;
- Like list operations this is not mutating – it returns a whole new tree.
- Well, recall for lists that if we have list
l
then0 :: l
can share thel
due to immutability - So, for here, only one path through tree is not shared: on average only log n new nodes need to be made. More later in lecture on efficiency.
let bt' = insert_int 4 bt;;
let bt'' = insert_int 0 bt';; (* thread in the most recent tree into subsequent insert *)
- For non-integers , we need to explicitly supply any equal or comparison function.
- recall
=
inCore
works on integers only.
- recall
- Library functions needing to compare will in fact take a comparision operation as argument
- For example in the
List
library, theList.sort
function - Here is an example of how to sort a string list with
List.sort
:
List.sort ["Zoo";"Hey";"Abba"] ~compare:(String.compare);; (* pass string's comparison function as argument *)
(* insight into OCaml expected behavior for compare: *)
# String.compare "Ahh" "Ahh";; (* = returns 0 : equal *)
- : int = 0
# String.compare "Ahh" "Bee";; (* < returns -1 : less *)
- : int = -1
# String.compare "Ahh" "Ack";; (* > returns 1 : greater *)
- : int = 1
So, our more general tree insert should follow the lead of List.sort
:
let rec insert x bt ~compare =
match bt with
| Leaf -> Node(x, Leaf, Leaf)
| Node(y, left, right) ->
if (compare x y) <= 0 then Node(y, insert x left compare, right)
else Node(y, left, insert x right compare)
;;
let bt' = insert 4 bt ~compare:(Int.compare);;
- In general all the built-in types have both
compare
andequal
(which is same as(=)
) defined - Define your own compare/equal for your own types if you need it
- Appending
[@@ppx_deriving equal]
to type decl as we saw above in Hamming DNA example will automatically define functionequal_mytype
for your typemytype
- Appending
[@@ppx_deriving compare]
is similar but will define functioncompare_mytype
. - Appending
[@@ppx_deriving equal compare]
will get both
Polymorphic Variants Briefly
- OCaml has an additional form of variant which has different syntax and is overlapping in uses: polymorphic variants
- A better term would be “inferred variants” - you don’t need to declare them via
type
.
# `Zinger(3);; (* prefix constructors with a backtick for the inferred variants *)
- : [> `Zinger of int ] = `Zinger 3
- This looks a bit useless, it inferred a 1-ary variant type
- But the “
>
” in the type means there could be other variants showing up in the future.
# let f b = if b then [`Zinger 3] else [`Zanger "hi"];;
val f : bool -> [> `Zanger of string | `Zinger of int ] list = <fun>
- We can of course pattern match as well:
# let zing_zang z =
match z with
| `Zinger n -> "zing! "^(Int.to_string n)
| `Zanger s -> "zang! "^s
val zing_zang : [< `Zanger of string | `Zinger of int ] -> string = <fun>
Observe how the type now has a <
instead of a >
; the meaning is it is those fields or fewer.
# zing_zang @@ `Zanger "wow";;
- : string = "zang! wow"
# zing_zang @@ `Zuber 1.2;;
Line 1, characters 13-23:
Error: This expression has type [> `Zuber of float ]
but an expression was expected of type
[< `Zanger of string | `Zinger of int ]
The second variant type does not allow tag(s) `Zuber
- Generally you should use the non-polymorphic form by default
- The main advantage of the polymorphic form is sharing tags amongst different types
- regular variants like
Ok(4)
must be in only one type,result
forOk
inCore
- variants like
`Zanger "f"
can be in[> `Zanger of string ]
,[> `Zanger of string | `Zinger of int ]
, etc - really OCaml should just have one form; the two forms are historical baggage.
- regular variants like