Introduction to Caml

These lectures cover

The Basics
Simple built-in types
Functions (including Patterns and Higher-Order Functions)
Declared types: variants and records, etc
Imperative features: references, arrays, exceptions
Modules
Separate compilation

The Transcripts of Caml sessions run in class are available.

Background on ML

ML is

A language developed by the programming languages research community (Robin Milner; Dave MacQueen; Xavier Leroy...)
An "extended functional" language: it has imperative features built on a functional core. (Pure functional = functions only; no assignment or other mutation operator in the language.)
Particularly useful for metaprogramming (writing programs that manipulate programs, e.g. for compilers, interpreters etc).
That was original purpose, "ML" = "MetaLanguage".
Most current compilers run too slowly for the language to be feasible for widespread use (but, OCaml compiler is pretty darn good).

Versions of ML

classic ML: Milner's original creation
Standard ML (SML): an early 90's standard
SML '97: revision of early 90's standard
(O)Caml: French variant on classic ML; includes classes and objects too

We are using OCaml because it is currently the best-supported version of ML and it has a very fast compiler. Unfortunately, almost all the English books cover the Standard ML dialect.

ML Novelties

Has some novel features relative to Java (and C++).

all variables are themselves immutable
--their values are fixed.
everything is an expression (commands also return values)
completely higher-order functions
--functions can be defined anywhere in the code, passed as arguments, and returned as values.
dynamic top-loop programming environment (a la Lisp/Scheme/Smalltalk)
--can type/compile/load little bits of new code into the system incrementally.
automated inference of most type declarations,
data pattern matching,
parametric polymorphism,
functors (higher-order modules), ...

These are all conceptually more advanced ideas.

Caml Basics

Read the Core Language section of the manual to supplement these notes. For more depth and examples, read (mainly) Chapter 2 of the online O'Reilly Book.

The top loop

The concept of a top loop is found in many languages: Lisp, Scheme, Smalltalk, ML
OCaml is a top-loop based system;
It is an interactive compile-load-run-printResult loop: type in some code, and it's directly compiled, loaded, and run, and the result printed. Repeat.
The code is loaded into the context left by the previous compile-load-run commands typed into ocaml, so some variables may already have values.
Lisp, Scheme and Smalltalk have a stateful top-loop and Caml has a declarative top-loop; this subtlety we will deal with later.

OCaml's top loop:

Commands are typed in to the top loop, with the system providing a "#" prompt.
The dual semicolon ";;" signifies the end of input to the top loop, which the system will immediately process.
Note, when you use the emacs Caml mode, you "submit" the code you select to the top loop, and need not directly type it in at the prompt.

Here is a Caml top-loop session, starting from a UNIX prompt. (In these notes, blue typewriter font is typed by the user, and maroon typewriter font is the computer reply.)

% ocaml
        Objective Caml version 3.06

# let x = 3+4;;
val x : int = 7

Here the incredibly simple OCaml program let x = 3+4 is compiled, loaded, run, and the result printed.
val x : int = 7 means "the return value of the expression is 7", and int is the (automatically inferred) type of the return value.

# x+4;;
- : int = 11

The system knows now that x was declared to be 7, from the previous entry into the top loop.
But, the result 11 is not put in any variable (the - indicates this anonymous variable). Variable names must begin with a lower-case letter in Caml.

let syntax allows for local variable declarations in Caml:

# let x = 4 in x+3;;
- : int = 7

Declarations typed in at the top level are like an open-ended let:


# let x=4;;
val x : int = 4
# let y = x+3;;
val y : int = 7
# x*x;;
- : int = 16

Notice how the types are being inferred for us (pretty simple to do here but harder for more complex programs).

Simple Types

int, float, string, char, bool types:

4, 4.3, "hi", 'c', true

These types have all the standard kinds of operations on them, in the libraries.

Libraries

The Core library contains the more core operations on basic types: + - mod abs ** sqrt && || <> etc
- You will need to look there to see what the operator is to do what you want.
- Technically, all these core operators are defined in the OCaml module Pervasives which is always loaded. We will cover OCaml modules later.
Other less basic operators are found in the Standard Library;
- There are multiple modules here such as Array Stream List, etc.
- To use these operators, you have to refer to entities via dot notation including the module name, for example Char.code for the char -> int ASCII code function.
```
# Char.code 'a';;
- : int = 97
```

Int/Real non-Overloading

Caml is pure in that it does not overload the meaning of +, *, etc. to work on both integers and floats.

# 2.3 * 5.7;;
Characters 6-9:
This expression has type float but is here used with type int

# 2.3 *. 5.7;;
- : float = 13.110000

Caml also never performs implicit coercions; all coercions must be explicit.

# 2.3 *. 5;;
Characters 7-8:
This expression has type int but is here used with type float

Why is Caml doing this?

It is being precise about the types of arithmetic.
The original screw-up was made by sloppy mathematicians years ago who overloaded these symbols.
Please raise your hand and bitch in your next math class :-)
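
As a small sketch of what "explicit coercion" looks like in practice, the standard functions float_of_int and int_of_float can be used to make the failing example above typecheck. The variable names here are made up for illustration.

```ocaml
(* int -> float conversion, then a float multiplication with *. *)
let scaled = 2.3 *. float_of_int 5

(* float -> int conversion; the fractional part is dropped *)
let truncated = int_of_float 2.9

(* both operands are floats, so +. is the right operator *)
let mixed_sum = float_of_int 3 +. 0.5
```

The conversion is always visible in the source, so there is never any doubt about which arithmetic (integer or floating point) is being performed.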

Caml is Expression-based

Caml is expression-based, there are no pure "commands" like in Java/C++; instead, commands are also expressions, they return values.

# (if (2=3) then 5 else 6) + 1;;
- : int = 7

One sometimes annoying consequence of the above is the two branches of the if need to return the same type.

# if (2=3) then 5 else 6.5;;
Characters 21-24:
This expression has type float but is here used with type int

Let

let allows local declarations to be defined.

let x = ... in .... is somewhat analogous to local variable definition


{
int x = ...;
...
}

in C, but the Caml variable x is given a value that is immutable (can never change).

The top-loop is analogous to an open-ended let: each entry defines a new nested let-scope with a new declaration, and the scope can never be closed.
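
Here is a minimal sketch of nested local lets, written as file-style definitions rather than a top-loop transcript. The names area and shadowed are invented for this example.

```ocaml
(* width and height are local: they are not visible outside of area *)
let area =
  let width = 3 in
  let height = 4 in
  width * height

(* a second "let x" does not change the first x; it declares a new one *)
let shadowed =
  let x = 1 in
  let y = x + 1 in   (* y is computed from the first x *)
  let x = 10 in      (* new declaration that hides the first x *)
  x + y
```

Note that y keeps the value it was given from the first x, even after the inner x is declared.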

Built-in simple datatypes: lists, tuples

Caml's built-in list/tuple data objects are implicitly allocated for you: no need to malloc/free them. A Garbage collector automatically collects them when they are unreachable and thus no longer used.

Lists

Lists are ... lists of Caml values. Defining a new list is a triviality, even easier than in Java.

# [2;1+2;4];;
- : int list = [2; 3; 4]

This automatically allocates space for the list and puts in the elements.
Caml is garbage-collected like Java so no explicit de-allocation is needed.
Notice how the type, int list in this case, is inferred automatically.
All elements of a list must be of the same type.

# ["e"; String.concat "" ["f";"g"] ;"h"];;
- : string list = ["e"; "fg"; "h"]

Notice how the function call

String.concat "" ["f";"g"]

does not require ( ... ) around the function's arguments, and how they are space- and not comma-separated. That's because

You don't need parentheses for function arguments in OCaml: sin 0.3
Multiple arguments can be passed in Curried form which means they are separated by spaces: max 3 4.
This is a deep issue, more on it later.

Lists must be uniform in their type ("homogeneous").

# [3;"e"];;
Characters 3-6:
This expression has type string but is here used with type int

List operations are numerous.

# let x = [2;3];;
val x : int list = [2; 3]
# let y = 1::x;; (* this is called "consing" an element on to a list *)
val y : int list = [1; 2; 3]
# x;;
- : int list = [2; 3] (* y is a NEW list; the
list x is IMMUTABLE  and didn't change *)
# x @ y;;        (* appending lists *)
- : int list = [2; 3; 1; 2; 3]
# List.hd x;; (* head: first element of a list *)
- : int = 2
# List.tl x;;  (* the tail or rest of a list *)
- : int list = [3]

The hd/tl operations are not in the core library (the module name must be used when referring to them) because you should generally not use them. (And, we will penalize any use of them in homeworks). Instead, use pattern matching:

# match x with h::t -> h;;
Characters 0-22:
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]
- : int = 2

The match forces x to match pattern h::t, a list with head and tail, and then we grab the head h.
The warning indicates we didn't match the case of an empty list; on an empty list this match, like List.hd, will blow up (raise an exception).
List.hd/List.tl are bad because you really should deal with all possible patterns (more later on that).

Tail is very similar:

# match x with h::t -> t;;
Characters 0-22:
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]
- : int list = [3]
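
One way to get rid of the warning is to cover the empty-list case as well. The sketch below does this with two hypothetical helpers, head_or and tail_or_empty (these are not library functions); the _ in the patterns is the "match anything" pattern covered later in these notes.

```ocaml
(* head of the list, or the supplied default if the list is empty *)
let head_or default l =
  match l with
  | [] -> default
  | h :: _ -> h

(* tail of the list, or the empty list if there is no tail *)
let tail_or_empty l =
  match l with
  | [] -> []
  | _ :: t -> t
```

Both matches are exhaustive, so the compiler gives no warning and no exception can be raised at run time.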

Tuples

Tuples are fixed-length lists, but the fields may be of differing types ("heterogeneous").

# let x = (2,"hi");;
val x : int * string = 2, "hi"
# match x with (y,z) -> y;;
- : int = 2

Note tuple type syntax: int * string, etc.

All the data mentioned so far is immutable - it is impossible to change an entry in an existing list, tuple, or record!
Also, all variables are immutable. Thus the language described so far is a pure functional subset of ML.
Mutable features will be discussed later.

Caml Functions

Here is a simple recursive Fibonacci function definition.

#let rec fib n =
   if n < 2 then 1 else fib(n-1) + fib(n-2);;
val fib : int -> int = <fun>
#   fib 33;;
- : int = 5702887

The rec keyword must be added to define a recursive function. If there is no rec, the function can't call itself. n here is the argument, which also doesn't need to be in parens.
When you invoke a function you don't need ( .. ) around the argument:
```
sin 0.34;;
```
There is no return statement; instead, the value of the whole body-expression is implicitly what gets returned.
Function types are grammatically printed and read in the form "domain -> range". Their values (yes, functions are expressions with values, too!) are always printed as <fun>.
"Officially", Caml functions can take only one argument(!) Multi-argument functions are possible by some tricks to be explained below.

Other important aspects of Caml functions

Functions can be defined anywhere in the code via function expressions:
```
# ((function x -> x+1) 4) + 7;;
- : int = 12
```
```
# let f =
      (function x -> x+1);; (* identical to "let f x = x+1;;" *)
val f : int -> int = <fun>
```
These anonymous functions (they aren't given names) end up being very useful as we will see.
Functions can be passed to and returned from functions
In immutable programming, for/while loops are useless (they either loop never or forever since the test is the same) and so recursion is the only means of iteration.
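
To make the "recursion is the only means of iteration" point concrete, here is a sketch of two loops written as recursive functions. The names sum_to and length are chosen for this example (the library already has List.length).

```ocaml
(* what would be "for i = 1 to n do total += i" in an imperative language *)
let rec sum_to n =
  if n <= 0 then 0 else n + sum_to (n - 1)

(* walking down a list one element at a time *)
let rec length l =
  match l with
  | [] -> 0
  | _ :: rest -> 1 + length rest
```

In both cases the "loop variable" is the function argument, which gets a new immutable value on each recursive call.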

Multiple-argument functions are not built-in; generally use Currying to define them.

# let rec comb n m = (* assumes 0 <= m <= n *)
    if m=0 || m=n then 1
    else comb (n-1) m + comb (n-1) (m-1);;
val comb : int -> int -> int = <fun>
# comb 10 4;;
- : int = 210

Look at the type of comb: int -> int -> int which is int -> (int -> int): it takes an integer and returns a function which expects another integer! Let's test this:

# let comb10 = comb 10;;
val comb10 : int -> int = <fun>
# comb10 4;;
- : int = 210
# comb10 3;;
- : int = 120

Indeed, we can give comb only one argument, in which case it returns a function that we can later use. More on Currying below.

Mutually recursive functions must be defined simultaneously:

let rec
        take(l) = match l with [] -> []
                | hd :: tl ->  hd::skip tl
and
        skip(l) = match l with [] -> []
                | hd :: tl -> take tl;;
val take : 'a list -> 'a list = <fun>
val skip : 'a list -> 'a list = <fun>
# take [1;2;3;4;5;6;7;8;9;10];;
- : int list = [1; 3; 5; 7; 9]
# skip [1;2;3;4;5;6;7;8;9;10];;
- : int list = [2; 4; 6; 8; 10]

This example also shows a pattern match with multiple cases, either empty list or nonempty list. More on patterns now.

Patterns

Patterns make function definitions much more succinct, as we just saw.


let rec rev l =
  match l with [] -> []
  |   x::xs -> rev xs @ [x];;

The pattern matching process

In this function definition, [] and x::xs are patterns against which the value passed to the function is matched.
[] matches any empty list argument. x::xs matches any nonempty list, and binds x to the head and xs to the tail.

You can only use so-called constructors in a pattern ((x,4), 3::y, {e = 4; f = x}).
Patterns can be deep: x::y::z.
"|" separates the different possible patterns.

First successful match is taken if more than one pattern matches

# match [1;2;3] with x::y -> true
      | x::y::z -> false
      | [] -> true;;
# match [1;2;3] with x::y -> true
        | x::y::z -> false
          ^^^^^^^
        | [] -> true;;
Warning: this match case is unused.
- : bool = true

Warning generated at compile time if patterns don't cover all possibilities. We saw this earlier with our hd/tl implementations.

# let myhd x = match x with x::y -> x;;
Toplevel input:
# let myhd x = match x with x::y -> x;;
             ^^^^^^^^^^^^^^^^^^^^^^
Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
[]
val myhd : 'a list -> 'a = <fun>

Exception generated at run time if no match possible (only happens when previous warning generated)
```
# myhd [];;
Exception: Match_failure ("", 11, 33).
```

_ is an anonymous pattern that matches anything; for something thrown away.

# let x = [1;2];;
val x : int list = [1; 2]
#  match x with x::y::z::w -> 5
       | _ -> 7;;
  - : int = 7
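
The pattern features above (deep patterns, constants in patterns, first-match ordering, and _) can be combined. The following is a sketch with two made-up functions, pair_sums and starts_with_zero.

```ocaml
(* adds up neighbouring elements two at a time; a leftover element is kept *)
let rec pair_sums l =
  match l with
  | x :: y :: rest -> (x + y) :: pair_sums rest   (* deep pattern: at least two elements *)
  | [x] -> [x]                                    (* exactly one element *)
  | [] -> []                                      (* empty list *)

(* a constant can appear in a pattern; _ catches everything else *)
let starts_with_zero l =
  match l with
  | 0 :: _ -> true
  | _ -> false
```

The three cases of pair_sums together cover every possible list, so no exhaustiveness warning is generated.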

Patterns also can be defined in let to attribute values to multiple variables:


let l = [1;2;3;4;5;6;7;8;9;10];;
val l : int list = [1; 2; 3; 4; 5; 6; 7; 8; 9; 10]
# let (evens,odds) = (skip l, take l);; (* skip and take defined above *)
val evens : int list = [2; 4; 6; 8; 10]
val odds : int list = [1; 3; 5; 7; 9]

Similarly patterns can be used in function definitions.


# let add (x,y) = x + y;;
val add : int * int -> int = <fun>
# add (2,3);;
- : int = 5

This looks like a function of two arguments, but it's a function of one argument which matches a pair pattern. Note, in Caml it is better to use Curried function definitions for multiple-argument functions, not tuples.

Immutable Declarations

Important feature of let-defined variable values in Caml: they cannot change their value later.
Helps in reasoning about programs---we know the variable's value is fixed.
Smalltalk also forces method arguments to be immutable; C++'s const and Java's final on fields have a similar effect.

Don't forget let is immutable; it's bound to screw you up at some point.


let x = 5 in
  let f y = x + 1 in
    let x = 7 in f 0 ;;
- : int = 6 (* old value of x is what f refers to *)

Here's the one that will mess with your mind: the same thing as above but with the declarations typed into the top loop (the top loop is conceptually an open-ended series of lets which never close).

# let x = 5;;
val x : int = 5
# let f y = x + 1;;
val f : 'a -> int = <fun>
# f 0;;
- : int = 6
# let x = 7;; (* not an assignment to above x -- a new declaration *)
val x : int = 7
# f 0;;
- : int = 6

Programming moral: When interactively editing a group of functions that call each other, re-submit all of the functions to the top loop when you change any one of them.
Here is another example of let:


let hundredthPower x =
   let four = x*.x*.x*.x in
   let twenty = four*.four*.four*.four*.four in
        twenty*.twenty*.twenty*.twenty*.twenty;;
val hundredthPower : float -> float = <fun>
# hundredthPower(2.0);;
- : float = 1.26765060023e+30

Higher-Order Functions

ML is highly tuned to allowing higher-order functions, functions that either take other functions as argument or return functions as results, or both.

Higher-order functions are an important component of a programmer's toolkit.

They allow for "pluggable" programming by passing in and out chunks of code.
Many new programming design patterns are possible.
They greatly increase the reusability of code.

The classic example of a function that takes another function as argument is the map function on lists. It takes a list and a function and applies the function to every element of the list.

let rec map f l =
match l with []    -> []
      |      x::xs -> f(x) :: map f xs;;
val map : ('a -> 'b) -> 'a list -> 'b list = <fun>

The 'a/'b types are polymorphic ("any") types, more on them below.

# map (function x -> x*10) [4;2;7];;
- : int list = [40; 20; 70]

map is so common it is built into Caml as List.map.
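
Other list functions can be made "pluggable" in the same style as map. As a sketch, here is a hand-written filter that keeps only the elements for which the passed-in function returns true. The name myfilter is made up for this example (the library version is List.filter).

```ocaml
(* keep is the plugged-in test; only elements passing the test are kept *)
let rec myfilter keep l =
  match l with
  | [] -> []
  | x :: xs ->
      if keep x then x :: myfilter keep xs
      else myfilter keep xs

(* the same list traversal, reused with different plugged-in code *)
let evens = myfilter (function x -> x mod 2 = 0) [1; 2; 3; 4; 5; 6]
let big = myfilter (function x -> x > 4) [1; 2; 3; 4; 5; 6]

(* the built-in List.map behaves like the map defined above *)
let tens = List.map (function x -> x * 10) [4; 2; 7]
```

The recursion over the list is written once, and each use only supplies the small piece of code that differs.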

Perhaps the simplest higher-order function is the composer, in mathematics expressed as g o f. It takes two functions and returns a new function which is their composition:


let circle g f = (function x -> g(f(x)));;
val circle : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b = <fun>

Here is an example of circle in action.

# let plus3times2 = circle (function x -> x*2) (function x -> x+3);;
val plus3times2 : int -> int = <fun>
# plus3times2 10;;
- : int = 26

As we have seen before, functions are just expressions so can also be immediately applied after being defined:


# circle (function x -> x*2) (function x -> x+3) 10;;
- : int = 26

Oddly enough, this looks like application of a three-argument function circle, but circle only takes two arguments! All three arguments to circle are in fact Curried: one is applied, the result is a function, and then that function is immediately applied to the next argument. Let us focus on this topic now.

Currying

Currying is an important concept of functional programming; it is named after logician Haskell Curry. Multi-argument functions as defined thus far are Curried; let's look at what is really happening.

Here is a two-argument function defined in our usual manner.


# let myadd x y = x + y;;
val myadd : int -> int -> int = <fun>
# myadd 3 4;;
- : int = 7

Here is another completely equivalent way to define the same function:


# let myadd x =
  function y  -> x + y;;
  val myadd : int -> int -> int = <fun>
    (* the -> type constructor associates to the RIGHT: int -> (int -> int) *)
# myadd 3 4;; (* parenthesized as (myadd 3) 4 *)
- : int = 7

# let inc3 = myadd 3;;
val inc3 : int -> int = <fun>

# inc3 4;;
- : int = 7 (* same result as myadd 3 4 in the end *)

The main observation is myadd is a function returning a function, so the way we supply two arguments is

invoke the function, get a function back
then invoke the returned function passing the second argument.
Our final value is returned, victory.
(myadd 3) 4 is an inlined version of this where the function returned by myadd 3 is not put in any variable

Here is a third equivalent way to define myadd, as an anonymous function returning another anonymous function.


#let myadd = function x -> function y -> x + y;;
val myadd : int -> int -> int = <fun>

With Currying, all functions "really" take exactly one argument.
Currying also naturally arises when functions return functions, as the map application above showed.
Multiple-argument functions should always be written in curried form; all the library functions are curried.
```
# List.map;;
- : ('a -> 'b) -> 'a list -> 'b list = <fun>
```

Note thus far we have curried only two-argument functions; in general, n-argument currying is possible.
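
As a sketch of the n-argument case, here is a three-argument curried function supplied with its arguments one at a time. The names volume, slab, and column are invented for illustration.

```ocaml
(* int -> int -> int -> int, i.e. int -> (int -> (int -> int)) *)
let volume length width height = length * width * height

let slab = volume 2       (* one argument supplied: int -> int -> int *)
let column = slab 3       (* two arguments supplied: int -> int *)
let result = column 4     (* all three supplied: an int *)

(* the same function written as nested one-argument functions *)
let volume_explicit =
  function length -> function width -> function height ->
    length * width * height
```

Each application peels off one arrow from the type, exactly as with the two-argument myadd.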

Functions can also take pairs as arguments to achieve the effect of a two-argument function:

# let mypairadd (x,y) = x+y;;
val mypairadd : int * int -> int = <fun>
# mypairadd (2,3);;
- : int = 5

So, either we can Curry or we can pass a pair. We can also write higher-order functions to switch back and forth between the two forms.

# let curry f = function x -> function y -> f (x,y);;
val curry : ('a * 'b -> 'c) -> 'a -> 'b -> 'c = <fun>
# let uncurry f = function (x,y) -> f x y;;
val uncurry : ('a -> 'b -> 'c) -> 'a * 'b -> 'c = <fun>
# uncurry myadd;;
- : int * int -> int = <fun>
# curry mypairadd;;
- : int -> int -> int =  <fun>
# uncurry map;; (* map defined above *)
- : ('_a -> '_b) * '_a list -> '_b list = <fun>

# curry(uncurry myadd);; (* a no-op *)
- : int -> int -> int = <fun>

Look at the types: these mappings in both directions in some sense "implement" the well-known isomorphism on sets: A * B -> C = A -> B -> C

A bigger example

Here is a more high-powered example of the use of currying.


# let rec foldr f l y =
match l with [] -> y 
  |   x::xs -> f x (foldr f xs y);;
val foldr : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b = <fun>
# let prod = foldr (function a -> function x -> a * x);;
val prod : int list -> int -> int = <fun>
# let prod0 = prod [1;2;3;4];;
val prod0 : int -> int = <fun>
# (prod0 1, prod0 2);;
- : int * int = 24, 48

Here is an analysis of this recursive function.
For the arbitrary 2-element list [x1;x2], the call

foldr f [x1; x2] y

computes to

f x1 (foldr f [x2] y)

which in turn computes to

f x1 (f x2 (foldr f [] y))

which computes to

f x1 (f x2 y)

From this we can assert that the general result returned from foldr f [x1;x2;...;xn] y is


f x1 (f x2 (...(f xn y)...))

Currying allows us to specialize foldr to a particular function f, as with prod above.
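
To illustrate how much can be obtained by plugging different functions into foldr, here is a sketch of three more specializations. foldr is repeated from above so that the block is self-contained; sum, length, and append are names chosen for this example.

```ocaml
let rec foldr f l y =
  match l with
  | [] -> y
  | x :: xs -> f x (foldr f xs y)

(* f x1 (f x2 (... (f xn 0))) with f being addition *)
let sum l = foldr (function x -> function acc -> x + acc) l 0

(* ignore each element and add one to the count for the rest *)
let length l = foldr (function _ -> function acc -> acc + 1) l 0

(* cons each element of l1 onto the result, starting from l2 *)
let append l1 l2 = foldr (function x -> function acc -> x :: acc) l1 l2
```

In each case only the combining function f and the starting value y differ; the recursion itself is reused unchanged.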

Proving program properties by induction

We should in fact be able to prove this property by induction. It's easier if we reverse the numbering of the list.

Lemma. foldr f [xn;...;x1] y computes to f xn (f xn-1 (...(f x1 y)...)) for n greater than 0.
Proof. Proceed by induction on the length of the list [xn;...;x1].
Base Case n=1, i.e. the list is [x1]. The function computes to f x1 (foldr f [] y) which computes to f x1 y as hypothesized.
Induction Step. Assume

foldr f [xn;...;x1] y

computes to

f xn (f xn-1 (...(f x1 y)...))

and show

foldr f [xn+1;xn;...;x1] y

computes to

f xn+1 (f xn (...(f x1 y)...))

Computing

foldr f [xn+1;xn;...;x1] y

the list matches the pattern x::xs with x being xn+1 and xs being [xn;...;x1].
Thus the recursive call is

foldr f [xn;...;x1] y

which by our inductive assumption computes to

f xn (f xn-1 (...(f x1 y)...))

And, given this result for the recursive call, the whole function then returns

f xn+1 (...result of recursive call...)

which is

f xn+1 (f xn (f xn-1 (...(f x1 y)...)))

which is what we needed to show.
QED.

The above implementation is inefficient in that f is explicitly passed to every recursive call. Here is a more efficient version with identical functionality.

let efficientfoldr f =
 let rec localfun l y =
      match l with [] -> y 
      |   x::xs -> f x (localfun xs y)
 in localfun
;;

This function also illustrates how functions may be defined in a local scope. Observe localfun is defined locally but then exported since it is the return value of efficientfoldr.

Question: How does the return value localfun know where to look for f when it's called??

# let summate = efficientfoldr (function a -> function x -> a+x);;
val summate : int list -> int -> int = <fun>
# summate [1;2;3;4] 0;;
- : int = 10

--summate is just localfun, but somehow it "knows" that f is (function a -> function x -> a+x), even though f is undefined at the top level:

# f;;
Characters 0-1:
Unbound value f

localfun in fact knew the right f to call, so it must have been kept somewhere: in a closure.

At function definition point, the current values of variables not local to the function definition are remembered in a closure.

Function values in ML are thus really a pair consisting of the function (pointer) and the closure.

Without making a closure, higher-order functions will do unexpected things. Java, C++, C can pass and return function (pointers), but all functions are defined at the top level so they have no closures.
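
A smaller sketch of the same closure behavior: each function returned by the hypothetical make_adder below carries along its own remembered value of n.

```ocaml
(* the returned function refers to n, which is not local to it,
   so the value of n is remembered in the function's closure *)
let make_adder n = function x -> x + n

let add5 = make_adder 5       (* closure remembers n = 5 *)
let add100 = make_adder 100   (* a separate closure with n = 100 *)

(* an unrelated top-level n; it has no effect on the closures above *)
let n = 0
```

add5 and add100 share the same code but have different closures, which is why they return different results for the same argument.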

Submitting functions to the top loop

You should never type code directly in the top loop! It's impossible to fix errors. Instead, you should edit in a file. There are several reasonable modes to interact with caml:

Use any editor, and save each group of interlinked functions in a separate file, for example "myfunctions.ml". Then, from the top loop type
```
# #use "myfunctions.ml";;
```
-- this will submit everything in the file to the top loop. Note it's #use, not just use.
Use any editor, and copy-and-paste code into caml. This is great for smaller functions but eventually you want to use the method above.
(best if you are using UNIX; may even work under Windows now:) Use the emacs editor and the caml mode documented on the course emacs web page. Edit in the manner of 1. above, with groups of functions in a file. Then, use the
```
C-c C-e		caml-eval-phrase
C-c C-r		caml-eval-region
C-c C-s		caml-show-subshell
C-c `		caml-goto-phrase-error
```
commands of caml mode to submit the current definition (phrase) to caml, the selected region to caml, to show the actual shell window, and to find your errors respectively. When you are playing with little examples, just type a small bit in any emacs buffer and click on it and use the caml-eval-phrase command to send it to caml. caml-eval-phrase at the end of a file submits all the expressions in the file.

Print


print_string("hi\n");;
hi
- : unit = ()

Caml has a print_x function for the atomic types x. Again there is no overloading of meaning here.

Caml Types

We have generally been ignoring the type aspect of Caml up to now. It's time to focus on typing in more detail.

Type Declarations

Caml infers types for you, but you can add explicit type declarations if you like.


let myadd (x:int) (y:int) = x + y;;
val myadd : int -> int -> int = <fun>

You can in fact put type assertions on any variable in an expression to clarify what type the variable has:


let myadd (x:int) (y:int) = (x:int) + y;;
val myadd : int -> int -> int = <fun>

Type abbreviations

You can also make up your own name for any type

# type intpair = int*int;;
type intpair = int * int
# let f (p : intpair) = match p with (l,r) -> l+r;;
val f : intpair -> int = <fun>
# f (2,3);;
- : int = 5

Polymorphic Types and Type Inference


# let id x = x;;
val id : 'a -> 'a = <fun>
# id 3;;
- : int = 3
# id true;;
- : bool = true

Since id was not used as any type in particular, the type of the function is polymorphic ("many forms").
'a is a type variable, meaning some arbitrary type 'a.
Polymorphism is really needed with type inference -- inferring int -> int would not be completely general.

Parametric and object polymorphism

The form of polymorphism in ML is, to be precise, parametric polymorphism -- the type above is parametric in 'a: what comes out is the same type as what came in. Generics is another term for parametric polymorphism.
Java has no parametric polymorphism but does have object polymorphism (unfortunately this is often just called polymorphism by some writers) in that a subclass object can fit into a superclass-declared variable.
When you want parametric polymorphism in Java you declare the variable to be of type Object, but you have to cast when you get it out which requires a run-time check.
The Java JDK version 1.5 will have generic types in it.

The general intuition to have about the type inference algorithm is everything starts out as having arbitrary types 'a, 'b, etc, but then the operations infer constraints that "this thing has the same type as that thing".
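
Here is a sketch of how such constraints shape the inferred type. The three functions are made up for illustration, and the types in the comments are what the inference algorithm arrives at.

```ocaml
(* f is applied to x and then to its own result, so f's input type
   and output type are forced to be the same:
   ('a -> 'a) -> 'a -> 'a *)
let twice f x = f (f x)

(* nothing relates x and y, so they keep independent type variables:
   'a -> 'b -> 'a * 'b *)
let pair_up x y = (x, y)

(* + forces x to be an int, and string_of_int returns a string:
   int -> string *)
let add_then_show x = string_of_int (x + 1)
```

Because twice stays polymorphic, it can be used at int in one place and at string in another.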

Use of type-specific atomic operators obviously restricts polymorphism:

let doublenegate x = not (not x);;
val doublenegate : bool -> bool = <fun>

When a function is defined via let to have polymorphic type, every use can be at a different type:


# let id = (function x -> x) in
    match id(true) with true -> id(3) | false -> id(4);;
  - : int = 3

Note that if the function is not declared via let, polymorphism is not allowed. The following code would run just like the code above (if it would typecheck):

# (function id -> match id(true) with true -> id(3) | false -> id(4))(function x -> x);;
                                                 ^
This expression has type int but is here used with type bool

Variant Type declarations

Variant types in ML are the analogue of union/variant types in C/Pascal. Following in the ML tradition of lists and tuples, they are not mutable.

ML variant types must be declared.
Why? Variant Types are often recursive, and recursive types cannot be inferred in ML.

Here is a really simple variant type declaration to get warmed up:


type height = Tall | Medium | Short
type height = Tall | Medium | Short

Three constructors have been defined. These are now official constants. Constructors must be capitalized, and variables must be lower-case in Caml.

Tall;;
- : height = Tall

The previous type is only an enumerated type. Much more interesting variant types can be defined. Lists could have been predefined as a variant type:


# type 'a mylist = Nil | Cons of 'a * 'a mylist;;
type 'a mylist = Nil | Cons of 'a * 'a mylist
# Cons (3, Cons (3+1,Nil));;
- : int mylist = Cons (3, Cons (4, Nil))

This form of type has several new features:

As in C/Pascal, the variants can have values and they can be recursively defined, plus,
polymorphic variant types can be defined; 'a here is a type argument.
Note how there is no need to use pointers in defining recursive variant types. The compiler does all that mucking around for you.
Also note how Cons is actually defined as a so-called constructor function.

Trees:


type 'a tree = EmptyTree | Node of 'a * 'a tree * 'a tree

Patterns are also variant type-constructor-friendly:


# let rec myappend p =
 match p with (Nil,l2)  -> l2
     |  (Cons(x,xs),l2) -> Cons (x, myappend(xs,l2));;
val myappend : 'a mylist * 'a mylist -> 'a mylist = <fun>
# let rec myreverse l =
 match l with Nil  -> Nil
     |  Cons(x,xs) -> myappend(myreverse xs , Cons (x,Nil));;
val myreverse : 'a mylist -> 'a mylist = <fun>
# myreverse (Cons (3, Cons(4, Cons(2, Nil))));;
- : int mylist = Cons (2, Cons (4, Cons (3, Nil)))

Variant types are extremely happy with polymorphism; observe how myreverse above is polymorphic over 'a mylists.
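
The tree type declared above can be processed with the same kind of pattern matching. The following is a sketch; size, to_list, and example are names invented here, and the type is repeated so the block stands alone.

```ocaml
type 'a tree = EmptyTree | Node of 'a * 'a tree * 'a tree

(* number of nodes: one match case per constructor *)
let rec size t =
  match t with
  | EmptyTree -> 0
  | Node (_, left, right) -> 1 + size left + size right

(* in-order traversal: left subtree, then the node value, then right subtree *)
let rec to_list t =
  match t with
  | EmptyTree -> []
  | Node (v, left, right) -> to_list left @ [v] @ to_list right

let example =
  Node (2, Node (1, EmptyTree, EmptyTree), Node (3, EmptyTree, EmptyTree))
```

Both functions are polymorphic over 'a tree, since neither looks inside the values stored at the nodes.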

Record Declarations

Records are tuples with labels on fields.
They are very similar to structs of C/C++.
Their types are declared just like variants.
They can be used in pattern matches as well.

type onetwo =  {one : int; two : string};;
type onetwo = { one : int; two : string; }
# let x = {one = 2; two = "hi"};;
val x : onetwo = {one=2; two="hi"}
# x.one;;
- : int = 2
# match x with {one = x; two = s} -> x;;
- : int = 2
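
As a sketch of records used in function definitions, here are two made-up functions over the onetwo type (repeated so the block is self-contained): one takes the record apart with a pattern, the other uses dot notation.

```ocaml
type onetwo = { one : int; two : string }

(* a record pattern binds n and s to the two fields *)
let describe r =
  match r with
  | { one = n; two = s } -> s ^ " " ^ string_of_int n

(* fields are immutable, so "changing" a field means building a NEW record *)
let bump r = { one = r.one + 1; two = r.two }

let original = { one = 2; two = "hi" }
let bumped = bump original
```

After bump is called, original still holds the value 2 in its one field.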

State

Variables in ML are never directly mutable themselves; they are only (indirectly) mutable if they hold an

array,
a reference,
or a mutable record.

Indirect mutability means the variable itself can't change, but what it points to can.
And, items are immutable unless their mutability is explicitly declared.

References are the simplest unit of mutability.

# let x = ref 4;;
val x : int ref = {contents=4}

This allocates a fresh mutable location named x. Records can have mutable fields, and so a reference is in fact implemented in Caml as a little record with one mutable field, contents:

#type 'a ref = { mutable contents: 'a };; (* a record type with one mutable field; this is how ref is defined in OCaml *)
type 'a ref = { mutable contents : 'a; }

References are records so you can't directly operate on them.

# x+1;;
Characters 0-1:
This expression has type int ref but is here used with type int
# x.contents + 1;; (* x is a record *)
- : int = 5
# !x + 1;;    (* shorthand notation for the above *)
- : int = 5

ref variables may be modified by assignment:

# x.contents <- 6;;  (* mutate the contents record field *)
- : unit = ()
# x := 6;; (* equivalent shorthand notation for the above *)
- : unit = () (* (), the empty tuple of type unit, is the result *)
# !x + 1;;
- : int = 7

Only ref typed variables or mutable records may be assigned to. The notion of immutable variables is one of the great strengths of ML.

Note, let doesn't turn into a mutation operator with refs, either:


# let x = ref 5;;
val x : int ref = {contents = 5}
# let f () = !x;;
val f : unit -> int = <fun>
# let x = ref 6;; (* not an assignment to x *)
val x : int ref = {contents = 6}
# f ();;
- : int = 5
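
The difference between rebinding a name and mutating a location can be sketched as follows. The helper make_counter is a hypothetical name introduced for this example; it shows a function closing over a private reference, while the second half repeats the rebinding experiment above and then mutates only the new location:

```ocaml
(* make_counter is a hypothetical helper: each call allocates a fresh ref
   that is captured by the returned function. *)
let make_counter () =
  let count = ref 0 in
  fun () ->
    count := !count + 1;
    !count

let next = make_counter ()
let a = next ()   (* first call *)
let b = next ()   (* second call sees the mutation made by the first *)

(* Rebinding a name does not affect a closure created earlier. *)
let x = ref 5
let f () = !x
let x = ref 6      (* a new x; f still reads the old location *)
let () = x := 7    (* mutates only the new location *)
```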

Mutable records of general form can be declared by putting the keyword mutable in front of each field that should be mutable:


#type mutable_point = { mutable x: float; mutable y: float };;
type mutable_point = { mutable x : float; mutable y : float; } 
 
#let translate p dx dy =
   p.x <- p.x +. dx; p.y <- p.y +. dy;;
val translate : mutable_point -> float -> float -> unit = <fun>
 
#let mypoint = { x = 0.0; y = 0.0 };;
val mypoint : mutable_point = {x=0.000000; y=0.000000}
 
#translate mypoint 1.0 2.0;;
- : unit = ()
 
#mypoint;;
- : mutable_point = {x=1.000000; y=2.000000}

Now that we have references, the while loop construct becomes useful (without references, a while loop would either never execute or loop infinitely -- pretty useless!):

#  (x := 1; (while !x < 10 do x := !x + 1 done); !x);;
- : int = 10
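
A slightly larger sketch of the same idea: the hypothetical function sum_to below adds the integers from 1 to n with a while loop and two references, and sum_to_rec computes the same value recursively without any mutation, for comparison.

```ocaml
(* sum_to is a hypothetical example: adds 1 + 2 + ... + n with a while loop. *)
let sum_to n =
  let i = ref 1 and total = ref 0 in
  while !i <= n do
    total := !total + !i;
    i := !i + 1
  done;
  !total

(* The same computation written recursively, with no mutation at all. *)
let rec sum_to_rec n = if n <= 0 then 0 else n + sum_to_rec (n - 1)
```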

Why immutability is good: the programmer can depend on the fact that something will never be mutated when writing code.
ML still lets you express mutation, but it's extra, so you only use it when it's really needed.
Haskell is a pure functional language: there is no mutation whatsoever.

Arrays

Caml arrays are fairly self-explanatory. Their syntax isn't the greatest, and they have to be initialized before you can use them.

# Array.make 10 "hi";; (* size and initial value are the params here *)
- : string array =
[|"hi"; "hi"; "hi"; "hi"; "hi"; "hi"; "hi"; "hi"; "hi"; "hi"|]
# let arr = [| 4; 3; 2 |];;
val arr : int array = [|4; 3; 2|]
# arr.(2);;
- : int = 2
# arr.(2) <- 55;;
- : unit = ()
# arr;;
- : int array = [|4; 3; 55|]
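
Arrays combine naturally with for loops. The following is a minimal sketch (reverse_in_place is a hypothetical helper, not a library function) that reverses an array by swapping elements from both ends, mutating the array rather than building a new one:

```ocaml
(* reverse_in_place is a hypothetical helper: swaps elements from both ends. *)
let reverse_in_place arr =
  let n = Array.length arr in
  for i = 0 to n / 2 - 1 do
    let tmp = arr.(i) in
    arr.(i) <- arr.(n - 1 - i);
    arr.(n - 1 - i) <- tmp
  done

let arr = [| 4; 3; 2; 1 |]
let () = reverse_in_place arr   (* arr is modified in place *)
```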

Exceptions

Exceptions are a critical component of a modern programming language.

Exceptional cases in processes are a fundamental part of what a process is
Real-world examples:
- You miss your airline flight and need to rebook.
- Car wreck on the way to school, morning classes aborted. Resume normally with afternoon classes.
In the programming world, exceptions stop the current immediate task and look for a handler in the current call context.
Exceptions are much better than goto: they go back to a context you came from, not to an arbitrary context.
Raising an exception forces the caller to deal with the exceptional case. Just returning e.g. 0 for division by 0 can hide errors.

Example uses

Matrix algorithm must abort a particular calculation method since singularities have resulted
Network connection has accidentally closed and so the database backup (or whatever) has to gracefully fail
Attempted to delete an element from a set that was not there to begin with.


# exception Foo;;
exception Foo
# let f _ = raise Foo;;
val f : 'a -> 'b = <fun>

# f ();;
Uncaught exception: Foo.

As you can see, exceptions are top-level definable units. Exceptions can be handled. This means even though something bad happened, the program can detect this fact, recover, and continue:


# let g _ = try f () with Foo -> print_string("exception raised; returning 5\n"); 5 ;;
val g : 'a -> int = <fun>
# g();;
exception raised; returning 5
- : int = 5

The form try f () with ... is the syntax for handling errors that may arise when f () is called (including errors that may arise if f in turn calls some other function, etc.).

Exception Foo is only handled when it is raised during the call to f () inside g.
Exception handling thus always has a scope.
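
The scoping of handlers can be sketched with a hypothetical exception Negative and three small functions (all names here are made up for illustration). The handler in guarded covers only the expression between try and with; unguarded has no handler, so the exception travels up to whichever caller does have one:

```ocaml
exception Negative   (* hypothetical exception for this sketch *)

let check n = if n < 0 then raise Negative else n

(* The handler covers only the expression between try and with. *)
let guarded n = try check n with Negative -> 0

(* No handler here: Negative propagates to whoever calls unguarded. *)
let unguarded n = check n + 1

(* A caller further up the call chain can still catch it. *)
let outer n = try unguarded n with Negative -> -1
```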

Exceptions can pass up an argument.
This is useful both for printing diagnostics and for error recovery.


# exception Goo of string;;
exception Goo of string
# let f _ = raise (Goo "bad stuff");;
val f : 'a -> 'b = <fun>
# f ();;
Uncaught exception: Goo "bad stuff".
# let g () = try f () with Goo s ->
                  (print_string("exception raised: ");print_string(s);print_string("\n"));;
  val g : unit -> unit = <fun>
# g();;
exception raised: bad stuff
- : unit = ()
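
The example above uses the argument for diagnostics. A sketch of the error-recovery use, with a hypothetical exception Bad_input that carries both a message and the offending value, might look like this:

```ocaml
(* Hypothetical exception carrying a message and the offending value. *)
exception Bad_input of string * int

let check_percent n =
  if n < 0 || n > 100 then raise (Bad_input ("percent out of range", n))
  else n

(* Recover by clamping, using the value carried by the exception. *)
let clamp_percent n =
  try check_percent n with
  | Bad_input (_, v) -> if v < 0 then 0 else 100
```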

Examples

See Basic Examples for some simple programs.

The example sieve.ml shows recursive functions on lists.
The example reversefile.ml is a more imperative program which reverses lines of a file via arrays.

Modules

Modules are an important dimension of programming language design.
A module is a larger unit of program abstraction: a functional unit or library.
You all know modules in the form of Java packages.

Examples:

Stack data structure module (small)
UNIX file I/O module (medium size)
Windowing system interface module (huge, probably a module of modules)

Why?

It's like why you need a bookshelf if you have 50 books: if you have too much stuff, you have to organize it for easier use.
The Java API is a particularly good example: imagine all the classes there were in one big soup, not in separate packages.

Fundamentals of module structure:

Modules have names they can be referred to by.
A module itself contains declarations of functions, classes, types, etc.
The module has an interface in which it imports some things (e.g. other modules) from the outside and exports some things it has declared for outsiders to use, implicitly hiding the rest.

Desirable features not always found:

Module names are also around at run-time.
Should have an explicit interface, one place to look to see what the code is importing and exporting
--it's easy to find out what a given module imports and exports by looking in one spot.
Should allow for separate compilation

The C/C++ module system

Informal use of files and filesystem directories as modules (Java makes this more formal)
A .h file declares what is externally visible of a module; someone importing the module #includes that .h file
Global variables are treated differently: declare them extern in the file using them; there is no need to explicitly export them

Problems with it

There is a global space of function names, so there can be name clashes
There is no strict relation enforced between the .c and .h files, lots of room for error.
This stuff works and great programmers can use it well, but since it's informal it's very easy to totally screw up and not realize it.
Conclusion: YUCK!

The Java module system: packages

A cleaner version of the C/C++ spirit of module
Directory is explicitly a module; allows for nested modules
Implicit .h file in the public decls on classes/methods
No need to mess with extern
import of other modules by ...
Separate namespaces for each module, avoiding name clashes
But, still have to poke around to figure out the imports/exports
Also, have to have at least the .class files of the imports around to compile
And, need the javadoc to see types of what you are importing
Conclusion: Much better than C/C++ but with a few rough edges

Some languages with good module systems: Modula-2, ML, Ada.

The Caml module system

See The Caml manual Chapter 4.

Basic entities of Caml modules: structures and functors

Caml distinguishes between modules with unresolved imports (functors) and modules without imports or with all imports resolved (structures).
In most other languages, you only see the structure form---you can't load a module without resolving the imports during the load.
In Caml, linking of modules with imports (functors) is done after loading them.
First we do structures, and deal with imports and import resolution (functors) later.

Structures and Signatures

Structures

Structures are collections of definitions (functions, types, other structures, exceptions, values, ... ) given a name.
Structures themselves have types, called signatures.
The signature lists the names and types of the structure's components.
The signature of a structure is what outsiders can see; if it's not in the signature they can't access it.

Here is an example.

module Mapping = 
  struct

     exception NotFound

     (* create the empty mapping *)

     let create = []

     (* lookup(d,M) finds the range value r such that
        (d,r) is a pair in mapping M *)

     let rec lookup pr = 
       match pr with (d,[]) -> raise NotFound
       |               (d,(e,r)::es) -> 
	   if d=e then r
	   else lookup(d,es)

     (* insert(d,r,M) puts (d,r) in mapping M and removes
        any other pair (d,s) that was present in M *)

     let rec insert triple =
       match triple with (d,r,[]) -> [(d,r)]
       |   (d,r,(e,s)::es) ->
             if d = e then (d,r)::es
             else (e,s)::insert(d,r,es)
     end;;

(Note this example should probably use curried functions, not tuple arguments)
Signatures can be inferred if not declared. For the above, the signature inferred is:

module Mapping :
  sig
    exception NotFound
    val create : 'a list
    val lookup : 'a * ('a * 'b) list -> 'b
    val insert : 'a * 'b * ('a * 'b) list -> ('a * 'b) list
  end

Observe the syntactic details.

Structures are written between struct ... end, and the declarations are pretty much what one can type into the top loop (and, may even include other modules).
Structures are declared (named) by "module" (you can't say "let S = struct ..." --structures aren't expressions)
Signatures are written as sig ... end.
Signatures are very vaguely like a .h file: function headers and type declarations.
Signatures are declared (named) by "module type".
The exception is declared both in the structure and in the signature. Type declarations similarly appear in both places. These entities need to be in both places because both the structure and the signature may need to refer to them.

Structure internals may be referenced by qualified names Mapping.insert, etc:

# Mapping.insert(4,"tru",Mapping.create);;
- : (int * string) list = [4, "tru"]
# let m1 = Mapping.insert(4,"tru",Mapping.create);;
val m1 : (int * string) list = [4, "tru"]
# let m2 = Mapping.insert(44,"ru",m1);;
val m2 : (int * string) list = [4, "tru"; 44, "ru"]
#

The whole structure may be made available at the "top level" of the namespace by the declaration open Mapping:

# open Mapping;;
# insert;;
- : 'a * 'b * ('a * 'b) list -> ('a * 'b) list = <fun>
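
For comparison, here is a sketch of what a curried version of the Mapping structure might look like. CurriedMapping, empty, and find_or_default are hypothetical names; the logic follows the tuple-based Mapping. Curried arguments allow partial application, so insertions can be chained with the |> operator, and the NotFound exception is handled with its qualified name:

```ocaml
(* CurriedMapping is a hypothetical name; the logic follows Mapping. *)
module CurriedMapping = struct
  exception NotFound

  let empty = []

  let rec lookup d m =
    match m with
    | [] -> raise NotFound
    | (e, r) :: es -> if d = e then r else lookup d es

  let rec insert d r m =
    match m with
    | [] -> [ (d, r) ]
    | (e, s) :: es -> if d = e then (d, r) :: es else (e, s) :: insert d r es
end

(* Partial application: insert 4 "tru" is a function awaiting the mapping. *)
let m = CurriedMapping.(empty |> insert 4 "tru" |> insert 44 "ru")

(* Exceptions declared in a structure are referenced by qualified name. *)
let find_or_default d default m =
  try CurriedMapping.lookup d m with CurriedMapping.NotFound -> default
```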

Built-in Caml modules: the standard library.

The Standard Library lists all the built-in modules and their signatures.
Some built-ins include a rich array of operations on the built-in data types: modules Array, Int32, Char, List.
Other modules provide implementations of standard datatypes: Stack, Map, Set, Queue, Hashtbl, Sort
Other built-in modules: Filename, Sys (system interface), etc.

Signatures can be explicitly declared as follows:

# module type INSERTMAPPING =
  sig
    exception NotFound
    val create : 'a list
    val insert : 'a * 'b * ('a * 'b) list -> ('a * 'b) list
  end;;
module type INSERTMAPPING =
  sig
    exception NotFound
    val create : 'a list
    val insert : 'a * 'b * ('a * 'b) list -> ('a * 'b) list
  end

This (stupid) signature leaves out the lookup operation -- this is how internal functions can be hidden. You probably wouldn't want to hide lookup but you get the idea.

Once you have declared a signature, you can restrict a structure to that signature as follows:

# module InsertOnlyMapping = (Mapping : INSERTMAPPING);;
module InsertOnlyMapping : INSERTMAPPING
# InsertOnlyMapping.insert;;
- : 'a * 'b * ('a * 'b) list -> ('a * 'b) list = <fun>
# InsertOnlyMapping.lookup;; (* this guy is hidden *)
Characters 0-24:
Unbound value InsertOnlyMapping.lookup

The above module includes some functions, a value, and an exception declared. Modules will often also include type declarations, as we will see later.

Functors

A Bad Thing is possible in structure definitions: some values used may be defined outside the structure.
This is possible because structures are just typed into the top loop, and can use any variables defined previously in the top loop.
This kind of variable use breaks separate understanding and compilation of modules: YUCK!


let lt(x,y) = String.lowercase_ascii x < String.lowercase_ascii y

module StringBST = struct

  type 'label btree =
      Empty |
         Node of 'label * 'label btree * 'label btree
      
  let rec lookup(x, tree) = 
    match tree with Empty -> false
    |   Node(y,left,right) ->
             if lt(x,y) then lookup(x, left)
             else if lt(y,x) then lookup(x, right)
             else (* x=y *) true

  let rec insert(x, tree) = 
    match tree with Empty -> Node(x,Empty,Empty)
     |   Node(y,left,right) as t ->
             if lt(x,y) then Node(y,insert(x,left),right)
             else if lt(y,x) then Node(y,left,insert(x,right))
             else (* x=y *) t (* do nothing; x was
                                  already there *)

     exception EmptyTree

     (* deletemin(T) returns a pair consisting of the least
        element y in tree T and the tree that results if we
        delete y from T.  It is an error if T is empty *)

     let rec deletemin l =
       match l with Empty -> raise EmptyTree
     |   Node(y,Empty,right) -> (y,right) (* The
                 critical case.  If the left subtree is empty,
                 then the element at current node is min. *)
     |   Node(w,left,right) ->
             let
                 (y,l) = deletemin(left)
             in
                 (y, Node(w,l,right))

     
     let rec delete(x, tree) = 
       match tree with Empty -> Empty
     |   Node(y,left,right) ->
             if lt(x,y) then Node(y,delete(x,left),right)
             else if lt(y,x) then Node(y,left,delete(x,right))
             else (* x=y *)
                 match (left,right) with
                     (Empty,r) -> r |
                     (l,Empty) -> l |
                     (l,r) ->
                         let
                             (z,r1) = deletemin(r)
                         in
                             Node(z,l,r1)

end;;
val lt : string * string -> bool = <fun>
module StringBST :
  sig
    type 'a btree = Empty | Node of 'a * 'a btree * 'a btree
    val lookup : string * string btree -> bool
    val insert : string * string btree -> string btree
    exception EmptyTree
    val deletemin : 'a btree -> 'a * 'a btree
    val delete : string * string btree -> string btree
  end

Before getting into the problems here, observe in this example that there is a type in the signature. So, StringBST.Empty : 'a StringBST.btree.
The above definition (unfortunately in a way) works, but lt had better be defined already. So, the structure isn't explicitly declaring its imports: BAD.
Solution: explicitly import outside information; this turns the structure into a functor.
Functors are like structures, but also explicitly import some other structures.
The view is particularly mathematical: a functor is a kind of mathematical function, which takes a structure as argument (the imported structure) and returns a structure.
The more "proper" way to define the BST structure is thus to use a functor which imports a module StringLess which defines lt.


# module type STRINGLESS = 
 sig 
   val lt : string * string -> bool 
 end;;
module type STRINGLESS = sig val lt : string * string -> bool end

# module StringLess : STRINGLESS = (* can give signature at declaration time *)
struct
 let lt(x,y) = String.lowercase_ascii x < String.lowercase_ascii y
end;;
module StringLess : STRINGLESS

# module StringBSTFunctor=
  functor (StringLt : STRINGLESS) ->
  struct

  type 'label btree =
      Empty |
         Node of 'label * 'label btree * 'label btree
      
  let rec lookup(x, tree) = 
    match tree with Empty -> false
    |   Node(y,left,right) ->
             if StringLt.lt(x,y) then lookup(x, left)
             else if StringLt.lt(y,x) then lookup(x, right)
             else (* x=y *) true

  let rec insert(x, tree) = 
    match tree with Empty -> Node(x,Empty,Empty)
     |   Node(y,left,right) as t ->
             if StringLt.lt(x,y) then Node(y,insert(x,left),right)
             else if StringLt.lt(y,x) then Node(y,left,insert(x,right))
             else (* x=y *) t (* do nothing; x was
                                  already there *)

     exception EmptyTree

     (* deletemin(T) returns a pair consisting of the least
        element y in tree T and the tree that results if we
        delete y from T.  It is an error if T is empty *)

     let rec deletemin l =
       match l with Empty -> raise EmptyTree
     |   Node(y,Empty,right) -> (y,right) (* The
                 critical case.  If the left subtree is empty,
                 then the element at current node is min. *)
     |   Node(w,left,right) ->
             let
                 (y,l) = deletemin(left)
             in
                 (y, Node(w,l,right))

     
     let rec delete(x, tree) = 
       match tree with Empty -> Empty
     |   Node(y,left,right) ->
             if StringLt.lt(x,y) then Node(y,delete(x,left),right)
             else if StringLt.lt(y,x) then Node(y,left,delete(x,right))
             else (* x=y *)
                 match (left,right) with
                     (Empty,r) -> r |
                     (l,Empty) -> l |
                     (l,r) ->
                         let
                             (z,r1) = deletemin(r)
                         in
                             Node(z,l,r1)

end;;
module StringBSTFunctor :
  functor (StringLt : STRINGLESS) ->
    sig
      type 'a btree = Empty | Node of 'a * 'a btree * 'a btree
      val lookup : string * string btree -> bool
      val insert : string * string btree -> string btree
      exception EmptyTree
      val deletemin : 'a btree -> 'a * 'a btree
      val delete : string * string btree -> string btree
    end

Observations about functor declarations:

They are similar to structures, but the struct ... end body is preceded by functor (Name : SIGNATURE) ->.
They have an explicit parameter list naming other structures -- the imports.
For those parameters, you must give the structure name and its type (i.e., signature) -- this allows the functor to be separately compiled, but requires extra work you didn't need to do in Java.

What state are we in now, after defining this functor?

The functor is loaded, but we can't do anything with the innards because we don't have all imports resolved.
In Java, etc, you don't have such half-alive beasts around.
In Caml you can't do much with them, so generally they should be linked to their imports.
One cool thing that is possible with a functor is to create two different versions, with different imports.

Here is how you resolve the imports and make a structure.

# module StringBST = StringBSTFunctor (StringLess) (* apply functor to make structure *)
;;

module StringBST :
  sig
    type 'a btree =
      'a StringBSTFunctor(StringLess).btree =
        Empty
      | Node of 'a * 'a btree * 'a btree
    val lookup : string * string btree -> bool
    val insert : string * string btree -> string btree
    exception EmptyTree
    val deletemin : 'a btree -> 'a * 'a btree
    val delete : string * string btree -> string btree
  end

This guy is a structure and behaves like any other structure.
In particular, you can give it an abstracted signature.
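
The remark that one functor can be instantiated twice with different imports can be sketched as follows. To keep the example short, SortedListFunctor is a deliberately small, hypothetical stand-in for StringBSTFunctor, and CaseInsensitive and CaseSensitive are made-up argument structures. The sketch uses String.lowercase_ascii, which current OCaml releases provide for lowercasing ASCII strings:

```ocaml
module type STRINGLESS = sig
  val lt : string * string -> bool
end

(* A small functor standing in for StringBSTFunctor: it keeps a sorted
   list with no duplicates, ordered by the imported lt. *)
module SortedListFunctor (L : STRINGLESS) = struct
  let rec insert (x, xs) =
    match xs with
    | [] -> [ x ]
    | y :: ys ->
        if L.lt (x, y) then x :: xs
        else if L.lt (y, x) then y :: insert (x, ys)
        else xs (* x and y are equal under lt: keep the existing element *)
end

module CaseInsensitive = struct
  let lt (x, y) = String.lowercase_ascii x < String.lowercase_ascii y
end

module CaseSensitive = struct
  let lt ((x : string), (y : string)) = x < y
end

(* Two structures built from the same functor with different imports. *)
module InsensitiveList = SortedListFunctor (CaseInsensitive)
module SensitiveList = SortedListFunctor (CaseSensitive)

let a = InsensitiveList.insert ("B", InsensitiveList.insert ("b", []))
let b = SensitiveList.insert ("B", SensitiveList.insert ("b", []))
```

Under the case-insensitive ordering "B" and "b" count as the same element, so a holds one string; under the case-sensitive ordering they are distinct, so b holds both.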

Type declarations are particularly useful in signatures: you can hide the details of a type by declaring

 type 'a btree

alone in the signature, hiding what 'a btree really is (this makes btree abstract):

# module type STRINGBST =
  sig
    type 'a btree  (* hide the details of the btree type in the signature *)
    val lookup : string * string btree -> bool
    val insert : string * string btree -> string btree
    exception EmptyTree
    val deletemin : 'a btree -> 'a * 'a btree
    val delete : string * string btree -> string btree
  end;;
module type STRINGBST =
  sig
    type 'a btree
    val lookup : string * string btree -> bool
    val insert : string * string btree -> string btree
    exception EmptyTree
    val deletemin : 'a btree -> 'a * 'a btree
    val delete : string * string btree -> string btree
  end
# module AbstractStringBST = (StringBST : STRINGBST);;
module AbstractStringBST : STRINGBST
#
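
What an abstract type buys can be seen in a much smaller sketch. COUNTER and Counter are hypothetical names; inside the structure the type t is just int, but the signature hides that fact, so outside code can only build values of type Counter.t through the exported operations:

```ocaml
module type COUNTER = sig
  type t                      (* abstract: the representation is hidden *)
  val zero : t
  val incr : t -> t
  val to_int : t -> int
end

module Counter : COUNTER = struct
  type t = int
  let zero = 0
  let incr n = n + 1
  let to_int n = n
end

let two = Counter.(to_int (incr (incr zero)))
(* Counter.incr 5 would be rejected by the type checker:
   outside the module, int and Counter.t are different types. *)
```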

In the above example, we applied the functor StringBSTFunctor and then hid some elements of the result. What you probably want to do is fix StringBSTFunctor to always produce the abstract type in the result. That is done as follows.


# module AbstractStringBSTFunctor = (StringBSTFunctor : functor(Lt:STRINGLESS) -> STRINGBST);;
module AbstractStringBSTFunctor : functor (Lt : STRINGLESS) ->
STRINGBST

Problems with functors

In terms of the import/export module model, functors map imports to their exports. Often this is fine, but it disallows circular dependencies, where two modules import from each other.
Explicit signatures for arguments need to be supplied. So, you need to have all of these signatures lying around.
There is a run-time aspect to building structures with functors, e.g. it allows things to be implicitly imported from the current state like the lt operator above.

Separate Compilation

Even though we typed all the modules above into the top-loop, Caml in fact allows modules (structures in particular) to be separately compiled.
This methodology is useful when very large pieces of software are being developed.
It avoids many of the pitfalls of functors.
It is similar to Java's packages, but
- Java implicitly pulls the interfaces out of the .class code files; in ocamlc they are in their own files.
- You hide information by declaring a restricted signature in a .mli file, not through use of private/protected/package protected declarations

Coding in the separate compilation method in Caml is a lot closer to the C/Java programming style.

In a .ml file, everything you type is implicitly wrapped in a struct .. end: it defines a structure. File snork.ml always makes a structure named Snork, so the naming is implicit.
In a .mli file, it's the analogous concept for a signature: everything you type is implicitly wrapped in a sig .. end. (You don't need an .mli file for each .ml file: the signature is inferred if it's not there.)
.ml files are analogous to .c/.cc/.java files, and .mli files, analogous to .h files.
.ml files compile to .cmo object files, analogous to how .c/.java files produce .o/.class files. .mli files are also compiled (unlike .h files) and produce .cmi files.

Here is an example, the file stack.ml implementing stacks in the ocaml library:

type 'a t = { mutable c : 'a list }
exception Empty
let create () = { c = [] }
let clear s = s.c <- []
let copy s = { c = s.c }
let push x s = s.c <- x :: s.c
let pop s =
  match s.c with
    hd::tl -> s.c <- tl; hd
  | []     -> raise Empty
let top s =
  match s.c with
    hd::_ -> hd
  | []     -> raise Empty
let is_empty s = (s.c = [])
let length s = List.length s.c
let iter f s = List.iter f s.c

And, here is the interface file, stack.mli:

type 'a t
exception Empty
val create : unit -> 'a t
val push : 'a -> 'a t -> unit
val pop : 'a t -> 'a
val top : 'a t -> 'a
val clear : 'a t -> unit
val copy : 'a t -> 'a t
val is_empty : 'a t -> bool
val length : 'a t -> int
val iter : ('a -> unit) -> 'a t -> unit

Notice the interface hides the type 'a t and so outsiders will not be able to directly manipulate the stack data structure.
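
From the client's side, only the operations listed in the interface are available. Here is a short sketch using the standard library's Stack module (pop_or_default is a hypothetical helper added for illustration):

```ocaml
(* Uses the standard library's Stack module through its interface only. *)
let s : int Stack.t = Stack.create ()
let () = Stack.push 1 s
let () = Stack.push 2 s
let top_before = Stack.top s       (* does not remove the element *)
let popped = Stack.pop s           (* removes and returns the top element *)
let remaining = Stack.length s

(* Popping an empty stack raises Stack.Empty, which can be handled. *)
let pop_or_default default st =
  try Stack.pop st with Stack.Empty -> default
```

Because 'a t is abstract, the client code above never mentions the underlying list or record.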

Compiling

The name of the compiler is ocamlc.
```
   %  ocamlc -c stack.mli
```
will make file stack.cmi, and
```
   %  ocamlc -c stack.ml
```
will make file stack.cmo (the -c flag compiles without linking). When a .mli file exists, it must be compiled before the .ml file. If there is no .mli file, compiling the .ml file will make both the .cmi and .cmo files.
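Suppose you have a file myprog.ml that defines a function main and calls it at top level (for example, with let () = main () as the last line). Then,
```
   %  ocamlc myprog.ml -o myprog 
```
will make files myprog.cmi, myprog.cmo, and myprog. Then, ./myprog will run the file's top-level code, which calls main.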
Writing open MyStruct at the top of a .ml file will open that structure, so you won't need to write MyStruct.myfunction (similar to import in Java).
For structures using other structures, define a load order so the structures a given structure is dependent on are always loaded before it. If two structures are mutually recursive, you are out of luck, unfortunately.
ocamlc produces bytecode, which is interpreted. There is also ocamlopt, an optimizing compiler that produces much faster native code.
See the manual chapter 8 for more on batch compilation.
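
An OCaml program has no distinguished entry point: the top-level definitions of a file are evaluated in order when the program starts. A common convention, sketched below for a hypothetical myprog.ml (greeting and the message text are made up for illustration), is to define main and then call it from a final top-level binding:

```ocaml
(* myprog.ml: a hypothetical program file *)
let greeting name = "hello, " ^ name

let main () = print_endline (greeting "world")

(* Top-level expressions run when the program starts; this one calls main. *)
let () = main ()
```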

Last modified: Fri Jan 29 08:19:19 EST 2010