Modules Basics
We’ve been using modules without thinking too much about them: List.map, Float.(2 = 3), Fn.id. These are uses of the modules List, Float, and Fn that are in the Core library.
Now we will look at how to define our own modules to make our own libraries and code components.
What is a module?
A module is a collection of OCaml definitions:
- let-defined entities, i.e. functions and values
- types
- other modules
- Modules are something like records, but they can also hold e.g. types, which makes them much more powerful.
- But modules are not first-class values like records: for example, they can’t directly be passed as arguments to functions.
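To make the list above concrete, here is a minimal sketch of a single module holding a type, a value, a function, and a nested module. The names Point, origin, shift and Pretty are made up for illustration, and the module is written inline with module ... = struct ... end so the block runs on its own.

```ocaml
(* A module bundling a type, a value, a function and a nested module.
   All names here are hypothetical, chosen only for illustration. *)
module Point = struct
  type t = { x : int; y : int }          (* a type: a record cannot hold this *)

  let origin : t = { x = 0; y = 0 }      (* a value *)

  let shift (dx : int) (p : t) : t =     (* a function *)
    { p with x = p.x + dx }

  module Pretty = struct                 (* a nested module *)
    let to_string (p : t) : string = Printf.sprintf "(%d, %d)" p.x p.y
  end
end

let moved = Point.shift 3 Point.origin
let () = print_endline (Point.Pretty.to_string moved)   (* prints (3, 0) *)
```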
.ml files are modules
OCaml has a simple rule:
- The contents of a file foo.ml define the module Foo.
- Capitalize the first letter (only) and drop the .ml to turn a file name into its module name.
In this lecture, we’ll work with a running example. See set-example.zip for the full code.
Here is string_set.ml from that example:
open Core
type t = string list (* This is a *type abbreviation*: a string set is a list of strings *)
let empty : t = [] (* the one canonical empty set *)
let add (x : string) (s : t) : t = x :: s
let rec remove (x : string) (s : t) : t =
  match s with
  | [] -> failwith "item is not in set"
  | hd :: tl ->
    if String.equal hd x
    then tl (* we don't remove from the tail: this is actually a multiset *)
    else hd :: remove x tl
let rec contains (x : string) (s : t) : bool =
  match s with
  | [] -> false
  | hd :: tl ->
    if String.equal x hd
    then true
    else contains x tl
Loading a module into utop
Use dune utop to fire up the OCaml toploop with the module loaded.
- Then access the module’s contents by prefixing them with String_set.
$ dune utop
# String_set.add "hello" String_set.empty ;;
- : String_set.t = ["hello"]
Or open the module with open String_set to put everything from inside it into scope:
# open String_set ;;
# add "hello" empty ;;
- : t = ["hello"]
- Here, the fact that we used a list to implement the set is exposed to library users.
- Naming the type t is standard for “the” underlying type in a module (if there is one).
- Built-in libraries also use this: for example, Int.t is an alias for int, etc.
- Then, String_set.t is read as “string set’s underlying type”.
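The utop session above reaches String_set’s contents either by full qualification or with open. A third option is the local open M.( ... ), which is the same syntax behind Float.(2 = 3) at the top of these notes. Here is a minimal sketch of all three. It assumes an inline module that mirrors string_set.ml but uses the standard library instead of Core, so it runs on its own. Its last line type-checks only because t is not yet abstract.

```ocaml
(* Stand-in for string_set.ml, written inline so the sketch runs on its own.
   Assumption: it uses the OCaml standard library instead of Core. *)
module String_set = struct
  type t = string list
  let empty : t = []
  let add (x : string) (s : t) : t = x :: s
  let contains (x : string) (s : t) : bool = List.exists (String.equal x) s
end

(* 1. Fully qualified names, as in the utop session *)
let s1 = String_set.add "hello" String_set.empty

(* 2. A local open: String_set's names are in scope only inside the parentheses *)
let s2 = String_set.(add "world" (add "hello" empty))

(* 3. A let-open, scoped to the expression after [in] *)
let found =
  let open String_set in
  contains "world" s2

(* Type-checks only because t = string list is visible: the list is exposed *)
let exposed : string list = s1
```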
Hiding details with module types
Modules have types, called module types or signatures.
- The latter term is used in math, e.g. “a DFA has signature D = (S, Σ, τ, s₀, F)”
- In a signature all the types of entities in the module are declared
- And, types declared in a module are repeated in the signature again (a bit odd, but for a reason)
- Module types are also placed in files: just add an i to the end of the file name, so .ml becomes .mli.
- So for example the module type of String_set is in the string_set.mli file.
- Here are the contents of that file:
type t = string list (* By default the type declaration is copied verbatim from the .ml file *)
(* type t (* this alternate version of type t declaration *hides* t's internals *) *)
val empty : t
val add : string -> t -> t
val remove : string -> t -> t
val contains : string -> t -> bool
- See how we repeat the type t = string list alias declaration in this .mli file.
- But there is an alternative way to write that declaration: remove the = string list part.
- Comment out the first line and uncomment the second to get that version.
- By doing this, the type t has been made abstract: users can no longer see that t is a list.
Now if we save that change and type dune utop:
# String_set.add "hello" String_set.empty ;;
- : String_set.t = <abstr>
The printed value is now <abstr>, not ["hello"] like before.
- The type is hidden; so are the values.
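The .ml/.mli split can also be written inside a single file, which is handy for experimenting in utop. A module type NAME = sig ... end plays the role of the .mli, and module M : NAME = struct ... end plays the role of the .ml. Below is a minimal sketch of the abstract version of the set. It assumes the hypothetical signature name STRING_SET and leaves out remove for brevity.

```ocaml
(* Plays the role of string_set.mli, with t made abstract (no "= string list") *)
module type STRING_SET = sig
  type t
  val empty : t
  val add : string -> t -> t
  val contains : string -> t -> bool
end

(* Plays the role of string_set.ml; the ": STRING_SET" ascription hides
   everything the signature does not reveal, including that t is a list *)
module String_set : STRING_SET = struct
  type t = string list
  let empty : t = []
  let add (x : string) (s : t) : t = x :: s
  let contains (x : string) (s : t) : bool = List.exists (String.equal x) s
end

let s = String_set.add "hello" String_set.empty

(* Rejected by the type checker if uncommented, since String_set.t is
   no longer known to be a list:
   let first = List.hd s *)
```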
Why do this?
- Good:
- Program to interfaces, not implementations.
- We can change the implementation without changing client code.
- The abstraction prevents misuse and maintains invariants.
- Bad:
- It’s hard to see what’s going on in utop.
- It can be harder to test our module.
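The sketch below makes the “change the implementation without changing client code” point concrete. It gives two hypothetical implementations, List_set and Sorted_set, of the same abstract signature. Sorted_set also illustrates the invariant bullet: its lists stay sorted only because clients can build a t solely through empty and add. The names are made up for illustration.

```ocaml
module type STRING_SET = sig
  type t
  val empty : t
  val add : string -> t -> t
  val contains : string -> t -> bool
end

(* Version 1: an unsorted list, like string_set.ml *)
module List_set : STRING_SET = struct
  type t = string list
  let empty = []
  let add x s = x :: s
  let contains x s = List.exists (String.equal x) s
end

(* Version 2: a sorted list without duplicates. Abstraction protects the
   invariant: clients cannot build a t except through empty and add. *)
module Sorted_set : STRING_SET = struct
  type t = string list
  let empty = []
  let rec add x s =
    match s with
    | [] -> [ x ]
    | hd :: tl ->
      let c = String.compare x hd in
      if c < 0 then x :: s
      else if c = 0 then s
      else hd :: add x tl
  let rec contains x s =
    match s with
    | [] -> false
    | hd :: tl ->
      let c = String.compare x hd in
      if c < 0 then false else c = 0 || contains x tl
end

(* Client code: to switch implementations, change only this alias *)
module My_set = Sorted_set

let words =
  List.fold_left (fun s w -> My_set.add w s) My_set.empty [ "open"; "Core"; "open" ]
let found = My_set.contains "Core" words
```

Changing the alias to List_set leaves words and found type-checking unchanged, because the client code only relies on STRING_SET.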
Further, anything defined in the .ml that is not declared in the .mli is not accessible to users.
- It’s as if those types/values were private.
- If there is nothing to hide, then you don’t need an .mli file at all: the type of the module will be inferred.
- All assignments come with an .mli file so you get used to the format.
- Also, by convention, the documentation specifying what a function does goes in the .mli file.
- We have followed that pattern for Assignment 3.
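Here is a small sketch of the “private” point. It uses the inline form module M : sig ... end = struct ... end, where the sig part stands in for the .mli. Word_count, count_words and is_blank are hypothetical names.

```ocaml
module Word_count : sig
  val count_words : string -> int
end = struct
  (* Not listed in the signature above, so it is private to the module *)
  let is_blank (w : string) : bool = String.equal w ""

  let count_words (line : string) : int =
    String.split_on_char ' ' line
    |> List.filter (fun w -> not (is_blank w))
    |> List.length
end

(* The empty word between the two spaces is dropped, so n is 2 *)
let n = Word_count.count_words "open  Core"

(* Word_count.is_blank "" would not compile: the value is not exported *)
```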
Building modules
Recall that dune files are like Makefiles for OCaml.
To make a library module from the string_set.ml file, we include this in the dune file:
; in file `src/dune`
(library
(name string_set)
(modules string_set)
(libraries core)
)
Writing executables
Let’s make an actual executable program to do something, not just a library.
- The file set_main.ml is our example: it takes a string and a file name and looks for that line in the file.
- Executables work by running all top-level statements in the file from top to bottom. Any side effects are the “outputs” of the executable.
- Note that purely functional programs are useless as executables: input and output are side effects, and we need them to write applications.
- Typically, the main work in an executable is put under a let () = ... statement. The ... evaluates to () : unit, and the side effects it performs are what we see.
open Core (* needed for In_channel.read_lines, List.fold ~f ~init and Sys.get_argv *)
(* Just a helper function. Does not run until it's given arguments in `let () = ...` *)
let do_search search_string filename =
  let my_set =
    filename
    |> In_channel.read_lines
    |> List.fold ~f:(fun set elt -> String_set.add elt set) ~init:String_set.empty
  in
  if String_set.contains search_string my_set
  then print_string @@ "\"" ^ search_string ^ "\" found\n"
  else print_string @@ "\"" ^ search_string ^ "\" not found\n"
(* This statement has some printing side effects that we observe when running the executable *)
let () =
  match Array.to_list (Sys.get_argv ()) with
  | _ :: search_string :: filename :: _ -> do_search search_string filename
  | _ -> failwith "Invalid arguments: requires two parameters, search string and file name"
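One design choice worth considering is to keep the argument handling in a pure function, so it can be tested without running the executable. Here is a minimal sketch. It assumes a hypothetical helper parse_args and uses the standard library’s Sys.argv array in place of Core’s Sys.get_argv (). It only reports what it would search for rather than reading a file.

```ocaml
(* Hypothetical pure helper: all of the pattern matching, none of the side effects *)
let parse_args (argv : string list) : (string * string) option =
  match argv with
  | _program :: search_string :: filename :: _ -> Some (search_string, filename)
  | _ -> None

(* The side-effecting part stays under let () = ... *)
let () =
  match parse_args (Array.to_list Sys.argv) with
  | Some (search_string, filename) ->
    Printf.printf "would search for %S in %s\n" search_string filename
  | None -> prerr_endline "usage: set_main <search string> <file name>"
```

parse_args can now be checked directly on lists such as ["set_main"; "open Core"; "src/string_set.ml"]. Code sitting directly under let () = ... is harder to check this way.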
Building executables
To build our executable, the dune file has an executable stanza:
; in file `src/dune`
(executable
(name set_main)
(modules set_main)
(libraries string_set core) ; uses Core and the String_set module we made
)
This makes an executable out of the set_main.ml file.
Running executables
- If you declared an executable in dune as above, building will make a file set_main.exe.
- To run it, you can do:
dune exec -- ./src/set_main.exe "open Core" src/string_set.ml
- After building, this is really just running:
_build/default/src/set_main.exe "open Core" src/string_set.ml
Aside: the Stdio.In_channel library used in this executable
- set_main.ml uses the In_channel module to read in file contents.
- (Note that I/O is a side effect: I/O functions do things besides returning a value.)
- It is part of the Stdio module, which is itself included in Core, so Core.In_channel is the same as Stdio.In_channel.
- The documentation is here; we will go through it to observe a few points.
- First, now that we have covered abstract types, we can see there is an abstract type t here.
- As with our own set, it is “the underlying data” for the module, in this case file handles.
- It is hidden, though, so we don’t get access to the details of how files are handled.
- In Visual Studio, hover over a function name to get its docs.
Aside: Optional arguments
- One topic we skipped over, which appears in many of these libraries, is optional arguments.
- They are named arguments that you don’t need to give, indicated by a ? before the name.
- If you do give them, they work like named arguments: use the ~name: syntax.
- E.g. in In_channel.read_lines, ?fix_win_eol is an optional boolean argument.
- To use it, just add ~fix_win_eol:true (since it’s optional and we are fine with its default, we left it off).
- If you write a function with an optional argument, inside the function body it shows up as an option-typed value: Some v (given) or None (not given).
- Many languages now support optional arguments
Example of writing a function with an optional argument:
# let f ?x y = match x with Some z -> z + y | None -> y;;
val f : ?x:int -> int -> int = <fun>
# f ~x:1 2;;
- : int = 3
# f 2;;
- : int = 2
- Use them when they fit: they reduce the clutter of passing often-unneeded arguments.
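An optional argument can also be given a default value with ?(name = default). Inside the body it is then a plain value rather than an option. Here is a small sketch, with greet as a made-up function.

```ocaml
(* ?(greeting = "Hello") gives the optional argument a default, so inside
   the body greeting is a plain string rather than a string option *)
let greet ?(greeting = "Hello") (name : string) : string =
  greeting ^ ", " ^ name

let a = greet "Ada"                  (* "Hello, Ada" *)
let b = greet ~greeting:"Hi" "Ada"   (* "Hi, Ada" *)

(* ?greeting:opt passes an option straight through; None means "use the default" *)
let c = greet ?greeting:None "Ada"   (* "Hello, Ada" *)
```

As with f above, the optional argument comes before a positional one. Once name is supplied, OCaml treats the omitted ?greeting as not given.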
Aside: The Sys library in the set_main.ml code
- We are using this library to read in the command-line arguments, via Sys.get_argv.
- The documentation is here.
- Notice how this particular module has no carrier type t; it is just a collection of utility functions.
Aside: Modules within modules
- It is often useful to have modules inside of modules for further code “modularization”.
- To declare one in e.g. foo.ml (which itself defines the items for module Foo using the above convention), add module Sub = struct let blah = ... ... end, where the ... are the same kinds of declarations that are in files like foo.ml.
- This syntax is also how we can directly define a module in utop without putting it in a file.
- In the remainder of the file you can access the contents of Sub as Sub.blah, and outside of the foo.ml file Foo.Sub.blah will access it.
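Here is a minimal sketch of the Foo/Sub example. The file foo.ml is simulated by module Foo = struct ... end so that everything fits in one file. The value blah_plus_one is a made-up addition.

```ocaml
(* Stand-in for a file foo.ml *)
module Foo = struct
  module Sub = struct
    let blah = 42
  end

  (* Later in the same "file", Sub's contents are reached as Sub.blah *)
  let blah_plus_one = Sub.blah + 1
end

(* Outside of foo.ml, the full path is Foo.Sub.blah *)
let from_outside = Foo.Sub.blah
```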
Aside: Referencing and disambiguating types declared in modules
module A = struct
  type t = { x : int ; y : bool }
end
let ra = A.{ x = 0 ; y = true } (* Need to write `A.` here to make the type `A.t` visible *)
module B = struct
  type t = { x : int ; z : float }
end
let rb = B.{ x = 0 ; z = 1.1 }
- Recall that open makes the contents of a module directly available.
- Now if A and B are both opened, the most recently opened definition wins: an unannotated field access r.x is resolved using B.t.
open A
open B
let f r = r.x (* type inferred for r is B.t, just like with newratio *)
(* A type annotation will disambiguate: *)
let f (r : A.t) : int = r.x
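Besides a type annotation, a field label can also be qualified with its module directly, as in r.A.x. Here is a small sketch that reuses the A and B modules from above, redefined so the block stands alone. The function names get_a_x and get_b_x are made up.

```ocaml
module A = struct
  type t = { x : int; y : bool }
end

module B = struct
  type t = { x : int; z : float }
end

open A
open B

(* Qualifying the label itself: r.A.x means "field x of A's record type" *)
let get_a_x r = r.A.x   (* inferred type: A.t -> int *)
let get_b_x r = r.x     (* unqualified: the most recent x wins, so B.t -> int *)

let ra = A.{ x = 1; y = true }
let rb = B.{ x = 2; z = 1.1 }
```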
Aside: @@deriving in modules
@@deriving names things slightly differently when used in a module.
Suppose we made a module out of our previous nucleotide example, either by putting it in a file nucleotide.ml or by adding module Nucleotide = struct .. end to make a top-loop or nested module:
module Nucleotide = struct
  type t = A | C | G | T [@@deriving equal]
  let hamming_distance l = failwith "dummy"
end
- When this type was called nucleotide and was not in a module, the ppx made a function equal_nucleotide.
- Here the ppx is smarter: instead of Nucleotide.equal_t it just makes Nucleotide.equal.
- t is a special type name in the module.
- Note that [@@deriving ..] declarations on types in the .ml file need to be repeated in the .mli file if the types are not hidden.
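If t is exposed in the .mli, the interface must also expose the derived function. You can do this either by repeating [@@deriving equal] or by writing val equal : t -> t -> bool. The sketch below takes the second route, using an inline signature in place of the .mli. It writes equal by hand so the block runs without the ppx; the ppx-generated code may differ in its details.

```ocaml
module Nucleotide : sig
  type t = A | C | G | T
  val equal : t -> t -> bool   (* the name the ppx would produce: equal, not equal_t *)
end = struct
  type t = A | C | G | T

  (* Hand-written stand-in for what [@@deriving equal] generates *)
  let equal (a : t) (b : t) : bool =
    match a, b with
    | A, A | C, C | G, G | T, T -> true
    | _ -> false
end
```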