Modules Basics
We’ve been using modules without thinking too much about them: List.map, Float.(2 = 3), Fn.id – these are uses of modules List, Float, and Fn that are in the Core library.
Now we will look at how to define our own modules to make our own libraries and code components.
What is a module?
A module is a collection of OCaml definitions:
let-defined entities, i.e. functions and values- types
-
other modules
- Modules are something like records, but they can also hold e.g. types which makes them much more powerful.
- But, modules are not first class values like records – for example, they can’t directly be passed as arguments to functions.
.ml files are modules
OCaml has a simple rule:
- The contents of a file
foo.mldefine the moduleFoo. - Capitalize the first letter (only) and drop the
.mlto turn a file name into its module name.
In this lecture, we’ll work with a running example. See set-example.zip for the full code.
Here is string_set.ml from that example:
open Core
type t = string list (* This is a *type abreviation*: a string set is a list of strings *)
let empty : t = [] (* the one canonical empty set *)
let add (x : string) (s : t) : t = x :: s
let rec remove (x : string) (s : t) : t =
match s with
| [] -> failwith "item is not in set"
| hd :: tl ->
if String.equal hd x
then tl (* we don't remove from the tail: this is actually a multiset *)
else hd :: remove x tl
let rec contains (x : string) (s : t) : bool =
match s with
| [] -> false
| hd :: tl ->
if String.equal x hd
then true
else contains x tl
Loading a module into utop
Use dune utop to fire up the OCaml toploop with the module loaded.
- Then access the module’s contents in
String_set.
$ dune utop
# String_set.add "hello" String_set.empty ;;
- : String_set.t = ["hello"]
Or open the module with open String_set to put everything from inside it into scope:
# open String_set ;;
# add "hello" empty ;;
- : t = ["hello"]
- Here, the fact that we used a
listto implement the set is exposed to library users. - Naming the type
tis standard for “the” underlying type in a module (if there is one).- Built-in libraries also use this: for example,
Int.tis an alias forint, etc.
- Built-in libraries also use this: for example,
- Then,
String_set.tis read as “string set’s underlying type`.
Hiding details with module types
Modules have types, called module types or signatures.
- The latter term is used in math, e.g. “a DFA has signature D = (S, Σ, τ, s0, F)”
- In a signature all the types of entities in the module are declared
- And, types declared in a module are repeated in the signature again (a bit odd, but for a reason)
- Module types are also placed in files, just put an
ion the end- So for example the module type of
String_setis in thestring_set.mlifile.
- So for example the module type of
- Here are the contents of that file:
type t = string list (* Type declarations are by default copied from .ml to .mli file *)
(* type t (* this alternate version of type t declaration *hides* t's internals *) *)
val empty : t
val add : string -> t -> t
val remove : string -> t -> t
val contains : string -> t -> bool
- See how we repeat the
type t =alias declaration in this.mlifile - But, there is an alterative way to write that declaration: remove
= string list- comment the first line and uncomment the second to get that version
- By doing this, the type
thas been made abstract: users no longer can seetis a list
Now if we save that change and type dune utop:
# String_set.add "hello" String_set.empty ;;
- : String_set.t = <abstr>
The printed value is now <abstr>, not ["hello"] like before.
- The type is hidden; so are the values.
Why do this?
- Good:
- Program to interfaces, not implementations.
- We can change the implementation without changing client code.
- The abstraction prevents misuse and maintains invariants.
- Bad:
- It’s hard to see what’s going on in
utop. - It can be harder to test our module.
- It’s hard to see what’s going on in
Further, anything define in the .ml that is not declared in the .mli is not accessible to users.
- It’s like those types/values are
private. - If there is nothing to hide, then you don’t need an
.mlifile at all. The type of the module will be inferred. - All assignments come with an
.mlifile so you get used to the format- Also the documentation specifying what a function does should go in the
.mlifile by convention - We have followed that pattern for Assignment 3
- Also the documentation specifying what a function does should go in the
Building modules
Recall that dune files are like Makefiles for OCaml
To make a library module from the string_set.ml file, we include this in the dune file:
; in file `src/dune`
(library
(name string_set)
(modules string_set)
(libraries core)
)
Writing executables
Let’s make an actual executable program to do something, not just a library.
- The file
set_main.mlis our example, it takes a string and a file name and looks for that line in the file. - Executables work by running all statements in the file from top to bottom. Any side effects are the “outputs” of the executable.
- Note that pure functional programs are useless as executables, input and output is a side effect and we need it to write applications.
- Typically, the main work in an executable is put under a
let () = ...statement. The...evaluates to() : unit, and the side effects it performs are what we see.
(* Just a helper function. Does not run until it's given arguments in `let () = ...` *)
let do_search search_string filename =
let my_set =
filename
|> In_channel.read_lines
|> List.fold ~f:(fun set elt -> String_set.add elt set) ~init:String_set.empty
in
if String_set.contains search_string my_set
then print_string @@ "\"" ^ search_string ^ "\" found\n"
else print_string @@ "\"" ^ search_string ^ "\" not found\n"
(* This statement has some printing side effects that we observe when running the executable *)
let () =
match Array.to_list (Sys.get_argv ()) with
| _ :: search_string :: filename :: _ -> do_search search_string filename
| _ -> failwith "Invalid arguments: requires two parameters, search string and file name"
Building executables
To build our executable, the dune file has a stanza for an executable:
; in file `src/dune`
(executable
(name set_main)
(modules set_main)
(libraries string_set core) ; uses Core and the String_set module we made
)
This makes an executable out of the set_main.ml file.
Running executables
- If you declared an executable in
duneas above, it will make a fileset_main.exe - To run it, you can do
dune exec -- ./src/set_main.exe "open Core" src/string_set.ml - Which is really just
_build/default/src/set_main.exe "open Core" src/string_set.mlafter building
Aside: the Stdio.In_channel library used in this executable
set_main.mluses theIn_channelmodule to read in file contents- (Note that I/O is a side effect, I/O functions do things besides the value returned)
- It is part of the
Stdiomodule (which is itself included inCoresoCore.In_channelis the same asStdio.In_channel) - The Documentation is here; we will go through it to observe a few points
- First, now that we covered abstract types we can see there is an abstract type
there - As with our own set, it is “the underlying data” for the module, in this case file handles
- It is hidden though so we don’t get access to the details of how “files are handled”
- In Visual Studio hover over a function definition to get the docs
- First, now that we covered abstract types we can see there is an abstract type
Aside: Optional arguments
- One topic we skipped over which is in many of these libraries is optional arguments
- They are named arguments but you don’t need to give them, indicated by a
?before the name. - If you do give them, they are like named aguments, use
~name:syntax - e.g. in
In_channel.read_lines,?fix_win_eolis an optional boolean argument- To use it just add
~fix_win_eol: true(since its optional and we want the default false we left it off) - If you write a function with an optional argument it will show up to you as an
option-typed object:Some(given) orNone(not given).
- To use it just add
- Many languages now support optional arguments
Example of writing a function with an optional argument:
# let f ?x y = match x with Some z -> z + y | None -> y;;
val f : ?x:int -> int -> int = <fun>
# f ~x:1 2;;
- : int = 3
# f 2;;
- : int = 2
- Use them when they are the right thing: will reduce clutter of passing often un-needed items.
Aside: The Sys library in the set_main.ml code
- We are using this library to read in the command line args, via
Sys.get_argv. - The documentation is here
- Notice how this particular module has no carrier type
t, it is just a collection of utility functions.
- Notice how this particular module has no carrier type
Aside: Modules within modules
- It is often useful to have modules inside of modules for further code “modularization”
- The way it is declared is in e.g.
foo.ml(which itself defines the items for moduleFoousing the above convention), addmodule Sub = struct let blah = ... ... endwhere the
...are the same kinds of declarations that are in files likefoo.ml. - This syntax is also how we can directly define a module in
utopwithout putting it in a file. - In the remainder of the file you can access the contents of
SubasSub.blah, and outside of thefoo.mlfileFoo.Sub.blahwill access.
Aside: Referencing and disambiguating types declared in modules
module A = struct
type t = { x : int ; y : bool }
end
let ra = A.{ x = 0 ; y = true } (* Need to write `A.` here to make the type `A.t` visible *)
module B = struct
type t = { x : int ; z : float }
end
let rb = B.{ x = 0 ; z = 1.1 }
- Recall that
openmakes the contents of a module directly available. - Now if
AandBare both opened, the most recently openedtwill win.
open A
open B
let f r = r.x (* type inferred for r is B.t, just like with newratio *)
(* A type annotation will disambiguate: *)
let f (r : A.t) : int = r.x
Aside: @@deriving in modules
@@deriving names things slightly differently when used in a module.
Suppose we made a module out of our previous nucleotide example, either by putting in a file nucleotide.ml or adding module Nucleotide = struct .. end to make a top-loop or nested module:
module Nucleotide = struct
type t = A | C | G | T [@@deriving equal]
let hamming_distance l = failwith "dummy"
end
- When this type was called
nucleotidenot in a module theppxmade a functionequal_nucleotide - Here the
ppxis smarter, instead ofNucleotide.equal_tit just makesNucleotide.equal-tis a special type in the module - Note that
[@@deriving ..]declarations in types in the.mlfile need to be repeated in the.mlifile if the types are not hidden