Functional Programming and Software Engineering

Specification

Terminology
- Specification: what the program should do
- Implementation: how it does it
Let us step back and look at the bigger picture of specifying

Correctness

The high-level goal is to have software that is correct
There are many levels of interpretation for correct
- Informal spec.: we had some vague idea of what the code should do, wrote it, and iteratively addressed feedback of users until bug reports shrunk to very low levels.
- Semi-formal spec.: Sat down with stakeholders, wrote out a specification document in English with a few pictures/formulas, added many tests to code which affirmed all aspects of this spec.
- Rigorous spec.: Have an unambiguous mathematical notion of what correct behavior should be, write it out as the formal specification, make sure code meets it.
The best mode from the above depends on the project: how complex are the algorithms/architecture, and how mission-critical is it?
- Need to go further down the above list as complexity and mission-critical aspects increase

Forms of specification

Specifications can range from informal to completely rigorous and unambiguous

Requirements and Design documents: high-level informal descriptions of what the app should do
- This is the classic form of specification in software engineering; includes description/pictures/etc but is far from code
Types, e.g. writing an .mli file before implementing the code
- Gives a very rigorous, compiler-checked skeleton for the code; we are doing this for you on the assignments
- Much more precise than design docss in terms of code/spec relationship
- But, limited expressiveness since types can express only so much (no “the tree is balanced” in an OCaml type)
Tests
- A test suite constitutes a specification on a finite window of behavior - can’t run on the unbounded input possibilities
- So, 100% accurate on the cases tested, but could miss many important cases if tests are not thoughtfully written.
- Coverage measurements can be used to improve tests: what lines of code were not reached by any test?
Assertions specifying code behavior in mathematical logic
- a precondition is a logical constraint on function arguments (before it is called: “pre”)
- a postcondition is a constraint on the value returned from a function
- an invariant is a property internal to the function or internal to a data structure that always holds (it “invariably” holds)
- Some examples
  - precondition on a function that tree parameter is a binary tree (left values < right values)
  - precondition on a function that tree parameter is a balanced binary tree
  - postcondition that List.sort always returns a sorted list
  - precondition/postcondition on tree add function that if the input tree is balanced the output tree will also be.
  - invariants on internal data structures such as a Set implementation which uses an underlying always-sorted list.
  - inductive invariants on recursive algorithms, e.g. assuming in the body of list reverse that it works on a shorter list.
- Logical assertions are more general than tests since they are for all inputs
  - but not necessarily verified (but, invariants do guide tests - write tests to verify on lots of examples)
Verified assertions aka formal methods
- Verify the code meets preconditions/postconditions/invariants using a tool (Roq/Lean/Dafne/Z3/etc)
- This is like nuclear fusion: a research topic for 50+ years, very slowly becoming more viable.

Type-directed programming

As mentioned above, types are a very lightweight layer of code specification
- Particularly nice in that they are all quickly checked and can even be inferred.
- Think of types as outlining the “shape” of the code you need to write in terms of the data structures/functions.

What is Type-Directed Programming??

In fact, you have been doing type-directed programming all along
- We gave you the types for the HW questions, and those types are useful in debugging your code, right??
That is all the principle is: writing code that matches declared type will get you well on the way to an implementation
- So, start with the type
- Type errors are definitely errors, no need to run tests to find them, the editor will yell at you immediately.
- When the last type error drops, the code often will directly work
Type-directed programming is 100% rigorous, but is incomplete: types only express rough shapes of data
- e.g. int list is a rough shape compared to “sorted int list” but the latter isn’t a type in OCaml
- Our lab (and others) are working on type systems which allow you to state { l : int list | is_sorted(l) }”

Simple type-directed programming examples

Not bubbling up option or other wrapped results properly

# let zadd (l1 : int list) (l2 : int list) : (int list) = let l = List.zip l1 l2 in List.map ~f:(fun (x,y) -> x+y) l;;
Line 1, characters 74-75:
Error: This expression has type (int * int) list List.Or_unequal_lengths.t
       but an expression was expected of type 'c list

Recall that List.zip doesn’t just return a list, the programmer missed that here
To solve this type error you will need to match on the result, which should fix both type error and code behavior

Variation on type-directed programming: with only some parameters applied, the remaining types hint at what is still needed.

# let l = [[3;3]; [4;4]; [22;17]] in List.map l;;
- : f:(int list -> '_weak1) -> '_weak1 list = <fun>

The type shows that f needs to be a function taking an int list as argument.

The type in an .mli file can direct your implementation, e.g. map on simpledict example from Assn3 (recall that t here is the simpledict)

val map : 'a t -> f:(string -> 'a -> 'b) -> 'b t

Q: “How can I get a dict of 'b’s built from a dict of 'a’s here?”
A: “Use the function from 'a to 'b to make elements of 'b simpledict.

Variant extension: add a new variant case, chase the type errors to patch the code (similar tactic for record field addition)

type party = Dem | Rep
type voter = { name : string; party: party }
let count_parties (l : voter list) =
  List.fold l ~init: (0,0) 
    ~f:(fun (cd,cr) -> fun {party; _} -> 
     match party with 
     | Dem -> (cd+1, cr)
     | Rep -> (cd, cr+1) );;

Adding a Gre for green party: first just change the type, and chase/fix errors

# type party = Dem | Rep | Gre
  type voter = { name : string; party: party }
  let count_parties (l : voter list) =
    List.fold l ~init: (0,0) 
      ~f:(fun (cd,cr) -> fun {party; _} -> 
      match party with 
      | Dem -> (cd+1, cr)
      | Rep -> (cd, cr+1) );;
Lines 6-8, characters 5-24:
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
Gre
type party = Dem | Rep | Gre
type voter = { name : string; party : party; }
val count_parties : voter list -> int * int = <fun>

Shows a new match case is needed; in process of adding new case it will become clear a triple is also needed
This example also shows why non-exhaustive pattern matches are bad: errors often lurk

Conclusion: Don’t wrestle with OCaml’s types, dance with them

Preconditions, Postconditions, and Data Structure Invariants

We covered these above, now lets put them into practice in our OCaml code

Let us consider some preconditions and postconditions on the Simple_set.Make functor example (click for zipfile):

module Make (M : EQ) = struct
  type t = M.t list

  let empty : t = []

  let add (x : M.t) (s : t) : t = x :: s

  let rec remove (x : M.t) (s : t) : t =
    match s with
    | [] -> failwith "item is not in set"
    | hd :: tl -> if M.equal hd x then tl else hd :: remove x tl

  let rec contains (x : M.t) (s : t) : bool =
    match s with
    | [] -> false
    | hd :: tl -> if M.equal x hd then true else contains x tl
end

Here are some pre/post conditions in the above code:

Precondition on remove: s is not empty (fails otherwise)
Stronger precondition on remove: contains x s must hold
Postcondition on remove for it returning set s': not(contains x s') - ??
- This simple “set” data structure is in fact a multiset and this will not always hold
- If we were trying to implement a true set this postcondition would help us catch an error
Postcondition on add x s: for the resulting set s', contains x s' holds

Assertions in code

OCaml assert statements can be placed in code to directly verify properties
- program dies if the assertion fails, it should always hold
- silently returns () if it succeeds

Example new version of add above:

let add (x : M.t) (s : t) = 
let s' = (x :: s) in assert (contains x s'); s'

Asserts are handy for development mode, but not after deployment (slows things down)
Generally it is better to make tests to spot-check assertions instead of using assert
- But, one thing handy about assert is it puts the assertions in the code, as “rigorous comments”
- Middle approach: ppx_inline_tests is a library where you can write tests in-line with your code
- example:
```
let%test "add adds" = contains (add (add 5 emptyset) 22) 22
```
- inline tests both document invariants and serve as tests: two-for-one!
- They also allow functions and data structures hidden in a module to be tested within that module
  - This also gets around an issue with OCaml’s modules and hidden code that needs to be tested
    - The tester module needs to see the local-only functions in the library module to test them
      (recall we make local-only things in an .ml file by leaving them out of the .mli)
    - So, they would need to be made non-local for that, defeating the locality
    - Besides ppx_test and exposing, a third solution is to put the local functions in a separate local.ml module with no .mli

Data structure invariants

As discussed above, data structure invariants are useful when not all elements of a type are “good” to have around in a context
- for example the “ordered tree” and “balanced tree” examples mentioned above
Example from simpledict on homework: is_ordered must hold for the Simpletree.t.

Recursion Invariants

Recursion and other loops (e.g. in fold) is a prime place to assert invariants
(Even if you don’t write them out, thinking of the invariants are critical to coding recursive programs)
A standard invariant for recursive functions is that the recursive calls return what the calling function expected

let rec rev l = 
  match l with 
  | [] -> []
  | x::xs -> let rxs = rev xs in assert(Poly.(List.rev xs = rxs)); rxs @ [x]

Note that we have to use the built-in List.rev to test our version - somewhat circular (note Poly quick-open lets = work on any type, its also what assert_equal in OUnit is doing)
In general a major issue with specification is it is often very difficult to give a code-based definition of the full spec.
So, the main focus should be on partial specs, give sanity conditions

Examples of Invariants Over Folds

In re-implementing some of the common List functions with folds it helps to think of the invariant
Folding left (List.fold):
- Suppose we are at some arbitrary point processing the fold;
- assume accumulation accum has “the result of the task” for all elements to the left in the list
- require ~f to then “do the task” to incorporate the current element elt
- also assume accum is initially init
Folding right: very similar, but accum is result for all elements to the right in the list

(* invariant for length fold: accum is number of elements to the left of the current `elt` we are at *)
let length l = List.fold ~init:0 ~f:(fun accum _ -> accum + 1) l 
(* invariant for rev fold: accum is reverse of list up to here *)
let rev l = List.fold ~init:[] ~f:(fun accum elt -> elt :: accum) l 
(* invariant for map fold: accum is f applied to each element of list up to here *)
let map ~f l = List.fold_right ~init:[] ~f:(fun elt accum -> (f elt) :: accum) l
(* etc *)
let filter ~f l = List.fold_right ~init:[] ~f:(fun elt accum -> if f elt then elt :: accum else accum) l

Specification and Abstraction

The more completely a module is specified the less the users need to know about the underlying implementation
Core.Map is an example where the users need to know limited information about the implementation
- they still do need to know it is O(log n) for add/remove/find, that should be in the docs (.mli/odoc)
- (Unfortunately Core is not always so good about that; for Map they give big-O of more advanced ops but assume users know complexity of these basic ones. Bad Core!)
On your own libraries you will want to document them well
- It will make it a lot easier for your users, they can just think about the spec. view.
- odoc formatting for .mli files is in the FPSE Style Guide

Testing

Writing tests has two important and distinct goals
1. For specification: use tests to define and refine what the code should do
  - Writing tests before fully coding the answer lets the tests serve as your “coding spec”
  - Adding tests for corner cases will clarify the behavior in those cases
    (e.g. does removing an element from a set which doesn’t contain the element raise an exception, or does it just return the set unchanged?)
2. For implementation: use tests to find bugs in code
  - Before the code is working the tests will point out which cases are failing
  - Adding tests covering past bugs will make sure they are caught quickly next time
  - Related term: regression testing, running tests after an update to make sure your code didn’t regress, i.e. old bugs resurfaced
Related terminology you may see
- Black-box tests are tests written against the spec
- Glass-box tests are in the context of bugs in the code and other code properties

Standard categories of tests

Unit testing: what you have mainly done – test the small pieces of the app; no I/O testing
Acceptance testing: test the bigger pieces including I/O
- For example testing your keywordcount.exe on a certain fixed directory tree.
Random testing of which there are many types: fuzz testing / monkey testing / property-based testing / quickcheck:
- the tests are run on data generated randomly from some distribution
- “quickcheck”ing aka property-based testing is running unit tests on randomly generated data (random lists of ints, etc)
- “fuzz testing” is running acceptance tests with random input strings supplied.

Testing and coverage

Code coverage is a great glass-box (impl-based) metric of how good your test suite is
The simple idea of coverage: are there lines of your code that never get exercised by any of your tests?
Coverage tools let you easily check this.
We will show how the Bisect coverage tool can be used below

OUnit2

We have been using the OUnit2 library mostly as a black box up to now
Now we will go through the details, which are in fact very simple
- There is not much in OUnit2 per se, if you want something extra just write some higher-order functions to do it

To review, here is your standard simple tests.ml file, this one is from the String set example:

open OUnit2
open String_set

let tests = "test suite for simple set" >::: [
"empty"  >:: (fun _ -> assert_equal empty empty);
"3-elt"    >:: (fun _ -> assert_equal true (contains "5" (add "5" empty)));
"1-elt nested" >:: (fun _ -> assert_equal false (contains "5" (remove "5" (add "5" empty))));
]

let () = run_test_tt_main tests

OUnit2.assert_equal is just the OUnit2 version of assert, it uses Poly.(=) for simplicity (but be careful)
The infix >:: operator takes a string (test name) and a piece of test code under fun _ -> (to keep it from running right away) and builds a single test of type test (type #require "ounit2" and open OUnit2 to the top-loop before playing with this code there):
```
# let test1 = "simple test" >:: fun _ -> assert_equal (2 :: []) [2];;
  val test1 : test =
  TestLabel ("simple test",
  TestCase (Short, <fun>))
```

The >::: operator simply takes a test list and builds a test suite (which in fact is just of type test)

# let test_suite = "suite now" >::: [test1];;
val test_suite : test =
TestLabel ("suite now",
 TestList
  [TestLabel ("simple test",
    TestCase (Short, <fun>))])

Then, OUnit2.run_test_tt_main tests will run the suite tests
- (Note this will work but will then freeze the top loop - OUnit is designed to run in the command line only)

How the tests run when you say dune test

The above tests.ml file is just defining an executable, like keywordcount.exe on HW4
Build and run the executable to run the tests
Here a dune build file which would work for the simple set tests for example:

(executable
  (name tests)
  (libraries
    ounit2
    simple_set
  ))

; dune rule so command line "dune runtest" (and "dune test") will run tests.
; in other words, there is nothing special about `dune test`, its just a build plus running `_build/default/test/tests.exe`
(rule
 (alias runtest)
 (action (chdir %{project_root}
  (run ./test/tests.exe))))

There is even special shorthand for the above: replace executable with test and it makes an executable plus the above alias to run tests:

(test
  (name tests)
  (libraries
    ounit2
    simple_set
  ))

Higher-order testing

If you did unit testing in other languages it looks pretty much like the above
But in OCaml we can make tests programatically which makes for less code duplication
Example: lets make a bunch of different tests on the same invariant, that reversing a list twice is a no-op:

# let make_rev_test l = ("a rev test" >:: (fun _ -> assert_equal (List.rev @@ List.rev l) l));; 
val make_rev_test : 'a list -> test = <fun>

let make_test_suite l ~f = 
  "suite of tests of f" >::: List.map l ~f ;;
val make_test_suite : 'a list -> f:('a -> test) -> test = <fun>

let s = make_test_suite [[];[1;2;3];[2;44;2];[32;2;3;2;1]] ~f:make_rev_test ;;
let () = run_test_tt_main s;; (* recall this crashes the top loop when finished *)

In general you can build an arbitrarily big tree of tests with suites of suites etc
- As can be seen above, a suite of tests just has type test

let s' = "id tests" >::: 
  [ "one" >:: (fun _ -> assert_equal (Fn.id 4) 4)
  ; "two" >:: (fun _ -> assert_equal (Fn.id "hello") "hello") ];;
let suites = test_list [s;s'];; (* make suite of suites *)
let named_suites = "revrev and Fn.id" >: suites (* any tree of tests can be named with >: *)

Here is the type of test under the hood (from the docs) which should make clear why the above works:

type test =
| TestCase of test_fun
| TestList of test list
| TestLabel of string * test

Tangent: defining infix operators

The OUnit infix operators >:/>::/>::: are just like +, ^ etc
Using them arguably makes the code more readable, so consider defining your own infix operators

# #require "ounit2";;
# open OUnit2;;
# (>::) ;;
- : string -> test_fun -> test = <fun>
# (>:::);;
- : string -> test list -> test = <fun>

There is no magic to this, you can also do it:

# let (^^) x y = x + y;;
val ( ^^ ) : int -> int -> int = <fun>
# 3 ^^ 5;;
- : int = 8

Note that unlike in C++ we are not overloading operators, ^^ only works on two ints now.
The old version of ^^ for printing is now shadowed so is not directly accessible.
So, new infix ops are always defined within a module to avoid overlap
OCaml will eventually have true operator overloading but it is still in the development pipe

The different assert_X statements possible in OUnit2

We used assert_equal above which is the OUnit function to check things being equal
assert_bool which is like the assert OCaml command: assert_bool "name that test" (0=0) for example
If you want to verify some code raises an exception, use assert_raises
To perform acceptance testing (I/O), use assert_command to run a shell command and compare against output
- in the A4 exec_tests.ml code we provided you can see we are using assert_command to test the executable
If you need fixed setup/teardown code bracketing a group of tests to setup e.g. files: bracket_tmpfile

As always, see the documentation for more details:
OUnit2 API docs

Tangent: Testing executables with cram

As mentioned above OUnit can be used to test executables: OUnit2.assert_command can run any shell command (in particular, your OCaml .exe file)
This is probably the approach you should use
dune also contains an extension called cram which allows for output to be compared against expected output for a given input
It makes it really easy to write tests for a command-line executable
It is very general, you just specify the shell command to run and expected output
See the cram docs if you are interested

An example in a file cramtest.t. Non-indented lines are comments, $ is the input and after the input is implicitly the expected output (4 here)

We first create a test artifact called "foo"
  $ cat >foo <<EOF
  > foo
  > bar
  > baz
  > EOF

After creating the fixture, we want to verify that ``wc`` gives us the right result:
  $ wc -l foo | awk '{ print $1 }'
  4

Bisect for OCaml code coverage

The bisect_ppx preprocessor can decorate your code with one hit-bit per line
- it can then show which lines are “hit” upon running your test suite
Add (preprocess (pps bisect_ppx)) to library or executable declaration in dune to decorate
- don’t add to your (test ... ) dune declaration, you want to count lines hit in your code not in your test code!
Then do a dune test which will generate the low-level hit-lines data in a file.
- or dune exe and run your app if you want to see coverage there
Shell command bisect-ppx-report html generates a pretty report showing which lines hit in latest execution
- open _coverage/index.html in your browser to see the report
- if this command is not working make sure you did the opam install bisect-ppx in the course required installs
See Bisect docs for more details
Note that if you have single lines of code that you know should not be run (e.g. invariants that should not fail) you can put [@coverage off] at the end of those lines. To turn coverage off on a single let definition, put [@@coverage off] immediately after the end of definition. To turn coverage off on an arbitrary range of lines in the file, put [@@@coverage off] to turn it off and then [@@@coverage on] to turn it back on. See the docs for details.

We will check how well my tests of the simple set example covered the code using Bisect. The only addition to code is the (preprocess (pps bisect_ppx)) added to src/dune for the library.

Random Testing aka Property-Based Testing aka Quickcheck

Recall that Quickchecking is making up some random data to test our code
To use random testing we need to know when the code is working on that random data
- e.g. add a bunch of random key-value maps to a dictionary and delete them: should be the original dictionary again
- e.g. above of reversing a reversed list gives back the original list

Here is the methodology

Given: a function f that we want to test.
Have a function to generate random data which is the parameters of f
Run f on that random data many times (say 1000 times)
Verify whether the test worked on that random data

Using `Base_quickcheck`

For step 1. above we want to not have to “roll our own” every time
Base_quickcheck is a library that will do it for us
Base_quickcheck contains three key algorithms:
1. Generators, Quickcheck.Generator - make random data of desired distribution in given type
2. Shrinkers, Quickcheck.Shrinker - if a failing case is discovered, try to make random data smaller, e.g. a shorter list (we will not cover these in detail)
3. Runner, Quickcheck.test etc, which runs some fixed number (10,000 by default) of random tests and shrinks failures.
We will look at several examples of the Base_quickcheck library in action in quickcheck_examples.ml
We will look at quickchecking on a Map in school_quickcheck.ml
Base_quickcheck docs
The Real World OCaml book has a short tutorial (note it uses ppx_inline_tests and not OUnit2)

Fuzz testing vs Quickcheck

Fuzz testing aka fuzzing is to acceptance tests as quickcheck is to unit tests
- fuzzers feed random inputs into an app to see what it does (acceptance test modality)
- quickcheckers on the other hand are testing single functions (unit test modality)
Industry fuzz testers do a lot more more than generate totally random data
- They may be aware that the string input should fit a particular grammar, e.g. html
- They may be combined with a coverage tool and work to find random data inputs covering all the code

FPSE