Specification
- Terminology
- Specification: what the program should do
- Implementation: how it does it
- Let us step back and look at the bigger picture of specifying
Correctness
- The high-level goal is to have software that is “correct”
- There are many levels of interpretation for “correct”
- Informal spec.: we had some vague idea of what the code should do, wrote it, iteratively addressed feedback of users until bug reports shrunk to very low levels.
- Semi-formal spec.: Sat down with stakeholders, wrote out a specification document in English with a few pictures/formulas, added many tests to code which affirmed all aspects of this spec.
- Rigorous spec.: Have an unambiguous mathematical notion of what correct behavior should be, write it out as the formal specification, make sure code meets it.
- The best mode from the above depends on the project: how complex are the algorithms/architecture, and how mission-critical is it?
- Need to go further down the above list as complexity/mission-critical aspects increase
Forms of specification
Specifications can range from informal to completely rigorous and unambiguous
- Requirements and Design documents: high-level informal descriptions of what the app should do
- This is the classic form of specification in software engineering; includes description/pictures/etc but far from code
- Types, e.g. writing an
.mli
file before implementing the code- Gives a very rigorous, compiler-checked skeleton for the code; we are doing this for you on the assignments
- Much more precise than the previous in terms of code/spec relationship, but limited expressiveness since types can express only so much
- Tests
- A test suite constitutes a specification on a finite window of behavior - can’t run all infinite cases
- Can be 100% accurate on the cases tested, but could be woefully incomplete
- Coverage measurements can be used to help close the incompleteness gap somewhat (never fully however)
- Full logical assertions specifying code behavior in terms of math
- a precondition is a logical constraint on function arguments (before it is called: “pre”)
- a postcondition is a constraint on the value returned from a function
- an invariant is a property internal to the function or internal to a data structure that always holds (it “invariably” holds)
- Some examples
- precondition on a function that tree parameter is a binary tree (left values < right values)
- precondition on a function that tree parameter is a balanced binary tree
- postcondition that
List.sort
always returns a sorted list - precondition/postcondition on tree
add
function that if the input tree is balanced the output tree will also be. - invariants on data structures such as a
Set
implementation which uses an underlying always-sorted list. - inductive invariants on recursive algorithms, e.g. assuming in the body of list
reverse
that it works on a shorter list.
- Logical assertions are more general than tests since they are for all inputs
- but not necessarily verified (but, invariants do guide tests - write tests to verify on lots of examples)
- Verified assertions aka formal methods
- After making the above logical assertions, verify the code meets the assertions using a verification tool
- Now this is mostly research, but becoming more mainstream
Type-directed programming
Fact: types outline the “shape” of the code you need to write and serve as a “structural” spec.
- You have been doing type-directed programming all along
- We give you the types for most HW questions
- Principle is: writing code that matches declared type will get you well on the way to an implementation
- type errors point to code errors to be fixed
- when the last type error drops, the code may directly work
- Type-directed programming is 100% rigorous, but is incomplete: types only express rough shapes of data
- e.g.
int list
is a rough shape compared to “sortedint list
” but the latter isn’t a type in OCaml
- e.g.
Type-directed programming examples
(These examples are super simple, they only serve to clarify what the term means.)
Example: not bubbling up option
or other wrapped results properly
# let zadd (l1 : int list) (l2 : int list) : (int list) = let l = List.zip l1 l2 in List.map ~f:(fun (x,y) -> x+y) l;;
Line 1, characters 74-75:
Error: This expression has type (int * int) list List.Or_unequal_lengths.t
but an expression was expected of type 'c list
- To solve this type error you will need to
match
on the result, which should fix both type error and behavior
Review example: with partial parameters applied, the remaining types hint at what is still needed.
let l = [[3;3]; [4;4]; [22;17]] in
List.map l;;
- : f:(int list -> '_weak1) -> '_weak1 list = <fun>
- The type shows that
f
needs to be a function taking anint list
as argument.
Review example: the type in an .mli
file can direct your implementation, e.g. map
on dict
example from HW2 (t
here is the simpledict
)
val map : 'a t -> f:(string -> 'a -> 'b) -> 'b t
- Q: “how can I get a dict of
'b
’s built from a dict of'a
’s here - A: “Use the function from
'a
to'b
on elements of'a dict
.
Extension example: add a new field to a record or a new variant case, chase the type errors to patch
type party = Dem | Rep
type voter = { name : string; party: party }
let count_parties (l : voter list) =
List.fold l ~init: (0,0)
~f:(fun (cd,cr) -> fun {party; _} ->
match party with
| Dem -> (cd+1, cr)
| Rep -> (cd, cr+1) );;
Adding a Gre
for green party: first just change the type, and chase errors
type party = Dem | Rep | Gre
type voter = { name : string; party: party }
let count_parties (l : voter list) =
List.fold l ~init: (0,0)
~f:(fun (cd,cr) -> fun {party; _} ->
match party with
| Dem -> (cd+1, cr)
| Rep -> (cd, cr+1) );;
Lines 6-8, characters 5-24:
Warning 8: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched:
Gre
type party = Dem | Rep | Gre
type voter = { name : string; party : party; }
val count_parties : voter list -> int * int = <fun>
- Shows a new
match
case is needed; in process of adding new case it will become clear a triple is also needed - This example also shows why non-exhaustive pattern matches are bad: errors often lurk
Conclusion: Don’t wrestle with OCaml’s types, dance with them
Preconditions, Postconditions, and Data Structure Invariants
- Types are fine for high-level structural information, but cannot express deeper properties.
- “a function only takes non-negative input”
- “a function returns a sorted list”
- etc
- Preconditions, postconditions, and invariants allow properties beyond types to be expressed
Let us consider some preconditions and postconditions on the Simple_set.Make
functor example (click for zipfile):
module Make (M: Eq) =
struct
open Core
type t = M.t list
let emptyset : t = []
let add (x : M.t) (s : t) = (x :: s)
let rec remove (x : M.t) (s: t) =
match s with
| [] -> failwith "item is not in set"
| hd :: tl ->
if M.equal hd x then tl
else hd :: remove x tl
let rec contains (x: M.t) (s: t) =
match s with
| [] -> false
| hd :: tl ->
if M.equal x hd then true else contains x tl
end
- Precondition on
remove
:s
is not empty (it would always fail otherwise) - Stronger precondition on
remove
:contains x s
must hold - Postcondition on
remove
for it returning sets'
:not(contains x s')
- ??- No, this simple “set” data structure is in fact a multiset and this will not always hold
- If we were trying to implement an actual set this postcondition would let us catch an error
- Postcondition on
add x s
: for the resulting sets'
,contains x s'
holds
Assertions in code
- OCaml
assert
can be placed in code to directly verify properties- program dies if the assertion fails, it should always hold
- silently returns
()
if it succeeds
- Example new version of
add
above:let add (x : M.t) (s : t) = let s' = (x :: s) in assert (contains x s'); s'
- Asserts are handy for development mode, but not after deployment (slows things down)
- Generally it is better to make tests to spot-check assertions instead of using
assert
- But, one thing handy about
assert
is it puts the assertions in the code, as “rigorous comments” - Middle approach:
ppx_inline_tests
is a library where you can write tests in-line with your code - example:
let%test "add adds" = contains (add (add 5 emptyset) 22) 22
- inline tests both document invariants and serve as tests: two-for-one!
- They also allow functions and data structures hidden in a module to be tested
- one issue with OCaml’s modules is the tester module needs to see the local functions in the library module so they need to be made non-local for that - oops!
- But, one thing handy about
Data structure invariants
- It is often the case that there are additional restrictions on values allowed in a data structure type
- for example the “ordered tree” and “balanced tree” examples mentioned above
- Example from
simpledict
on homework:is_ordered
must hold for theSimpletree.t
. - Such data structure invariants should be made clear in the code documentation
Recursion Invariants
- Recursion and other loops (e.g. in
fold
) is a prime place to assert invariants - (Even if you don’t write them out, thinking of the invariants are critical to coding recursive programs)
- A standard invariant for recursive functions is that the recursive calls return what the calling function expected
let rec rev l =
match l with
| [] -> []
| x::xs -> let rxs = rev xs in assert(Poly.(List.rev xs = rxs)); rxs @ [x]
- Note that we have to use the built-in
List.rev
to test our version - somewhat circular (notePoly
quick-open lets=
work on any type) - In general a big issue with specification is it is often very difficult to give a code-based definition of the full spec.
- So, the main focus should be on partial specs, give sanity conditions
Invariants over folds as examples
- In re-implementing some of the common
List
functions withfold
s it helps to think of the invariant - Folding left (
List.fold
):- Suppose we are at some arbitrary point processing the fold;
- assume accumulation
accum
has “the result of the task” for all elements to the left in the list - require
~f
to then “do the task” to incorporate the current elementelt
- also assume
accum
is initiallyinit
- Folding right: almost identical; “for all elements to the right in the list”, not “left”
(* invariant for length: accum is length of list up to here *)
let length l = List.fold ~init:0 ~f:(fun accum _ -> accum + 1) l
(* invariant for rev: accum is reverse of list up to here *)
let rev l = List.fold ~init:[] ~f:(fun accum elt -> elt::accum) l
(* invariant for map: accum is f applied to each element of list up to here *)
let map ~f l = List.fold_right ~init:[] ~f:(fun elt accum -> (f elt)::accum) l
(* etc *)
let filter ~f l = List.fold_right ~init:[] ~f:(fun elt accum -> if f elt then elt::accum else accum) l
Formal Verification
- Formal verification is proving the invariants hold (e.g. that
rev
really reverses the list) - It is currently of only limited applicability in mainstream SE but is becoming more common as tools improve
- A simple view of what it is is the preconditions/postconditions/invariants/
asserts
above will be verified to always hold by a computer program.- Like how a compiler verifies type declarations but on a much grander scale.
- End goal is to do this over a full spec. but verification of partial spec is also good (e.g. dictionary is a balanced binary tree)
Specification and Abstraction
- The more completely a module is specified the less the users need to know about the underlying implementation
Core.Map
is an example where the users need to know almost nothing about the implementation- Note that documentation of the specification in the interface is important; sometimes
Core
is weak there - On your own libraries you can do the same
- it will make it a lot easier for your users, they can just think about the spec. view.
Testing
- Testing wears two very different but very useful hats
- Specification-based: use tests to define and refine what the code should do
- Implementation-based: find bugs in code, for example when you change code make sure you didn’t break it.
- Both hats are important
- Writing tests before fully coding the answer lets the tests serve as your “coding spec”
- Adding tests for corner cases will flesh out the spec
- Adding tests covering past bugs will make sure they are caught quickly next time
- Equivalent terminology you may see: black-box (spec) and glass-box (code-based) testing
- Black-box tests are those written against the spec
- Glass-box tests are in the context of bugs in the code and other code properties
Standard categories of tests
- Unit testing: what you have mainly done – test the small pieces of the app; no I/O testing
- Acceptance testing: test the bigger pieces including I/O
- For example testing your
keywordcount.exe
on a certain fixed directory tree.
- For example testing your
- Random testing of which there are many types: fuzz testing / monkey testing / property-based testing / quickcheck:
- the tests are run on data generated randomly from some distribution
- “quickcheck”ing aka property-based testing is running unit tests on randomly generated data (random lists of ints, etc)
- “fuzz testing” is running acceptance tests with random input strings supplied.
Testing and coverage
- Code coverage is a great glass-box (impl-based) metric of how good your test suite is
- The simple idea of coverage: are there lines of your code that never get exercised by any of your tests?
- Coverage tools let you easily check this.
- We will show how the Bisect coverage tool can be used below
OUnit2
- We have been using the
OUnit2
library mostly as a black box up to now - Now we will go through the details, which are in fact very simple
- There is not much in
OUnit2
per se, if you want something extra just write some higher-order functions to do it
- There is not much in
- To review, here is your standard simple
tests.ml
file, this one is from the simple-set example:open OUnit2 (* we usually open OUnit2 since it is pervasively used in test files *) open Simple_set let tests = "test suite for set" >::: [ "empty" >:: (fun _ -> assert_equal (emptyset) (emptyset)); "3-elt" >:: (fun _ -> assert_equal true (contains 5 (add 5 emptyset) (=))); "1-elt nested" >:: (fun _ -> assert_equal false (contains 5 (remove 5 (add 5 emptyset) (=))(=))); ] let () = run_test_tt_main tests
OUnit2.assert_equal
is just theOUnit2
version ofassert
, it usesPoly.(=)
for simplicity (but be careful)- The infix
>::
operator takes a string (test name) and a piece of test code underfun _ ->
(to keep it from running right away) and builds a single test of typetest
(type#require "ounit2"
andopen OUnit2
to the top-loop before playing with this code there):# let test1 = "simple test" >:: fun _ -> assert_equal (2 :: []) [2];; val test1 : test = TestLabel ("simple test", TestCase (Short, <fun>))
- The
>:::
operator simply takes atest list
and builds a test suite (which in fact is just of typetest
)# let test_suite = "suite now" >::: [test1];; val test_suite : test = TestLabel ("suite now", TestList [TestLabel ("simple test", TestCase (Short, <fun>))])
- Then,
OUnit2.run_test_tt_main tests
will run the suitetests
- (note this will work but will then freeze the top loop unfortunately)
How the tests run when you say dune test
- The above
tests.ml
file is just defining an executable, likekeywordcount.exe
on HW4 - Build and run the executable to run the tests
- Here a dune build file which would work for the simple set tests for example:
(executable
(name tests)
(libraries
ounit2
simple_set
))
; dune rule so command line "dune runtest" (and "dune test") will run tests.
; in other words, there is nothing special about `dune test`, its just a build plus running `_build/default/test/tests.exe`
(rule
(alias runtest)
(action (chdir %{project_root}
(run ./test/tests.exe))))
- There is even special shorthand for the above: replace
executable
withtest
and it makes an executable plus the above alias to run tests:
(test
(name tests)
(libraries
ounit2
simple_set
))
Higher-order testing
- If you did unit testing in other languages it looks pretty much like the above
- But in OCaml we can make tests programatically which makes for less code duplication
- Example: lets make a bunch of different tests on the same invariant, that reversing a list twice is a no-op:
# let make_rev_test l = ("test test" >:: (fun _ -> assert_equal(List.rev @@ List.rev l) l));;
val make_rev_test : 'a list -> test = <fun>
let make_rev_suite ll =
"suite of rev rev tests" >::: List.map ll ~f:(fun l -> make_rev_test l);;
val make_rev_suite : 'a list list -> test = <fun>
let s = make_rev_suite [[];[1;2;3];[2;44;2];[32;2;3;2;1]];;
let () = run_test_tt_main s;; (* recall this crashes the top loop when finished *)
- In general you can build an arbitrarily big tree of tests with suites of suites etc
- As can be seen above, a suite of tests just has type
test
- As can be seen above, a suite of tests just has type
let s' = "id tests" >:::
["one" >:: (fun _ -> assert_equal (Fn.id 4) 4) ;
"two" >:: (fun _ -> assert_equal (Fn.id "hello") "hello")];;
let suites = test_list [s;s'];; (* make suite of suites *)
let named_suites = "revrev and Fn.id" >: suites (* any tree of tests can be named with >: *)
Here is the type of test
under the hood (from the docs) which should make clear why the above works:
type test =
| TestCase of test_fun
| TestList of test list
| TestLabel of string * test
Tangent: defining infix operators
- The OUnit infix operators
>:
/>::
/>:::
are just like+
,^
etc - Using them arguably makes the code more readable, so consider defining your own infix operators
# #require "ounit2";;
# open OUnit2;;
# (>::) ;;
- : string -> test_fun -> test = <fun>
# (>:::);;
- : string -> test list -> test = <fun>
- There is no magic to this, you can also do it:
utop # let (^^) x y = x + y;;
val ( ^^ ) : int -> int -> int = <fun>
utop # 3 ^^ 5;;
- : int = 8
- Note that unlike in C++ we are not overloading operators,
^^
only works on two ints now. - The old version of
^^
for printing is now shadowed so is not directly accessible. - So, new infix ops are always defined within a module to avoid overlap
- OCaml will eventually have true operator overloading but it is still in the development pipe
The different assert_X statements possible in OUnit2
- We used
assert_equal
above which is the OUnit function to check things being equal assert_bool
which is like theassert
OCaml command:assert_bool "name that test" (0=0)
for example- If you want to verify some code raises an exception, use
assert_raises
- To perform acceptance testing (I/O), use
assert_command
to run a shell command and compare against output- in the A4
exec_tests.ml
code we provided you can see we are usingassert_command
to test the executable
- in the A4
- If you need fixed setup/teardown code bracketing a group of tests to setup e.g. files:
bracket_tmpfile
As always, see the documentation for more details:
OUnit2 API docs
Testing executables with cram
- As mentioned above
OUnit
can be used to test executables:OUnit2.assert_command
can run any shell command (in particular, your OCaml.exe
file) dune
also contains an extension calledcram
which allows for output to be compared against expected output for a given input- It is very general, you just specify the shell command to run and expected output
- See the
cram
docs if you are interested - An example in a file
cramtest.t
. Non-indented lines are comments, $ is the input and after the input is implicitly the expected output (4 here)We first create a test artifact called "foo" $ cat >foo <<EOF > foo > bar > baz > EOF After creating the fixture, we want to verify that ``wc`` gives us the right result: $ wc -l foo | awk '{ print $1 }' 4
Bisect for OCaml code coverage
- The
bisect_ppx
preprocessor can decorate your code with one hit-bit per line- it can then show which lines are “hit” upon running your test suite
- Add
(preprocess (pps bisect_ppx))
to library or executable declaration indune
to decorate- don’t add to your
(test ... )
dune declaration, you want to count lines hit in your code not in your test code!
- don’t add to your
- Then do a
dune test
which will generate the low-level hit-lines data in a file.- or
dune exe
and run your app if you want to see coverage there
- or
- Shell command
bisect-ppx-report html
generates a pretty report showing which lines hit in latest execution- open
_coverage/index.html
in your browser to see the report - if this command is not working make sure you did the
opam install bisect-ppx
in the course required installs
- open
- See Bisect docs for more details
- Note that if you have single lines of code that you know should not be run (e.g. invariants that should not fail) you can put
[@coverage off]
at the end of those lines. To turn coverage off on a singlelet
definition, put[@@coverage off]
immediately after the end of definition. To turn coverage off on an arbitrary range of lines in the file, put[@@@coverage off]
to turn it off and then[@@@coverage on]
to turn it back on. See the docs for details.
We will check how well my tests of the simple set example covered the code using Bisect. The only addition to code is the (preprocess (pps bisect_ppx))
added to src/dune
for the library.
Base_quickcheck and Random aka Property-Based Testing
The big picture of random testing
- Suppose we have one function
f
that we want to test. - We need to be able to generate random data which is the parameters of
f
- We run
f
on different random data many times (say 100 or 10000 times) - We need to know if the test worked or not on the random data
- So, tests usually verify that invariants hold or some bad exception not raised etc.
- This is another reason invariants are good, they are properties that can be quickchecked
Using
Base_quickcheck
Base_quickcheck
contains three key algorithms:- Generators,
Quickcheck.Generator
- make random data of desired distribution in given type - Shrinkers,
Quickcheck.Shrinker
- if a failing case is discovered, try to make it smaller (we will not cover these in detail) - Runner,
Quickcheck.test
etc, which runs some fixed number (10,000 by default) of random tests and shrinks failures.
- Generators,
-
We will look at several examples of the
Base_quickcheck
library in action in quickcheck_examples.ml - Base_quickcheck docs
- The Real World OCaml book has a short tutorial (note it uses
ppx_inline_tests
and notOUnit2
)
Fuzz testing vs Quickcheck
- Fuzz testing aka fuzzing is to acceptance tests as quickcheck is to unit tests
- fuzzers feed random inputs into an app to see what it does (acceptance test modality)
- quickcheckers on the other hand are testing single functions (unit test modailty)
- Industry fuzz testers do a lot more more than generate totally random data
- They may be aware that the string input should fit a particular grammar, e.g. html
- They may be combined with a coverage tool and work to find random data inputs covering all the code