cprover
CBMC User Manual
Author
Daniel Kroening

This tutorial is intended for users of CProver primarily focused on CBMC.

Introduction

Motivation

Numerous tools to hunt down functional design flaws in silicon have been available for many years, mainly due to the enormous cost of hardware bugs. The use of such tools is wide-spread. In contrast, the market for tools that address the need for quality software is still in its infancy.

Research in software quality has an enormous breadth. We focus the presentation using two criteria:

  1. We believe that any form of quality requires a specific guarantee, in theory and practice.
  2. The sheer size of software designs requires techniques that are highly automated.

In practice, quality guarantees usually do not refer to "total correctness" of a design, as ensuring the absence of all bugs is too expensive for most applications. In contrast, a guarantee of the absence of specific flaws is achievable, and is a good metric of quality.

Bounded Model Checking with CBMC

CBMC implements a technique called Bounded Model Checking (BMC). In BMC, the transition relation for a complex state machine and its specification are jointly unwound to obtain a Boolean formula, which is then checked for satisfiability by using an efficient SAT procedure. If the formula is satisfiable, a counterexample is extracted from the output of the SAT procedure. If the formula is not satisfiable, the program can be unwound more to determine if a longer counterexample exists.

In many engineering domains, real-time guarantees are a strict requirement. An example is software embedded in automotive controllers. As a consequence, the loop constructs in these types of programs often have a strict bound on the number of iterations. CBMC is able to formally verify such bounds by means of unwinding assertions. Once this bound is established, CBMC is able to prove the absence of errors.

A more detailed description of how to apply CBMC to verify programs is here.

Example: Buffer Overflows

In order to give a brief overview of the capabilities of CBMC we start with a small example. The issue of buffer overflows has obtained wide public attention. A buffer is a contiguously-allocated chunk of memory, represented by an array or a pointer in C. Programs written in C do not provide automatic bounds checking on the buffer, which means a program can – accidentally or maliciously – write past a buffer. The following example is a perfectly valid C program (in the sense that a compiler compiles it without any errors):

int main()
{
int buffer[10];
buffer[20] = 10;
}

However, the write access to an address outside the allocated memory region can lead to unexpected behavior. In particular, such bugs can be exploited to overwrite the return address of a function, thus enabling the execution of arbitrary user-induced code. CBMC is able to detect this problem and reports that the "upper bound property" of the buffer is violated. CBMC is capable of checking these lower and upper bounds, even for arrays with dynamic size. A detailed discussion of the properties that CBMC can check automatically is here.

Hardware/Software Co-Verification

Software programs often interact with hardware in a non-trivial manner, and many properties of the overall design only arise from the interplay of both components. CBMC and SATABS therefore support Co-Verification, i.e., are able to reason about a C/C++ program together with a circuit description given in Verilog.

These co-verification capabilities can also be applied to perform refinement proofs. Software programs are often used as high-level descriptions of circuitry. While both describe the same functionality, the hardware implementation usually contains more detail. It is highly desirable to establish some form for equivalence between the two descriptions. Hardware/Software co-verification and equivalence checking with CBMC is described here.

Installation

Installing CBMC

Requirements

CBMC is available for Windows, i86 Linux, and MacOS X. CBMC requires a code pre-processing environment comprising of a suitable preprocessor and an a set of header files.

  1. Linux: the preprocessor and the header files typically come with a package called gcc, which must be installed prior to the installation of CBMC.
  2. Windows: The Windows version of CBMC requires the preprocessor cl.exe, which is part of Microsoft Visual Studio. We recommend the free Visual Studio Community 2013.
  3. MacOS: Install the XCode Command Line Utilities prior to installing CBMC. Just installing XCode alone is not enough.

Important note for Windows users: Visual Studio's cl.exe relies on a complex set of environment variables to identify the target architecture and the directories that contain the header files. You must run CBMC from within the Visual Studio Command Prompt.

Note that the distribution files for the Eclipse plugin include the CBMC executable. Therefore, if you intend to run CBMC exclusively within Eclipse, you can skip the installation of the CBMC executable. However, you still have to install the compiler environment as described above.

Installing the CBMC Binaries

  1. Download CBMC for your operating system. The binaries are available from http://www.cprover.org/cbmc/.
  2. Unzip/untar the archive into a directory of your choice. We recommend to add this directory to your PATH environment variable.

You are now ready to use CBMC!

Building CBMC from Source

See Compilation and Development

Installing the Eclipse Plugin

Requirements

We provide a graphical user interface to CBMC which is realized as a plugin to the Eclipse framework. Eclipse is available at http://www.eclipse.org. We do not provide installation instructions for Eclipse (basically, you only have to download the current version and extract the files to your hard-disk) and assume that you have already installed the current version.

Important note for Windows users: Visual Studio's cl.exe relies on a complex set of environment variables to identify the target architecture and the directories that contain the header files. You must run Eclipse from within the Visual Studio Command Prompt.

Installing the Eclipse Plugin

The installation instructions for the Eclipse Plugin, including the link to the download site, are available here. This includes a small tutorial on how to use the Eclipse plugin.

CBMC: Bounded Model Checking for C, C++ and Java

A Short Tutorial

First Steps

We assume you have already installed CBMC and the necessary support files on your system. If not so, please follow the instructions here.

Like a compiler, CBMC takes the names of .c files as command line arguments. CBMC then translates the program and merges the function definitions from the various .c files, just like a linker. But instead of producing a binary for execution, CBMC performs symbolic simulation on the program.

As an example, consider the following simple program, named file1.c:

int puts(const char *s) { }

int main(int argc, char **argv) {
  puts(argv[2]);
}

Of course, this program is faulty, as the argv array might have fewer than three elements, and then the array access argv[2] is out of bounds. Now, run CBMC as follows:

cbmc file1.c --show-properties --bounds-check --pointer-check

The two options --bounds-check and --pointer-check instruct CBMC to look for errors related to pointers and array bounds. CBMC will print the list of properties it checks. Note that it lists, among others, a property labelled with "object bounds in argv" together with the location of the faulty array access. As you can see, CBMC largely determines the property it needs to check itself. This is realized by means of a preliminary static analysis, which relies on computing a fixed point on various abstract domains. More detail on automatically generated properties is provided here.

Note that these automatically generated properties need not necessarily correspond to bugs – these are just potential flaws, as abstract interpretation might be imprecise. Whether these properties hold or correspond to actual bugs needs to be determined by further analysis.

CBMC performs this analysis using symbolic simulation, which corresponds to a translation of the program into a formula. The formula is then combined with the property. Let's look at the formula that is generated by CBMC's symbolic simulation:

cbmc file1.c --show-vcc --bounds-check --pointer-check

With this option, CBMC performs the symbolic simulation and prints the verification conditions on the screen. A verification condition needs to be proven to be valid by a decision procedure in order to assert that the corresponding property holds. Let's run the decision procedure:

cbmc file1.c --bounds-check --pointer-check

CBMC transforms the equation you have seen before into CNF and passes it to a SAT solver (more background on this step is in the book on Decision Procedures). It then determines which of the properties that it has generated for the program hold and which do not. Using the SAT solver, CBMC detects that the property for the object bounds of argv does not hold, and will thus print a line as follows:

[main.pointer_dereference.6] dereference failure: object bounds in argv[(signed long int)2]: FAILURE

Counterexample Traces

Let us have a closer look at this property and why it fails. To aid the understanding of the problem, CBMC can generate a counterexample trace for failed properties. To obtain this trace, run

cbmc file1.c --bounds-check --trace

CBMC then prints a counterexample trace, i.e., a program trace that begins with main and ends in a state which violates the property. In our example, the program trace ends in the faulty array access. It also gives the values the input variables must have for the bug to occur. In this example, argc must be one to trigger the out-of-bounds array access. If you add a branch to the example that requires that argc>=3, the bug is fixed and CBMC will report that the proofs of all properties have been successful.

Verifying Modules

In the example above, we used a program that starts with a main function. However, CBMC is aimed at embedded software, and these kinds of programs usually have different entry points. Furthermore, CBMC is also useful for verifying program modules. Consider the following example, called file2.c:

int array[10];
int sum() {
  unsigned i, sum;

  sum=0;
  for(i=0; i<10; i++)
    sum+=array[i];
}

In order to set the entry point to the sum function, run

cbmc file2.c --function sum --bounds-check

It is often necessary to build a suitable harness for the function in order to set up the environment appropriately.

Loop Unwinding

When running the previous example, you will have noted that CBMC unwinds the for loop in the program. As CBMC performs Bounded Model Checking, all loops have to have a finite upper run-time bound in order to guarantee that all bugs are found. CBMC can optionally check that enough unwinding is performed. As an example, consider the program binsearch.c:

int binsearch(int x) {
  int a[16];
  signed low=0, high=16;

  while(low<high) {
    signed middle=low+((high-low)>>1);

    if(a[middle]<x)
      high=middle;
    else if(a[middle]>x)
      low=middle+1;
    else // a[middle]==x
      return middle;
  }

  return -1;
}

If you run CBMC on this function, you will notice that the unwinding does not stop on its own. The built-in simplifier is not able to determine a run time bound for this loop. The unwinding bound has to be given as a command line argument:

cbmc binsearch.c --function binsearch --unwind 6 --bounds-check --unwinding-assertions

CBMC verifies that verifies the array accesses are within the bounds; note that this actually depends on the result of the right shift. In addition, as CBMC is given the option --unwinding-assertions, it also checks that enough unwinding is done, i.e., it proves a run-time bound. For any lower unwinding bound, there are traces that require more loop iterations. Thus, CBMC will report that the unwinding assertion has failed. As usual, a counterexample trace that documents this can be obtained with the option --property.

Unbounded Loops

CBMC can also be used for programs with unbounded loops. In this case, CBMC is used for bug hunting only; CBMC does not attempt to find all bugs. The following program (lock-example.c) is an example of a program with a user-specified property:

_Bool nondet_bool();
_Bool LOCK = 0;
_Bool lock() {
if(nondet_bool()) {
assert(!LOCK);
LOCK=1;
return 1; }
return 0;
}
void unlock() {
assert(LOCK);
LOCK=0;
}
int main() {
unsigned got_lock = 0;
int times;
while(times > 0) {
if(lock()) {
got_lock++;
/* critical section */
}
if(got_lock!=0)
unlock();
got_lock--;
times--;
}
}

The while loop in the main function has no (useful) run-time bound. Thus, a bound has to be set on the amount of unwinding that CBMC performs. There are two ways to do so:

  1. The --unwind command-line parameter can to be used to limit the number of times loops are unwound.
  2. The --depth command-line parameter can be used to limit the number of program steps to be processed.

Given the option --unwinding-assertions, CBMC checks whether the argument to --unwind is large enough to cover all program paths. If the argument is too small, CBMC will detect that not enough unwinding is done reports that an unwinding assertion has failed.

Reconsider the example. For a loop unwinding bound of one, no bug is found. But already for a bound of two, CBMC detects a trace that violates an assertion. Without unwinding assertions, or when using the --depth command line switch, CBMC does not prove the program correct, but it can be helpful to find program bugs. The various command line options that CBMC offers for loop unwinding are described in the section on understanding loop unwinding.

A Note About Compilers and the ANSI-C Library

Most C programs make use of functions provided by a library; instances are functions from the standard ANSI-C library such as malloc or printf. The verification of programs that use such functions has two requirements:

  1. Appropriate header files have to be provided. These header files contain declarations of the functions that are to be used.
  2. Appropriate definitions have to be provided.

Most C compilers come with header files for the ANSI-C library functions. We briefly discuss how to obtain/install these library files.

Linux

Linux systems that are able to compile software are usually equipped with the appropriate header files. Consult the documentation of your distribution on how to install the compiler and the header files. First try to compile some significant program before attempting to verify it.

Windows

On Microsoft Windows, CBMC is pre-configured to use the compiler that is part of Microsoft's Visual Studio. Microsoft's Visual Studio Community is fully featured and available for download for free from the Microsoft webpage. Visual Studio installs the usual set of header files together with the compiler. However, the Visual Studio compiler requires a large set of environment variables to function correctly. It is therefore required to run CBMC from the Visual Studio Command Prompt, which can be found in the menu Visual Studio Tools.

Note that in both cases, only header files are available. CBMC only comes with a small set of definitions, which includes functions such as malloc. Detailed information about the built-in definitions is here.

Command Line Interface

This section describes the command line interface of CBMC. Like a C compiler, CBMC takes the names of the .c source files as arguments. Additional options allow to customize the behavior of CBMC. Use cbmc --help to get a full list of the available options.

Structured output can be obtained from CBMC using the option --xml-ui. Any output from CBMC (e.g., counterexamples) will then use an XML representation.

Further Reading

We also have a list of interesting applications of CBMC.

Understanding Loop Unwinding

Iteration-based Unwinding

The basic idea of CBMC is to model the computation of the programs up to a particular depth. Technically, this is achieved by a process that essentially amounts to unwinding loops. This concept is best illustrated with a generic example:

int main(int argc, char **argv) {
while(cond) {
BODY CODE
}
}

A BMC instance that will find bugs with up to five iterations of the loop would contain five copies of the loop body, and essentially corresponds to checking the following loop-free program:

int main(int argc, char **argv) {
if(cond) {
BODY CODE COPY 1
if(cond) {
BODY CODE COPY 2
if(cond) {
BODY CODE COPY 3
if(cond) {
BODY CODE COPY 4
if(cond) {
BODY CODE COPY 5
}
}
}
}
}
}

Note the use of the if statement to prevent the execution of the loop body in the case that the loop ends before five iterations are executed. The construction above is meant to produce a program that is trace equivalent with the original programs for those traces that contain up to five iterations of the loop.

In many cases, CBMC is able to automatically determine an upper bound on the number of loop iterations. This may even work when the number of loop unwindings is not constant. Consider the following example:

_Bool f();
int main() {
for(int i=0; i<100; i++) {
if(f()) break;
}
assert(0);
}

The loop in the program above has an obvious upper bound on the number of iterations, but note that the loop may abort prematurely depending on the value that is returned by f(). CBMC is nevertheless able to automatically unwind the loop to completion.

This automatic detection of the unwinding bound may fail if the number of loop iterations is highly data-dependent. Furthermore, the number of iterations that are executed by any given loop may be too large or may simply be unbounded. For this case, CBMC offers the command-line option --unwind B, where B denotes a number that corresponds to the maximal number of loop unwindings CBMC performs on any loop.

Note that the number of unwindings is measured by counting the number of backjumps. In the example above, note that the condition i<100 is in fact evaluated 101 times before the loop terminates. Thus, the loop requires a limit of 101, and not 100.

Setting Separate Unwinding Limits

The setting given with --unwind is used globally, that is, for all loops in the program. In order to set individual limits for the loops, first use

--show-loops

to obtain a list of all loops in the program. Then identify the loops you need to set a separate bound for, and note their loop ID. Then use

--unwindset L:B,L:B,...

where L denotes a loop ID and B denotes the bound for that loop.

As an example, consider a program with two loops in the function main:

--unwindset c::main.0:10,c::main.1:20

This sets a bound of 10 for the first loop, and a bound of 20 for the second loop.

What if the number of unwindings specified is too small? In this case, bugs that require paths that are deeper may be missed. In order to address this problem, CBMC can optionally insert checks that the given unwinding bound is actually sufficiently large. These checks are called unwinding assertions, and are enabled with the option --unwinding-assertions. Continuing the generic example above, this unwinding assertion for a bound of five corresponds to checking the following loop-free program:

int main(int argc, char **argv) {
if(cond) {
BODY CODE COPY 1
if(cond) {
BODY CODE COPY 2
if(cond) {
BODY CODE COPY 3
if(cond) {
BODY CODE COPY 4
if(cond) {
BODY CODE COPY 5
assert(!cond);
}
}
}
}
}
}

The unwinding assertions can be verified just like any other generated assertion. If all of them are proven to hold, the given loop bounds are sufficient for the program. This establishes a high-level worst-case execution time (WCET).

In some cases, it is desirable to cut off very deep loops in favor of code that follows the loop. As an example, consider the following program:

int main() {
for(int i=0; i<10000; i++) {
BODY CODE
}
assert(0);
}

In the example above, small values of --unwind will prevent that the assertion is reached. If the code in the loop is considered irrelevant to the later assertion, use the option

--partial-loops

This option will allow paths that execute loops only partially, enabling a counterexample for the assertion above even for small unwinding bounds. The disadvantage of using this option is that the resulting path may be spurious, i.e., may not exist in the original program.

Depth-based Unwinding

The loop-based unwinding bound is not always appropriate. In particular, it is often difficult to control the size of the generated formula when using the --unwind option. The option

--depth nr

specifies an unwinding bound in terms of the number of instructions that are executed on a given path, irrespectively of the number of loop iterations. Note that CBMC uses the number of instructions in the control-flow graph as the criterion, not the number of instructions in the source code.

COVER: Test Suite Generation with CBMC

A Small Tutorial with A Case Study

We assume that CBMC is installed on your system. If not so, follow these instructions.

CBMC can be used to automatically generate test cases that satisfy a certain code coverage criteria. Common coverage criteria include branch coverage, condition coverage and Modified Condition/Decision Coverage (MC/DC). Among others, MC/DC is required by several avionics software development guidelines to ensure adequate testing of safety critical software. Briefly, in order to satisfy MC/DC, for every conditional statement containing boolean decisions, each Boolean variable should be evaluated one time to "true" and one time to "false", in a way that affects the outcome of the decision.

In the following, we are going to demonstrate how to apply the test suite generation functionality in CBMC, by means of a case study. The following program is an excerpt from a real-time embedded benchmark PapaBench, and implements part of a fly-by-wire autopilot for an Unmanned Aerial Vehicle (UAV). It is adjusted mildly for our purposes.

The aim of function climb_pid_run is to control the vertical climb of the UAV. Details on the theory behind this operation are documented in the wiki for the Paparazzi UAV project. The behaviour of this simple controller, supposing that the desired speed is 0.5 meters per second, is plotted in the Figure below.

pid.png
The pid controller
01: // CONSTANTS:
02: #define MAX_CLIMB_SUM_ERR 10
03: #define MAX_CLIMB 1
04:
05: #define CLOCK 16
06: #define MAX_PPRZ (CLOCK*600)
07:
08: #define CLIMB_LEVEL_GAZ 0.31
09: #define CLIMB_GAZ_OF_CLIMB 0.75
10: #define CLIMB_PITCH_OF_VZ_PGAIN 0.05
11: #define CLIMB_PGAIN -0.03
12: #define CLIMB_IGAIN 0.1
13:
14: const float pitch_of_vz_pgain=CLIMB_PITCH_OF_VZ_PGAIN;
15: const float climb_pgain=CLIMB_PGAIN;
16: const float climb_igain=CLIMB_IGAIN;
17: const float nav_pitch=0;
18:
19: /** PID function INPUTS */
20: // The user input: target speed in vertical direction
21: float desired_climb;
22: // Vertical speed of the UAV detected by GPS sensor
23: float estimator_z_dot;
24:
25: /** PID function OUTPUTS */
26: float desired_gaz;
27: float desired_pitch;
28:
29: /** The state variable: accumulated error in the control */
30: float climb_sum_err=0;
31:
32: /** Computes desired_gaz and desired_pitch */
33: void climb_pid_run()
34: {
35:
36: float err=estimator_z_dot-desired_climb;
37:
38: float fgaz=climb_pgain*(err+climb_igain*climb_sum_err)+CLIMB_LEVEL_GAZ+CLIMB_GAZ_OF_CLIMB*desired_climb;
39:
40: float pprz=fgaz*MAX_PPRZ;
41: desired_gaz=((pprz>=0 && pprz<=MAX_PPRZ) ? pprz : (pprz>MAX_PPRZ ? MAX_PPRZ : 0));
42:
43: /** pitch offset for climb */
44: float pitch_of_vz=(desired_climb>0) ? desired_climb*pitch_of_vz_pgain : 0;
45: desired_pitch=nav_pitch+pitch_of_vz;
46:
47: climb_sum_err=err+climb_sum_err;
48: if (climb_sum_err>MAX_CLIMB_SUM_ERR) climb_sum_err=MAX_CLIMB_SUM_ERR;
49: if (climb_sum_err<-MAX_CLIMB_SUM_ERR) climb_sum_err=-MAX_CLIMB_SUM_ERR;
50:
51: }
52:
53: int main()
54: {
55:
56: while(1)
57: {
58: /** Non-deterministic input values */
59: desired_climb=nondet_float();
60: estimator_z_dot=nondet_float();
61:
62: /** Range of input values */
63: __CPROVER_assume(desired_climb>=-MAX_CLIMB && desired_climb<=MAX_CLIMB);
64: __CPROVER_assume(estimator_z_dot>=-MAX_CLIMB && estimator_z_dot<=MAX_CLIMB);
65:
66: __CPROVER_input("desired_climb", desired_climb);
67: __CPROVER_input("estimator_z_dot", estimator_z_dot);
68:
69: climb_pid_run();
70:
71: __CPROVER_output("desired_gaz", desired_gaz);
72: __CPROVER_output("desired_pitch", desired_pitch);
73:
74: }
75:
76: return 0;
77: }

In order to test the PID controller, we construct a main control loop, which repeatedly invokes the function climb_pid_run (line 69). This PID function has two input variables: the desired speed desired_climb and the estimated speed estimated_z_dot. In the beginning of each loop iteration, values of the inputs are assigned non-deterministically. Subsequently, the __CPROVER_assume statement in lines 63 and 64 guarantees that both values are bounded within a valid range. The __CPROVER_input and __CPROVER_output will help clarify the inputs and outputs of interest for generating test suites.

To demonstrate the automatic test suite generation in CBMC, we call the following command and we are going to explain the command line options one by one.

cbmc pid.c --cover mcdc --unwind 6 --xml-ui

The option --cover mcdc specifies the code coverage criterion. There are four conditional statements in the PID function: in line 41, line 44, line 48 and line 49. To satisfy MC/DC, the test suite has to meet multiple requirements. For instance, each conditional statement needs to evaluate to true and false. Consider the condition "pprz>=0 && pprz<=MAX_PPRZ" in line 41. CBMC instruments three coverage goals to control the respective evaluated results of "pprz>=0" and "pprz<=MAX_PPRZ". We list them in below and they satisfy the MC/DC rules. Note that MAX_PPRZ is defined as 16 * 600 in line 06 of the program.

!(pprz >= (float)0) && pprz <= (float)(16 * 600)  id="climb_pid_run.coverage.1"
pprz >= (float)0 && !(pprz <= (float)(16 * 600))  id="climb_pid_run.coverage.2"
pprz >= (float)0 && pprz <= (float)(16 * 600)     id="climb_pid_run.coverage.3"

The "id" of each coverage goal is automatically assigned by CBMC. For every coverage goal, a test suite (if there exists) that satisfies such a goal is printed out in XML format, as the parameter --xml-ui is given. Multiple coverage goals can share a test suite, when the corresponding execution of the program satisfies all these goals at the same time.

In the end, the following test suites are automatically generated for testing the PID controller. A test suite consists of a sequence of input parameters that are passed to the PID function climb_pid_run at each loop iteration. For example, Test 1 covers the MC/DC goal with id="climb\_pid\_run.coverage.1". The complete output from CBMC is in pid_test_suites.xml, where every test suite and the coverage goals it is for are clearly described.

Test suite:
Test 1.
  (iteration 1) desired_climb=-1.000000f, estimator_z_dot=1.000000f

Test 2.
  (iteration 1) desired_climb=-1.000000f, estimator_z_dot=1.000000f
  (iteration 2) desired_climb=1.000000f, estimator_z_dot=-1.000000f

Test 3.
  (iteration 1) desired_climb=0.000000f, estimator_z_dot=-1.000000f
  (iteration 2) desired_climb=1.000000f, estimator_z_dot=-1.000000f

Test 4.
  (iteration 1) desired_climb=1.000000f, estimator_z_dot=-1.000000f
  (iteration 2) desired_climb=1.000000f, estimator_z_dot=-1.000000f
  (iteration 3) desired_climb=1.000000f, estimator_z_dot=-1.000000f
  (iteration 4) desired_climb=1.000000f, estimator_z_dot=-1.000000f
  (iteration 5) desired_climb=0.000000f, estimator_z_dot=-1.000000f
  (iteration 6) desired_climb=1.000000f, estimator_z_dot=-1.000000f

Test 5.
  (iteration 1) desired_climb=-1.000000f, estimator_z_dot=1.000000f
  (iteration 2) desired_climb=-1.000000f, estimator_z_dot=1.000000f
  (iteration 3) desired_climb=-1.000000f, estimator_z_dot=1.000000f
  (iteration 4) desired_climb=-1.000000f, estimator_z_dot=1.000000f
  (iteration 5) desired_climb=-1.000000f, estimator_z_dot=1.000000f
  (iteration 6) desired_climb=-1.000000f, estimator_z_dot=1.000000f

The option --unwind 6 unwinds the loop inside the main function body six times. In order to achieve the complete coverage on all the instrumented goals in the PID function climb_pid_run, the loop must be unwound sufficient enough times. For example, climb_pid_run needs to be called at least six times for evaluating the condition climb_sum_err>MAX_CLIMB_SUM_ERR in line 48 to true. This corresponds to the Test 5. An introduction to the use of loop unwinding can be found in Understanding Loop Unwinding.

In this small tutorial, we present the automatic test suite generation functionality of CBMC, by applying the MC/DC code coverage criterion to a PID controller case study. In addition to --cover mcdc, other coverage criteria like branch, decision, path etc. are also available when calling CBMC.

Coverage Criteria

The table below summarizes the coverage criteria that CBMC supports.

Criterion Definition
assertion For every assertion, generate a test that reaches it
location For every location, generate a test that reaches it
branch Generate a test for every branch outcome
decision Generate a test for both outcomes of every Boolean expression that is not an operand of a propositional connective
condition Generate a test for both outcomes of every Boolean expression
mcdc Modified Condition/Decision Coverage (MC/DC)
path Bounded path coverage
cover Generate a test for every __CPROVER_cover statement

Modelling

Nondeterminism

Rationale

Programs typically read inputs from an environment. These inputs can take the form of data read from a file, keyboard or network socket, or arguments passed on the command line. It is usually desirable to analyze the program for any choice of these inputs. In Model Checking, inputs are therefore modeled by means of nondeterminism, which means that the value of the input is not specified. The program may follow any computation that results from any choice of inputs.

Sources of Nondeterminism

The CPROVER tools support the following sources of nondeterminism:

  • functions that read inputs from the environments;
  • the thread schedule in concurrent programs;
  • initial values of local-scoped variables and memory allocated with malloc;
  • initial values of variables that are extern in all compilation units;
  • explicit functions for generating nondeterminism.

The CPROVER tools are shipped with a number of stubs for the most commonly used library functions. When executing a statement such as getchar(), a nondeterministic value is chosen instead of reading a character from the keyboard.

When desired, nondeterminism can be introduced explicitly into the program by means of functions that begin with the prefix nondet_. As an example, the following function returns a nondeterministically chosen unsigned short int:

unsigned short int nondet_ushortint();

Note that the body of the function is not defined. The name of the function itself is irrelevant (save for the prefix), but must be unique. Also note that a nondeterministic choice is not to be confused with a probabilistic (or random) choice.

Uninterpreted Functions

It may be necessary to check parts of a program independently. Nondeterminism can be used to over-approximate the behaviour of part of the system which is not being checked. Rather than calling a complex or unrelated function, a nondeterministic stub is used. However, separate calls to the function can return different results, even for the same inputs. If the function output only depends on its inputs then this can introduce spurious errors. To avoid this problem, functions whose names begin with the prefix __CPROVER_uninterpreted_ are treated as uninterpreted functions. Their value is non-deterministic but different invocations will return the same value if their inputs are the same. Note that uninterpreted functions are not supported by all back-end solvers.

Modeling with Assertions and Assumptions

Assertions

Assertions are statements within the program that attempt to capture the programmer's intent. The ANSI-C standard defines a header file assert.h, which offers a macro assert(cond). When executing a statement such as

assert(p!=NULL);

the execution is aborted with an error message if the condition evaluates to false, i.e., if p is NULL in the example above. The CPROVER tools can check the validity of the programmer-annotated assertions statically. Specifically, the CPROVER tools will check that the assertions hold for any nondeterministic choice that the program can make. The static assertion checks can be disabled using the --no-assertions command line option.

In addition, there is a CPROVER-specific way to specify assertions, using the built-in function __CPROVER_assert:

__CPROVER_assert(p!=NULL, "p is not NULL");

The (mandatory) string that is passed as the second argument provides an informal description of the assertion. It is shown in the list of properties together with the condition.

The assertion language of the CPROVER tools is identical to the language used for expressions. Note that nondeterminism can be exploited in order to check a range of choices. As an example, the following code fragment asserts that all elements of the array are zero:

int a[100], i;

...

i=nondet_uint();
if(i>=0 && i<100)
  assert(a[i]==0);

The nondeterministic choice will guess the element of the array that is nonzero. The code fragment above is therefore equivalent to

int a[100], i;

...

for(i=0; i<100; i++)
  assert(a[i]==0);

Future CPROVER releases will support explicit quantifiers with a syntax that resembles Spec#:

__CPROVER_forall { *type identifier* ; *expression* }
__CPROVER_exists { *type identifier* ; *expression* }

Assumptions

Assumptions are used to restrict nondeterministic choices made by the program. As an example, suppose we wish to model a nondeterministic choice that returns a number from 1 to 100. There is no integer type with this range. We therefore use __CPROVER_assume to restrict the range of a nondeterministically chosen integer:

unsigned int nondet_uint();

unsigned int one_to_hundred()
{
  unsigned int result=nondet_uint();
  __CPROVER_assume(result>=1 && result<=100);
  return result;
}

The function above returns the desired integer from 1 to 100. You must ensure that the condition given as an assumption is actually satisfiable by some nondeterministic choice, or otherwise the model checking step will pass vacuously.

Also note that assumptions are never retroactive: They only affect assertions (or other properties) that follow them in program order. This is best illustrated with an example. In the following fragment, the assumption has no effect on the assertion, which means that the assertion will fail:

x=nondet_uint();
assert(x==100);
__CPROVER_assume(x==100);

Assumptions do restrict the search space, but only for assertions that follow. As an example, the following program will pass:

int main() {
  int x;

  __CPROVER_assume(x>=1 && x<=100000);

  x*=-1;

  __CPROVER_assert(x<0, "x is negative");
}

Beware that nondeterminism cannot be used to obtain the effect of universal quantification in assumptions. As an example,

int main() {
  int a[10], x, y;

  x=nondet_int();
  y=nondet_int();
  __CPROVER_assume(x>=0 && x<10 && y>=0 && y<10);

  __CPROVER_assume(a[x]>=0);

  assert(a[y]>=0);
}

fails, as there is a choice of x and y which results in a counterexample (any choice in which x and y are different).

Pointer Model

Pointers in C

C programs (and sometimes C++ programs as well) make intensive use of pointers in order to decouple program code from specific data. A pointer variable does not store data such as numbers or letters, but instead points to a location in memory that hold the relevant data. This section describes the way the CPROVER tools model pointers.

Objects and Offsets

The CPROVER tools represent pointers as a pair. The first member of the pair is the object the pointer points to, and the second is the offset within the object.

In C, objects are simply continuous fragments of memory (this definition of "object" is not to be confused with the use of the term in object-oriented programming). Variables of any type are guaranteed to be stored as one object, irrespectively of their type. As an example, all members of a struct or array belong to the same object. CPROVER simply assigns a number to each active object. The object number of a pointer p can be extracted using the expression __CPROVER_POINTER_OBJECT(p). As a consequence, pointers to different objects are always different, which is not sound.

The offset (the second member of the pair that forms a pointer) is relative to the beginning of the object; it uses byte granularity. As an example, the code fragment

unsigned array[10];
char *p;

p=(char *)(array+1);
p++;

will result in a pointer with offset 5. The offset of a pointer p can be extracted using the expression __CPROVER_POINTER_OFFSET(p).

Dereferencing Pointers

The CPROVER tools require that pointers that are dereferenced point to a valid object. Assertions that check this requirement can be generated using the option –pointer-check and, if desired, –bounds-check. These options will ensure that NULL pointers are not dereferenced, and that dynamically allocated objects have not yet been deallocated.

Furthermore, the CPROVER tools check that dynamically allocated memory is not deallocated twice. The goto-instrument tool is also able to add checks for memory leaks, i.e., it detects dynamically allocated objects that are not deallocated once the program terminates.

The CPROVER tools support pointer typecasts. Most casts are supported, with the following exceptions:

  1. One notable exception is that pointers can only be accessed using a pointer type. The conversion of a pointer into an integer-type using a pointer typecast is not supported.
  2. Casts from integers to pointers yield a pointer that is either NULL (if the integer is zero) or that point into a special array for modeling memory-mapped I/O. Such pointers are assumed not to overlap with any other objects. This is, of course, only sound if a corresponding range check is instrumented.
  3. Accesses to arrays via pointers that have the array subtype need to be well-aligned.

Pointers in Open Programs

It is frequently desired to validate an open program, i.e., a fragment of a program. Some variables are left undefined. In case an undefined pointer is dereferenced, CBMC assumes that the pointer points to a separate object of appropriate type with unbounded size. The object is assumed not to alias with any other object. This assumption may obviously be wrong in specific extensions of the program.

Floating Point

The CPROVER tools support bit-accurate reasoning about IEEE-754 floating-point and fixed-point arithmetic. The C standard contains a number of areas of implementation-defined behaviour with regard to floating-point arithmetic:

  • CPROVER supports C99 Appendix F, and thus, the __STD_IEC_559__ macro is defined. This means that the C float data type maps to IEEE 754 binary32 and double maps to binary64 and operations on them are as specified in IEEE 754.
  • long double can be configured to be binary64, binary128 (quad precision) or a 96 bit type with 15 exponent bits and 80 significant bits. The last is an approximation of Intel's x87 extended precision double data type. As the C standard allows a implementations a fairly wide set of options for long double, it is best avoided for both portable code and bit-precise analysis. The default is to match the build architecture as closely as possible.
  • In CPROVER, floating-point expressions are evaluated at the 'natural precision' (the greatest of the arguments) and not at a higher precision. This corresponds to FLT_EVAL_METHOD set to 0. Note that this is a different policy to some platforms (see below).
  • Expression contraction (for example, converting x * y + c to fma(x,y,c)) is not performed. In effect, the FP_CONTRACT pragma is always off.
  • Constant expressions are evaluated at `run' time wherever possible and so will respect changes in the rounding mode. In effect, the FENV_ACCESS pragma is always off. Note that floating point constants are treated as doubles (unless they are followed by f when they are float) as specified in the C standard. goto-cc supports -fsingle-precision-constant, which allows the (non-standard) treatment of constants as floats.
  • Casts from int to float and float to float make use of the current rounding mode. Note that the standard requires that casts from float to int use round-to-zero (i.e. truncation).

x86 and Other Platform-specific Issues

Not all platforms have the same implementation-defined behaviour as CPROVER. This can cause mismatches between the verification environment and the execution environment. If this occurs, check the compiler manual for the choices listed above. There are two common cases that can cause these problems: 32-bit x86 code and the use of unsafe optimisations.

Many compilers that target 32-bit x86 platforms employ a different evaluation method. The extended precision format of the x87 unit is used for all computations regardless of their native precision. Most of the time, this results in more accurate results and avoids edge cases. However, it can result in some obscure and difficult to debug behaviour. Checking if the FLT_EVAL_METHOD macro is non-zero (for these platforms it will typically be 2), should warn of these problems. Changing the compiler flags to use the SSE registers will resolve many of them, give a more standards-compliant platform and will likely perform better. Thus it is highly recommended. Use -msse2 -mfpmath=sse to enable this option for GCC. Visual C++ does not have an option to force the exclusive use of SSE instructions, but /arch:SSE2 will pick SSE instructions "when it \[the compiler\] determines that it is faster to use the SSE/SSE2 instructions" and is thus better than /arch:IA32, which exclusively uses the x87 unit.

The other common cause of discrepancy between CPROVER results and the actual platform are the use of unsafe optimisations. Some higher optimisation levels enable transformations that are unsound with respect to the IEEE-754 standard. Consult the compiler manual and disable any optimisations that are described as unsafe (for example, the GCC options -ffast-math). The options -ffp-contract=off (which replaces -mno-fused-madd), -frounding-math and -fsignaling-nans are needed for GCC to be strictly compliant with IEEE-754.

Rounding Mode

CPROVER supports the four rounding modes given by IEEE-754 1985; round to nearest (ties to even), round up, round down and round towards zero. By default, round to nearest is used. However, command line options (--round-to-zero, etc.) can be used to over-ride this. If more control is needed, CPROVER has models of fesetround (for POSIX systems) and _controlfp (for Windows), which can be used to change the rounding mode during program execution. Furthermore, the inline assembly commands fstcw/fnstcw/fldcw (on x86) can be used.

The rounding mode is stored in the (thread local) variable __CPROVER_rounding_mode, but users are strongly advised not to use this directly.

Math Library

CPROVER implements some of math.h, including fabs, fpclassify and signbit. It has very limited support for elementary functions. Care must be taken when verifying properties that are dependent on these functions as the accuracy of implementations can vary considerably. The C compilers can (and many do) say that the accuracy of these functions is unknown.

Fixed-point Arithmetic

CPROVER also has support for fixed-point types. The --fixedbv flag switches float, double and long double to fixed-point types. The length of these types is platform specific. The upper half of each type is the integer component and the lower half is the fractional part.

Hardware and Software Equivalence and Co-Verification

Introduction

A common hardware design approach employed by many companies is to first write a quick prototype that behaves like the planned circuit in a language like ANSI-C. This program is then used for extensive testing and debugging, in particular of any embedded software that will later on be shipped with the circuit. An example is the hardware of a cell phone and its software. After testing and debugging of the program, the actual hardware design is written using hardware description languages like VHDL or Verilog.

Thus, there are two implementations of the same design: one written in ANSI-C, which is written for simulation, and one written in register transfer level HDL, which is the actual product. The ANSI-C implementation is usually thoroughly tested and debugged.

Due to market constraints, companies aim to sell the chip as soon as possible, i.e., shortly after the HDL implementation is designed. There is usually little time for additional debugging and testing of the HDL implementation. Thus, an automated, or nearly automated way of establishing the consistency of the HDL implementation is highly desirable.

This motivates the verification problem: we want to verify the consistency of the HDL implementation, i.e., the product, [using the ANSI-C implementation as a reference](http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=936243&userType=inst). Es­ta­bli­shing the consistency does not re­quire a formal specification. However, formal methods to verify either the hardware or software design are still desirable.

Related Work

There have been several attempts in the past to tackle the problem. Semeria et al. describe a tool for verifying the combinational equivalence of RTL-C and an HDL. They translate the C code into HDL and use standard equivalence checkers to establish the equivalence. The C code has to be very close to a hardware description (RTL level), which implies that the source and target have to be implemented in a very similar way. There are also variants of C specifically for this purpose. The SystemC standard defines a subset of C++ that can be used for synthesis. Further variants of ANSI-C for specifying hardware are SpecC and Handel C, among others.

The concept of verifying the equivalence of a software implementation and a synchronous transition system was introduced by Pnueli, Siegel, and Shtrichman. The C program is re­quired to be in a very specific form, since a mechanical translation is assumed.

In 2000, Currie, Hu, and Rajan transform DSP assembly language into an equation for the Stanford Validity Checker. The symbolic execution of programs for comparison with RTL is now common practice.

The previous work focuses on a small subset of ANSI-C that is particularly close to register transfer language. Thus, the designer is often re­quired to rewrite the C program manually in order to comply with these constraints. We extend the methodology to handle the full set of ANSI-C language features. This is a challenge in the presence of complex, dynamic data structures and pointers that may dynamically point to multiple objects. Furthermore, our methodology allows arbitrary loop constructs.

Further Material

We provide a small tutorial and a description on how to synchronize inputs between the C model and the Verilog model. There is also a collection of benchmark problems available.

Further Reading

Tutorial

Verilog vs. ANSI-C

We assume that CBMC is installed on your system. If not so, follow these instructions.

The following Verilog module implements a 4-bit counter (counter.v):

module top(input clk);

  reg [3:0] counter;

  initial counter=0;

  always @(posedge clk)
    counter=counter+1;

endmodule

HW-CBMC can take Verilog modules as the one above as additional input. Similar as in co-simulation, the data in the Verilog modules is available to the C program by means of global variables. For the example above, the following C fragment shows the definition of the variable that holds the value of the counter register:

struct module_top {
  unsigned int counter;
};

extern struct module_top top;

Using this definition, the value of the counter register in the Verilog fragment above can be accessed as top.counter. Please note that the name of the variable must match the name of the top module. The C program only has a view of one state of the Verilog model. The Verilog model makes a transition once the function next_timeframe() is called.

As CBMC performs Bounded Model Checking, the number of timeframes available for analysis must be bounded (SATABS). As it is desirable to change the bound to adjust it to the available computing capacity, the bound is given on the command line and not as part of the C program. This makes it easy to use only one C program for arbitrary bounds. The actual bound is available in the C program using the following declaration:

extern const unsigned int bound;

Also note that the fragment above declares a constant variable of struct type. Thus, the C program can only read the trace values and is not able to modify them. We will later on describe how to drive inputs of the Verilog module from within the C program.

As described in previous chapters, assertions can be used to verify properties of the Verilog trace. As an example, the following program checks two values of the trace of the counter module (counter.c):

void next_timeframe();

struct module_top {
  unsigned int counter;
};

extern struct module_top top;

int main() {
  next_timeframe();
  next_timeframe();
  assert(top.counter==2);
  next_timeframe();
  assert(top.counter==3);
}

The following CBMC command line checks these assertions with a bound of 20:

hw-cbmc counter.c counter.v --module top --bound 20

Note that a specific version of CBMC is used, called hw-cbmc. The module name given must match the name of the module in the Verilog file. Multiple Verilog files can be given on the command line.

The --bound parameter is not to be confused with the --unwind parameter. While the --unwind parameter specifies the maximum unwinding depth for loops within the C program, the --bound parameter specifies the number of times the transition relation of the Verilog module is to be unwound.

Counterexamples

For the given example, the verification is successful. If the first assertion is changed to

assert(top.counter==10);

and the bound on the command line is changed to 6, CBMC will produce a counterexample. CBMC produces two traces: One for the C program, which matches the traces described earlier, and a separate trace for the Verilog module. The values of the registers in the Verilog module are also shown in the C trace as part of the initial state.

Initial State
----------------------------------------------------
  bound=6 (00000000000000000000000000000110)
  counter={ 0, 1, 2, 3, 4, 5, 6 }

Failed assertion: assertion line 6 function main

Transition system state 0
----------------------------------------------------
  counter=0 (0000)

Transition system state 1
----------------------------------------------------
  counter=1 (0001)

Transition system state 2
----------------------------------------------------
  counter=2 (0010)

Transition system state 3
----------------------------------------------------
  counter=3 (0011)

Transition system state 4
----------------------------------------------------
  counter=4 (0100)

Transition system state 5
----------------------------------------------------
  counter=5 (0101)

Transition system state 6
----------------------------------------------------
  counter=6 (0110)

Using the Bound

The following program is using the bound variable to check the counter value in all cycles:

void next_timeframe();
extern const unsigned int bound;

struct module_top {
  unsigned int counter;
};

extern struct module_top top;

int main() {
  unsigned cycle;

  for(cycle=0; cycle<bound; cycle++) {
    assert(top.counter==(cycle & 15));
    next_timeframe();
  }
}

CBMC performs bounds checking, and restricts the number of times that next_timeframe() can be called. SATABS does not re­quire a bound, and thus, next_timeframe() can be called arbitrarily many times.

Mapping Variables

Mapping Variables within the Module Hierarchy

Verilog modules are hierarchical. The extern declarations shown above only allow reading the values of signals and registers that are in the top module. In order to read values from sub-modules, CBMC uses structures.

As an example, consider the following Verilog file (heirarchy.v):

module counter(input clk, input [7:0] increment);

  reg [7:0] counter;

  initial counter=0;

  always @(posedge clk)
    counter=counter+increment;

endmodule

module top(input clk);

  counter c1(clk, 1);
  counter c2(clk, 2);

endmodule

The file has two modules: a top module and a counter module. The counter module is instantiated twice within the top module. A reference to the register counter within the C program would be ambiguous, as the two module instances have separate instances of the register. CBMC and SATABS use the following data structures for this example:

void next_timeframe();
extern const unsigned int bound;

struct counter {
  unsigned char increment;
  unsigned char counter;
};

struct module_top {
  struct module_counter c1, c2;
};

extern struct module_top top;

int main() {
  next_timeframe();
  next_timeframe();
  next_timeframe();
  assert(top.c1.counter==3);
  assert(top.c2.counter==6);
}

The main function reads both counter values for cycle 3. A deeper hierarchy (modules in modules) is realized by using additional structure members. Writing these data structures for large Verilog designs is error prone, and thus, HW-CBMC can generate them automatically. The declarations above are generated using the command line

hw-cbmc --gen-interface --module top hierarchy.v

Mapping Verilog Vectors to Arrays or Scalars

In Verilog, a definition such as

wire [31:0] x;

can be used for arithmetic (as in x+10) and as array of Booleans (as in x[2]). ANSI-C does not allow both, so when mapping variables from Verilog to C, the user has to choose one option for each such variable. As an example, the C declaration

unsigned int x;

will allow using x in arithmetic expressions, while the C declaration

__CPROVER_bool x[32];

will allow accessing the individual bits of x using the syntax x[bit]. The --gen-interface option of HW-CBMC will generate the first variant if the vector has the same size as one of the standard integer types, and will use the __CPROVER_bitvector[] type if not so. This choice can be changed by adjusting the declaration accordingly. Note that both SpecC and SystemC offer bit-extraction operators, which means that it unnecessary to use the declaration as array in order to access individual bits of a vector.

Synchronizing Inputs

Driving Primary Inputs

The examples in the tutorial are trivial in the sense that the model has only one possible trace. The initial state is deterministic, and there is only one possible transition, so the verification problem can be solved by testing a single run. In contrast, consider the following Verilog module:

module top(input clk, input i);

  reg [3:0] counter;

  initial counter=0;

  always @(posedge clk)
    if(i)
      counter=counter+1;

endmodule

The module above has an input named i. The top-level inputs of the Verilog design have to be generated by the C program. This is done by assigning the desired values to the corresponding struct member, and then calling the set_inputs() function before calling next_timeframe(). Consider the following example:

void next_timeframe();
void set_inputs();
extern const unsigned int bound;

struct module_top {
  unsigned int counter;
  _Bool i;
};

extern struct module_top top;

int main() {
  assert(top.counter==0);

  top.i=1;
  set_inputs(); next_timeframe();
  assert(top.counter==1);

  top.i=1;
  set_inputs(); next_timeframe();
  assert(top.counter==2);

  top.i=0;
  set_inputs(); next_timeframe();
  assert(top.counter==2);
}

As an example, consider a Verilog module that has a signal reset as an input, which is active-low. The following C fragment drives this input to be active in the first cycle, and not active in any subsequent cycle:

  top.resetn=0;
  set_inputs(); next_timeframe();

  for(i=1; i<bound; i++) {
    top.resetn=1;
    set_inputs(); next_timeframe();
  }

Note that the value of the input must be set before calling next_timeframe(). The effect of the input values on values derived in a combinatorial way is immediately visible. The effect on clocked values becomes visible in the next time frame.

Using Nondeterminism

The examples above use particular, constant values to drive the primary inputs. In order to check the behavior of the Verilog model for more than one specific input, use nondeterminism.

Build Systems, Libraries, and Instrumentation

Build Systems and Libraries

The Problem

Similar to unit testing, the model checking approach requires the user to clearly define what parts of the program should be tested and what the behaviour of these parts is. This requirement has following reasons:

  • Despite recent advances, the size of the programs that model checkers can cope with is still restricted.
  • Typically, you want to verify your program and not the libraries or the operating that it uses (the correctness of these libraries and the OS us usually addressed separately).
  • CBMC and SATABS cannot verify binary libraries.
  • CBMC and SATABS does not provide a model for the hardware (e.g., hard disk, input/output devices) the tested program runs on. Since CBMC and SATABS are supposed to examine the behavior of the tested program for all possible inputs and outputs, it is reasonable to model input and output by means of non-deterministic choice.

Further Reading

Existing software projects usually do not come in a single source file that may simply be passed to a model checker, but is a collection of files held together by a build system. The extraction of models from such a build system using goto-cc is described here.

Integration into Build Systems with goto-cc

Existing software projects usually do not come in a single source file that may simply be passed to a model checker. They rather come in a multitude of source files in different directories and refer to external libraries and system-wide options. A build system then collects the configuration options from the system and compiles the software according to build rules.

The most prevalent build tool on Unix (-based) systems surely is the make utility. This tool uses build rules given in a Makefile that comes with the software sources. Running software verification tools on projects like these is greatly simplified by a compiler that first collects all the necessary models into a single model file. goto-cc is such a model file extractor, which can seamlessly replace gcc and cl.exe in Makefiles. The normal build system for the project may be used to build the software, but the outcome will be a model file with suitable detail for verification, as opposed to a flat executable program. Note that goto-cc comes in different variants depending on the compilation environment. These variants are described here.

Example: Building wu-ftpd

This example assumes a Unix-like machine.

  1. Download the sources of wu-ftpd from here.
  2. Unpack the sources by running tar xfz wu-ftpd-current.tar.gz
  3. Change to the source directory, by entering, e.g., cd wu-ftpd-2.6.2
  4. Configure the project for verification by running

    ./configure YACC=byacc CC=goto-cc --host=none-none-none

  5. Build the project by running make. This creates multiple model files in the src directory. Among them is a model for the main executable ftpd.
  6. Run a model-checker, e.g., CBMC, on the model file:

    `cbmc src/ftpd`
    

    CBMC automatically recognizes that the file is a goto binary.

Important Notes

More elaborate build or configuration scripts often make use of features of the compiler or the system library to detect configuration options automatically, e.g., in a configure script. Replacing gcc by goto-cc at this stage may confuse the script, or detect wrong options. For example, missing library functions do not cause goto-cc to throw an error (only to issue a warning). Because of this, configuration scripts sometimes falsely assume the availability of a system function or library.

In the case of this or similar problems, it is more advisable to configure the project using the normal routine, and replacing the compiler setting manually in the generated Makefiles, e.g., by replacing lines like CC=gcc by CC=goto-cc.

A helpful command that accomplishes this task successfully for many projects is the following:

for i in `find . -name Makefile`; do   sed -e 's/^\(\s*CC[ \t]*=\)\(.*$\)/\1goto-cc/g' -i $i done

Here are additional examples on how to use goto-cc:

A description of how to integrate goto-cc into Microsoft's Visual Studio is here.

Linking Libraries

Some software projects come with their own libraries; also, the goal may be to analyze a library by itself. For this purpose it is possible to use goto-cc to link multiple model files into a library of model files. An object file can then be linked against this model library. For this purpose, goto-cc also features a linker mode.

To enable this linker mode, create a link to the goto-cc binary by the name of goto-ld (Linux and Mac) or copy the goto-cc binary to goto-link.exe (Windows). The goto-ld tool can now be used as a seamless replacement for the ld tool present on most Unix (-based) systems and for the link tool on Windows.

The default linker may need to be replaced by goto-ld or goto-link.exe in the build script, which can be achieved in much the same way as replacing the compiler.

Integration into Visual Studio 2008 to 2012

Visual Studio version 2008 onwards comes with a new XML-based build system called MSBuild. The MSBuild system is also activated when triggering a build from the Visual Studio GUI. The project files created by the Visual Studio GUI are used as input by the MSBuild tool.

The MSBuild system can be used to generate goto-binaries from your Visual Studio project as follows:

  1. Install the goto-cl.exe and goto-link.exe binaries in some directory that is contained in the PATH environment variable.
  2. Add a configuration for the goto-cc build for your project in the configuration manager, named "goto-cc".
  3. Open the Visual Studio Command Prompt (in the Tools menu).
  4. Locate the directory that contains the project. Change into this directory using "CD".
  5. Type

    msbuild /p:CLToolExe=goto-cl.exe /p:LinkToolExe=goto-link.exe    /p:Flavor=goto-cc /p:Platform=x86
    

    The platform can be adjusted as required; the "Flavor" given should match the configuration that was created earlier.

Note that the recent versions of goto-cc also support file names with non-ASCII (Unicode) characters on Windows platforms.

Variants of goto-cc

The goto-cc utility comes in several variants, summarised in the following table.

Executable Environment Prepro
goto-cc gcc (control-flow graph only) gcc -E
goto-gcc gcc ("hybrid" executable) gcc -E
goto-armcc ARM RVDS armcc -E
goto-cl Visual Studio cl /E
goto-cw Freescale CodeWarrior mwcceppc

The primary difference between the variants is the preprocessor called. Furthermore, the language recognized varies slightly. The variants can be obtained by simply renaming the goto-cc executable. On Linux/MacOS, the variants can be obtained by creating a symbolic link.

The "hybrid" executables contain both the control-flow graph for verification purposes and the usual, executable machine code.

Architectural Settings

The behavior of a C/C++ program depends on a number of parameters that are specific to the architecture the program was compiled for. The three most important architectural parameters are:

  • The width of the various scalar types; e.g., compare the value of sizeof(long int) on various machines.
  • The width of pointers; e.g., compare the value of sizeof(int *) on various machines.
  • The endianness of the architecture.

In general, the CPROVER tools attempt to adopt the settings of the particular architecture the tool itself was compiled for. For example, when running a 64 bit binary of CBMC on Linux, the program will be processed assuming that sizeof(long int)==8.

As a consequence of these architectural parameters, you may observe different verification results for an identical program when running CBMC on different machines. In order to get consistent results, or when aiming at validating a program written for a different platform, the following command-line arguments can be passed to the CPROVER tools:

  • The word-width can be set with --16, --32, --64.
  • The endianness can be defined with --little-endian and --big-endian.

When using a goto binary, CBMC and the other tools read the configuration from the binary, i.e., the setting when running goto-cc is the one that matters; the option given to the model checker is ignored in this case.

In order to see the effect of the options --16, --32 and --64, pass the following program to CBMC:

#include <stdio.h>
#include <assert.h>

int main() {
  printf("sizeof(long int): %d\n", (int)sizeof(long int));
  printf("sizeof(int *): %d\n", (int)sizeof(int *));
  assert(0);
}

The counterexample trace contains the strings printed by the printf command.

The effects of endianness are more subtle. Try the following program with --big-endian and --little-endian:

#include <stdio.h>
#include <assert.h>

int main() {
  int i=0x01020304;
  char *p=(char *)&i;
  printf("Bytes of i: %d, %d, %d, %d\n", p[0], p[1], p[2], p[3]);
  assert(0);
}

Property Instrumentation

Properties

We have mentioned properties several times so far, but we never explained what kind of properties CBMC and SATABS can verify. We cover this topic in more detail in this section.

Both CBMC and SATABS use assertions to specify program properties. Assertions are properties of the state of the program when the program reaches a particular program location. Assertions are often written by the programmer by means of the assert macro.

In addition to the assertions written by the programmer, assertions for specific properties can also be generated automatically by CBMC and SATABS, often relieving the programmer from writing "obvious" assertions.

CBMC and SATABS come with an assertion generator called goto-instrument, which performs a conservative static analysis to determine program locations that potentially contain a bug. Due to the imprecision of the static analysis, it is important to emphasize that these generated assertions are only potential bugs, and that the Model Checker first needs to confirm that they are indeed genuine bugs.

The assertion generator can generate assertions for the verification of the following properties:

  • Buffer overflows. For each array access, check whether the upper and lower bounds are violated.
  • Pointer safety. Search for NULL-pointer dereferences or dereferences of other invalid pointers.
  • Division by zero. Check whether there is a division by zero in the program.
  • Not-a-Number. Check whether floating-point computation may result in NaNs.
  • Unitialized local. Check whether the program uses an uninitialized local variable.
  • Data race. Check whether a concurrent program accesses a shared variable at the same time in two threads.

We refrain from explaining the properties above in detail. Most of them relate to behaviors that are left undefined by the respective language semantics. For a discussion on why these behaviors are usually very undesirable, read this blog post by John Regehr.

All the properties described above are reachability properties. They are always of the form

"*Is there a path through the program such that property ... is violated?*"

The counterexamples to such properties are always program paths. Users of the Eclipse plugin can step through these counterexamples in a way that is similar to debugging programs. The installation of this plugin is explained here.

Using goto-instrument

The goto-instrument static analyzer operates on goto-binaries, which is a binary representation of control-flow graphs. The goto-binary is extracted from program source code using goto-cc, which is explained here. Given a goto-program, goto-instrument operates as follows:

  1. A goto-binary is read in.
  2. The specified static analyses are performed.
  3. Any potential bugs found are transformed into corresponding assertions, and are added into the program.
  4. A new goto-binary (with assertions) is written to disc.

As an example, we begin with small C program we call expr.c (taken from here):

int *ptr;

int main(void) {
  if (ptr)
    *ptr = 0;
  if (!ptr)
    *ptr = 1;
}

The program contains an obvious NULL-pointer dereference. We first compile the example program with goto-cc and then instrument the resulting goto-binary with pointer checks.

goto-cc expr.c -o in.gb   goto-instrument in.gb out.gb --pointer-check

We can now get a list of the assertions that have been generated as follows:

goto-instrument out.gb --show-properties

Using either CBMC or SATABS on out.gb, we can obtain a counterexample trace for the NULL-pointer dereference:

cbmc out.gb

The goto-instrument program supports the following checks:

Flag Check --------------------------—
--no-assertions ignore user assertions
--bounds-check add array bounds checks
--div-by-zero-check add division by zero checks
--pointer-check add pointer checks
--signed-overflow-check add arithmetic over- and underflow checks
--unsigned-overflow-check add arithmetic over- and underflow checks
--undefined-shift-check add range checks for shift distances
--nan-check add floating-point NaN checks
--uninitialized-check add checks for uninitialized locals (experimental)
--error-label label check that given label is unreachable

Generating function bodies

Sometimes implementations for called functions are not available in the goto program, or it is desirable to replace bodies of functions with certain predetermined stubs (for example to confirm that these functions are never called, or to indicate that these functions will never return). For this purpose goto-instrument provides the --generate-function-body option, that takes a regular expression (in ECMAScript syntax) that describes the names of the functions to generate. Note that this will only generate bodies for functions that do not already have one; If one wishes to replace the body of a function with an existing definition, the --remove-function-body option can be used to remove the body of the function prior to generating a new one.

The shape of the stub itself can be chosen with the --generate-function-body-options parameter, which can take the following values:

Option Result --------------------------—
nondet-return Do nothing and return a nondet result (this is the default)
assert-false Make the body contain an assert(false)
assume-false Make the body contain an assume(false)
assert-false-assume-false Combines assert-false and assume-false
havoc Set the contents of parameters and globals to nondet

The various combinations of assert-false and assume-false can be used to indicate that functions shouldn't be called, that they will never return or both.

Example: We have a program like this:

// error_example.c
#include <stdlib.h>

void api_error(void);
void internal_error(void);

int main(void)
{
  int arr[10] = {1,2,3,4,5, 6, 7, 8, 9, 10};
  int sum = 0;
  for(int i = 1; i < 10; ++i)
  {
    sum += arr[i];
  }
  if(sum != 55)
  {
    // we made a mistake when calculating the sum
    internal_error();
  }
  if(rand() < 0)
  {
    // we think this cannot happen
    api_error();
  }
  return 0;
}

Now, we can compile the program and detect that the error functions are indeed called by invoking these commands

goto-cc error_example.c -o error_example.goto
# Replace all functions ending with _error
# (Excluding those starting with __)
# With ones that have an assert(false) body
goto-instrument error_example.goto error_example_replaced.goto \
  --generate-function-body '(?!__).*_error' \
  --generate-function-body-options assert-false
cbmc error_example_replaced.goto

Which gets us the output

** Results: [internal_error.assertion.1] assertion false: FAILURE [api_error.assertion.1] assertion false: FAILURE

** 2 of 2 failed (2 iterations) VERIFICATION FAILED

As opposed to the verification success we would have gotten without the generation.

The havoc option takes further parameters globals and params with this syntax: havoc[,globals:<regex>][,params:<regex>] (where the square brackets indicate an optional part). The regular expressions have the same format as the those for the --generate-function-body option and indicate which globals and function parameters should be set to nondet. All regular expressions require exact matches (i.e. the regular expression a|b will match 'a' and 'b' but not 'adrian' or 'bertha').

Example: With a C program like this

struct Complex {
  double real;
  double imag;
};

struct Complex AGlobalComplex;
int do_something_with_complex(struct Complex *complex);

And the command line

goto-instrument in.goto out.goto
  --generate-function-body do_something_with_complex
  --generate-function-body-options
    'havoc,params:.*,globals:AGlobalComplex'

The goto code equivalent of the following will be generated:

int do_something_with_complex(struct Complex *complex)
{
  if(complex)
  {
    complex->real = nondet_double();
    complex->imag = nondet_double();
  }
  AGlobalComplex.real = nondet_double();
  AGlobalComplex.imag = nondet_double();
  return nondet_int();
}

A note on limitations: Because only static information is used for code generation, arrays of unknown size and pointers will not be affected by this; Which means that for code like this:

struct Node {
  int val;
  struct Node *next;
};

void do_something_with_node(struct Node *node);

Code like this will be generated:

void do_something_with_node(struct Node *node)
{
   if(node)
   {
     node->val = nondet_int();
     node->next = nondet_0();
   }
}

Note that no attempt to follow the next pointer is made. If an array of unknown (or 0) size is encountered, a diagnostic is emitted and the array is not further examined.

Some care must be taken when choosing the regular expressions for globals and functions. Names starting with __ are reserved for internal purposes; For example, replacing functions or setting global variables with the __CPROVER prefix might make analysis impossible. To avoid doing this by accident, negative lookahead can be used. For example, (?!__).* matches all names not starting with __.

The CPROVER API Reference

The following sections summarize the functions available to programs that are passed to the CPROVER tools.

Functions

__CPROVER_assume, __CPROVER_assert, assert

void __CPROVER_assume(_Bool assumption);
void __CPROVER_assert(_Bool assertion, const char *description);
void assert(_Bool assertion);

The function **__CPROVER_assume** adds an expression as a constraint to the program. If the expression evaluates to false, the execution aborts without failure. More detail on the use of assumptions is in the section on Assumptions and Assertions.

__CPROVER_r_ok, __CPROVER_w_ok

void __CPROVER_r_ok(const void *, size_t size);
void __CPROVER_w_ok(cosnt void *, size_t size);

The function **__CPROVER_r_ok** returns true if reading the piece of memory starting at the given pointer with the given size is safe. **__CPROVER_w_ok** does the same with writing.

__CPROVER_same_object, __CPROVER_POINTER_OBJECT, __CPROVER_POINTER_OFFSET, __CPROVER_DYNAMIC_OBJECT

_Bool __CPROVER_same_object(const void *, const void *);
__CPROVER_size_t __CPROVER_POINTER_OBJECT(const void *p);
__CPROVER_ssize_t __CPROVER_POINTER_OFFSET(const void *p);
_Bool __CPROVER_DYNAMIC_OBJECT(const void *p);

The function **__CPROVER_same_object** returns true if the two pointers given as arguments point to the same object. The function **__CPROVER_POINTER_OFFSET** returns the offset of the given pointer relative to the base address of the object. The function **__CPROVER_DYNAMIC_OBJECT** returns true if the pointer passed as arguments points to a dynamically allocated object.

__CPROVER_is_zero_string, __CPROVER_zero_string_length, __CPROVER_buffer_size

_Bool __CPROVER_is_zero_string(const void *);
__CPROVER_size_t __CPROVER_zero_string_length(const void *);
__CPROVER_size_t __CPROVER_buffer_size(const void *);

__CPROVER_initialize

void __CPROVER_initialize(void);

The function **__CPROVER_initialize** computes the initial state of the program. It is called prior to calling the main procedure of the program.

__CPROVER_input, __CPROVER_output

void __CPROVER_input(const char *id, ...);
void __CPROVER_output(const char *id, ...);

The functions **__CPROVER_input** and **__CPROVER_output** are used to report an input or output value. Note that they do not generate input or output values. The first argument is a string constant to distinguish multiple inputs and outputs (inputs are typically generated using nondeterminism, as described here). The string constant is followed by an arbitrary number of values of arbitrary types.

__CPROVER_cover

void __CPROVER_cover(_Bool condition);

This statement defines a custom coverage criterion, for usage with the test suite generation feature.

__CPROVER_isnan, __CPROVER_isfinite, __CPROVER_isinf, __CPROVER_isnormal, __CPROVER_sign

_Bool __CPROVER_isnan(double f);
_Bool __CPROVER_isfinite(double f);
_Bool __CPROVER_isinf(double f);
_Bool __CPROVER_isnormal(double f);
_Bool __CPROVER_sign(double f);

The function **__CPROVER_isnan** returns true if the double-precision floating-point number passed as argument is a NaN.

The function **__CPROVER_isfinite** returns true if the double-precision floating-point number passed as argument is a finite number.

This function **__CPROVER_isinf** returns true if the double-precision floating-point number passed as argument is plus or minus infinity.

The function **__CPROVER_isnormal** returns true if the double-precision floating-point number passed as argument is a normal number.

This function **__CPROVER_sign** returns true if the double-precision floating-point number passed as argument is negative.

__CPROVER_abs, __CPROVER_labs, __CPROVER_fabs, __CPROVER_fabsl, __CPROVER_fabsf

int __CPROVER_abs(int x);
long int __CPROVER_labs(long int x);
double __CPROVER_fabs(double x);
long double __CPROVER_fabsl(long double x);
float __CPROVER_fabsf(float x);

These functions return the absolute value of the given argument.

__CPROVER_array_equal, __CPROVER_array_copy, __CPROVER_array_set

_Bool __CPROVER_array_equal(const void array1[], const void array2[]);
void __CPROVER_array_copy(const void dest[], const void src[]);
void __CPROVER_array_set(const void dest[], value);

The function **__CPROVER_array_equal** returns true if the values stored in the given arrays are equal. The function **__CPROVER_array_copy** copies the contents of the array src to the array dest. The function **__CPROVER_array_set** initializes the array dest with the given value.

Uninterpreted Functions

Uninterpreted functions are documented here.

Predefined Types and Symbols

__CPROVER_bitvector

__CPROVER_bitvector [ expression ]

This type is only available in the C frontend. It is used to specify a bit vector with arbitrary but fixed size. The usual integer type modifiers signed and unsigned can be applied. The usual arithmetic promotions will be applied to operands of this type.

__CPROVER_floatbv

__CPROVER_floatbv [ expression ] [ expression ]

This type is only available in the C frontend. It is used to specify an IEEE-754 floating point number with arbitrary but fixed size. The first parameter is the total size (in bits) of the number, and the second is the size (in bits) of the mantissa, or significand (not including the hidden bit, thus for single precision this should be 23).

__CPROVER_fixedbv

 __CPROVER_fixedbv [ expression ] [ expression ]

This type is only available in the C frontend. It is used to specify a fixed-point bit vector with arbitrary but fixed size. The first parameter is the total size (in bits) of the type, and the second is the number of bits after the radix point.

__CPROVER_size_t

The type of sizeof expressions.

__CPROVER_rounding_mode

 extern int __CPROVER_rounding_mode;

This variable contains the IEEE floating-point arithmetic rounding mode.

__CPROVER_constant_infinity_uint

This is a constant that models a large unsigned integer.

__CPROVER_integer, __CPROVER_rational

**__CPROVER_integer** is an unbounded, signed integer type. **__CPROVER_rational** is an unbounded, signed rational number type.

__CPROVER_memory

extern unsigned char __CPROVER_memory[];

This array models the contents of integer-addressed memory.

__CPROVER::unsignedbv<N> (C++ only)

This type is the equivalent of unsigned __CPROVER_bitvector[N] in the C++ front-end.

__CPROVER::signedbv<N> (C++ only)

This type is the equivalent of signed __CPROVER_bitvector[N] in the C++ front-end.

__CPROVER::fixedbv<N> (C++ only)

This type is the equivalent of **__CPROVER_fixedbv[N,m]** in the C++ front-end.

Concurrency

Asynchronous threads are created by preceding an instruction with a label with the prefix **__CPROVER_ASYNC_**.

goto-cc: Extracting Models from the Linux Kernel

The Linux kernel code consists of more than 11 million lines of low-level C and is frequently used to evaluate static analysis techniques. In the following, we show how to extract models from Linux 2.6.39.

  1. First of all, you will need to make sure you have around 100 GB of free disc space available.
  2. Download the Kernel sources at http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.39.tar.bz2.
  3. Now do

    bunzip2 linux-2.6.39.tar.bz2\ tar xvf linux-2.6.39.tar\ cd linux-2.6.39

  4. Now ensure that you can actually compile a kernel by doing

    make defconfig\ make

    These steps need to succeed before you can try to extract models from the kernel.

  5. Now compile gcc-wrap.c and put the resulting binary into a directory that is in your PATH variable:

    lwp-download http://www.cprover.org/cprover-manual/gcc-wrap.c\ gcc gcc-wrap.c -o gcc-wrap\ cp gcc-wrap ~/bin/\

    This assumes that the directory ~/bin exists and is in your PATH variable.

  6. Now change the variable CC in the kernel Makefile as follows:
    CC = ~/bin/gcc-wrap
    
  7. Now do

    make clean
    make
    

    This will re-compile the kernel, but this time retaining the preprocessed source files.

  8. You can now compile the preprocessed source files with goto-cc as follows:

    find ./ -name .tmp_*.i > source-file-list
      for a in `cat source-file-list` ; do
        goto-cc -c $a -o $a.gb
    done
    

    Note that it is important that the word-size of the kernel configuration matches that of goto-cc. Otherwise, compile-time assertions will fail, generating the error message "bit field size is negative". For a kernel configured for a 64-bit word-width, pass the option –64 to goto-cc.

The resulting .gb files can be passed to any of the CPROVER tools.

goto-cc: Extracting Models from the Apache HTTPD

The Apache HTTPD is still the most frequently used web server. Together with the relevant libraries, it consists of around 0.4 million lines of C code. In the following, we show how to extract models from Apache HTTPD 2.4.2.

  1. First of all, we download the sources of Apache HTTPD and two supporting libraries and uncompress them:
    lwp-download http://www.mirrorservice.org/sites/ftp.apache.org/apr/apr-1.4.6.tar.bz2
    lwp-download http://www.mirrorservice.org/sites/ftp.apache.org/apr/apr-util-1.4.1.tar.bz2
    lwp-download http://mirror.catn.com/pub/apache/httpd/httpd-2.4.2.tar.bz2
    bunzip2 < apr-1.4.6.tar.bz2 | tar x
    bunzip2 < apr-util-1.4.1.tar.bz2 | tar x
    bunzip2 < httpd-2.4.2.tar.bz2 | tar x
    
  2. Now compile gcc-wrap.c and put the resulting binary into a directory that is in your PATH variable:

    lwp-download http://www.cprover.org/cprover-manual/gcc-wrap.c
    gcc gcc-wrap.c -o gcc-wrap
    cp gcc-wrap ~/bin/
    

    This assumes that the directory ~/bin exists and is in your PATH variable.

  3. We now build the sources with gcc:
    (cd apr-1.4.6; ./configure; make CC=gcc-wrap)
    (cd apr-util-1.4.1; ./configure --with-apr=../apr-1.4.6 ; make CC=gcc-wrap)
    (cd httpd-2.4.2; ./configure --with-apr=../apr-1.4.6 --with-apr-util=../apr-util-1.4.1 ; make CC=gcc-wrap)
    
  4. You can now compile the preprocessed source files with goto-cc as follows:
    find ./ -name *.i > source-file-list
      for a in `cat source-file-list` ; do
      goto-cc -c $a -o $a.gb
    done
    

The resulting .gb files can be passed to any of the CPROVER tools.