7. Other Input Formats

Why3 can be used to verify programs written in languages other than WhyML. Currently, Why3 supports tiny subsets of C and Python, respectively coined micro-C and micro-Python below. These were designed for teaching purposes. They come with their own specification languages, written in special comments. These input formats are described below.

Why3 also supports the format MLCFG, which is an extension of WhyML allowing to define function bodies in an unstructured style, with goto statements.

Any Why3 tool (why3 prove, why3 ide, etc.) can be passed a file with a suffix .c, .py or .mlcfg, which triggers the corresponding input format.

7.1. micro-C

Micro-C is a valid subset of ANSI C. Hence, micro-C files can be passed to a C compiler.

7.1.1. Syntax of micro-C

Logical annotations are inserted in special comments starting with //@ or /*@. In the following grammar, we only use the former kind, for simplicity, but both kinds are allowed.

file      ::=  decl*
decl      ::=  c_include | c_function | logic_declaration
c_include ::=  "#include" "<" file-name ">"

Directives #include are ignored during the translation to Why3. They are allowed anyway, such that a C source code using functions such as printf (see below) is accepted by a C compiler.

Function definition

c_function  ::=  return_type identifier "(" params? ")" spec* block
return_type ::=  "void" | "int"
params      ::=  param ("," param)*
param       ::=  "int" identifier | "int" identifier "[]"

Function specification

spec ::=  "//@" "requires" term ";"
          | "//@" "ensures"  term ";"
          | "//@" "variant"  term ("," term)* ";"

C expression

expr ::=  integer-literal | string-literal
          | identifier | identifier ( "++" | "--" ) | ( "++" | "--" ) identifier
          | identifier "[" expr "]"
          | identifier "[" expr "]" ( "++" | "--")
          | ( "++" | "--") identifier "[" expr "]"
          | "-" expr | "!" expr
          | expr ( "+" | "-" | "*" | "/" | "%" | "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) expr
          | identifier "(" (expr ("," expr)*)? ")"
          | "scanf" "(" '"%d"' "," "&" identifier ")"
          | "(" expr ")"

C statement

stmt       ::=  ";"
                | "return" expr ";"
                | "int" identifier ";"
                | "int" identifier "[" expr "]" ";"
                | "break" ";"
                | "if" "(" expr ")" stmt
                | "if" "(" expr ")" stmt "else" stmt
                | "while" "(" expr ")" loop_body
                | "for" "(" expr_stmt ";" expr ";" expr_stmt ")" loop_body
                | expr_stmt ";"
                | block
                | "//@" "label" identifier ";"
                | "//@" ( "assert" | "assume" | "check" ) term ";"
block      ::=  "{" stmt* "}"
expr_stmt  ::=  "int" identifier "=" expr
                | identifier assignop expr
                | identifier "[" expr "]" assignop expr
                | expr
assignop   ::=  "=" | "+=" | "-=" | "*=" | "/="
loop_body  ::=  loop_annot* stmt
                | "{" loop_annot* stmt* "}"
loop_annot ::=  "//@" "invariant" term ";"
                | "//@" "variant" term ("," term)* ";"

Note that the syntax for loop bodies allows the loop annotations to be placed either before the block or right at the beginning of the block.

Logic declarations

logic_declaration ::=  "//@" "function" "int" identifier "(" params ")" ";"
                       | "//@" "function" "int" identifier "(" params ")" "=" term ";"
                       | "//@" "predicate" identifier "(" params ")" ";"
                       | "//@" "predicate" identifier "(" params ")" "=" term ";"
                       | "//@" "axiom" identifier ":" term ";"
                       | "//@" "lemma" identifier ":" term ";"
                       | "//@" "goal"  identifier ":" term ";"

Logic functions are limited to a return type int.

Logical term

term   ::=  identifier
            | integer-literal
            | "true"
            | "false"
            | "(" term ")"
            | term "[" term "]"
            | term "[" term "<-" term "]"
            | "!" term
            | "old" "(" term ")"
            | "at" "(" term "," identifier ")"
            | "-" term
            | term ( "->" | "<->" | "||" | "&&" ) term
            | term ( "==" | "!=" | "<" | "<=" | ">" | ">=" ) term
            | term ( "+" | "-" | "*" | "/" | "% ) term
            | "if" term "then" term "else term
            | "let" identifier "=" term "in" term
            | ( "forall" | "exists" ) binder ("," binder)* "." term
            | identifier "(" (term ("," term)*)? ")"
binder ::=  identifier
            | identifier "[]"

7.1.2. Built-in functions and predicates

C code

  • scanf is limited to the syntax scanf("%d", &x).

  • printf is limited to printf(string-literal, expr1, ..., exprn). The string literal should contain exactly n occurrences of %d (not checked by Why3).

  • rand() returns a pseudo-random integer in the range 0 to RAND_MAX inclusive.

Logic

  • int length(int a[]) returns the length of array a.

  • int occurrence(int v, int a[]) returns the number of occurrences of the value v in array a.

7.2. micro-Python

Micro-Python is a valid subset of Python 3. Hence, micro-Python files can be passed to a Python interpreter.

7.2.1. Syntax of micro-Python

Notation: In the grammar of micro-Python given below, special symbols NEWLINE, INDENT, and DEDENT mark an end of line, the beginning of a new indentation block, and its end, respectively.

Logical annotations are inserted in special comments starting with #@.

file      ::=  decl*
decl      ::=  py_import | py_constant | py_function | stmt | logic_declaration
py_import ::=  "from" identifier "import" identifier ("," identifier)* NEWLINE

Directives import are ignored during the translation to Why3. They are allowed anyway, such that a Python source code using functions such as randint is accepted by a Python interpreter (see below).

Constant definition

py_constant ::=  "#@" "constant" NEWLINE identifier "=" expr NEWLINE

Function definition

py_function ::=  logic_def? "def" identifier "(" params? ")" return_type? ":" NEWLINE INDENT spec* stmt* DEDENT
params      ::=  param ("," param)*
param       ::=  identifier (":" py_type)?
return_type ::=  "->" py_type
py_type     ::=  identifier ("[" py_type ("," py_type)* "]")?
                 | "'" identifier

Function specification

logic_def ::=  "#@" "function" NEWLINE
spec      ::=  "#@" "requires" term NEWLINE
               | "#@" "ensures"  term NEWLINE
               | "#@" "variant"  term ("," term)* NEWLINE

Python expression

expr ::=  "None" | "True" | "False" | integer-literal | string-literal
          | identifier
          | identifier "[" expr "]"
          | "-" expr | "not" expr
          | expr ( "+" | "-" | "*" | "//" | "%" | "==" | "!=" | "<" | "<=" | ">" | ">=" | "and" | "or" ) expr
          | identifier "(" (expr ("," expr)*)? ")"
          | expr ("," expr)+
          | "[" (expr ("," expr)*)? "]"
          | expr "if" expr "else" expr
          | "(" expr ")"

Python statement

stmt        ::=  simple_stmt NEWLINE
                 | "if" expr ":" suite else_branch
                 | "while" expr ":" loop_body
                 | "for" identifier "in" expr ":" loop_body
else_branch ::=  /* nothing */
                 | "else:" suite
                 | "elif" expr ":" suite else_branch
suite       ::=  simple_stmt NEWLINE
                 | NEWLINE INDENT stmt stmt* DEDENT
simple_stmt ::=  expr
                 | "return" expr
                 | identifier "=" expr
                 | identifier "[" expr "]" "=" expr
                 | "break"
                 | "#@" "label" identifier
                 | "#@" ( "assert" | "assume" | "check" ) term
loop_body   ::=  simple_stmt NEWLINE
                 | NEWLINE INDENT loop_annot* stmt stmt* DEDENT
loop_annot  ::=  "#@" "invariant" term NEWLINE
                 | "#@" "variant" term ("," term)* NEWLINE

Logic declaration

logic_declaration ::=  "#@" "function" identifier "(" params ")" return_type? ("variant" "{" term "}")? ("=" term)? NEWLINE
                       | "#@" "predicate" identifier "(" params ")" ("=" term)? NEWLINE
                       | "#@" "axiom" identifier ":" term NEWLINE
                       | "#@" "lemma" identifier ":" term NEWLINE

Note that logic functions and predicates cannot be given definitions. Yet, they can be axiomatized, using toplevel assume statements.

Logical term

term ::=  identifier
          | integer-literal
          | "None"
          | "True"
          | "False"
          | "(" term ")"
          | term "[" term "]"
          | term "[" term "<-" term "]"
          | "not" term
          | "old" "(" term ")"
          | "at" "(" term "," identifier ")"
          | "-" term
          | term ( "->" | "<->" | "or" | "and" | "by" | "so" ) term
          | term ( "==" | "!=" | "<" | "<=" | ">" | ">=" ) term
          | term ( "+" | "-" | "*" | "//" | "%" ) term
          | "if" term "then" term "else term
          | "let" identifier "=" term "in" term
          | ( "forall" | "exists" ) param ("," param)* "." term
          | identifier "(" (term ("," term)*)? ")"

7.2.2. Built-in functions and predicates

Python code

  • built-in function pow over integers

  • len(l) returns the length of list l.

  • int(input()) reads an integer from standard input.

  • range(l, u) returns the list of integers from l inclusive to u exclusive. In particular, for x in range(l, u): is supported.

  • randint(l, u) returns a pseudo-random integer in the range l to u inclusive.

Logic

  • len(l) returns the length of list l.

  • occurrence(v, l) returns the number of occurrences of the value v in list l.

7.2.3. Limitations

Python lists are modeled as arrays, whose size cannot be modified.

7.3. MLCFG

The MLCFG language is an experimental extension of the regular WhyML language, in which the body of program functions can be coded using labelled blocks and goto statements. MLCFG can be used to encode algorithms which are presented in an unstructured fashion. It can also be used as a target for encoding unstructured programming languages in Why3, for example assembly code.

7.3.1. Syntax of the MLCFG language

The MLCFG syntax is an extension of the regular WhyML syntax. Every WhyML declaration is allowed, plus an additional declaration of program function of the following form, introduced by keywords let cfg:

let cfg f (x1: t1) ... (xn: tn): t
  requires { Pre }
  ensures  { Post }
=
 var y1: u1;
 ...
 var yk: uk;
 { instructions; terminator }
 L1 { instructions1; terminator1 }
 ...
 Lj { instructionsj; terminatorj }

It defines a program function f, with the usual syntax for its contract. The difference is the body, which is made of a sequence of declarations of mutable variables with their types, an initial block, composed of a zero or more instructions followed by a terminator, and a sequence of other blocks, each denoted by a label (\(L_1 \ldots L_j\) above). The instructions are semi-colon separated sequences of regular WhyML expressions of type unit, excluding return or absurd expressions, or code invariants:

  • a code invariant: invariant I { t } where I is a name and t a predicate. It is similar to an assert expression, meaning that t must hold when execution reaches this statement. Additionally, it acts as a cut in the generation of VC, similarly to a loop invariant. See example below.

Each block is ended by one of the following terminators:

  • a goto statement: goto L where L is one of the labels of the other blocks. It instructs to continue execution at the given block.

  • a switch statement, of the form

    switch (e)
    | pat1 -> terminator1
    ...
    | patk -> terminatork
    end
  • a return statement: return expr

  • an absurd statement: indicating that this block should be unreachable.

The extension of syntax is described by the following rules.

file        ::=  module*
module      ::=  "module" ident decl* "end"
decl        ::=  "let" "cfg" cfg_fundef
                 | "let" "rec" "cfg" cfg_fundef ("with" cfg_fundef)*
                 | "scope" ident decl* "end"
cfg_fundef  ::=  ident binder+ : type spec "=" vardecl* "{" block "}" labelblock*
vardecl     ::=  "var" ident* ":" type ";" | "ghost" "var" ident* ":" type ";"
block       ::=  (instruction ";")* terminator
labelblock  ::=  ident "{" block "}"
instruction ::=  expr
                 | "invariant" ident "{" term "}"
terminator  ::= 
                 | "return" expr
                 | "absurd"
                 | "goto" ident
                 | "switch" "(" expr ")" switch_case* "end"
switch_case ::=  "|" pattern "->" terminator

7.3.2. An example

The following example is directly inspired from the documentation of the ANSI C Specification Language (See [BFM+18], Section 2.4.2 Loop invariants, Example 2.27). It is itself inspired from the first example of Knuth’s MIX language, for which formal proofs were first investigated by J.-C. Filliâtre in 2007 ([Fil07]), and also revisited by T.-M.-T. Nguyen in her PhD thesis in 2012 ([Ngu12], Section 9.5 Translation from a CFG to Why, page 115).

This example aims at computing the maximum value of an array of integers. Its code in C is given below.

/*@ requires n >= 0 && \valid(a,0,n);
  @ ensures \forall integer j ; 0 <= j < n ==> \result >= a[j]);
  @*/
int max_array(int a[], int n) {
  int m, i = 0;
  goto L;
  do {
    if (a[i] > m) { L: m = a[i]; }
    /*@ invariant
      @   0 <= i < n && \forall integer j ; 0 <= j <= i ==> m >= a[j]);
      @*/
    i++;
  }
  while (i < n);
  return m;
}

The code can be viewed as a control-flow graph as shown in Fig. 7.1.

digraph G {
	graph [bgcolor=lightgray; rankdir="TB"];
	node [margin=0.05,shape=ellipse,style=filled,fillcolor="cyan"];
	"entry"     [pos="2,3!"];
	"L"         [pos="2,2!"];
	"invariant" [pos="2,1!"];
	"cond"      [pos="2,0!"];
        "do"        [pos="0,1!"];
        "exit"      [pos="4,0!"];

        "entry" -> "L"        [label=" i <- 0"];
	"L" -> "invariant"    [label=" m <- a[i]"];
	"invariant" -> "cond" [label=" i <- i+1"];
        "cond" -> "exit"      [label=" i >= n"];
	"cond" -> "do"        [label=" i < n"];
        "do" -> "L"           [label=" a[i] > m"; headport=e];
        "do" -> "invariant"   [label=" a[i] <= m"];
}

Fig. 7.1 Control-flow graph of the max_array function.

Below is a version of this code in the MLCFG language, where label L corresponds to node L, label L1 to node invariant, label L2 to node do.

let cfg max_array (a:array int) : (max: int, ghost ind:int)
  requires { length a > 0 }
  ensures  { 0 <= ind < length a }
  ensures  { forall j. 0 <= j < length a -> a[ind] >= a[j] }
=
var i m: int;
ghost var ind: int;
{
  i <- 0;
  goto L
}
L {
  m <- a[i];
  ind <- i;
  goto L1
}
L1 {
  invariant i_bounds   { 0 <= i < length a };
  invariant ind_bounds { 0 <= ind < length a };
  invariant m_and_ind  { m = a[ind] };
  invariant m_is_max   { forall j. 0 <= j <= i -> m >= a[j] };
                         (* yes, j <= i, not j < i ! *)
  i <- i + 1;
  switch (i < length a)
  | True  -> goto L2
  | False -> return (m, ind)
  end
}
L2 {
  switch (a[i] > m)
  | True  -> goto L
  | False -> goto L1
  end
}

The consecutive invariants act as a single cut in the generation of VCs.

7.3.3. Error messages

The translation from the MLCFG language to regular WhyML code may raise the following errors.

  • “cycle without invariant”: in order to perform the translation, any cycle on the control-flow graph must contain at least one invariant clause. It corresponds to the idea that any loop must contain a loop invariant.

  • “cycle without invariant (starting from I)”: same error as above, except that the cycle was not reachable from the start of the function body, but from the other invariant clause named \(I\).

  • “label L not found for goto”: there is a goto instruction to a non-existent label.

  • “unreachable code after goto”: any code occurring after a goto statement is unreachable and is not allowed.

  • “unsupported: trailing code after switch”: see limitations below.

7.3.4. Current limitations

  • There is no way to prove termination.

  • New keywords cfg, goto, switch, and var cannot be used as regular identifiers anymore.

  • Trailing code after switch is not supported. In principle, it should be possible to have a switch with type unit and to transfer the execution to the instructions after the switch for branches not containing goto. This is not yet supported. A workaround is to place the trailing instructions in another block and pose an extra goto to this block in all the branches that do not end with a goto.

  • Conditional statements if e then i1 else i2 are not yet supported, but can be simulated with switch (e) | True -> i1 | False -> i2 end.

7.3.5. Alternative translation scheme: Stackify

An alternative translation scheme from MLCFG to regular WhyML can be triggered by putting the attribute [@cfg:stackify] on a function. This method attempts to recover a more structured program body, reconstructing loops when possible.

7.3.6. Subregion analysis

Additional invariants on the generated WhyML code can be inferred by putting the attribute [@cfg:subregion_analysis] on a function. These invariants are derived by a static analysis of the subregions that are never modified in a loop.

7.4. S-expressions

This format is intended to be used only as an intermediate language format. Why3 accepts input files in the format of a big S-expression, mirroring the internal structure of parse trees, as they are defined in the module Ptree of the OCaml API. The S-expression in question is supposed to be generated by the API itself, that is the function Ptree.sexp_of_mlw_file. It is in fact exactly the format that is also printed using why3 pp --output with --output=sexp.

Notice that this input format is available only when Why3 is compiled with S-expressions support provided by the ppx_sexp_conv OCaml library.

Concerning the use of Why3 as an intermediate language, via the Ptree API, the use of S-expressions instead of the regular WhyML syntax is recommended since the printing and re-parsing of parse trees is automatically generated, hence the result is guaranteed to faithfully mirror the internal Ptree representation.