"Complexity Analysis for Java with AProVE" - Experiments

While AProVE is one of the most powerful tools for termination analysis of Java since many years, we now extend our approach in order to analyze the complexity of Java programs as well. Based on a symbolic execution of the program, we develop a novel transformation of (possibly heap-manipulating) Java programs to integer transition systems (ITSs). This allows us to use existing complexity analyzers for ITSs to infer runtime bounds for Java programs. We demonstrate the power of our implementation in AProVE by an extensive experimental evaluation.


References

Here is the full version of our paper Complexity Analysis for Java with AProVE (including all proofs).


Implementation in AProVE

We integrated our new technique in the termination and complexity analyzer AProVE, which is one of the most powerful termination and complexity tools for various kinds of programs and rewrite systems.

You can download a pre-configured VirtualBox image to test our implementation. To use it, please proceed as follows:

The output of AProVE has the form WORST_CASE(?,O(n^2)), for example. The first argument of WORST_CASE is an asymptotic lower bound on the worst-case runtime and the second argument is an asymptotic upper bound on the worst-case runtime. Here, "n" stands for the sum of the sizes of all object arguments and of the absolute values of all integer arguments. Currently, AProVE cannot yet infer lower runtime bounds for Java and thus, the first argument of WORST_CASE is always "?".


Custom Web Interface

Our implementation can also be accessed via a custom web interface. To access it, please select an example from our detailed evaluation page. Here, AProVE again displays an asymptotic lower and upper bound on the worst-case runtime. Please note that the server running the web interface is considerably slower than the computer used for the benchmarks.


Settings

In our experiments, we compare AProVE with the tool COSTA, which is, to the best of our knowledge, the only other available tool which analyzes the complexity of (possibly heap-manipulating) Java programs fully automatically.

We used a timeout of 60 seconds per example on an Intel Xeon with 4 cores at 2.33 GHz each and 16 GB of RAM.


Examples

We first evaluated AProVE on 212 non-recursive examples of the category "Java Bytecode" of the Termination Problem Data Base (TPDB), the collection of examples used at the annual Termination Competition. We omitted 80 examples from the folders BSOG_FoVeOOS_11 and Julia_11_iterative, as they almost exclusively contain non-terminating examples. Furthermore, we omitted the following eight examples, where the AProVE version from the Termination Competition 2016 can prove non-termination:

Unfortunately, comparing AProVE with COSTA on these examples is not possible. The reason is that the TPDB examples simulate numeric input arguments by the lengths of the strings in the argument args of main(String[] args), which is the entry point of the program. (So AProVE computes a complexity bound for the abstract state that corresponds to the start of the main method.) As COSTA abstracts arrays to their length, it loses all information about the elements of args and hence fails for almost all TPDB examples.

Hence, to compare with COSTA, we adapted these 212 non-recursive examples from the TPDB such that they do not rely on main's argument to simulate numeric inputs anymore. Instead of main, now a new entry point method with a suitable number of integer arguments is analyzed directly. Here, we excluded Julia_10_Iterative/RSA, as its source code is missing and hence an adaption was not possible (i.e., this results in a collection of 211 examples). Note that in many cases this adaption is not equivalent, as main's argument array can be arbitrarily long (and hence it can be used to simulate arbitrarily many numeric inputs), whereas the arity of the new entry point method is fixed.

Note that, in contrast to the quadratic upper bound presented in our paper, AProVE just proves an upper bound in O(n4) for SortCount from the TPDB. The reason is that in our paper, we directly analyze the method sort. In contrast, SortCount from the TPDB first constructs a list from the elements of main's only argument args and sorts it afterwards. While the runtime complexity of the TPDB version of SortCount is also quadratic, AProVE just infers a quadratic (instead of a linear) bound for the size of the constructed list and hence obtains a less precise runtime bound in O(n4) for the method main.

To run AProVE on the leading example from our paper (i.e., to analyze sort directly), please run leading sort 60 0 (to disable summaries) or leading sort 60 1 (to enable summaries) in our pre-configured VirtualBox image. Here, AProVE indeed infers a quadratic bound.


Experiments & Discussion

In the following tables, "n" stands for the sum of the sizes of all object arguments and of the absolute values of all integer arguments. In the case of AProVE, the size of an object o is the number of objects reachable from o plus the absolute values of all integers reachable from o. In the case of COSTA, the size of o is the length of the longest path starting in o.

O(1)O(n)O(n2)O(n3)O(n>3)?
31102151558
Table 1: AProVE's results, TPDB

Table 1 shows AProVE's results on the TPDB examples. In 154 of 212 cases (73%), AProVE finds a runtime bound within 7.5 s on average.

Click here for the detailed results of AProVE on the examples from the TPDB.

O(1)O(log n)O(n)O(n · log n)O(n2)O(n3)O(n>3)?
AProVE2801020132462
COSTA104453501143
Table 2: Adapted TPDB

Table 2 shows the results of COSTA and AProVE on the adapted TPDB examples. AProVE succeeds in 149 cases, whereas COSTA proves an upper bound in 68 cases and infers a smaller bound than AProVE in 4 cases.

Classical heap-manipulating examples where COSTA fails, but AProVE succeeds (even if we use COSTA's backend PUBS instead of CoFloCo and KoAT) include:

Click here for the detailed results of our comparison of COSTA and AProVE. Note that our web interface can only analyze programs with a main method as entry point. As the examples that we used to compare with COSTA do not have such an entry point, please use our pre-configured VirtualBox image to analyze them with AProVE.