"Analyzing Runtime Complexity via Innermost Runtime Complexity" - Experiments

While it is difficult to analyze the runtime complexity of term rewrite systems (TRSs) automatically, there exist many powerful techniques to infer upper bounds on the innermost runtime complexity of TRSs (i.e., on the lengths of rewrite sequences that follow an innermost evaluation strategy). In the paper "Analyzing Runtime Complexity via Innermost Runtime Complexity", we present a sufficient criterion which ensures that the runtime complexity of a TRS coincides with its innermost runtime complexity. Since this criterion can easily be checked automatically, one can now use all techniques and tools for innermost runtime complexity in order to analyze (full) runtime complexity as well. By extensive experiments with an implementation of our results in the tool AProVE, we show that this improves the state of the art of automated complexity analysis significantly.
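To see why the restriction to innermost rewriting matters, consider the following small TRS, written in the TPDB format. This is a standard illustrative example in the style of the literature (it is not taken from our paper or from the TPDB); here 0, s, and c are constructors, while f and d are defined symbols:

    (VAR x)
    (RULES
      f(0) -> 0
      f(s(x)) -> d(f(x))
      d(x) -> c(x, x)
    )

Starting from the basic term f(s^n(0)), innermost rewriting must evaluate the argument of each d-symbol before the d-rule can be applied, so every f- and d-redex is reduced exactly once and the innermost runtime complexity is linear. Under full rewriting, however, the d-rule may duplicate unevaluated f-redexes, which yields derivations of exponential length. Hence runtime complexity and innermost runtime complexity can differ drastically in general, and a criterion guaranteeing their coincidence is needed before innermost techniques can soundly be used for full runtime complexity.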


Implementation in AProVE

We integrated our new technique into the termination and complexity analyzer AProVE, which is one of the most powerful termination and complexity tools for TRSs. It can be accessed via a custom web interface. Please note that the server running the web interface is considerably slower than the computer used for the benchmarks.


Settings

We analyzed each example in five different settings. In the first two settings, each example was analyzed with our own tool AProVE and with TcT, respectively, since AProVE and TcT were the two most powerful tools in the category "Runtime Complexity - Full Rewriting" at the Termination Competition 2016. While both AProVE and TcT can also infer lower bounds, in the following experiments both tools were configured to prove upper bounds only. As suggested by the authors of TcT, we used version 3.2.0 of their tool. In the third setting, each example was first preprocessed with AProVE's implementation of the new technique presented in our paper "Analyzing Runtime Complexity via Innermost Runtime Complexity" and then analyzed with TcT. We refer to this setting as "TcT preproc". In the fourth setting, we consider the "union" of the results of AProVE and "TcT preproc" in order to estimate how the state of the art has improved by the technique from our paper. We refer to this setting as "AProVE & TcT". Finally, in the fifth setting we analyzed each example using AProVE's techniques for proving lower bounds. We refer to this setting as "AProVE lower".

All tools were run on a 2.33 GHz Intel Xeon 5140 machine running Debian Linux. For AProVE and TcT, we used a timeout of 60 seconds per example. For "TcT preproc", we used a timeout of 60 seconds for the preprocessing and an additional timeout of 60 seconds for TcT. For "AProVE lower", we used a timeout of 300 seconds. This larger timeout is unproblematic since "AProVE lower" does not compete with the other settings, which all infer upper bounds; instead, its lower bounds serve to assess how tight the obtained upper bounds are.
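For illustration, such per-tool timeouts can be enforced with a small harness along the lines of the following Python sketch. The command lines are placeholders of our own (the actual invocations of AProVE and TcT depend on the local installation and are not part of this page):

    import subprocess

    # Placeholder command lines -- illustrative assumptions only,
    # not the actual invocations of AProVE or TcT.
    APROVE_CMD = ["java", "-jar", "aprove.jar"]
    TCT_CMD = ["tct"]

    def run_tool(cmd, trs_file, timeout_secs):
        """Run one tool on one TRS file with a hard timeout
        (60 s for AProVE and TcT, 60 s + 60 s for "TcT preproc",
        300 s for "AProVE lower")."""
        try:
            proc = subprocess.run(cmd + [trs_file], capture_output=True,
                                  text=True, timeout=timeout_secs)
            return proc.stdout
        except subprocess.TimeoutExpired:
            return "TIMEOUT"

In the "TcT preproc" setting, such a harness would make two consecutive calls (the preprocessing step and then TcT), each with its own 60-second budget.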


Examples

We ran the tools on 899 TRSs from the category "Runtime Complexity - Full Rewriting" of the Termination Problem Data Base (TPDB 10.4), the collection of examples used at the Termination Competition 2016. From this category, we omitted 60 TRSs that contain rules with extra variables on their right-hand sides, since such rules are not allowed in classical term rewriting. In all settings except "AProVE lower", we also omitted the 235 examples where AProVE proves a super-polynomial lower bound, since AProVE and TcT only support polynomial upper bounds so far.


Experiments & Discussion

In the following tables, rc stands for the runtime complexity function, which maps a natural number n to the maximal length of a rewrite sequence starting with a basic term of size at most n. (A basic term consists of a defined symbol applied to constructor terms, i.e., it corresponds to a function call with data objects as arguments.)
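In the standard notation of the rewriting literature (this is our formulation; the paper may present it slightly differently), rc can be written as follows, where dh denotes the derivation height of a term:

    \mathrm{rc}(n) \;=\; \max\,\{\, \mathrm{dh}(t, \to_{\mathcal{R}}) \mid t \text{ basic},\; |t| \le n \,\}
    \qquad\text{where}\qquad
    \mathrm{dh}(t, \to) \;=\; \max\,\{\, k \mid \exists\, t'.\; t \to^{k} t' \,\}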

Table: TcT vs. AProVE

The above table compares TcT with AProVE. The entries below the diagonal correspond to examples where AProVE's results are better than TcT's (e.g., there are 65 examples where TcT could not prove any polynomial upper bound on rc whereas AProVE now infers a linear bound). Similarly, the entries on the diagonal denote examples where both tools obtain the same result, and the entries above the diagonal are examples where TcT is better than AProVE. Thus, AProVE yields better results in 118 cases and TcT yields better results in 43 cases.
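To make the reading of such a matrix precise, the following sketch tallies the three regions. It assumes that each tool's result per example is recorded as the degree of the proved polynomial bound (0 for constant, 1 for linear, ...) or None if no polynomial bound was proved; this encoding is our assumption, not the actual output format of AProVE or TcT:

    def compare(tct_results, aprove_results):
        """Count the entries below, on, and above the diagonal."""
        below = on = above = 0
        for example, tct_deg in tct_results.items():
            aprove_deg = aprove_results[example]
            # No proved bound (None) counts as worse than any degree.
            t = float("inf") if tct_deg is None else tct_deg
            a = float("inf") if aprove_deg is None else aprove_deg
            if a < t:
                below += 1   # AProVE proved a smaller (better) degree
            elif a > t:
                above += 1   # TcT proved a smaller (better) degree
            else:
                on += 1      # both tools obtained the same result
        return below, on, above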

Table: TcT preproc vs. AProVE

The previous table showed that AProVE is now the most powerful tool for upper bounds on rc. However, this is due only to the results of our paper "Analyzing Runtime Complexity via Innermost Runtime Complexity". To demonstrate this, we show that TcT outperforms AProVE again once it also benefits from the new technique. To this end, the above table compares AProVE with "TcT preproc", i.e., with the results obtained by first applying AProVE's implementation of the new technique and then analyzing the resulting TRS with TcT. "TcT preproc" proves upper bounds in 299 cases, while AProVE succeeds in only 270 cases. Hence, combining TcT with the new technique of our paper yields the most powerful tool for upper bounds on rc. Still, the results of the tools are orthogonal: there are 20 examples where AProVE obtains better bounds and 53 examples where "TcT preproc" is better.

Table: comparing different settings for upper bounds

This table compares the results of all the different settings used to prove upper bounds with AProVE and TcT. Moreover, the "union" of AProVE and "TcT preproc" is presented separately ("AProVE & TcT"): here, we used the best bound obtained by AProVE or "TcT preproc" for each example. The entries in the row "O(n^k) or less" count the examples where the corresponding tool proved an upper bound of O(n^k) or an even smaller one (so the entry 229 in the second row, second column means that AProVE proved constant or linear upper bounds in 229 cases). At the Termination Competition 2015, TcT was the only tool for upper bounds on rc, and hence it represents the former state of the art for this task. Thus, the setting "AProVE & TcT" shows how the state of the art has improved by the technique presented in our paper. Compared to TcT, "AProVE & TcT" proves 99 additional upper bounds.
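As a sketch of how the "union" and the cumulative "O(n^k) or less" rows are computed (using the same hypothetical per-example degree encoding as in the sketch above):

    def union_best(aprove_results, tct_preproc_results):
        """'AProVE & TcT': keep the best (smallest) proved degree per example."""
        best = {}
        for example in aprove_results:
            degs = [d for d in (aprove_results[example],
                                tct_preproc_results[example]) if d is not None]
            best[example] = min(degs) if degs else None
        return best

    def count_at_most(results, k):
        """Entry of the row 'O(n^k) or less': number of examples with a
        proved upper bound of degree at most k."""
        return sum(1 for d in results.values() if d is not None and d <= k)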

Table: AProVE & TcT vs. AProVE lower

Finally, the above table compares "AProVE & TcT" with the lower bounds proved by AProVE. It shows that the upper bounds are tight in all but 35 cases, and in 33 of these cases the lower and upper bounds differ by just a factor of n. This comparison indicates that the bounds found by AProVE and TcT are of high quality.

Click here for more detailed results of all benchmarks. Clicking on the runtimes in the detailed table gives access to the output of the respective prover. Clicking on the name of an example in the first column of the detailed table opens the web interface of AProVE in order to run it on this particular example. Since the runtimes of the individual ("competing") techniques within AProVE may vary in each call, the results may also vary when AProVE is called repeatedly on the same example.