"Complexity Analysis for Term Rewriting by Integer Transition Systems" - Experiments and Proofs

We present a new method to infer upper bounds on the innermost runtime complexity of term rewrite systems (TRSs), which benefits from recent advances on complexity analysis of integer transition systems (ITSs). To this end, we develop a transformation from TRSs to a generalized notion of ITSs with (possibly non-tail) recursion, called RNTSs. To analyze their complexity, we introduce a modular technique which allows us to use existing tools for standard ITSs in order to infer complexity bounds for RNTSs. The key idea of our technique is a summarization method that allows us to analyze components of the transition system independently. We implemented our contributions in the tool AProVE and our experiments show that one can now infer bounds for significantly more TRSs than with previous state-of-the-art tools for term rewriting.


References

Here is the full version of our paper Complexity Analysis for Term Rewriting by Integer Transition Systems (including all proofs and a description of improvements to increase the precision of our abstraction from TRSs to RNTSs).


Implementation in AProVE

We integrated our new technique in the termination and complexity analyzer AProVE, which is one of the most powerful termination and complexity tools for TRSs. It can be accessed via a custom web interface. Please note that the server running the web interface is considerably slower than the computer used for the benchmarks.


Settings

In our experiments, we consider the previous version of AProVE (AProVE '16), a version using only the techniques from our paper (AProVE RNTS), and AProVE '17 which integrates the techniques from our paper into AProVE's previous approach to analyze innermost runtime complexity (irc). AProVE RNTS uses the external tools CoFloCo, KoAT, and PUBS to compute runtime bounds for the ITSs resulting from the transformation of RNTSs to ITSs presented in Sect. 4 of our paper. Note that CoFloCo can also infer complexity bounds for recursive ITSs directly, i.e., it does not require our transformation from RNTSs to ITSs. Therefore, as a fourth alternative, AProVE RNTS uses CoFloCo to analyze the RNTSs obtained from the transformation from TRSs to RNTSs presented in Sect. 3 of our paper directly. Additionally, we also compare with TcT 3.1.0, since AProVE and TcT were the most powerful complexity tools for TRSs at the Termination and Complexity Competition 2016.

Please note that all versions of AProVE always pre-process the analyzed TRS to remove rules with non-basic left-hand sides that are unreachable from basic terms (as described in our LPAR'17 paper Analyzing Runtime Complexity via Innermost Runtime Complexity).

We used a timeout of 300 seconds per example on an Intel Xeon with 4 cores at 2.33 GHz each and 16 GB of RAM.


Examples

We evaluated the settings described above on all 922 examples of the category "Runtime Complexity - Innermost Rewriting" of the Termination and Complexity Competition 2016. Here, we excluded the 100 examples where AProVE shows irc(n) ∈ Θ(ω).


Experiments & Discussion

In the following tables, irc stands for the innermost runtime complexity function, mapping a natural number n to the length of the longest innermost evaluation sequence starting with a basic term of at most size n.

Table: Comparison of Various Settings to Compute Upper Bounds

The table above shows the overall results, where AProVE & TcT represents the former state of the art, i.e., for each example here we took the best bound found by AProVE '16 or TcT. A row "≤ O(nk)" means that the corresponding tools proved an upper bound ≤ O(nk) (e.g., TcT proved constant or linear upper bounds in 276 cases). As one can see in the table, KoAT's ability to prove exponential bounds for ITSs also enables AProVE to infer exponential upper bounds for TRSs in some cases. Clearly, AProVE '17 is the most powerful tool, i.e., the contributions of this paper significantly improve the state of the art for complexity analysis of TRSs.

Table: AProVE RNTS vs. AProVE '16

The next table compares the techniques from this paper (AProVE RNTS) with AProVE '16 in detail. For example, the entry "42" in the row marked with "O(n)" and the column marked with "O(n2)" means that there are 42 examples where AProVE RNTS inferred a linear upper bound, whereas AProVE '16 only showed a quadratic upper bound.

While AProVE '16 proves more upper bounds than AProVE RNTS (439 vs. 422), both approaches are clearly orthogonal: AProVE RNTS infers smaller bounds in 171 cases (entries right of the diagonal) and AProVE '16 is better in 146 examples.

Table: AProVE RNTS vs. TcT

The table above compares the techniques from this paper with TcT. AProVE RNTS succeeds slightly more often (422 vs. 393), but similarly to the comparison with AProVE '16, both approaches are orthogonal: AProVE RNTS infers smaller bounds in 153 cases and TcT is better in 113 examples.

Table: AProVE RNTS vs. AProVE & TcT

To show that our technique is indeed orthogonal to all techniques applied by state-of-the-art complexity analysis tools, the table above compares AProVE RNTS with AProVE & TcT. AProVE RNTS proves better bounds than AProVE & TcT in 127 cases. In 102 of these cases, AProVE & TcT fails to prove any bound.

Due to its ability to infer non-linear size bounds, the technique from Sect. 4 of our paper is especially suitable to analyze examples where non-linear growth of data causes non-linear runtime. The following list contains examples with non-linear runtime where our technique from Sect. 4 obtains better results than all other considered techniques (i.e., than AProVE '16, TcT, and the combination of the abstraction from Sect. 3 of our paper with CoFloCo).

In fact, with the exception of raML/splitandsort.raml where TcT proves irc(n) ∈ O(n5), all other techniques failed to prove any upper bound for these examples in our experiments. In five of these 17 cases, our technique infers an exponential upper bound. Four of these five examples indeed have exponential complexity.

Click here for more detailed results of all benchmarks.