Disproving (Positive) Almost-Sure Termination of Probabilistic Term Rewriting via Random Walks

This web site contains additional information on our paper “Disproving (Positive) Almost-Sure Termination of Probabilistic Term Rewriting via Random Walks”. More precisely, we provide information on our implementation in the tool AProVE, and we describe our experiments.

Implementation in AProVE

The easiest way to run AProVE is via its web interface (for PTRSs in WST format or in ARI format).

Moreover, recent releases of AProVE are available for download on GitHub. Release master_2026_02_15 is the newest release that is able to disprove almost-sure termination (AST) and positive almost-sure termination (PAST) via embeddings of random walks. Note that instead of AST or PAST, the new version of AProVE only allows the user to select the proof goals AST or SAST (strong almost-sure termination). However, SAST implies PAST, and thus, whenever AProVE proves SAST, it has also proved PAST. Moreover, AProVE does not have any specific techniques to disprove SAST and thus, whenever AProVE disproves SAST, it actually disproves PAST instead. Thus, if one wants to analyze PAST, one can simply choose the proof goal “SAST” instead. See here for further details and for the reason why we analyze SAST instead of PAST.

For termination analysis of probabilistic term rewrite systems, the following dependencies have to be installed (see here for more installation instructions):

Extending the path environment is necessary so that AProVE can find these programs.

In the following, we first describe the input syntax of a PTRS file. Then, we explain more details about our implementation. In the end, we describe how to execute the implementation via the web interface and via the JAR file on GitHub.

Input Syntax

Our implementation allows for multiple different input formats of PTRSs.

WST Format

First, AProVE can parse the input syntax for PTRSs that was first described in the paper [ADY20]. Here, the PTRS file needs to have the extension ‘.ptrs’. The syntax extends the WST format for term rewrite systems by adding a new syntax for probabilistic rules. A probabilistic rewrite rule has the form:

lhs -> w1 : rhs1 || ... || wk : rhsk

Here, lhs, rhs1, …, rhsk are terms and w1,…,wk are natural numbers > 0. This corresponds to the rule:

lhs -> {w1/w : rhs1, ..., wk/w : rhsk}

where w is the sum of all w1,…,wk.

We also allow the following syntax for probabilistic rules that might be easier to read:

lhs -> a1/b1 : rhs1 || ... || ak/bk : rhsk

Here, lhs, rhs1,…,rhsk are again terms and a1,…,ak and b1,…,bk are natural numbers > 0 that describe the actual probabilities. This corresponds to the rule:

lhs -> {a1/b1 : rhs1, ..., ak/bk : rhsk}

However, one has to ensure that the value ai/bi is actually between 0 and 1, and that the sum of all probabilities in a rule is 1. Otherwise, AProVE rejects the input file with an exception. The PTRS may contain rules in both notations.

One can use AProVE to analyze “certain termination”, AST, or SAST of PTRSs. The proof goal can be indicated via the keywords ‘GOAL’. The input file has to contain either ‘(GOAL TERMINATION)’, ‘(GOAL AST)’, or ‘(GOAL SAST)’.

ARI Format

Second, AProVE can also parse the new ARI format for (probabilistic) term rewrite systems from the annual termination competition. Then, the PTRS file must have the extension ‘.ari’.

Implementation

As described in the input syntax section, we only allow for rational probabilities. Internally, we represent probabilities as (arbitrary precision) fractions, by storing the numerator and denominator as (arbitrary precision) integers.

The implementation for analyzing AST runs our existing techniques to prove AST from [KG24], [KG25], and [KG26] in parallel with our novel techniques to disprove AST. In order to disprove AST via an embedding of random walks based on loops, we first search for a term that loops in the non-probabilistic variant np(P) of the PTRS P. This is done, e.g., via narrowing the right-hand sides of the dependency pairs of np(P) until one finds a term that semi-unifies with the term that one started with. Here, we use the heuristics of [GTS05] to decide how to narrow which dependency pair. After finding a looping term t, we construct the corresponding rewrite sequence tree T for the looping rewrite sequence. Afterwards, we proceed by rewriting the leaves of the tree T until we find enough occurrences of the initial looping term t within the leaves of the tree T in order to disprove AST. If t is linear, we can use Thm. 10 to disprove AST, i.e., we can count all non-overlapping occurrences of the term t within the leaves of T. If the tree T is nvd, we can use Thm. 14 to disprove AST, i.e., we can still count all non-overlapping occurrences of the term t. Otherwise, we have to use Thm. 16, i.e., we can only count orthogonal occurrences of the term t. As long as T is still nvd, we compute both orthogonal and non-overlapping occurrences using our dynamic programming algorithm, since we do not know whether T will still be nvd in the end. Note that the dynamic programming algorithm can be adjusted to compute the number of both occurrences simultaneously. During the extension of T, i.e., before rewriting the leaves of the tree, we perform simple infeasibility checks on the leaf we want to rewrite, to detect whether an occurrence of the looping term t might possibly be reached. If we cannot reach any term with an occurrence of t, then we do not have to rewrite this leaf anymore and ignore it. We check infeasibility via the symbol transition graph from [SY19]. After we performed a rewrite step on all leaves where we might potentially reach an occurrence of t, we check whether we have enough non-overlapping occurrences (or orthogonal occurrences) of the term t in all the leaves such that Thm. 10 or Thm. 14 (or Thm. 16) is applicable and disproves AST. If after a certain threshold of rewrite steps we have not disproven AST, we stop the current tree construction and search for the next looping term and restart the tree construction.

To embed random walks based on pattern terms as in Thm. 27, we proceed in a similar way. In addition, we construct all possible splits of the looping term into a base term t and a pumping substitution σ such that <t,σ> is a pattern term. Moreover, we have to extend the tree by rewrite steps until we find an occurrence of the base term in each leaf and until the required commutation property (of the pumping substitution and the instantiating substitutions in the leaves) holds.

The implementation for analyzing SAST is analogous and runs the existing techniques to prove SAST from [ADY20] and [KSG25] in parallel with our novel techniques to disprove SAST. The implementation to disprove SAST only differs from the above implementation for AST by using weak comparisons with 1 in Theorems 10, 14, 16, and 27 while for disproving AST we have to use strict comparisons.

Web Interface

Our implementation can be tested via a web interface for PTRSs in WST format or in ARI format.

JAR File

After downloading AProVE and installing all dependencies, make sure that the path variables are set as described above. Then AProVE can be run by executing

java -ea -jar aprove.jar -m wst shuffle.ptrs

or if you want to see the full proof you can use the option -p plain for a plain text version or -p html for an html format.

java -ea -jar aprove.jar -m wst shuffle.ptrs -p plain
java -ea -jar aprove.jar -m wst shuffle.ptrs -p html

One can also set a fixed time limit, which should be used, since our transformational processors to prove termination might run infinitely long. Use the option -t limit to set the maximum time to search for a termination proof in seconds, e.g.,

java -ea -jar aprove.jar -m wst shuffle.ptrs -p html -t 30

Experiments

We performed experiments with our new implementation in AProVE together with running every individual theorem on its own. We used the set of 138 benchmarks from the Termination Problem Data Base (TPDB), which were also used for the evaluation of [KSG25]. However, this benchmark set was mainly created to evaluate techniques that can prove AST or SAST. Therefore, most of the examples are indeed AST, and we added 20 more examples that are not AST or not SAST, which include all PTRSs from our paper (that were not already part of the benchmark set) and examples that express typical bugs in implementations of probabilistic algorithms. For the latter, we included a broken shuffle algorithm and a broken algorithm that probabilistically replaces numbers in a given sequence of 0s and 1s. You can find our new benchmarks here, and the set of all benchmarks is here. Note that the .ari files do not yet contain the respective proof goal, i.e., one still has to add, e.g., ‘(GOAL AST)’, or ‘(GOAL SAST)’.

We tested AProVE on a computer with an Apple M4 CPU and 16 GB of RAM, and a timeout of 30 seconds was used for each example. This is different to the timeout and hardware used in the annual Termination Competition, since the StarExec site that was used for the competition in previous years is not available anymore.


The following table shows the overall results for disproving AST on all examples in the benchmark set:

Examples Thm. 4 Thm. 10 Thm. 14 Thm. 16 Thm. 27 AProVE
158 8 17 18 16 12 24

The next table shows the results for SAST:

Examples Thm. 4 Thm. 10 Thm. 14 Thm. 16 Thm. 27 AProVE
158 8 33 34 28 30 49

Note that there are 70 examples for which AProVE can prove AST and 31 examples for which AProVE can prove SAST. You can find a detailed table with the individual result
for each benchmark here for AST and here for SAST. The output of AProVE for each individual benchmark can be found here. (An empty output denotes a timeout by AProVE.)

You can also test each individual theorem via the following JAR files: Thm4, Thm10, Thm14, Thm16, and Thm27.

Relations Between the Different Techniques

As mentioned in the paper, each of our theorems has its own benefits, and for each individual theorem Thm. X there exists an example that can only be solved by Thm. X.

Thm. 10, Thm. 14, Thm. 16, and Thm. 27 are all beneficial for disproving both AST and SAST: By adjusting the probabilities, the examples where only the respective theorem can disprove SAST can easily be modified such that only the respective theorem can disprove AST. However, Thm. 4 is only beneficial for disproving AST. If we only consider SAST, then Thm. 4 is subsumed by Thm. 16. If there exists a P-RST with t at the root and an occurrence of t in every leaf, then we can embed a symmetric random walk by counting orthogonal occurrences of t, and hence, disprove SAST via Thm. 16.

Why SAST Instead of PAST

Our implementation in the tool AProVE only allows the user to analyze AST or SAST, but one cannot select PAST. Recall that a program is almost-surely terminating (AST) if the probability for termination is 1. A program is positively almost-surely terminating (PAST) if the expected length of every evaluation is finite. Finally, a program is strongly or bounded almost-surely terminating (SAST) if for every term t there is a finite bound on the expected lengths of all evaluations starting in t. Thus, if there is a start term t which non-deterministically leads to evaluations of arbitrary finite length, then the program can be PAST, but not SAST. In general, SAST implies PAST, and PAST implies AST, but the converse directions do not hold.

At first sight, one might think that all three notions are interesting and worth investigating. However, we have shown in [KG25] that PAST and SAST coincide for almost all PTRSs. For example, if the signature contains at least one function symbol of arity greater than 1, then there is no difference between PAST and SAST for finite PTRSs. Since AProVE only analyzes finite rewrite systems, we decided to omit an implementation that focuses specifically on PAST.

Note that since SAST implies PAST, our techniques to disprove PAST can be used to disprove SAST as well. Since AProVE does not implement any techniques that can prove PAST of PTRSs that are not SAST and it also does not implement any techniques that can disprove SAST of PTRSs that are PAST, whenever the user chooses “SAST”, any termination result reported by AProVE holds for both SAST and PAST.