Correspondence systems

Given a list of pairs p with {u, v}=Transpose[p] the constraint to be satisfied is

StringJoin[u[[s]]]==StringJoin[v[[s]]]

Thus for example p={{"ABB", "B"}, {"B", "BA"}, {"A", "B"}} has shortest solution s={2, 3, 2, 2, 3, 2, 1, 1}. (One can have lists instead of strings, replacing StringJoin by Flatten.)

Correspondence systems were introduced by Emil Post in 1945 to give simple examples of undecidability; he showed that the so-called Post Correspondence Problem (PCP) of satisfying their constraints is in general undecidable (see below). With 2 string pairs PCP was shown to be decidable in 1981. It is known to be undecidable when 9 pairs are used, but I strongly suspect that it is also undecidable with just 3 pairs. The undecidability of PCP has been used to establish undecidability of many problems related to groups, context-free languages, and other objects defined by relations (see page 1141). Finding PCP solutions shorter than a given length is known to be an NP-complete problem.

With r string pairs and n=StringLength[StringJoin[p]] there are 2^{n} Binomial[n-1, 2r-1] possible constraints (assuming no strings of zero length), each being related to at most 8 r! others by straightforward symmetries (or altogether 4^(n-1) for given n). The number of constraints which yield solutions of specified lengths Length[s] for r=2 and r=3 are as follows (the boxes at the end give the number of cases with no solution):

With r=2, as n increases an exponentially decreasing fraction of possible constraints have solutions; with r=3 it appears that a fraction more than 1/4 continue to do so. With r=2, it appears that if a solution exists, it must have length n+4 or less. With r≤3, the longest minimal solution lengths for n≤10 are given above. (Allowing r>3 yields no greater lengths for these values of n.) With n=11, example (l) yields a solution of length 112. The only possible longer n=11 case is {{"AAB", "B"}, {"B", "A"}, {"A", "AABB"}}, for which any possible solution must be longer than 200. With n=12, {{"AABAAB", "B"}, {"B", "A"}, {"A", "AB"}} has minimal solution length 120 and {{"A", "AABB"}, {"AAB", "B"}, {"B", "AA"}} has minimal solution length 132.

A given constraint can fail to have a solution either because the colors of cells at some point cannot be made to match, or because the two strings can never have the same finite length (as in {{"A", "AA"}}). To know that a solution exists in a particular case, it is sufficient just to exhibit it. To know that no solution is possible of any length, one must in effect have a proof.

In general, one condition for a solution to exist is that integer numbers of pairs can yield strings of the same length, so that given the length differences d=Map[StringLength, p, {2}] . {1, -1} there is a vector v of non-negative integers such that v . d == 0. If only one color of element ever appears this is the complete condition for a solution—and for r=2 solutions exist if Apply[Times, d] < 0 and are then of length at least Apply[Plus[##]/GCD[##]&, Abs[d]]. With two colors of elements additional conditions can be constructed involving counting elements of each color, or various blocks of elements.

The undecidability of PCP can be seen to follow from the undecidability of the halting problem through the fact that the question of whether a tag system of the kind on page 93 with initial sequence s ever reaches a halting state (where none of its rules apply) is equivalent to the question of whether there is a way to satisfy the PCP constraint

TSToPCP[{n_, rule_}, s_] := Map[Flatten[IntegerDigits[#, 2, 2]] &, Module[{f}, f[u_] := Flatten[Map[{1, #} &, 3u]]; Join[Map[{f[Last[#]], RotateLeft[f[First[#]]]} &, rule], {{f[s], {1}}}, Flatten[Table[{{1, 2}, Append[RotateLeft[f[IntegerDigits[j, 2, i]]], 2]}, {i, 0, n - 1}, {j, 0, 2^{i} - 1}], 1]]], {2}]

Any PCP constraint can also immediately be related to the evolution of a multiway tag system of the kind discussed in the note below. Assuming that the upper string is never shorter than the lower one, the rules for the relevant tag system are given simply by

Apply[Append[#2, s___] -> Prepend[#1, s] &, p, {1}]

In the case of example (e) the existence of a solution of length 24 can then be seen to follow from the fact that MWTSEvolve[rule, {{"B"}}, 22] contains {"B","A"}.

This correspondence with tag systems can be used in practice to search for PCP solutions, though it is usually most efficient to run tag systems that correspond both to moving forward and backward in the string, and to see whether their results ever agree. (In most PCP systems, including all the examples shown except (a) and (g), one string is always systematically longer than the other.) The tag system approach is normally limited by the number of intermediate strings that may need to be kept.

The pictures below show which possible sequences of up to 6 blocks yield upper and lower strings that agree in each of the PCP systems in the main text. As indicated in the first picture for the case of two blocks, each possible successively longer sequence corresponds to a rectangle in the picture (compare page 594). When a sequence of blocks leads to upper and lower strings that disagree, the rectangle is left white. If the strings agree so far, then the rectangle is colored with a gray that is darker if the strings are closer in length. Rectangles that are black (as visible in cases (a) and (b)) correspond to actual PCP solutions where the strings are the same length. Note that in case (c) the presence of only one color in either block means that strings will always agree so far. In cases (m) through (s) there is ultimately no solution, but as the pictures indicate, in these specific PCP systems there are always strings that agree as far as they have gone—it is just that they never end up the same length.

As one example of how one proves that a PCP constraint cannot be satisfied, consider case (s). From looking at the structure of the individual pairs one can see that if there is a solution it must begin with pair 1 or pair 3, and end with pair 1. But in fact it cannot begin with pair 1 because this would mean that the upper string would have to start off being longer, then at some point cross over to being shorter. However, the only way that such a crossover can occur is by pair 3 appearing with its upper A aligned with its second lower A. Yet starting with pair 1, the upper string is longer by 2 As, and the pairs are such that the length difference must always remain even—preventing the crossover from occurring. This means that any solution must begin with pair 3. But this pair must then be followed by another pair 3, which leaves BAAB sticking out on the bottom. So how can this BAAB be removed? The only way is to use the sequence of pairs 2, 3, 3, 2—yet doing this will just produce another BAAB further on. And thus one concludes that there is no way to satisfy these particular PCP constraints.

One can generalize PCP to allow any number of colors, and to require correspondence among any number of strings—though it is fairly easy to translate any such generalization to the 2-string 2-color case.