Driven IFS and Data Analysis

Finite Words

The driven IFS picture for this data set contains 4096 points. We can estimate the probability of each of the 16 length 2 addresses (pairs) by counting the number of points in the corresponding subsquare and dividing by 4096. The number of points in each address through length 2 can be obtained from the Address Stats menu of the driven IFS program. In this way, we obtain these nonzero populations:

n₁=1021, n₂=637, n₃=889, n₄=1549

n₁₁=209, n₁₄=812, n₂₁=259, n₂₄=377, n₃₁=476, n₃₃=53, n₃₄=360, n₄₁=77, n₄₂=637, n₄₃=835

Note that the address 241 is empty, but both 24 and 41 are occupied. That is, the emptiness of this length 3 address is not a consequence of forbidden pairs. Is it empty because our data set is too short, or have we detected a forbidden triple independent of the forbidden pairs?
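The bookkeeping behind this observation can be checked mechanically. The following is a minimal sketch, assuming the pair populations quoted above; the names pair_counts, empty_pairs, and contains_empty_pair are illustrative and are not part of the driven IFS program.

```python
from itertools import product

# Nonzero pair populations quoted in the text (address "ij" -> count).
pair_counts = {
    "11": 209, "14": 812, "21": 259, "24": 377, "31": 476,
    "33": 53, "34": 360, "41": 77, "42": 637, "43": 835,
}
TOTAL_POINTS = 4096

# All 16 length 2 addresses over the symbols 1, 2, 3, 4.
all_pairs = ["".join(p) for p in product("1234", repeat=2)]

# Estimated probability of each pair: population divided by 4096.
pair_prob = {a: pair_counts.get(a, 0) / TOTAL_POINTS for a in all_pairs}

# Pairs with zero population are the candidate forbidden pairs.
empty_pairs = [a for a in all_pairs if pair_counts.get(a, 0) == 0]


def contains_empty_pair(address):
    """Return True if some adjacent pair of symbols in the address is empty."""
    return any(address[i:i + 2] in empty_pairs
               for i in range(len(address) - 1))


print("empty pairs:", empty_pairs)
print("241 explained by an empty pair?", contains_empty_pair("241"))
```

If the second line printed is False, the emptiness of 241 cannot be blamed on any empty pair, which is the situation described above.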

Given the probabilities of the pair transitions, what is the likelihood that address 241 will be unoccupied after 4096 iterations of this driven IFS? That is, we are testing the hypothesis that all addresses that remain empty for long time series must contain one of the forbidden pairs 12, 13, 22, 23, 32, 44. If we find it unlikely, under this hypothesis, that address 241 would remain unvisited in 4096 iterations, then we deduce that 241 probably is a forbidden triple, not a consequence of any forbidden pair.

For this calculation, we build a simple Markov process with five states A, B, C, D, and E:

A: 241 has not occurred, and the current string has left-most entry 2 or 3.

B: 241 has not occurred, and the current string has left-most entry 1.

C: 241 has not occurred, the current string has left-most entry 4, and the previous entry is not 1.

D: 241 has not occurred, the current string has left-most entry 4, and the previous entry is 1.

E: 241 has occurred.

The transitions between the states of the Markov process are shown in this graph.

How are we to understand this graph?

A → B To move from A to B we must apply T₁. Given the left-most entry of strings in A, this is achieved by 2 → 1 or 3 → 1.

A → C To move from A to C we must apply T₄. Given the left-most entry of strings in A, this is achieved by 2 → 4 or 3 → 4.

A → A To stay in A we must apply T₂ or T₃. That is, A → A is given by 2 → 2, 2 → 3, 3 → 2, and 3 → 3.

B → A To make the transition B → A, we must apply T₂ or T₃. This transition is given by 1 → 2 and 1 → 3.

B → B To stay in state B, T₁ must be applied again. That is, B → B is given by 1 → 1.

B → D To move from B to D, T₄ must be applied. That is, B → D is given by 1 → 4.

C → A To move from C to A, T₂ or T₃ must be applied. That is, C → A is given by 4 → 2 and 4 → 3.

C → B To move from C to B, T₁ must be applied. That is, C → B is given by 4 → 1.

C → C To move from C to C, T₄ must be applied. That is, C → C is given by 4 → 4.

D → A To move from D to A we must apply T₃. (Note that applying T₂ takes us to E.) So D → A is given by 4 → 3.

D → B To move from D to B we must apply T₁, so D → B is given by 4 → 1.

D → C To move from D to C we must apply T₄, so D → C is given by 4 → 4.

D → E To move from D to E we must apply T₂, so D → E is given by 4 → 2.
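The transition rules listed above can be written as a small lookup table. This is only a sketch: the names NEXT_STATE and final_state are hypothetical, symbols are given in the order the transformations are applied (so the address 241 corresponds to applying T₁, then T₄, then T₂), and starting in state A before any symbol has been read is an assumption made for convenience.

```python
# NEXT_STATE[state][k] is the state reached by applying T_k.
NEXT_STATE = {
    "A": {1: "B", 2: "A", 3: "A", 4: "C"},
    "B": {1: "B", 2: "A", 3: "A", 4: "D"},
    "C": {1: "B", 2: "A", 3: "A", 4: "C"},
    "D": {1: "B", 2: "E", 3: "A", 4: "C"},  # T_2 after 1 then 4 completes 241
    "E": {1: "E", 2: "E", 3: "E", 4: "E"},  # once in E, always in E
}


def final_state(symbols, start="A"):
    """Follow the process through a sequence of applied transformations."""
    state = start
    for k in symbols:
        state = NEXT_STATE[state][k]
    return state


print(final_state([1, 4, 2]))  # applies T_1, T_4, T_2
print(final_state([2, 4, 1]))  # applies T_2, T_4, T_1
```

Starting in A is harmless here because the first symbol alone determines the next state: 1 leads to B, 4 leads to C, and 2 or 3 lead to A.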

Next, we estimate the probability of each of these transitions.

Pr(A → B) is (n₁₂ + n₁₃)/(n₂ + n₃) = (0 + 0)/(637 + 889) = 0

Pr(A → C) is (n₄₂ + n₄₃)/(n₂ + n₃) = (637 + 835)/(637 + 889) = .9646

Pr(A → A) = 1 - (Pr(A → B) + Pr(A → C)) = 1 - .9646 = .0354

Pr(B → A) is (n₂₁ + n₃₁)/n₁ = (259 + 476)/1021 = .7199

Pr(B → B) is n₁₁/n₁ = 209/1021 = .2047

Pr(B → D) is n₄₁/n₁ = 77/1021 = .0754

Pr(C → A) is (n₂₄ + n₃₄)/n₄ = (377 + 360)/1549 = .4758

Pr(C → B) is n₁₄/n₄ = 812/1549 = .5242

Pr(C → C) is n₄₄/n₄ = 0/1549 = 0

Pr(D → A) is n₃₄/n₄ = 360/1549 = .2324

Pr(D → B) is n₁₄/n₄ = 812/1549 = .5242

Pr(D → C) is n₄₄/n₄ = 0/1549 = 0

Pr(D → E) is n₂₄/n₄ = 377/1549 = .2434
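These estimates can be reproduced directly from the address populations. The sketch below assumes the counts quoted earlier; the helper n and the dictionary pr are illustrative names, and an address that does not appear among the nonzero populations is taken to have count 0.

```python
# Length 1 and length 2 address populations quoted in the text.
single = {"1": 1021, "2": 637, "3": 889, "4": 1549}
pairs = {
    "11": 209, "14": 812, "21": 259, "24": 377, "31": 476,
    "33": 53, "34": 360, "41": 77, "42": 637, "43": 835,
}


def n(address):
    """Population of an address; unlisted addresses are empty."""
    if len(address) == 1:
        return single[address]
    return pairs.get(address, 0)


from_A = n("2") + n("3")  # strings in A have left-most entry 2 or 3
pr = {
    ("A", "B"): (n("12") + n("13")) / from_A,
    ("A", "C"): (n("42") + n("43")) / from_A,
    ("B", "A"): (n("21") + n("31")) / n("1"),
    ("B", "B"): n("11") / n("1"),
    ("B", "D"): n("41") / n("1"),
    ("C", "A"): (n("24") + n("34")) / n("4"),
    ("C", "B"): n("14") / n("4"),
    ("C", "C"): n("44") / n("4"),
    ("D", "A"): n("34") / n("4"),
    ("D", "B"): n("14") / n("4"),
    ("D", "C"): n("44") / n("4"),
    ("D", "E"): n("24") / n("4"),
}
# A -> A is whatever probability is left over, as in the text.
pr[("A", "A")] = 1 - (pr[("A", "B")] + pr[("A", "C")])

for (src, dst), p in sorted(pr.items()):
    print(f"Pr({src} -> {dst}) = {p:.4f}")
```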

In the standard language of Markov processes, E is an absorbing state. That is, once the string enters state E, it remains in state E.

In terms of the driven IFS, entering state E means having points in square 241. So we would like to find the probability of not entering state E in a string of 4096 points.

To do this, we construct the transition matrix for this Markov process. (Row labels are "to;" column labels are "from.")

        A       B       C       D       E
A     .0354   .7199   .4758   .2324   0
B     0       .2047   .5242   .5242   0
C     .9646   0       0       0       0
D     0       .0754   0       0       0
E     0       0       0       .2434   1

Call this matrix of probabilities M.

Represent the probabilities of being in states A, B, C, D, and E by P_A, P_B, P_C, P_D, and P_E.

Making these into a column vector V, the effect of one step of the process on the probabilities is the matrix product

M V

The effect of 4096 steps is

(M⁴⁰⁹⁶)V

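For any initial probability vector V, this is very close to the vector (0, 0, 0, 0, 1), because E is absorbing and can be reached from every other state. Consequently, the probability that 241 will not occur in a string of length 4096 is very close to 0.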
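The 4096-step calculation can be carried out by applying M to V repeatedly. The following sketch uses the rounded transition probabilities computed above, with Pr(A → A) = .0354; the function names step and evolve are illustrative, and the choice of starting with all probability in state A is an assumption.

```python
# M[i][j] = Pr(state j -> state i), with states ordered A, B, C, D, E.
M = [
    [0.0354, 0.7199, 0.4758, 0.2324, 0.0],
    [0.0,    0.2047, 0.5242, 0.5242, 0.0],
    [0.9646, 0.0,    0.0,    0.0,    0.0],
    [0.0,    0.0754, 0.0,    0.0,    0.0],
    [0.0,    0.0,    0.0,    0.2434, 1.0],
]


def step(matrix, v):
    """One step of the process: the matrix product M V."""
    size = len(v)
    return [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]


def evolve(matrix, v, steps):
    """Apply the matrix to v the given number of times."""
    for _ in range(steps):
        v = step(matrix, v)
    return v


v0 = [1.0, 0.0, 0.0, 0.0, 0.0]  # assumed start: all probability in state A
v = evolve(M, v0, 4096)

# Probability that 241 has still not occurred: total mass outside state E.
p_never_241 = sum(v[:4])
print("P_E after 4096 steps:", v[4])
print("probability that 241 never occurs:", p_never_241)
```

Summing the first four entries, rather than computing 1 minus the last entry, avoids subtracting two nearly equal numbers. Trying a different starting vector, for example all probability in B or C, is a simple way to see that the conclusion does not depend on the initial state.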

That is, we have found it quite likely that the square 241 is empty because of a real restriction on the underlying dynamics, and not just because the data string is too short.
