Driven IFS and Data Analysis

Depth of History

If the bin of the current data point does not affect the bin of the next data point, then

Prob(address ij is occupied) = Prob(current data point is in bin i and the previous was in bin j)

= P(i|j)⋅P(j) (conditional probability)

= P(i)⋅P(j) (independence, since then P(i|j) = P(i))

Denote by N(i) the number of driven IFS points with address i, and by N(ij) the number with address ij.

Then in a data set of M points, the probabilities are given by

P(i) = N(i)/M and P(ij) = N(ij)/M

This gives a simple test of independence: use Address Stats to find N(1) through N(4), and N(11) through N(44). Then compare each P(ij) with P(i)⋅P(j).
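
The comparison can also be sketched in a few lines of Python. This is only an illustration, not the Address Stats tool itself: the function names address_counts and independence_table are hypothetical, the input is assumed to be a list of bin numbers 1 through 4 in time order, and address ij is taken to mean "current point in bin i, previous point in bin j", as above.

```python
from collections import Counter
import random


def address_counts(bins):
    """Return (N(i), N(ij)) as Counters for a sequence of bin numbers."""
    n1 = Counter(bins)
    # Pair each point with its predecessor: key is (current bin, previous bin).
    n2 = Counter(zip(bins[1:], bins[:-1]))
    return n1, n2


def independence_table(bins):
    """List (i, j, P(ij), P(i)*P(j)) for all 16 length-2 addresses."""
    m = len(bins)
    n1, n2 = address_counts(bins)
    rows = []
    for i in range(1, 5):
        for j in range(1, 5):
            p_ij = n2[(i, j)] / m
            p_prod = (n1[i] / m) * (n1[j] / m)
            rows.append((i, j, p_ij, p_prod))
    return rows


if __name__ == "__main__":
    # Assumed test data: 10000 independent, equally likely bins.
    rng = random.Random(0)
    bins = [rng.randint(1, 4) for _ in range(10000)]
    for i, j, p_ij, p_prod in independence_table(bins):
        print(f"address {i}{j}: P(ij) = {p_ij:.4f}   P(i)P(j) = {p_prod:.4f}")
```

For independent data the two printed columns should be close but not identical, which is the situation examined next.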

By how much must they differ for the difference to be considered significant?

On the right are the numbers N(11) through N(44) for the previous random example.

For ease of comparison, on the left are the products of the probabilities, P(i)⋅P(j), expressed on the same scale as the counts. That these numbers sum to 9999 instead of 10000 is a consequence of rounding.

Left: products P(i)⋅P(j), scaled to counts (address: value)

33: 353     34: 660     43: 660     44: 1234
31: 562     32: 303     41: 1050    42: 568
13: 562     14: 1050    23: 304     24: 568
11: 894     12: 484     21: 484     22: 262

Right: observed counts N(ij) (address: value)

33: 330     34: 632     43: 647     44: 1252
31: 623     32: 294     41: 1016    42: 597
13: 588     14: 1063    23: 314     24: 566
11: 860     12: 479     21: 490     22: 248

Although the corresponding numbers are relatively close, they are not identical.

Nevertheless, these data points are independent, so the corresponding values do not differ significantly: the discrepancies reflect only chance variation. By how much they must differ to claim statistical significance is a more delicate point.
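
One common way to put a number on "how much", offered here only as a sketch and not as the method of this page, is Pearson's chi-square statistic: sum (observed − expected)²/expected over the 16 addresses. The code below uses the two sets of numbers given above. The comparison value 16.919 is the 5% critical value of the chi-square distribution with 9 degrees of freedom, which assumes the usual (4−1)⋅(4−1) count for a 4×4 table. Because consecutive address pairs share a data point, the assumptions of the test hold only approximately, so the result should be read as a rough guide.

```python
# Address -> value, transcribed from the two sets of numbers above.
expected = {  # products P(i)*P(j), scaled to counts
    "33": 353, "34": 660, "43": 660, "44": 1234,
    "31": 562, "32": 303, "41": 1050, "42": 568,
    "13": 562, "14": 1050, "23": 304, "24": 568,
    "11": 894, "12": 484, "21": 484, "22": 262,
}
observed = {  # counts N(ij)
    "33": 330, "34": 632, "43": 647, "44": 1252,
    "31": 623, "32": 294, "41": 1016, "42": 597,
    "13": 588, "14": 1063, "23": 314, "24": 566,
    "11": 860, "12": 479, "21": 490, "22": 248,
}

# Assumption: 5% critical value for chi-square with 9 degrees of freedom.
CRITICAL_5_PERCENT_DF9 = 16.919


def pearson_chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over all addresses."""
    return sum((obs[a] - exp[a]) ** 2 / exp[a] for a in exp)


stat = pearson_chi_square(observed, expected)
print(f"chi-square statistic: {stat:.2f}")
if stat > CRITICAL_5_PERCENT_DF9:
    print("difference is significant at the 5% level")
else:
    print("difference is not significant at the 5% level")
```

Under these assumptions the statistic falls just below the critical value, consistent with the data being independent. How close it comes to the threshold also shows why the question is delicate.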
