Driven IFS and Data Analysis

IFS Driven by Texts

Driven IFS is a potentially interesting tool for analyzing patterns in data, but we need to be careful with how the data are binned.

A text is a string, but in an alphabet of more than four symbols. How can we convert this into a string in an alphabet of four symbols?

One possibility is to treat words as the fundamental units of the text, and assign bins by parts of speech. This has an obvious difficulty: it is hard to tell how much of the driven IFS structure is due to the author's style, and how much to grammatical constraints.

Another choice is to treat letters as the fundamental units and assign bins by ignoring some letters, or grouping the letters together. Of course, any choices must be justified in a way reflecting properties of the text. This is not an easy problem.
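As a rough illustration of the letter-based approach, here is a minimal Python sketch. The grouping of letters into four bins is purely hypothetical (vowels plus three alphabetical blocks of consonants) and carries no linguistic justification. The four maps are assumed to be the usual ones that shrink the unit square by 1/2 toward each of its corners, and the starting point (1/2, 1/2) is also an assumption. The names BINS, OFFSETS, text_to_symbols, and drive_ifs are made up for this example.

```python
# Hypothetical grouping of the 26 letters into four bins; any real choice
# would need a justification reflecting properties of the text.
BINS = {
    1: "aeiou",
    2: "bcdfghj",
    3: "klmnpqr",
    4: "stvwxyz",
}
LETTER_TO_BIN = {ch: b for b, letters in BINS.items() for ch in letters}

# Assumed maps T_i(x, y) = (x/2, y/2) + offset_i, one per corner of the unit square.
OFFSETS = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (0.0, 0.5), 4: (0.5, 0.5)}


def text_to_symbols(text):
    """Convert a text to a list of bin numbers, skipping non-letters."""
    return [LETTER_TO_BIN[c] for c in text.lower() if c in LETTER_TO_BIN]


def drive_ifs(symbols, start=(0.5, 0.5)):
    """Apply the maps in the order given by the symbol string."""
    x, y = start
    points = []
    for s in symbols:
        dx, dy = OFFSETS[s]
        x, y = x / 2 + dx, y / 2 + dy
        points.append((x, y))
    return points


symbols = text_to_symbols("To be, or not to be")
points = drive_ifs(symbols)
```

Plotting the points for a long text, and then repeating the experiment with a different grouping, is one way to see how strongly the picture depends on the binning.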

Finding a meaningful translation of a text into a symbol string suitable for driving an IFS is an intricate problem. But this is a very good feature of the driven IFS approach: it encourages experimentation with different methods and thought about open-ended questions for which there are no right answers. In addition, it can lead to good mathematical questions about the driven IFS itself.

Here are four examples of recent student projects.

Investigating the authorship of chapters of Genesis using gematria to convert text to numbers.

Comparing the sound patterns of sonnets using the Soundex algorithm to convert text to numbers.

Olaf Schneider's analysis of shot duration and distance in film.

Phonological analysis of English texts.
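For readers unfamiliar with the Soundex algorithm mentioned in the sonnet project, here is a minimal Python sketch of the standard American Soundex coding of a single word (first letter plus three digits). This is not the students' code. How the project turned Soundex codes into four bins is not described here, so that step is left out. The names CODES and soundex are chosen for this example.

```python
# Standard Soundex digit for each coded consonant; vowels, h, w, y get no digit.
CODES = {}
for group, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in group:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex code of a word, or '' if it has no letters."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code can suppress a repeat
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (first.upper() + "".join(digits) + "000")[:4]
```

Words that sound alike tend to share a code: for instance, soundex("Robert") and soundex("Rupert") both give "R163".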
