Dimensions of Texts

The late physicist James Corbett proposed, perhaps not entirely seriously, a method of quantifying the complexity of texts. Diagram the sentences of the text, and string the diagrams together one after the other, forming a "text diagram." Put a unit mass at the location of each word, and, starting from the center of the text diagram, compute M(r), the mass of words within a circle of radius r from the center. If M(r) obeys power-law scaling, a signature of fractal structure, then

M(r) = k*r^d
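The counting step can be sketched in a few lines. This is a minimal illustration, not Corbett's procedure: it assumes the text diagram has already been reduced to a list of (x, y) word positions (diagramming the sentences is not automated here), and it takes the "center" to be the centroid of those positions unless one is supplied, which is also an assumption. The names `centroid` and `mass_within` are hypothetical.

```python
import math

def centroid(points):
    """Mean (x, y) of the word positions; used as the diagram's center by default (an assumption)."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def mass_within(points, r, center=None):
    """M(r): number of unit-mass words at distance <= r from the center."""
    cx, cy = centroid(points) if center is None else center
    return sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= r)

# Hypothetical word positions on a toy "text diagram".
words = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (2, 2), (-3, 1)]
for r in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(r, mass_within(words, r, center=(0, 0)))
```

Evaluating M(r) over a range of radii produces the (r, M(r)) pairs that the log-log plot is built from.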

On a plot of log(M(r)) versus log(r), power-law dependence shows up as data points falling along a straight line, because taking logarithms of both sides gives log(M(r)) = log(k) + d*log(r). The slope of that line is the dimension d of the text. For texts of intermediate length, Corbett found reasonable agreement with a power-law fit. Moreover, the simpler the text, the lower its dimension.
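One way to read off the slope is an ordinary least-squares fit of log M(r) against log r. The sketch below assumes M(r) has already been measured at several positive radii with nonzero counts; the function name `fit_dimension` is hypothetical, and plain least squares is a common choice rather than a method the text attributes to Corbett.

```python
import math

def fit_dimension(radii, masses):
    """Least-squares fit of log M = log k + d * log r; returns (d, k).

    Requires every radius and mass to be positive so the logarithms exist.
    """
    xs = [math.log(r) for r in radii]
    ys = [math.log(m) for m in masses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    d = sxy / sxx                      # slope = dimension
    k = math.exp(mean_y - d * mean_x)  # intercept = log k
    return d, k

# Synthetic check: data generated from M(r) = 3 * r**1.5 should recover d = 1.5, k = 3.
radii = [1, 2, 4, 8, 16]
masses = [3 * r ** 1.5 for r in radii]
d, k = fit_dimension(radii, masses)
print(round(d, 6), round(k, 6))
```

With real word counts the points will scatter around the line, and the restriction to intermediate-length texts matters: at very small or very large r the counts are dominated by single words or by the finite extent of the diagram.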

Because much of a text's structure is dictated by grammatical restrictions, finding no power-law dependence would not have been surprising. What can we deduce from the presence of this scaling?

Jim Corbett's dimension of diagrammed sentences as a measure of linguistic complexity.