The late physicist James Corbett proposed, perhaps not entirely seriously, a method of quantifying the complexity of texts. Diagram the sentences of the text, and string the diagrams together one after the other, forming a "text diagram." Put a unit mass at the location of each word and, starting from the center of the text diagram, compute M(r), the mass of words within a circle of radius r of the center. If the text obeys power-law scaling, a signature of fractal structure, then
$$M(r) = k\, r^{d}$$
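Taking logarithms of both sides shows why a power law appears as a straight line on a log-log plot, with slope d and intercept log k:

$$\log M(r) = \log k + d\,\log r$$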
When log(M(r)) is plotted against log(r), power-law dependence reveals itself as data points falling along a straight line. The slope of that line is the dimension d of the text.
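The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Corbett's own method: it assumes each word has already been given (x, y) coordinates in the text diagram, takes the centroid of those points as "the center" (an assumption; the text does not say how the center is chosen), counts words within each radius, and estimates d as the ordinary least-squares slope of log M(r) against log r. The names `mass_within` and `mass_radius_dimension` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def mass_within(points: Sequence[Point], center: Point, r: float) -> int:
    """Count words (unit masses) lying within distance r of center."""
    cx, cy = center
    return sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= r)


def mass_radius_dimension(points: Sequence[Point],
                          radii: Sequence[float]) -> Tuple[float, float]:
    """Estimate (d, k) in M(r) = k * r**d from word positions.

    Assumption: the center of the text diagram is the centroid of the
    word positions.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Collect (log r, log M(r)) pairs, skipping radii that enclose no words.
    log_r: List[float] = []
    log_m: List[float] = []
    for r in radii:
        m = mass_within(points, (cx, cy), r)
        if m > 0:
            log_r.append(math.log(r))
            log_m.append(math.log(m))
    if len(log_r) < 2:
        raise ValueError("need at least two radii that enclose words")

    # Ordinary least-squares line through the log-log points.
    mean_x = sum(log_r) / len(log_r)
    mean_y = sum(log_m) / len(log_m)
    sxx = sum((x - mean_x) ** 2 for x in log_r)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_r, log_m))
    slope = sxy / sxx                      # the dimension d
    intercept = mean_y - slope * mean_x    # log k
    return slope, math.exp(intercept)


if __name__ == "__main__":
    # Words evenly spaced along a line: expect d near 1.
    line = [(float(i), 0.0) for i in range(-500, 501)]
    d, k = mass_radius_dimension(line, [10, 20, 40, 80, 160])
    print(f"line: d = {d:.3f}")
```

As a sanity check, words placed evenly along a line should give d close to 1, and words filling a square grid should give d close to 2; a real text diagram would fall somewhere between, depending on how it branches.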
For intermediate-length texts, Corbett found reasonable agreement with a power-law fit. Moreover, the simpler the text, the lower the dimension.
Because much of the structure of a text is dictated by grammatical rules, finding
no power-law dependence would not have been surprising. What can we deduce from the
presence of this scaling?
Jim Corbett's dimension of diagrammed sentences as a measure of linguistic
complexity.