Complexity Is Not Complicated: Graham Morehead at TEDxUMaine

This talk was given by Graham Morehead, Curator of The Pangeon, at TEDxUMaine 2014.

A man rots unjustly in jail, the sea urchins are dying, and Google Translate doesn't work.  Morehead ties together these three disjoint narratives using starlings.

His talk tackles the concept of a complex system.  Being complex is not the same as being complicated.  Even though no entity within a system can see the whole picture, undeniable large-scale patterns emerge from the system.  As we learn to describe this emergence mathematically, in terms of entropy, stability, and far-from-equilibrium dynamics, we will better understand systems like markets, ecologies, and life itself.

Pragmatic Labeling Based on Body Language


At present, the study of syntax and semantics has yielded solid results in computational linguistics, while pragmatics has received far less attention. The lack of paralinguistic research, i.e., research on body language, is especially conspicuous. Among the many definitions of body language, we will use two in this paper: first, the gestures, postures, and facial expressions that accompany speech or occur independently; second, the language used by deaf people, i.e., sign language.

Body language, in its first sense, plays an auxiliary role in language understanding but is also of vital importance for disambiguation. In its second sense, body language is the primary communication tool for deaf people. To remove communication barriers and help them become full members of society, scholars should pay more attention to sign language research.

The first step of natural language processing is formalization, which usually means tagging or labeling. This is no longer a challenge for text processing, but body language processing must deal with video, which is why little research has been conducted in this area of computational linguistics so far. To date, most body language research is highly descriptive and lacks formality, so it cannot be used directly in computational linguistics.

In this paper we use the methods of componential analysis to perform formal description and tagging of body language. Componential analysis splits the unit of interest (here, a gesture) into several components. For body language analysis these components include the part of the body involved in producing the gesture, the place, the direction of the motion, and so on. Using this method for body language processing has the following advantages:


  • Objectivity: the personal characteristics or cultural background of the signer cannot influence the analysis and description;
  • Adaptability: the components are relatively independent and uncorrelated, so if the existing components prove insufficient to describe some gestures, new ones can be added without affecting the rest. Conversely, if some components are redundant for a given application, they may be omitted without influencing the result (for instance, sign language tagging requires more components than body language tagging, because every movement is very detailed and must be described very precisely);
  • Formality: the research results can be presented as a binary tagset that can be used directly in application development;
  • Universality: componential analysis does not take the meaning of a gesture into consideration, so it can be used to process the body language and sign language of any country.
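As an illustration of how such a binary tagset might look in practice, each gesture can be encoded as concatenated one-hot blocks, one block per component. The component inventory below is invented for illustration; it is not the paper's actual set.

```python
# Hypothetical component inventory -- illustrative only, not the
# paper's actual tagset.
COMPONENTS = {
    "body_part": ["head", "hand", "arm", "torso"],
    "direction": ["up", "down", "left", "right"],
    "motion":    ["static", "single", "repeated"],
}

def tag(gesture):
    """Encode a gesture as a flat binary vector: one one-hot block
    per component, in a fixed component order."""
    bits = []
    for name, values in COMPONENTS.items():
        bits += [1 if gesture.get(name) == v else 0 for v in values]
    return bits

# A nod: head moving down, repeatedly.
nod = {"body_part": "head", "direction": "down", "motion": "repeated"}
print(tag(nod))  # [1,0,0,0, 0,1,0,0, 0,0,1]
```

Adding a new component simply appends another block to the vector, which is the adaptability property described above.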

This paper is not only an in-depth study of body language; it is also useful theoretical and reference material for further research and development. The results of the work have been used in several state and Tsinghua University projects.


The paper will be posted when possible. 


What is Language?


Language is beautiful and language is functional.

Language is poetry, song, and prose.

Language is the linear expression of ideas.

For us humans, language is a technology that goes back over one hundred thousand years.  With it I can extract a complex concept from my mind and thrust it into yours.  If I be an educator of sufficient skill, I can do so almost against your will.

By language we express love.  By language we teach our sons and daughters.  By it we meet many of our deepest needs, and those of our loved ones. 


Language is a game changer.  By it we stand on the shoulders of giants. If not for language, how could I have ever studied physics and mathematics?  How could I have absorbed so much in such a short time?  In just four years I went from Newton to Feynman.  Years later I caught up on string theory and M-theory with the help of a few Brian Greene books.  Without words I would have never learned so much so quickly.

Language is a scaffold.  Basic concepts form a foundation.  Upon them the building of a mature language finds its footing.  On top, the most abstract and nuanced topics are represented.  Only the most ambitious lifelong learners will seek them out and understand them.

Language is a palette.  Literal words are primary colors.  We mix them infinitely.  Poetry lives in the connotations, the overtones, the improbable juxtaposition of secondary and tertiary meanings of words.

To the computer scientist, a language is defined by an alphabet of symbols and a grammar (this is the layman's version).  Some symbols are replaceable via rules in said grammar.  The grammar is where the magic happens: it defines all possible replacements.  Every sentence begins with the start symbol, the first of these replaceable symbols.  To generate an English sentence, you might replace the start symbol with '<subject> <verb> <object>', and you could replace each of these (<subject> -> 'the man', <verb> -> 'bit', <object> -> 'the dog') to generate: 'The man bit the dog'.  A sufficiently rich English grammar should contain enough rules to generate any grammatically correct sentence you've ever heard.  A finite alphabet + a finite grammar = an infinite language.
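That replacement process can be sketched as a tiny recursive generator. The grammar and vocabulary here are toy examples of my own, not a serious model of English:

```python
import random

# A toy context-free grammar. Uppercase strings are replaceable
# symbols; anything without a rule is a terminal and is emitted as-is.
GRAMMAR = {
    "S":    [["SUBJ", "VERB", "OBJ"]],
    "SUBJ": [["the man"], ["the dog"]],
    "VERB": [["bit"], ["saw"]],
    "OBJ":  [["the man"], ["the dog"]],
}

def generate(symbol="S"):
    """Expand a symbol by repeatedly applying replacement rules,
    starting from the start symbol S."""
    if symbol not in GRAMMAR:
        return symbol                      # terminal word
    expansion = random.choice(GRAMMAR[symbol])
    return " ".join(generate(s) for s in expansion)

print(generate())  # e.g. 'the man bit the dog'
```

Four rules, eight words, and already the generator can produce sentences nobody wrote down explicitly; scale the rule set up and you get the infinite language from the finite grammar.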

Languages come in different flavors, but more importantly, they come in different complexity classes.  A mathematical complexity class tells you something about how free or constrained a given language is.  Some people believe that all human languages are equally complex.  Others believe that a few are slightly more complex (e.g., Swiss German, Bambara).  Even those who think they're all the same cannot agree on the correct complexity class (context-free? or context-sensitive?).

Language is a distillation of the human mind.  It is at once my intellectual love and how I express my love to other intellects.  With language I connect with the nobility of other human souls.  Even the great Archimedes, who proved so many things by pure geometry, exhaled his last expression with words, "μή μου τοὺς κύκλους τάραττε!"  ("Don't mess with my circles!")


As for language, I intend to use it, and abuse it, all the rest of my days.

Fractal Writing Style

It is easiest to read and absorb materials that are structured fractally.  The general idea should be in the first paragraph.  The main points should be summarized in the headings.  Each subpoint should be the first sentence of a paragraph.  Specific ideas and examples should inhabit those paragraphs.  Much of nature around us is structured fractally, including the brain itself.  We have evolved to absorb fractals and remember them.  Fractally structured reading materials best facilitate efficient information transference (i.e., they’re easy to read).


A fern is a perfect example of a fractal.  Think of a document as a fern.  First consider the overall shape of the fern; then notice its many leaves coming off the main stem, then notice each subleaf coming off of each leaf, ad infinitum…  Most of us are satisfied after noticing only two or three levels of detail of a fern.  Similarly, we are often satisfied after absorbing two or three levels of detail from a document.  A reader should be able to absorb as many levels as desired without weeding through details.


When you go deeper in a fractal you get more detail.  Back in the days of dial-up many photos were in a format called Progressive JPEG.  A picture in this format appears on the screen very quickly, but in low resolution.  The pixels are so big at first that sometimes you can’t tell what it is.  After a few seconds (in the old slow days), the resolution would seem to double, and a little while after that it would double again, getting clearer and clearer, until the entire image had downloaded.  Pictures in this format are not exactly fractals, but they share the important property of progressive levels of detail.

Suppose you have a question, and the answer you need is represented by a branch on a tree.  You start at the trunk.  As you move up the tree the trunk splits into several smaller trunks.  You choose the one closest to your desired answer.  You keep moving to successively smaller branches until you find your answer.   Each successive choice provides an answer with greater and greater detail.  From an Information-theoretic perspective, if any single branch can only split into two branches, each split represents one extra bit.
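That one-bit-per-split observation is just a base-2 logarithm, which a minimal sketch makes concrete:

```python
import math

def bits_to_reach(num_leaves):
    """Bits of information needed to single out one branch among
    num_leaves, assuming every split is binary: each choice between
    two branches contributes exactly one bit."""
    return math.log2(num_leaves)

print(bits_to_reach(8))  # 3 binary splits -> 3.0 bits
```

Eight possible answers means three yes/no choices on the way up the tree; a document with more levels of detail simply carries more bits per descent.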



Is language the stuff of thought?  Are thoughts defined in terms of language, or is it the other way around?  The answer bears on whether the brain and mind are themselves fractal.

The Sapir-Whorf Hypothesis purports that language determines, or at least influences, thought.  If you don’t have a word for something, you can obviously still think about it, but maybe that thought is only possible in your mind because of the structure built through learning and using language.  Among the earliest to support this idea were Lenneberg and Brown.  In a 1954 paper they put it beautifully:

Language is not a cloak following the contours of thought. Languages are molds into which infant minds are poured.

They believed that “Namable categories are nearer to the top of the cognitive deck,” i.e., having a name for something makes it easier to remember and distinguish from similar things.  After many experiments, however, it came to be commonly thought that the Sapir-Whorf Hypothesis was wrong, or at best true only in its weak form.


There may, however, be evidence that strong Sapir-Whorf is plausible.  Consider ‘speakers’ of Nicaraguan Sign Language (NSL).  In the 1970s, a cluster of deaf, language-less children in Nicaragua were brought together by the creation of a new school.  The school didn’t teach them sign language, or any language, so they created one out of nothing.  Fascinated, Judy Shepard-Kegl decided to study them and attempt to interpret.  At one point she gave these students a simple test.  Imagine a boy who puts his toy under his bed and tells his little brother not to touch it.  After he leaves the room, the little brother takes the toy and hides it in the closet.  Upon returning, where will the big brother look for his toy?  Under the bed, obviously, because he still believes it’s there.  English speakers pass this test with ease before the age of six.  These students failed it, even as adults.

Not having the right words may have impaired these signers.  NSL was not as descriptive as other languages; for instance, it had only one word for anything to do with thought (e.g., believe, think, remember, understand).  Over the years, as younger deaf students came to the same school, NSL evolved greater complexity.  Younger signers had many words pertaining to thought.  There is now a separate word for “I know something you don’t” and for “I know something you do know.”  It turns out that these younger students passed the same test their elders had failed.  Even more surprising, when the tests were repeated just a few years later, the older students had improved greatly.  The younger students had graduated and were hanging out at the local deaf community center, bringing their richer dialect with them.  It is possible that a more nuanced lexicon enables greater understanding.



If language and thought are tightly coupled, then perhaps thought is Context-Free.  “Context-Free” (CF) refers to one level in Noam Chomsky’s hierarchy of languages, a formal mathematical categorization of all possible types of languages.  In short, CF languages are those in which any string can be generated by replacing one symbol with one or more other symbols.  The fact that only one symbol is considered in each replacement, regardless of its surroundings, is why it’s called Context-Free.  For instance, one could start with the following skeleton of an English sentence: S V O (i.e., Subject Verb Object).  Each symbol could be replaced by English-appropriate rules and the sentence would still be English (e.g., S→I, V→love, O→you).  The sentence becomes “I love you.”

This topic deserves so much more detail, but the point here is that a finite set of simple rules can generate any English sentence.  There has been some debate over whether Dutch, Swiss German, and Bambara are CF languages, but there are ways around this argument (if you really want to know: I think any string A^n B^n A^n can be generated by a CF grammar if we know the limit on n).  Applying a replacement rule to a symbol is much like looking at a fern frond and generating the next level of detail.



Josh Tenenbaum has human learning all figured out.  Without relying on linguistic examples he shows how people organize their knowledge in CF forms.  He says that humans learn by example, but for a long time it has been believed that we learn too quickly.  No one could explain how we learn to speak so well after being exposed to so little (by the age of three no human has been exposed to enough language for any of our computer models to generate correct language).  There must be some inductive bias.  He makes the case that the only inductive bias needed in order to explain human learning is this:

All knowledge is stored in forms generated by a single Context-Free Graph Grammar (CFGG).

For instance, the ancients classified all living things in a linear Great Chain of Being.  Now we use the Tree of Life.  Both chains and trees can be generated by a CFGG.  Instead of replacing a symbol by other symbols, consider replacing a node in a tree with a node that has two branch nodes.  Using replacement rules one node can become either a chain or a tree.  Tenenbaum identified other structures as well.
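As a sketch of that idea (the rules below are my own illustration, not Tenenbaum's actual grammar), a single node-replacement rule per structural form is enough to grow either a chain or a tree:

```python
def grow_chain(depth):
    """Chain rule: replace a leaf node with a node plus ONE child."""
    node = {"children": []}
    if depth > 0:
        node["children"].append(grow_chain(depth - 1))
    return node

def grow_tree(depth):
    """Tree rule: replace a leaf node with a node plus TWO children."""
    node = {"children": []}
    if depth > 0:
        node["children"] = [grow_tree(depth - 1), grow_tree(depth - 1)]
    return node

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node["children"])

print(count_nodes(grow_chain(3)))  # 4: a Great-Chain-style chain
print(count_nodes(grow_tree(3)))   # 15: a full binary Tree of Life
```

Same replacement machinery, different rule, different form: that is the sense in which one grammar can generate both the ancients' chain and our tree.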

CFGGs can be further generalized to include all forms.  Such an idea has a clear application to fractals.  Let a form (any object) be considered at a limited level of detail.  Greater detail can be generated by applying a replacement rule to a chosen subsection of the form.  For example, imagine a fern with a single stem and some leaves.  For greater detail, choose a single leaf and replace it with another stem with leaves.  Get the picture?
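That leaf-for-stem replacement is exactly a string-rewriting (L-system) step. Here is a minimal sketch with symbols of my own choosing: S for a stem, L for a leaf, brackets marking a branch.

```python
# One replacement rule: a leaf becomes a stem bearing a branch leaf
# and a tip leaf. Everything else is copied unchanged.
RULE = {"L": "S[L]L"}

def refine(form, levels):
    """Apply the rewrite rule to every symbol, once per level of detail."""
    for _ in range(levels):
        form = "".join(RULE.get(sym, sym) for sym in form)
    return form

fern = "S[L]L"
print(refine(fern, 1))  # S[S[L]L]S[L]L -- every leaf sprouted its own stem
```

Each pass adds one level of detail everywhere at once; rewriting only a chosen leaf instead gives the reader-driven, zoom-where-you-like refinement described above.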




If our brains are wired to understand in fractals, and much of nature is fractal, perhaps the best way to communicate is fractally.  Language itself may be fractal, but just using language isn’t enough.  Many documents and papers are dense and not prepared for efficient communication to a heterogeneous set of readers.  Those who want all the details can read every word.  Those who want the gist should be able to glean it in a few seconds.  Finally, readers who want something between the gist and every detail should be able to get exactly what they want by naturally browsing the document in greater and greater detail.  You have failed to write fractally when a reader must wade through copious details just to absorb a medium-level understanding.  I hope I haven’t done that to you today.

As a final example, consider the archetypal newspaper article.  It has a headline, byline, gist, detail, and more detail.  A reader may stop reading whenever his or her particular desire for detail is sated.