The Problems With Protos

Hello Language Nerd,

A friend of mine recently bought a book that claims to list about two dozen fundamental words from the “first language,” when humans were still one small group speaking together. It says that all languages have words descended from these originals. My friend thinks it’s great, but I’m not convinced – didn’t human language begin about 50,000 years ago? How could anyone possibly reconstruct words from that far in the past?

-Andres G.

***

Dear Andres,

Your suspicions are well-founded! Have you considered taking up historical linguistics? (Corollary: sorry, friend-of-Andres, you have likely been hornswoggled.) We go through phases where these “global etymologies” become popular, and it has a heartwarming side effect: those cranky linguists, who argue so often, come together as one to trash them.

We tooootally all speak as one, man. (Photo credit: Wikipedia)

It’s tempting to try and build a global etymology. It would be cool, and make us all feel very interconnected and whatnot. And there’s a good chance that at one point all the humans on earth, all fifteen of them, did speak the same language (though the contrary idea that language showed up in many different groups also has its backers). The problem is that 50,000 years is a loooooooong time, and languages change rapidly. Our minds tend to boggle when we get into big numbers, so to put it in perspective, a mere 7,000 years ago English, Punjabi, Pashto, and Greek were all one proto-language (see very cool chart). Can you speak Pashto based on this shared past? Odds are low. Can we reconstruct any of the words that are in both our languages’ pasts? Yes, a few, but it’s very difficult, and the answers we come up with can be tenuous. This is why linguists work on reconstructing what’s called Proto-Indo-European: it is a difficult but possible task with the tools we have now. Proto-World, on the other hand, is about seven times as far back, which takes the task sadly outside the realm of possibility.

So this long time frame is the real problem. The proto-problem, if you will. A second problem, which is caused by (many, mostly amateur) would-be etymologizers trying to deal with the first problem, is this: astonishingly terrible methodology. Just… just amazing.

Lyle Campbell has written a paper titled The Ultimate Wreckage of Global Etymologists (ok, not really, but it should be), which is very readable and had me in stitches,* so do take a peek. It’s aimed at the two biggest names in the biz, Bengtson and Ruhlen, but it also outlines the basic problem with the methodology of most global etymologists (hint: they don’t have one). Let’s break it into three flavors of badness, like Neapolitan shame.

1. They decide what sounds count as “close enough” based on nothing at all. So B&R are trying to prove that in Ye Verrie Firste Language “kuna” meant “woman,” by showing that lots and lots of languages today have words similar to “kuna” that mean “woman.” How close does a word need to be to “kuna” to be accepted? Decide for yourself what’s acceptable from this list:

kwn-a
gene
hanoko
qinitu
wana
aganak
chana-da
en-okhono
hoonigi
atsia-xnis

Alright, which ones are you keeping? If you’re Bengtson and Ruhlen, all of them, and plenty more besides (pgs 306-307). And there’s no explanation for how they decided these words were based on one root. The words all look kinda similar, so why not?

2. What words have “close enough” meanings is also based on nothing. If you thought all the words in B&R’s list meant specifically “woman,” uh… nope. They range from “wife” to “mother” to “girl” to “spirit of a dead woman” to “queen.” The meanings for words in their list for “hole” (pgs 301-302) include “nostrils,” “grave,” “rear of army,” “back part,” “incision,” “shoulder,” “window,” “backwards,” “to tickle,” and somehow “to tickle a tired pig to make it go.” (I know you’re wondering – the last word is “kilikili,” from the Nggela language, though I haven’t been able to find any sources for this translation other than Ruhlen.)

3. There’s no attempt to remove words that are coincidentally similar. This is the killer, which the other two feed into. There are, well, lots of words out there in the world, and you can find plenty of words that are similar just by browsing around. As you accept more and more variations on the sounds and the meanings, your chances of (bad-)lucking into similar words skyrocket. Bengtson and Ruhlen refer to the “truly miniscule probability of accidental similarities” (pg 281) in their works. This paper, on the other hand, maths it up and finds that with their methodology there is a 100% chance of false similarities entering the data pool. Yes. 100. It is that bad. And this paper is not nearly as ball-busting as Campbell.

One of the frequent cries on the side of the global etymologists is, “But our methods give so MANY words!” or more fancily, “…the cumulative weight of all the evidence completely swamps whatever random errors may be scattered through the work” (pg 290). Mark Rosenfelder responded to this with a quote that is going on my wall: “A bad methodology doesn’t become more respectable just by repeating it.” Rosenfelder does his own excellent deconstruction of the global etymology business here, and calculates how many chance similarities you can expect between languages here (it’s a lot).
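If you'd like to poke at the arithmetic yourself, here is a minimal back-of-the-envelope sketch. It is not Rosenfelder's calculation or the one in the combinatorics paper, just a toy model of the same idea. Everything in it is an assumption made up for illustration: `p_sound` (the chance a random word "sounds close enough"), `n_meanings` (how many meanings you accept as "close enough"), `words_per_meaning` (candidate words a language offers per meaning), and `n_languages` (a round number, not a real count). It also assumes every candidate word matches independently, which real vocabularies don't.

```python
def p_at_least_one(p_sound, n_candidates):
    """Chance that at least one of n_candidates independent words
    matches by accident, given a per-word chance of p_sound."""
    return 1 - (1 - p_sound) ** n_candidates


def expected_false_hits(p_sound, n_meanings, words_per_meaning, n_languages):
    """Expected number of languages that turn up a purely accidental
    look-alike under the toy model described above."""
    # Every accepted meaning adds more words that get a shot at matching.
    n_candidates = n_meanings * words_per_meaning
    return n_languages * p_at_least_one(p_sound, n_candidates)


# Hypothetical "strict" search: tight sound match, one meaning only.
strict = expected_false_hits(
    p_sound=0.001, n_meanings=1, words_per_meaning=2, n_languages=5000
)

# Hypothetical "loose" search: fuzzy sound match, ten meanings accepted.
loose = expected_false_hits(
    p_sound=0.02, n_meanings=10, words_per_meaning=2, n_languages=5000
)

print(f"strict criteria: about {strict:.0f} languages with a chance match")
print(f"loose criteria:  about {loose:.0f} languages with a chance match")
```

The exact figures mean nothing, since the inputs are invented. The shape is the point: loosening the sound criteria and the meaning criteria each multiplies the number of accidental matches, so a bigger pile of "evidence" is exactly what you'd expect from a looser net.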

Ironically, words that are actually historically related tend to look pretty different from each other, because again, languages change pretty fast. Here are some Pashto words that derive from the same roots as English words: sifer, yaw, dwa, dray, celour, penza, shpeg, owa, ata, naha, las. Any ideas?

…

They’re the numbers zero through ten.

So what, you’re asking (aren’t you?), are B&R’s words too different or too similar? Well, their whole concept is that similar-looking words come from a similar root, and that’s trouble in two places: they don’t have a good definition of what words count as the same, so by their own logic they’re too different; and actual cognates change over time, so by science-logic they’re too similar. Most important is that there’s no attempt in lists like these to check the histories of the words they find and make sure there’s some relation. Any global etymologist who casts a wide enough net to connect “shpeg” and “six” is going to catch far, far more coincidences.

Yes, the rumors are true: The Language Nerd was once an incredibly cute baby. (Photo credit: my dad, I think)

Now despite all this stuff I’ve been saying, there are some ways for words to make their way into a wide swath of the world’s languages. Most obvious, perhaps, is that in our exciting interconnected modern society new words can travel quickly around the globe – most languages have a recognizable form of the word “computer,” for example, and “coca-cola” and “okay” have impressive spheres of influence too. Less obvious and to me more interesting, some words have a tendency to get themselves made, regardless of the language.

How does this happen? One word: babies. Very tiny babies trying to figure out how their mouths work tend to start with the lips, and well they should – lips are hilarious. When they first start babbling, they make lip-centric sounds like “mama,” “baba,” and “papa.” Nearby adults tend to be both excited and a little vain, so they decide that the baby is talking about them, and these sounds end up as words for “mom” and “dad” in a huge swath of languages. Not all languages – some mix it up, like Japanese, where they figure that baby’s first “mama” probably means “food” – but plenty.

So clearly, the group that best understands universal language… sang California Dreamin’.

Yours,

The Language Nerd

*Yes, really. What?

Got a language question? Ask the Language Nerd! asktheleagueofnerds@gmail.com
Or: Twitter @AskTheLeague / facebook.com/asktheleagueofnerds

References today include Lyle Campbell’s paper, which is actually titled “What can we learn about the earliest human languages by comparing languages known today?” Dude, I can totally help you next time you need to title something.

Or I guess Campbell could learn the Art of the Title Slam from Boe, Bessiere, Ladjili, and Audibert, who wrote “Simple combinatorial considerations challenge Ruhlen’s mother tongue theory.”

If I had realized how great a job Rosenfelder did discussing this topic, I might not have bothered writing about it myself. But I found him pretty late in the game. Fortunately?

I don’t usually bother citing page numbers, because most of the stuff I link to is pretty short, but the Bengtson and Ruhlen paper is quite long, so I got your back.

Pashto numbers from here

The Wikinator has a great list of “mama,” “papa,” and similar words across languages.

WAIT A MINUTE Mark “Sick Burnzz” Rosenfelder is the same guy who made the Language Construction Kit?! OH MAN HEY MARK WANNA BE PALS

5 Comments

1 Ping/Trackback

get more info

May 10, 2013

With having so much written content do you ever run into any problems of plagiarism or copyright violation?
My site has a lot of completely unique content I’ve either created myself or outsourced but it looks like a lot of it is popping up all over the internet without my agreement. Do you know any techniques to help reduce content from being stolen? I’d really appreciate it.
- The League of Nerds
  
  May 12, 2013
  
  I haven’t noticed any plagiarism yet. This site is still pretty small, so it may be more of a problem in the future. If you learn any useful techniques, let me know! I’ll tell you if it starts happening, and if so, what we do about it.
collin237

June 3, 2013

Campbell’s article gives examples of similar but unrelated words in different languages. These demolish any theory of words separating from a knowable source. However, they suggest a different phenomenon of words converging. It may be that everyone has an innate tendency to prefer some sound sequences over others for meaning certain things. The sound similarities between distant languages might be destiny, not history.
- The League of Nerds
  
  June 4, 2013
  
  There are definitely onomatopoeic similarities across languages! Campbell goes into this on page 10 of the paper I linked to above, mentioning that words like “blow” and “choke” are natural candidates for words formed by imitation. Another good example is birds’ names — when they’re imitations of the birds’ cries (“crow,” “cuckoo”), they end up being very similar in different languages. Exactly how far all this extends is still debatable at the moment. Some subscribe to a much larger, grander version, which posits meanings for all sorts of sounds, like “i” and “e” signifying smallness (tiny, itty-bitty, mini) and “a” and “o” leaning towards bigness (enormous, large). But “small” and “big” are pretty serious counter-examples there, and I’ve never seen that version of imitative-word theory convincingly applied across languages. If you have, send me a link!
  
  I thought about going into all this when I wrote the original post, but man, it was already really long. Maybe I’ll do a follow-up.
The League of Nerds – A Glut of Glots

July 6, 2014

[…] ancient languages to reconstruct even more ancient ones, like Proto-Indo-European (but Proto-World, not so much). Some researchers try to piece together languages that have been lost to time, like Linear A or […]
