Hello Language Nerd,
A friend of mine recently bought a book that claims to list about two dozen fundamental words from the “first language,” when humans were still one small group speaking together. It says that all languages have words descended from these originals. My friend thinks it’s great, but I’m not convinced – didn’t human language begin about 50,000 years ago? How could anyone possibly reconstruct words from that far in the past?
Your suspicions are well-founded! Have you considered taking up historical linguistics? (Corollary: sorry, friend-of-Andres, you have likely been hornswoggled.) We go through phases where these “global etymologies” become popular, and it has a heartwarming side effect: those cranky linguists, who argue so often, come together as one to trash them.
It’s tempting to try and build a global etymology. It would be cool, and make us all feel very interconnected and whatnot. And there’s a good chance that at one point all the humans on earth, all fifteen of them, did speak the same language (though the contrary idea that language showed up in many different groups also has its backers). The problem is that 50,000 years is a loooooooong time, and languages change rapidly. Our minds tend to boggle when we get into big numbers, so to put it in perspective, a mere 7,000 years ago English, Punjabi, Pashto, and Greek were all one proto-language (see very cool chart). Can you speak Pashto based on this shared past? Odds are low. Can we reconstruct any of the words that are in both our languages’ pasts? Yes, a few, but it’s very difficult, and the answers we come up with can be tenuous. This is why linguists work on reconstructing what’s called Proto-Indo-European: it is a difficult but possible task with the tools we have now. Proto-World, on the other hand, is seven times as far back, which takes the task sadly outside the realm of possibility.
So this long time frame is the real problem. The proto-problem, if you will. A second problem, which is caused by (many, mostly amateur) would-be etymologizers trying to deal with the first problem, is this: astonishingly terrible methodology. Just… just amazing.
Lyle Campbell has written a paper titled The Ultimate Wreckage of Global Etymologists (ok, not really, but it should be), which is very readable and had me in stitches,* so do take a peek. It’s aimed at the two biggest names in the biz, Bengtson and Ruhlen, but he outlines the basic problem with the methodology of most global etymologists (hint: they don’t have one). Let’s break it into three flavors of badness, like Neapolitan shame.
1. They decide what sounds count as “close enough” based on nothing at all. So B&R are trying to prove that in Ye Verrie Firste Language “kuna” meant “woman,” by showing that lots and lots of languages today have words similar to “kuna” that mean “woman.” How close does a word need to be to “kuna” to be accepted? Decide for yourself what’s acceptable from this list:
Alright, which ones are you keeping? If you’re Bengtson and Ruhlen, all of them, and plenty more besides (pgs 306-307). And there’s no explanation for how they decided these words were based on one root. The words all look kinda similar, so why not?
2. What words have “close enough” meanings is also based on nothing. If you thought all the words in B&R’s list meant specifically “woman,” uh… nope. They range from “wife” to “mother” to “girl” to “spirit of a dead woman” to “queen.” The meanings for words in their list for “hole” (pgs 301-302) include “nostrils,” “grave,” “rear of army,” “back part,” “incision,” “shoulder,” “window,” “backwards,” “to tickle,” and somehow “to tickle a tired pig to make it go.” (I know you’re wondering – the last word is “kilikili,” from the Nggela language, though I haven’t been able to find any sources for this translation other than Ruhlen.)
3. There’s no attempt to remove words that are coincidentally similar. This is the killer, which the other two feed into. There are, well, lots of words out there in the world, and you can find plenty of words that are similar just by browsing around. As you accept more and more variations on the sounds and the meanings, your chances of (bad-)lucking into similar words skyrocket. Bengtson and Ruhlen refer to the “truly miniscule probability of accidental similarities” (pg 281) in their works. This paper, on the other hand, maths it up and finds that with their methodology there is a 100% chance of false similarities entering the data pool. Yes. 100. It is that bad. And this paper is not nearly as ball-busting as Campbell.
One of the frequent cries on the side of the global etymologists is, “But our methods give so MANY words!” or more fancily, “…the cumulative weight of all the evidence completely swamps whatever random errors may be scattered through the work” (pg 290). Mark Rosenfelder responded to this with a quote that is going on my wall: “A bad methodology doesn’t become more respectable just by repeating it.” Rosenfelder does his own excellent deconstruction of the global etymology business here, and calculates how many chance similarities you can expect between languages here (it’s a lot).
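If you'd like to see why loosening the rules is so deadly, here is a rough back-of-the-envelope sketch in Python. To be clear, this is not the calculation from either paper or from Rosenfelder; it is a toy model, and every number in it (the per-word odds of a sound match, the count of accepted meanings, the 5,000 languages) is a made-up assumption for illustration. It treats each candidate word as an independent coin flip, which real vocabularies are not.

```python
# Toy model of chance resemblances. All parameters are hypothetical
# illustrations, not figures taken from any of the papers discussed.

def chance_match_in_language(p_sound: float, n_meanings: int) -> float:
    """Probability that one language has at least one word that 'matches'
    the target by pure chance.

    p_sound    -- assumed chance that a random word sounds 'close enough'
    n_meanings -- how many different meanings we agree to accept
                  (one candidate word per meaning, treated as independent)
    """
    return 1.0 - (1.0 - p_sound) ** n_meanings


def chance_match_somewhere(p_language: float, n_languages: int) -> float:
    """Probability that at least one of n_languages has a chance match."""
    return 1.0 - (1.0 - p_language) ** n_languages


# Strict rules: tight sound criteria, exactly one accepted meaning.
strict = chance_match_in_language(p_sound=0.01, n_meanings=1)

# Loose rules: sloppier sound criteria, ten accepted meanings
# ("woman", "wife", "mother", "girl", "queen", ...).
loose = chance_match_in_language(p_sound=0.05, n_meanings=10)

n_languages = 5000  # assumed round number of languages to browse through

print(f"strict rules: {strict:.1%} per language, "
      f"about {strict * n_languages:.0f} chance matches expected")
print(f"loose rules:  {loose:.1%} per language, "
      f"about {loose * n_languages:.0f} chance matches expected")
print(f"odds of at least one false match somewhere (strict): "
      f"{chance_match_somewhere(strict, n_languages):.6f}")
```

Even under the strict made-up settings, a false match somewhere in the pile is a near certainty, and each extra bit of leeway on sound or meaning multiplies the number of coincidences you should expect. The exact figures depend entirely on the assumed inputs; the shape of the problem does not.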
Ironically, words that are actually historically related tend to look pretty different from each other, because again, languages change pretty fast. Here are some Pashto words that are derived from the same roots as English words: sifer, yaw, dwa, dray, celour, penza, shpeg, owa, ata, naha, las. Any ideas?
They’re the numbers zero through ten.
So what, you’re asking (aren’t you?), are B&R’s words too different or too similar? Well, their whole concept is that similar-looking words come from a similar root, and that’s trouble in two places: they don’t have a good definition of what words count as the same, so by their own logic they’re too different; and actual cognates change over time, so by science-logic they’re too similar. Most important is that there’s no attempt in lists like these to check the histories of the words they find and make sure there’s some relation. Any global etymologist who casts a wide enough net to connect “shpeg” and “six” is going to catch far, far more coincidences.
Now despite all this stuff I’ve been saying, there are some ways for words to make their way into a wide swath of the world’s languages. Most obvious, perhaps, is that in our exciting interconnected modern society new words can travel quickly around the globe – most languages have a recognizable form of the word “computer,” for example, and “coca-cola” and “okay” have impressive spheres of influence too. Less obvious and to me more interesting, some words have a tendency to get themselves made, regardless of the language.
How does this happen? One word: babies. Very tiny babies trying to figure out how their mouths work tend to start with the lips, and well they should – lips are hilarious. When they first start babbling, they make lip-centric sounds like “mama,” “baba,” and “papa.” Nearby adults tend to be both excited and a little vain, so they decide that the baby is talking about them, and these sounds end up as words for “mom” and “dad” in a huge swath of languages. Not all languages – some mix it up, like Japanese, where they figure that baby’s first “mama” probably means “food” – but plenty.
So clearly, the group that best understands universal language… sang California Dreamin’.
The Language Nerd
*Yes, really. What?
Got a language question? Ask the Language Nerd! email@example.com
Or: Twitter @AskTheLeague / facebook.com/asktheleagueofnerds
References today include Lyle Campbell’s paper, which is actually titled “What can we learn about the earliest human languages by comparing languages known today?” Dude I can totally help you next time you need to title something.
Or I guess Campbell could learn the Art of the Title Slam from Boe, Bessiere, Ladjili, and Audibert, who wrote “Simple combinatorial considerations challenge Ruhlen’s mother tongue theory.”
I don’t usually bother citing page numbers, because most of the stuff I link to is pretty short, but the Bengtson and Ruhlen paper is quite long, so I got your back.
Pashto numbers from here
The Wikinator has a great list of “mama,” “papa,” and similar words across languages.
WAIT A MINUTE Mark “Sick Burnzz” Rosenfelder is the same guy who made the Language Construction Kit?! OH MAN HEY MARK WANNA BE PALS