Last week I was on the twitters talking about “untranslatable” words. The discussion was about Dr. Tim Lomas’ work on “untranslatable words”, his term for words in one language that don’t have an exact equivalent in another language (usually English). Right around the time I posted my blog post, Lomas wrote an article in Psychology Today. Let’s have a look at it. If you want to see my thoughts on “untranslatable” words, go read my post on it and then come back.
Lomas claims that many concepts are non-English in origin. What this means is that the words used to describe these concepts are from other languages. I think this is opening a whole can of worms, but I’m willing to go with the idea that concepts can be “from another language”. For a bit. Let’s move on.
To prove his point, Lomas analyzes an article on positive psychology by Seligman and Csikszentmihalyi (2000). He looks up the etymology of every word in the text.
According to Lomas, there are:
1333 distinct lexemes
‘Native’ English words – belonging either to the Germanic language from which English emerged, or originating as neologisms in English itself – comprise only 39.4% of the sample (and 38% of the psychological words). Thus, over 60% of the general words (and 62% of psychological words) are loanwords, borrowed from other languages at some point in the development of English.
First, Lomas has a strange definition for “‘native’ English words”. Which “Germanic language” does he mean? Proto-Germanic? One of the other West Germanic languages? Old English? It’s also strange because Lomas’ definition means that these words are not native English words: they, table, blue, and orange. [Britney Spears gif says “huh?!” Oprah gif says “hrmmm?!”]
Lomas also doesn’t say exactly how he counted the words in the C&S article. He says that there are 1,333 “distinct lexemes”. The term lexeme is used in linguistics to talk about all the inflected forms of a word: singular and plural forms for nouns, present and past tense forms for verbs, etc. So runner and runners would be a part of the same lexeme RUNNER, and run, runs, ran, running are a part of RUN. Lexemes are also sometimes called “lemmas” in linguistics.
If Lomas really went through every single word in the article, then he spent a whole lotta time on this. The C&S article is 8,124 words long (not including the References section). He doesn’t say how he did the work, but I used some corpus linguistics methods and got different results. I checked the C&S article against the Someya lemma list in AntConc and found 1,750 lemmas, or 417 more lexemes than Lomas found. This is a large difference and I’m not sure how to explain it. Maybe Lomas didn’t divide his words based on parts of speech? So he counted ran and runner as part of the same lexeme? I don’t know.
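For anyone who wants to try a lemma count themselves, here is roughly what that step looks like in Python. This is a minimal sketch, not what AntConc does internally: the tiny lemma list, the `lemma -> form1,form2` line layout and the function names are all my assumptions for illustration, and any word missing from the list is simply counted as its own lemma.

```python
import re
from collections import Counter

# Toy lemma list in an assumed "lemma -> form1,form2" layout (hypothetical data).
TOY_LEMMA_LIST = """\
run -> runs,ran,running
runner -> runners
be -> am,is,are,was,were,been,being
"""

def load_lemma_map(text):
    """Map every inflected form (and the lemma itself) to its lemma."""
    form_to_lemma = {}
    for line in text.splitlines():
        if "->" not in line:
            continue
        lemma, forms = line.split("->", 1)
        lemma = lemma.strip().lower()
        form_to_lemma[lemma] = lemma
        for form in forms.split(","):
            form = form.strip().lower()
            if form:
                form_to_lemma[form] = lemma
    return form_to_lemma

def count_lemmas(text, form_to_lemma):
    """Count occurrences per lemma; unknown words count as their own lemma."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(form_to_lemma.get(tok, tok) for tok in tokens)

lemma_map = load_lemma_map(TOY_LEMMA_LIST)
counts = count_lemmas("The runner ran. The runners were running.", lemma_map)
print(len(counts), "distinct lemmas")
print(counts.most_common())
```

Notice that ran and runner land in different lemmas here because the list keeps RUN and RUNNER apart. A list that merged them would give a smaller total, which is one way two people could count the same text and disagree.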
Second, let’s look at counting the words in language. Lomas seems to do a straight count. That means one instance of one form of a lexeme is equal to all the other instances. For Lomas, it doesn’t matter how many times a word occurs. In corpus linguistics, however, frequency is a big deal. I’m not going to go through the theoretical points here, but basically if a word is more frequent then it is more important or worthy of being looked at (hehe, fight me, corpus linguists).
So, Lomas claims that only 39% of the lexemes in the article are “native English words”. I took the lexemes in the article and ranked them by frequency (using AntConc). Then I went through the 100 most frequent lexemes on the list and looked at their etymology. My numbers look much different from Lomas’. I found that 85% of the occurrences of the 100 most frequent lexemes are English in origin. That is, the 100 most frequent lexemes occur a total of 4,440 times in the article (the lexeme the occurs 442 times, the lexeme of occurs 308 times, the lexeme BE occurs 300 times, and so on), and of these occurrences, 3,767 are English words. This isn’t particularly intriguing: you’ll probably find a similar percentage with any text in English. [See the bottom of this post for my data.]
Looking at this from another angle, we could treat each of the 100 most frequent lexemes as equal – forgetting about how often they occur. Then we find that 70 of them are English, while 30 of them come from another language. This is closer to Lomas’ numbers, but still pretty far off: 70 of the 100 most common lexemes in the article are still English words.
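The two ways of counting are easy to mix up, so here is a small Python sketch of the difference. The function name and the row layout are my own, and only the first three counts come from the article; the last two rows are made-up placeholders so the example has some non-English entries.

```python
def english_share(rows):
    """rows: (lexeme, occurrences, is_english) tuples.

    Returns (share of occurrences, share of lexemes) that are English.
    """
    total_tokens = sum(n for _, n, _ in rows)
    english_tokens = sum(n for _, n, eng in rows if eng)
    english_types = sum(1 for _, _, eng in rows if eng)
    return english_tokens / total_tokens, english_types / len(rows)

rows = [
    ("the", 442, True),           # count from the article
    ("of", 308, True),            # count from the article
    ("BE", 300, True),            # count from the article
    ("psychology", 60, False),    # hypothetical count
    ("positive", 50, False),      # hypothetical count
]

by_tokens, by_types = english_share(rows)
print(f"weighted by frequency: {by_tokens:.1%}")
print(f"each lexeme counted once: {by_types:.1%}")

# The totals reported in this post for the 100 most frequent lexemes:
print(f"{3767 / 4440:.1%} of occurrences, {70 / 100:.0%} of lexemes")
```

Frequent function words like the and of are nearly all English, so weighting by frequency pushes the English share up, while counting each lexeme once pulls it down.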
Of course, words in language do not really occur in the way that we’re looking at them. The most common word is the with 442 instances, but the first 442 words of the article are not all the. The word the is sprinkled around the article (you know, where the grammar of English calls for it). I’m not sure how to get to Lomas’ numbers. We could assume that every word outside the 100 most frequent lexemes is non-English, but that only gets us down to 46% of the words in the article being English (3,767 of 8,124). Lomas’ ratio was 40% English to 60% non-English.
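That worst-case figure is just one division, sketched here in Python with the totals from this post (the variable names are mine). It counts occurrences, not distinct lexemes, so it is not the same kind of number as Lomas’ 39.4%.

```python
total_words = 8124             # article length, excluding the References section
top100_english_tokens = 3767   # English occurrences among the 100 most frequent lexemes

# Worst case for English: every word outside the top 100 lexemes is a loanword.
lower_bound = top100_english_tokens / total_words
print(f"{lower_bound:.1%}")
```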
Later in the article, Lomas says that 234 words were treated as English in origin in his analysis. But this means that only about 18% of the words in his counting are English in origin (234/1,333 ≈ 0.176). What’s going on here? If 39.4% of the lexemes in the article are English in origin, and there are 1,333 total lexemes in the article (according to Lomas), then there should be about 525 English words. Where he gets 234, I don’t know. Let’s move on.
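The mismatch is quick to check. This Python sketch uses only the three figures Lomas reports; the variable names are mine.

```python
total_lexemes = 1333      # distinct lexemes, according to Lomas
reported_english = 234    # words Lomas says were treated as English in origin
reported_share = 0.394    # share of 'native' English words, according to Lomas

implied_share = reported_english / total_lexemes
implied_count = round(reported_share * total_lexemes)

print(f"234 of 1,333 is {implied_share:.1%}")
print(f"39.4% of 1,333 is about {implied_count} lexemes")
```

Whichever way you run it, the two reported figures don’t describe the same set of words.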
Lomas includes two graphs to visualize his findings, but they’re pretty weird. The graph below “shows the influx of words according to the language of origin (with the century in which they entered English as stacks within them)”. Look at the third column.
English words entered English? I don’t get it. Or Germanic words from before the 12th century are not English words? What’s going on here? I guess in Lomas’ counting, Germanic and English lexemes are English lexemes, but then he splits them up in the graph? Are the words me, myself and I not English words? It seems very strange to me to cut things up like this and I would like to see his list of etymologies, or his rationale for doing so.
Agree to disagree?
But there are places that I can agree with Lomas. At the end of the article, he writes:
In these ways does our understanding of life become complexified and enriched. In that respect, one can make the case that English-speaking psychology would do well to more consciously and actively engage with other languages and cultures. Its understanding of the mind has benefited greatly from English incorporating loanwords over the centuries. If one accepts that premise, it follows that psychology would continue to develop from this kind of cross-cultural engagement and borrowing – including, of course, through collaboration with scholars from non-English speaking cultures themselves. One such way in which the field might develop is through inquiring into untranslatable words, since these constitute clear candidates for borrowing (given that they lack an exact equivalent in English). I myself have sought to promote this kind of endeavor, with my ongoing creation of a cross-cultural lexicography of untranslatable words relating to well-being.
I definitely agree with the first part of this. We should engage with speakers of other languages and people from other cultures (although Lomas’ wording seems to present all English speakers as a monolithic culture). I find it hard to imagine anyone not accepting the premise that English (not just “English-speaking psychology”) has benefited greatly from incorporating loanwords. That’s kind of just a fact of language: borrowing words is one of the things that living languages do, and it’s part of why English is still a living language. And I totally agree that people should collaborate with people from different cultures (although again, Lomas’ wording blurs the distinction between language and culture too much for me and again presents English speakers as one culture).
When Lomas goes into the sales pitch in the second to last sentence, I can’t sign on, particularly based on what I’ve seen of his research into “untranslatable” words (in my last post and in this one and in a later one to come).
Lomas’ claims are true: we should reach out to people who speak other languages. But he should perhaps recognize that English has so many words from Latin and Ancient Greek because these were once prestigious languages (and to a large extent still are in academia). It wasn’t because the Latin-speaking or Greek-speaking cultures had anything more special than other cultures; rather, it was believed that using these languages made people more civilized. Of course, we know what happened to the Latin-speaking and (Ancient) Greek-speaking cultures. They dead.
We in English-speaking cultures could just as easily have adopted Finnish words to use in the fields of psychology and linguistics, but Finnish was never considered a prestigious language. Or consider German: once German raised its standing, we got words from German to describe abstract concepts because the texts describing them were written in German and people were supposed to know German to engage in the debate.
There’s more to say about all this and I’ll be back at cha with a later post. I’ll link to it when I write it.
Spreadsheet with my analysis. The first sheet is the Someya lemma list analysis. I counted words from Anglo-Norman as not being English. I’m including the 3rd person plural pronouns (they, them, their, themselves) as being English. Illness counts as English. The second sheet uses AntConc’s Word List tool, so it’s not a lexeme/lemma analysis, it treats every “word” as separate (that is, was, am, and is are separate words, not part of the lexeme BE).
Link to download the C&S article as a plain text file (.txt) which was used with AntConc in the analysis. The References section is excluded. And here’s a link to download a POS-tagged version of the article (using CLAWS7).