How Many Japanese Words do you need?
There are many ways to learn Japanese vocabulary: Duolingo, Memrise, dictionaries, websites, textbooks, and more. Remembering words is important, but that is not all; which words you remember matters too. So, which words should you learn? The obvious answer is the words that are used the most. Great! The next question is how many you should learn. Again, the obvious answer is at least enough to be fluent in Japanese.
From here it gets a bit harder. How many words are enough to be fluent in Japanese? Let’s take a look at this question and at what to expect when learning Japanese vocabulary.
How much Japanese Vocabulary do you need for Fluency?
This is a hard question to answer definitively, because being fluent takes more than just vocabulary. So, I will rephrase the question: how much vocabulary do you need to understand 98% of what you read? The answer is somewhere between 4,000 and 5,000 words.
Why 98%?
98% is something of a magic number: it is commonly cited as the coverage threshold for comfortable reading. If you understand 98% of the words in a text, you can read smoothly and use the context to fill in the gaps left by the words you don’t know.
Great! So does this mean that Japanese only has 5,000 words? No. The 4,000 to 5,000 most common words make up 98% of the words you encounter; all the other words together make up the remaining 2%. There is nothing odd about Japanese here; this is just how all languages work.
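To make the 98% figure concrete, here is a small Python sketch that turns a coverage percentage into the number of unknown words you would meet in a text of a given length. The helper `unknown_words` is a hypothetical name for illustration, and it simply counts unknown words without weighting them by importance.

```python
def unknown_words(total_words, coverage):
    """Number of running words you would not know in a text of
    `total_words` words, given text coverage as a fraction (0.98 = 98%)."""
    return round(total_words * (1 - coverage))

# Compare a few coverage levels for a 2,000-word text.
for coverage in (0.90, 0.95, 0.98):
    print(f"{coverage:.0%} coverage: {unknown_words(2_000, coverage)} unknown words per 2,000")
```

At 98% coverage, roughly one word in every 50 is unknown; at 90%, it is one word in every 10, which is far harder to fill in from context.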
What is Text Coverage?
Let’s imagine we want to read a text that is 2,000 words long. If we count each unique word only once, we get 200 words. If we knew all 200 of these words, we would understand 100% of the text. This is called text coverage: knowing every word gives us 100% text coverage. In other words, text coverage is the percentage of the running words in a text that you know.
In our case, we need to know 200 words for 100% text coverage. If every word were used equally often, each would appear 10 times. So, if we knew only 2 of those words, we would understand 1% of the text (20 of the 2,000 words), 20 words would give us 10%, and so on. But this is not how vocabulary is distributed in Japanese, or in any language.
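The calculation above can be sketched in Python, assuming a text has already been split into a list of words (`tokens`) and your vocabulary is a set of known words. The function `text_coverage` and the placeholder words are hypothetical names for illustration; the sketch rebuilds the idealized 2,000-word example in which each of 200 words is used exactly 10 times.

```python
def text_coverage(tokens, known_words):
    """Return the share (0.0 to 1.0) of running words in `tokens`
    that appear in the set `known_words`."""
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)

# Idealized example: 200 placeholder words, each used exactly 10 times.
vocab = [f"word{i}" for i in range(200)]
tokens = vocab * 10

print(len(tokens), len(set(tokens)))           # 2000 200
print(text_coverage(tokens, set(vocab[:2])))   # 0.01
print(text_coverage(tokens, set(vocab[:20])))  # 0.1
print(text_coverage(tokens, set(vocab)))       # 1.0
```

Because coverage counts running words rather than unique words, a word that appears many times contributes much more than a word that appears once.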
What are Frequency Lists?
Find a text in Japanese, count how many times each word appears, and then rank the words from most to least frequent. There, you have a frequency list. In our previous example, ranking the 200 unique words by how often they appear would give us a frequency list.
With a frequency list you know which words are most common. That allows you to prioritize the most common words over the least common ones.
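Here is a minimal Python sketch of building a frequency list with the standard library’s `collections.Counter`. Japanese is written without spaces between words, so a real text would first need to be split into words by a tokenizer (a morphological analyzer such as MeCab is one common choice). The sketch skips that step and uses a short, hypothetical, hand-segmented sample; `frequency_list` is an illustrative name.

```python
from collections import Counter

def frequency_list(tokens):
    """Rank unique words from most to least frequent as (word, count) pairs.
    Words with equal counts keep the order in which they first appeared."""
    return Counter(tokens).most_common()

# Hypothetical hand-segmented sample: 猫が魚を食べた / 犬が猫を見た
sample = "猫 が 魚 を 食べ た 犬 が 猫 を 見 た".split()

for rank, (word, count) in enumerate(frequency_list(sample), start=1):
    print(rank, word, count)
```

Even in this tiny sample, particles like が and を rise to the top alongside the repeated noun, which hints at why function words dominate real frequency lists.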
If you don’t want to make your own lists
If you don’t want to make your own frequency lists, that is OK. There are ready-made lists compiled from millions of words across many different sources. These lists may be based on spoken or written sources, and you can use them anytime. I have included a link to Wikipedia’s frequency lists in the Resources section at the bottom of the page.
You can also just get a small pocket dictionary. The emphasis here is on small. These dictionaries will have the most common words.
How are Words Distributed in a Language?
We know that 4,000 to 5,000 words will give you about 98% text coverage. But how many words do you need for, say, 80%, 20%, or even 10% text coverage? This pattern is described by Zipf’s law, which says that a word’s frequency is roughly inversely proportional to its rank in a frequency list. Here are approximate figures:
- Top 15 Words: 25%
- Top 100 Words: 60%
- Top 1,000 Words: 85%
- Top 4,000 Words: 97.5%
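Zipf’s law can be turned into a rough coverage calculation: if the word of rank r has frequency proportional to 1/r^s, the share of text covered by the top N words is the sum of 1/r^s up to N divided by the same sum over the whole vocabulary. The Python sketch below assumes an exponent of s = 1 and a vocabulary of 50,000 words, both chosen only for illustration. Real corpora follow Zipf’s law only approximately, so this idealized model will not reproduce the figures in the list exactly; it shows the same shape, with coverage rising quickly at first and then more and more slowly.

```python
def harmonic(n, s=1.0):
    """Generalized harmonic number: the sum of 1 / k**s for k = 1..n."""
    return sum(1.0 / k**s for k in range(1, n + 1))

def zipf_coverage(top_n, vocab_size, s=1.0):
    """Share of running text covered by the `top_n` most frequent words,
    assuming an ideal Zipf distribution (frequency of rank r ~ 1 / r**s)."""
    top_n = min(top_n, vocab_size)
    return harmonic(top_n, s) / harmonic(vocab_size, s)

VOCAB_SIZE = 50_000  # assumed vocabulary size, for illustration only

for n in (15, 100, 1_000, 4_000):
    print(f"top {n:>5,} words: {zipf_coverage(n, VOCAB_SIZE):.1%}")
```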
Here we have a case of diminishing returns: the more vocabulary you learn, the less often each new word is used. Even so, 1,000 or even 5,000 words are not enough.
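To put numbers on these diminishing returns, this Python sketch uses only the figures from the list above, plus the obvious starting point that knowing 0 words gives 0% coverage, and computes how many coverage points each additional word adds within each band. `gains_per_word` is a hypothetical helper name.

```python
# (top N words, % text coverage) from the list above;
# (0, 0.0) is the starting point: no words, no coverage.
coverage_table = [(0, 0.0), (15, 25.0), (100, 60.0), (1_000, 85.0), (4_000, 97.5)]

def gains_per_word(table):
    """For each band in the table, return (from_n, to_n, coverage points
    gained per additional word learned within that band)."""
    steps = []
    for (n0, c0), (n1, c1) in zip(table, table[1:]):
        steps.append((n0, n1, (c1 - c0) / (n1 - n0)))
    return steps

for n0, n1, gain in gains_per_word(coverage_table):
    print(f"words {n0 + 1:>5,}-{n1:>5,}: {gain:.4f} points per word")
```

Each of the first 15 words adds well over one percentage point of coverage, while each word between 1,001 and 4,000 adds only about 0.004 points.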
Great! So how does this help me?
The better you know a word, the less time it takes you to recognize it when you read or hear it. As you learn words, you stop listening to the individual sounds and start hearing words as whole units. Learning the top 5,000 words of Japanese will let you hear about 98% of what is said as whole words. This makes everything sound shorter. It also makes it easier to tell the words you know from the ones you don’t, so you can focus on what you don’t know as well, which in turn helps you understand everything better.
5,000 words are not enough but will help
5,000 words is a good goal for your reading and listening comprehension, and it brings you to roughly the 98% text coverage needed to read comfortably without a dictionary. But as stated above, vocabulary is not distributed evenly, so the remaining 2% is spread across thousands more words.
The Meaning Comes from the Uncommon Words
The most common words in Japanese are largely function words and other words that set the scene for a sentence. The real meaning, however, often comes from the less common words. If you learn only the top 5,000 words, you can become an expert at talking about meaningless things. You would be able to communicate about daily activities and get by, but your conversations would have no depth. If you want to have real conversations about the things you are interested in, you need the remaining 2%. These words make up the bulk of Japanese vocabulary and include all the words for more specific topics.
If you want to talk about anime, games, cars, sports, food, culture, computers, politics, economics, or whatever you are interested in, you need the 2%. This is where Japanese comes alive. Think of a time you were stuck in a room and had to talk with someone who shared none of your interests. Knowing only 5,000 words is like being a person with no interests at all. Pretty boring, right?
So, now you know the benefits and limitations of learning the most frequent words in Japanese.
Resources
- Laufer, Batia & Ravenhorst-Kalovski, Geke C. (2010). Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1).
- Frequency Lists (Wikipedia)
- Crystal, David (1997). The Cambridge Encyclopedia of Language, 2nd ed., p. 87 (The Statistical Structure of Language). Cambridge: Cambridge University Press.