A Frequency Dictionary of Japanese




A Frequency Dictionary of Japanese is an invaluable tool for all learners of Japanese, providing a list of the 5,000 most commonly used words in the language.

Based on combined corpora of over 107 million words covering spoken and written, fiction and non-fiction registers, this dictionary provides the user with a detailed frequency-based list, as well as alphabetical and part-of-speech indexes. All entries in the frequency list feature the English equivalent and a sample sentence with English translation. The dictionary also contains 25 thematically organized lists of frequently used words on a variety of topics such as food, weather, occupations and leisure. Numerous bar charts are also included to highlight the phonetic and spelling variants across register.

A Frequency Dictionary of Japanese enables students of all levels to maximize their study of Japanese vocabulary in an efficient and engaging way. It is also an excellent resource for teachers of the language.

Yukio Tono is Professor at the Graduate School of Global Studies, Tokyo University of Foreign Studies.

Makoto Yamazaki is Associate Professor at the Department of Corpus Studies, the National Institute for Japanese Language and Linguistics.

Kikuo Maekawa is Professor at the Department of Corpus Studies, the National Institute for Japanese Language and Linguistics.

Routledge Frequency Dictionaries

General Editors:
Paul Rayson, Lancaster University, UK
Mark Davies, Brigham Young University, USA

Editorial Board:
Michael Barlow, University of Auckland, New Zealand
Geoffrey Leech, Lancaster University, UK
Barbara Lewandowska-Tomaszczyk, University of Lodz, Poland
Josef Schmied, Chemnitz University of Technology, Germany
Andrew Wilson, Lancaster University, UK
Adam Kilgarriff, Lexicography MasterClass Ltd and University of Sussex, UK
Hongying Tao, University of California at Los Angeles, USA
Chris Tribble, King's College London, UK

Other books in the series:
A Frequency Dictionary of Arabic
A Frequency Dictionary of Mandarin Chinese
A Frequency Dictionary of Czech
A Frequency Dictionary of American English
A Frequency Dictionary of French
A Frequency Dictionary of German
A Frequency Dictionary of Portuguese
A Frequency Dictionary of Russian (forthcoming)
A Frequency Dictionary of Spanish

The Frequency Dictionaries are all available as data CDs. These CD versions are specifically designed for use by corpus and computational linguists. They provide the frequency corpus in a tab-delimited format, allowing users the flexibility to process the material for their own research purposes.

A Frequency Dictionary of Japanese
Core vocabulary for learners

Yukio Tono, Makoto Yamazaki and Kikuo Maekawa

Routledge, Taylor & Francis Group
London and New York

First published 2013
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Simultaneously published in the USA and Canada
by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2013 Yukio Tono, Makoto Yamazaki and Kikuo Maekawa

The right of Yukio Tono, Makoto Yamazaki and Kikuo Maekawa to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
Tono, Yukio.
A frequency dictionary of Japanese : core vocabulary for learners / Yukio Tono, Kikuo Maekawa and Makoto Yamazaki.
p. cm. (Routledge frequency dictionaries)
Includes bibliographical references and index.
1. Japanese language--Word frequency--Dictionaries. 2. Japanese language--Dictionaries. 3. Japanese language--Textbooks for foreign speakers--English. I. Maekawa, Kikuo. II. Yamazaki, Makoto. III. Title.
PL685.T593 2013
495.6'321--dc23
2012021445

ISBN: 978-0-415-61012-4 (hbk)
ISBN: 978-0-415-61013-1 (pbk)
ISBN: 978-0-415-60104-7 (CD)

Typeset in Parisine by Graphicraft Limited, Hong Kong

Contents

Thematic vocabulary lists
Series preface
Acknowledgments
Abbreviations
Introduction
References
Frequency index
Alphabetical index
Part of speech index
Word types (origins)

Thematic vocabulary lists

1 Animals
2 Body
3 Clothing
4 Colors
5 Countries
6 Emotions
7 Family
8 Food
9 Furniture
10 Greetings
11 House
12 Leisure
13 Occupations
14 Plants
15 School
16 Shops
17 Sports
18 Taste
19 Time
20 Transportation
21 Weather
22 Words including letters of the alphabet
23 o-/go- (Honorifics)
24 Honorific expressions
25 Numbers/numerals

Series preface

Frequency information has a central role to play in learning a language. Nation (1990) showed that the 4,000-5,000 most frequent words account for up to 95 per cent of a written text and the 1,000 most frequent words account for 85 per cent of speech. Although Nation's results were only for English, they do provide clear evidence that, when employing frequency as a general guide for vocabulary learning, it is possible to acquire a lexicon which will serve a learner well most of the time.

There are two caveats to bear in mind here. First, counting words is not as straightforward as it might seem. Gardner (2007) highlights the problems that multiple word meanings, the presence of multiword items, and grouping words into families or lemmas present in counting and analysing words. Second, frequency data contained in frequency dictionaries should never act as the only information source to guide a learner. Frequency information is nonetheless a very good starting point, and one which may produce rapid benefits. It therefore seems rational to prioritise learning the words that you are likely to hear and read most often. That is the philosophy behind this series of dictionaries.

Lists of words and their frequencies have long been available for teachers and learners of language. For example, Thorndike (1921, 1932) and Thorndike and Lorge (1944) produced word frequency books with counts of word occurrences in texts used in the education of American children. Michael West's General Service List of English Words (1953) was primarily aimed at foreign learners of English. More recently, with the aid of efficient computer software and very large bodies of language data (called corpora), researchers have been able to provide more sophisticated frequency counts from both written text and transcribed speech.

One important feature of the resulting frequencies presented in this series is that they are derived from recently collected language data. The earlier lists for English included samples from, for example, Austen's Pride and Prejudice and Defoe's Robinson Crusoe, thus they could no longer represent present-day language in any sense. Frequency data derived from a large representative corpus of a language brings students closer to language as it is used in real life as opposed to textbook language (which often distorts the frequencies of features in a language, see Ljung, 1990).

The information in these dictionaries is presented in a number of formats to allow users to access the data in different ways. So, for example, if you would prefer not to simply drill down through the word frequency list, but would rather focus on verbs, the part of speech index will allow you to focus on just the most frequent verbs. Given that verbs typically account for 20 per cent of all words in a language, this may be a good strategy. Also, a focus on function words may be equally rewarding: 60 per cent of speech in English is composed of a mere 50 function words.

The series also provides information of use to the language teacher. The idea that frequency information may have a role to play in syllabus design is not new (see, for example, Sinclair and Renouf, 1988). However, to date it has been difficult for those teaching languages other than English to use frequency information in syllabus design because of a lack of data.

Frequency information should not be studied to the exclusion of other contextual and situational knowledge about language use, and we may even doubt the validity of frequency information derived from large corpora. It is interesting to note that Alderson (2007) found that corpus frequencies may not match a native speaker's intuition about estimates of word frequency and that a set of estimates of word frequencies collected from language experts varied widely. Thus corpus-derived frequencies are still the best current estimate of a word's importance that a learner will come across. Around the time of the construction of the first machine-readable corpora, Halliday (1971: 344) stated that a rough indication of frequencies is often just what is needed. Our aim in this series is to provide estimates of word frequencies that are as accurate as possible.

Paul Rayson and Mark Davies
Lancaster and Provo, 2008

References

Alderson, J. C. (2007) Judging the frequency of English words. Applied Linguistics, 28 (3): 383-409.
Gardner, D. (2007) Validating the construct of Word in applied corpus-based vocabulary research: a critical survey. Applied Linguistics, 28: 241-65.
Halliday, M. A. K. (1971) Linguistic functions and literary style. In S. Chatman (ed.) Style: A Symposium. Oxford University Press, pp. 330-65.
Ljung, M. (1990) A Study of TEFL Vocabulary. Almqvist & Wiksell International, Stockholm.
Nation, I. S. P. (1990) Teaching and Learning Vocabulary. Heinle & Heinle, Boston.
Sinclair, J. M. and Renouf, A. (1988) A lexical syllabus for language learning. In R. Carter and M. McCarthy (eds) Vocabulary and Language Teaching. Longman, London, pp. 140-58.
Thorndike, E. (1921) Teacher's Word Book. Columbia Teachers College, New York.
Thorndike, E. (1932) A Teacher's Word Book of 20,000 Words. Columbia University Press, New York.
Thorndike, E. and Lorge, I. (1944) The Teacher's Word Book of 30,000 Words. Columbia University Press, New York.
West, M. (1953) A General Service List of English Words. Longman, London.

Acknowledgments

We are first and foremost grateful to Paul Rayson and Mark Davies for their guidance and suggestions throughout the project. We thank Adam Kilgarriff for reviewing our draft proposal and giving us useful suggestions. We are especially indebted to Yukari Honda, a postgraduate student at Tokyo University of Foreign Studies, who has worked closely with the first author to organize the team of research assistants in writing the draft entries. Without her dedication, this work would not have been possible. We are also indebted to a number of research assistants who helped with this project: Yukari Honda, Makiko Kobayashi, Kanako Maebo, Kimie Abo, Satomi Kurusu, Tomoyo Fujita, Atsuko Yamashita, and Fumiko Watanabe. Special thanks to the National Institute for Japanese Language and Linguistics for the use of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) and the Corpus of Spontaneous Japanese (CSJ). Thanks also go to the Japan Society for the Promotion of Science for their financial support.

Yukio Tono
Makoto Yamazaki
Kikuo Maekawa

Abbreviations

Part of speech    Full name               Example
adn.              adnominal               30 その sono adn. that
adv.              adverb                  39 そう soo adv. so, such
aux.              auxiliary               50 せる seru aux. CAUSATIVE
conj.             conjunction             31 けれど keredo conj. though, although
cp.               compound                13 ている, てる te iru, te ru cp. CONTINUATION
i-adj.            i-adjective             47 無い nai i-adj. There is no..., no...
interj.           interjection            18 えー, ええ ee interj. eh?, what?; well, yes
n.                noun                    17 事 koto n. thing
na-adj.           na-adjective            121 風 fuu na-adj. style, type, way, like
num.              numeral                 1094 十 juu num. ten
p.                particle                11 も mo p. too, also
p. case           case particle           12 で de p. case in; at; from; by
p. conj.          conjunctive particle    8 て te p. conj. REASON
p. disc.          discourse particle      26 ね ne p. disc. isn't it?, don't you?
prefix            prefix                  2301 御 o prefix POLITENESS
pron.             pronoun                 40 何 nani pron. what; something; anything; nothing
suffix            suffix                  1694 等 tou suffix and so on
v.                verb                    19 言う iu, yuu v. say, speak, talk

Register

BK    books
      1 の no p. case of; in; at; for; by
      彼はこの大学の学生だ He is a student at this university.
      47078 0.98 BK
NM    newspapers & magazines
      564 午後 gogo n. afternoon, p.m.
      午後会議がある There is a meeting in the afternoon.
      118 0.37 NM
OF    official documents
      106 ておる te oru cp. CONTINUATION (polite)
      お返事をお待ちしております I look forward to hearing from you at your earliest convenience.
      1017 0.63 OF
SP    spoken
      23 まー, まあ maa interj. Wow!, Oh my God!
      まーなんて素晴らしいんでしょう Wow! That's amazing!
      6950 0.04 SP
WB    web
      14 です desu aux. COPULA (polite)
      彼は独身です He is single.
      9828 0.83 WB

Introduction

The value of a frequency dictionary of Japanese

A Frequency Dictionary of Japanese provides a list of core vocabulary for learners of Japanese as a second or foreign language. Like other volumes in the Routledge Frequency Dictionary series, it gives the most up-to-date, reliable frequency guidelines for common vocabulary in spoken and written Japanese, which helps learners of Japanese set practical goals for acquiring both productive and receptive knowledge of vocabulary. For teachers, it provides a valuable pedagogical tool with which to organize their teaching syllabus, prepare teaching materials, and assess their learners' vocabulary level and size.

Japanese is unusual in its linguistic status. It is one of the most popular foreign languages that people want to learn. According to the 2006 Survey Report by the Japan Foundation, there are approximately 3 million students studying Japanese at 13,000 institutions in 130 countries. This does not include people learning Japanese via the TV or radio, so the potential number would be much larger.

Despite such a huge demand, Japanese as a language has been very difficult to describe systematically. We use three different types of characters: two sets of Japanese syllabaries, hiragana and katakana, and a set of Chinese characters. The word spelled Seiko in English could be a person's name (聖子) if the final vowel o is short, or a word meaning success (成功), or the name of the famous watch brand in katakana (セイコー)! We also have three sources of word origins: wago (native Japanese words), kango (words adopted from Chinese), and gairaigo (words adopted from Western languages). It is useful to know what types of words are important to learn. Therefore, a frequency dictionary like the present one will be a great resource for teachers and learners to find which words are used for which meanings in speech and writing.

The shortage of such good resources is partly due to the lack of good data.
In Japan, we have very advanced technologies in natural language processing (NLP), but until recently people showed little interest in so-called balanced corpora. Most NLP work has been carried out on a large body of newspaper texts, and is thus not suitable for educational purposes. There was very little exchange of information between the humanities and information sciences until the turn of the century. The advent of computers and corpus linguistics, however, has changed the whole picture recently, and the National Institute for Japanese Language and Linguistics (NINJAL) finally completed the Balanced Corpus of Contemporary Written Japanese (BCCWJ) in 2011. We therefore feel that it is quite timely to publish this title as part of the Routledge Frequency Dictionary series.

Contents of the dictionary

This frequency dictionary is designed to meet the needs of students and teachers of Japanese, as well as those who are interested in the computational processing of Japanese. The main index contains the 5,000 most common words in contemporary written and spoken Japanese, ranging from very basic core grammatical words such as the particles ga or wa, to more intermediate and advanced vocabulary. Each entry in the main index contains the word itself in Japanese orthography, a romanized headword, its part(s) of speech, an English equivalent, an example sentence in Japanese, an English translation of the illustrative example, and summary statistics about the usage of that word.

Aside from the main frequency listing, there are also indexes that sort the entries by Japanese alphabetical order (gojuuon), parts of speech, and different word types: wago (native Japanese word), kango (Sino-Japanese word), and gairaigo (loan word). The Japanese alphabetical order will be very helpful for students who, for example, come across Japanese words in reading and want to check how common the word is and whether it is worth learning.
The part of speech indexes can help readers understand how the grammar system of Japanese works or learn by focusing selectively on particular parts of speech. The list of different word types will inform the readers of very important lexical characteristics of Japanese. Finally, there are a number of thematically related lists (foods, greetings, emotions, etc.) as well as honorific expressions, all of which should enhance the learning experience. The expectation, then, is that this dictionary will greatly help the efforts of a wide range of students and teachers who are involved in the acquisition and teaching of Japanese vocabulary.

Previous frequency dictionaries of Japanese

As far as the Japanese language is concerned, no frequency dictionary has been published on a commercial basis. This does not mean, however, that no statistical analysis has ever been conducted on the language. On the contrary, Japanese is one of the languages whose lexical characteristics have been most extensively examined using statistical methods. A series of statistical word surveys have been conducted by the National Language Research Institute (NLRI), founded for the scientific study of the Japanese language in 1948 (it changed its English title to the National Institute for Japanese Language and Linguistics (NINJAL) in 2001). The NLRI lexical surveys covered various registers like newspapers (published in 1952, 1959 and 1970-73), magazines from different genres (1953, 1957, 1962, 1987, and 2005), school textbooks (1983-84, 1986-97) and TV programs (1995, 1997). In these surveys, samples were randomly selected from rigidly defined statistical populations so that techniques of statistical inference could be applied to the data.

There is, however, an important drawback common to all the surveys: the lack of a consistent definition of a word for sampling and analysis purposes. Since Japanese is a so-called agglutinative language, it is difficult, if not impossible, to find a unitary definition of word for Japanese.
For example, kokuritsukokugokenkyuujo (the Japanese title for the NLRI) could be analyzed in at least four different ways: as four words kokuritsu (national), kokugo (national language), kenkyuu (research) and jo (institute); as three words kokuritsu, kokugo and kenkyuujo; as two words kokuritsu and kokugokenkyuujo; and as one word. In the surveys mentioned above, different definitions of word were used according to the purposes of the surveys. As a result, it was virtually impossible to cross-compare the results obtained in different surveys. As will be explained in the following sections, recent corpora employ dual part of speech (POS) analyses to overcome this difficulty.

Another drawback of the NLRI word surveys is that they lack control of the timeline as a sampling frame. Due to the different sampling frames in terms of years of publication, it was impossible to make a valid comparison between newspaper surveys and those of textbooks.

The last problem of the NLRI word surveys is the non-availability of the sampled data. Every time the NLRI conducted a survey, the results were published in the form of word frequency lists, but the data has never been publicly available outside the NLRI. This seriously constrained the development of corpus-based analysis of the Japanese language. The NLRI changed its data handling policy in the mid-1990s. Since then, a series of Japanese corpora have been compiled and released for public use. The two corpora used for the compilation of this frequency dictionary are recent products of NINJAL.

The Corpus

Two corpora are used as the resource for this frequency dictionary: the Corpus of Spontaneous Japanese (CSJ) and the Balanced Corpus of Contemporary Written Japanese (BCCWJ).
CSJ

The CSJ was compiled during the years 1999-2004 through the collaboration of NINJAL and NICT (National Institute of Information and Communications Technology) as a resource for the development of an automatic speech recognition system for spontaneous speech. The CSJ is a richly annotated corpus of 7.5 million words, or 652 hours of speech. In addition to the digitized speech and the transcriptions including various disfluency phenomena, the complete transcription texts are annotated with respect to the POS information and clause-type information. Moreover, segmental and prosodic annotations are provided for a subset of the CSJ called the CSJ-Core (including about half a million words or 44 hours). The speech recorded in the CSJ is so-called common, or standard, Japanese, a variety shared widely by educated people and used in more or less public settings. Speakers who had clear dialectal features in their morphology or segmental phonology were excluded.

There are two main sources of spontaneous speech for the CSJ: Academic Presentation Speech (APS) and Simulated Public Speaking (SPS). APS are live recordings of academic presentations in nine different academic societies covering the fields of engineering, social sciences, and the humanities. SPS, on the other hand, are recordings of speeches by paid laypeople, of about 10-12 minutes, on everyday topics such as "the happiest/saddest memory of my life", "the town I live in", commentaries on recent news, and so forth. SPS were presented in front of small audiences and in a relatively relaxed atmosphere. The age and sex of SPS speakers were balanced as much as possible.

As predicted, there is a difference in the word distribution between the APS and SPS samples. The lexical items of the APS include technical terms (mostly compounds) used in various fields of science and technology, and the speaking style is relatively formal. The lexical items of the SPS, on the other hand, include many more everyday expressions, and the speaking style is comparatively casual.

Table 1 Size of the CSJ

REGISTER                            # SUW        # LUW
Academic Presentation Speech    3,279,364    2,654,823
Simulated Public Speaking       3,605,729    3,115,302
Miscellaneous                     640,032      543,749
TOTAL                           7,525,125    6,313,874

All samples in the CSJ were dually POS analyzed using two definitions of word, namely, short unit word (SUW) and long unit word (LUW). In the case of kokuritsukokugokenkyuujo cited above, the four-word analysis corresponds to the SUW, and the one-word analysis corresponds to the LUW. Table 1 shows the number of running SUWs and LUWs in the APS and SPS of the CSJ. The last register, entitled miscellaneous, includes samples of dialogues (interviews on the content of the APS and SPS, task-oriented dialogues and free dialogues) and reading aloud of the transcriptions of the APS and/or the SPS previously spoken by the same speakers.

Table 2 Comparison of examples in the SUW and LUW POS analyses
(a ditto mark " in the LUW columns means that the SUW belongs to the most recent LUW listed above)

GLOSS            SUW Lemma     SUW POS      LUW Lemma               LUW POS
binaural         両耳          Noun         両耳受聴                Noun
perception       受聴          Noun         "                       "
PLACE            に            Particle     によって                Particle
be based upon    拠る          Verb         "                       "
CONJUNCTION      て            Particle     "                       "
obtain           得る          Verb         得る                    Verb
information      情報          Noun         情報                    Noun
PLACE            に            Particle     に                      Particle
TOPIC            は            Particle     は                      Particle
power            パワー        Noun         パワースペクトル情報    Noun
spectrum         スペクトル    Noun         "                       "
information      情報          Noun         "                       "
and              と            Particle     と                      Particle
binaural         両耳          Noun         両耳間位相差            Noun
between          間            Suffix       "                       "
phase            位相          Noun         "                       "
difference       差            Noun         "                       "
NOMINATIVE       が            Particle     が                      Particle
exist            ある          Verb         ある                    Verb
POLITE           ます          Auxiliary    ます                    Auxiliary
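The relationship between the two analyses in Table 2 can be made concrete with a small sketch. The Python below simply transcribes the table as a list of LUWs, each paired with the SUW lemmas it is built from, and counts both kinds of unit. The names `LUW_TO_SUWS`, `count_units` and `compounds` are illustrative assumptions, not part of any NINJAL tool or corpus format.

```python
# Each long unit word (LUW) from Table 2, paired with the short unit word
# (SUW) lemmas it is composed of. The data structure is a hypothetical
# transcription of the table, not an official corpus format.
LUW_TO_SUWS = [
    ("両耳受聴", ["両耳", "受聴"]),
    ("によって", ["に", "拠る", "て"]),
    ("得る", ["得る"]),
    ("情報", ["情報"]),
    ("に", ["に"]),
    ("は", ["は"]),
    ("パワースペクトル情報", ["パワー", "スペクトル", "情報"]),
    ("と", ["と"]),
    ("両耳間位相差", ["両耳", "間", "位相", "差"]),
    ("が", ["が"]),
    ("ある", ["ある"]),
    ("ます", ["ます"]),
]


def count_units(analysis):
    """Return (number of SUWs, number of LUWs) for one analyzed phrase."""
    n_luw = len(analysis)
    n_suw = sum(len(suws) for _, suws in analysis)
    return n_suw, n_luw


def compounds(analysis):
    """Return the LUWs that group more than one SUW."""
    return [luw for luw, suws in analysis if len(suws) > 1]


if __name__ == "__main__":
    print(count_units(LUW_TO_SUWS))  # (20, 12)
    print(compounds(LUW_TO_SUWS))
```

Representing each LUW as a container of SUWs mirrors the point made in the text: a SUW that is not part of a larger compound is still counted as an LUW in its own right, so single-element entries appear in both counts.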

Note that LUW covers not only compound nouns and verbs but also compound particles. For example, niyotte (by, because of) is analyzed as a compound particle in the LUW analysis, but as three separate units in the SUW analysis: the case particle ni followed by the adverbial form of the verb yoru (be based upon), which is followed by the conjunctive particle te. Table 2 compares the SUW and LUW analyses of the same phrase taken from an APS sample: 両耳受聴によって得る情報にはパワースペクトル情報と両耳間位相差があります (the information obtained by binaural perception includes power-spectrum information and binaural phase difference information). This phrase consists of 20 SUWs and 12 LUWs. Note that SUWs that are not part of larger compounds are analyzed as independent LUWs. See, for example, the last three lines of Table 2.

BCCWJ

The BCCWJ is the first balanced corpus of written Japanese, and was compiled during the years 2006-11 at NINJAL. It consists of three main subcorpora: publication, library, and special-purpose. The publication subcorpus consists of texts randomly sampled from the populations of books, magazines, and newspapers published during the years 2001-5, whose total size is about 35 million words. The library subcorpus contains samples of books found in public libraries. The statistical population consists of the totality of books that are registered in more than 13 public libraries in Tokyo; the size of this subcorpus is almost equal to that of the books in the publication subcorpus, about 30 million words. Finally, the special-purpose subcorpus covers various registers that are indispensable for the language planning studies of NINJAL, but not covered by the publication and library subcorpora.
This subcorpus contains samples of texts in governmental white papers, school textbooks (covering elementary, junior and senior high schools), various reports issued by local governments for public relations purposes, bestselling books, texts on the Web (the Yahoo! Chiebukuro bulletin board and Yahoo! blogs), poetry, law, and the minutes of the National Diet. All these samples are randomly chosen from these populations. See Table 3 for more details.

Although the BCCWJ is designed as a corpus of contemporary Japanese, the timeline covered by the corpus is not necessarily narrow, and the length of the period differs depending on the register. Figure 1 shows the difference in temporal coverage of the 13 registers. Registers are labeled using the abbreviations shown in Table 3 below.

Table 3 Size of the BCCWJ

SUBCORPUS / REGISTER                    # SAMPLE         # SUW         # LUW
Publication Subcorpus
  Books (PB)                              10,117    28,552,283    22,857,932
  Magazines (PM)                           1,996     4,444,492     3,480,831
  Newspapers (PN)                          1,473     1,370,233       997,535
Library Subcorpus
  Books (LB)                              10,551    30,377,866    25,092,641
Special-Purpose Subcorpus
  White papers (OW)                        1,500     4,882,812     3,100,617
  Textbooks (OT)                             412       928,448       746,170
  Local government reports (OP)              354     3,755,161     2,308,450
  Bestselling books (OB)                   1,390     3,742,261     3,185,745
  Internet bulletin board texts (OC)      91,445    10,256,877     8,613,610
  Blog texts (OY)                         52,680    10,194,143     8,285,554
  Poetry (OV)                                252       225,273       202,425
  Law (OL)                                   346     1,079,146       706,313
  Minutes of National Diet (OM)              159     5,102,469     4,007,842
TOTAL                                    172,675   104,911,464    83,585,665
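As a quick consistency check on Table 3, the column totals can be recomputed from the individual registers. The sketch below is only an illustration: the dictionary name `BCCWJ` and the helper functions are hypothetical, and the short-unit count for magazines (PM) is taken as 4,444,492, which is the value implied by the printed total.

```python
# (samples, SUW, LUW) per register, transcribed from Table 3.
# Assumption: PM has 4,444,492 SUWs, the value implied by the printed total.
BCCWJ = {
    "PB": (10_117, 28_552_283, 22_857_932),
    "PM": (1_996, 4_444_492, 3_480_831),
    "PN": (1_473, 1_370_233, 997_535),
    "LB": (10_551, 30_377_866, 25_092_641),
    "OW": (1_500, 4_882_812, 3_100_617),
    "OT": (412, 928_448, 746_170),
    "OP": (354, 3_755_161, 2_308_450),
    "OB": (1_390, 3_742_261, 3_185_745),
    "OC": (91_445, 10_256_877, 8_613_610),
    "OY": (52_680, 10_194_143, 8_285_554),
    "OV": (252, 225_273, 202_425),
    "OL": (346, 1_079_146, 706_313),
    "OM": (159, 5_102_469, 4_007_842),
}

# The TOTAL row as printed in Table 3.
REPORTED_TOTAL = (172_675, 104_911_464, 83_585_665)


def column_totals(table):
    """Sum the sample, SUW and LUW columns over all registers."""
    return tuple(sum(column) for column in zip(*table.values()))


def suw_size(table, registers):
    """Total number of SUWs in the given registers."""
    return sum(table[r][1] for r in registers)


if __name__ == "__main__":
    print(column_totals(BCCWJ) == REPORTED_TOTAL)  # True
    print(suw_size(BCCWJ, ["PB", "PM", "PN"]))     # 34367008
```

The publication subcorpus comes out at roughly 34.4 million SUWs, which agrees with the "about 35 million words" quoted in the text.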

[Figure 1 Temporal coverage of the BCCWJ registers (PB, PM, PN, LB, OW, OT, OP, OB, OC, OY, OV, OL, OM) and the CSJ; the year axis is marked 76, 80, 86, 99, 01, 04, 05, 06, 07, 08, 09]

Target vocabulary identification and description

Corpus balance

Based on the CSJ and BCCWJ, a breakdown of the spoken and written components of the corpus was determined. The section of simulated public speaking (SPS) was used from the CSJ, because this was more closely related to natural spoken language used in daily situations. For the written section, the entire BCCWJ was used. As regards the unit of morphological analysis, the LUW was used, which treats compounds such as 自動車 (car) or 飛行機 (airplane) as single word units. Whilst short unit words are often used for normal morphological analysis, they decompose meaningful units into smaller morphemes, which is often not useful for teaching and learning purposes. The version of the BCCWJ used for this dictionary was as of June 2011, and differs slightly in total running words from the DVD version released in December 2011.

Lemmatisation

The headword as lemma was determined in the following way. First, forms were identified as the same lemma if their base forms, pronunciations, and parts of speech were all identical. Frequency counts are based on this notion of lemma. Then the following further adjustments were made. All the items that have the same base forms and pronunciations but different parts of speech were regarded as the same lemma. This is the same way that most Japanese dictionaries treat headwords. Thus, words whose stems are nouns, such as 解決 (kaiketsu: solution), but which behave as verbs when suru is attached to the end, e.g. 解決する (kaiketsu-suru: solve), were classified as the same lemma.
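The merging step described above (same base form and pronunciation, different parts of speech) can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the toy counts, and the function name `merge_lemmas` are all hypothetical, and the suru-verb use is assumed to be already keyed by its noun stem.

```python
from collections import defaultdict

# Hypothetical counts: (base form, reading, part of speech, frequency).
# The numbers are invented for illustration only.
COUNTS = [
    ("解決", "kaiketsu", "n.", 120),
    ("解決", "kaiketsu", "v.", 80),  # suru-verb use, keyed by its noun stem
    ("事", "koto", "n.", 500),
]


def merge_lemmas(counts):
    """Group records by (base form, reading), pooling POS tags and frequencies."""
    merged = defaultdict(lambda: {"pos": set(), "freq": 0})
    for base, reading, pos, freq in counts:
        entry = merged[(base, reading)]
        entry["pos"].add(pos)
        entry["freq"] += freq
    return dict(merged)


if __name__ == "__main__":
    for (base, reading), entry in merge_lemmas(COUNTS).items():
        print(base, reading, sorted(entry["pos"]), entry["freq"])
```

Keying on the pair (base form, reading) rather than on the base form alone keeps homographs with different pronunciations apart, which matches the first criterion stated in the text.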
Variant forms such as the following were all grouped under their standard forms:
(i) polite forms: okangaeitadaku for kangaeru (think)
(ii) potential forms: yareru for yaru (do)
(iii) forms with the prefixes o- or go-: okaasan for kaasan (mother)
(iv) forms with the suffixes -san, -sama, and -chan: musume-san for musume (daughter)

Words that appeared in the original frequency list but were considered inappropriate for the wordlist were deleted, e.g. archaic words, single letters of the English alphabet, specific company names (e.g. SONY), personal names (e.g. 信長), English words (e.g. アンド), and overly domain-specific terms.

Frequency and dispersion

Word frequencies in the CSJ and the BCCWJ were each normalized to a per-million-word rate, and the average of the two normalized frequencies, with the corpora weighted equally (50 percent each), was used as the standard frequency index. To provide information on register variation, five registers were defined on the basis of the 13 BCCWJ registers and the CSJ (see Table 4).

Table 4 Text registers

REGISTER                        BCCWJ REGISTER (see Table 3 for abbreviations)
Books [BK]                      LB/PB/OB/OV
Web [WB]                        OC/OY
Official documents [OF]         OW/OL/OM/OT
Newspapers & magazines [NM]     PN/PM/OP
Spoken [SP]                     SPS in CSJ

For all 5,000 entry words, log-likelihood values were calculated across the five registers above, and words with significantly high log-likelihood values, e.g. within the top 50 in the list, were given a special register code, e.g. [+BK], showing the word's distribution across registers.

Many dispersion measures are available (see Gries 2008 for a review); in this book, Carroll's D2 was used. It takes a value from 0 to 1, where 1 means that the word is most evenly distributed.

rank   lemma      form            pos      English    pmw        disp.
32     から       kara            p.case   from       4739.328   0.999
2130   引き継ぐ   hikitsugu       v.       continue   26.779     0.500
4293   落札者     rakusatsu-sha   n.       bidder     10.573     0.033

Developing associated information

Parts of speech were identified by the following procedure. First, the morphological analyzer MeCab¹, with the dictionary UniDic² specially developed for the BCCWJ project, was used for SUW analysis. The SUWs were then processed by a tool called Comainu³ and combined into LUWs. The part-of-speech mapping list between SUWs and LUWs was applied to the output of Comainu.

Glossing of the terms was carried out manually. An effort was made to give the most representative meaning(s) from among what were sometimes too many candidate translation equivalents. Illustrative examples were likewise supplied manually, and an effort was made to make the context clear and self-contained and to reflect the core meaning of the word, based on the examples available in the corpus data. Finding good English translations was sometimes difficult, because the one-to-one translation equivalent of a Japanese headword does not always match the expression used in the English translation of the Japanese example. Every effort was made to match the two, but different expressions were sometimes used so that the translations of the illustrative examples would read naturally.

Finally, we compiled the thematic lists using both automatic and manual techniques. While creating the English translations, we annotated the list for thematic categories, such as food and weather. We also consulted the previous titles in the Frequency Dictionary series, because many of the thematic lists overlap across languages.

In conclusion, this dictionary is carefully tuned to the needs of learners and teachers of Japanese, fully exploiting the most advanced information from the newly developed Japanese corpora, the BCCWJ and the CSJ. We are confident that this dictionary will provide users with one of the most reliable resources for learning Japanese.

The main frequency index

The main index in this dictionary is a rank-ordered listing of the top 5,000 words (lemmas) in Japanese, starting with the most frequent word and progressing through to the least frequent. The following information is given for each entry:

rank frequency (1, 2, 3, ...), lemma, romanized word, part of speech, English gloss
illustrative example, English translation of the example
normalised frequency, dispersion (0.00-1.00), (indication of register variation)

As a concrete example, let us look at the entry for the verb omou:

34 思う omou v to think
私はそう思いません I don't think so.
4599 | 0.88
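The two figures on the last line of an entry such as the one above, the averaged per-million frequency and the dispersion, can be sketched as follows, together with a log-likelihood score of the kind used for register codes. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, the counts and corpus sizes in the usage lines are invented, Carroll's D2 is written in its common normalised-entropy form, and the two-sample log-likelihood form is an assumption. The source says neither which corpus parts the dispersion was computed over nor exactly how the log-likelihood was set up.

```python
import math


def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def standard_frequency(pmw_spoken, pmw_written):
    """Average the two normalised frequencies with equal (50/50) weight."""
    return 0.5 * pmw_spoken + 0.5 * pmw_written


def carroll_d2(rates):
    """Carroll's D2 as normalised entropy of a word's rates across corpus parts.

    `rates` holds the word's size-normalised frequency in each part.
    Returns 1.0 for a perfectly even spread and 0.0 if only one part has it.
    """
    total = sum(rates)
    if len(rates) < 2 or total <= 0:
        return 0.0
    shares = [r / total for r in rates if r > 0]
    entropy = sum(p * math.log2(1 / p) for p in shares)
    return entropy / math.log2(len(rates))


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-sample log-likelihood comparing a word's rate in A (e.g. one
    register) with its rate in B (e.g. all other registers)."""
    pooled = (count_a + count_b) / (size_a + size_b)
    score = 0.0
    for observed, size in ((count_a, size_a), (count_b, size_b)):
        expected = size * pooled
        if observed > 0:
            score += observed * math.log(observed / expected)
    return 2 * score


# Invented numbers, for illustration only.
spoken = per_million(120, 7_000_000)
written = per_million(9_000, 100_000_000)
print(round(standard_frequency(spoken, written), 3))
print(round(carroll_d2([90.0, 85.0, 110.0, 95.0, 17.0]), 3))
print(round(log_likelihood(400, 2_000_000, 600, 8_000_000), 2))
```

A log-likelihood score says only that the two rates differ. To assign a positive code such as [+BK], one would additionally check that the observed rate in the register is higher than in the rest of the corpus.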

This entry shows that word number 34 in our rank-ordered list is the verb 思う. The romanized form of the entry is provided for recognition purposes. The English gloss "to think" is provided next. One illustrative example is given, which shows the negative form of this verb with a polite ending, 思いません. An English translation of the example then appears. The last line of the entry shows the average normalised frequency (per million words) based on the BCCWJ and the CSJ (4,599 tokens), and the dispersion (0.88 in this case). Here are some additional notes on the items appearing in the entries.

The part(s) of speech

More than two parts of speech can be found for some words in the corpus. Owing to space limitations, only the two most frequent parts of speech were selected in such cases. We tried to offer corresponding English glosses for each part of speech, but this information was sometimes omitted where users can be expected to work out the glosses on their own. Besides their general adjectival usage, i-adjectives and na-adjectives can also be used as adverbs, modifying verbs. This usage is a regular, additional feature of i- and na-adjectives, so for the sake of brevity we omitted this adverbial information from the POS, with a few exceptions.

The English gloss

The gloss is meant to be indicative only; it is not a complete listing of all possibilities. Some words are polysemous and very difficult to describe in one line. Their full range of meanings is not included in the glosses, since the main focus of this dictionary is frequency information from the corpus.

The Japanese illustrative examples

Illustrative examples were invented by examining corpus examples. Many examples show inflected forms of the entries, which might look slightly confusing to beginning learners of Japanese. However, the selection of examples would have become extremely difficult, and the examples unnatural, if we had stuck only to the base form of the entry.
Thus, we aimed for natural usage of the entry words in the example sentences. In many cases, the subjects or proper names in the examples had to be replaced by general pronouns or well-known places in Japan, which sometimes made the examples a bit awkward, but every effort was made to make them sound natural. If the entry has multiple parts of speech, an example was given only for the most common part of speech and usage. For entries for numerals such as 三 or 七, examples were designed to use the entry item on its own rather than in compounds, but compound expressions sometimes had to be used to make the examples more user-friendly.

The English translation of the examples

Whilst an attempt was made to carry the register, style, and structure of the source example over into its translation, the English translation sometimes uses words that do not exactly match the English gloss. We tried to avoid such mismatches, but in some cases we gave up using the same English glosses in the translations of the Japanese examples.

The statistical and register information

The last line of each entry has two numbers divided by a vertical bar. The first is the average of the normalized frequencies per million words in the BCCWJ and the CSJ. The second is the dispersion value. Some words also have a register code that specifies the word's distribution across registers. We provide only the positive value for the five registers: books, Web, official documents, newspapers and magazines, and spoken.

Thematic vocabulary ("call-out boxes")

A number of thematically grouped words are provided in tables placed throughout the main frequency-based index. These include thematic lists related to the body, food, family, weather, professions, nationalities, colors, emotions, clothing, greetings, sports, and several other semantic domains.
There are also lists of phonetic and orthographic variants across spoken and written texts, which are complicated and thus hard to master. Other tables give data on loan words, honorific expressions, and words with the prefixes o- and go-.

Alphabetical and part of speech indexes

The Japanese alphabetical index lists all the entries in the frequency index, ordered by the Japanese ordering of kana, called gojuuon. Each entry in this chapter includes: (1) the lemma, (2) the part of speech, (3) a basic English equivalent, and (4) the word's ranking in this dictionary. The part of speech index lists the words from the frequency index, this time arranged by part of speech. Each category lists the lemmas in decreasing order of frequency.

Word type index

The last section of the dictionary presents the words of the frequency index classified by word type, or origin. As mentioned previously, Japanese words have three sources of origin: wago (native Japanese words), kango (words adopted from Chinese), and gairaigo (words adopted from Western languages). In this section, the 1,000 most frequent words are classified into these three categories, (1) wago, (2) kango, and (3) gairaigo, as well as (4) konseigo, blends of the three types, and (5) proper nouns, with their original ranking information.

Notes on romanization

Romanization of Japanese is largely based upon the so-called Hepburn system. The Japanese syllables (or morae) サ, シ, ス, セ, ソ, タ, チ, ツ, テ, ト, ハ, ヒ, フ, ヘ, ホ, ザ, ジ, ズ, ゼ, and ゾ are romanized respectively as sa, shi, su, se, so, ta, chi, tsu, te, to, ha, hi, fu, he, ho, za, ji, zu, ze, and zo. So-called youon (palatalized syllables) like キャ, シャ, チャ, ニャ, ヒャ, ミャ, リャ, ギャ, ジャ, ビャ, and ピャ are represented as kya, sha, cha, nya, hya, mya, rya, gya, ja, bya, and pya respectively. Note also that ヤ, ユ, and ヨ are romanized as ya, yu, and yo. Sokuon (geminate) is represented by doubling the relevant consonant, as in yappari (やっぱり), chotto (ちょっと), shikkari (しっかり), and beddo (ベッド). The romanization of hatsuon (syllabic nasal) differs slightly from the genuine Hepburn system. It is consistently represented by the letter n, as in mikan (ミカン), konbanwa (こんばんは), kantan (簡単 かんたん), and manga (マンガ).
When there is a morphological boundary between a hatsuon and a following morpheme beginning with a vowel, an apostrophe is inserted after the hatsuon, as in han'i (範囲 はんい), ren'ai (恋愛 れんあい), and han'ei (反映 はんえい).

Another important deviation from the traditional Hepburn system is the representation of long vowels and vowel sequences. In this respect, the romanization adopted in this dictionary follows the convention of present-day Japanese orthography (Gendai kanazukai). The long vowels /a/, /i/, and /u/ are represented by doubling the vowel, as in baai (場合 ばあい), sukaato (スカート), tanoshii (楽しい), takushii (タクシー), riyuu (理由 りゆう), and yuuzaa (ユーザー). The long vowel /e/ is represented by either ei or ee, following the convention of Gendai kanazukai, as in tokei (時計), meiwaku (迷惑 めいわく), keeki (ケーキ), and meeru (メール). In the same vein, ou and oo are used to represent the long vowel /o/, as in koukan (交換 こうかん), osou (襲う おそう), koohii (コーヒー), and soosu (ソース). Some loan words contain syllables that are not found in the syllable inventory of traditional Japanese. These include, for example, ti in paatii (パーティー), di in merodii (メロディー), and fi in ofisu (オフィス).

Segmentation of headwords

Headwords are generally romanized with spaces between SUWs, e.g. te iru (ている), de wa nai (ではない), ni tsui te (について), shi yakusho (市役所), kousoku douro (高速道路). This clearly indicates that these headwords are composed of more than one SUW. However, a hyphen is used in the following cases:
(i) if the part of speech of the given SUW is either prefix or suffix, e.g. watashi-tachi (私達), tsukuri-kata (作り方), ik-kai (一回) [Affixes are underlined.]
(ii) if a space would separate geminate consonants, e.g. is-shuukan (一週間)

Notes
1 http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
2 http://www.tokuteicorpus.jp/dist/
3 http://slp.itc.nagoya-u.ac.jp/~kozawa/comainu/
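The romanization conventions described in the notes above can be sketched as a small kana-to-romaji converter. It is a simplified illustration, not the procedure used for the dictionary: the names `romanize`, `to_hiragana`, `KANA`, `SMALL_Y`, and `LOAN` are hypothetical, only a basic kana inventory is covered, particle readings (e.g. は as wa) are not handled, and the apostrophe is inserted whenever ん precedes a vowel kana, because the sketch cannot detect morphological boundaries.

```python
ROWS = {
    "": "あいうえお", "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと",
    "n": "なにぬねの", "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど", "b": "ばびぶべぼ",
    "p": "ぱぴぷぺぽ",
}
# Regular consonant + vowel syllables, then the irregular Hepburn spellings.
KANA = {k: c + v for c, row in ROWS.items() for k, v in zip(row, "aiueo")}
KANA.update({"し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "じ": "ji",
             "ぢ": "ji", "づ": "zu", "や": "ya", "ゆ": "yu", "よ": "yo",
             "わ": "wa", "を": "o"})
SMALL_Y = {"ゃ": "a", "ゅ": "u", "ょ": "o"}          # youon: kya, sha, cho, ...
LOAN = {"てぃ": "ti", "でぃ": "di", "ふぃ": "fi"}     # loan-word syllables


def to_hiragana(text):
    """Shift katakana to hiragana; the long-vowel mark ー is left unchanged."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def romanize(text):
    kana = to_hiragana(text)
    out = []
    geminate = False
    i = 0
    while i < len(kana):
        ch = kana[i]
        nxt = kana[i + 1] if i + 1 < len(kana) else ""
        step = 1
        if ch == "っ":                      # sokuon: double the next consonant
            geminate = True
            i += 1
            continue
        if ch == "ー":                      # long vowel: repeat the last vowel
            if out:
                out.append(out[-1][-1])
            i += 1
            continue
        if ch == "ん":                      # hatsuon: n, with ' before a vowel
            syllable = "n'" if nxt and nxt in "あいうえお" else "n"
        elif ch + nxt in LOAN:
            syllable, step = LOAN[ch + nxt], 2
        elif nxt in SMALL_Y and ch in KANA:
            stem = KANA[ch][:-1]
            glide = "" if stem in ("sh", "ch", "j") else "y"
            syllable, step = stem + glide + SMALL_Y[nxt], 2
        else:
            syllable = KANA.get(ch, ch)     # unknown characters pass through
        if geminate:
            syllable = syllable[0] + syllable
            geminate = False
        out.append(syllable)
        i += step
    return "".join(out)


for word in ["やっぱり", "ベッド", "はんい", "スカート", "こうかん", "パーティー"]:
    print(word, romanize(word))
```

Because long vowels are transcribed straight from the kana spelling, こう becomes kou while コー becomes koo, which is the Gendai kanazukai-based behaviour the notes describe.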

References

Carroll, J. B. (1970) An alternative to Juilland's usage coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies in the Humanities and Verbal Behavior, 3(2): 61-65.
Gries, S. Th. (2008) Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13: 403-37.
National Language Research Institute (1952) Goi-chousa: Gendai Shinbun Yougo no Ichirei. (Research on vocabulary used in modern newspaper articles).
National Language Research Institute (1953) Fujin Zasshi no Yougo: Gendaigo no Goi Chousa. (Research on vocabulary in women's magazines).
National Language Research Institute (1957) Sougou Zasshi no Yougo: Gendaigo no Goi Chousa. (Research on vocabulary in cultural reviews).
National Language Research Institute (1959) Meiji Shoki no Shinbun no Yougo. (On the vocabulary in newspapers in the early years of the Meiji period).
National Language Research Institute (1962) Gendai Zasshi 90-shu no Yougo Youji: Daiichi Bunsatsu Goihyou. (Vocabulary and Chinese characters in 90 contemporary magazines. Vol. 1: General descriptions and vocabulary frequency table).
National Language Research Institute (1970-3) Denshikeisanki ni yoru Shinbun no Goi Chousa. (Computer studies on the vocabulary in modern newspapers).
National Language Research Institute (1983-4) Koukou Kyoukasho no Goi Chousa I. (Studies on the vocabulary in senior high school textbooks, Vol. 1-2).
National Language Research Institute (1986-7) Chuugakkou Kyoukasho no Goi Chousa I. (Studies on the vocabulary in junior high school textbooks, Vol. 1-2).
National Language Research Institute (1987) Zasshi Yougo no Hensen. (Changes in the language of magazines).
National Language Research Institute (1995) Terebi Housou no Goi Chousa I. (Vocabulary survey of television broadcasts I).
National Language Research Institute (1997) Terebi Housou no Goi Chousa II. (Vocabulary survey of television broadcasts II).
National Institute of Japanese Language and Linguistics (2005) Gendai Zasshi no Goi Chousa. (A survey of vocabulary in contemporary magazines).

Frequency index

rank, lemma, romanization, part of speech, English gloss
illustrative example, English translation
frequency | dispersion | register code

1 の no p. case of; in; at; for; by
彼はこの大学の学生だ He is a student at this university.
47078 | 0.98 | BK

2 に ni p. case at; on; in; to; for
私は大阪に住んでいます I live in Osaka.
32231 | 1.00 | BK

3 は wa p. TOPIC
好きなスポーツはテニスです My favorite sport is tennis.
31572 | 1.00 | BK

4 た ta aux. PAST
昨日彼を見ましたか Did you see him yesterday?
31549 | 0.98 | BK

5 を o p. case ACCUSATIVE
彼は毎晩ビールを飲む He drinks beer every night.
29120 | 0.99 | BK

6 だ da aux. COPULA
僕は英語が苦手だ I'm not good at English.
27686 | 0.99 | BK

7 が ga p. case NOMINATIVE
こちらが私の妻です This is my wife.
26904 | 1.00 | BK

8 て te p. conj. REASON
お金がなくて海外旅行できない I don't have money so I can't travel abroad.
22523 | 0.96

9 と to p. case and; or; with; if
彼とレストランへ行った I went to a restaurant with my boyfriend.
18509 | 1.00 | BK

10 ます masu aux. POLITE (after verb)
来週京都へ行きます I will go to Kyoto next week.
16855 | 0.95

11 も mo p. too, also
彼が行くなら私も行きます If he is going, I will go too.
16147 | 0.97

12 で de p. case in; at; from; by
東京駅で彼女に会った I met her at Tokyo station.
14058 | 1.00 | BK

13 ている,てる te iru, teru cp. CONTINUATION
雨が降っている It's raining.
13555 | 0.99

14 です desu aux. COPULA (polite)
彼は独身です He is single.
9828 | 0.83 | WB

15 れる reru aux. PASSIVE
日本で使われている通貨は円です The currency used in Japan is yen.
9234 | 0.99 | BK

16 という, つう to iu, to yuu, tsuu cp. called, named
太郎という男の子を知っていますか Do you know a boy called Taro?
9073 | 0.80

17 事 koto n. thing
今年はいろいろな事があった All kinds of things happened this year.
8747 | 0.96

18 えー, ええ ee interj. eh?, what?; well, yes
あの人は えー ちょっと名前が思い出せません That man is..., well I cannot remember his name.
ええ そうです Right.
8636 | 0.07 | SP

19 言う iu, yuu v. say, speak, talk
はっきり言うと あなたの言っていることは無意味です Frankly speaking, you are talking nonsense.
8549 | 0.91

20 のです,んです no desu, n desu cp. ASSERTION (polite)
どうしたんですか What's the matter?
8439 | 0.71

21 あの,あのう,あのー ano, anoo interj. Excuse me; uh, eh, um, ah, er
あの ちょっとお聞きしたいんですが バス乗り場はどこですか Excuse me, could you tell me where the bus stop is?
8431 | 0.00 | SP

[Figure: proportion of the variants iu/iwa and yuu/yuwa (0% to 100%) in the negative form and in the ending/adnominal forms, for Academic, Public, and Dialogue speech]

Variation of iu (say/speak/talk) and the conjugation: When the verb is in the negative form (mizenkei), it is often pronounced as iwanai. In its ending and adnominal forms, the verb is almost always pronounced as yuu, as in sou yuu hito. All the analyses of pronunciation variants hereafter are based on the CSJ data.

22 する suru v. do; make
仕事をしなければなりません I have to do my work.
7644 | 0.99

23 まー,まあ maa interj. Wow!, Oh my God!
まー なんて素晴らしいんでしょう Wow! That's amazing!
6950 | 0.04 | SP

24 の no p. POSSESSIVE
彼が言ったのは本当だ What he said is true.
6883 | 0.95

25 ある aru v. be (existence), have (possession), happen, occur
彼の報告書は問題がある His report has some problems.
6496 | 0.98

26 ね ne p. disc. isn't it?, don't you?
いい天気ですね It's a nice day, isn't it?
6282 | 0.70

27 ない nai aux. not
彼は朝ごはんを食べない He doesn't have breakfast.
6253 | 0.98

28 なる naru v. become, get; come to do, start to do; turn into
彼は金持ちになるでしょう He will become rich.
5977 | 0.99

29 か ka p. disc. QUESTION
コーヒーか紅茶はいかがですか Would you like some coffee or tea?
5594 | 0.91

30 その sono adn. that
そのカバンを取ってくれませんか Can you pass me that bag?
5546 | 0.95 | BK

31 けれど keredo conj. though, although
このアパートはあまり良くないけれど安い This apartment is cheap, though it's not so nice.
5293 | 0.64

32 から kara p. case from
ここからその店までは遠い The shop is a long way from here.
4740 | 1.00

33 よう you aux. INDUCEMENT
一緒にDVDを見よう Let's watch a DVD together.
4638 | 0.97 | BK

34 思う omou v. think, believe; feel; expect
私はそう思いません I don't think so.
4599 | 0.88

35 で de conj. so, then
で あの話はどうなりましたか? So what happened about the story you mentioned?
4412 | 0.15 | SP

36 か ka p. if; or
誰か来たようだ It seems that someone has come.
4308 | 0.94

37 が ga p. conj. ADVERSATIVE
いい天気だが風が冷たい It's a sunny day but the wind is chilly.
4168 | 0.96

[Figure: proportion of the variants kedo and keredo (0% to 100%) in Academic, Public, and Dialogue speech]

Variation of keredo (although): The casual variant kedo is the most frequent in dialogue. In academic presentations and public speaking, kedo and keredo are used more or less equally.

38 物 mono n. thing, object, stuff
そんなに高い物は買えません I can't buy such expensive stuff.
3676 | 0.96

39 そう sou adv. so, such
私もそう思います I think so too.
3586 | 0.80

40 何 nani pron. what; something; anything; nothing
何を考えているんですか? What are you thinking?
3497 | 0.76

41 と to p. conj. if, when; with
お酒を飲みすぎると眠くなってしまう Drinking too much makes me sleepy.
3458 | 0.97

42 私 watashi, watakushi, atashi pron. I
私は寿司が好きです I like sushi.
3404 | 0.90

43 てしまう te shimau cp. end up doing...
ダイエット中なのに つい甘いものを食べてしまう I'm on a diet, but I can't stop eating sweets.
3352 | 0.78

44 それ sore pron. that
それは明子さんのカバンですか Is that Akiko's bag?
3278 | 0.91 | BK

45 とか to ka p. and; or
ケーキとかチョコレートばかり食べるから君は太るんだ You gain weight, because you always eat cake and chocolate.
3174 | 0.00 | SP

[Figure: proportion of the variants atakushi, atashi, watakushi, and watashi (0% to 100%) in Academic, Public, and Dialogue speech]

Variation of watashi (I): The most formal variant, watakushi, is used most frequently in academic presentations, and is rarely used in dialogue. The most casual variant, atashi, shows the opposite pattern.

46 この kono adn. this
この本をもう読みましたか Have you read this book?
2974 | 0.97 | BK

47 無い nai i-adj. There is no..., no...
今お金が無いんです I have no money now.
2953 | 0.96

48 行く iku, yuku v. go; come
すぐ行きます I'm just coming.
2746 | 0.88

49 のだ,んだ no da, n da cp. ASSERTION
彼女は何も知らないのだ She doesn't know anything.
2708 | 0.94

50 せる seru aux. CAUSATIVE
子供にピアノを習わせたい I want my child to learn how to play the piano.
2702 | 0.87

51 これ kore pron. this
これをください (レストランや店で) I will take this. (in a restaurant or shop)
2685 | 0.93

52 もう moo adv. already; soon; again
もう寝ます I'll go to bed soon.
2630 | 0.66

53 である de aru cp. COPULA (formal)
これが実験の結果である These are the results of the experiment.
2586 | 0.81 | BK

54 時 toki n. time
時が悲しみを癒してくれますよ Time will heal your sorrow.
2514 | 0.96

55 な na p. disc. PROHIBITION
動くな 手を挙げろ Don't move! Get your hands up!
2265 | 0.78

56 ず zu aux. NEGATION
彼は何も言わずに去ってしまった He left without saying anything.
2250 | 0.94

57 ので,んで no de, n de p. conj. as, because, since
娘はまだ小さいので手がかかります As my daughter is still very young, she needs to be looked after.
2181 | 0.00 | SP

58 人 hito n. person, people, human being
その歌手は若い人に人気がある That singer is popular among young people.
2178 | 0.93

59 よ yo p. disc. ASSERTION, REMINDING (informal)
また来るよ I will come again.
2134 | 0.88

60 こう koo adv. so, like this
粉と水を混ぜ こうしてよくこねてください Mix flour and water together, and knead the dough like this.
2099 | 0.73

61 から kara p. conj. because, since
雨が降っているから出かけるのはやめます I will not go out, because it's raining.
2028 | 0.96

62 ば ba p. conj. if
雨がふれば試合は中止だ If it rains, the game will be cancelled.
1909 | 0.98

63 や ya p. and; or
結婚式に家族や友人を招待した We invited our family and friends to the wedding.
1881 | 0.93

64 来る kuru v. come
ここに来てください Come here, please.
1876 | 0.92

65 その sono interj. uh, er, um, mm
どうして授業を休んだの それは その... Why were you absent from the class? Uh, well...
1868 | 0.01 | SP

66 まで made p. to, till, until
休暇は明日から来週の水曜日までです My vacation is from tomorrow till next Wednesday.
1839 | 0.99

67 見る miru v. see; look at, watch; check
通りを渡る前に左右を見た I looked left and right before crossing the street.
1814 | 0.98

68 たり tari p. and
日曜日はよく部屋の掃除をしたり本を読んだりする I usually clean up my room and read books on Sunday.
1793 | 0.84

69 今 ima n. now
今何時ですか What time is it now?
1760 | 0.92

70 良い,いい yoi, ii i-adj. good
彼は良い人だ He is a good man.
1734 | 0.84

71 所 tokoro n. place, point; part; aspect
先週はいろいろな所に行った I went to many places last week.
1714 | 0.93