Introducing the NPCMJ

For the world's major languages, steady progress has been made in building corpora annotated with syntactic information (treebanks), and research using these corpora has produced significant results in linguistics and language processing. For Japanese, the National Institute for Japanese Language and Linguistics (NINJAL) launched the Collaborative Research Project “Development of and Linguistic Research with a Parsed Corpus of Japanese” in 2016 and is currently building the NPCMJ (NINJAL Parsed Corpus of Modern Japanese). The project annotates written and spoken texts of contemporary Japanese with syntactic and semantic information, making it possible to search for and extract a rich inventory of function words, phrase structures, clause types, and complex constructions, and to put the results to use in research. The present release (March 2019) makes approximately 30,000 sentences (30,000 trees) publicly available. Together with the data, the project offers a variety of tools designed for the NPCMJ that support many different kinds of searches. We invite you to see for yourself what can be done with the tools and the data.

Source                        Trees    Words
aozora (Aozora Bunko)         4,646  101,537
bible (Bible)                 1,664   30,657
book (books)                    552   12,515
dict (dictionary)             3,419   33,651
diet (Diet proceedings)       1,698   37,349
fiction (fiction)               923   12,051
law (legal texts)               337    7,793
misc (miscellaneous)          2,085   23,872
news (news)                   4,666   84,927
nonfiction (nonfiction)         223    4,454
ted (TED Talks)               1,453   22,030
textbook (textbooks)          6,048   64,038
wikipedia (Wikipedia)         2,746   70,445
Total                        30,460  505,319

Online Tools for using the NPCMJ

NPCMJ Explorer: For entry-level users
This is a pattern browser that searches the corpus for examples matching the grammatical descriptions in Kiso Nihongo Bunpo (revised edition) by Masuoka Takashi and Takubo Yukinori, from Kuroshio Publishers. It also includes a character string search with which users can look for examples containing strings they enter themselves.
Start the NPCMJ Explorer
NPCMJ Search: For advanced users
This is a search interface comprising five tools: Tags, Word dependencies, String search, Tree search/Text analysis, and Query builder. Access is also given to the full text and the metadata for each sample.
Start the NPCMJ Search Interface
NPCMJ Search User’s Manual
NPCMJ Annotation Manual (sections 1-13)

Full Download

Bracketed tree file format
This is a compressed zip file containing all the sample files of the NPCMJ in bracketed tree format.
Download bracketed tree files
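
For readers who want to work with the downloaded files directly, the following is a minimal Python sketch of reading bracketed trees into nested lists. It assumes a Penn-Treebank-style layout in which every node is written as "(LABEL child ...)" and every word appears as "(TAG word)"; that layout, the sample sentence, and the function names parse_trees and leaves are illustrative assumptions, not taken from the NPCMJ documentation. Any identifier or metadata nodes in the real files would be returned as ordinary leaves, and words that themselves contain parentheses would need extra handling.

```python
from typing import List, Union

# A tree is either a bare token (label or word) or a list of subtrees.
Tree = Union[str, List["Tree"]]


def tokenize(text: str) -> List[str]:
    # Pad parentheses with spaces so split() separates them from labels and words.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_trees(text: str) -> List[Tree]:
    """Parse every top-level bracketed tree in `text` into nested lists."""
    trees: List[Tree] = []
    stack: List[List[Tree]] = []
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])          # open a new node
        elif tok == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            node = stack.pop()        # close the current node
            if stack:
                stack[-1].append(node)
            else:
                trees.append(node)    # a complete top-level tree
        else:
            if not stack:
                raise ValueError(f"token outside any tree: {tok!r}")
            stack[-1].append(tok)
    if stack:
        raise ValueError("unbalanced '('")
    return trees


def leaves(tree: Tree) -> List[str]:
    """Collect terminal words, i.e. the second element of each (TAG word) pair."""
    if isinstance(tree, str):
        return []
    if len(tree) == 2 and isinstance(tree[0], str) and isinstance(tree[1], str):
        return [tree[1]]
    words: List[str] = []
    for child in tree:
        words.extend(leaves(child))
    return words


# Hypothetical sample tree, made up for illustration (not taken from the corpus).
sample = "(IP-MAT (PP (NP (N 猫)) (P が)) (VB 走る))"
print(leaves(parse_trees(sample)[0]))
```

To apply this to the download, one could open each file from the zip archive with Python's standard zipfile module, decode it as text, and pass the result to parse_trees. The file names and encoding inside the archive are not stated on this page, so they should be checked against the actual download.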

NPCMJ Documentation

Under construction