Meetings – NPCMJ – Ninjal Parsed Corpus of Modern Japanese

Collaborative Research Project Meeting (online)

Date: 19 November 2021 (Fri) 16:00-17:00
Venue: online
Participation fee: free
Registration: see 「参加申し込み」 (registration) on the NINJAL website

“SUGINOKI: A Treebank of Japanese Learner Language”
堀内仁 (Professional Graduate School, Akita International University)

[abstract]

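This summer saw the release of SUGINOKI, a treebank of Japanese learner language developed as an extension of the NPCMJ (NINJAL Parsed Corpus of Modern Japanese) project, a syntactically and semantically parsed corpus of Japanese. This talk reports on the treebank's development history and background, its data and annotation, and its potential applications to SLA research and research on Japanese language education.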

Collaborative Research Project Meeting (online)

Date: 27 November 2020 (Fri) 13:30-15:30
Venue: online
Participation fee: free
Registration: see 「事前登録」 (advance registration) on the NINJAL website

13:30～14:30
“Types of head nouns in relative clauses: A quantitative survey using the NPCMJ”
世良時子 (Seikei University)

[abstract]

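This study uses the NPCMJ (NINJAL Parsed Corpus of Modern Japanese), a corpus annotated with syntactic parse information, to investigate which types of noun phrase occur at high rates as the head noun of a relative clause. The results show that subject noun phrases occur with overwhelmingly high frequency in every text genre. The talk reports which relative clauses were targeted in the analysis, together with the current status and open problems, and examines them further. It also reports on how occurrence rates differ across text genres.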

14:30～15:30
“Creating a Parsed Corpus of the Tsugaru Dialect”
Vance Gwidt (Hirosaki University)

[abstract]

The aim of my research is to create a parsed corpus of the Tsugaru dialect, which will include morphological and syntactic analysis. The data consists of audio samples of local folktales spoken by native speakers of the dialect. The annotation system, including the methods of linguistic analysis and online presentation, is based on the NPCMJ. I will provide an overview of the creation of this corpus and its current state, and will detail the methods used for annotating the data.

Collaborative Research Project Meeting (online)

Date: 7 July 2020 (Tue) 17:00-18:00
Venue: online
Participation fee: free
Registration: see 「事前登録」 (advance registration) on the NINJAL website

“Anaphora resolution in the syntactically and semantically parsed corpus NPCMJ: Current status and issues”
Wataru Okubo (TUFS / NINJAL)

[abstract]

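Interpreting expressions involved in discourse anaphora, such as pronouns, is an important problem in both theoretical linguistics and natural language processing. The NPCMJ (NINJAL Parsed Corpus of Modern Japanese) aims to resolve anaphoric relations by assigning sort information, an index for identifying individuals and events, to phrases (i.e., noun phrases and clauses) in the syntactically parsed treebank. This talk reports on the current state of annotation in the NPCMJ for phenomena such as anaphora, coreference, and bridging reference (indirect anaphora), and on problems that have emerged through actual annotation work. It also discusses the prospects for assigning sort information as a way to solve them.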

Symposium “NINJAL Parsed Corpus of Modern Japanese and its applications to linguistic research” (The 44th annual conference of the Kansai Linguistic Society)

Date: 14 July 2019 (Sun) 13:30-16:30
Venue: F402, Bldg. 4, Area 2, Senriyama Campus, Kansai University

Part I: Overview and examples of searches using the web interface

“Introduction” [presentation materials]
Prashant Pardeshi (NINJAL), Kei Yoshimoto (Tohoku University)

[abstract]

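We explain the motivation behind the development of the NPCMJ and its characteristics, and describe how the NPCMJ enables flexible example-sentence searches and extraction of grammatical information that reflect sentence structure, for example hierarchical dependencies between elements of main and embedded clauses. We also explain what kinds of linguistic data can be collected using the interface on the website.

“Searching a syntactic corpus for presupposition projection” [presentation materials]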
Yusuke Kubota (NINJAL), Koji Mineshima (Ochanomizu University)

[abstract]

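We report the results of searching for attested examples of presupposition projection using the NPCMJ web interface. Examples were collected in three environments (embedded clauses, conditional clauses, and embedding under modals) using the focus particles and adverbs も ("also"), まで ("even"), 再び ("again"), and さらに ("further"), which are presupposition triggers with an additive meaning. The results confirm that enough data can be obtained to support linguistic analysis of these words and to build test sets for natural language processing research grounded in linguistic insights.

Part II: Grammar research using a syntactic corpus

“Corpus research from the perspective of co-occurrence between noun phrases and predicates” [presentation materials]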
Nobuyoshi Miyoshi (Jissen Women’s University)

[abstract]

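For sentences whose predicate is a verb, an adjective, or a noun, we examine whether nouns with different lexical properties, such as common nouns, proper nouns, and pronouns, combine with the predicate as subjects. Example sentences are collected with the NPCMJ, which allows searches based on syntactic constructions. By examining skews across text genres and similar patterns, we show that the corpus is effective for applying grammatical information retrieval in stylistics and quantitative research.

“Lexical skew in specific syntactic environments as seen through the NPCMJ corpus” [presentation materials]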
Misato Ido (NINJAL)

[abstract]

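This talk shows that the NPCMJ corpus makes it possible to observe lexical skew in specific syntactic environments. Specifically, we search for cases in which, in a negated nominal-predicate sentence, the adnominal clause modifying the predicate noun contains an adverb, and we classify those adverbs. The search shows that in many cases the adverb is the focus of negation, and that, besides NPI adverbs (あまり, そんなに, etc.), degree adverbs expressing a high degree (とんでもなく, 格別) and similar items appear frequently. Building on these search results, we then used the larger BCCWJ corpus to investigate the distributional characteristics of these adverbs in more detail.

Part III: Linguistic research involving programming

“Subject marking and sentence interpretation in coordinate constructions” [presentation materials]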
Wataru Okubo (Tokyo University of Foreign Studies)

[abstract]

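In coordinate constructions, the choice of particle on the subject phrase (が, は, も) is thought to correlate with the semantic interpretation of the sentence (contrast, addition, etc.). This talk investigates that relationship using the NPCMJ, which allows example sentences to be collected under complex conditions specified in terms of syntactic parse information. Along the way, as examples of editing and organizing data to suit a research purpose, it introduces transforming the data with tsurgeon, a tool for manipulating tree structures, and converting it into a format that spreadsheet software can handle. Finally, it points out the effects, revealed by this investigation, that combinations of subject markers in coordinate constructions have on the semantic interpretation of the sentence.

Closing

Comments on the research and discussion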
Kei Yoshimoto

JSLS 2019 workshop “Development of a parsed corpus and its applications to linguistic research and education”

Date: 7 July 2019 (Sun) 11:00-12:30
Venue: 11th Floor (Large Conference Room), New Humanities Building, Kawauchi-Kita Campus, Tohoku University

“Introduction”
Prashant Pardeshi (NINJAL) and Kei Yoshimoto (Tohoku University)

“Recent directions in CHILDES Japan and its role in NPCMJ”
Susanne Miyata (Aichi Shukutoku University)

“Constructing Japanese predicate-argument thesaurus and annotating NPCMJ with semantic role labels”
Koichi Takeuchi (Okayama University)

“Developing a Japanese syntax textbook as part of NPCMJ Project”
Hideki Kishimoto (Kobe University) and Prashant Pardeshi (NINJAL)

Collaborative Research Project Meeting

Date: 15 June 2019 (Sat) 14:00-18:00
Venue: Faculty of Science, Building 3, Ochanomizu University
Transport and Direction

14:00-14:30
“Current status of the NPCMJ corpus and future challenges”
Prashant Pardeshi (NINJAL), Kei Yoshimoto (Tohoku University)

14:30-15:00
“Searching the NPCMJ corpus for presupposition projection”
Yusuke Kubota (NINJAL), Koji Mineshima (Ochanomizu University)

15:10-15:40
“Co-occurrence between noun phrases and predicates as seen from a syntactically annotated corpus”
Nobuyoshi Miyoshi (University of Tsukuba)

15:40-16:10
“A study of adverbs appearing in noun-modifying clauses of nominal-predicate sentences using the NPCMJ corpus”
Misato Ido (NINJAL)

16:20-16:50
“Semantic interpretation of coordinate constructions and the choice of subject marker”
Wataru Okubo (Tokyo University of Foreign Studies)

[abstract]

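In coordinate constructions, the choice of particle on the subject phrase (が, は, も) is thought to correlate with the semantic interpretation of the sentence (contrast, addition, etc.). This talk investigates that relationship using the NPCMJ, which allows example sentences to be collected under complex conditions specified in terms of syntactic parse information. Along the way, as examples of editing and organizing data to suit a research purpose, it introduces transforming the data with tsurgeon, a tool for manipulating tree structures, and converting it into a format that spreadsheet software can handle. Finally, it points out the effects, revealed by this investigation, that combinations of subject markers in coordinate constructions have on the semantic interpretation of the sentence.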

16:50-17:10
Comments on the research and summary
Kei Yoshimoto (Tohoku University)

17:10-18:00
Discussion

Collaborative Research Project Meeting

Date: 12 May 2019 (Sun) 9:00-12:00
Venue: Hirosaki University 50th Anniversary Auditorium (Meeting Room 2)

9:00-10:00
“Syntactic annotation for Japanese CHILDES data”
Susanne Miyata (Aichi Shukutoku University) and Alastair Butler (Hirosaki University)

[abstract]

Recent work within the NPCMJ project has involved changing the morphological base of the annotation from Japanese script analysed with the LUW (long-unit word) standard of the BCCWJ (Maekawa et al. 2014) to romanized words parsed according to WAKACHI2002v8 (Miyata, 2013) and provided with morphological tags using the JMOR library files (Miyata & Naka, 2014). A major consequence of so modifying our data is that we are now able to train a parser that is compatible with this alternative WAKACHI2002v8 analysis. This opens the way to using automatic methods to parse Japanese CHILDES data that already contains gold standard morphological analysis. We report on some first parsing results, and how we are planning to present the syntactic annotation information as a new %rlc (root-leaf context) tier within the CHAT format (MacWhinney 2019).

10:00-11:00
“Results of annotating semantic role labels and frames for the NPCMJ”
Koichi Takeuchi (Okayama University)

[abstract]

This talk presents results of annotation work conducted at Okayama University as a collaboration between the Takeuchi Laboratory and the NINJAL NPCMJ Project. Specifically, we have started to add markup to the NPCMJ parsed trees for role labels and links to frames of the Predicate Thesaurus (Takeuchi et al., 2010). The annotation process has revealed several cases where it is difficult to decide semantic roles and their conceptual frames. Furthermore, there are cases of potential ambiguity. Taking up concrete examples, solutions for annotation practice will be discussed.

11:00-12:00
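“Choice of subject marker and the relationship between clauses in compound sentences”
Wataru Okubo (Tokyo University of Foreign Studies)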

[abstract]

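In compound sentences made up of clauses with overt subjects, the choice of particle on the subject phrase (が, は, も) is thought to correlate with the semantic interpretation of the sentence (contrast, addition, etc.). Through a survey using the NPCMJ, this talk analyzes how combinations of subject markers in compound sentences affect the semantic interpretation of the sentence. The survey uses the NPCMJ's syntactic parse information and specifies complex conditions to collect example sentences that qualify as compound sentences. In addition, as concrete examples of organizing data to suit a research purpose, it introduces transforming the data with tsurgeon, a tool for manipulating tree structures, and converting it into a format that spreadsheet software can handle.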

Collaborative Research Project Meeting

Date: 27 January 2019 (Sun) 10:30-16:00
Venue: #101 Kawakita Research Forum, Kawauchi-Kita Campus, Tohoku University
https://www.tohoku.ac.jp/en/about/map_directions.html

10:30-11:30
“Changing the morphological base of the NPCMJ”
Iku Nagasaki and Alastair Butler

[abstract]

This talk describes changes being made to the morphological base of the NPCMJ, a corpus of Japanese parsed for syntax. The old morphological base consisted of segmentation decisions on Japanese script to isolate word units together with the classification of each unit’s part-of-speech (noun, verb, etc.). This old segmentation corresponded closely to, but also deviated from, the LUW (Long Unit Word) standard of the Corpus of Spontaneous Japanese (CSJ; Maekawa 2003) and the Balanced Corpus of Contemporary Written Japanese (BCCWJ; Maekawa et al. 2014). The replacement morphological base uses the JMOR system (Miyata & Naka, 2014) and is carried out with Romaji (Hepburn romanization) rather than the Japanese script. With this change it becomes possible to encode information about the internal makeup of words. Notably, stem information is isolated and accompanied by an English gloss that acts as a partial lemmatisation. In addition, the grammatical functions of prefixes and suffixes are clearly distinguished. This change in morphological base brings significantly richer word information into the corpus, as well as a clear concept of what a word is for Japanese. But this change is also a massive undertaking, requiring major alterations to every annotated tree. In the talk we detail how we have used tools of automation to make the change feasible. This serves as an example of how it is possible to harness the power of a parsed corpus to improve and further supplement the contained analysis.

13:00-14:00
“Research on developing a corpus with syntactic and semantic information: On the parsing of Chinese noun phrases”
Zhou Zhen

[abstract]

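This talk examines the parsing of Chinese noun phrases in the context of developing a corpus annotated with syntactic and semantic information. Parsing noun phrases raises two challenges: clarifying the internal structure of noun phrases and parsing it formally, and distinguishing the similar syntactic roles that noun phrases play. Noun phrase parsing is a basic and important part both of corpus construction and of linguistic research based on the resulting corpus, and clarifying it is expected to strengthen the foundation for that research.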

14:00-15:00
“Corpus research from the perspective of co-occurrence between noun phrases and predicates”
Nobuyoshi Miyoshi

[abstract]

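This talk uses the NPCMJ, a syntactically annotated corpus, to clarify how combinations of noun phrases and predicates are distributed within sentences. Noun phrases come in varieties such as common nouns and proper nouns, but conventional corpora could not show which kinds of predicate (verbal, adjectival, or nominal) they combine with in a text. This research makes it possible to bring grammatical relations, such as the distribution of argument structure and case relations, into stylistics and quantitative research, which have traditionally been tied to distributions of parts of speech and the like.

15:00-16:00 General discussion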

Collaborative Research Project Meeting

Date: 22 June 2018 (Fri) 9:00-13:00
Venue: Room No. 103, Faculty of Engineering Building No.4, Tsushima Campus, Okayama University
Access to Tsushima campus
Tsushima campus map (N33)

“NPCMJ project activities for this fiscal year”
Prashant Pardeshi (NINJAL)

“Tools and practices for annotating discourse”
Stephen Wright Horn (NINJAL)

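“Similarity between negative polarity items in terms of structural distance: A verification using the NPCMJ as an index”
岸山健 (NINJAL / Graduate School, The University of Tokyo)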

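Discussion: “Improving the data schema”
Koichi Takeuchi (Okayama University), Susanne Miyata (Aichi Shukutoku University), Alastair Butler (Hirosaki University), Prashant Pardeshi (NINJAL), and others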

Workshop “Research Methods for the Penn Parsed Corpora of Historical English (PPCHE)”

Date: 12 December 2017 (Tue) 14:00-18:00
Venue: Waseda University
Participation fee: free
Hosted by: NINJAL, Institute for the Study of Language and Information/Institute for Digital Enhancement of Cognitive Development (Waseda University), Japanese Association for the Study of Logic, Language and Information
Lecturers: Anthony Kroch and Beatrice Santorini (University of Pennsylvania)

Collaborative Research Project Meeting

Date: 4 November 2017 (Sat)
Venue: Graduate School of Humanities and Faculty of Letters, Kobe University

“A case study using the NPCMJ corpus: Tense interpretation of toki clauses as seen in attested examples” [presentation materials]
鈴木彩香 (NINJAL)

“A case study using the NPCMJ corpus: Cross-clausal licensing of negative polarity items and adverb types” [presentation materials]
Misato Ido (NINJAL)

“From Keyaki to ABC: A treebank conversion project” [presentation materials]
Yusuke Kubota (University of Tsukuba) and Koji Mineshima (Ochanomizu University)

“Parsed corpus annotation (ad)ventures” [presentation materials]
Alastair Butler, Stephen Wright Horn and Iku Nagasaki (NINJAL)
Susanne Miyata (Aichi Shukutoku University), Zhou Zhen and Kei Yoshimoto (Tohoku University)

Collaborative Research Project Meeting

Date: 9 June 2017 (Fri)
Venue: NINJAL

“Japanese, English and Polish and the Typology of Tense” [presentation materials]
OGIHARA Toshiyuki (Associate Professor, University of Washington)

Collaborative Research Project Meeting

Date: 4 March 2017 (Sat)
Venue: Tohoku University

“Development of a syntactically and semantically parsed corpus and linguistic research”
Prashant Pardeshi (NINJAL)

“Teaching-material development and the structured corpus: Current status and challenges” [presentation materials]
Hideki Kishimoto (Kobe University)

“How to annotate what”
Kei Yoshimoto (Tohoku University)

“The Keyaki Treebank and the NPCMJ: Bridging a growing divide” [presentation materials]
Stephen Wright Horn, Alastair Butler (NINJAL)

“Grammatical principles for annotation and query” [presentation materials]
Stephen Wright Horn

“Aspects of interlanguage seen in spoken data from learners of Japanese”
堀田智子 (Tohoku University)

“On the annotation of noun modification”
檜山祥太 (Tohoku University), 佐藤亮輔 (Tohoku University), Zhou Zhen (Tohoku University)

“Diet proceedings as linguistic data” [presentation materials]
金城由美子 (NINJAL)

“A proposal for improving the Word2Vec vocabulary learning process using the NPCMJ” [presentation materials]
岸山健 (NINJAL / The University of Tokyo)

“On the annotation of cleft sentences”
折笠誠 (NINJAL / Sophia University)