The TGrep language by Richard Pito
formulates queries as patterns that consist of expressions
to match tree nodes and relationships defining links or negated links to other tree nodes.
Nodes of searched trees are matched either with
simple character strings or with regular expressions (see section 1).
A complex expression consists of a node expression followed by relationships,
as presented in section 2.
Possible relationships are illustrated in section 3.
When you compose a search expression in the box provided,
if the search expression uses TGrep-lite syntax,
the interface will recognize that and interpret the expression accordingly.
If the expression is well-formed but there are no matches in the corpus,
the screen shows no change after the “Submit” button is pressed.
If the expression is not well-formed, the warning
“Not a valid TGrep-lite expression.”
appears.
In the result page, if you check the “reveal” box and resubmit,
a translation of the TGrep-lite expression into XPath syntax is displayed.
This allows you to check whether the expression you have composed reflects the search that is intended.
The TGrep functionality available here is referred to as “TGrep-lite”.
This is less expressive than the full TGrep language implemented in the original TGrep program.
In particular,
the expression of relationships between nodes is limited to the relations detailed in section 3.
TGrep-lite is especially weak when compared to the enhanced TGrep languages
available with the TGrep2
and Tregex implementations;
notably,
it lacks the ability to express disjunctions of relations.
TGrep-lite also behaves differently from TGrep
with regard to how nodes are specified
(described in section 1).
Despite these limitations,
TGrep-lite is the easiest and most accessible way to search the corpus using this on-line interface,
and it remains a powerful search language.
TGrep-lite works by rewriting expressions of a modified TGrep language into XPath queries over a database of XML-encoded trees.
The formatting of the XML
requires
that the rewrite to XPath distinguishes
three different “node” kinds expressed with TGrep-lite node patterns:
word nodes,
pre-terminal nodes (that provide information about the word, e.g., the word is a zero element), and
part-of-speech/phrase-level nodes.
The wild card (“__”) is exceptional in not needing to distinguish its node kind,
since it will match all nodes.
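As a rough illustration of the rewriting idea, the sketch below runs hand-written XPath queries over a toy XML tree with Python's standard xml.etree.ElementTree module. The element name "node", the attribute "cat" and the sample tree are assumptions made for this illustration only; they are not the encoding or the translation actually used by the interface.

```python
import xml.etree.ElementTree as ET

# Toy XML-encoded tree. The <node cat="..."> layout is hypothetical.
TOY_XML = """
<corpus>
  <node cat="IP">
    <node cat="PP">
      <node cat="IP"><node cat="VB"/></node>
      <node cat="P"/>
    </node>
    <node cat="VB"/>
  </node>
</corpus>
"""

root = ET.fromstring(TOY_XML)

# Immediate dominance ("A < B") corresponds to the XPath child axis:
# here, every PP that is a child of an IP.
pp_children = root.findall(".//node[@cat='IP']/node[@cat='PP']")

# Dominance ("A << B") corresponds to the XPath descendant axis:
# here, every IP that lies somewhere below another IP.
ip_descendants = root.findall(".//node[@cat='IP']//node[@cat='IP']")

print(len(pp_children), len(ip_descendants))
```

ElementTree supports only a small subset of XPath (exact attribute tests, no regular expressions), so this sketch uses exact labels rather than the pattern matching described in section 1.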
A simple constant string,
such as “abc”,
will match word nodes that are exactly the string abc.
All other node patterns
are stated as a regular expression
with delimiters that
determine the kind of node searched.
Specifically:
a regular expression indicated by surrounding slashes (“/”), such as “/ab/”, will search for word nodes,
a regular expression indicated by surrounding curly brackets (“{”, “}”), such as “{AB}”, will search for pre-terminal nodes, and
a regular expression indicated by surrounding square brackets (“[”, “]”), such as “[AB]”, will search for part-of-speech/phrase-level nodes.
If a simple constant string or delimited regular expression begins with “!”,
the match is complemented. That is, matches turn into non-matches,
and vice versa.
For example, “!abc” will match all words that are not abc,
and “![^NP]” will match any part-of-speech or phrase-level node that does not start with NP.
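The delimiter rules above can be summarised as a small classifier. This is a minimal sketch in Python, not the code behind the interface; the names NodePattern and classify are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class NodePattern(NamedTuple):
    kind: str        # "any", "word", "pre-terminal" or "phrase"
    body: str        # constant string or regular expression source
    is_regex: bool   # False for constant strings and the wild card
    negated: bool    # True when the pattern starts with "!"


# Delimiter pairs described above, mapped to the node kind they select.
DELIMITERS = {
    ("/", "/"): "word",
    ("{", "}"): "pre-terminal",
    ("[", "]"): "phrase",
}


def classify(pattern: str) -> NodePattern:
    """Split a node pattern into kind, body, regex flag and negation flag."""
    negated = pattern.startswith("!")
    core = pattern[1:] if negated else pattern
    if core == "__":
        return NodePattern("any", core, False, negated)
    for (left, right), kind in DELIMITERS.items():
        if len(core) >= 2 and core.startswith(left) and core.endswith(right):
            return NodePattern(kind, core[1:-1], True, negated)
    # No delimiters: a constant string that must equal a word node.
    return NodePattern("word", core, False, negated)


print(classify("![^NP]"))
```

Here "phrase" stands for part-of-speech/phrase-level nodes. A negated pattern keeps its kind, so “![^NP]” is still a test on part-of-speech/phrase-level nodes.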
A regular expression is not anchored by default:
it matches a node if any part of the node label is matched.
For example, “[IP]” matches IP-MAT, IP-ADV, etc.
The caret (“^”) anchors the regular expression to the beginning of a matched node,
while a dollar sign (“$”) as the last character will anchor the regular expression
to the end of a matched node.
Use of both the caret and dollar sign in “[^NP$]” constrains the match to exactly NP.
A word boundary can be stated with “\b”.
Thus,
while “[^NP]” will match both NP-SBJ and NPR,
“[^NP\b]” will match only NP-SBJ.
Disjunction can be expressed with the pipe (“|”),
and regular expression elements can be grouped with round brackets,
such that “[^NP-(SBJ|OB1)]” will find nodes that start with either NP-SBJ or NP-OB1.
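The anchoring and grouping behaviour described above can be tried out with Python's re module. Using re as a stand-in is an assumption: the interface may rely on a different regular expression engine, and the label list below is made up for the illustration.

```python
import re

# Hypothetical node labels to test against.
labels = ["NP", "NP-SBJ", "NP-OB1", "NPR", "IP-MAT", "IP-ADV"]


def matching(regex: str) -> list[str]:
    """Return the labels in which the regular expression matches somewhere."""
    return [label for label in labels if re.search(regex, label)]


print(matching("IP"))             # ['IP-MAT', 'IP-ADV']
print(matching("^NP"))            # ['NP', 'NP-SBJ', 'NP-OB1', 'NPR']
print(matching("^NP$"))           # ['NP']
print(matching(r"^NP\b"))         # ['NP', 'NP-SBJ', 'NP-OB1']
print(matching("^NP-(SBJ|OB1)"))  # ['NP-SBJ', 'NP-OB1']
```

The regular expressions are written here without the square brackets, since the brackets only tell TGrep-lite which kind of node to search.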
Note that the on-line interface is case insensitive
when the node is identified as being either a pre-terminal or part-of-speech/phrase-level node,
while being case sensitive for word (terminal) nodes.
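The case rule can be pictured as a choice of matching flags that depends on the node kind. The function below is a hypothetical sketch using Python's re module, not the interface's own code; the kind names are assumptions.

```python
import re


def label_matches(regex: str, label: str, kind: str) -> bool:
    """Match a regular expression against a node label.

    Word nodes are matched case-sensitively; pre-terminal and
    part-of-speech/phrase-level nodes are matched case-insensitively.
    """
    flags = 0 if kind == "word" else re.IGNORECASE
    return re.search(regex, label, flags) is not None


print(label_matches("^np", "NP-SBJ", "phrase"))  # True
print(label_matches("^KIMI", "kimi", "word"))    # False
```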
TGrep-lite expressions are composed of a node pattern followed by the relationships the node pattern participates in.
Because, under the XML encoding, word information
is content of the same node
as pre-terminal node information,
matching a particular word together with a particular pre-terminal node
requires the “==” (equals) relation to connect the two pieces of information about the same underlying node.
For example,
an expression such as “/tuti/ == {PHON}” will find instances of words that contain “tuti” with the “PHON” pre-terminal tag.
The following example
will match an IP node which immediately dominates a PP node and which dominates an IP node.
Note the parentheses, which ensure that the second relationship “<< [IP]” refers to the first IP and not to the PP.
As another example,
“[IP] < ([PP] << [IP])” will match an IP which immediately dominates a PP which in turn dominates some IP.
The first node in a pattern, or the first node following a left parenthesis, is a “master” node to which the relationships on its right apply. Thus, a TGrep-lite pattern consists of a master node for the entire query followed by a series of relationships to other nodes, which can themselves, with parentheses, form master nodes with relationships to yet other nodes. In the first example above only the first [IP] is a master node, while in the second example both the first [IP] and the [PP] are master nodes.
Relationships define connections between the master node (being defined) and other nodes. There is a complete pairing of forward and backward links, allowing for flexibility in choosing what is the master node. Notable relationships are:
A << B     A dominates (is an ancestor of) B
A >> B     A is dominated by (is a descendant of) B
A < B      A immediately dominates (is the parent of) B
A > B      A is immediately dominated by (is the child of) B
A .. B     A precedes B
A ,, B     A follows B
A . B      A immediately precedes B
A , B      A immediately follows B
A $ B      A is a sister of and not equal to B
A $.. B    A is a sister of and precedes B
A $,, B    A is a sister of and follows B
A $. B     A is a sister of and immediately precedes B
A $, B     A is a sister of and immediately follows B
A == B     A and B are the same node
A <<, B    B is a leftmost descendant of A
A <<- B    B is a rightmost descendant of A
A >>, B    A is a leftmost descendant of B
A >>- B    A is a rightmost descendant of B
A <1 B     B is the 1st child of A
A >1 B     A is the 1st child of B
A <-1 B    B is the last child of A
A >-1 B    A is the last child of B
A <, B     B is the first child of A (synonymous with A <1 B)
A >, B     A is the first child of B (synonymous with A >1 B)
A <- B     B is the last child of A (also synonymous with A <-1 B)
A >- B     A is the last child of B (also synonymous with A >-1 B)
A <: B     B is the only child of A
A >: B     A is the only child of B
A <<: B    A dominates B via an unbroken chain (length > 0) of unary branches
A >>: B    A is dominated by B via an unbroken chain (length > 0) of unary branches
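To make a few of these definitions concrete, the sketch below implements some of the forward relationships over a small in-memory tree. The Node class and the function names are hypothetical and chosen for this illustration; the precedence relations are left out to keep the sketch short. Each backward link is the same test with the arguments swapped, so “A > B” is immediately_dominates(b, a).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes are compared by identity, not by label
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def immediately_dominates(a: Node, b: Node) -> bool:  # A < B
    return b.parent is a


def dominates(a: Node, b: Node) -> bool:  # A << B
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def is_sister(a: Node, b: Node) -> bool:  # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent


def sister_immediately_precedes(a: Node, b: Node) -> bool:  # A $. B
    if not is_sister(a, b):
        return False
    siblings = a.parent.children
    return siblings.index(a) + 1 == siblings.index(b)


def first_child(a: Node, b: Node) -> bool:  # A <, B (same as A <1 B)
    return bool(a.children) and a.children[0] is b


def last_child(a: Node, b: Node) -> bool:  # A <- B (same as A <-1 B)
    return bool(a.children) and a.children[-1] is b


def only_child(a: Node, b: Node) -> bool:  # A <: B
    return b.parent is a and len(a.children) == 1


# Toy tree: (IP (PP (NP N) P) VB)
n, p, vb = Node("N"), Node("P"), Node("VB")
np_ = Node("NP", [n])
pp = Node("PP", [np_, p])
ip = Node("IP", [pp, vb])

print(dominates(ip, n), immediately_dominates(ip, n))  # True False
```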
The following presents pictures grouping some of the above relationships as forward and backward links:
C << __ (dominates, is an ancestor of)
__ >> C (is dominated by, is a descendant of)
E >> __ (is dominated by, is a descendant of)
__ << E (dominates, is an ancestor of)
C < __ (immediately dominates, is the parent of)
__ > C (is immediately dominated by, is the child of)
TGrep-lite returns the match for the left-most element in the search pattern. For example, the pattern “[PP] > ([IP] << [IP])” matches PPs that are immediately dominated by an IP that dominates an IP.
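The sketch below illustrates which node is returned for a query of this shape, that is, one of the form “[PP] > ([IP] << [IP])” as reconstructed from the prose description above. The tuple-based tree representation, the function names and the sample trees are assumptions made for the illustration, and labels are tested by simple substring matching.

```python
# A tree is a (label, children) pair; leaves have an empty child list.

def descendants(tree):
    """Yield every node strictly below `tree`."""
    for child in tree[1]:
        yield child
        yield from descendants(child)


def find_pp_under_ip_dominating_ip(tree):
    """Return PP nodes whose parent is an IP that dominates another IP.

    The IP nodes only act as conditions. The result holds PP nodes alone,
    because PP is the left-most element of the pattern.
    """
    results = []
    for node in [tree, *descendants(tree)]:
        label, children = node
        if "IP" in label and any("IP" in d[0] for d in descendants(node)):
            results.extend(c for c in children if "PP" in c[0])
    return results


tree = ("IP-MAT", [
    ("PP", [("IP-ADV", [("VB", [])]), ("P", [])]),
    ("NP-SBJ", [("N", [])]),
    ("VB", []),
])

# No IP below IP-MAT here, so the PP does not qualify.
flat_tree = ("IP-MAT", [("PP", [("NP", []), ("P", [])]), ("VB", [])])

print([node[0] for node in find_pp_under_ip_dominating_ip(tree)])       # ['PP']
print([node[0] for node in find_pp_under_ip_dominating_ip(flat_tree)])  # []
```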
TGrep-lite search results
Search results are listed in groups of up to twenty-five entries,
each with highlighted portions corresponding to the focus of the query.
Immediately following each entry is a link to the tree for that entry in the form of the ID number of that entry.
Following the link
opens a tree view for the result,
with highlighted nodes corresponding to the focus of the search.
When appropriate,
there is a down arrow to click for moving to the next twenty-five results,
and an up arrow for moving back.
In addition,
there is an open text area that contains the pattern for the search.
This gives the opportunity to see and also edit the search query.
Clicking the “Submit” button re-submits the possibly edited search.
At the end of the page,
there is the option to
download results for searches that return 2000 items or fewer.
There are three possible forms in which search results can be downloaded:
basic text format, bracket format, and Alpino XML format.
All formats include the text and ID number of each entry.
Bracket format and Alpino XML format include all the syntactic information encoded for each entry.
Each line of text in the “basic text format” is a tab-separated numbered entry,
and the number of the last entry is equal to the number of results for the search.
* ipa basiru * tagi mo todoro ni * naku semi no * kowe wo si kikeba * miyakwo si omopoyu MYS.15.3617
* sapo gapa ni * sa basiru tidori * ywo gutatite * na ga kowe kikeba * i ne gate naku ni MYS.7.1124
* paya pito no * se two no ipapo mo * ayu pasiru * yosinwo no tagi ni * napo sika zu kyeri MYS.6.960
* ipa basiru * taru mi no midu no * pasikiyasi * kimi ni kwopu raku * wa ga kokoro kara MYS.12.3025
* yasumisisi * wa go opo kimi * taka terasu * pi no mi kwo * ara tape no * pudipara ga upe ni * wosu kuni wo * myesi tamapa mu to * miaraka pa * taka sira sa mu to * kamu nagara * omoposu nabeni * ame tuti mo * yorite are koso * ipa basiru * apumi no kuni no * koromo de no * tanakami yama no * ma kwi saku * pi no tumade wo * mononopu no * ya swo udi gapa ni * tama mo nasu * uka be nagas e re * so wo toru to * sawaku mi tami mo * ipye wasure * mwi mo tana sira zu * kamo zimono * midu ni uki wite * wa ga tukuru * pi no mikadwo ni * sira nu kuni * yosi kose di ywori * wa ga kuni pa * toko yo ni nara mu * pumi op ye ru * ayasi ki kame mo * arata yo to * idumi no kapa ni * moti kwos e ru * ma kwi no tumade wo * momo tara zu * ikada ni tukuri * nobo su ramu * iswopaku mireba * kamu nagara na ra si MYS.1.50
* paru no pi no * kasum yeru toki ni * suminoye no * kwisi ni ide wite * turi bune no * toworapu mireba * inisipye no * koto so omopoyuru * miduno ye no * urasima no kwo ga * katuwo turi * tapi turi pokori * nanu ka made * ipye ni mo ko zute * una saka wo * sugwite kogi yuku ni * watatumi no * kamwi no wotomye ni * tamasaka ni * i kogi mukapi * api atorapi * koto nari sikaba * kaki musubi * toko yo ni itari * watatumi no * kamwi no miya no * uti no pye no * tape naru tono ni * tadusapari * puta ri iri wite * oi mo se zu * sini mo se zusite * naga ki yo ni * ari kyeru mono wo * yo no naka no * oroka pito no * wagimo kwo ni * tugete katara ku * simasi ku pa * ipye ni kapyerite * titi papa ni * koto mo kata rapi * asu no goto * ware pa ki na mu to * ipi kyereba * imo ga ip yera ku * toko yo pye ni * mata kapyeri kite * ima no goto * apa mu to naraba * ko no kusige * piraku na yume to * sokorakuni * kata me si koto wo * suminoye ni * kapyeri ki tarite * ipye mire do * ipye mo mi kanete * satwo mire do * satwo mo mi kanete * ayasi * to * soko ni omopa ku * ipye yu idete * mi tose no apida ni * kaki mo na ku * ipye use me ya to * ko no pakwo wo * pirakite mi teba * moto no goto * ipye pa ara mu to * tama kusige * sukwosi piraku ni * sira kumo no * pakwo ywori idete * toko yo pye ni * tanabiki nure ba * tati pasiri * sakyebi swode puri * koi marobi * asi zuri situtu * tatimatini * kokoro ke use nu * waka kari si * pada mo siwami nu * kurwo kari si * kami mo sirake nu * yunayuna pa * iki sape tayete * noti tupi ni * inoti sini kyeru * miduno ye no * urasima no kwo ga * ipye tokoro mi yu MYS.9.1740
* mitegura wo * nara ywori idete * midu tade * podumi ni itari * tonami paru * sakate wo sugwi * ipa basiru * kamunabwi yama ni * asa miya ni * tukape maturite * yosinwo pye to * iri masu mireba * inisipye omopoyu MYS.13.3230
* subye mo na ku * kuru si ku areba * ide pasiri * inana to omopedo * kwo ra ni sayari nu MYS.5.899
* oti tagitu * pasiri wi no midu no * kiywo ku areba * okite pa ware pa * yuki kate nu kamo MYS.7.1127
* utusomi to * omopi si toki ni * tori motite * wa ga puta ri mi si * pasiri de no * tutumi ni tat eru * tukwi no kwi no * koti goti no ye no * paru no pa no * sige ki ga goto ku * omop yeri si * imo ni pa aredo * tanom yeri si * kwo ra ni pa aredo * yo no naka wo * so muki si e neba * kagirwo pwi no * moyu ru ara nwo ni * sirwo tape no * ama pire gakuri * tori zimono * asa dati imasite * iri pi nasu * kakuri ni sikaba * wagimo kwo ga * katami ni ok yeru * midorikwo no * kopi naku goto ni * tori atapuru * mono si na kyereba * wotokwo zimono * waki basami moti * wagimo kwo to * puta ri wa ga ne si * makura duku * tuma ya no uti ni * piru pa mo * urasabwi kura si * yworu pa mo * ikiduki aka si * nagekedomo * se mu subye sira ni * kwopuredomo * apu yosi wo na mi * opo tori no * pagapi no yama ni * wa ga kwopu ru * imo pa i masu to * pito no ipeba * ipa ne sakumite * na dumi ko si * yo kyeku mo zo na ki * utusemi to * omopi si imo ga * tama kagiru * ponoka ni dani mo * mi ye nu omopeba MYS.2.210b
* opo kimi no * topo no mikadwo so * mi yuki puru * kwosi to na ni op yeru * ama zakaru * pina ni si areba * yama taka mi * kapa toposiro si * nwo wo piro mi * kusa koso sigye ki * ayu pasiru * natu no sakari to * sima tu tori * u kapi ga tomo pa * yuku kapa no * kiywo ki se gotoni * kagari sasi * nadusapi noboru * tuyu simo no * aki ni itareba * nwo mo sapa ni * tori sudak yeri to * masura wo no * tomo izanapite * taka pa si mo * amata aredomo * ya kata wo no * a ga opo kurwo ni * sira nuri no * suzu tori tukete * asa kari ni * i po tu tori tate * yupu kari ni * ti dori pumi tate * opu gotoni * yurusu koto na ku * ta banare mo * woti mo ka yasu ki * kore wo okite * mata pa ari gata si * sa narab yeru * taka pa na kye mu to * kokoro ni pa * omopi pokorite * wemapitutu * wataru apida ni * tabure taru * siko tu okina no * koto dani mo * ware ni pa tuge zu * tonogumori * ame no puru pi wo * to gari su to * na nomwi wo norite * misimanwo wo * sogapini mi tutu * putagami no * yama tobi kwoyete * kumo gakuri * kakeri ini ki to * kapyeri kite * sipabure tugure * woku yosi no * soko ni na kyereba * ipu subye no * tadoki wo sira ni * kokoro ni pa * pwi sape moyetutu * omopi kwopwi * ikiduki amari * kedasiku mo * apu koto ari ya to * asipikwi no * wotemo konomo ni * tonami pari * mori bye wo suwete * tipayaburu * kamwi no yasiro ni * teru kagami * situ ni tori swope * kopi nomite * a ga matu toki ni * wotomye ra ga * ime ni tuguraku * na ga kwopuru * so no potu taka pa * matudaye no * pama yuki gurasi * tunasi toru * pimi no ye sugwi te * takwo no sima * tobi ta motopori * asi gamo no * sudaku puruye ni * woto tu pi mo * kinopu mo ari tu * tika ku araba * ima putu ka damwi * topo ku araba * nanu ka no woti pa * sugwi me ya mo * ki na mu wa ga sekwo * nemokoro ni * na kwopwi so yo to so * ima ni tuge turu MYS.17.4011
* kake maku mo * ayani kasikwo si * ipa maku mo * yuyu si ki kamo * wa go opo kimi * mi kwo no mikoto * yorodu yo ni * myesi tamapa masi * opo yamato * kuni no miyakwo pa * uti nabiku * paru sari nureba * yama pye ni pa * pana saki wowori * kapa se ni pa * ayu kwo sa basiri * iya pi kyeni * sakayuru toki ni * oyodure no * tapa koto to ka mo * sirwo tape ni * toneri yosopi te * waduka yama * mi kosi tata site * pisakata no * ame sira si nure * koi marobi * pidu ti nakedomo * se mu subye mo na si MYS.3.475
* komori ku no * patuse no yama * awo pata no * osaka no yama pa * pasiri de no * yorosi ki yama no * ide tati no * kupasi ki yama zo * atarasi ki * yama no * are maku wosi * mo MYS.13.3331
* tama ta suki * unebwi no yama no * kasipara no * piziri no miya yu * are masi si * kamwi no kotogoto * tuga no kwi no * iya tugitugi ni * ame no sita * sira si myesi kyeru * swora mitu * yamato wo oki * awo ni yo si * nara yama kwoyete * ika sama ni * omoposi kye me ka * ama zakaru * pina ni pa aredo * ipa basiru * apumi no kuni no * sasanami no * opotu no miya ni * ame no sita * sira si myesi kye mu * sumyeroki no * kamwi no mikoto no * opo miya pa * koko to kikedomo * opo tono pa * koko to ipedomo * paru kusa no * sige ku opwi taru * kasumi tati * paru pi ka kwir e ru * natu kusa ka * sige ku nari nuru * momo sikwi no * opo miya tokoro * mireba sabusi * mo MYS.1.29b
* ara tama no * tosi yuki kapari * paru sareba * pana nomwi nipopu * asipikwi no * yama sita toyomi * oti tagiti * nagaru sakita no * kapa no se ni * ayu kwo sa basiru * sima tu tori * u kapi tomonape * kagari sasi * nadusapi yukeba * wagimo kwo ga * kata mi gatera to * kurenawi no * ya sipo ni somete * okose taru * koromo no suswo mo * toporite nure nu MYS.19.4156
* simo no upe ni * arare ta basiri * iya masi ni * are pa mawi ko mu * tosi no wo naga ku MYS.20.4298
* tama ta suki * unebwi no yama no * kasipara no * piziri no mi yo yu * are masi si * kamwi no kotogoto * tuga no kwi no * iya tugitugi ni * ame no sita * sira si myesi si wo * swora ni mitu * yamato wo okite * awo ni yo si * nara yama wo kwoye * ika sama ni * omoposi myese ka * ama zakaru * pina ni pa aredo * ipa basiru * apumi no kuni no * sasanami no * opotu no miya ni * ame no sita * sira si myesi kye mu * sumyeroki no * kamwi no mikoto no * opo miya pa * koko to kikedomo * opo tono pa * koko to ipedomo * paru kusa no * sige ku opwi taru * kasumi tati * paru pi no kwir e ru * momo sikwi no * opo miya tokoro * mireba kanasi * mo MYS.1.29a
* wa ga swode ni * arare ta basiru * maki kakusi * keta zute ara mu * imo ga mi mu tame MYS.10.2312
* nanipa tu ni * mi bune pate nu to * kikoye koba * pimo toki sakete * tati basiri se mu MYS.5.896
* paru sareba * wagipye no satwo no * kapa two ni pa * ayu kwo sa basiru * kimi mati gate ni MYS.5.859
* utusemi to * omopi si toki ni * tori motite * wa ga puta ri mi si * pasiri de no * tutumi ni tat eru * tukwi no kwi no * koti goti no ye no * paru no pa no * sige ki ga goto ku * omop yeri si * imo ni pa aredo * tanom yeri si * kwo ra ni pa aredo * yo no naka wo * so muki si e neba * kagirwo pwi no * moyu ru ara nwo ni * sirwo tape no * ama pire gakuri * tori zimono * asa dati imasite * iri pi nasu * kakuri ni sikaba * wagimo kwo ga * katami ni ok yeru * midorikwo no * kopi naku goto ni * tori atapuru * mono si na kyereba * wotokwo zimono * waki basami moti * wagimo kwo to * puta ri wa ga ne si * makura duku * tuma ya no uti ni * piru pa mo * urasabwi kura si * yworu pa mo * ikiduki aka si * nagekedomo * se mu subye sira ni * kwopuredomo * apu yosi wo na mi * opo tori no * pagapi no yama ni * wa ga kwopu ru * imo pa i masu to * pito no ipeba * ipa ne sakumite * na dumi ko si * yo kyeku mo zo na ki * utusemi to * omopi si imo ga * tama kagiru * ponoka ni dani mo * mi ye nu omopeba MYS.2.210a
* tosi no pa ni * ayu si pasira ba * sakita kapa * u ya tu kadukete * kapa se tadune mu MYS.19.4158
* ipa basiri * tagiti nagaruru * patuse gapa * tayuru koto na ku * mata mo kite mi mu MYS.6.991
* ipa basiru * taru mi no upe no * sa warabi no * moye iduru paru ni * nari ni kyeru kamo MYS.8.1418
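A download in the basic text format can be read back with a few lines of Python. This is a minimal sketch that assumes each line holds three tab-separated fields in the order number, text, ID; that column order is an assumption, and the entry numbers in the sample are assigned here for illustration only. The name parse_basic_text is hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    number: int
    text: str
    entry_id: str


def parse_basic_text(data: str) -> list[Entry]:
    """Parse tab-separated lines of the assumed form: number, text, ID."""
    entries = []
    for line in data.splitlines():
        if not line.strip():
            continue  # skip blank lines
        number, text, entry_id = line.split("\t")
        entries.append(Entry(int(number), text.strip(), entry_id.strip()))
    return entries


# Texts and IDs are taken from the listing above; the numbers are illustrative.
sample = (
    "1\t* ipa basiri * tagiti nagaruru * patuse gapa"
    " * tayuru koto na ku * mata mo kite mi mu\tMYS.6.991\n"
    "2\t* tosi no pa ni * ayu si pasira ba * sakita kapa"
    " * u ya tu kadukete * kapa se tadune mu\tMYS.19.4158\n"
)

entries = parse_basic_text(sample)

# The number of the last entry equals the number of results.
print(entries[-1].number == len(entries))  # True
```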