The TGrep language by Richard Pito
formulates queries as patterns that consist of expressions
to match tree nodes and relationships defining links or negated links to other tree nodes.
Nodes of searched trees are matched either with
simple character strings or regular expressions (see section 1).
A complex expression consists of a node expression followed by relationships,
as presented in section 2.
Possible relationships are illustrated in section 3.
When you compose a search expression in the box provided,
if the search expression uses TGrep-lite syntax,
the interface will recognize that and interpret the expression accordingly.
If the expression is well-formed but there are no matches in the corpus,
the screen shows no change after the “Submit” button is pressed.
If the expression is not well-formed, the warning
“Not a valid TGrep-lite expression.”
appears.
In the result page, if you check the “reveal” box and resubmit,
a translation of the TGrep-lite expression into XPath syntax is displayed.
This allows you to check whether the expression you have composed reflects the search that is intended.
The TGrep functionality available here is referred to as “TGrep-lite”.
This is less expressive than the full TGrep language implemented in the original TGrep program.
In particular,
the expression of relationships between nodes is limited to the relations detailed in section 3.
TGrep-lite is notably weaker than the enhanced TGrep languages
available with the TGrep2
and Tregex implementations;
in particular,
it lacks the ability to express disjunctions of relations.
TGrep-lite also behaves differently from TGrep
with regard to how nodes are specified
(described in section 1).
Despite these limitations,
TGrep-lite is the easiest and most accessible way to search the corpus using this on-line interface,
and it remains a powerful search language.
TGrep-lite works by rewriting expressions of a modified TGrep language into XPath queries over a database of XML encoded trees.
The formatting of the XML
requires
that the rewrite to XPath distinguish
three different kinds of “node” expressed with TGrep-lite node patterns:
word nodes,
pre-terminal nodes (that provide information about the word, e.g., the word is a zero element), and
part-of-speech/phrase-level nodes.
The wild card (“__”) is exceptional in not needing to distinguish its node kind,
since it will match all nodes.
A simple constant string,
such as “abc”,
will match word nodes that consist of exactly the string abc.
All other node patterns
are stated as a regular expression
with delimiters that
determine the kind of node searched.
Specifically:
a regular expression indicated by surrounding slashes (“/”), such as “/ab/”, will search for word nodes,
a regular expression indicated by surrounding curly brackets (“{”, “}”), such as “{AB}”, will search for pre-terminal nodes, and
a regular expression indicated by surrounding square brackets (“[”, “]”), such as “[AB]”, will search for part-of-speech/phrase-level nodes.
If a simple constant string or delimited regular expression begins with “!”,
the match is complemented. That is, matches turn into non-matches,
and vice versa.
For example, “!abc” will match all words that are not abc,
and “![^NP]” will match any part-of-speech or phrase-level node that does not start with NP.
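The delimiter rules above can be sketched as a small classifier. This is only an illustration of the rules as stated in this section, not the interface's actual parser; the names `NodePattern` and `parse_node_pattern` are hypothetical, and allowing “!” before the wild card is an assumption.

```python
from dataclasses import dataclass

# Delimiter pairs and the node kind each one selects, as described above.
DELIMITERS = {
    ("/", "/"): "word",
    ("{", "}"): "pre-terminal",
    ("[", "]"): "label",  # part-of-speech/phrase-level node
}


@dataclass
class NodePattern:
    kind: str       # "any", "word", "pre-terminal" or "label"
    body: str       # constant string or regular expression source
    is_regex: bool  # False for a simple constant string or the wild card
    negated: bool   # True when the pattern starts with "!"


def parse_node_pattern(text: str) -> NodePattern:
    """Classify one TGrep-lite node pattern (hypothetical helper)."""
    negated = text.startswith("!")
    if negated:
        text = text[1:]
    if text == "__":
        # The wild card matches every node, whatever its kind.
        return NodePattern("any", "", False, negated)
    for (opener, closer), kind in DELIMITERS.items():
        if len(text) >= 2 and text.startswith(opener) and text.endswith(closer):
            return NodePattern(kind, text[1:-1], True, negated)
    # Anything else is a simple constant string matched against word nodes.
    return NodePattern("word", text, False, negated)


print(parse_node_pattern("![^NP]"))
print(parse_node_pattern("abc"))
```

A rewrite to XPath would presumably branch on `kind` to decide which part of the XML encoding to test, which is why the delimiters are needed at all.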
A regular expression is unanchored by default:
it matches a node if it matches any part of the node.
For example, “[IP]” matches IP-MAT, IP-ADV, etc.
The caret (“^”) anchors the regular expression to the beginning of a matched node,
while a dollar sign (“$”) as the last character will anchor the regular expression
to the end of a matched node.
Use of both the caret and dollar sign in “[^NP$]” constrains the match to only NP.
A word boundary can be stated with “\b”.
Thus,
while “[^NP]” will match both NP-SBJ and NPR,
“[^NP\b]” will match NP-SBJ but not NPR.
Disjunction can be expressed with the pipe (“|”),
and regular expression elements can be grouped with round brackets,
such that “[^NP-(SBJ|OB1)]” will find nodes that start with either NP-SBJ or NP-OB1.
Note that the on-line interface is case insensitive
when the node is identified as being either a pre-terminal or part-of-speech/phrase-level node,
while being case sensitive for word (terminal) nodes.
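The matching behaviour described above can be tried out with Python's `re` module as a stand-in. This is a sketch under that assumption: the regular expression engine behind the on-line interface may differ in detail, and the helper names and label list are made up for the illustration.

```python
import re


def label_matches(regex: str, label: str) -> bool:
    """Unanchored, case-insensitive search, as for pre-terminal and
    part-of-speech/phrase-level nodes."""
    return re.search(regex, label, flags=re.IGNORECASE) is not None


def word_matches(regex: str, word: str) -> bool:
    """Unanchored, case-sensitive search, as for word nodes."""
    return re.search(regex, word) is not None


# A hypothetical set of node labels to search.
labels = ["IP-MAT", "IP-ADV", "NP", "NP-SBJ", "NP-OB1", "NPR"]


def select(regex: str) -> list:
    """Return the labels matched by the body of a “[...]” pattern."""
    return [label for label in labels if label_matches(regex, label)]


print(select("IP"))             # matches any label containing IP
print(select("^NP$"))           # anchored at both ends
print(select(r"^NP\b"))         # word boundary after NP excludes NPR
print(select("^NP-(SBJ|OB1)"))  # disjunction with grouping
```

Because the search is unanchored, “[IP]” behaves like a substring test; adding “^”, “$” or “\b” is what narrows the match.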
TGrep-lite expressions are composed of a node pattern followed by the relationships the node pattern participates in.
Because word information
is stored in the same node of the XML encoding
as pre-terminal node information,
matching a particular word in combination with a particular pre-terminal node
requires the “==” (equals) relation, which connects the two patterns as descriptions of the same underlying node.
For example,
a pattern such as “/tuti/ == {PHON}” will find instances of words that contain “tuti” with the “PHON” pre-terminal tag.
The following example
will match an IP node which immediately dominates a PP node and which dominates an IP node.
Note the parentheses, which ensure that the second relationship “<< [IP]” refers to the first IP and not to the PP.
As another example, the following
will match an IP which immediately dominates a PP which in turn dominates some IP.
The first node in a pattern, or the first node following a left parenthesis, is a “master” node, to which the relationships on its right apply. Thus, a TGrep-lite pattern consists of a master node for the entire query followed by a series of relationships to other nodes, which can themselves, when parenthesized, act as master nodes with relationships to yet other nodes. In the first example above only the first [IP] is a master node, while in the second example both the first [IP] and the [PP] are master nodes.
Relationships define connections between the master node (being defined) and other nodes. Forward and backward links are fully paired, which gives flexibility in choosing the master node. Notable relationships are:
A << B     A dominates (is an ancestor of) B
A >> B     A is dominated by (is a descendant of) B
A < B      A immediately dominates (is the parent of) B
A > B      A is immediately dominated by (is the child of) B
A .. B     A precedes B
A ,, B     A follows B
A . B      A immediately precedes B
A , B      A immediately follows B
A $ B      A is a sister of and not equal to B
A $.. B    A is a sister of and precedes B
A $,, B    A is a sister of and follows B
A $. B     A is a sister of and immediately precedes B
A $, B     A is a sister of and immediately follows B
A == B     A and B are the same node
A <<, B    B is a leftmost descendant of A
A <<- B    B is a rightmost descendant of A
A >>, B    A is a leftmost descendant of B
A >>- B    A is a rightmost descendant of B
A <1 B     B is the 1st child of A
A >1 B     A is the 1st child of B
A <-1 B    B is the last child of A
A >-1 B    A is the last child of B
A <, B     B is the first child of A (synonymous with A <1 B)
A >, B     A is the first child of B (synonymous with A >1 B)
A <- B     B is the last child of A (also synonymous with A <-1 B)
A >- B     A is the last child of B (also synonymous with A >-1 B)
A <: B     B is the only child of A
A >: B     A is the only child of B
A <<: B    A dominates B via an unbroken chain (length > 0) of unary branches
A >>: B    A is dominated by B via an unbroken chain (length > 0) of unary branches
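To make a few of these relationships concrete, the following sketch checks them on a small in-memory tree. It illustrates the definitions only and is not how TGrep-lite evaluates queries (those are rewritten to XPath); the `Node` class, the function names and the example tree are hypothetical. Precedence is assumed here to mean that the last leaf of A comes before the first leaf of B.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes are compared by identity, not by label
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def immediately_dominates(a: Node, b: Node) -> bool:  # A < B
    return b.parent is a


def dominates(a: Node, b: Node) -> bool:  # A << B
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def sisters(a: Node, b: Node) -> bool:  # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent


def sister_immediately_precedes(a: Node, b: Node) -> bool:  # A $. B
    if not sisters(a, b):
        return False
    siblings = a.parent.children
    return siblings.index(a) + 1 == siblings.index(b)


def only_child(a: Node, b: Node) -> bool:  # A <: B
    return len(a.children) == 1 and a.children[0] is b


def leaves(node: Node) -> list[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def precedes(a: Node, b: Node) -> bool:  # A .. B (assumed leaf-order definition)
    root = a
    while root.parent is not None:
        root = root.parent
    order = leaves(root)
    return order.index(leaves(a)[-1]) < order.index(leaves(b)[0])


# Example tree: (IP-MAT (PP (NP N) P) VB)
n = Node("N")
np = Node("NP", [n])
p = Node("P")
pp = Node("PP", [np, p])
vb = Node("VB")
ip = Node("IP-MAT", [pp, vb])

print(immediately_dominates(ip, pp), dominates(ip, n), sisters(np, p))
print(sister_immediately_precedes(np, p), only_child(np, n), precedes(np, vb))
```

Each backward link in the table is the same test with the arguments swapped, for example `A >> B` is `dominates(b, a)`.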
The following presents pictures grouping some of the above relationships as forward and backward links:
C << __ (dominates, is an ancestor of)
__ >> C (is dominated by, is a descendant of)
E >> __ (is dominated by, is a descendant of)
__ << E (dominates, is an ancestor of)
C < __ (immediately dominates, is the parent of)
__ > C (is immediately dominated by, is the child of)
TGrep-lite returns the match for the left-most element in the search pattern. The following pattern matches PPs that are immediately dominated by an IP that dominates an IP:
TGrep-lite search results
Search results are listed in groups of up to twenty-five entries,
each with highlighted portions corresponding to the focus of the query.
Immediately following each entry is a link to the tree for that entry in the form of the ID number of that entry.
Following the link
opens a tree view for the result,
with highlighted nodes corresponding to the focus of the search.
When appropriate,
there is a down arrow to click for moving to the next twenty-five results,
and an up arrow for moving back.
In addition,
there is an open text area that contains the pattern for the search.
This gives the opportunity to see and also edit the search query.
Clicking the “Submit” button re-submits the possibly edited search.
At the end of the page,
there is the option to
download the results of searches that return 2000 items or fewer.
There are three possible forms in which search results can be downloaded:
basic text format, bracket format, and Alpino XML format.
All formats include the text and ID number of each entry.
Bracket format and Alpino XML format include all the syntactic information encoded for each entry.
In the “basic text format”, each line of text is a tab-separated, numbered entry,
and the number of the last entry is equal to the number of results for the search.
* sanwo yama ni * utu ya wonoto no * topo kadomo * ne mo to ka kwo ro ga * omo ni mi ye turu MYS.14.3473
* tosi mo pe zu * kapyeri ki na mu to * asa kage ni * matu ramu imo si * omo kage ni mi yu MYS.12.3138
* yopi ni apite * asita omo na mi * nabari ni ka * ke naga ku imo ga * ipo ri s e ri kye mu MYS.1.60
* omo wasure * dani mo e su ya to * ta nigirite * utedomo kori zu * kwopwi to ipu yatukwo MYS.11.2574
* api mite pa * omo kakusa ruru * mono karani * tugite mi maku no * posi ki kimi kamo MYS.11.2554
* ywo no podoro * wa ga idete kureba * wagimo kwo ga * omop yeri siku si * omo kage ni mi yu MYS.4.754
* tama mo yo si * sanuki no kuni pa * kuni kara ka * miredomo aka nu * kamu kara ka * kokoda taputwo ki * ame tuti * pi tukwi to tomo ni * tari yuka mu * kamwi no mi omo to * tugite kuru * naka no mina two yu * pune ukete * wa ga kogi kureba * toki tu kaze * kumo wi ni puku ni * oki mireba * towi nami tati * pye mireba * sira nami sawaku * isana tori * umi wo kasikwo mi * yuku pune no * kadi piki worite * woti koti no * sima pa opo kyedo * na kupa si * * sa mine no sima no * ariswo omo ni * iporite mireba * nami no to no * sige ki pama pye wo * siki tape no * makura ni nasite * ara doko ni * koro pusu kimi ga * ipye siraba * yukite mo tuge mu * tuma siraba * ki mo topa masi wo * tama poko no * miti dani sira zu * opoposi ku * mati ka kwopu ramu * pasi ki tuma ra pa MYS.2.220
* tama mo yo si * sanuki no kuni pa * kuni kara ka * miredomo aka nu * kamu kara ka * kokoda taputwo ki * ame tuti * pi tukwi to tomo ni * tari yuka mu * kamwi no mi omo to * tugite kuru * naka no mina two yu * pune ukete * wa ga kogi kureba * toki tu kaze * kumo wi ni puku ni * oki mireba * towi nami tati * pye mireba * sira nami sawaku * isana tori * umi wo kasikwo mi * yuku pune no * kadi piki worite * woti koti no * sima pa opo kyedo * na kupa si * * sa mine no sima no * ariswo omo ni * iporite mireba * nami no to no * sige ki pama pye wo * siki tape no * makura ni nasite * ara doko ni * koro pusu kimi ga * ipye siraba * yukite mo tuge mu * tuma siraba * ki mo topa masi wo * tama poko no * miti dani sira zu * opoposi ku * mati ka kwopu ramu * pasi ki tuma ra pa MYS.2.220
* kamwi no goto * kikoyuru tagi no * sira nami no * omo siru kimi ga * mi ye nu ko no koro MYS.12.3015
* moyuru pwi mo * torite tutumite * pukurwo ni pa * iru to ipa zu ya * omo sira naku mo MYS.2.160
* satwo topo mi * kwopwi wabwi ni kyeri * maswo kagami * omo kage sara zu * ime ni mi ye koso MYS.11.2634
* wagimo kwo ga * wemapi maywo biki * omo kage ni * kakarite motona * omopoyuru kamo MYS.12.2900
* omo kata no * wasure mu sida pa * opo nwo ro ni * tanabiku kumo wo * mi tutu sinwopa mu MYS.14.3520
* kake maku mo * yuyu si ki kamo * ipa ma ku mo * ayani kasikwo ki * asuka no * makamwi no para ni * pisakata no * ama tu mikadwo wo * kasikwo ku mo * sadame tamapite * kamu sabu to * ipa gakuri masu * yasumisisi * wa go opo kimi no * kikosi myesu * so t omo no kuni no * ma kwi tatu * pupa yama kwoyete * koma turugi * waza mi ga para no * kari miya ni * amori imasite * ame no sita * wosame tamapi * wosu kuni wo * sadame tamapu to * tori ga naku * aduma no kuni no * mi ikusa wo * myesi tamapite * tipayaburu * pito wo yapase to * maturwopa nu * kuni wo wosame to * mi kwo nagara * make tamapeba * opomi mwi ni * tati tori oba si * opomi te ni * yumi tori mota si * mi ikusa wo * adomopi tamapi * totonopu ru * tudumi no oto pa * ikaduti no * oto to kiku made * puki nas e ru * kuda no oto mo * ata mi taru * twora ka poyuru to * moro pito no * obiyu ru made ni * sasage taru * pata no nabiki pa * puyu gomori * paru sari kureba * nwo goto ni * tukite aru pwi no * kaze no muta * nabiku ga goto ku * tori mot eru * yu pazu no sawaki * mi yuki puru * puyu no payasi ni * tumuzi ka mo * i maki wataru to * omopu made * kiki no kasikwo ku * piki panatu * ya no sige kyeku * opo yuki no * midarete ki ta re * maturwopa zu * tati mukapi si mo * tuyu simo no * ke na ba ke nu be ku * yuku tori no * araswopu pasi ni * watarapi no * ituki no miya yu * kamu kaze ni * i puki matwopasi * ama kumo wo * pi no me mo mise zu * toko yamwi ni * opopi tamapite * sadame te si * midupo no kuni wo * kamu nagara * putwo siki masite * yasumisisi * wa go opo kimi no * ame no sita * mawosi tamapeba * yorodu yo ni * sika si mo ara mu to * yupu bana no * sakayuru toki ni * wa go opo kimi * mi kwo no mikadwo wo * kamu miya ni * yosopi maturite * tukapa si si * mikadwo no pito mo * sirwo tape no * asa goromo kite * paniyasu no * mikadwo no para ni * aka ne sasu * pi no kotogoto * sisi zi mono * i papi pusi tutu * nubatama no * yupu pye ni nareba * opo tono wo * purisake mitutu * udura nasu * i papi motopori * 
samorapedo * samorapi e neba * paru tori no * sa maywopi nure ba * nageki mo * imada sugwi nu ni * omopi mo * imada tukwi neba * koto sapyeku * kudara no para yu * kamu paburi * paburi i masite * asamoyosi * kwinope no miya wo * toko miya to * taka ku maturite * kamu nagara * sidumari masi nu * sikaredomo * wa go opo kimi no * yorodu yo to * omoposi myesite * tuku ra si si * kagu yama no miya * yorodu yo ni * sugwi mu to omope ya * ame no goto * purisake mitutu * tama ta suki * kakete sinwopa mu * kasikwo kare domo MYS.2.199a
* kaku bakari * omo kage ni nomwi * omopoyeba * ika ni ka mo se mu * pito me sige kute MYS.4.752
* aratu no umi * ware nusa maturi * ipapi te mu * paya kapyeri mase * omo gapari se zu MYS.12.3217
* saka kwoyete * abe no ta no mo ni * wiru tadu no * tomosi ki kimi pa * asu sape moga mo MYS.14.3523
* topo ki imo ga * purisake mitutu * sinwopu ramu * ko no tukwi no omo ni * kumo na tanabiki MYS.11.2460
* kake maku mo * yuyusi kyeredomo * ipa ma ku mo * ayani kasikwo ki * asuka no * makamwi no para ni * pisakata no * ama tu mikadwo wo * kasikwo ku mo * sadame tamapite * kamu sabu to * ipa gakuri masu * yasumisisi * wa go opo kimi no * kikosi myesu * so t omo no kuni no * ma kwi tatu * pupa yama kwoyete * koma turugi * waza mi ga para no * kari miya ni * amori imasite * ame no sita * parapi tamapite * wosu kuni wo * sadame tamapu to * tori ga naku * aduma no kuni no * mi ikusa wo * myesi tamapite * tipayaburu * pito wo yapase to * maturwopa nu * kuni wo para pye to * mi kwo nagara * make tamapeba * opomi mwi ni * tati tori oba si * opomi te ni * yumi tori mota si * mi ikusa wo * adomopi tamapi * totonopu ru * tudumi no oto pa * ikaduti no * oto to kiku made * puki nas e ru * puye no oto pa * ata mi taru * twora ka poyuru to * moro pito no * kiki matwopu made * sasage taru * pata no nabiki pa * puyu gomori * paru nwo * yaku pwi no * kaze no muta * nabiku ga goto ku * tori mot eru * yu pazu no sawaki * mi yuki puru * yupu no payasi ni * tumuzi ka mo * i maki wataru to * moro pito no * mi matwopu made ni * piki panatu * ya no sige kyeku * arare nasu * soti yori kureba * maturwopa zu * tati mukapi si mo * asa simo no * ke naba ke to pu ni * utusemi to * araswopu pasi ni * watarapi no * ituki no miya yu * kamu kaze ni * i puki matwopasi * ama kumo wo * pi no me mo mise zu * toko yamwi ni * opopi tamapite * sadame te si * midupo no kuni wo * kamu nagara * putwo siki masite * yasumisisi * wa go opo kimi no * ame no sita * mawosi tamapeba * yorodu yo ni * kaku si mo ara mu to * yupu bana no * sakayuru toki ni * sasutake no * mi kwo no mikadwo wo * kamu miya ni * yosopi maturite * tukapa si si * mikadwo no pito mo * sirwo tape no * asa goromo kite * paniyasu no * mikadwo no para ni * aka ne sasu * pi no kotogoto * sisi zi mono * i papi pusi tutu * nubatama no * yupu pye ni nareba * opo tono wo * purisake mitutu * udura nasu * i papi motopori * samorapedo * samorapi e neba * 
paru tori no * sa maywopi nure ba * nageki mo * imada sugwi nu ni * omopi mo * imada tukwi neba * koto sapyeku * kudara no para yu * kamu paburi * paburi i masite * asamoyosi * kwinope no miya wo * toko miya to * taka ku maturite * kamu nagara * sidumari masi nu * sikaredomo * wa go opo kimi no * yorodu yo to * omoposi myesite * tuku ra si si * kagu yama no miya * yorodu yo ni * sugwi mu to omope ya * ame no goto * purisake mitutu * tama ta suki * kakete sinwopa mu * kasikwo kare domo MYS.2.199b
* siki tape no * koromo de karete * wa wo matu to * aru ramu kwo ra pa * omo kage ni mi yu MYS.11.2607
* omo kata no * wasurete araba * adukina ku * wonokwo zimono ya * kwopwitutu wora mu MYS.11.2580
* opo kimi no * maki no manima ni * tori motite * tukapuru kuni no * tosi no uti no * koto katane moti * tama poko no * miti ni ide tati * ipa ne pumi * yama kwoye nwo yuki * miyakwo pye ni * mawi si wa ga se wo * ara tama no * tosi yuki gapyeri * tukwi kasane * mi nu pi sa mane mi * kwopuru sora * yasu ku si ara neba * pototogisu * ki naku satukwi no * ayamye gusa * yomogwi kaduraki * saka miduki * aswobi naguredo * imidu kapa * yuki ke papurite * yuku midu no * iya masini nomwi * tadu ga naku * nagwo ye no suge no * nemokoro ni * omopi musubore * nagekitutu * a ga matu kimi ga * koto wopari * kapyeri makarite * natu no nwo no * sa yuri no pana no * pana wemi ni * nipubuni wemite * apa si taru * kyepu wo pazimete * kagami nasu * kaku si tune mi mu * omo gapari se zu MYS.18.4116
* topo ku areba * sugata pa mi ye zu * tune no goto * imo ga wemapi pa * omo kage nisite MYS.12.3137
* midu kukwi no * woka no kuzu pa wo * puki kapyesi * omo siru kwo ra ga * mi ye nu koro kamo MYS.12.3068
* tomosibwi no * kage ni kagaywopu * utusemi no * imo ga wema pi si * omo kage ni mi yu MYS.11.2642
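A downloaded basic text file can be read back with a few lines of Python. The sketch below assumes that each line has exactly three tab-separated columns in the order entry number, text, ID; that column layout is an assumption based on the description above, and the names `Entry` and `parse_basic_text` are hypothetical. The sample reuses two of the entries listed above with numbers added.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    number: int    # running number of the entry within the results
    text: str      # text of the entry
    entry_id: str  # ID number of the entry, e.g. "MYS.1.60"


def parse_basic_text(data: str) -> list:
    """Parse basic text format, assuming number<TAB>text<TAB>ID per line."""
    entries = []
    for line in data.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Raises ValueError if a line does not have exactly three columns.
        number, text, entry_id = line.split("\t")
        entries.append(Entry(int(number), text.strip(), entry_id.strip()))
    return entries


sample = (
    "1\t* yopi ni apite * asita omo na mi * nabari ni ka * ke naga ku imo ga"
    " * ipo ri s e ri kye mu\tMYS.1.60\n"
    "2\t* api mite pa * omo kakusa ruru * mono karani * tugite mi maku no"
    " * posi ki kimi kamo\tMYS.11.2554\n"
)

entries = parse_basic_text(sample)
# The number of the last entry should equal the number of results.
print(entries[-1].number == len(entries))
print([entry.entry_id for entry in entries])
```

Checking the last entry number against the entry count is a quick way to confirm that a download is complete.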