The TGrep language by Richard Pito
formulates queries as patterns consisting of expressions that match tree nodes,
together with relationships that define links (or negated links) to other tree nodes.
Nodes of searched trees are matched either with
simple character strings or with regular expressions (see section 1).
A complex expression consists of a node expression followed by relationships,
as presented in section 2.
Possible relationships are illustrated in section 3.
When you compose a search expression in the box provided,
the interface recognizes whether it uses TGrep-lite syntax
and, if so, interprets the expression accordingly.
If the expression is well-formed but there are no matches in the corpus,
the screen shows no change after the “Submit” button is pressed.
If the expression is not well-formed, the warning
“Not a valid TGrep-lite expression.”
appears.
In the result page, if you check the “reveal” box and resubmit,
a translation of the TGrep-lite expression into XPath syntax is displayed.
This allows you to check whether the expression you have composed reflects the search that is intended.
The TGrep functionality available here is referred to as “TGrep-lite”.
This is less expressive than the full TGrep language implemented in the original TGrep program.
In particular,
the expression of relationships between nodes is limited to the relations detailed in section 3.
Compared with the enhanced TGrep languages
available in the TGrep2
and Tregex implementations, TGrep-lite is especially weak;
notably,
it cannot express disjunctions of relations.
TGrep-lite also behaves differently from what is expected of TGrep
with regard to how nodes are specified
(described in section 1).
Despite these limitations,
TGrep-lite is the easiest and most accessible way to search the corpus using this on-line interface,
and it is a powerful search language.
TGrep-lite works by rewriting expressions of a modified TGrep language into XPath queries over a database of XML-encoded trees.
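To make the idea of rewriting concrete, the following Python sketch turns a master-node category test plus a list of relationships into an XPath string, mapping some relations onto standard XPath axes (descendant, child, following-sibling, and so on). The element name `node`, the attribute `cat`, the `'i'` flag on XPath 2.0's `matches()` function, and the names `AXIS` and `rewrite` are assumptions for illustration only; the interface's actual XML schema and generated XPath may look quite different.

```python
from typing import List, Tuple

# Hypothetical mapping of some TGrep-lite relations onto XPath axes.
AXIS = {
    "<<": "descendant",
    ">>": "ancestor",
    "<": "child",
    ">": "parent",
    "$..": "following-sibling",
    "$,,": "preceding-sibling",
    "..": "following",   # after the node in document order, excluding its descendants
    ",,": "preceding",   # before the node in document order, excluding its ancestors
}


def step_predicate(regex: str) -> str:
    # Assumed schema: category stored in @cat; 'i' makes the test case-insensitive.
    # No escaping of quotes is attempted; illustration only.
    return f"[matches(@cat, '{regex}', 'i')]"


def rewrite(master: str, relations: List[Tuple[str, str]]) -> str:
    """Rewrite a master category regex plus (relation, target regex) pairs into XPath."""
    query = "//node" + step_predicate(master)
    for relation, target in relations:
        query += f"[{AXIS[relation]}::node{step_predicate(target)}]"
    return query


print(rewrite("^IP", [("<", "^PP"), ("<<", "^IP")]))
```

Run as written, this prints `//node[matches(@cat, '^IP', 'i')][child::node[matches(@cat, '^PP', 'i')]][descendant::node[matches(@cat, '^IP', 'i')]]`: each relationship of the master node becomes one predicate on the master's step.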
The formatting of the XML
requires
that the rewrite to XPath distinguish
three different kinds of “node” expressed with TGrep-lite node patterns:
word nodes,
pre-terminal nodes (which provide information about the word, e.g., that the word is a zero element), and
part-of-speech/phrase-level nodes.
The wildcard (“__”) is the exception: it does not need to distinguish node kinds,
since it matches all nodes.
A simple constant string,
such as “abc”,
matches word nodes that consist of exactly the string abc.
All other node patterns
are stated as regular expressions
whose delimiters
determine the kind of node searched.
Specifically:
a regular expression indicated by surrounding slashes (“/”), such as “/ab/”, will search for word nodes,
a regular expression indicated by surrounding curly brackets (“{”, “}”), such as “{AB}”, will search for pre-terminal nodes, and
a regular expression indicated by surrounding square brackets (“[”, “]”), such as “[AB]”, will search for part-of-speech/phrase-level nodes.
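A minimal Python sketch of how the delimiters can decide the node kind follows. The names `classify` and `NodePattern` are hypothetical, and the sketch ignores the “!” prefix and any escaping of delimiter characters.

```python
from typing import NamedTuple


class NodePattern(NamedTuple):
    kind: str       # "any", "word", "pre-terminal" or "phrase"
    body: str       # the constant string or the regular expression
    is_regex: bool  # False for constant strings and the wildcard


# Opening delimiter -> (closing delimiter, node kind searched)
DELIMITERS = {"/": ("/", "word"), "{": ("}", "pre-terminal"), "[": ("]", "phrase")}


def classify(pattern: str) -> NodePattern:
    """Classify a node pattern by its delimiters (a leading '!' is not handled here)."""
    if not pattern:
        raise ValueError("empty node pattern")
    if pattern == "__":
        return NodePattern("any", "", False)  # wildcard matches every node
    if pattern[0] in DELIMITERS:
        closer, kind = DELIMITERS[pattern[0]]
        if len(pattern) >= 2 and pattern.endswith(closer):
            return NodePattern(kind, pattern[1:-1], True)
        raise ValueError(f"unterminated node pattern: {pattern!r}")
    return NodePattern("word", pattern, False)  # simple constant string


for p in ["__", "abc", "/ab/", "{AB}", "[AB]"]:
    print(p, classify(p))
```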
If a simple constant string or delimited regular expression begins with “!”,
the matching is complemented: matches become non-matches,
and vice versa.
For example, “!abc” will match all words that are not abc,
and “![^NP]” will match any part-of-speech or phrase-level node that does not start with NP.
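Complementing can be sketched as wrapping an ordinary matcher in a negation. In this hypothetical Python helper, delimited patterns are matched as unanchored regular expressions and constant strings by exact comparison; node kinds and case sensitivity are deliberately left out.

```python
import re
from typing import Callable

# Opening delimiter -> closing delimiter
CLOSERS = {"/": "/", "{": "}", "[": "]"}


def make_matcher(pattern: str) -> Callable[[str], bool]:
    """Return a predicate on node labels, complemented if the pattern starts with '!'."""
    negated = pattern.startswith("!")
    body = pattern[1:] if negated else pattern

    if len(body) >= 2 and body[0] in CLOSERS and body[-1] == CLOSERS[body[0]]:
        regex = re.compile(body[1:-1])

        def test(label: str) -> bool:
            return regex.search(label) is not None  # delimited: regex search
    else:
        def test(label: str) -> bool:
            return label == body  # constant string: exact match

    if negated:
        return lambda label: not test(label)
    return test


not_np = make_matcher("![^NP]")
print([label for label in ["NP-SBJ", "IP-MAT", "PP"] if not_np(label)])
```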
A regular expression matches a node
if it matches any part of the node.
For example, “[IP]” matches IP-MAT, IP-ADV, etc.
The caret (“^”) anchors the regular expression to the beginning of a matched node,
while a dollar sign (“$”) as the last character will anchor the regular expression
to the end of a matched node.
Using both the caret and the dollar sign, as in “[^NP$]”, constrains the match to NP alone.
A word boundary can be stated with “\b”.
Thus,
while “[^NP]” will match both NP-SBJ and NPR,
“[^NP\b]” will match only NP-SBJ.
Disjunction can be expressed with the pipe (“|”),
and regular expression elements can be grouped with round brackets,
so that “[^NP-(SBJ|OB1)]” will find nodes that start with either NP-SBJ or NP-OB1.
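These regular-expression behaviours can be checked with Python's `re.search`, which, like the behaviour described, succeeds when any part of a label matches. This is only a stand-in: the interface evaluates regular expressions through its XPath rewrite, and that engine may differ in details. The label list is made up for the demonstration.

```python
import re
from typing import List

LABELS = ["NP", "NP-SBJ", "NP-OB1", "NPR", "IP-MAT", "IP-ADV"]


def hits(regex: str) -> List[str]:
    """Labels in which some part matches `regex` (unanchored search)."""
    return [label for label in LABELS if re.search(regex, label)]


for regex in ["IP", "^NP", "^NP$", r"^NP\b", "^NP-(SBJ|OB1)"]:
    print(f"{regex:16} {hits(regex)}")
```

Note that `^NP\b` also accepts a bare NP, since the end of the label counts as a word boundary; what it excludes is NPR.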
Note that the on-line interface is case-insensitive
when the node is identified as either a pre-terminal or a part-of-speech/phrase-level node,
but case-sensitive for word (terminal) nodes.
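One way to express this case policy, assuming a Python `re`-based matcher and the hypothetical kind names "word", "pre-terminal" and "phrase", is to choose the compilation flags by node kind:

```python
import re


def compile_node_regex(kind: str, regex: str) -> "re.Pattern":
    """Compile a node regex: case-sensitive for word nodes, case-insensitive otherwise.

    `kind` uses hypothetical labels: "word", "pre-terminal" or "phrase".
    """
    flags = 0 if kind == "word" else re.IGNORECASE
    return re.compile(regex, flags)


print(bool(compile_node_regex("phrase", "^np$").search("NP")))  # category: case ignored
print(bool(compile_node_regex("word", "tuti").search("Tuti")))  # word: case matters
```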
TGrep-lite expressions are composed of a node pattern followed by the relationships the node pattern participates in.
Because, under the XML encoding,
word information is stored as content of the same node
as pre-terminal node information,
matching a particular word in combination with a particular pre-terminal node
requires the “==” (equals) relation, which connects these two pieces of information about the same underlying node.
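Conceptually, “==” asks for two tests to hold on one and the same node. The Python sketch below assumes a hypothetical `Leaf` record holding a word and its pre-terminal tag side by side; it is not the corpus's real XML structure, and the sample leaves are invented.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    # Hypothetical model: one underlying node carries both the word and its
    # pre-terminal information.
    word: str
    preterminal: Optional[str] = None


def same_node_match(leaf: Leaf, word_regex: str, pre_regex: str) -> bool:
    """Both tests must succeed on the same node (the role played by "==")."""
    if leaf.preterminal is None:
        return False
    return (re.search(word_regex, leaf.word) is not None  # word: case-sensitive
            and re.search(pre_regex, leaf.preterminal, re.IGNORECASE) is not None)


leaves: List[Leaf] = [Leaf("tuti", "PHON"), Leaf("tuti"), Leaf("ametuti", "PHON"), Leaf("ame", "PHON")]
print([leaf.word for leaf in leaves if same_node_match(leaf, "tuti", "PHON")])
# ['tuti', 'ametuti']
```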
For example,
the following will find instances of words that contain “tuti” with the “PHON” pre-terminal tag.
The following example
will match an IP node which immediately dominates a PP node and which dominates an IP node.
Note the parentheses, which ensure that the second relationship “<< [IP]” refers to the first IP and not to the PP.
As another example,
will match an IP which immediately dominates a PP which in turn dominates some IP.
The first node in a pattern, or the first node following a left parenthesis, is a “master” node, to which the relationships on its right apply. Thus, a TGrep-lite pattern consists of a master node for the entire query followed by a series of relationships to other nodes; when enclosed in parentheses, these nodes can themselves form master nodes with relationships to yet other nodes. In the first example above only the first [IP] is a master node, while in the second example both the first [IP] and the [PP] are master nodes.
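The master-node structure can be pictured as a nested data type. This Python sketch (the `Pattern` class and `master_nodes` function are hypothetical, and nothing here parses the actual TGrep-lite syntax) encodes the two examples above and lists which nodes act as masters.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pattern:
    node: str  # a node pattern such as "[IP]"
    relations: List[Tuple[str, "Pattern"]] = field(default_factory=list)


def master_nodes(p: Pattern) -> List[str]:
    """The top node plus every related node that carries relationships of its own."""
    found = [p.node]
    for _relation, target in p.relations:
        if target.relations:
            found.extend(master_nodes(target))
    return found


# First example: an IP that immediately dominates a PP and dominates an IP.
first = Pattern("[IP]", [("<", Pattern("[PP]")), ("<<", Pattern("[IP]"))])
# Second example: an IP that immediately dominates a PP which dominates some IP.
second = Pattern("[IP]", [("<", Pattern("[PP]", [("<<", Pattern("[IP]"))]))])

print(master_nodes(first))   # ['[IP]']
print(master_nodes(second))  # ['[IP]', '[PP]']
```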
Relationships define connections between the master node (the node being defined) and other nodes. Forward and backward links are completely paired, which gives flexibility in choosing which node is the master node. Notable relationships are:
Relation  Meaning
A << B    A dominates (is an ancestor of) B
A >> B    A is dominated by (is a descendant of) B
A < B     A immediately dominates (is the parent of) B
A > B     A is immediately dominated by (is the child of) B
A .. B    A precedes B
A ,, B    A follows B
A . B     A immediately precedes B
A , B     A immediately follows B
A $ B     A is a sister of and not equal to B
A $.. B   A is a sister of and precedes B
A $,, B   A is a sister of and follows B
A $. B    A is a sister of and immediately precedes B
A $, B    A is a sister of and immediately follows B
A == B    A and B are the same node
A <<, B   B is a leftmost descendant of A
A <<- B   B is a rightmost descendant of A
A >>, B   A is a leftmost descendant of B
A >>- B   A is a rightmost descendant of B
A <1 B    B is the first child of A
A >1 B    A is the first child of B
A <-1 B   B is the last child of A
A >-1 B   A is the last child of B
A <, B    B is the first child of A (synonymous with A <1 B)
A >, B    A is the first child of B (synonymous with A >1 B)
A <- B    B is the last child of A (synonymous with A <-1 B)
A >- B    A is the last child of B (synonymous with A >-1 B)
A <: B    B is the only child of A
A >: B    A is the only child of B
A <<: B   A dominates B via an unbroken chain (length > 0) of unary branches
A >>: B   A is dominated by B via an unbroken chain (length > 0) of unary branches
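For readers who want to see what a few of these relations mean operationally, here is a small Python sketch over a toy tree. It assumes the usual TGrep2 reading of precedence, in which A precedes B when A's last terminal comes before B's first terminal; the `Node` class and the function names are hypothetical, and TGrep-lite's own evaluation happens through XPath rather than like this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare by identity: each Node is a distinct tree position
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def leaves(n: Node) -> List[Node]:
    """Terminals under n, left to right (n itself if it is a terminal)."""
    if not n.children:
        return [n]
    return [leaf for child in n.children for leaf in leaves(child)]


def dominates(a: Node, b: Node) -> bool:  # A << B
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def immediately_dominates(a: Node, b: Node) -> bool:  # A < B
    return b.parent is a


def sister(a: Node, b: Node) -> bool:  # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent


def precedes(a: Node, b: Node, root: Node) -> bool:  # A .. B
    # A's last terminal comes before B's first terminal.
    order = leaves(root)
    return order.index(leaves(a)[-1]) < order.index(leaves(b)[0])


def immediately_precedes(a: Node, b: Node, root: Node) -> bool:  # A . B
    order = leaves(root)
    return order.index(leaves(a)[-1]) + 1 == order.index(leaves(b)[0])


def only_child(a: Node, b: Node) -> bool:  # A <: B
    return len(a.children) == 1 and a.children[0] is b


# Toy tree: (IP (NP w1) (VP (V w2) (NP w3)))
w1, w2, w3 = Node("w1"), Node("w2"), Node("w3")
np1, v, np2 = Node("NP").add(w1), Node("V").add(w2), Node("NP").add(w3)
vp = Node("VP").add(v, np2)
ip = Node("IP").add(np1, vp)

print(dominates(ip, w3), immediately_dominates(ip, np2), precedes(np1, np2, ip))
```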
Some of the above relationships pair up as forward and backward links:
C << __ (dominates, is an ancestor of)  and  __ >> C (is dominated by, is a descendant of)
E >> __ (is dominated by, is a descendant of)  and  __ << E (dominates, is an ancestor of)
C < __ (immediately dominates, is the parent of)  and  __ > C (is immediately dominated by, is the child of)
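Because every forward relation has a backward counterpart, a relationship can be restated with the other node as master. A hypothetical Python table of the pairs listed in this section, with a `flip` helper, might look like this:

```python
from typing import Dict, List, Tuple

# Forward/backward pairs: "A fwd B" holds exactly when "B bwd A" holds.
PAIRS: List[Tuple[str, str]] = [
    ("<<", ">>"), ("<", ">"), ("..", ",,"), (".", ","),
    ("$..", "$,,"), ("$.", "$,"), ("<<,", ">>,"), ("<<-", ">>-"),
    ("<1", ">1"), ("<-1", ">-1"), ("<,", ">,"), ("<-", ">-"),
    ("<:", ">:"), ("<<:", ">>:"),
    ("$", "$"), ("==", "=="),  # symmetric relations are their own inverse
]

INVERSE: Dict[str, str] = {}
for fwd, bwd in PAIRS:
    INVERSE[fwd] = bwd
    INVERSE[bwd] = fwd


def flip(a: str, relation: str, b: str) -> Tuple[str, str, str]:
    """Restate 'a relation b' so that b becomes the master node."""
    return (b, INVERSE[relation], a)


print(flip("[IP]", "<<", "[PP]"))  # ('[PP]', '>>', '[IP]')
```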
TGrep-lite returns the match for the left-most element in the search pattern. The following pattern matches PPs that are immediately dominated by an IP that dominates an IP:
TGrep-lite search results
Search results are listed in groups of up to twenty-five entries,
each with highlighted portions corresponding to the focus of the query.
Immediately following each entry is a link to the tree for that entry in the form of the ID number of that entry.
Following the link
opens a tree view for the result,
with highlighted nodes corresponding to the focus of the search.
When appropriate,
there is a down arrow to click for moving to the next twenty-five results,
and an up arrow for moving back.
In addition,
there is an open text area that contains the pattern for the search.
This gives the opportunity to see and also edit the search query.
Clicking the “Submit” button re-submits the possibly edited search.
At the end of the page,
there is the option to
download the results of searches that return 2,000 items or fewer.
There are three possible forms in which search results can be downloaded:
basic text format, bracket format, and Alpino XML format.
All formats include the text and ID number of each entry.
Bracket format and Alpino XML format include all the syntactic information encoded for each entry.
In the basic text format, each line is a tab-separated, numbered entry,
and the number of the last entry equals the number of results for the search.
* taratine no * papa ga ywobu na wo * mawosa medo * miti yuku pito wo * tare to sirite ka MYS.12.3102
* taratine no * papa ni sapa raba * itadura ni * imasi mo ware mo * koto naru be si ya MYS.11.2517
* tuginepu * yamasiro di wo * pito tuma no * uma ywori yuku ni * ono duma si * kati ywori yukeba * miru goto ni * ne nomwi si naka yu * soko omopu ni * kokoro si ita si * taratine no * papa ga katami to * wa ga mot eru * maswomi kagami ni * akidu pire * opi name motite * uma kapye wa ga se MYS.13.3314
* opo kimi no * make no manima ni * masura wo no * kokoro puri okosi * asipikwi no * yama zaka kwoyete * ama zakaru * pina ni kudari ki * iki dani mo * imada yasume zu * tosi tukwi mo * ikura mo ara nu ni * utusemi no * yo no pito nareba * uti nabiki * toko ni koi pusi * ita kyeku si * pi ni kyeni masaru * taratine no * papa no mikoto no * opo pune no * yukurayukura ni * sita gwopwi ni * itu ka mo ko mu to * mata su ramu * kokoro sabusi ku * pasikiyosi * tuma no mikoto mo * ake kureba * two ni yori tati * koromo de wo * wori kapyesitutu * yupu sareba * toko uti parapi * nubatama no * kurwo kami sikite * itu si ka to * nageka su ramu so * imo mo se mo * waka ki kwo domo pa * woti koti ni * sawaki naku ramu * tama poko no * miti wo ta dopo mi * ma dukapi mo * yaru yosi mo na si * omoposi ki * koto tute yara zu * kwopuru ni si * kokoro pa moye nu * tamakiparu * inoti wosi kyedo * se mu subye no * tadoki wo sira ni * kaku site ya * ara si wo surani * nageki puse ramu MYS.17.3962
* taratine no * papa ga te panare * kaku bakari * subye na ki koto pa * imada se na ku ni MYS.11.2368
* sumyeroki no * topo no mikadwo to * kara kuni ni * wataru wa ga se pa * ipye bito no * ipapi mata ne ka * tada mwi ka mo * ayamati si kye mu * aki saraba * kapyeri masa mu to * taratine no * papa ni mawosite * toki mo sugwi * tukwi mo pe nureba * kyepu ka ko mu * asu ka mo ko mu to * ipye bito pa * mati kwopu ramu ni * topo no kuni * imada mo tuka zu * yamato wo mo * topo ku sakarite * ipa ga ne no * ara ki sima ne ni * yadori suru kimi MYS.15.3688
* ara tama no * tosi pa ki yukite * tamadusa no * tukapi no ko neba * kasumi tatu * naga ki paru pi wo * ame tuti ni * omopi tara pasi * taratine no * papa ga kapu kwo no * maywo komori * ikiduki watari * wa ga kwopuru * kokoro no uti wo * pito ni ipu * mono ni si ara neba * matu ga ne no * matu koto topo ku * ama tutapu * pi no kure nure ba * sirwo tape no * wa ga koromo de mo * topori te nure nu MYS.13.3258
* tamadare no * wo su no sukyeki ni * iri kaywopi ko ne * taratine no * papa ga twopa saba * kaze to mawosa mu MYS.11.2364
* sumyeroki no * topo no mikadwo to * siranupi * tukusi no kuni pa * ata mamoru * osape no kwi so to * kikosi wosu * yomo no kuni ni pa * pito sapa ni * mitite pa aredo * tori ga naku * aduma wonokwo pa * ide mukapi * kapyeri mi se zute * isami taru * takye ki ikusa to * negwi tamapi * make no manima ni * taratine no * papa ga me karete * waka kusa no * tuma wo mo maka zu * ara tama no * tukwi pi yomitutu * asi ga tiru * nanipa no mi tu ni * opo pune ni * ma kai sizi nuki * asa nagi ni * kakwo totonope * yupu sipo ni * kadi piki wori * adomopite * kogi yuku kimi pa * nami no ma wo * i yuki sagukumi * ma saki ku mo * paya ku itarite * opo kimi no * mi koto no manima * masura wo no * kokoro wo motite * ari meguri * koto si woparaba * tutumapa zu * kapyeri ki mase to * ipapi be wo * toko pye ni suwete * sirwo tape no * swode wori kapyesi * nubatama no * kurwo kami sikite * naga ki ke wo * mati ka mo kwopwi mu * pasi ki tuma ra pa MYS.20.4331
* tare so ko no * wa ga yadwo ki ywobu * taratine no * papa ni koropa ye * mono omopu ware wo MYS.11.2527
* ame tuti no * pazime no toki yu * utusomi no * ya swo tomo no wo pa * opo kimi ni * maturwopu mono to * sadamar eru * tukasa ni si areba * opo kimi no * mi koto kasikwo mi * pina zakaru * kuni wo wosamu to * asipikwi no * yama kapa pyenari * kaze kumo ni * koto pa kaywopedo * tada ni apa zu * pi no kasanareba * omopi kwopwi * ikiduki woru ni * tama poko no * miti kuru pito no * tute koto ni * ware ni katar aku * pasikiyosi * kimi pa ko no koro * urasabwite * nagekapi imasu * yo no naka no * u kyeku tura kyeku * saku pana mo * toki ni uturopu * utusemi mo * tune na ku ari kyeri * taratine no * mi papa no mikoto * nani si ka mo * toki si pa ara mu wo * maswo kagami * mi redomo aka zu * tama no wo no * wosi ki sakari ni * tatu kwiri no * use yuku goto ku * oku tuyu no * ke nuru ga goto ku * tama mo nasu * nabiki koi pusi * yuku midu no * todomwi kane tu to * maga koto ya * pito no ipi turu * oyodure wo * pito no tuge turu * adusa yumi * tuma piku ywo to no * topo to ni mo * kikeba kanasi mi * nipa tadumi * nagaruru namita * todomwi kane tu mo MYS.19.4214
* kaku nomwi si * kwopwiba sinu be mi * taratine no * papa ni mo tuge tu * yama zu kaywopa se MYS.11.2570
* taratine no * papa no mikoto no * koto ni araba * tosi no wo naga ku * tanomi sugusa mu MYS.9.1774
* taratine no * papa ni mawosana * kimi mo ware mo * apu to pa na si ni * tosi so pe nu be ki MYS.11.2557
* opo kimi no * mi koto kasikwo mi * tuma wakare * kanasi ku pa aredo * masura wo no * kokoro puri okosi * tori yosopi * kadwo de wo sureba * taratine no * papa kaki nade * waka kusa no * tuma pa tori tuki * tapirake ku * ware pa ipapa mu * ma saki kute * paya kapyeri ko to * ma swode moti * namita wo nogopi * musepitutu * katarapi sureba * mura tori no * ide tati kate ni * todokopori * kapyeri mi situtu * iya topo ni * kuni wo ki panare * iya taka ni * yama wo kwoye sugwi * asi ga tiru * nanipa ni ki wite * yupu sipo ni * pune wo uke suwe * asa nagi ni * pe muke koga mu to * samorapu to * wa ga woru toki ni * paru gasumi * sima mwi ni tatite * tadu ga ne no * kanasi ku nake ba * paroparo ni * ipye wo omopi de * opi so ya no * soyo to naru made * nageki turu kamo MYS.20.4398
* saniturapu * kimi ga mi koto to * tamadusa no * tukapi mo ko neba * omopi yamu * wa ga mwi pito ri zo * tipayaburu * kamwi ni mo na opose * ura bye mase * kame mo na yaki so * kwopwi siku ni * ita ki wa ga mwi zo * itisirwo ku * mwi ni simi topori * mura kimo no * kokoro kudakete * sina mu inoti * nipaka ni nari nu * imasara ni * kimi ka wa wo ywobu * taratine no * papa no mikoto ka * momo tara zu * ya swo no timata ni * yupuke ni mo * ura ni mo so twopu * sinu be ki wa ga yuwe MYS.16.3811
* ama kumo no * muka busu kuni no * mononopu to * ipa yuru pito pa * sumyeroki no * kamwi no mi kadwo ni * two no pye ni * tati samorapi * uti no pye ni * tukape maturite * tama kadura * iya topo naga ku * oya no na mo * tugi yuku mono to * omo titi ni * tuma ni kwo domo ni * katarapite * tati ni si pi ywori * taratine no * papa no mikoto pa * ipapi be wo * mapye ni suwe okite * kata te ni pa * yupu tori moti * kata te ni pa * niki tape maturi * tapirake ku * ma saki ku mase to * ame tuti no * kamwi wo kopi nome * ika n ara mu * tosi tukwi pi ni ka * tutuzi pana * nipop yeru kimi ga * kuro tori no * nadusapi ko mu to * tatite wite * mati kye mu pito pa * opo kimi no * mikoto kasikwo mi * osi teru * nanipa no kuni ni * ara tama no * tosi puru made ni * sirwo tape no * koromo mo posa zu * asa yopi ni * ari turu kimi pa * ika sama ni * omopi mase ka * utusemi no * wosi ki ko no yo wo * tuyu simo no * okite ini kye mu * toki ni ara zu site MYS.3.443
* taratine no * papa wo wakarete * makoto ware * tabwi no kar ipo ni * yasu ku ne mu kamo MYS.20.4348
* ame tuti to * tomo ni moga mo to * omopitutu * ari kye mu monowo * pasikyeyasi * ipye wo panarete * nami no upe yu * nadusapi ki nite * ara tama no * tukwi pi mo ki pe nu * kari ga ne mo * tugite ki nakeba * taratine no * papa mo tuma ra mo * asa tuyu ni * mo no suswo piduti * yupu gwiri ni * koromo de nurete * saki ku si mo * aru ramu goto ku * ide mi tutu * matu ramu monowo * yo no naka no * pito no nageki pa * api omopa nu * kimi ni are ya mo * aki pagwi no * tirap eru nwo pye no * patu wo pana * kar ipo ni pukite * kumo banare * topo ki kuni pye no * tuyu simo no * samu ki yama bye ni * yadori s eru ramu MYS.15.3691
* taratine no * papa ni mo ipa zu * tutum yeri si * kokoro pa yosi we * kimi ga manima ni MYS.13.3285
* taratine no * papa ga sono n aru * kupa surani * negapeba kinu ni * kisu to pu mono wo MYS.7.1357
* taratine no * papa ga kapu kwo no * maywo komori * ibuse ku mo aru ka * imo ni apa zusite MYS.12.2991
* taratine no * papa ni sira ye zu * wa ga mot e ru * kokoro pa yosi we * kimi ga manima ni MYS.11.2537
* taratine no * papa ga kapu kwo no * maywo komori * kakur eru imo wo * mi mu yosi mo ga mo MYS.11.2495
* taratine no * nipi gupa maywo no * kinu pa aredo * kimi ga mi kyesi si * amata ki posi * mo MYS.14.3350b
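A basic text download can be read back and checked against the rule that the last entry number equals the number of results. The exact column layout is not spelled out here, so this Python sketch assumes each line is `number<TAB>text` with the ID as the final whitespace-separated token; `Entry`, `parse_line` and `load` are hypothetical names.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    number: int
    text: str
    id: str


def parse_line(line: str) -> Entry:
    # Assumed layout: "<number>\t<text> <ID>"; the ID may follow a space or a tab.
    number, rest = line.rstrip("\n").split("\t", 1)
    text, entry_id = rest.rsplit(None, 1)
    return Entry(int(number), text.strip(), entry_id)


def load(lines: List[str]) -> List[Entry]:
    """Parse non-blank lines and check that the last number equals the entry count."""
    entries = [parse_line(line) for line in lines if line.strip()]
    if entries and entries[-1].number != len(entries):
        raise ValueError("last entry number does not match the number of entries")
    return entries


sample = [
    "1\t* taratine no * papa ga te panare * kaku bakari * subye na ki koto pa * imada se na ku ni MYS.11.2368",
    "2\t* taratine no * papa ga kapu kwo no * maywo komori * ibuse ku mo aru ka * imo ni apa zusite MYS.12.2991",
]
for entry in load(sample):
    print(entry.number, entry.id)
```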