Flamegraphs are visualisations of segmentation and structural context for a searched string.
The elements are blocks that represent all or part of the path from a segment to the root of the tree.
Here is an explanation of how the flamegraph should be read (from the bottom up):
The (blank) bottom bar represents all possible segmentations, and hovering the mouse over the bottom bar will give the total number of all tokens.
The second bar up from the bottom is divided into blocks, each showing the search string in one of its segmentation patterns.
The length of each block is proportional to the number of tokens of the pattern that block represents.
Hovering the mouse over a block gives the number of tokens and the percentage of the total for that segment in a segmentation pattern.
Higher bars show progressively finer sub-divisions depending on differences in the way each segment of a given pattern has been annotated.
The finest level of distinction that can be displayed for a given segment shows a full path from that segment to the root of the tree.
Clicking on a block with a “Full” and “Fine” description of the path to the root sends you to a search result page with all the tokens for that segmentation pattern.
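As a rough illustration of the proportions described above, the following Python sketch computes the token count and percentage that hovering over each block on the second bar would report. The counts are invented for illustration and the function name `block_shares` is hypothetical; this is a sketch of the arithmetic, not the interface's actual code.

```python
def block_shares(pattern_counts):
    """Map each segmentation pattern to (token count, percentage of total)."""
    # The total corresponds to the figure shown by the blank bottom bar.
    total = sum(pattern_counts.values())
    if total == 0:
        return {}
    return {
        pattern: (count, 100.0 * count / total)
        for pattern, count in pattern_counts.items()
    }


# Invented counts, for illustration only (not real corpus figures).
example_counts = {"[という]": 60, "[と][いう]": 30, "[と][い][う]": 10}
shares = block_shares(example_counts)
for pattern, (count, percent) in shares.items():
    print(f"{pattern}: {count} tokens, {percent:.1f}% of total")
```

The percentage doubles as the relative length of the block, since each block's length is proportional to its token count.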
Note that flamegraph results are sensitive to the interpretation of the segmentation you choose: “Liberal”, “Character”, “Mine”, or “Strict”.
The flamegraph below displays an overview for the total use of “という” across the corpus under the “Character” interpretation of segmentation.
Observe how:
hovering the mouse over the lowest bar (the blank “string” block) shows a count of the total number of occurrences of the string “という” in the corpus.
moving the mouse to higher bars gives count and percentage information for the number of instances of specific segments in a given segmentation pattern for the string “という”, depending on the block selected.
clicking on a block zooms the interface to show only the instances of “という” for the block clicked.
a basic string search with boxes Full and Fine checked will provide full paths from segment to root, with complete node labels.
clicking on a block with a full path from segment to root and with complete node labels will send you to a search results page for the segment in question.
Basic string search
Basic string search allows you to search for sequences of characters in the text of the corpus according to a selection of conditions.
The search results provide three different summaries of the ways in which the characters in the specified string are distributed within the structures in the corpus.
This is particularly helpful for finding out about word segmentation and word use.
Results are returned as:
a flamegraph to display the total use of the search string
overview counts that are links to KWIC-formatted (keyword-in-context) results
KWIC (keyword-in-context) results, with the central focus of each result paired with paths indicating the immediate structural contexts of the string (or its sub-strings).
The paths are links for opening a view of the corresponding tree.
(When viewing the KWIC results, scroll down the screen to see all results.)
The interface allows for four interpretations of what is entered as a search string, with radio buttons to select between “Liberal”, “Character”, “Mine”, and “Strict”.
The default “Character” triggers a search with no constraints on the segmentation of matched results other than that the leftmost character of the search string should correspond to the start of a word, while the rightmost character should be the end of a word.
As a consequence, the string “123” becomes a search for the segmentations [1][2][3], [12][3], [1][23], and [123].
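The expansion of “123” described above can be sketched in Python by deciding, for each gap between adjacent characters, whether or not a word boundary falls there. The function names `segmentations` and `bracketed` are hypothetical and this is only an illustration of the enumeration, not the search engine's own implementation.

```python
from itertools import product


def segmentations(s):
    """Return every way of splitting s into contiguous, non-empty segments."""
    if not s:
        return []
    results = []
    # One True/False choice per gap between adjacent characters:
    # True places a word boundary in that gap.
    for cuts in product([False, True], repeat=len(s) - 1):
        segments, start = [], 0
        for position, cut in enumerate(cuts, start=1):
            if cut:
                segments.append(s[start:position])
                start = position
        segments.append(s[start:])
        results.append(segments)
    return results


def bracketed(segments):
    """Render a segmentation in the bracket notation used in this document."""
    return "".join(f"[{segment}]" for segment in segments)


patterns = [bracketed(segments) for segments in segmentations("123")]
print(patterns)
```

A string of n characters has n - 1 gaps and therefore 2^(n-1) candidate segmentations, so a three-character string yields the four patterns listed above.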
With “Liberal” there is the same interpretation as when “Character” is selected, but without the requirements that the first character should be preceded by a word boundary and the last character should be followed by a word boundary.
As a consequence, the string “123” becomes a search for the segmentations [(...)1][2][3(...)], [(...)12][3(...)], [(...)1][23(...)], and [(...)123(...)].
Choice of “Strict” enforces the exact segmentation that the user provides.
Adding a character space between characters indicates a word boundary.
Thus the search string “123” can only find [123], while “12 3”, with a character space between “12” and “3”, can only find [12][3].
For a string search consisting of a single character, “Character” and “Strict” yield the same results.
Choice of “Mine” gives a version of “Strict” but without the requirement that left and right edges of the search string should invoke word boundaries.
Thus the search string “123” yields [(...)123(...)], while “12 3”, with a character space between “12” and “3”, yields [(...)12][3(...)].
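To make the contrast between “Strict” and “Mine” concrete, here is a minimal Python sketch that turns a space-separated search string into the bracket notation used above. The function names `strict_pattern` and `mine_pattern` are hypothetical, and the sketch only reproduces the notation; it assumes nothing about how the interface actually runs the search.

```python
def strict_pattern(query):
    """Spaces mark word boundaries; both outer edges are word boundaries too."""
    return "".join(f"[{segment}]" for segment in query.split())


def mine_pattern(query):
    """As strict_pattern, but the outer edges may fall inside longer words."""
    segments = query.split()
    if not segments:
        return ""
    # (...) marks optional extra material inside the first and last word.
    segments[0] = "(...)" + segments[0]
    segments[-1] = segments[-1] + "(...)"
    return "".join(f"[{segment}]" for segment in segments)


for query in ("123", "12 3"):
    print(f"{query!r}: Strict {strict_pattern(query)}, Mine {mine_pattern(query)}")
```

In both cases the internal boundaries are exactly those the user typed; only the treatment of the left and right edges differs.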
The interface has a check box “Fine” that, once selected, provides more detailed information from the annotation.
Specifically, the extensions for all node labels are included in the paths for segments yielded from the search.
There is also a check box “Full” that, once selected, provides the full path from the segment to the root of the tree.
If “Full” is selected, then results are presented with a flamegraph only.
If both “Fine” and “Full” are selected, the resulting flamegraph has outer edges that serve as clickable links, which trigger a search to find the trees corresponding to the results.
Submitting a search string brings up search results.
Immediately following each entry is a link to the tree for that entry in the form of the ID number for that entry.
Following the link opens a tree view for the result.
Beneath the search results, there is a button that allows you to download all the results of the search in comma-separated values (.csv) format.
By clicking on the “Greedy string search” button, you can perform a search that allows characters (as well as word boundaries) to intervene between parts of the search string.
By clicking on the “Tree fragments” button, you can produce tree fragments from a string search, with counts opening to annotation links.