The Nordic Dialect Corpus

User Manual
The Nordic Dialect Corpus in the Glossa Interface

By Eirik Olsen

Overview and Contents

This user manual is partially based on Lars Nygaard's manual Glossa - The Corpus Explorer (HTML|PDF). More about Glossa is also found on its Text Laboratory site. See also The Search Interface Documentation and The Example Searches Page.

The interface works best with Firefox or Chrome.

1 The Linguistic Search Field
    1.1 The Criteria Menu
        1.1.1 The Non-Linguistic Criteria
        1.1.2 The Linguistic Criteria
    1.2 Searching for (Multiple) Phrases

2 The General Search Options Field
    2.1 Regular Expressions
    2.2 Result Options
        2.2.1 Hits per Page, Max Results, Randomize and Skip Total Frequency
        2.2.2 Orthographic and Phonetic Transcriptions
        2.2.3 Changing Audio/Video Player
    2.3 Other Functions

3 The Metadata Specification Field
    3.1 Informant Code Naming Conventions
    3.2 The Informant Metadata Categories
    3.3 The Recording Metadata Categories
    3.4 The Show Informants Function
    3.5 The Subcorpus Function

4 Other Search Interface Functions
    4.1 Managing Saved Result Sets
    4.2 Managing Annotation Sets
        4.2.1 Creating and Editing Annotation Sets
        4.2.2 Retrieving Annotation Statistics for Saved Result Sets
    4.3 Accessing Complete Transcriptions in HTML-format
    4.4 Bug and Error Report

5 The Results Page
    5.1 The Media Player
    5.2 The Map Function
    5.3 The Action Menu
        5.3.1 Count
        5.3.2 Download
        5.3.3 Sort
        5.3.4 Collocations
        5.3.5 Annotate
        5.3.6 Delete Hits
        5.3.7 Save Hits

1 The Linguistic Search Field

The linguistic search field is shown in figure 1 below. Its most central part is the word box, where you input your query (e.g. a word). This input should normally be orthographic, but if the settings are right it may also contain so called regular expressions or be semi-phonetic (more about this in 1.1.1 and 2.1). To facilitate the insertion of special characters, you'll find the characters æ, ø, å, ä, ö, ð, þ in the æøå... menu above the word box. You may also search for classes of words defined by linguistic criteria. The criteria menu is explained in detail in section 1.1 below. Several word boxes may be combined in order to search for phrases, and also phrases may be combined in the same search. How to search for (multiple) phrases is explained in section 1.2.

Figure 1 - The linguistic search field

1.1 The Criteria Menu

The criteria menu is shown in Figure 1.1a below. You may use criteria to define classes of words based on various linguistic and non-linguistic traits. In the following we will look closer at some of these criteria, explaining their function and use. Some examples of criteria used in actual searches are found on the example searches page.
In order to select a criterion, simply find it in the drop down menu and click it. The criterion will show up in a white field below the criteria menu (cf. Figure 1.1b). Selected criteria may be removed again by double clicking them. More than one criterion will be connected by disjunction, meaning that you'll get hits on a query matching either criterion (or both). Most criteria can also be excluded from a query, in which case they will appear with a prefixed exclamation mark when you select them. Excluded criteria are also connected to each other by disjunction, but are connected to other criteria by conjunction. This means that words matching excluded criteria will not be included among the hits even if they also match non-excluded criteria.

Figure 1.1a - The Criteria Menu

Figure 1.1b - Selecting a Criterion

The selection of criteria is based on automatic tagging of the transcriptions of the different languages. There are some shortcomings of automatic tagging, which means that some words unfortunately are incorrectly tagged. This may cause two related problems that it's important to keep in mind when searching using criteria. First of all, your query may match words that are incorrectly tagged as the kind of words you're looking for. In other words: All words that match your query might not be relevant. Second, words of potential interest may be incorrectly tagged so that they don't match a query they should have matched if they were correctly tagged. In other words: You might miss out on relevant words. Because of this, it is always wise to run a couple of test searches, to see how accurate the hits are, and whether or not your search actually retrieves the words you are looking for. Searching for concrete orthographic sequences will on the other hand always retrieve the expected results, and it's therefore often worthwhile spending some time formulating concrete queries instead of relying fully on criteria.
There are different solutions for tagging the transcriptions of the different languages, cf. the tagging page on the ScanDiaSyn site. This means that while some tags are common for all languages, there are others that are language specific. In the following two sections, we'll therefore present non-linguistic and linguistic criteria separately. Most of the non-linguistic criteria are general and language independent, and these can be presented in a relatively straight-forward manner. The linguistic criteria, on the other hand, are to a much greater extent based on language specific tagging. It is out of scope for this manual to present all the criteria and their uses in all languages. The criteria will therefore only be given a quick presentation below, and you are encouraged to experiment yourself to find out precisely how they are used for the different languages.

Chapter 1 - Contents - Home

1.1.1 The Non-Linguistic Criteria

The Word Criteria
Lemma: The Norwegian and Danish transcriptions are lemmatized. This means that it's possible to enter the dictionary form of any Norwegian or Danish word, select the lemma criterion, and have all word forms of the lemma included among the hits. Unfortunately, this is presently not possible for any of the other languages represented in the corpus.
Start of/within/end of word: These three criteria are similar in function. They can be combined with an input in the word box to search for words that begin with, contain or end with a certain orthographic sequence. The input may also be phonetic, if combined with the phonetic criterion (see below).
Case sensitive: When this criterion is selected, the search is case sensitive. Otherwise, it is not.
Exclude word, add (negated) word form and add (negated) lemma: These five criteria are similar in function in that they all offer different ways of including or excluding specific word forms or lemmas from the search. The exclude word criterion simply excludes what you have input in the word box, while the other criteria let you input multiple word forms or lemmas and combine these with any other input you may wish to enter in the word box.
Segment Initial/Final: These criteria give you the opportunity to search for words that are in the very beginning or end of a segment. Since this is a speech corpus, segments do not necessarily correspond to complete sentences. Segmentation is rather an attempt to achieve natural units based on a compromise between turn taking, intonation and syntax (cf. also section 5).

The Occurrences Criteria
The different occurrences criteria (one or more, zero or one and zero or more) can be used to specify how many times what you've entered in the linguistic search field can occur in a single hit. In section 1.2 you are presented with the possibility to search for (multiple) phrases. One of the possible applications of the zero or one criterion might e.g. be to mark single word boxes in a search phrase as optional. Doing so would essentialy yield the same result as querying for two separate phrases, but is arguably a more streamlined way of performing the search.

The Nlex (Non-Lexical) Criteria
These criteria can be used to search for non-lexical sounds/utterances in the Norwegian transcriptions (the sibilant criterion is also used in Faroese).

The Descr (Descriptive) Criteria
X: In the Norwegian, Icelandic and Faroese transcriptions, lexical words that aren't found in the orthographic standard are marked with an x-tag. Using this criterion, you will narrow your search to include only such non-standard lexical words.
O: In the Norwegian and Övdalian transcriptions, grammatical words that aren't found in the orthographic standard are "translated" to their standard equivalent, and marked with an o-tag. Using this criterion, you will narrow your search to include only such translated grammatical words.

The Phonetic Criterion
By default, any text entered in the word box will be searched for in the orthographic transcriptions of the recordings in the corpus. The Norwegian and Övdalian transcriptions have alternative transcriptions in addition to the orthographic ones (for details on the transcriptions, cf. section 2.2.2 below). These transcriptions may be queried using the phonetic criterion. This may for instance be helpful if you look for specific phonetic alternants in Norwegian (cf. the example searches page).

Chapter 1 - Contents - Home

1.1.2 The Linguistic Criteria

The linguistic criteria are either simple or compound criteria. Compound criteria consist of two or more simple criteria joined by a colon. A word will normally have been assigned a compound tag because the automatic tagger hasn't been able to disambiguate it with regard to two or more possible tags within a single category (eg. grammatical gender, part of speech etc.). Words with compound tags won't be included among the hits if you only search for the simple criteria the compound criterion consists of. For instance, searching for the criterion adv won't include words found with the adv:subjunc criterion. This is worth having in mind when searching for Norwegian parts of speech.

The Num (Number) Criteria
These criteria may be used to search for words tagged for grammatical number. The criteria sg (singular) and pl (plural) are used in all five languages, and sg:pl in all but Icelandic.

The Case Criteria
These criteria may be used to search for words tagged for grammatical case. The language distribution of the simple gender criteria is shown in table 1.1.2a below.

Criterion		Norwegian	Swedish	Danish	Faroese	Icelandic
nom	(nominative)	yes	yes	yes	yes	yes
acc	(accusative)	yes	yes	yes	yes	yes
gen	(genitive)	yes	no	no	yes	yes
dat	(dative)	no	no	no	yes	yes

Table 1.1.2a - The Case Criteria

A note on the use of the dative criterion: There are some dialects of Norwegian that have preserved the dative, but as is evident from table 1.1.2a, you will not find such dative forms using the dative criterion. Since dative forms aren't part of Norwegian standard orthography, dative tags aren't used in the automatic tagging of the orthographic transcription. Such forms will rather need to be found searching for phonetic realizations (cf. the example searches page) or typical dative contexts.

The Degr (Degree) Criteria
These criteria may be used to search for words tagged for grammatical degree of comparison. The criteria comp (comparative) and sup (superlative) are used in all five languages, and pos (positive) in all but Danish and Faroese.

The Voice Criteria
The only language where verbs are tagged for more than one voice is Icelandic. There, active forms and middle forms can be distinguished using the act and med criteria respectively. Swedish and Danish don't make use of the med criterion, but still make (most?) active verbs searchable using the act criterion. The language distribution of the voice criteria act and med is shown in table 1.1.2b below.

Criterion		Norwegian	Swedish	Danish	Faroese	Icelandic
act	(active)	no	yes	yes	no	yes
med	(middle)	no	no	no	no	yes

Table 1.1.2b - The Voice Criteria

The Pers (Person) Criteria
Norwegian pronouns are tagged for grammatical person. This means that you may search for 1st, 2nd or 3rd person pronouns in Norwegian using these criteria, without entering a specific word form into the word box in the linguistic search field.

The Temp (Tense) Criteria
These criteria may be used to search for words tagged for grammatical tense. The languages have, however, been tagged in different ways, and when it comes to past participles, there is some overlap with the tags that the mood criteria are based on. For instance, verbs in the past tense can be found using the pret (preterite) criterion in all languages but Icelandic, where the temp criterion past is used. Likewise, the perf-part (perfect participle) criterion is used in all languages but Icelandic. To find Icelandic past participles, you have to use one of the mood criteria past-part (past participle) or supine (this said, Swedish and Faroese also make use of the supine criterion, cf. below). The language distribution of the simple temp criteria is shown in table 1.1.2c below. The compound criteria pres:inf, pret-perf-part, and inf:imp are only found in Norwegian.

Criterion		Norwegian	Swedish	Danish	Faroese	Icelandic
pres	(present)	yes	yes	yes	yes	yes
pret	(preterite)	yes	yes	yes	yes	no
past		no	no	no	no	yes
perf-part	(perfect participle)	yes	yes	yes	yes	no

Table 1.1.2c - The Temp Criteria

The Defn (Definiteness) Criteria
These criteria may be used to search for words tagged for grammatical definiteness. All five languages make use of the defn (definite) and indefn (indefinite) criteria but different parts of speech are tagged for definiteness in the different languages. In Icelandic, for instance, these tags are only used with pronouns. The indefn:def criterion is mainly used in Swedish.

The Mood Criteria
These criteria may be used to search for words tagged for grammatical mood. There is some differences between how the criteria are used in the different languages, but common to all languages are the criteria inf (infinitive) and imp (imperative). The language distribution of the mood criteria is shown in table 1.1.2d below.

Criterion		Norwegian	Swedish	Danish	Faroese	Icelandic
inf	(infinitive)	yes	yes	yes	yes	yes
imp	(imperative)	yes	yes	yes	yes	yes
ind	(indicative)	no	no	no	yes	yes
subjunctive		no	yes	no	no	yes
pres-part	(present participle)	no	yes	yes	yes	yes
past-part	(past participle)	no	no	no	no	yes
supine		no	yes	no	yes	yes

Table 1.1.2d - The Mood Criteria

The Pos (Part of Speech) Criteria
These criteria may be used to search for specific parts of speech. The following list includes the parts of speech common to all five languages. Some of them are noted as not being used in Icelandic, and while this is true for the specific part of speech criteria listed below, some of these parts of speech may still be found in Icelandic if you query for certain type criteria. Icelandic adverbs may for instance be searched for using the type criterion no-case (but not the pos criterion adv!) and Icelandic prepositions using the type criteria gov-acc or gov-dat criteria (but not the pos criterion adv!).

adj (adjective)
adv (adverb, not used in Icelandic)
conj (conjunction)
det (determiner)
inf-marker (infinitive marker, not used in Icelandic)
interj (interjection)
noun
prep (preposition, not used in Icelandic)
pron (pronoun)
subjunc (subjunction, not used in Icelandic)
verb

In addition to the pos criteria described above, Norwegian makes use of several compound part of speech criteria. These are listed below. In addition, there is a part of speech included in this list used in Norwegian, but not found in any of the other languages: the sånn-word (which includes the word "sånn" only).

adv:subjunc
conj:prep:adv
conj:subjunc:adv
noun:adj
prep:subjunc
pron:det
verb:noun
sånn-word

In addition to the criteria mentioned in the list, the pause and unknown criteria are used by several languages for pauses and words that are interrupted, respectively. The language distribution of these criteria is shown in table 1.1.2e below.

Criterion	Norwegian	Swedish	Danish	Faroese	Icelandic
pause	yes	no	yes	yes	yes
unknown	yes	no	no	yes	yes

Table 1.1.2e - The Pause and Unknown Criteria

The Type Criteria
Most of the type criteria may be seen as complementary to the pos criteria. Some of the type criteria allow you to divide words with or without specific pos criteria into finer categories based on different variables, while some are quite simply used instead of pos criteria. In Icelandic for instance, different declensions of adjectives (words with the pos-tag adj) may be found using the type criteria strong, uninflected or weak. Adverbs may be searched for using the type criterion no-case (but not the pos criterion adv!) and prepositions using the type criteria gov-acc or gov-dat criteria (but not the pos criterion adv!). The language distribution of the different type criteria is shown in table 1.1.2f below (only short criterion names are given).

Criterion	Norwegian	Swedish	Danish	Faroese	Icelandic
abbrev	yes	no	no	no	no
cm-noun	yes	no	no	no	no
dem	yes	no	no	yes	yes
gov-acc	no	no	no	no	yes
gov-dat	no	no	no	no	yes
hesit	yes	no	no	yes	no
intensifier	yes	no	no	no	no
no-case	no	no	no	no	yes
pers	yes	no	yes	yes	yes
poss	yes	yes	yes	yes	yes
prop	yes	yes	yes	yes	yes
q	yes	no	no	yes	yes
quant	yes	yes	yes	yes	no
refl	yes	no	no	yes	yes
rel	no	no	no	no	yes
strong	no	no	no	no	yes
uninflected	no	no	no	no	yes
weak	no	no	no	no	yes

Table 1.1.2f - The Type Criteria

The Gender Criteria
These criteria may be used to search for words tagged for grammatical gender. The language distribution of the simple gender criteria is shown in table 1.1.2g below. In addition, we find the following compound criteria in Norwegian: m:f, masc:fem, masc:fem:neut and masc:neut (the last one is also used to a certain extent i Swedish).

Criterion		Norwegian	Swedish	Danish	Faroese	Icelandic
masc	(masculine)	yes	yes	yes	yes	yes
fem	(feminine)	yes	no	no	yes	yes
neut	(neuter)	yes	yes	yes	yes	yes
unspec	(unspecified)	no	no	no	no	yes

Table 1.1.2g - The Gender Criteria

Chapter 1 - Contents - Home

1.2 Searching for (Multiple) Phrases

Adding More Words to a Phrase
More word boxes may be added or removed by clicking the buttons with plus or minus signs (cf. Figure 1.2a). This allows you to perform phrase searches.

Figure 1.2a - Adding an extra word box

It is possible to specify the number of words that may intervene between each query word using the interval function. The minimum and maximum interval boxes are left empty by default, restricting unspecified words from intervening. If the minimum interval is specified, but not the maximum, unlimited maximum interval is assumed. Conversely, if the maximum interval is specified, but not the minimum, a minimum interval of zero is assumed. Setting both the minimum and maximum interval values to 1 is equal to having an extra, empty word box between two specified word boxes.

Adding More Phrases
Using the add phrase button gives you an extra query row (cf. figure 1.2b below). You can use this function to join two or more different queries in the same result set. The queries will be connected by disjunction, meaning that you'll get hits matching either queries. The delete phrase button removes the phrase that was added last.

Figure 1.2b - Adding an extra phrase

Chapter 1 - Contents - Home

2 The General Search Options Field

This field gives you options that modify the entire search (cf. figure 2). How to use so called regular expressions to speed up the input of queries is explained in section 2.1, while general options relating to the presentation of the results are presented in section 2.2. There we'll look at how we can randomize results, limit the number of results presented (both total and per page) and select whether the hit list should include orthographic and/or phonetic transcriptions. Remaining functions are presented in 2.3.

Figure 2 - The General Search Options Field

2.1 Regular Expressions

Queries can sometimes be created faster or be made more flexible by typing regular expressions into the word box than by defining a class of words using the criteria menu (e.g. typing "hus.*" instead of typing "hus" and then selecting the start of word criterion). Checking the regular expressions box (cf. figure 2.1) allows you to input these expressions in the word box in the linguistic search field. If this box isn't checked, regular expressions will be interpreted as plain text and part of the query.

Figure 2.1 - The Regular Expressions Box

List of Alternative Characters
A list of alternative characters can be embedded within square brackets ("[...]"). For example, the query "studer[ea]r" will match either "studerer" or "studerar".

Unspecified Character
Period (".") represents any character. The query "studer[ea]." will match "studerer" or "studerar" like above, but also "studerat".

Optionality
Question mark ("?") indicates that the previous character is optional. If more than one character are to be indicated as optional, these may be embedded in parentheses. A list of alternative characters (see above) may also be indicated as optional using question mark. The query "studer[ea].?" will thus match "studerer", "studerar" and "studerat" like in the example above, but also "studere" and "studera". The query "studer(er)?" will match either "studerer" or "studer", while the query "studer[ea]?" will match either "studer", "studere" or "studera".

Iteration
Asterisk ("*") indicates that the previous character can occur any number of times or needn't occur at all. The plus ("+") works in a similar way, but it requires the preceding character to occur at least once. Like the case was with question mark above, a string of more than one character may be embedded within parentheses if the whole string is to be indicated as iterative. The query "studer.*" will match "studer" in addition to all words beginning with "studer-", while the query "studer.+" will only match words beginning with "studer-". The query "stud(er)+" will match "studer" and "studerer".
It's also possible to limit the maximum (and minimum) number of iterations using curly brackets ("{...}"). The query "studer.{1,2}" thus matches all words that start with "studer-" and end with minimum 1 and maximum 2 unspecified characters.

Overriding Regular Expressions
If regular expressions are prefixed by a backslash ("\") they are interpreted literally when the regular expressions box is checked. The query "\?" will thus match all occurrences of question mark in the corpus.

Chapter 2 - Contents - Home

2.2 Result Options

The result options (cf. figure 2.2) let you adjust how the results are presented to you on the results page (cf. section 5). You are given the opportunity to change the number and order of results retrieved (cf. 2.2.1) and to change what kind of transcription that is shown on the results page (cf. 2.2.2). On the bottom right hand side of the interface, you also find two radio buttons controlling what type of audio/video player you are presented with on the results page. Even though this is not part of the general search options field, it's clearly a result option, and is therefore dealt with in this section (cf. 2.2.3).

Figure 2.2 - Result Options

Chapter 2 - Contents - Home

2.2.1 Hits per Page, Max Results, Randomize and Skip Total Frequency

The hits per page box quite simply lets you adjust the number of hits shown per result page. By default, this number is set to 20. The lower this number is, the faster each results page will load.

The max results box lets you limit the total number of hits available to you in total on the results page. By default, this number is set to 2000. When searching for frequent words or phrases, this number will be to low to include all matches. In such cases, the max results number can be adjusted up, but the more matches there are in the corpus, the slower the results page will load.

The randomize check box is useful in cases where the max results number is set to a number lower than the actual number of matches your query has in the corpus. If left unchecked (which is the default), the hits on the results page will be presented in alphabetical order, based on the place name (cf. section 3.2). For words and phrases that are very frequent then, you will get many hits distributed over the the first few places. Checking the randomize box will retrieve fewer hits from each place, and distribute the hits more evenly across all recording locations.
When the max results number is set lower than the total number of hits a query matches, you can choose whether or not to be presented with the total number of hits potentially available to you in the corpus using the skip total frequency check box. By default this is checked, hiding the total number of matches. This is to speed up searches that might otherwise be slowed down when all potential hits need to be counted.

In figure 2.2.1a below, a results page is shown where the max number, randomize and skip total frequency options are modified. A search has been performed for words tagged as verbs (cf. section 1.1.2). The max results number is set to 5, which means that the five hits in figure 2.2.1a are all you are presented with. The skip total frequency box is on the other hand unchecked, so that we are presented with the total number of words tagged as verbs in the corpus (440441). The randomize box is checked, and this gives us hits from a random selection of recording locations. If we perform the same search, but with the randomize box left unchecked (which is the default), we will only get hits from the first recording location in an alphabetical list (Ål (aal) in Norway), cf. figure 2.2.1b.

Figure 2.2.1a - Results page where the max number, randomize and skip total frequency options are modified

Figure 2.2.1b - Results page where max number and skip total frequency options are modified, but where the randomize box is left unchecked

Chapter 2 - Contents - Home

2.2.2 Orthographic and Phonetic Transcriptions

All recordings are orthographically transcribed following the standard of each language. In addition, all Norwegian recordings are transcribed semi-phonetically, and the Övdalian recordings are transcribed according to Övdalian orthography. For more details about the transcriptions themselves, cf. the transcription page on the ScanDiaSyn site. Using the radio buttons named orthographic, phonetic and both, you can choose which transcriptions to show on the results page. The phonetic option includes both the semi-phonetic Norwegian transcriptions and the Swedish transcriptions with Övdalian orthography. If the phonetic or both option is chosen, and the results include other transcriptions than the previously mentioned ones, the secondary transcription line will be empty, consisting only of dashes. This is illustrated in figure 2.2.2, where a query for verbs (like the one in the last section) returns five random hits from Norwegian, Övdalian and Danish transcriptions.

Figure 2.2.2 - Results page with both orthographic and phonetic transcriptions

Chapter 2 - Contents - Home

2.2.3 Changing Audio/Video Player

The audio/video player selected by default is a Flash player. If there for some reason seems to be a problem using this, you can select the QT radio button to use a Quick Time plug in instead. The functions of the audio/video players are presented in section 5.1.

Chapter 2 - Contents - Home

2.3 Other Functions

The remaining functions of the general search options field are for searching the corpus and resetting the form (cf. figure 2.3). As explained earlier, the search corpus button opens a new window with the search results matching your query. The reset form button on the other hand is used to clean all user input and reset all fields to default.

Figure 2.3 - The Search Corpus and Reset Form Buttons

Chapter 2 - Contents - Home

3 The Metadata Specification Field

The Metadata Specification Field is shown in figure 3a below. This field gives you the opportunity to narrow your search based on sociogeographic variables. The conventions used when giving informants their code is briefly mentioned in section 3.1. The informant metadata categories (geographic location, age and sex) are then presented in section 3.2, while the recording metadata categories (recording year and genre) are presented in section section 3.3. Finally, the show informants and subcorpus functions are presented in section 3.4 and 3.5, respectively. In figure 3b, the different metadata category menus have been expanded by pressing the + to their right. Variables are selected by double clicking, which will move them from the left to the right column. The selected variables are then included in the search by default, but may also be excluded by selecting exclude from the drop down menu below the right hand columns. More than one selected variable will be connected by conjunction, meaning that you'll only get hits on a query matching all selected variables. This means that when you select more than one variable, you must keep in mind that there might not be informants or recordings that match them all, in which case your query will not retrieve any hits. Read more about how to avoid this in section 3.2 and 3.4 below.

Figure 3a - The Metadata Specification Field

Figure 3b - The Metadata Specification Field with Expanded Variables

3.1 Informant Code Naming Conventions

The Nordic Dialect Corpus is a collection of recordings done by different people and at different times, and so the codes given to informants have also been based on different conventions. Common to all though, is that the place name is the initial and most central part of the informant code. All informant codes also contain a number in addition to the place name, and some combine the number with one to three letters. In the case of Norwegian and Swedish informants, these letters may contain info on the sex and age group of the informant (cf. 3.2). Old Norwegian recordings form the Oslo Old Dialect Archive (cf. 3.3) may also be singled out by letters used in the code. The different letter combinations containing such metadata info are summarized in table 3.1 below. There are a few other letter combinations also used in informant codes that do not contain metadata info. These are not included in the table.

Letters	Metadata Info
um/ym	young man
uk/yw	young woman
gm/om	old man
gk/ow	old woman
chi	child
ma	Oslo Old Dialect Archive

Table 3.1 - Informant Code Letters Containing Metadata Information

Chapter 3 - Contents - Home

3.2 The Informant Metadata Categories

All informants have various metadata associated with them, and any search may be narrowed down using three sorts of informant metadata categories: geographic location, age and sex. The variables found for each of these categories are explained in the following.

Geographic Location
The category geographic location is divided into four subcategories: country, region, area and place. These subcategories are used differently for the different languages. In table 3.2 below, the use of the region, area and place subcategories in the different languages are summarized.

Subcategory	Norwegian	Swedish	Danish	Faroese	Icelandic
region	region (landsdel)	region (landsdel)	not used	region (sýsla) or island	region
area	county (fylke)	province (landskap)	not used	island or part of island	region
place	municipality (kommune) or urban area	municipality (kommun) or urban area	island - but also some Jutish urban areas and regions + Copenhagen	municipality (kommuna) or urban area	municipality or urban area

Table 3.2 - Geographic Variables

The four subcategories country, region, area and place represent four different levels of precision when it comes to specifying geographic location. This is important to keep in mind when selecting variables from different subcategories, since multiple variables are joined by conjunction. This means that the more precise subcategories selected need to be subsets of the more general subcategories selected in order for the query to retrieve any hits. To see if selected variables match any informants, you can use the show informants function, cf. section 3.4 below. A map of recording locations is shown in figure 3.2 below (and may also be found here).

Figure 3.2 - Map of Recording Locations

Age and Sex
The age category is divided into two variables, age group A and age group B. Age group A consists of young people, mainly between 15 and 30 years of age, while age group B consists of adults and elderly people over 50 years of age. You can read more about the distinction on the NorDiaSyn homepage. The sex category naturally has the two variables female (F) and male (M).

Chapter 3 - Contents - Home

3.3 The Recording Metadata Categories

The recordings themselves also have associated metadata categories: recording year and genre. These are explained below.

Recording Year
The recording years span from 1951 to 2012. The recordings from 1951 to 1984 are old Norwegian recordings from Målførearkivet (Oslo Old Dialect Archive), which are meant to complement the modern recordings and give the possibility for longitudinal studies. The modern ScanDiaSyn recordings are from 1998 to 2012. Exact year spans for each language are given in the list below:

Norwegian: 1951-1984, 2006-2012
Swedish: 1998-2000, 2007
Danish: 2008
Faroese: 2008
Icelandic: 2008, 2013

Genre
This category has two main variables: interview and conversation. The latter has been divided into several subgenres, depending on the relationship between the informants. You can read more about the distinction between interview and conversation on the NorDiaSyn homepage. The Oslo Old Dialect Archive recordings (cf. section 3.3) have been classed as a separate genre. The following is a complete list of genres used in the corpus:

Interview
Conversation
Conversation - strangers
Conversation - acquaintances
Conversation - friends
Conversation - family
Oslo Old Dialect Archive

Chapter 3 - Contents - Home

3.4 The Show Informants Function

When you have made a selection of metadata variables, you may use the show informants function to show a table presenting all informants that match your selection. To do this, press the show informants button to the far right in the metadata specification field (cf. figure 3.4a below). The list will open in a new window (cf. figure 3.4b below). Above the table you are given a total word count for all matching informants, as well as a summary giving you the total number of matching informants, places and countries. Each individual informant is then presented in the table along with various informant metadata given in the different columns. You are also presented with an individual word count, which specifies the total number of words included in the corpus for each informant. You may sort the contents of each table column alphabetically by pressing the column header. If your selection of variables doesn't match any informants, you'll get the message "Sorry, no data available on informant.". In this case, you'll need to make a wider selection, or check for contradicting variables (cf. what's written under the geographic location heading in section 3.2 above).

Figure 3.4a - The Show Informants Button

Figure 3.4b - Showing Informants from Akershus, Norway

Chapter 3 - Contents - Home

3.5 The Subcorpus Function

The subcorpus function may be used to save a selection of metadata variables for later use. This allows you to perform several independent searches with matching metadata variables without having to reselect all variables every time. A specific selection of metadata variables is saved as a subcorpus by pressing the save subcorpus button to the far left in the metadata specification field (cf. figure 3.5 below). This will open a new window with a listing of selected variables, and a word box where you must name your subcorpus. Confirming the name and saving the subcorpus is then done by pressing the send button to the right of the word box. When you at a later time want to retrieve a saved subcorpus, press the choose subcorpus button to the far left in the metadata specification field (cf. figure 3.5). This will open a list of saved subcorpora, and the desired subcorpus is selected by clicking on its name. You will then be directed back to the search interface where the metadata variables constituting the subcorpus will have been selected automatically. It is then possible to further modify the metadata variable selection freely, but any changes that you want to retrieve at a later time must be saved as a new subcorpus.

Figure 3.5 - The Subcorpus Buttons

Chapter 3 - Contents - Home

4 Other Search Interface Functions

There are a few functions available through the search interface in addition to the ones already discussed. How to manage saved result sets is described in section 4.1, while you'll find out how to create and edit annotation sets in section 4.2. You may also access complete transcriptions in HTML-format, this is described in section 4.3.

4.1 Managing Saved Result Sets

As we shall see in section 5.3.7, Glossa gives you the opportunity to save entire result sets online for later retrieval. You may manage these sets from the search interface by clicking the my results button in the top menu, cf. figure 4.1a. This will take you to the manage saved result sets page, shown in figure 4.1b, where you get the options to retrieve, delete, rename and copy result sets. The result sets are personal, and are therefore unique to every user. In other words, the result sets shown in the table in figure 4.1b will not be the same as the result sets available to you.

Figure 4.1a - Accessing Manage Saved Result Sets Page

Figure 4.1b - Manage Saved Result Sets Page

Retrieve a set simply by clicking on its name. To delete a set, press the asterisk symbol (*) in the delete column on the same row. Renaming a set or creating a copy of a set with another name is done by entering the new name in the rename or the copy column on the same row, respectively, and pressing the asterisk symbol.

Using the join results function you may merge two or more result sets into a new result set. This is done by selecting the result sets you wish to merge (checking the sets in the add column), entering a name for the new result set in the join selected with new name word box at the bottom of the page, and pressing the join results button. The new result set will then appear in the result set list.

Chapter 4 - Contents - Home

4.2 Managing Annotation Sets

Glossa gives you the opportunity to annotate hits both freely and using predefined sets of values, so called annotation sets. In section 5.3.5 we will see how to annotate, but for now, we will see how to create and edit annotation sets, and how to retrieve annotation statistics on saved result sets. You may manage your annotation sets from the search interface by clicking the my annotations button in the top menu, cf. figure 4.2a below. This will take you to the manage annotation sets page shown in figure 4.2b.

Figure 4.2a - Accessing Manage Annotations Page

Figure 4.2b - Manage Annotation Sets Page

Chapter 4 - Contents - Home

4.2.1 Creating and Editing Annotation Sets

Creating a new annotation set is fairly straight forward: You simply enter a desired name for the annotation set into the name box on top of the page, and press the create set button. The set will appear in the drop down menu below, to the left of the edit set button. The next step is to define values for the newly created annotation set. Simply find the set you just created in the drop down menu, and press the edit set button. You will then be presented with the manage annotation set values page, shown in figure 4.2.1 below. For each predefined value that you want your set to contain, enter its name into the name box on top of the page, and press the add value button. The values will show up in the drop down menu under the heading "manage existing values" below. To make the annotation process itself as effortless as possible, it's also possible to set a default value. This value will be preselected for all hits, so it might be a good idea setting the value you expect to be the most frequent as default (more about the annotation process in section 5.3.5). To set a value as default, select it from the drop down menu, and press the set this as default value button.

Because of the way the annotation functionality is designed, all existing annotation sets are available to all users. Because of this it's important to name sets so that they are easily recognizable as your own, both for yourself and for others. Using your own name or initials might be a good idea. In addition to this, it's good etiquette to leave other users' annotation sets alone. Furthermore, it's unfortunately not possible to delete values from annotation sets at present.

Figure 4.2.1 - Manage Annotation Set Values Page

Chapter 4 - Contents - Home

4.2.2 Retrieving Annotation Statistics for Saved Result Sets

If you have already annotated some hits, and these are included in one or more of your saved result sets, you may retrieve annotation statistics for them directly through the Glossa interface (more on annotation in section 5.3.5, and how to save result sets in section 5.3.7). Just select the annotation set you wish statistics for from the drop down menu (cf. figure 4.2b above), select the result set(s) that contain the annotated hits you want to include and then retrieve statistics by clicking the get statistics button. To illustrate this function we'll retrieve statistics for a saved result set with 50 random occurrences of the present tense form "er" of the verb "være (to be)" in the Fredrikstad dialect of Norwegian (not included in the saved result sets list in figure 4.2b above). The 50 hits are annotated using the stress annotation set (how this is done is explained in section 5.3.5). This is a simple annotation set containing only two values: stressed and unstressed. The statistic summary is shown in figure 4.2.2a below.

As mentioned in section 4.1 which dealt with the manage saved result sets page, saved result sets are user specific. A different list containing your own result sets will therefore be visible to you on the manage annotation sets page shown in figure 4.2b.

Using the value transformations box you may rename and/or merge values in annotation sets when you are retrieving statistics. As shown in the example on the page, renaming is done after the following pattern (without the quotation marks): "old_value_name=new_value_name". If you want to merge two or more values into one new value, you simply give them the same new value name (and separate them using a comma and no space): "old_value_name_1=new_value_name_1,old_value_name_2=new_value_name_1". To illustrate the renaming of values, let's say you want to translate the unstressed and stressed values into Norwegian. You will then need to input the following in the value transformations box: "unstressed=trykksvak,stressed=trykksterk". The statistics summary this gives is shown in figure 4.2.2b below.

Figure 4.2.2 - Statistic Summaries

Chapter 4 - Contents - Home

4.3 Accessing Complete Transcriptions in HTML-format

Complete transcriptions for all recordings are available for download via the link marked out in red in figure 4.3 below. For the Norwegian and Övdalian recordings only the phonetic transcriptions are available here (i.e. Norwegian transcriptions that are semi-phonetic and Övdalian transcriptions that are transcribed using the local orthography).

Figure 4.3a - Accessing Complete Transcriptions

Chapter 4 - Contents - Home

4.4 Bug and Error Report

On the bottom of the search interface page, you'll find a link to a feedback form. We'd be greatful if you use this to report any bugs or errors that you might find in the corpus, or any other feedback you might have.

Chapter 4 - Contents - Home

5 The Results Page

The search results always open in a new window like the one presented in figure 2.2.1a, here reproduced as figure 5 below. After giving a quick overview of the page, the following sections will deal with various functions available through the results page in more detail.
At the very top of the results page, you'll find the number of informants included in the search, which in turn is followed by the CWB search expression. Below the action drop down menu (cf. 5.3) and the map button (cf. 5.2), you'll see how many hits that are available to you and how many hits there are in total in the entire corpus (see also section 2.2.1). A list of results pages follows, with the current page shown in bold (in figure 5 there is only one result page in total). The rest of the results page contain the hits themselves. Each hit is presented in the form of a segment containing the word (or phrase) searched for, which in turn is in bold types. Segment division is done as part of the transcription, and an optimal segment is the same as an utterance or a meaningful unit with regard to both syntactical and intonational boundaries. In spoken language, such optimal segments are however the exception rather than the rule, and most often the segment division is a compromise between multiple criteria. The code of the informant having uttered the segment is in the left column, and here you also find three clickable buttons. The first one () opens informant metadata in a new window. The two following buttons ( and ) open video and audio or just audio for the segment in a media player (cf. 5.1) on top of the results page. As is shown in Figure 5, not all recordings include video, and in most such cases there will just be an audio button available. In a few cases both media buttons will be available even though the only available media is audio. Because of this, you should try the audio button as well if you experience lack of video. In cases where neither of the media buttons work, please report it in as a bug (cf. 4.4)!

Figure 5 - The Results Page

5.1 The Media Player

As mentioned in section 2.2.3, both a Flash player and a QuickTime player are available to the user. The Flash player is selected by default, and will be the player that is presented in the following, but the QuickTime player offers the same functionality as the Flash player.

When clicking the video () or audio () button to the left of a hit, the media player will open in the top of the results page, cf. figure 5.1a. The segment currently playing is highlighted in orange on the right hand side. The recording starts automatically. To pause the recording, or play it again without reloading, simply press anywhere in the video window. Pressing the play button below the video will reload and restart the recording. The media player also allows you to broaden the context of any hit using the sliders directly above the play button. In figure 5.1b, the left context has been adjusted to include the three preceding segments.

Figure 5.1a - The Media Player

Figure 5.1b - The Media Player with Expanded Context

Chapter 5 - Contents - Home

5.2 The Map Function

Basically, this function lets you see your hits on a map. When the search is done in recordings that have both an orthographic and a phonetic transcription (cf. 2.2.2), you are also shown variation in the phonetic transcriptions (if you search for a multi-word phrase, variations are only shown for the first word). For Norwegian transcriptions that are transcribed both orthographically and semi-phonetically then, this is a good way to look for phonetic variation.
Let's say we are interested in phonetic variation in the first person pronoun in Eastern Norwegian dialects. We perform a search for the word form "jeg", adjusting the maximum number of hits up to 20000 (to be sure to include all hits) and narrowing the search to the Østlandet region (cf. 2.2.1 and 3.2). If we then press the map button next to the action drop down menu on top of the results page, we'll be presented with a map like the one in figure 5.2a below. The recording locations from which the search has returned hits are shown as red dots on the map, and the different phonetic realizations are shown in a list to the right. By selecting a color each phonetic realization may be marked out on the map. To illustrate this, the recording locations where we find the phonetic realization "i" is marked out with red in figure 5.2b.

Figure 5.2a - Hit Map

Figure 5.2b - Hit Map with Marked Forms

Chapter 5 - Contents - Home

5.3 The Action Menu

There are several functions available for handling the results available through the action drop down menu. You may count match occurrences (5.3.1), download hits (5.3.2), sort hits (5.3.3), compile statistics on collocations (5.3.4), annotate hits (5.3.5), delete hits from the hit list (5.3.6) or save hits for later retrieval (5.3.7).

Chapter 5 - Contents - Home

5.3.1 Count

The count function easily allows you to get an overview over how many times each specific match to your query occurs in the hit list. When count is selected in the action drop down menu, the options in figure 5.3.1a will appear. The different options are presented in the following.

Figure 5.3.1a - Count Options

The case sensitive option is selected by default, making hits that only differ in case are counted separately. Deselect this, and such hits are counted together. The create headings option applies to the downloadable output formats, which are dealt with in 5.3.2.

You are also given the option to choose what level of accuracy that should be included in the count. The more specific level orthographic word form is what is selected by default. When only this level of accuracy is selected, all hits that include different orthographic word forms will be counted separately. The most general level is part of speech. When only this level of accuracy is selected, all hits that include different parts of speech will be counted separately. At a slightly less general level comes lemma. It is only available for transcriptions that are lemmatized, i.e. Norwegian and Danish, cf. 1.1.1. When only this level of accuracy is selected, all hits that include different lemmas will be counted separately. The most accurate level, phonetic word form, is only available for transcriptions that have a phonetic transcription, i.e. Norwegian and Övdalian, cf. 2.2.2. When only this level of accuracy is selected, all hits that include different phonetic word forms will be counted separately. This is handy for quantifying phonetic variation in Norwegian transcriptions, cf. the example below.
More than one level of accuracy may be included in the count, but the hits will always be grouped according to the most accurate level selected. In the output lists (see below), the different levels will be presented in the following order for each hit (separated by a hyphen): orthographic word form - lemma - phonetic word form - part of speech.

The maximum number of results is left empty by default, which means that all hits are included in the count. Any number entered here will limit the number of unique hits included in the count, regardless of how many matches each of these hits includes.

The count results may be output in several different formats. HTML is what is selected by default, and will quite simply produce a list of the number of occurrences of each unique hit, with the most frequent hit at the top. The histogram and pie chart outputs give a graphic representation of the same values. The tab-separated values, comma-separated values and excel spreadsheet output formats are all primarily for downloading results for later reference. Tab and comma separated values may be saved as text files and imported in various database software.

To illustrate the use of the count function, we can use it on the results for the search for phonetic variation in the first person singular personal pronoun "jeg" in Eastern Norwegian presented in 5.2. All hits in this search belong to the same lemma and orthographic word form, so what we are interested in is the phonetic word form. We therefore select phonetic word form as accuracy level for the counting, and at the same time deselect orthographic word form. The rest of the options are left in their default settings, before we press the submit query button. The list of the different phonetic realizations we then are presented with is reproduced in figure 5.3.1b below.

Figure 5.3.1b - Count List for Phonetic Realizations of "jeg" in Eastern Norwegian

Chapter 5 - Contents - Home

5.3.2 Download

The download function gives you the opportunity to save the hits locally on your computer. In addition to the match phrase itself, the left and right context will always be included, and you can optionally include informant code, recording genre and informant word count. When download is selected in the action drop down menu, the options in figure 5.3.2a will appear. The different options are presented in the following.

Figure 5.3.2a - Download Options

The create headings box is checked by default, ensuring that the different data columns all have headings that specify their contents (which may be informant code, genre, word count, left context, match or right context). Such headings will not be created if this box is unchecked.

The check boxes found under the token data headings gives you the opportunity to include different levels of accuracy of the hits you download. The different levels of accuracy are orthographic word form (selected by default), lemma, phonetic word form and part of speech. The different levels of accuracy are described in more detail in section 5.3.1 above, note in particular that the levels lemma and phonetic word form only are available for certain languages (Norwegian and Danish, and Norwegian and Övdalian, respectively). The selections made under token data (all) apply to the left and right context, in addition to the match itself, and override any selections made under token data (match).

When downloading results, it is also possible to include three kinds of metadata associated with each hit: informant code (cf. 3.1), genre (cf. 3.3), and word count. The word count specifies the total number of words included in the corpus for each informant (cf. also 3.4).

Annotations (cf. 4.2 and 5.3.5) connected to hits may also be included in the download, but values may only be downloaded from one set of annotations at the time (or free annotation values). Simply select the annotation set with the values you wish to include from the annotation drop down menu.

The following formats are available for download: tab separated values (default), comma separated values, Excel spreadsheet and HTML.

Chapter 5 - Contents - Home

5.3.3 Sort

The sort function gives you the opportunity to arrange the hits on the result page in a specific order. When sort is selected in the action drop down menu, the options in figure 5.3.3 will appear. The different options are presented in the following.

Figure 5.3.3 - Sort Options

The case sensitive box is unchecked by default, meaning that hits will be sorted alphabetically irrespective of capitalization.

You may sort your hits alphabetically after left context, match or right context. In addition you may sort your hits after informant code (which is also the default sorting), or simply randomize your hits.

If you wish to sort your hits alphabetically after match or left or right context, you also need to select what level of accuracy the hits should be sorted after ("features used on tokens"). The default selection is orthographic word form. As mentioned in section 5.3.1, two of the other levels are only available for some languages: lemma for Danish and Norwegian, and phonetic word form for Övdalian and Norwegian.

If you choose to sort after one of the context variables, you also need to specify which position in the context the word you wish to sort your hits after should have. If this field is left empty, the hits are sorted after the first word to the left/right of the match.

If you wish, you may have secondary sorting of your hits in addition to the primary sorting. This means that your hits will be sorted after your primary sorting options first, and then sorted after the secondary sorting options for hits that are identical according to the primary ones.

Even though more than one level of accuracy may be chosen, this should be avoided due to unpredictable sorting results. It's better to use secondary sorting, which achieves the same goal in a more predictable manner. An unfortunate bug that should also be noted is that sorting results resets any choices made in the result options (cf. section 2.2 above) to their default values. If you have selected to show both orthographic and phonetic transcriptions, though, the phonetic transcriptions will reappear if you try to change the result page.

Chapter 5 - Contents - Home

5.3.4 Collocations

Using the collocations option in the action menu, you may compile statistics on collocations with one or two words in addition to the match of your search (so called bigrams and trigrams). When collocations is selected in the action drop down menu, the options in figure 5.3.4a will appear. Some of these are presented in the following, but explaining the different statistical measures are unfortunately out of scope for this user manual. Note also that only the first word in a matching phrase is used, so if any matching phrase contains more than one word, the right side statistics will contain errors.

Figure 5.3.4a - Collocations Options

The case sensitive box is left unchecked by default. This means that bigrams and trigrams that include words that are identical apart from capitalization will be counted together. If you want such collocations to be counted separately, check this box.

The collocates option lets you choose what the match should be collocated with. There are three levels of accuracy (cf. section 5.3.1) to choose from: orthographic word form, lemma and part of speech. The default selection gives you bigrams or trigrams where the match is collocated with orthographic word forms. You may also collocate your match with parts of speech or lemmas (but the latter only works for transcriptions of Norwegian and Danish that are lemmatized). More than one level of accuracy may be chosen, but the collocation will always follow the most accurate level selected. In the output lists then, the more general levels will just serve as information (to what lemma a word form belongs, or to what part of speech a word form or lemma belongs).

If you aren't interested in the more marginal collocations, you may also set different cutoff values. The maximum number of results value is set to 1000 by default. This means that you will only be presented with the 1000 most frequent collocations. The minimum association value field is left empty by default, and has to do with the results of the statistical measures. The minimum number of occurrences field is also left empty by default, but you may enter any number here in order to only retrieve collocations that have a minimum number of occurrences in the current result set.

To exemplify the use of the collocations option, we perform a search for all occurrences of the 1st person sg. pronoun "jeg" in Norwegian. The aim is to see what the 5 most frequent parts of speech to the left and right of "jeg" are. To do this, we select the collocations option from the action menu of the results page, and then select only "part of speech" of the collocates options, and the statistical measure "frequency" under bigram. The rest of the options are left in their default settings. When we click the submit query button, we then get the results shown in figure 5.3.4b. The left context is shown in the left column and the right context in the right column. The first subcolumn shows the ngram, or collocation, and the numbers under "occ" show the number of occurrences in the result set. You may download the results using the drop down menu on the top of the page. The following formats are available: tab separated values, comma separated values and Excel spreadsheet. Unfortunately, the histogram output formats do not work at present.

Figure 5.3.4b - Collocation Frequencies

Chapter 5 - Contents - Home

5.3.5 Annotate

Glossa gives you the opportunity to annotate hits both freely and using predefined sets of values, so called annotation sets. In section 4.2 we saw how to create and edit annotation sets, and how to retrieve annotation statistics on saved result sets. In this section however, we'll learn how to perform the annotation itself. When you select the annotate option from the action drop down menu, you are presented with the annotate options presented in figure 5.3.5a below.

If you already have created an annotation set, you may simply select it from the drop down menu and press the annotate button. If not, you may go to the manage annotation sets page following the manage sets link. Since this process is thoroughly explained in section 4.2.1, we won't repeat it here, and we jump straight to explaining annotating using an existing annotation set. An alternative to using an annotation set is free annotation. This is also explained below.

It is important to remember that annotations are not retrievable independently of the hits annotated. It's therefore important to remember your query, or quite simply save the hits you want to annotate. How to save hits is explained in section 5.3.7. As mentioned in section 4.2.1, all existing annotation sets are available to all users because of the way the annotation functionality is designed. This also means that a certain segment (cf. section 5) may only be annotated once using a certain annotation set (or free annotation). It's therefore good etiquette to leave other users' annotation sets alone, even if there already exists one that fits your needs, and rather make a new one for your own use. The same applies if someone else already has saved free annotation for a segment: You may add your own annotation, but add it onto the annotation that's already there using your own initials or some other way of identifying that part of the annotation is yours.

Figure 5.3.5a - Annotate Options

Annotating Using an Existing Annotation Set
To illustrate how to annotate hits using an existing annotation set, we have performed a search for the present tense form of the verb "være", "er", in the Fredrikstad dialect of Norwegian (cf. 3.2). We have limited our maximum number of hits to 50, and have made sure the hits are randomized (cf. 2.2.1). We have also saved the results (cf. 5.3.7) to be able to retrieve them later (along with the annotations). Following the guidelines in section 4.2.1, we have prepared an annotation set called "stress", with the values "stressed" and "unstressed". Using this set, we can listen through the 50 hits and easily annotate stress variation. When we select the stress set from the annotate options drop down menu (cf. figure 5.3.5a above), we will be presented with the hit list, now with a small drop down menu above each hit containing the predefined values from the annotation set (cf. figure 5.3.5b below). The value "unstressed" is set as the default, and is therefore preselected for all hits. Now we may simply listen through the hits one by one, changing the value in the drop down menu from "unstressed" to "stressed" if need be, until we have done so for all the hits on result page 1. It's now very important that we press the save annotations button before we continue on to page two. If we change pages without saving, our annotations will be lost! When the save annotations button is pressed, you'll be redirected to a page confirming your annotations. To get back to the result page, you may just use the back button in your browser. The annotations may later be downloaded along with the hits (cf. section 5.3.2) or summarized statistically through the Glossa interface (cf. section 4.2.2).

Figure 5.3.5b - Annotating Hits

Free Annotation
Annotating hits with more unsystematic notes is possible using free annotation. When this option is selected from the annotate options drop down menu (cf. figure 5.3.5a above), we are presented with a hit list with a text box above each hit. Any text may be entered into these boxes, and as with annotation using sets explained above, this must be saved using the save annotation button before changing result page. The annotations may later be downloaded along with the hits (cf. section 5.3.2).

Chapter 5 - Contents - Home

5.3.6 Delete Hits

This function allows you to delete single hits from a set of results. When delete hits is selected from the action drop down menu, each hit will be preceded by a check box, and you'll be given the following options: delete selection, select all, unselect all and finished deleting, cf. figure 5.3.6. You delete hits by checking their box and pressing the delete selection button. You will then be asked if you are finished deleting hits, or if you want to delete more hits on the same page. When you are finished, the result set may be saved as usual, cf. section 5.3.7.

Figure 5.3.6 - Deleting Hits

Chapter 5 - Contents - Home

5.3.7 Save Hits

You may also save a result set through the action menu for later retrieval. When you select the save hits option from the action drop down menu, you are presented with the save hits options page presented in figure 5.3.7 below. Here, you simply enter the desired name of your result set, and press the save results button. Previously used names are listed below and cannot be reused.

Retrieving and editing saved result sets may be done through the search interface, cf. section 4.1.

Figure 5.3.7 - Save Hits Options

Chapter 5 - Contents - Home