Nordic Dialect Corpus: Technical Solutions

The corpus is searchable through a web interface based on the new version of the corpus explorer tool Glossa. Glossa is a user-friendly interface built on top of the IMS Corpus Workbench (CWB) query system. Search results are shown as concordances linked to the corresponding audio and video recordings. Glossa also supports further processing of the results: they can be exported to external file formats and viewed in several other ways, such as frequency counts and maps. The orthographic and phonetic transcriptions are linked to each other, and each can be searched on its own in the Glossa web interface. Both transcriptions are also linked word by word to grammatical tags, as well as to the audio and video.
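To make the word-by-word linking more concrete, the following minimal Python sketch shows one possible way to model aligned phonetic, orthographic and tag tiers, search either tier, and derive concordance lines and per-location frequency counts. It is an illustration under stated assumptions, not Glossa's or CWB's internal format (in CWB, parallel tiers like these would typically be stored as positional attributes and queried with CQP). All class and function names, tag labels, locations, time offsets and example utterances are hypothetical and invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    phonetic: str      # phonetic (dialectal) transcription of the word
    orthographic: str  # orthographic (standardised) form of the same word
    tag: str           # grammatical tag (hypothetical label set)


@dataclass
class Utterance:
    location: str       # recording location (invented examples)
    start: float        # hypothetical offset into the audio/video file, in seconds
    tokens: list[Token]


def search(utterances, tier, value):
    """Yield (utterance, index) for every token whose given tier equals value."""
    for utt in utterances:
        for i, tok in enumerate(utt.tokens):
            if getattr(tok, tier) == value:
                yield utt, i


def frequency_by_location(hits):
    """Count hits per recording location, as input to a frequency table or map."""
    return Counter(utt.location for utt, _ in hits)


def concordance(utt, i, width=2):
    """Render one hit as a concordance line with orthographic context."""
    left = " ".join(t.orthographic for t in utt.tokens[max(0, i - width):i])
    right = " ".join(t.orthographic for t in utt.tokens[i + 1:i + 1 + width])
    kw = utt.tokens[i]
    parts = [left, f"[{kw.orthographic}/{kw.phonetic}/{kw.tag}]", right]
    line = " ".join(p for p in parts if p)  # skip empty context on either side
    return f"{line} ({utt.location}, {utt.start:.1f}s)"


# Invented example data: three utterances containing a form of "ikke" ("not").
utterances = [
    Utterance("Voss", 12.5, [Token("eg", "jeg", "pron"), Token("veit", "vet", "verb"),
                             Token("ikkje", "ikke", "adv")]),
    Utterance("Trondheim", 40.0, [Token("æ", "jeg", "pron"), Token("veit", "vet", "verb"),
                                  Token("itj", "ikke", "adv")]),
    Utterance("Voss", 73.2, [Token("de", "det", "pron"), Token("e", "er", "verb"),
                             Token("ikkje", "ikke", "adv"), Token("sant", "sant", "adj")]),
]

if __name__ == "__main__":
    hits = list(search(utterances, "orthographic", "ikke"))
    for utt, i in hits:
        print(concordance(utt, i))
    print(frequency_by_location(hits))
```

Searching the orthographic tier for "ikke" finds all three dialectal variants at once, while searching the phonetic tier for "ikkje" returns only the two Voss hits. Because each hit carries its location and time offset, the same results can feed a frequency count, a map, or a link back to the recording.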

Transcription was carried out using the free software Transcriber. The Oslo Transliterator, developed at the Text Laboratory, has been used to transliterate the phonetic transcriptions into orthographic ones.

The corpus has been tagged using existing taggers and tagger models, adapted where necessary; see the tagging web page above.

People involved
The development team has consisted of several people at the Text Laboratory: the main developers Anders Nøklestad and Joel Priestley, together with Lars Nygaard, Kristin Hagen and Janne Bondi Johannessen. In addition, many participants in ScanDiaSyn and NORMS have voiced wishes and opinions that have helped shape the corpus. Øystein A. Vangsnes deserves particular mention as a key contributor.

Financed by: NorDiaSyn and NordForsk.

Transcription tools:
Transcriber
Transcription guidelines for Norwegian in Nordic Dialect Corpus (PDF, in Norwegian)
Translation to orthographic transcription - Guidelines (PDF, in Norwegian)

Transliterator:
The Oslo Transliterator

Taggers:
TreeTagger
The Oslo-Bergen Tagger
TnT Tagger

Corpus tools:
IMS Corpus Workbench (CWB) query system

Glossa:
New version of Glossa