LIA Norwegian comprises 3.5 million words elicited from 1382 informants in 227 local areas in Norway. The material is transcribed both (quasi-)phonetically and orthographically (in Nynorsk), and it is morphologically tagged with the LIA tagger, a newly developed tagger for spoken Nynorsk.
LIA Norwegian is accessible via the corpus search interface Glossa.
The recordings and transcriptions were provided by four universities: NTNU, UiB, UiO and UiT. There is also material from Målførearkivet (the dialect archive at the University of Oslo) that was previously available in the Nordic Dialect Corpus.
Search the corpus
Read the user manual for LIA Norwegian
Audio files, transcriptions and metadata from the corpus are available at a file depot, along with audio that has not been transcribed in the project. The transcriptions can be downloaded in ELAN format from the depot, while the audio can be streamed.
Search the LIA file depot
Read about the depot
Downloadable transcriptions and audio files
All transcriptions can be downloaded in plain text format. In addition, a folder containing 553 transcriptions from LIA Norwegian in ELAN format, along with their corresponding audio, is available for download. These recordings contain no sensitive information and can be used freely for linguistic research or for technological purposes. (Many of the LIA recordings contain material that has been deemed sensitive. This material has not been transcribed, so the recordings can still be used in the corpus, but they are not available for download.)
Download selected audio files and transcriptions from LIA Norwegian
Download all transcriptions from LIA in plain text format
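For readers who want to process the ELAN transcriptions programmatically, the following is a minimal sketch in Python using only the standard library. It relies on the general structure of ELAN (.eaf) files, which are XML documents with time slots and tiers of time-aligned annotations. The tier name and the sample content below are invented for illustration and are not taken from the LIA material; the tier layout of the actual LIA files should be checked before relying on this.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    tier: str                 # value of the TIER_ID attribute
    start_ms: Optional[int]   # start time in milliseconds, if aligned
    end_ms: Optional[int]     # end time in milliseconds, if aligned
    text: str                 # transcribed text of the annotation


def parse_eaf(root: ET.Element) -> List[Annotation]:
    """Collect all time-aligned annotations from an ELAN document root.

    Only ALIGNABLE_ANNOTATION elements are read; annotations on
    dependent tiers (REF_ANNOTATION) are ignored in this sketch.
    """
    # Map each time slot id to its offset; a slot may lack TIME_VALUE.
    slots = {}
    for slot in root.iter("TIME_SLOT"):
        value = slot.get("TIME_VALUE")
        slots[slot.get("TIME_SLOT_ID")] = int(value) if value is not None else None

    annotations = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID", "")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            text = ann.findtext("ANNOTATION_VALUE", default="") or ""
            annotations.append(
                Annotation(
                    tier_id,
                    slots.get(ann.get("TIME_SLOT_REF1")),
                    slots.get(ann.get("TIME_SLOT_REF2")),
                    text,
                )
            )
    return annotations


# Invented sample document (hypothetical tier name and content).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>eg veit ikkje</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

for ann in parse_eaf(ET.fromstring(SAMPLE_EAF)):
    print(ann.tier, ann.start_ms, ann.end_ms, ann.text)
```

For a downloaded file, the root can be obtained with `ET.parse(path).getroot()`, where `path` points to a local .eaf file.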
The LIA Treebank
The LIA Treebank includes 7536 speech segments and 77 701 tokens from LIA Norwegian. The treebank is annotated with morphological and dependency-style syntactic analysis and manually corrected. The treebank is available in three versions:
- A downloadable version in CoNLL-X format.
Download the treebank from GitHub
- A searchable version in the search interface Glossa.
Search the treebank
- A downloadable version in CoNLL-U format. This version was automatically converted to Universal Dependencies by Lilja Øvrelid, University of Oslo. The CoNLL-U version contains 5250 speech segments and 55 410 tokens.
Download the CoNLL-U version
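As an illustration of how the CoNLL-U (conllu) version might be read, here is a minimal sketch in Python. It assumes only the general CoNLL-U layout: ten tab-separated columns per token, comment lines starting with `#`, and a blank line between sentences (here, speech segments). The sample lines are invented for illustration and are not taken from the treebank, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# The ten column names defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

Token = Dict[str, str]


def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of token dicts per sentence (speech segment)."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Sentence-level comment or metadata line.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(CONLLU_FIELDS, cols))
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token["id"] or "." in token["id"]:
            continue
        sentence.append(token)
    if sentence:
        yield sentence


def count_segments_and_tokens(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of segments, number of tokens)."""
    segments = tokens = 0
    for sentence in read_conllu(lines):
        segments += 1
        tokens += len(sentence)
    return segments, tokens


# Invented sample data, not from the LIA Treebank.
SAMPLE = (
    "# text = eg veit ikkje\n"
    "1\teg\teg\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tveit\tvite\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tikkje\tikkje\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "\n"
    "1\tja\tja\tINTJ\t_\t_\t0\troot\t_\t_\n"
)

print(count_segments_and_tokens(SAMPLE.splitlines()))
```

To run this on a downloaded file, open it with `encoding="utf-8"` and pass the file object to `count_segments_and_tokens`; the resulting counts can then be compared with the segment and token figures stated for the CoNLL-U version.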
tekstlab-post at iln.uio.no
Read more about the LIA project