


Welcome to CLARINO Text Laboratory Centre

CLARINO is a Norwegian infrastructure project jointly funded by the Research Council of Norway and a consortium of Norwegian universities and research institutions. Its goal is to implement the Norwegian part of CLARIN. The ultimate aim is to make existing and future language resources easily accessible to researchers and to bring eScience to the humanities. The CLARINO project is coordinated by the University of Bergen.

CLARINO Text Laboratory Centre is a C centre in the CLARIN infrastructure.
The table below lists the Text Laboratory resources that have a signed CLARIN agreement. More resources will be added. Go to the Text Laboratory homepage to view all resources from the Text Laboratory.

Corpora:

The Big Brother Corpus (2007) 440 300 tokens. Speech. Norwegian TV show from 2001. Accessible through Glossa. Licence: - Licence conditions - Download metadata - Search the corpus
Corpus of American Nordic Speech v.3 (2019) 746 000 tokens. Speech. American Norwegian/Swedish. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
Corpus of Doctor-Patient Consultations from Ahus (2015) 950 000 tokens. Speech. Transcriptions without audio files. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
The Lexicographic Corpus for Norwegian Bokmål (2013) 100 mill tokens. Written text. Norwegian Bokmål. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus

LIA Norwegian - Corpus of historical dialect recordings (2018) 3.5 mill tokens. Speech. Norwegian dialects from 1937-1996. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
LIA Sápmi - Sámegiela hállangiellakorpus (2018) 190 000 tokens. Speech. Sami dialects. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
Nordic Dialect Corpus v. 4.0 (2013) 2.75 mill tokens. Speech. Nordic dialects. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
Nordic Syntax Database (2013) 924 sentence judgments by Nordic dialect speakers. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the database
The NORINT Corpus (2017) Speech (110 000 tokens) and written text (53 000 tokens). Norwegian as second language. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
The NORM Corpus (2017) 1.17 mill tokens. Written pupil texts. Norwegian Bokmål and Nynorsk. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
Norwegian Words (2013) Lexical database with 1650 Norwegian Bokmål nouns, adjectives and verbs. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the database
NoTa-Oslo Norsk talespråkskorpus - Oslodelen (2006) 957 000 tokens. Speech. Oslo sociolects. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
NoWaC - Norwegian Web as Corpus v1.0 (2010) 700 million tokens. Written text. Bokmål. Accessible through interface or download. Licence: - Licence conditions - Licence for download - Download metadata - Download the corpus - Search the corpus
Frequency lists from NoWaC (2010) Frequency lists. Bokmål. Licence: - Download metadata - Download frequency lists
The SKRIV Corpus (2016) 112 000 tokens. Written texts by students in upper secondary vocational education programs. Norwegian Bokmål. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus
TAUS - Talemålsundersøkelsen i Oslo v.3 (2007, 2020) 388 000 tokens. Speech. Oslo sociolect from 1971-1973. Accessible through interface. Licence: - Licence conditions - Download metadata - Search the corpus

Downloadable transcriptions (and audio files) from corpora:

The Big Brother Corpus - downloadable transcriptions (2007) 440 300 tokens. Transcriptions of dialogs from the Norwegian TV show from 2001. Licence: - Licence conditions - Download metadata - Download transcriptions
Corpus of American Nordic Speech v.3 - downloadable transcriptions (2019) 746 000 tokens. Speech. Transcriptions of American Norwegian/Swedish interviews and dialogs. Licence: - Licence conditions - Download metadata - Download transcriptions
LIA: Transcriptions and selected audio files from LIA Norwegian for download (2021) 553 transcriptions with corresponding audio files from LIA Norwegian. Speech. Licence: - Licence conditions - Download metadata - Download audio files and transcriptions
The LIA Treebank (2021) 5250 speech segments and 55 410 tokens from LIA Norwegian, annotated with morphological and dependency-style syntactic analysis. Licence: - Licence conditions - Download metadata - Download conllx-format - Download conllu-format
Nordic Dialect Corpus v. 4.0 - downloadable transcriptions (2013) 2.75 mill tokens. Speech. Transcriptions of interviews and dialogs with Nordic dialects. Licence: - Licence conditions - Download metadata - Download transcriptions
NoTa-Oslo Norsk talespråkskorpus - Oslodelen - downloadable transcriptions (2006) 957 000 tokens. Speech. Transcriptions of interviews and dialogs with Oslo sociolects. Licence: - Licence conditions - Download metadata - Download transcriptions
TAUS - Talemålsundersøkelsen i Oslo - downloadable transcriptions (2007, 2020) 388 000 tokens. Speech. Transcriptions of interviews with Oslo sociolect from 1971-1973. Licence: - Licence conditions - Download metadata - Download transcriptions

Tools:

Glossa Search and post-processing tool for text and speech corpora. Licence: - MIT Licence - Download metadata - Download Glossa
The Oslo-Bergen Tagger Morphological tagger for Norwegian Bokmål and Nynorsk. Licence: - GPL - Download metadata - Download OBT


More language resources from the Text Laboratory.


Contact: tekstlab-post at iln.uio.no

Privacy Policy of the CLARINO Text Laboratory Centre

 

CLARINO Consortium partners: