


Welcome to CLARINO Text Laboratory Centre

CLARINO is a Norwegian infrastructure project jointly funded by the Research Council of Norway and a consortium of Norwegian universities and research institutions. Its goal is to implement the Norwegian part of CLARIN. The ultimate aim is to make existing and future language resources easily accessible to researchers and to bring eScience to the humanities disciplines. The CLARINO project is coordinated by the University of Bergen.

CLARINO Text Laboratory Centre is a C centre in the CLARIN infrastructure.
The table below shows Text Laboratory resources with a signed CLARIN agreement. More resources will be added. Go to the Text Laboratory homepage to view all resources from the Text Laboratory.

Corpora:

The Big Brother Corpus (2007): 550 000 words. Speech. Norwegian TV show from 2001. Accessible through interface. Licence: see licence conditions.
- Download metadata - Get username and password - Search the corpus
Corpus of American Nordic Speech (2015): 244 000 words. Speech. American Norwegian/Swedish. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
Corpus of Doctor-Patient Consultations from Ahus (2015): 950 000 words. Speech. Transcriptions without audio files. Accessible through interface. Licence: see licence conditions.
- Download metadata - Get username and password - Search the corpus
The Lexicographic Corpus for Norwegian Bokmål (2013): 100 million words. Written text. Norwegian Bokmål. Accessible through interface. Licence: see licence conditions.
- Download metadata - Get username and password - Search the corpus
Nordic Dialect Corpus (2013): 3 million words. Speech. Nordic dialects. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
Nordic Syntax Database (2013): 924 sentence judgments by Nordic dialect speakers. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the database
The NORINT Corpus (2017): Speech (110 000 words) and written text (53 000 words). Norwegian as a second language. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
The NORM Corpus (2017): 1.17 million words. Written pupil texts. Norwegian Bokmål and Nynorsk. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
Norwegian Words (2013): Lexical database with 1650 Norwegian Bokmål nouns, adjectives and verbs. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
NoTa-Oslo Norsk talespråkskorpus - Oslodelen (2006): 900 000 words. Speech. Oslo sociolects. Accessible through interface. Licence: see licence conditions.
- Download metadata - Get username and password - Search the corpus
NoWaC - Norwegian Web as Corpus v1.0 (2010): 700 million tokens. Written text. Bokmål. Accessible through interface or download. Licence: see licence conditions; a separate licence applies to download.
- Download metadata - Download the corpus - Search the corpus
Frequency lists from NoWaC (2010) Frequency lists. Bokmål. Licence: . .
- Download metadata
- Download Frequency lists
The Oslo Corpus of Tagged Norwegian Texts, Bokmål (1999): 18.5 million words. Written text. Norwegian Bokmål. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
The Oslo Corpus of Tagged Norwegian Texts, Nynorsk (1999): 3.8 million words. Written text. Norwegian Nynorsk. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
The SKRIV Corpus (2016): 112 000 words. Written texts by students in upper secondary vocational education programs. Norwegian Bokmål. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus
TAUS - Talemålsundersøkelsen i Oslo (2007): 245 500 words. Speech. Oslo sociolect from 1971-1973. Accessible through interface. Licence: see licence conditions.
- Download metadata - Search the corpus

Tools:

Glossa: Search and post-processing tool for text and speech corpora. Licence: MIT Licence.
- Download metadata - Download Glossa
The Oslo-Bergen Tagger: Morphological tagger for Norwegian Bokmål and Nynorsk. Licence: GPL.
- Download metadata - Download OBT


More language resources from the Text Laboratory.


Contact: tekstlab-post at iln.uio.no

Privacy Policy of the CLARINO Text Laboratory Centre

 
