CMDI 1.1. Metadata

Header

MdCreator: Kristin Hagen

MdCreationDate: 2016-11-04

MdSelfLink:

MdProfile: clarin.eu:cr1:p_1407745711925

MdCollectionDisplayName: Clarino - Textlab

Resources

ResourceProxyList:

ResourceProxy [id=‘norint-lp’]:

ResourceType [mimetype=‘’]: LandingPage

ResourceRef: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/norint/index.html

JournalFileProxyList:

ResourceRelationList:

IsPartOfList:

Components

corpusProfile:

resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:

resourceType: corpus

identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’]:

resourceName [xml:lang=‘nb’]: NORINT-korpuset

resourceName [xml:lang=‘en’]: The NORINT Corpus

description [xml:lang=‘en’]: The NORINT Corpus consists of speech from 51 and written texts from 116 adult learners of Norwegian as a second language, all of whom were taking advanced Norwegian courses (≈ CEFR level B2) at the University of Oslo during the summers of 2014 and 2015.

The NORINT Corpus is divided into three sub-parts:

- NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words altogether. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video.
The recordings are transcribed orthographically with the transcription tool Elan.
- NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
- NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons the corpus does not make it possible to identify participants across the sub-parts.
The texts are available in three formats: the original handwritten version in PDF format, a typed digital copy of the original version, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.

The corpus is searchable in the search interface Glossa, and the transcriptions are linked to audio and video files.

description [xml:lang=‘nb’]: NORINT-korpuset inneholder muntlig materiale fra 51 og skriftlig materiale fra 116 voksne internasjonale studenter som gikk på norskkurs på høyere nivå (≈CEFR-nivå B2) ved Universitetet i Oslo sommeren 2014 og 2015.

NORINT-korpuset består av tre deler:

- NORINT tale: Taledelen av korpuset består av intervjuer og samtaler, i alt 111 000 ord. Studentene ble intervjuet om bakgrunn, studier, arbeid og fremtidsplaner. I tillegg er det gjort video- og lydopptak der informantene samtaler to og to om emner som kultur, fritid, reiser eller livet i Norge. Det er 30 – 40 minutters opptak av hver student.
Opptakene er transkribert ortografisk med transkripsjonsprogrammet Elan.
- NORINT opplest: 57 informanter, 51 av dem de samme som bidro til NORINT tale, leser opp 60 utvalgte setninger og en liten historie. Det finnes bare lydopptak av opplesningene.
- NORINT tekst: Tekstdelen av korpuset består av 53 247 ord fra 116 eksamensoppgaver. Informantene er delvis de samme som i den muntlige delen av materialet. Av hensyn til personvern er det imidlertid ikke synlige koplinger i korpuset.
Tekstene i NORINT tekst foreligger i tre ulike formater: en håndskrevet originalversjon i pdf-format, en innskrevet nøyaktig kopi av originalversjonen og en versjon der alle ortografiske feil er rettet. Tekstversjonene og de korrigerte versjonene er lenket sammen.

Korpuset er søkbart i søkeverktøyet Glossa der transkripsjonene dessuten er koplet til lyd- og videofiler.

resourceShortName: NORINT

url: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/norint/index.html

url: https://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/norint/index.html

PID: http://hdl.handle.net/11538/0000-000B-C01E-B

distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’]:

licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’]:

userCategory: Academic

distributionAccessMedium: accessibleThroughInterface

executionLocation: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/norint/

executionLocation: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/norint/index.html

licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:

licenceFamily: CLARIN

licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*

licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1

conditionsOfUse: BY

conditionsOfUse: ID

conditionsOfUse: LOC

conditionsOfUse: NC

conditionsOfUse: ND

conditionsOfUse: NORED

conditionsOfUse: PRIV

nonStandardConditionsOfUse: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts provided by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.

licensor:

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: organization

organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:

organizationName [xml:lang=‘en’]: University of Oslo

organizationName [xml:lang=‘no’]: Universitetet i Oslo

organizationShortName [xml:lang=‘no’]: UiO

organizationShortName [xml:lang=‘en’]: UoO

departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies

departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: l.a.harnas@iln.uio.no

email: annely.tomson@iln.uio.no

url: http://www.hf.uio.no/iln/

address: Box 1102 Blindern

zipCode: 0317

city: OSLO

country: Norway

distributionRightsHolder:

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: organization

organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:

organizationName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies, University of Oslo

organizationShortName [xml:lang=‘en’]: ILN

departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies, University of Oslo

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: tekstlab-post@iln.uio.no

url: http://www.hf.uio.no/iln/english/

address: Box 1102 Blindern

zipCode: 0317

city: OSLO

country: Norway

contact:

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: organization

organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:

organizationName: The Text Laboratory

organizationShortName: Textlab

departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: tekstlab-post@iln.uio.no

url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

address: Box 1102 Blindern

zipCode: 0317

city: OSLO

country: Norway

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: person

personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:

surname: Harnæs

givenName: Liv Andlem

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: l.a.harnas@iln.uio.no

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: person

personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:

surname: Tomson

givenName: Annely

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: annely.tomson@iln.uio.no

metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:

metadataCreationDate: 2017-03-21

metadataLastDateUpdated: 2018-06-05

metadataCreator:

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: person

personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:

surname: Hagen

givenName: Kristin

organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:

organizationName: The Text Laboratory

organizationShortName: Textlab

departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: kristin.hagen@iln.uio.no

url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

address: Box 1102 Blindern

zipCode: 0317

city: OSLO

country: Norway

versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’]:

version: 1

lastDateUpdated: 2016-09-01

resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:

documentationStructured [ComponentId=‘clarin.eu:cr1:c_1361876010648’]:

role: documentation

documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:

documentType: manual

title [xml:lang=‘nb’]: Brukerveiledning til Norint-korpuset

author: Kristin Hagen and Viktoria Holund in cooperation with Annely Tomson

year: 2017

url: http://tekstlab.uio.no/norint/index.html

documentLanguageName: Norwegian Bokmål

documentLanguageId: nb

resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:

creationStartDate: 2014-01-01

creationEndDate: 2016-09-01

resourceCreator:

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: person

personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:

surname: Tomson

givenName: Annely

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: annely.tomson@iln.uio.no

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: person

personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:

surname: Harnæs

givenName: Liv Andlem

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: l.a.harnas@iln.uio.no

actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:

actorType: organization

organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:

organizationName: The Text Laboratory

organizationShortName: Textlab

departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo

communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:

email: tekstlab-post@iln.uio.no

url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

address: Box 1102 Blindern

zipCode: 0317

city: OSLO

country: Norway

fundingProject:

projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:

projectName: The NORINT Corpus

fundingType: ownFunds

funder: Department of Linguistics and Scandinavian Studies, University of Oslo

corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’]:

corpusType: Written Corpus

corpusType: Multimodal Corpus

corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:

mediaType: text

corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’]:

textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’]:

mimeType: txt

characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’]:

characterEncoding: utf-8

corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:

mediaType: audio

corpusAudioInfo [ComponentId=‘clarin.eu:cr1:c_1404130561236’]:

audioSizeInfo [ComponentId=‘clarin.eu:cr1:c_1360230992160’]:

sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:

size: 57 participants x 3 audio files each for NORINT opplest (Recited)

sizeUnit: files

settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:

naturality: readSpeech

conversationalType: monologue

scenarioType: other

audience: no

interactivity: nonInteractive

audioFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477070’]:

mimeType: mp3 and wav

corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:

mediaType: video

corpusVideoInfo [ComponentId=‘clarin.eu:cr1:c_1407745711880’]:

videoContentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019779’]:

typeOfVideoContent: Adult foreign students learning Norwegian as a second language

settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:

naturality: spontaneous

conversationalType: dialogue

interactivity: overlapping

interaction: Each informant participates in one conversation with another informant and an interview with a teacher.

videoFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477073’]:

mimeType: mp4

corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’]:

sourceWorkInfo [ComponentId=‘clarin.eu:cr1:c_1407745712071’]:

workDescription: The NORINT Corpus is divided into three sub-parts:

- NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words altogether. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video.
The recordings are transcribed orthographically with the transcription tool Elan.

- NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.

- NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons the corpus does not make it possible to identify participants across the sub-parts.
The texts are available in three formats: the original handwritten version in PDF format, a typed digital copy of the original version, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.

personSourceSetInfo [ComponentId=‘clarin.eu:cr1:c_1360931019775’]:

numberOfPersons: 57

ageOfPersons: adult

sexOfPersons: mixed

originOfPersons: nonNative

dialectAccentOfPersons: Foreign students learning Norwegian.

lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:

lingualityType: monolingual

languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:

languageId: nb

languageName: Norwegian Bokmål

modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:

modalityType: writtenLanguage

sizePerModality [ComponentId=‘clarin.eu:cr1:c_1447674760351’]:

sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:

size: 53 247 in NORINT tekst (Text)

sizeUnit: words

modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:

modalityType: spokenLanguage

sizePerModality [ComponentId=‘clarin.eu:cr1:c_1447674760351’]:

sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:

size: 110 979 in NORINT tale (Speech)

sizeUnit: words

modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:

modalityType: spokenLanguage

modalityTypeDetails: recited text

sizePerModality [ComponentId=‘clarin.eu:cr1:c_1447674760351’]:

sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:

size: 36 895 in NORINT opplest (Recited)

sizeUnit: words

annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:

annotationType: lemmatization

annotationType: morphosyntacticAnnotation-posTagging

segmentationLevel: word

tagset: The Oslo-Bergen Tagger tagset: http://tekstlab.uio.no/obt-ny/english/index.html

tagsetLanguageId: Nb

tagsetLanguageName: Norwegian Bokmål

theoreticModel: Constraint Grammar

annotationMode: automatic

annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:

role: annotationManual

documentUnstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html

annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:

targetResourceNameURI: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html

annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:

annotationType: morphosyntacticAnnotation-posTagging

annotatedElements: other

segmentationLevel: word

tagset: POS tagset created for the statistical NoTa-tagger, based on the tagset of the Oslo-Bergen Tagger.

tagsetLanguageId: Nb

tagsetLanguageName: Norwegian Bokmål

theoreticModel: TreeTagger

annotationMode: automatic

annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:

role: annotationManual

documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:

documentType: article

title [xml:lang=‘en’]: Tagging a Norwegian Speech Corpus

author: Anders Nøklestad and Åshild Søfteland

editor: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, Mare Koit

year: 2007

bookTitle: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007

pages: 245–248

conference: Nodalida 2007

documentLanguageName: English

documentLanguageId: en

annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:

role: annotationManual

documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:

documentType: article

title [xml:lang=‘nb’]: Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger.

author: Åshild Søfteland and Anders Nøklestad

editor: Janne Bondi Johannessen and Kristin Hagen

year: 2008

publisher: Novus forlag

bookTitle: Språk i Oslo. Ny forskning omkring talespråk

pages: 226–234

ISBN: 978-82-7099-471-7

documentLanguageName: Norwegian

documentLanguageId: nb

annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:

role: annotationManual

documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:

documentType: manual

title [xml:lang=‘nb’]: NoTa-taggeren: TAGGEVEILEDNING

author: Åshild Søfteland

year: 2007

url: http://www.tekstlab.uio.no/nota/oslo/Taggeveiledning2.pdf

documentLanguageName: Norwegian Bokmål

documentLanguageId: nb

classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:

genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:

genreType: textGenre

genre: unstandardised

unstandardisedGenre: Exam papers written by students
The texts are available in three different versions: one scanned original in pdf format and two transcribed versions in txt format: one original transcription with errors and one version where the errors are corrected.
All versions are linked and it is possible to search in both transcribed versions.

genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:

genreType: speechGenre

genre: informal

genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:

genreType: speechGenre

genre: recited

timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:

timeCoverage: 2014