CMDI 1.2 Metadata
Header
cmd:MdCreator: Kristin Hagen
cmd:MdCreationDate: 2022-11-16
cmd:MdSelfLink:
cmd:MdProfile: clarin.eu:cr1:p_1407745711925
cmd:MdCollectionDisplayName: Clarino - Textlab
Resources
cmd:ResourceProxyList:
cmd:ResourceProxy [id=‘ndc-lp’]:
cmd:ResourceType [mimetype=‘’]: LandingPage
cmd:ResourceRef: http://tekstlab.uio.no/nota/scandiasyn/
cmd:ResourceProxy [id=‘ndc-treebank’]:
cmd:ResourceType [mimetype=‘’]: Resource
cmd:ResourceRef: http://tekstlab.uio.no/nota/scandiasyn/treebank.html
cmd:ResourceProxy [id=‘ndc-corpus’]:
cmd:ResourceType [mimetype=‘’]: Resource
cmd:ResourceRef: https://tekstlab.uio.no/glossa3/ndc2
cmd:ResourceProxy [id=‘ndc-treebank-conllx’]:
cmd:ResourceType [mimetype=‘’]: Resource
cmd:ResourceRef: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC
cmd:ResourceProxy [id=‘ndc-treebank-glossa’]:
cmd:ResourceType [mimetype=‘’]: Resource
cmd:ResourceRef: https://tekstlab.uio.no/glossa3/ndctrebanken
cmd:JournalFileProxyList:
cmd:ResourceRelationList:
cmd:ResourceRelation:
cmd:RelationType: treebank
cmd:Resource:
cmd:Role:
cmd:Resource:
cmd:Role:
cmd:ResourceRelation:
cmd:RelationType: version
cmd:Resource:
cmd:Role:
cmd:Resource:
cmd:Role:
cmd:ResourceRelation:
cmd:RelationType: version
cmd:Resource:
cmd:Role:
cmd:Resource:
cmd:Role:
Components
cmdp:corpusProfile:
cmdp:resourceCommonInfo [cmd:ref=‘ndc-treebank’]:
cmdp:resourceType [cmd:cmd=‘ndc-treebank’]: corpus
cmdp:identificationInfo [cmd:ref=‘ndc-treebank’]:
cmdp:resourceName [xml:lang=‘nb’]: NDC-trebanken
cmdp:resourceName [xml:lang=‘en’]: The NDC Treebank
cmdp:description [xml:lang=‘en’]: The NDC Treebank comprises 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analyses, which have been manually corrected. It is available in two versions: a downloadable version in CoNLL-X format and a searchable version in the Glossa search interface.
The Nordic Dialect Corpus is a corpus of spontaneous speech in Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian dialects.
cmdp:description [xml:lang=‘nb’]: NDC-trebanken inneholder 4637 talemålssegment og 66 042 ord/token fra den norske delen av Nordisk dialektkorpus. Segmentene er hentet fra 30 transkriberte intervjuer fra 17 steder i Norge. Trebanken er annotert med morfologisk og syntaktisk informasjon og manuelt korrigert. Trebanken er tilgjengelig i to versjoner: en nedlastbar versjon i conllx-format og en søkbar i søkegrensesnittet Glossa.
Nordisk dialektkorpus er et talespråkskorpus med spontantale fra norske, svenske, danske, islandske og færøyske dialekter.
cmdp:resourceShortName [xml:lang=‘en’]: The NDC Treebank
cmdp:resourceShortName [xml:lang=‘nb’]: NDC-trebanken
cmdp:url: http://www.tekstlab.uio.no/scandiasyn/index.html
cmdp:url: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
cmdp:PID: https://hdl.handle.net/11538/8493fdd3-a
cmdp:distributionInfo [cmd:ref=‘ndc-treebank’]:
cmdp:licenceInfo [cmd:ref=‘ndc-treebank-conllx’]:
cmdp:userCategory: Public
cmdp:distributionAccessMedium: downloadable
cmdp:downloadLocation: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC
cmdp:licence:
cmdp:licenceFamily: Creative Commons (CC)
cmdp:licenceName: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
cmdp:conditionsOfUse: BY
cmdp:conditionsOfUse: NC
cmdp:conditionsOfUse: SA
cmdp:licensor [cmd:ref=‘ndc-treebank-conllx’]:
cmdp:actorInfo:
cmdp:actorType: organization
cmdp:organizationInfo:
cmdp:organizationName [xml:lang=‘en’]: University of Oslo
cmdp:organizationName [xml:lang=‘nb’]: Universitetet i Oslo
cmdp:organizationShortName [xml:lang=‘en’]: UoO
cmdp:organizationShortName [xml:lang=‘nb’]: UiO
cmdp:departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)
cmdp:departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
cmdp:communicationInfo:
cmdp:email: tekstlab-post@iln.uio.no
cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
cmdp:address: Box 1102 Blindern
cmdp:zipCode: 0317
cmdp:city: Oslo
cmdp:country: Norway
cmdp:distributionRightsHolder [cmd:ref=‘ndc-treebank-conllx’]:
cmdp:actorInfo:
cmdp:actorType: organization
cmdp:organizationInfo:
cmdp:organizationName [xml:lang=‘en’]: University of Oslo
cmdp:organizationName [xml:lang=‘nb’]: Universitetet i Oslo
cmdp:organizationShortName [xml:lang=‘nb’]: UiO
cmdp:organizationShortName [xml:lang=‘en’]: UoO
cmdp:departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)
cmdp:departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
cmdp:communicationInfo:
cmdp:email: tekstlab-post@iln.uio.no
cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
cmdp:address: Box 1102 Blindern
cmdp:zipCode: 0317
cmdp:city: Oslo
cmdp:country: Norway
cmdp:licenceInfo [cmd:ref=‘ndc-treebank-glossa’]:
cmdp:userCategory: Academic
cmdp:distributionAccessMedium: accessibleThroughInterface
cmdp:executionLocation: https://tekstlab.uio.no/glossa3/ndctrebanken
cmdp:licence:
cmdp:licenceFamily: CLARIN
cmdp:licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*
cmdp:licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
cmdp:conditionsOfUse: BY
cmdp:conditionsOfUse: ID
cmdp:conditionsOfUse: LOC
cmdp:conditionsOfUse: NC
cmdp:conditionsOfUse: ND
cmdp:conditionsOfUse: NORED
cmdp:conditionsOfUse: PRIV
cmdp:nonStandardConditionsOfUse: The treebank is linked to audio and video recordings that are classified as personal data. By agreement with NSD, the Data Protection Official in Norway, the version linked to the audio and video files is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts returned by the search interface cannot be shown in public without an agreement with the Text Laboratory.
Please note that each researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must remain anonymous in every published paper or other output.
cmdp:licensor [cmd:ref=‘ndc-treebank-glossa’]:
cmdp:actorInfo:
cmdp:actorType: organization
cmdp:organizationInfo:
cmdp:organizationName [xml:lang=‘en’]: University of Oslo
cmdp:organizationName [xml:lang=‘nb’]: Universitetet i Oslo
cmdp:organizationShortName [xml:lang=‘nb’]: UiO
cmdp:organizationShortName [xml:lang=‘en’]: UoO
cmdp:departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
cmdp:departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)
cmdp:communicationInfo:
cmdp:email: tekstlab-post@iln.uio.no
cmdp:url: http://www.hf.uio.no/iln/english/
cmdp:address: Box 1102 Blindern
cmdp:zipCode: 0317
cmdp:city: Oslo
cmdp:country: Norway
cmdp:contact [cmd:ref=‘ndc-treebank’]:
cmdp:actorInfo:
cmdp:actorType: organization
cmdp:organizationInfo:
cmdp:organizationName: The Text Laboratory
cmdp:organizationShortName: Textlab
cmdp:departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
cmdp:communicationInfo:
cmdp:email: tekstlab-post@iln.uio.no
cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
cmdp:address: Box 1102 Blindern
cmdp:zipCode: 0317
cmdp:city: Oslo
cmdp:country: Norway
cmdp:metadataInfo:
cmdp:metadataCreationDate: 2022-11-16
cmdp:metadataLastDateUpdated: 2026-01-23
cmdp:metadataCreator:
cmdp:actorInfo:
cmdp:actorType: person
cmdp:personInfo:
cmdp:surname: Hagen
cmdp:givenName: Kristin
cmdp:organizationInfo:
cmdp:organizationName: The Text Laboratory
cmdp:organizationShortName: Textlab
cmdp:departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
cmdp:communicationInfo:
cmdp:email: kristin.hagen@iln.uio.no
cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
cmdp:address: Box 1102 Blindern
cmdp:zipCode: 0317
cmdp:city: Oslo
cmdp:country: Norway
cmdp:versionInfo:
cmdp:version: CoNLL-X and Glossa versions, November 2022
cmdp:validationInfo:
cmdp:validated: true
cmdp:validationType: content
cmdp:validationMode: manual
cmdp:validationModeDetails: The treebank is manually corrected by at least one person
cmdp:validationExtent: partial
cmdp:resourceDocumentationInfo:
cmdp:documentationUnstructured:
cmdp:role: documentation
cmdp:documentUnstructured: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
cmdp:documentationStructured:
cmdp:role: documentation
cmdp:documentInfo:
cmdp:documentType: proceedings
cmdp:title [xml:lang=‘en’]: The Norwegian Dialect Corpus Treebank
cmdp:author: Andre Kåsen and Kristin Hagen and Anders Nøklestad and Joel Priestley and Per Erik Solberg and Dag Trygve Truslew Haug
cmdp:editor: Nicoletta Calzolari et al.
cmdp:year: 2022
cmdp:bookTitle: Proceedings of the Thirteenth Language Resources and Evaluation Conference
cmdp:url: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.516.pdf
cmdp:resourceCreationInfo:
cmdp:creationStartDate: 2021-06-01
cmdp:creationEndDate: 2022-12-01
cmdp:resourceCreator:
cmdp:actorInfo:
cmdp:actorType: organization
cmdp:communicationInfo:
cmdp:email: tekstlab-post@iln.uio.no
cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
cmdp:address: Box 1102 Blindern
cmdp:zipCode: 0317
cmdp:city: Oslo
cmdp:country: Norway
cmdp:fundingProject:
cmdp:projectInfo:
cmdp:projectName: Common Language Resources and Technology Infrastructure Norway+
cmdp:projectShortName: CLARINO+
cmdp:projectID: 295700
cmdp:url: http://clarin.b.uib.no/
cmdp:fundingType: nationalFunds
cmdp:funder: the Research Council of Norway
cmdp:fundingCountry: Norway
cmdp:projectStartDate: 2020-03-01
cmdp:projectEndDate: 2023-12-31
cmdp:corpusInfo [cmd:ref=‘ndc-treebank’]:
cmdp:corpusType [cmd:cmd=‘ndc-treebank’]: Treebank
cmdp:corpusPartInfo [cmd:ref=‘ndc-treebank’]:
cmdp:mediaType: text
cmdp:corpusTextInfo:
cmdp:textFormatInfo [cmd:ref=‘ndc-treebank-conllx’]:
cmdp:mimeType: Downloadable in conllx-format
cmdp:sizePerTextFormat:
cmdp:sizeInfo [cmd:ref=‘ndc-treebank-conllx’]:
cmdp:size: 66 042
cmdp:sizeUnit: tokens
cmdp:sizeInfo [cmd:ref=‘ndc-treebank-conllx’]:
cmdp:size: 4637
cmdp:sizeUnit: utterances
cmdp:characterEncodingInfo:
cmdp:characterEncoding: utf-8
cmdp:corpusPartInfo [cmd:ref=‘ndc-treebank-glossa’]:
cmdp:mediaType: audio
cmdp:corpusAudioInfo:
cmdp:audioSizeInfo:
cmdp:sizeInfo:
cmdp:size: 30
cmdp:sizeUnit: files
cmdp:settingInfo:
cmdp:naturality: spontaneous
cmdp:conversationalType: dialogue
cmdp:audience: few
cmdp:interactivity: overlapping
cmdp:interaction: Semiformal or informal interviews with one interviewer.
cmdp:audioFormatInfo:
cmdp:mimeType: wav and mp3
cmdp:compressionInfo:
cmdp:compression: true
cmdp:compressionName: mp3
cmdp:corpusPartInfo [cmd:ref=‘ndc-treebank-glossa’]:
cmdp:mediaType: video
cmdp:corpusVideoInfo:
cmdp:videoContentInfo:
cmdp:typeOfVideoContent: Semiformal or informal interviews with one interviewer.
cmdp:settingInfo:
cmdp:naturality: spontaneous
cmdp:conversationalType: dialogue
cmdp:audience: few
cmdp:interactivity: overlapping
cmdp:interaction: Semiformal or informal interviews with one interviewer.
cmdp:videoFormatInfo:
cmdp:mimeType: mp4
cmdp:compressionInfo:
cmdp:compression: true
cmdp:compressionName: mpg
cmdp:corpusPartGeneralInfo [cmd:ref=‘ndc-treebank’]:
cmdp:personSourceSetInfo [cmd:ref=‘ndc-treebank’]:
cmdp:ageOfPersons: teenager
cmdp:ageOfPersons: adult
cmdp:ageOfPersons: elderly
cmdp:ageRangeStart: 14
cmdp:ageRangeEnd: 91
cmdp:sexOfPersons: mixed
cmdp:originOfPersons: native
cmdp:dialectAccentOfPersons: Dialects from 17 places in Norway
cmdp:geographicDistributionOfPersons: All over Norway
cmdp:lingualityInfo:
cmdp:lingualityType: monolingual
cmdp:languageInfo:
cmdp:languageId: no
cmdp:languageName: Norwegian
cmdp:languageInfo:
cmdp:languageId: nb
cmdp:languageName: Norwegian Bokmål
cmdp:modalityInfo:
cmdp:modalityType: spokenLanguage
cmdp:modalityTypeDetails: Norwegian dialects. Orthographic transcription
cmdp:sizeInfo [cmd:ref=‘ndc-treebank-glossa’]:
cmdp:size: 66 042
cmdp:sizeUnit: tokens
cmdp:sizeInfo [cmd:ref=‘ndc-treebank-glossa’]:
cmdp:size: 4637
cmdp:sizeUnit: utterances
cmdp:annotationInfo:
cmdp:annotationType: speechAnnotation-orthographicTranscription
cmdp:annotationType: morphosyntacticAnnotation-posTagging
cmdp:annotationType: syntacticAnnotation-treebanks
cmdp:annotationDescription: Original version in CoNLL-X format, annotated with morphological and dependency-style syntactic analysis.
cmdp:annotationManualStructured:
cmdp:role: annotationManual
cmdp:documentInfo:
cmdp:documentType: manual
cmdp:title [xml:lang=‘en’]: NDT Guidelines for Morphological and Syntactic Annotation
cmdp:author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen. Translated from Norwegian to English by Per Erik Solberg.
cmdp:year: 2013
cmdp:url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
cmdp:annotationManualStructured:
cmdp:role: annotationManual
cmdp:documentInfo:
cmdp:documentType: manual
cmdp:title [xml:lang=‘nb’]: Retningslinjer for syntaktisk annotasjon i LIA
cmdp:author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
cmdp:year: 2019
cmdp:url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
cmdp:annotationTool:
cmdp:targetResourceNameURI: ConlluEditor
cmdp:annotationTool:
cmdp:targetResourceNameURI: https://aclanthology.org/W19-8010/
cmdp:classificationInfo:
cmdp:genreInfo:
cmdp:genreType: speechGenre
cmdp:genre: informal
cmdp:unstandardisedGenre: informal interviews
cmdp:classificationInfo:
cmdp:genreInfo:
cmdp:genreType: speechGenre
cmdp:genre: semiformal
cmdp:unstandardisedGenre: semiformal interviews
cmdp:timeCoverageInfo:
cmdp:timeCoverage: 2007 - 2010
cmdp:geographicCoverageInfo:
cmdp:geographicCoverage: 17 places from all over Norway