CMDI 1.2 Metadata

Header

cmd:MdCreator: Kristin Hagen

cmd:MdCreationDate: 2022-11-16

cmd:MdSelfLink:

cmd:MdProfile: clarin.eu:cr1:p_1407745711925

cmd:MdCollectionDisplayName: Clarino - Textlab

Resources

cmd:ResourceProxyList:

cmd:ResourceProxy [id=‘ndc-lp’]:

cmd:ResourceType [mimetype=‘’]: LandingPage

cmd:ResourceRef: http://tekstlab.uio.no/nota/scandiasyn/

cmd:ResourceProxy [id=‘ndc-treebank’]:

cmd:ResourceType [mimetype=‘’]: Resource

cmd:ResourceRef: http://tekstlab.uio.no/nota/scandiasyn/treebank.html

cmd:ResourceProxy [id=‘ndc-corpus’]:

cmd:ResourceType [mimetype=‘’]: Resource

cmd:ResourceRef: https://tekstlab.uio.no/glossa3/ndc2

cmd:ResourceProxy [id=‘ndc-treebank-conllx’]:

cmd:ResourceType [mimetype=‘’]: Resource

cmd:ResourceRef: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC

cmd:ResourceProxy [id=‘ndc-treebank-glossa’]:

cmd:ResourceType [mimetype=‘’]: Resource

cmd:ResourceRef: https://tekstlab.uio.no/glossa3/ndctrebanken

cmd:JournalFileProxyList:

cmd:ResourceRelationList:

cmd:ResourceRelation:

cmd:RelationType: treebank

cmd:Resource:

cmd:Role:

cmd:Resource:

cmd:Role:

cmd:ResourceRelation:

cmd:RelationType: version

cmd:Resource:

cmd:Role:

cmd:Resource:

cmd:Role:

cmd:ResourceRelation:

cmd:RelationType: version

cmd:Resource:

cmd:Role:

cmd:Resource:

cmd:Role:

Components

cmdp:corpusProfile:

cmdp:resourceCommonInfo [cmd:ref=‘ndc-treebank’]:

cmdp:resourceType [cmd:ref=‘ndc-treebank’]: corpus

cmdp:identificationInfo [cmd:ref=‘ndc-treebank’]:

cmdp:resourceName [xml:lang=‘nb’]: NDC-trebanken

cmdp:resourceName [xml:lang=‘en’]: The NDC Treebank

cmdp:description [xml:lang=‘en’]: The NDC Treebank comprises 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analysis and has been manually corrected. It is available in two versions: a downloadable version in conllx format and a searchable version in the Glossa search interface.
The Nordic Dialect Corpus is a corpus of spontaneously spoken dialects of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian.

cmdp:description [xml:lang=‘nb’]: NDC-trebanken inneholder 4637 talemålssegment og 66 042 ord/token fra den norske delen av Nordisk dialektkorpus. Segmentene er hentet fra 30 transkriberte intervjuer fra 17 steder i Norge. Trebanken er annotert med morfologisk og syntaktisk informasjon og manuelt korrigert. Trebanken er tilgjengelig i to versjoner: en nedlastbar versjon i conllx-format og en søkbar i søkegrensesnittet Glossa.
Nordisk dialektkorpus er et talespråkskorpus med spontantale fra norske, svenske, danske, islandske og færøyske dialekter.

cmdp:resourceShortName [xml:lang=‘en’]: The NDC Treebank

cmdp:resourceShortName [xml:lang=‘nb’]: NDC-trebanken

cmdp:url: http://www.tekstlab.uio.no/scandiasyn/index.html

cmdp:url: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html

cmdp:PID: https://hdl.handle.net/11538/8493fdd3-a

cmdp:distributionInfo [cmd:ref=‘ndc-treebank’]:

cmdp:licenceInfo [cmd:ref=‘ndc-treebank-conllx’]:

cmdp:userCategory: Public

cmdp:distributionAccessMedium: downloadable

cmdp:downloadLocation: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC

cmdp:licence:

cmdp:licenceFamily: Creative Commons (CC)

cmdp:licenceName: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)

cmdp:conditionsOfUse: BY

cmdp:conditionsOfUse: NC

cmdp:conditionsOfUse: SA

cmdp:licensor [cmd:ref=‘ndc-treebank-conllx’]:

cmdp:actorInfo:

cmdp:actorType: organization

cmdp:organizationInfo:

cmdp:organizationName [xml:lang=‘en’]: University of Oslo

cmdp:organizationName [xml:lang=‘nb’]: Universitetet i Oslo

cmdp:organizationShortName [xml:lang=‘en’]: UoO

cmdp:organizationShortName [xml:lang=‘nb’]: UiO

cmdp:departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)

cmdp:departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies

cmdp:communicationInfo:

cmdp:email: tekstlab-post@iln.uio.no

cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

cmdp:address: Box 1102 Blindern

cmdp:zipCode: 0317

cmdp:city: Oslo

cmdp:country: Norway

cmdp:distributionRightsHolder [cmd:ref=‘ndc-treebank-conllx’]:

cmdp:actorInfo:

cmdp:actorType: organization

cmdp:organizationInfo:

cmdp:organizationName [xml:lang=‘en’]: University of Oslo

cmdp:organizationName [xml:lang=‘nb’]: Universitetet i Oslo

cmdp:organizationShortName [xml:lang=‘nb’]: UiO

cmdp:organizationShortName [xml:lang=‘en’]: UoO

cmdp:departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)

cmdp:departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies

cmdp:communicationInfo:

cmdp:email: tekstlab-post@iln.uio.no

cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

cmdp:address: Box 1102 Blindern

cmdp:zipCode: 0317

cmdp:city: Oslo

cmdp:country: Norway

cmdp:licenceInfo [cmd:ref=‘ndc-treebank-glossa’]:

cmdp:userCategory: Academic

cmdp:distributionAccessMedium: accessibleThroughInterface

cmdp:executionLocation: https://tekstlab.uio.no/glossa3/ndctrebanken

cmdp:licence:

cmdp:licenceFamily: CLARIN

cmdp:licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*

cmdp:licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1

cmdp:conditionsOfUse: BY

cmdp:conditionsOfUse: ID

cmdp:conditionsOfUse: LOC

cmdp:conditionsOfUse: NC

cmdp:conditionsOfUse: ND

cmdp:conditionsOfUse: NORED

cmdp:conditionsOfUse: PRIV

cmdp:nonStandardConditionsOfUse: The treebank includes audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the treebank linked to the audio and video files is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts returned by the search interface cannot be shown in public without an agreement with the Text Laboratory.
Please note that each researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must remain anonymous in every published paper or other output.

cmdp:licensor [cmd:ref=‘ndc-treebank-glossa’]:

cmdp:actorInfo:

cmdp:actorType: organization

cmdp:organizationInfo:

cmdp:organizationName [xml:lang=‘en’]: University of Oslo

cmdp:organizationName [xml:lang=‘nb’]: Universitetet i Oslo

cmdp:organizationShortName [xml:lang=‘nb’]: UiO

cmdp:organizationShortName [xml:lang=‘en’]: UoO

cmdp:departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies

cmdp:departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)

cmdp:communicationInfo:

cmdp:email: tekstlab-post@iln.uio.no

cmdp:url: http://www.hf.uio.no/iln/english/

cmdp:address: Box 1102 Blindern

cmdp:zipCode: 0317

cmdp:city: Oslo

cmdp:country: Norway

cmdp:contact [cmd:ref=‘ndc-treebank’]:

cmdp:actorInfo:

cmdp:actorType: organization

cmdp:organizationInfo:

cmdp:organizationName: The Text Laboratory

cmdp:organizationShortName: Textlab

cmdp:departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo

cmdp:communicationInfo:

cmdp:email: tekstlab-post@iln.uio.no

cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

cmdp:address: Box 1102 Blindern

cmdp:zipCode: 0317

cmdp:city: Oslo

cmdp:country: Norway

cmdp:metadataInfo:

cmdp:metadataCreationDate: 2022-11-16

cmdp:metadataLastDateUpdated: 2026-01-23

cmdp:metadataCreator:

cmdp:actorInfo:

cmdp:actorType: person

cmdp:personInfo:

cmdp:surname: Hagen

cmdp:givenName: Kristin

cmdp:organizationInfo:

cmdp:organizationName: The Text Laboratory

cmdp:organizationShortName: Textlab

cmdp:departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo

cmdp:communicationInfo:

cmdp:email: kristin.hagen@iln.uio.no

cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

cmdp:address: Box 1102 Blindern

cmdp:zipCode: 0317

cmdp:city: Oslo

cmdp:country: Norway

cmdp:versionInfo:

cmdp:version: conllx and glossa version November 2022

cmdp:validationInfo:

cmdp:validated: true

cmdp:validationType: content

cmdp:validationMode: manual

cmdp:validationModeDetails: The treebank has been manually corrected by at least one person.

cmdp:validationExtent: partial

cmdp:resourceDocumentationInfo:

cmdp:documentationUnstructured:

cmdp:role: documentation

cmdp:documentUnstructured: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html

cmdp:documentationStructured:

cmdp:role: documentation

cmdp:documentInfo:

cmdp:documentType: proceedings

cmdp:title [xml:lang=‘en’]: The Norwegian Dialect Corpus Treebank

cmdp:author: Andre Kåsen, Kristin Hagen, Anders Nøklestad, Joel Priestley, Per Erik Solberg and Dag Trygve Truslew Haug

cmdp:editor: Nicoletta Calzolari et al.

cmdp:year: 2022

cmdp:bookTitle: Proceedings of the Thirteenth Language Resources and Evaluation Conference

cmdp:url: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.516.pdf

cmdp:resourceCreationInfo:

cmdp:creationStartDate: 2021-06-01

cmdp:creationEndDate: 2022-12-01

cmdp:resourceCreator:

cmdp:actorInfo:

cmdp:actorType: organization

cmdp:communicationInfo:

cmdp:email: tekstlab-post@iln.uio.no

cmdp:url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/

cmdp:address: Box 1102 Blindern

cmdp:zipCode: 0317

cmdp:city: Oslo

cmdp:country: Norway

cmdp:fundingProject:

cmdp:projectInfo:

cmdp:projectName: Common Language Resources and Technology Infrastructure Norway +

cmdp:projectShortName: CLARINO +

cmdp:projectID: 295700

cmdp:url: http://clarin.b.uib.no/

cmdp:fundingType: nationalFunds

cmdp:funder: The Research Council of Norway

cmdp:fundingCountry: Norway

cmdp:projectStartDate: 2020-03-01

cmdp:projectEndDate: 2023-12-31

cmdp:corpusInfo [cmd:ref=‘ndc-treebank’]:

cmdp:corpusType [cmd:cmd=‘ndc-treebank’]: Treebank

cmdp:corpusPartInfo [cmd:ref=‘ndc-treebank’]:

cmdp:mediaType: text

cmdp:corpusTextInfo:

cmdp:textFormatInfo [cmd:ref=‘ndc-treebank-conllx’]:

cmdp:mimeType: Downloadable in conllx-format

cmdp:sizePerTextFormat:

cmdp:sizeInfo [cmd:ref=‘ndc-treebank-conllx’]:

cmdp:size: 66 042

cmdp:sizeUnit: tokens

cmdp:sizeInfo [cmd:ref=‘ndc-treebank-conllx’]:

cmdp:size: 4637

cmdp:sizeUnit: utterances

cmdp:characterEncodingInfo:

cmdp:characterEncoding: utf-8

cmdp:corpusPartInfo [cmd:ref=‘ndc-treebank-glossa’]:

cmdp:mediaType: audio

cmdp:corpusAudioInfo:

cmdp:audioSizeInfo:

cmdp:sizeInfo:

cmdp:size: 30

cmdp:sizeUnit: files

cmdp:settingInfo:

cmdp:naturality: spontaneous

cmdp:conversationalType: dialogue

cmdp:audience: few

cmdp:interactivity: overlapping

cmdp:interaction: Semiformal or informal interviews with one interviewer.

cmdp:audioFormatInfo:

cmdp:mimeType: wav and mp3

cmdp:compressionInfo:

cmdp:compression: true

cmdp:compressionName: mp3

cmdp:corpusPartInfo [cmd:ref=‘ndc-treebank-glossa’]:

cmdp:mediaType: video

cmdp:corpusVideoInfo:

cmdp:videoContentInfo:

cmdp:typeOfVideoContent: Semiformal or informal interviews with one interviewer.

cmdp:settingInfo:

cmdp:naturality: spontaneous

cmdp:conversationalType: dialogue

cmdp:audience: few

cmdp:interactivity: overlapping

cmdp:interaction: Semiformal or informal interviews with one interviewer.

cmdp:videoFormatInfo:

cmdp:mimeType: mp4

cmdp:compressionInfo:

cmdp:compression: true

cmdp:compressionName: mpg

cmdp:corpusPartGeneralInfo [cmd:ref=‘ndc-treebank’]:

cmdp:personSourceSetInfo [cmd:ref=‘ndc-treebank’]:

cmdp:ageOfPersons: teenager

cmdp:ageOfPersons: adult

cmdp:ageOfPersons: elderly

cmdp:ageRangeStart: 14

cmdp:ageRangeEnd: 91

cmdp:sexOfPersons: mixed

cmdp:originOfPersons: native

cmdp:dialectAccentOfPersons: Dialects from 17 places in Norway

cmdp:geographicDistributionOfPersons: All over Norway

cmdp:lingualityInfo:

cmdp:lingualityType: monolingual

cmdp:languageInfo:

cmdp:languageId: no

cmdp:languageName: Norwegian

cmdp:languageInfo:

cmdp:languageId: nb

cmdp:languageName: Norwegian Bokmål

cmdp:modalityInfo:

cmdp:modalityType: spokenLanguage

cmdp:modalityTypeDetails: Norwegian dialects. Orthographic transcription

cmdp:sizeInfo [cmd:ref=‘ndc-treebank-glossa’]:

cmdp:size: 66 042

cmdp:sizeUnit: tokens

cmdp:sizeInfo [cmd:ref=‘ndc-treebank-glossa’]:

cmdp:size: 4637

cmdp:sizeUnit: utterances

cmdp:annotationInfo:

cmdp:annotationType: speechAnnotation-orthographicTranscription

cmdp:annotationType: morphosyntacticAnnotation-posTagging

cmdp:annotationType: syntacticAnnotation-treebanks

cmdp:annotationDescription: Original version in conllx-format, annotated with morphological and dependency-style syntactic analysis.

cmdp:annotationManualStructured:

cmdp:role: annotationManual

cmdp:documentInfo:

cmdp:documentType: manual

cmdp:title [xml:lang=‘en’]: NDT Guidelines for Morphological and Syntactic Annotation

cmdp:author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen.
Translated from Norwegian into English by Per Erik Solberg.

cmdp:year: 2013

cmdp:url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/

cmdp:annotationManualStructured:

cmdp:role: annotationManual

cmdp:documentInfo:

cmdp:documentType: manual

cmdp:title [xml:lang=‘nb’]: Retningslinjer for syntaktisk annotasjon i LIA

cmdp:author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli

cmdp:year: 2019

cmdp:url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf

cmdp:annotationTool:

cmdp:targetResourceNameURI: ConlluEditor

cmdp:annotationTool:

cmdp:targetResourceNameURI: https://aclanthology.org/W19-8010/

cmdp:classificationInfo:

cmdp:genreInfo:

cmdp:genreType: speechGenre

cmdp:genre: informal

cmdp:unstandardisedGenre: informal interviews

cmdp:classificationInfo:

cmdp:genreInfo:

cmdp:genreType: speechGenre

cmdp:genre: semiformal

cmdp:unstandardisedGenre: semiformal interviews

cmdp:timeCoverageInfo:

cmdp:timeCoverage: 2007 - 2010

cmdp:geographicCoverageInfo:

cmdp:geographicCoverage: 17 places from all over Norway