CMDI 1.1 Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2022-11-16
MdProfile: clarin.eu:cr1:p_1407745711925
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
ResourceProxy [id=‘ndc-lp’]:
ResourceType [mimetype=‘’]: LandingPage
ResourceRef: http://tekstlab.uio.no/nota/scandiasyn/
ResourceProxy [id=‘ndc-treebank’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: http://tekstlab.uio.no/nota/scandiasyn/treebank.html
ResourceProxy [id=‘ndc-corpus’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://tekstlab.uio.no/glossa3/ndc2
ResourceProxy [id=‘ndc-treebank-conllx’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC
ResourceProxy [id=‘ndc-treebank-glossa’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://tekstlab.uio.no/glossa3/ndctrebanken
JournalFileProxyList:
ResourceRelationList:
ResourceRelation:
RelationType: treebank
Res1 [ref=‘ndc-corpus’]:
Res2 [ref=‘ndc-treebank’]:
ResourceRelation:
RelationType: version
Res1 [ref=‘ndc-treebank’]:
Res2 [ref=‘ndc-treebank-conllx’]:
ResourceRelation:
RelationType: version
Res1 [ref=‘ndc-treebank’]:
Res2 [ref=‘ndc-treebank-glossa’]:
IsPartOfList:
Components
corpusProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’] [ref=‘ndc-treebank’]:
resourceType [cmd=‘ndc-treebank’]: corpus
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’] [ref=‘ndc-treebank’]:
resourceName [xml:lang=‘nb’]: NDC-trebanken
resourceName [xml:lang=‘en’]: The NDC Treebank
description [xml:lang=‘en’]: The NDC Treebank includes 4637 speech segments and 66 042 tokens from the Norwegian part of the Nordic Dialect Corpus. The segments are taken from 30 transcribed interviews from 17 places in Norway. The treebank is annotated with morphological and dependency-style syntactic analysis and has been manually corrected. It is available in two versions: a downloadable version in conllx format and a searchable version in the Glossa search interface.
The Nordic Dialect Corpus is a corpus of spontaneous speech in Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian dialects.
description [xml:lang=‘nb’]: NDC-trebanken inneholder 4637 talemålssegment og 66 042 ord/token fra den norske delen av Nordisk dialektkorpus. Segmentene er hentet fra 30 transkriberte intervjuer fra 17 steder i Norge. Trebanken er annotert med morfologisk og syntaktisk informasjon og manuelt korrigert. Trebanken er tilgjengelig i to versjoner: en nedlastbar versjon i conllx-format og en søkbar i søkegrensesnittet Glossa.
Nordisk dialektkorpus er et talespråkskorpus med spontantale fra norske, svenske, danske, islandske og færøyske dialekter.
resourceShortName [xml:lang=‘en’]: The NDC Treebank
resourceShortName [xml:lang=‘nb’]: NDC-trebanken
url: http://www.tekstlab.uio.no/scandiasyn/index.html
url: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
PID: http://hdl.handle.net/11538/0000-0005-E7C7-6
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’] [ref=‘ndc-treebank’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’] [ref=‘ndc-treebank-conllx’]:
userCategory: Public
distributionAccessMedium: downloadable
downloadLocation: https://github.com/textlab/spoken_norwegian_resources/tree/master/treebanks/Norwegian-BokmaalNDC
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: Creative Commons (CC)
licenceName: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
conditionsOfUse: BY
conditionsOfUse: NC
conditionsOfUse: SA
licensor [ref=‘ndc-treebank-conllx’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘nb’]: Universitetet i Oslo
organizationShortName [xml:lang=‘en’]: UiO
organizationShortName [xml:lang=‘nb’]: UiO
departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: Oslo
country: Norway
distributionRightsHolder [ref=‘ndc-treebank-conllx’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘nb’]: Universitetet i Oslo
organizationShortName [xml:lang=‘nb’]: UiO
organizationShortName [xml:lang=‘en’]: UiO
departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: Oslo
country: Norway
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’] [ref=‘ndc-treebank-glossa’]:
userCategory: Academic
distributionAccessMedium: accessibleThroughInterface
executionLocation: https://tekstlab.uio.no/glossa3/ndctrebanken
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: CLARIN
licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*
licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
conditionsOfUse: BY
conditionsOfUse: ID
conditionsOfUse: LOC
conditionsOfUse: NC
conditionsOfUse: ND
conditionsOfUse: NORED
conditionsOfUse: PRIV
nonStandardConditionsOfUse: The treebank has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the treebank linked to audio and video files is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts provided by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor [ref=‘ndc-treebank-glossa’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘nb’]: Universitetet i Oslo
organizationShortName [xml:lang=‘nb’]: UiO
organizationShortName [xml:lang=‘en’]: UiO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zipCode: 0317
city: Oslo
country: Norway
contact [ref=‘ndc-treebank’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: Oslo
country: Norway
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:
metadataCreationDate: 2022-11-16
metadataLastDateUpdated: 2024-01-04
metadataCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: Oslo
country: Norway
versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’]:
version: conllx and Glossa versions, November 2022
validationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711923’]:
validated: true
validationType: content
validationMode: manual
validationModeDetails: The treebank has been manually corrected by at least one person
validationExtent: partial
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:
documentationUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532302’]:
role: documentation
documentUnstructured: http://www.tekstlab.uio.no/nota/scandiasyn/treebank.html
documentationStructured [ComponentId=‘clarin.eu:cr1:c_1361876010648’]:
role: documentation
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: proceedings
title [xml:lang=‘en’]: The Norwegian Dialect Corpus Treebank
author: Andre Kåsen and Kristin Hagen and Anders Nøklestad and Joel Priestley and Per Erik Solberg and Dag Trygve Truslew Haug
editor: Nicoletta Calzolari et al.
year: 2022
bookTitle: Proceedings of the Thirteenth Language Resources and Evaluation Conference
url: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.516.pdf
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:
creationStartDate: 2021-06-01
creationEndDate: 2022-12-01
resourceCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: Oslo
country: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName: Common Language Resources and Technology Infrastructure Norway +
projectShortName: CLARINO +
projectID: 295700
url: http://clarin.b.uib.no/
fundingType: nationalFunds
funder: the Research Council of Norway
fundingCountry: Norway
projectStartDate: 2020-03-01
projectEndDate: 2023-12-31
corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’] [ref=‘ndc-treebank’]:
corpusType [cmd=‘ndc-treebank’]: Treebank
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’] [ref=‘ndc-treebank’]:
mediaType: text
corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’]:
textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’] [ref=‘ndc-treebank-conllx’]:
mimeType: Downloadable in conllx-format
sizePerTextFormat [ComponentId=‘clarin.eu:cr1:c_1447674760342’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’] [ref=‘ndc-treebank-conllx’]:
size: 66 042
sizeUnit: tokens
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’] [ref=‘ndc-treebank-conllx’]:
size: 4637
sizeUnit: utterances
characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’]:
characterEncoding: utf-8
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’] [ref=‘ndc-treebank-glossa’]:
mediaType: audio
corpusAudioInfo [ComponentId=‘clarin.eu:cr1:c_1404130561236’]:
audioSizeInfo [ComponentId=‘clarin.eu:cr1:c_1360230992160’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 30
sizeUnit: files
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: dialogue
audience: few
interactivity: overlapping
interaction: Semiformal or informal interviews with one interviewer.
audioFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477070’]:
mimeType: wav and mp3
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mp3
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’] [ref=‘ndc-treebank-glossa’]:
mediaType: video
corpusVideoInfo [ComponentId=‘clarin.eu:cr1:c_1407745711880’]:
videoContentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019779’]:
typeOfVideoContent: Semiformal or informal interviews with one interviewer.
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: dialogue
audience: few
interactivity: overlapping
interaction: Semiformal or informal interviews with one interviewer.
videoFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477073’]:
mimeType: mp4
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mpg
corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’] [ref=‘ndc-treebank’]:
personSourceSetInfo [ComponentId=‘clarin.eu:cr1:c_1360931019775’] [ref=‘ndc-treebank’]:
ageOfPersons: teenager
ageOfPersons: adult
ageOfPersons: elderly
ageRangeStart: 14
ageRangeEnd: 91
sexOfPersons: mixed
originOfPersons: native
dialectAccentOfPersons: Dialects from 17 places in Norway
geographicDistributionOfPersons: All over Norway
lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:
lingualityType: monolingual
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: no
languageName: Norwegian
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: nb
languageName: Norwegian Bokmål
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: spokenLanguage
modalityTypeDetails: Norwegian dialects. Orthographic transcription
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’] [ref=‘ndc-treebank-glossa’]:
size: 66 042
sizeUnit: tokens
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’] [ref=‘ndc-treebank-glossa’]:
size: 4637
sizeUnit: utterances
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: speechAnnotation-orthographicTranscription
annotationType: morphosyntacticAnnotation-posTagging
annotationType: syntacticAnnotation-treebanks
annotationDescription: Original version in conllx-format, annotated with morphological and dependency-style syntactic analysis.
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title [xml:lang=‘en’]: NDT Guidelines for Morphological and Syntactic Annotation
author: Kari Kinn, Per Erik Solberg and Pål Kristian Eriksen.
Translated from Norwegian to English by Per Erik Solberg
year: 2013
url: https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-10/
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title [xml:lang=‘nb’]: Retningslinjer for syntaktisk annotasjon i LIA
author: Andre Kåsen, Kristin Hagen, Lilja Øvrelid, Signe Laake, Håvard Østli
year: 2019
url: http://tekstlab.uio.no/LIA/pdf/parseretningslinjer-lia12042019.pdf
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: ConlluEditor
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: https://aclanthology.org/W19-8010/
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: informal
unstandardisedGenre: informal interviews
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: semiformal
unstandardisedGenre: semiformal interviews
timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:
timeCoverage: 2007 - 2010
geographicCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760357’]:
geographicCoverage: 17 places from all over Norway