CMDI 1.2 Metadata
Header
cmd:MdCreator: Kristin Hagen
cmd:MdCreationDate: 2025-01-09
cmd:MdSelfLink:
cmd:MdProfile: clarin.eu:cr1:p_1422885449331
cmd:MdCollectionDisplayName: Clarino - Textlab
Resources
cmd:ResourceProxyList:
cmd:ResourceProxy [id=‘humit-tagger’]:
cmd:ResourceType [mimetype=‘’]: LandingPage
cmd:ResourceRef: https://www.hf.uio.no/humit/english/resources/humit-tagger/index.html
cmd:ResourceProxy [id=‘online’]:
cmd:ResourceType [mimetype=‘’]: Resource
cmd:ResourceRef: https://tekstlab.uio.no/humtag_nett/
cmd:ResourceProxy [id=‘null’]:
cmd:ResourceType: Resource
cmd:ResourceRef: https://github.com/humit-oslo/humit-tagger
cmd:JournalFileProxyList:
cmd:ResourceRelationList:
Components
cmdp:toolProfile:
cmdp:resourceCommonInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485126’] [cmd:ref=‘humit-tagger’]:
cmdp:resourceType [cmd:ref=‘obt’]: toolService
cmdp:identificationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485125’] [cmd:ref=‘humit-tagger’]:
cmdp:resourceName [cmd:ref=‘obt’] [xml:lang=‘en’]: The Humit Tagger
cmdp:resourceName [cmd:ref=‘obt’] [xml:lang=‘no’]: Humit-taggeren
cmdp:description [cmd:ref=‘obt’] [xml:lang=‘en’]: The Humit Tagger is a morphological AI tagger for Norwegian Bokmål and Nynorsk developed at Humit, University of Oslo.
The tagger is based on a neural network, more precisely a pre-trained BERT model for Norwegian, developed by the National Library of Norway. The tagger is a so-called sequence classifier, which selects morphological tags but not lemmas.
In this first version of the Humit Tagger, the full-form word list from Norsk ordbank is used as a basis for lemma selection.
cmdp:resourceShortName [cmd:ref=‘obt’]: humit-tagger
cmdp:url [cmd:ref=‘obt’]: https://www.hf.uio.no/humit/english/resources/humit-tagger/index.html
cmdp:distributionInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485124’] [cmd:ref=‘humit-tagger’]:
cmdp:licenceInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485158’] [cmd:ref=‘humit-tagger’]:
cmdp:distributionAccessMedium: Downloadable
cmdp:downloadLocation: https://github.com/humit-oslo/humit-tagger
cmdp:executionLocation: https://tekstlab.uio.no/humtag_nett/
cmdp:licence [cmd:ComponentRef=‘clarin.eu:cr1:c_1447674760330’]:
cmdp:licenceFamily: MIT
cmdp:licenceName: MIT
cmdp:licenceURL: http://en.wikipedia.org/wiki/MIT_License
cmdp:conditionsOfUse: BY
cmdp:licensor:
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’]:
cmdp:actorType: organization
cmdp:organizationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711883’]:
cmdp:organizationName [xml:lang=‘en’]: University of Oslo
cmdp:organizationName [xml:lang=‘no’]: Universitetet i Oslo
cmdp:organizationShortName [xml:lang=‘no’]: UiO
cmdp:organizationShortName [xml:lang=‘en’]: UoO
cmdp:departmentName [xml:lang=‘no’]: Humit – senter for digital utvikling på HF
cmdp:departmentName [xml:lang=‘en’]: Humit – Centre for digital development at HF
cmdp:communicationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1352813745460’]:
cmdp:email: humit@hf.uio.no
cmdp:url: https://www.hf.uio.no/humit/english/
cmdp:city: Oslo
cmdp:country: Norway
cmdp:distributionRightsHolder:
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’]:
cmdp:actorType: organization
cmdp:organizationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711883’]:
cmdp:organizationName [xml:lang=‘en’]: University of Oslo
cmdp:organizationName [xml:lang=‘no’]: Universitetet i Oslo
cmdp:organizationShortName [xml:lang=‘no’]: UiO
cmdp:organizationShortName [xml:lang=‘en’]: UoO
cmdp:departmentName [xml:lang=‘en’]: Humit – Centre for digital development at HF
cmdp:departmentName [xml:lang=‘no’]: Humit – senter for digital utvikling på HF
cmdp:communicationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1352813745460’]:
cmdp:email: humit@hf.uio.no
cmdp:url: https://www.hf.uio.no/humit/english/
cmdp:city: Oslo
cmdp:country: Norway
cmdp:iprHolder:
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’] [cmd:ref=‘cg’]:
cmdp:actorType [cmd:ref=‘cg’]: organization
cmdp:organizationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711883’]:
cmdp:organizationName: Humit – Centre for digital development at HF
cmdp:organizationShortName: Humit
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’]:
cmdp:contact [cmd:ref=‘humit-tagger’]:
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’] [cmd:ref=‘humit-tagger’]:
cmdp:actorType: person
cmdp:organizationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711883’]:
cmdp:organizationName: Humit – Centre for digital development at HF
cmdp:organizationShortName: Humit
cmdp:communicationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1352813745460’]:
cmdp:email: humit@hf.uio.no
cmdp:url: https://www.hf.uio.no/humit/english/index.html
cmdp:city: Oslo
cmdp:country: Norway
cmdp:metadataInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711922’] [cmd:ref=‘humit-tagger’]:
cmdp:metadataCreationDate: 2025-01-10
cmdp:metadataCreator [cmd:ref=‘humit-tagger’]:
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’]:
cmdp:actorType: person
cmdp:personInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485192’]:
cmdp:surname: Hagen
cmdp:givenName: Kristin
cmdp:organizationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711883’]:
cmdp:organizationName: Humit – Centre for digital development at HF
cmdp:organizationShortName: Humit
cmdp:communicationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1352813745460’]:
cmdp:email: kristin.hagen@iln.uio.no
cmdp:url: https://www.hf.uio.no/humit/english/
cmdp:city: Oslo
cmdp:country: Norway
cmdp:versionInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1430905751648’] [cmd:ref=‘humit-tagger’]:
cmdp:version [cmd:ref=‘obt’]: First version
cmdp:validationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711923’] [cmd:ref=‘humit-tagger’]:
cmdp:validated: true
cmdp:validationModeDetails [cmd:ref=‘cg’]: So far, the tagger has been evaluated only on a test portion of the Norwegian Dependency Treebank, where each word form has exactly one correct analysis. On this test set, the Humit Tagger achieves an accuracy of 0.98 for tags and 0.99 for lemmas.
cmdp:validationReportUnstructured [cmd:ComponentRef=‘clarin.eu:cr1:c_1353678848789’]:
cmdp:documentUnstructured: See the home page: https://www.hf.uio.no/humit/english/resources/humit-tagger/index.html
cmdp:resourceDocumentationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1355150532301’] [cmd:ref=‘humit-tagger’]:
cmdp:documentationUnstructured [cmd:ComponentRef=‘clarin.eu:cr1:c_1355150532302’]:
cmdp:documentUnstructured: See the home page: https://www.hf.uio.no/humit/english/resources/humit-tagger/index.html
cmdp:documentationUnstructured [cmd:ComponentRef=‘clarin.eu:cr1:c_1355150532302’]:
cmdp:documentUnstructured: Haug, D. T. T., Yildirim, A., Hagen, K., & Nøklestad, A. (2023). Rules and neural nets for morphological tagging of Norwegian - Results and challenges. NEALT Proceedings Series, 425-435.
cmdp:resourceCreationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1407745711921’] [cmd:ref=‘humit-tagger’]:
cmdp:creationStartDate: 2022
cmdp:creationEndDate: 2024
cmdp:resourceCreator [cmd:ref=‘humit-tagger’]:
cmdp:actorInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1396012485194’]:
cmdp:actorType: organization
cmdp:communicationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1352813745460’]:
cmdp:email: humit@hf.uio.no
cmdp:fundingProject [cmd:ref=‘humit-tagger’]:
cmdp:projectInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1430905751647’]:
cmdp:projectName: Common Language Resources and Technology Infrastructure Norway +
cmdp:projectShortName: CLARINO +
cmdp:projectID: 295700
cmdp:url: http://clarin.b.uib.no/
cmdp:fundingType: nationalFunds
cmdp:funder: the Research Council of Norway
cmdp:fundingCountry: Norway
cmdp:projectStartDate: 2020-03-01
cmdp:projectEndDate: 2023-12-31
cmdp:toolInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1422885449327’]:
cmdp:inputInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1360931019804’]:
cmdp:mediaType: text
cmdp:resourceType: corpus
cmdp:modalityType: writtenLanguage
cmdp:languageName: Norwegian
cmdp:languageName: Norwegian Bokmål
cmdp:languageName: Norwegian Nynorsk
cmdp:languageId: No
cmdp:languageId: Nb
cmdp:languageId: Nn
cmdp:mimeType: txt, xml
cmdp:characterEncoding: utf-8
cmdp:annotationType: lemmatization
cmdp:annotationType: morphosyntacticAnnotation-posTagging
cmdp:tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
cmdp:segmentationLevel: word
cmdp:segmentationLevel: clause
cmdp:outputInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1360931019824’]:
cmdp:mediaType: text
cmdp:resourceType: corpus
cmdp:modalityType: writtenLanguage
cmdp:languageName: Norwegian
cmdp:languageName: Norwegian Bokmål
cmdp:languageName: Norwegian Nynorsk
cmdp:languageId: No
cmdp:languageId: Nb
cmdp:languageId: Nn
cmdp:mimeType: txt, xml
cmdp:characterEncoding: utf-8
cmdp:tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
cmdp:segmentationLevel: clause
cmdp:segmentationLevel: word
cmdp:toolServiceOperationInfo [cmd:ComponentRef=‘clarin.eu:cr1:c_1360931019835’]:
cmdp:operatingSystem: See https://github.com/humit-oslo/humit-tagger
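
Illustrative note (not part of the metadata record): the description above says that the classifier selects morphological tags but not lemmas, and that a full-form word list is used as the basis for lemma selection. A minimal sketch of that second step, assuming a lookup keyed on word form and tag, might look as follows. The table contents, the tag strings, the fallback rule, and the names `FULL_FORMS`, `select_lemma`, and `lemmatize` are all hypothetical and are not taken from the Humit Tagger source code or from Norsk ordbank.

```python
from typing import Dict, List, Tuple

# Hypothetical toy full-form list: (word form, morphological tag) -> lemma.
# Entries and tag strings are illustrative only, not real Norsk ordbank data.
FULL_FORMS: Dict[Tuple[str, str], str] = {
    ("bøker", "subst fl ub"): "bok",
    ("leser", "verb pres"): "lese",
    ("leser", "subst ent ub"): "leser",
}


def select_lemma(form: str, tag: str) -> str:
    """Look up the lemma for a (form, tag) pair.

    Assumption: when the pair is not listed, fall back to the
    lowercased word form. The real tagger may behave differently.
    """
    key = (form.lower(), tag)
    return FULL_FORMS.get(key, form.lower())


def lemmatize(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Attach a lemma to each (form, tag) pair produced by a tagger."""
    return [(form, tag, select_lemma(form, tag)) for form, tag in tagged]


if __name__ == "__main__":
    # The tags here stand in for the output of the sequence classifier.
    for row in lemmatize([("Bøker", "subst fl ub"), ("leser", "verb pres")]):
        print(row)
```

Keying the lookup on both the word form and the predicted tag is what lets an ambiguous form such as "leser" resolve to different lemmas depending on the tag chosen by the classifier.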