CMDI 1.1. Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2015-03-16
MdSelfLink:
MdProfile: clarin.eu:cr1:p_1422885449331
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
ResourceProxy [id=‘ref1’]:
ResourceType [mimetype=‘’]: LandingPage
ResourceRef: http://www.tekstlab.uio.no/obt-ny/
ResourceProxy [id=‘ref2’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://github.com/noklesta/The-Oslo-Bergen-Tagger/tree/master/cg
ResourceProxy [id=‘ref3’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://github.com/noklesta/The-Oslo-Bergen-Tagger
JournalFileProxyList:
ResourceRelationList:
ResourceRelation:
RelationType: partOf
Res1 [ref=‘ref2’]:
Res2 [ref=‘ref1’]:
ResourceRelation:
RelationType: partOf
Res1 [ref=‘ref3’]:
Res2 [ref=‘ref1’]:
IsPartOfList:
Components
toolProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:
resourceType [ref=‘ref1’]: toolService
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’] [ref=‘ref1’]:
resourceName [ref=‘ref1’] [xml:lang=‘en’]: The Oslo-Bergen Tagger
description [ref=‘ref1’] [xml:lang=‘en’]: The Oslo-Bergen Tagger is a robust morphological and syntactic tagger developed over several years at the University of Oslo and at Uni Computing in Bergen. The tagger consists of three main modules: a preprocessor with a multitagger and compound analyser (ref3); a Constraint Grammar module for morphological and syntactic disambiguation (ref2); and a statistical module that removes the remaining morphological ambiguity (Bokmål only). The Constraint Grammar module uses a compiler developed at the University of Southern Denmark in Odense. The multitagger uses the lexicon Norsk ordbank.
resourceShortName [ref=‘ref1’]: obt
url [ref=‘ref1’]: http://www.tekstlab.uio.no/obt-ny/english/index.html
PID: http://hdl.handle.net/11538/0000-0005-E7C6-7
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’] [ref=‘ref1’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’] [ref=‘ref1’]:
userCategory: Public
distributionAccessMedium: downloadable
downloadLocation: https://github.com/noklesta/The-Oslo-Bergen-Tagger
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: GNU
licenceName: General Public License (GPL)
licenceURL: http://www.gnu.org/licenses/gpl.html
conditionsOfUse: BY
conditionsOfUse: SA
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
iprHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘ref1 ref2’]:
actorType [ref=‘ref2’]: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘ref3’]:
actorType [ref=‘ref2’]: person
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: Uni Research AS
departmentName [xml:lang=‘en’]: Uni Research Computing
contact [ref=‘ref1’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘ref1 ref2’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘ref3’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname [xml:lang=‘en’]: Meurer
givenName [xml:lang=‘en’]: Paul
sex: male
position: Senior researcher
affiliation:
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: Uni Research AS
departmentName [xml:lang=‘en’]: Uni Research Computing
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: paul.meurer@uni.no
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’] [ref=‘ref1’]:
metadataCreationDate: 2015-03-16
metadataLastDateUpdated: 2017-06-08
metadataCreator [ref=‘ref1’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
validationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711923’] [ref=‘ref2’]:
validated: true
validationModeDetails [ref=‘ref2’]: Bokmål: The evaluation of the morphological Constraint Grammar module shows a success rate (recall) of 99% and a precision of 96%. This gives an F-measure of 97.5% (if recall and precision are weighted equally).

The tagger was tested on an evaluation corpus of 30 000 words, with texts from newspapers, magazines, journals, government reports and novels.

Including the statistical module to perform complete disambiguation of the evaluation corpus yields a tagger accuracy of 96.5%. This figure covers full disambiguation of both morphology and lemma.

Nynorsk: So far, evaluation has been carried out only for the original CG1 module of the Oslo-Bergen Tagger. This module had a success rate (recall) of 98.7% with 93.6% precision. This gives an F-measure of 96.2%.

The evaluation corpus for Nynorsk also had about 30 000 words taken from newspapers, magazines, journals, government reports and novels.
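
The F-measure figures quoted above can be checked from the stated recall and precision. The following is a minimal sketch, assuming the equally weighted F-measure is the usual harmonic mean of precision and recall; the function name `f_measure` is illustrative and is not part of the tagger or its evaluation scripts.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 weights precision and recall equally."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Precision and recall as reported in this record.
bokmaal = f_measure(precision=0.96, recall=0.99)
nynorsk = f_measure(precision=0.936, recall=0.987)

print(f"Bokmål F-measure:  {bokmaal * 100:.1f}%")
print(f"Nynorsk F-measure: {nynorsk * 100:.1f}%")
```

Under this assumption the Bokmål value comes to about 97.5%, matching the record. The Nynorsk value comes to about 96.1%; the reported 96.2% is closer to the arithmetic mean of the two inputs (96.15%), so the original figure may have been computed or rounded differently.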
validationReportUnstructured [ComponentId=‘clarin.eu:cr1:c_1353678848789’]:
role [ref=‘ref2’]: validationReport
documentUnstructured: See the publications page:
http://www.tekstlab.uio.no/obt-ny/english/publications.html
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’] [ref=‘ref1’]:
documentationUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532302’]:
role: documentation
documentUnstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’] [ref=‘ref1’]:
creationStartDate: 1996
creationEndDate: 2009
resourceCreator [ref=‘ref1’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: the Tagger Project (Taggerprosjektet 1996 - 1998)
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 1996-01-01
projectEndDate: 1998-12-31
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Norwegian Newspaper Corpus (2007-2009)
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 2007-01-01
projectEndDate: 2009-12-31
toolInfo [ComponentId=‘clarin.eu:cr1:c_1422885449327’]:
description: The tagger consists of three parts:

1) A multitagger (tokenizer, morphological analyzer, and compound analyzer). The multitagger is currently distributed only in binary form. (ref3)
2) A Constraint Grammar (CG) tagger (ref2)
a) VISL CG-3 compiler from the University of Southern Denmark
b) Constraint Grammar rules
3) OBT+stat: a statistical (HunPos) tagger that removes ambiguity not resolved in the CG step (currently only for Bokmål)
inputInfo [ComponentId=‘clarin.eu:cr1:c_1360931019804’]:
mediaType: text
resourceType: corpus
modalityType: writtenLanguage
languageName: Norwegian
languageName: Norwegian Bokmål
languageName: Norwegian Nynorsk
languageId: No
languageId: Nb
languageId: Nn
mimeType: txt, xml
characterEncoding: latin1, utf-8
annotationType: lemmatization
annotationType: morphosyntacticAnnotation-posTagging
tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
segmentationLevel: word
segmentationLevel: clause
outputInfo [ComponentId=‘clarin.eu:cr1:c_1360931019824’]:
mediaType: text
resourceType: corpus
modalityType: writtenLanguage
languageName: Norwegian
languageName: Norwegian Bokmål
languageName: Norwegian Nynorsk
languageId: No
languageId: Nb
languageId: Nn
mimeType: txt, xml
characterEncoding: latin1, utf-8
tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
segmentationLevel: clause
segmentationLevel: word
toolServiceOperationInfo [ComponentId=‘clarin.eu:cr1:c_1360931019835’]:
operatingSystem: linux
operatingSystem: mac-OS
runningEnvironmentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019826’]:
requiredSoftware [ComponentId=‘clarin.eu:cr1:c_1360931019827’]:
targetResourceNameURI: VISL CG3: http://beta.visl.sdu.dk/cg3/chunked/installation.html
requiredSoftware [ComponentId=‘clarin.eu:cr1:c_1360931019827’]:
targetResourceNameURI: HunPos: https://code.google.com/p/hunpos/