CMDI 1.1. Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2014-11-28
MdSelfLink:
MdProfile: clarin.eu:cr1:p_1407745711925
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
ResourceProxy [id=‘ref1’]:
ResourceType [mimetype=‘txt’]: Resource
ResourceRef: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
ResourceProxy [id=‘ref2’]:
ResourceType [mimetype=‘txt’]: Resource
ResourceRef: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/tjenester/nowac-frequency.html
JournalFileProxyList:
ResourceRelationList:
ResourceRelation:
RelationType: source
Res1 [ref=‘ref1’]:
Res2 [ref=‘ref2’]:
IsPartOfList:
Components
corpusProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:
resourceType: corpus
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’]:
resourceName: NoWaC v 1.0 (Norwegian Web as Corpus)
description [xml:lang=‘en’]: Web-based corpus of Bokmål Norwegian containing about 700 million tokens. The corpus has been built by crawling, downloading and processing web documents in the .no top-level internet domain between November 2009 and January 2010. NoWaC has been built with permission from the Norwegian Ministry of Culture (Kulturdepartementet).

There are no information about author, publisher, genre etc in the corpus.

NoWaC can be downloaded (scrambled version) or accessed through a search interface (Glossa).
resourceShortName: NoWaC
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
PID: http://hdl.handle.net/11538/0000-0005-E7C0-D
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’]:
userCategory: Public
distributionAccessMedium: downloadable
downloadLocation: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: Creative Commons (CC)
licenceName: Creative_Commons-BY-NC-SA (CC-BY-NC-SA)
licenceURL: http://creativecommons.org/licenses/by-nc-sa/2.0/
conditionsOfUse: BY
conditionsOfUse: NC
conditionsOfUse: SA
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’]:
userCategory: Academic
distributionAccessMedium: accessibleThroughInterface
executionLocation: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: CLARIN
licenceName: CLARIN_ACA-NC-LOC-ND
licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&NORED=1&ND=1
conditionsOfUse: BY
conditionsOfUse: ID
conditionsOfUse: LOC
conditionsOfUse: NC
conditionsOfUse: ND
conditionsOfUse: NORED
nonStandardConditionsOfUse: The unscrambled corpus is accesible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
iprHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Guevara
givenName: Emiliano
affiliation:
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
departmentName: Department of Linguistics and Scandinavian Studies
contact:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:
metadataCreationDate: 2014-11-28
metadataLastDateUpdated: 2017-06-08
metadataCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
sex: female
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’]:
version: v 1.0
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:
documentationUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532302’]:
role: documentation
documentUnstructured: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:
creationStartDate: 2009-08-01
creationEndDate: 2010-12-31
resourceCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Guevara
givenName: Emiliano
affiliation:
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: emiguevara@gmail.com
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName: Emiliano Guevara's PhD project
fundingType: nationalFunds
relationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485134’]:
resourceRelation [ComponentId=‘clarin.eu:cr1:c_1396012485137’]:
relatedResource [ComponentId=‘clarin.eu:cr1:c_1430905751623’] [ref=‘ref2’] [relationRole=‘derivate-from’] [role=‘source’]:
referenceScope [ref=‘ref2’]: externalResource
resourceReference [ref=‘ref2’]: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/tjenester/nowac-frequency.html
relatedResource [ComponentId=‘clarin.eu:cr1:c_1430905751623’] [ref=‘ref2’] [role=‘isPartOf’] [relationRole=‘source’]:
referenceScope [ref=‘ref1’]: thisResource
relationType [ComponentId=‘clarin.eu:cr1:c_1396012485127’]:
relationName [ref=‘ref2’]: derivate
corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’]:
corpusType: Written Corpus
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: text
corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’]:
textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’]:
mimeType: txt
characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’]:
characterEncoding: utf-8
corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’]:
lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:
lingualityType: monolingual
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: Nb
languageName: Norwegian Bokmål
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 7000 000
sizeUnit: tokens
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: morphosyntacticAnnotation-posTagging
annotationType: lemmatization
segmentationLevel: word
tagset: The Oslo Bergen-tagger tagset: http://tekstlab.uio.no/obt-ny/english/index.html
tagsetLanguageId: NB
tagsetLanguageName: Norwegian bokmål
annotationMode: automatic
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
conformanceToClassificationScheme: other
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: textGenre
genre: unstandardised
unstandardisedGenre: scrambled web corpus/searchable web corpus
timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:
timeCoverage: November 2009 - January 2010