CMDI 1.1 Metadata
Header
MdCreator: Kristin Hagen
MdProfile: clarin.eu:cr1:p_1407745711925
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
ResourceProxy [id=‘bigbrother-lp’]:
ResourceType [mimetype=‘’]: LandingPage
ResourceRef: http://www.tekstlab.uio.no/nota/bigbrother/index.html
ResourceProxy [id=‘bb-transcriptions’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: http://www.tekstlab.uio.no/nota/bigbrother/index.html#transkripsjon
ResourceProxy [id=‘bb-corpus’]:
ResourceType: Resource
ResourceRef: https://tekstlab.uio.no/glossa3/bb
JournalFileProxyList:
ResourceRelationList:
ResourceRelation:
RelationType: transcriptions
Res1 [ref=‘bb-corpus’]:
Res2 [ref=‘bb-transcripts’]:
IsPartOfList:
Components
corpusProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:
resourceType [ref=‘bb-corpus’]: corpus
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’]:
resourceName [ref=‘bb-corpus’] [xml:lang=‘en’]: The BigBrother Corpus
resourceName [ref=‘bb-corpus’] [xml:lang=‘nb’]: BigBrother-korpuset
description [ref=‘bb-corpus’] [xml:lang=‘nb’]: BigBrother-korpuset er et talespråkskorpus som består av den første sesongen av realityserien BigBrother som ble sendt på TVNorge våren 2001. Deltakerne i BigBrother er i alderen 23-36 år og snakker ulike dialekter.

BigBrother-korpuset inneholder lyd- og videoopptak av nesten alle de 100 sendingene som ble vist på tv, cirka 44 300 tokens. Materialet er ortografisk transkribert og lenket til lyd og video. Transkripsjonene er også tagget morfologisk.

BigBrother-korpuset er et unikt talespråkskorpus der deltakerne arbeider sammen, diskutere, argumenterer, krangler, gråter, ler, roper og elsker. I motsetning til kontrollerte talespråksinnspillinger som ofte er begrenset til intervjuer og dialog, har BigBrother-materialet samtaler om alle mulige temaer og innen ulike genre. Noen ganger er sterke følelser i sving, og dette kan tenkes å innvirkning på språket.

Transkripsjonene er nedlastbare.
description [ref=‘bb-corpus’] [xml:lang=‘en’]: The BigBrother Corpus is a speech corpus with recordings from the first season of the BigBrother show, sent on Norwegian television by TVNorge in the first half of 2001. The participants in BigBrother speak different dialects, but primarily they come from the east of Norway. They are aged 23-36 years.

The BigBrother Corpus contains audio and video recordings of almost all the 100 broadcasts that was shown on television, approx. 440 300 tokens. The recordings are linked to the orthographic transcriptions. The transcriptions are also tagged morphologically.

The BigBrother Corpus is a unique speech corpus where the participants work together, discuss, argue, quarrel, cries, laugh, shout, make love etc. In contrast to controlled recordings that are limited to interviews and dialogue, the BigBrother-material has conversations about all possible topics and within different genre. Sometimes strong feelings are in turn, which also can conceivably have an impact on the language.

The transcripts can be downloaded.
resourceShortName: BigBrother
url: http://www.tekstlab.uio.no/nota/bigbrother/index.html
PID: http://hdl.handle.net/11538/0000-0005-E7C1-C
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’] [ref=‘bb-corpus’]:
userCategory: Academic
distributionAccessMedium: accessibleThroughInterface
executionLocation [ref=‘bb-corpus’]: https://tekstlab.uio.no/glossa3/bb
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: CLARIN
licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*
licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
conditionsOfUse: *
conditionsOfUse: BY
conditionsOfUse: ID
conditionsOfUse: LOC
conditionsOfUse: NC
conditionsOfUse: ND
conditionsOfUse: NORED
conditionsOfUse: PRIV
nonStandardConditionsOfUse: The corpus has audio and video recordings classified as personal data.The production company Nordic Entertainment has generously given their consent to the usage of the videos as a speech corpus. Every individual researcher is responsible for treating the participants with resepct and sincerity. Furthermore, the informants in the corpora should be anonymized, e.g. by changing their names, in every published paper or other output.
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘nb’]: Universitetet i Oslo
organizationName [xml:lang=‘en’]: University of Oslo
organizationShortName [xml:lang=‘nb’]: UiO
organizationShortName [xml:lang=‘en’]: UoU
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘nb’]: Institutt for lingvistiske og nordiske studier
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
iprHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: Nordic Entertainment (ipr holder of the videos)
contact:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: The Text Laboratory
organizationShortName [xml:lang=‘en’]: Textlab
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:
metadataCreationDate: 2015-02-24
metadataLastDateUpdated: 2023-11-22
metadataCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’]:
version: Second version
validationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711923’]:
validated: true
validationType: content
validationMode: manual
validationModeDetails: The transcriptions are proof read against the audio files.
validationExtent: full
validator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:
documentationUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532302’]:
role: documentation
documentUnstructured: http://www.tekstlab.uio.no/nota/bigbrother/index.html
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:
creationStartDate: 2007-08-01
creationEndDate: 2009-12-31
resourceCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName: Developing and completing language resources: The Big Brother show as a modern speech corpus
url: http://www.tekstlab.uio.no/nota/bigbrother/index.html
fundingType: nationalFunds
funder: The Research Council of Norway, the KUNSTI program (Kunnskapsutvikling for norsk språkteknologi).
fundingCountry: Norway
projectStartDate: 2007-08-31
projectEndDate: 2007-12-31
corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’]:
corpusType [ref=‘bb-corpus’]: Multimodal Corpus
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’] [ref=‘bb-corpus’]:
mediaType [ref=‘bigbrother-lp’]: text
corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’] [ref=‘bigbrother-lp’]:
textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’] [ref=‘bigbrother-lp’]:
mimeType: txt
sizePerTextFormat [ComponentId=‘clarin.eu:cr1:c_1447674760342’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 440 338
sizeUnit: tokens
characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’] [ref=‘bigbrother-lp’]:
characterEncoding: utf8
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’] [ref=‘bb-corpus’]:
mediaType: video
corpusVideoInfo [ComponentId=‘clarin.eu:cr1:c_1407745711880’]:
videoContentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019779’]:
typeOfVideoContent: Recordings from the first season of the BigBrother show, that means audio and video recordings of almost all the 100 broadcasts that was shown on television.
textIncludedInVideo: none
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: multilogue
audience: some
interactivity: overlapping
interaction: All kinds of situations in the BigBrother house. The participants prepare dinner, eat, sleep, make love, discuss, work together etc etc.
videoFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477073’]:
mimeType: videos in mpeg4 streaming format available through Glossa
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mpg
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’] [ref=‘bb-corpus’]:
mediaType: audio
corpusAudioInfo [ComponentId=‘clarin.eu:cr1:c_1404130561236’]:
audioSizeInfo [ComponentId=‘clarin.eu:cr1:c_1360230992160’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: Approx 29 GB
sizeUnit: gb
audioContentInfo [ComponentId=‘clarin.eu:cr1:c_1360230992161’]:
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: multilogue
audience: some
interactivity: overlapping
interaction: All kinds of situations in the BigBrother house. The participants prepare dinner, eat, sleep, make love, discuss, work together etc etc.
audioFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477070’]:
mimeType: wav and mpeg
signalEncoding: linearPCM
samplingRate: 32
quantization: 64
numberOfTracks: 1
recordingQuality: medium
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mpg
corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’] [ref=‘bb-corpus’]:
personSourceSetInfo [ComponentId=‘clarin.eu:cr1:c_1360931019775’]:
numberOfPersons: 12
ageOfPersons: adult
ageRangeStart: 23
ageRangeEnd: 36
sexOfPersons: mixed
originOfPersons: native
dialectAccentOfPersons: Some dialects represented, all of them from Southern Norway.
lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:
lingualityType: monolingual
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: No
languageName: Norwegian
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: Nb
languageName: Norwegian Bokmål
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: spokenLanguage
modalityTypeDetails: Informal language from all settings in the BigBrother house.
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: morphosyntacticAnnotation-posTagging
annotatedElements: other
segmentationLevel: word
tagset: POS tagset created for the statistical NoTa-tagger - based on the tagset of the Oslo Bergen Tagger.
tagsetLanguageId: nb
tagsetLanguageName: Norwegian Bokmål
theoreticModel: TreeTagger
annotationMode: automatic
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title [xml:lang=‘nb’]: NoTa-taggeren: TAGGEVEILEDNING
author: Åshild Søfteland
year: 2007
url: http://www.tekstlab.uio.no/nota/oslo/Taggeveiledning2.pdf
documentLanguageName: Norwegian bokmål
documentLanguageId: nb
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: article
title [xml:lang=‘en’]: Tagging a Norwegian Speech Corpus
author: Anders Nøklestad and Åshild Søfteland
editor: Joakim Nivre,Heiki-Jaan Kaalep,Kadri Muischnek, Mare Koit
year: 2007
bookTitle: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007
pages: 245–248
conference: Nodalida 2007
documentLanguageName: English
documentLanguageId: en
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: article
title [xml:lang=‘nb’]: Manuell morfologisk
tagging av NoTa-materialet med støtte fra en statistisk tagger.
author: Åshild Søfteland og Anders Nøklestad
editor: Janne Bondi Johannessen og Kristin Hagen
year: 2008
publisher: Novus forlag
bookTitle: Språk i Oslo. Ny forskning omkring talespråk
pages: 226–234.
ISBN: 978-82-7099-471-7
documentLanguageName: Norwegian
documentLanguageId: nb
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: speechAnnotation-orthographicTranscription
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: Orthographic transcription,cf Bokmålsordboka (Wangensteen 2004)
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: http://www.tekstlab.uio.no/nota/bigbrother/
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: Transcriber (http://trans.sourceforge.net/en/presentation.php )
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: informal
unstandardisedGenre: All kinds of situations in the BigBrother house. The participants prepare dinner, eat, sleep, make love, discuss, work together etc etc. Lots of emotions.
timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:
timeCoverage: 2001