Nordic Dialect Corpus

Nordic Dialect Corpus and Syntax Database

The corpus and database were initiated under the umbrella of the ScanDiaSyn research network and the Nordic Centre of Excellence NORMS. The technical solutions are provided by the Text Laboratory. Both language resources are intended for research and education.

LATEST
• New search interfaces for Nordic Dialect Corpus v. 4.0 (November 2023)
• The NDC Parser: Speech parser trained on the NDC Treebank (June 2023)
• The NDC Treebank includes 4637 speech segments and 66 042 tokens from the Norwegian part of Nordic Dialect Corpus. The treebank is annotated with morphological and dependency-style syntactic analysis and is both searchable in Glossa and available as downloadable conllx files. (2022)
• Nordic Dialect Corpus v. 4.0: Dialect recordings and transcriptions from 1998 - 2015 only. (September 2019)
The older recordings from Målførearkivet have been moved to LIA Norwegian - Corpus of Old Dialect Recordings.
• User Manual for Nordic Dialect Corpus in the new search interface (June 2019)
• Nordic Dialect Corpus v. 3.0: Expanded Icelandic and Swedish parts. 16 new informants from Iceland and 24 from Sweden and Swedia2000 (Sept. 2017)
• New search interfaces for Nordic Dialect Corpus v. 2.0 and Nordic Syntax Database. (2017)
• Nordic Atlas of Language Structures (NALS) Journal is published (2014)
• Nordic Dialect Corpus: Expanded Icelandic part - 20 new informants from 6 places (2013)

Nordic Dialect Corpus
Nordic Dialect Corpus is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus comes from a variety of sources (see Data Collection). The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers. It is transcribed and linked to audio and video, has a map function, and can be searched in a large variety of ways. Although the aim of the corpus is Nordic syntax research, it is a general corpus (in effect a Norwegian Dialect Corpus, a Swedish Dialect Corpus and so on) that can be used in a wide range of research areas, such as phonology, morphology and lexicography.

How to refer to the corpus: Johannessen, Janne Bondi, Joel Priestley, Kristin Hagen, Tor Anders Åfarli, and Øystein Alexander Vangsnes. 2009. The Nordic Dialect Corpus - an Advanced Research Tool. In Jokinen, Kristiina and Eckhard Bick (eds.): Proceedings of the 17th Nordic Conference of Computational Linguistics NODALIDA 2009. NEALT Proceedings Series Volume 4. (Read the paper)
Please also add the corpus handle:
The Nordic Dialect Corpus: https://hdl.handle.net/11538/0000-0005-E7C7-6

Read also: Johannessen, Janne Bondi, Øystein Alexander Vangsnes, Joel Priestley, and Kristin Hagen. 2014. A multilingual speech corpus of North-Germanic languages. In: Spoken Corpora and Linguistic Studies. John Benjamins Publishing Company, pp. 69-83. (Read the paper)

Nordic Syntax Database
The database consists of judgments by 924 Nordic dialect speakers from 207 places on a list of sentences that illustrate various syntactic phenomena. Many of the speakers are the same in both the database and the corpus. The sentences have been given grades, and on the basis of these grades, dialect maps can be generated and isoglosses drawn. The judgments can be sorted and filtered in many ways, according to place, age and sex of the informants or type of syntactic phenomenon.
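To illustrate the kind of filtering described above, here is a minimal Python sketch that averages grades per place for one sentence, optionally restricted by age and sex. It does not use the database's actual interface or export format: the `Judgment` record, its field names, the place names, the sentence identifiers and the numeric grade values are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Judgment:
    """Hypothetical record layout; not the database's real schema."""
    place: str
    age: int
    sex: str
    sentence_id: str
    grade: int  # assumed numeric grade, higher = more acceptable


# Invented data for illustration only.
judgments = [
    Judgment("PlaceA", 25, "F", "s1", 5),
    Judgment("PlaceA", 70, "M", "s1", 3),
    Judgment("PlaceB", 30, "M", "s1", 1),
    Judgment("PlaceB", 68, "F", "s1", 2),
    Judgment("PlaceB", 68, "F", "s2", 4),
]


def mean_grade_by_place(records: Iterable[Judgment], sentence_id: str,
                        min_age: int = 0, max_age: int = 200,
                        sex: Optional[str] = None) -> Dict[str, float]:
    """Average the grades for one sentence per place, after filtering."""
    grades = defaultdict(list)
    for j in records:
        if j.sentence_id != sentence_id:
            continue
        if not (min_age <= j.age <= max_age):
            continue
        if sex is not None and j.sex != sex:
            continue
        grades[j.place].append(j.grade)
    return {place: mean(values) for place, values in sorted(grades.items())}


all_speakers = mean_grade_by_place(judgments, "s1")
older_speakers = mean_grade_by_place(judgments, "s1", min_age=50)
```

Per-place averages of this kind are one possible input for a dialect map: places whose averages fall on opposite sides of a chosen threshold would lie on opposite sides of an isogloss.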

How to refer to the database: Lindstad, Arne Martinus; Nøklestad, Anders; Johannessen, Janne Bondi; Vangsnes, Øystein Alexander. 2009. The Nordic Dialect Database: Mapping Microsyntactic Variation in the Scandinavian Languages. In Jokinen, Kristiina and Eckhard Bick (eds.): NEALT Proceedings Series Volume 4. (Read the paper)
Please also add the database handle:
The Nordic Syntax Database: https://hdl.handle.net/11538/0000-0005-E7C8-5