History
- The tagger was originally developed by the Tagger Project (Taggerprosjektet, 1996 - 1998) with a CG1 Constraint Grammar rule interpreter from Lingsoft. The Tagger Project was funded by the Norwegian Research Council, with additional contributions from the Documentation Project and the Text Laboratory. (Read about the Tagger Project and the first version of the Bokmål Tagger, in Norwegian only.)
The project developed the following components: a preprocessor (the Documentation Project and the Text Laboratory), a compound analyser (the Text Laboratory), a multitagger (the Documentation Project) and CG1 rules for morphology and syntax for Bokmål and Nynorsk (the Text Laboratory). A first version of the lexicon Norsk ordbank was also made in connection with the project.
The syntactic CG rules are not included in the current version of the tagger.
- Later, the tagger was further developed and reimplemented through a collaboration between Paul Meurer, Uni Computing (then Aksis, University of Bergen), the Text Laboratory and the Documentation Project, University of Oslo. This version was written in Allegro Lisp, while the Constraint Grammar rules were retained in CG1 format. The tagger was named The Oslo-Bergen Tagger.
- A module for identification and classification of proper names was developed by the Nomen Nescio Project (2001 - 2004), Uni Computing (then Aksis) and the Text Laboratory. This part is not included in the current version of the tagger.
- A downloadable version of The Oslo-Bergen Tagger was financed by the LOGON Project (2006).
- With funding from the NFR Project Norwegian Newspaper Corpus (2007-2009), the Text Laboratory converted the morphological and syntactic rules from the original CG1 format to the newer and more sophisticated CG3 format, and wrote additional rules. Uni Computing made a stand-alone version of the preprocessor and multitagger in Clozure Common Lisp that could work together with the CG3 compiler (VISL-CG3) from the University of Southern Denmark in Odense. Finally, the Text Laboratory trained a HunPos tagger which removes the ambiguity remaining after the morphological CG3 tagging.
- With funding from the Ministry of Foreign Affairs and the infrastructure project Clarino+, a new version of the multitagger was created in Python (2018-2022). Among other changes, the multi-word expressions from the original multitagger have been removed, so that each word gets its own reading.
- The lexicon, the multitagger and the CG rules were revised and modernized with financing from the Clarino+ project (2020-2023).