
Språkbanken (the Swedish Language Bank)

In recognition of the groundbreaking corpus linguistic work initiated by Sture Allén at Göteborg University in the 1960s, Språkbanken (the Swedish Language Bank) was established in 1975 as a national center, funded at the national level, with a remit to collect, process and store (Swedish) text corpora, and to make linguistic data extracted from the corpora available to researchers and to the public. That earlier work had resulted in Press-65, one million words of news text and one of the first large electronic text corpora in a language other than English. Through Språkbanken, such users have been able to access linguistic and statistical data about a diverse range of Swedish text since the 1970s.

Administratively, Språkbanken is a separate unit in the Department of Swedish Language, Göteborg University. Although no longer funded at the national level, Språkbanken is still perceived as a national resource and is widely used by scholars in Sweden and internationally (particularly in Finland, where Swedish is an official language) for empirical research on various aspects of the Swedish language. Språkbanken is also used as a resource in the teaching of Swedish linguistics at the university level at a number of Swedish and Finnish universities.

Today, Språkbanken possesses a unique combination of competences in the areas of Swedish text corpora, parallel text corpora, Swedish computational lexicons, and language technology tools for the processing, annotation and presentation of text corpora, coupled with the kind of stable organization required for lasting large-scale corpus processing and presentation. The corpus material is continually being expanded and diversified. The emphasis is on modern Swedish news text and fiction, but Swedish texts in other genres and from other time periods (in fact, most periods of written Swedish) are increasingly represented in Språkbanken, as are other languages. Apart from parallel corpora where Swedish is one of the languages, there are plans to collect material in the national minority languages and to compile a multilingual parallel corpus of Scandinavian languages.

Språkbanken presents corpora and linguistic data primarily in the form of concordances, accessed through a search interface. This is the presentation mode of choice for traditional linguistic research, whereas language technology researchers often need whole corpora, e.g. in order to apply machine learning algorithms. Språkbanken aims to meet the latter need better in the future.
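For readers unfamiliar with the format, a concordance lists every occurrence of a search word together with a fixed window of surrounding text, aligned on the search word (often called keyword in context, or KWIC). The following is a minimal illustrative sketch in Python, not Språkbanken's own implementation; the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made up for this example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    Hypothetical helper for illustration only: matches whole words,
    ignoring case, and shows `width` characters on either side.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text, not taken from any Språkbanken corpus.
sample = (
    "Språket förändras hela tiden. Det svenska språket har många "
    "dialekter, och språket lever."
)

for line in kwic(sample, "språket", width=20):
    print(line)
```

Each printed line places the matched word in the same column, which is what lets a linguist scan the left and right contexts of a word at a glance. Working with whole corpora, by contrast, requires access to the full running text rather than such extracted windows.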

Lars Borin

first-name dot last-name at svenska dot gu dot se

Page Manager: Webbredaktionen|Last update: 7/2/2018
