The Corpus of Advanced Learner Finnish (LAS2): Database and toolkit to study academic learner Finnish

Ilmari Ivaska

Learner language, Finnish as a second language, Corpus typology, Annotation, Corpus tools


This paper introduces the Corpus of Advanced Learner Finnish (LAS2), one of the existing corpora of learner Finnish. Compilation of the corpus began at the University of Turku in 2007. The initial motivation was to address the novel linguistic challenges posed by academic immigration and to contribute to corpus linguistics, Finnish linguistics and the study of second language acquisition. The paper describes the typological standpoint of the LAS2, its position with respect to other corpora of learner Finnish, the compilation criteria, the annotation applied and the workflow implemented. The corpus consists of three subcorpora of academic texts written by non-native speakers of Finnish: 1) texts for examination purposes, 2) texts for publishing and graduating purposes, and 3) texts for studying and learning purposes. The informants either study or work in Finnish within academia in Finland. Where available, the data has been collected longitudinally. For each subcorpus, a reference corpus of texts written by native speakers has also been compiled. Three query tools designed within the framework of the LAS2 are also introduced. These tools enable queries based on any combination of the linguistic annotations. They can also be used to analyse the typical internal or cotextual variation of any user-specified linguistic node, or to create frequency lists of multiword units defined at any level of the annotation. Queries can be limited to a user-specified subset of the data.

Apples - Journal of Applied Language Studies