Lexicon of Slovene
The bibliographical data of the Sloleks Morphological Lexicon of Slovene can be found at the link below.
Date of last update: 22. 10. 2019
The lexicon database is available at the CLARIN.SI repository.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence.
Sloleks is a lexicon of Slovene word forms. As a structured database, it contains essential information on Slovene words, such as their part-of-speech category and grammatical features, and lists all inflected forms of each word. Slovene is a morphologically rich language (inflected forms are found in nouns, adjectives, pronouns, numerals, verbs, and adverbs), so the number of words and their inflected forms is very high. An extensive database such as Sloleks is therefore useful both for speakers interested in Slovene word inflection and for developers of language technologies for Slovene. In version 2.0, word forms have also been automatically assigned accents and IPA transcriptions, and users can rate them as either adequate or inadequate.
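The idea of a headword that carries a part-of-speech category and a full set of inflected forms can be sketched as a small data structure. This is only an illustration: the class names, field names, and feature labels below are hypothetical and do not reflect the actual Sloleks schema. The sample entry lists the singular forms of the noun "miza" (table).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WordForm:
    # Hypothetical fields: the written form and its grammatical features.
    text: str
    features: Dict[str, str]


@dataclass
class Entry:
    # Hypothetical headword record: lemma, part of speech, inflected forms.
    lemma: str
    pos: str
    forms: List[WordForm] = field(default_factory=list)

    def lookup(self, **features: str) -> List[str]:
        """Return the written forms whose features match all given values."""
        return [
            f.text
            for f in self.forms
            if all(f.features.get(k) == v for k, v in features.items())
        ]


def analyses(entry: Entry, text: str) -> List[Dict[str, str]]:
    """Return every feature set under which `text` occurs in the entry."""
    return [f.features for f in entry.forms if f.text == text]


# Singular paradigm of "miza"; dual and plural forms are omitted for brevity.
miza = Entry(
    lemma="miza",
    pos="noun",
    forms=[
        WordForm("miza", {"case": "nominative", "number": "singular"}),
        WordForm("mize", {"case": "genitive", "number": "singular"}),
        WordForm("mizi", {"case": "dative", "number": "singular"}),
        WordForm("mizo", {"case": "accusative", "number": "singular"}),
        WordForm("mizi", {"case": "locative", "number": "singular"}),
        WordForm("mizo", {"case": "instrumental", "number": "singular"}),
    ],
)

if __name__ == "__main__":
    print(miza.lookup(case="genitive", number="singular"))
    print(analyses(miza, "mizi"))
```

The same written form can realise several feature combinations ("mizi" is both dative and locative singular), which is one reason a lexicon stores each form together with its features rather than as a flat word list.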
Sloleks 2.0 contains 100,802 headwords and 2,792,003 word forms with their grammatical and accentual features. The distribution of headwords by part-of-speech category is shown in the graph on the right. Nouns account for the largest share, with a total of 54,260 headwords (43,908 common nouns and 10,352 proper nouns). They are followed by adjectives (26,612 headwords), verbs (10,242), adverbs (6,906), numerals (2,240), and other parts of speech (169 pronouns, 96 prepositions, 85 interjections, 70 abbreviations, 68 particles, and 54 conjunctions). Each word form is also listed with its frequency in the Gigafida 2.0 corpus and with links to corpus examples in which it occurs.
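The figures quoted above can be checked with a short calculation. The sketch below uses only the counts given in the text; the variable and function names are illustrative. It confirms that the per-category counts add up to the stated total of headwords, then derives each category's percentage share and the average number of word forms per headword.

```python
# Headword counts per part-of-speech category, as quoted in the text.
HEADWORDS = {
    "noun (common)": 43_908,
    "noun (proper)": 10_352,
    "adjective": 26_612,
    "verb": 10_242,
    "adverb": 6_906,
    "numeral": 2_240,
    "pronoun": 169,
    "preposition": 96,
    "interjection": 85,
    "abbreviation": 70,
    "particle": 68,
    "conjunction": 54,
}
TOTAL_HEADWORDS = 100_802
TOTAL_WORD_FORMS = 2_792_003


def shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {pos: 100 * n / total for pos, n in counts.items()}


def main():
    # The category counts should add up to the stated number of headwords.
    assert sum(HEADWORDS.values()) == TOTAL_HEADWORDS

    ranked = sorted(shares(HEADWORDS).items(), key=lambda kv: -kv[1])
    for pos, pct in ranked:
        print(f"{pos:<16}{HEADWORDS[pos]:>8,}{pct:>8.2f}%")

    avg = TOTAL_WORD_FORMS / TOTAL_HEADWORDS
    print(f"average word forms per headword: {avg:.1f}")


if __name__ == "__main__":
    main()
```

By these figures, common and proper nouns together make up roughly 54% of all headwords, and each headword has about 27.7 word forms on average.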
In addition to accented word forms, version 2.0 also includes pronunciation recordings and IPA transcriptions. A total of 3,069,151 accents were assigned automatically to word forms using neural networks (Krsnik 2018), while the pronunciation recordings were generated with the eBralec speech synthesis system. In accordance with the concept of the responsive dictionary, the Sloleks 2.0 interface allows users to contribute to the database and help improve it in several ways: by up- or downvoting accented word forms, IPA transcriptions or their recordings, by reporting a missing accent pattern for a given headword, and by adding user recordings of headword pronunciation.
The Sloleks lexicon is intended both for language users, who can browse the database through an online interface and find information on the inflectional, grammatical, and accentual features of Slovene word forms, and for developers of language technology applications (such as speech recognizers and synthesizers, accent assignment systems, inflection generators, and morphosyntactic taggers) and researchers in linguistics. For the latter two groups, Sloleks 2.0 is also available in XML format at the CLARIN.SI repository under the Creative Commons Attribution-ShareAlike 4.0 International licence (CC BY-SA 4.0).
The Sloleks Morphological Lexicon of Slovene is part of an organized effort by the Centre for Language Resources and Technologies of the University of Ljubljana to establish an infrastructure for Slovene that is comparable to the infrastructures of larger languages. We believe that, in terms of methodology, the construction of language resources should follow the contemporary zeitgeist and that all data prepared through publicly financed initiatives and projects should be openly accessible to all potential users for the further development of language technologies, considering the actual needs of modern language users in the digital age. The process of constructing the Sloleks Morphological Lexicon of Slovene thus also puts considerable effort into establishing a dedicated community that not only uses the lexicon, but also contributes to its development.
DOBROVOLJC, Kaja, KREK, Simon, ERJAVEC, Tomaž. Leksikon besednih oblik Sloleks in smernice njegovega razvoja. In: Vojko Gorjanc, Polona Gantar, Iztok Kosem, Simon Krek: Slovar sodobne slovenščine: problemi in rešitve. Ljubljana: Znanstvena založba Filozofske fakultete, 2015. 80–105.
DOBROVOLJC, Kaja, KREK, Simon, ERJAVEC, Tomaž. The Sloleks Morphological Lexicon and its Future Development. In: Vojko Gorjanc, Polona Gantar, Iztok Kosem, Simon Krek: Dictionary of Modern Slovene: Problems and Solutions. Ljubljana: Znanstvena založba Filozofske fakultete, 2018. 42–63.
KRSNIK, Luka. Napovedovanje naglasa slovenskih besed z metodami strojnega učenja. Master's thesis. 2018.
DOBROVOLJC, Kaja. Oblikoslovne informacije v sodobnih slovarskih priročnikih. In: Vojko Gorjanc, Polona Gantar, Iztok Kosem, Simon Krek: Slovar sodobne slovenščine: problemi in rešitve. Ljubljana: Znanstvena založba Filozofske fakultete, 2015. 64–79.
KREK, Simon, ERJAVEC, Tomaž, HOLOZAN, Peter. Specifikacije za leksikon besednih oblik (kazalnik 3). Projekt Sporazumevanje v slovenskem jeziku, 2008.
ARHAR, Špela. Učni korpus SSJ in leksikon besednih oblik za slovenščino. Jezik in slovstvo 54/3–4, 2009. 43–56.
FIŠER, Darja, ČIBEJ, Jaka, DOBROVOLJC, Kaja, GANTAR, Polona, KOSEM, Iztok, ARHAR HOLDT, Špela, POPIČ, Damjan, ERJAVEC, Tomaž. Množičenje za slovar sodobnega slovenskega jezika. In: Vojko Gorjanc, Polona Gantar, Iztok Kosem, Simon Krek: Slovar sodobne slovenščine: problemi in rešitve. Ljubljana: Znanstvena založba Filozofske fakultete, 2015. 566–586.
ARHAR HOLDT, Špela, DOBROVOLJC, Kaja, POPIČ, Damjan. Reprezentacija standardnega in nestandardnega v virih SSJ. In: Družbena funkcijskost jezika: (vidiki, merila, opredelitve). Ljubljana: Znanstvena založba Filozofske fakultete, 2013. 19–27.
REJC, Rok. Generiranje slovenskih besednih oblik s pomočjo strojnega učenja. Diploma thesis. 2017.
KREK, Simon. Leksikografska orodja za slovenščino: slovnica besednih skic. In: Vojko Gorjanc, Polona Gantar, Iztok Kosem, Simon Krek: Slovar sodobne slovenščine: problemi in rešitve. Ljubljana: Znanstvena založba Filozofske fakultete, 2015. 358–378.
ARHAR HOLDT, Špela, ČIBEJ, Jaka. Oblikoslovni vzorci v leksikonu Sloleks: izhodiščni nabor za samostalnike. Slovnične raziskave za jezikovni opis, Vol. 6, No. 2 (2018). Ljubljana: Trojina, zavod za uporabno slovenistiko, 2018. 33–66.
The data for the upgrade of Sloleks to version 2.0 was prepared based on Sloleks 1.0 by an interdisciplinary group of researchers at the Centre for Language Resources and Technologies of the University of Ljubljana.
The development of Sloleks 2.0 was financed by the CJVT and CLARIN.SI infrastructural programs. Research was funded by the ARRS research programs P6-0411 (Language Resources and Technologies for Slovene) and P6-0215 (Slovene Language: Basic, Contrastive, and Applied Studies).
The interface was designed by Studio Kruh and developed by Leon Noe Jovan.