CJVT Kolokacije

CJVT kolokacije 2.2

Collocations Dictionary of Modern Slovene

Current version

The current version is 2.2.
Date of publication: 28. 11. 2025

Availability

The Collocations Dictionary database is available at the CLARIN.SI repository.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International licence.

About the Collocations Dictionary

Collocations are typical co-occurrences of words and represent an important part of the language. By providing information on what is typical in language, collocations dictionaries are useful in language production and acquisition. In its current version, the Collocations Dictionary of Modern Slovene contains 0 headwords and 0 collocations. It is the first dictionary of collocations for Slovene and represents the first step towards filling the gap in language resources for Slovene, particularly those aimed at facilitating language production.

The dictionary has been compiled using advanced computational methods for the automatic extraction of Slovene collocations, which have already been tested and evaluated and are constantly being improved. Computer-assisted data preparation is significantly cheaper and less time-consuming than manual processing. This enables regular updates and upgrades to the resource, making the dictionary a dynamic source of language information.

The Collocations Dictionary of Modern Slovene is the second responsive dictionary published in Slovenia (the first being the Thesaurus of Modern Slovene). With responsive dictionaries, the compilation of the dictionary database is immediately followed by providing the language community with access to a large amount of relevant, albeit somewhat noisy, language information. One of the main advantages of responsive dictionaries is that the data can be quickly updated to reflect both progress in the database and changes in language use.

Based on the findings of various user studies, we have introduced a new format of entry presentation. The five stages of development used in the first version, indicated by different levels of colouration of the pyramid icon, have been replaced by three stages, which are easily identifiable to the users through the section titles:

Stage 1: contains only the section “Automatically extracted collocations”. For these entries, sense division and collocation analysis have not yet been conducted.
Stage 2: contains both sections “Collocations” and “Automatically extracted collocations”. These entries already contain sense division. While some of the collocations have already been manually analysed and distributed under the relevant senses, some automatically extracted collocations remain.
Stage 3: contains only the section “Collocations”. These entries contain sense division and manually analysed collocations.
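
The three stages above follow directly from which sections an entry contains. As a minimal sketch only (the `Entry` class, its field names and the `entry_stage` function are hypothetical and are not taken from the dictionary database or its interface), the mapping could be expressed as:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """Hypothetical, simplified model of a dictionary entry."""
    headword: str
    # Content of the "Collocations" section (manually analysed, under senses).
    collocations: List[str] = field(default_factory=list)
    # Content of the "Automatically extracted collocations" section.
    auto_collocations: List[str] = field(default_factory=list)


def entry_stage(entry: Entry) -> int:
    """Return the stage (1, 2 or 3) implied by the sections present."""
    has_manual = bool(entry.collocations)
    has_auto = bool(entry.auto_collocations)
    if has_manual and has_auto:
        return 2  # both sections present
    if has_manual:
        return 3  # only "Collocations"
    if has_auto:
        return 1  # only "Automatically extracted collocations"
    raise ValueError("entry contains neither section")
```

Under these assumptions, an entry moves from Stage 1 to Stage 3 as its automatically extracted collocations are analysed and placed under senses.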

The timestamp, along with archived previous versions of the dictionary database, enables tracking any changes in dictionary entries.

An important addition is a new, adapted method of user participation in improving the dictionary. This method has been introduced based on research and user studies. The users can now evaluate automatically selected good dictionary examples, focussing on whether the selected corpus sentence suitably exemplifies the selected collocation. If the entry already contains senses, the users can also evaluate whether the example is listed under the correct sense or, in the case of automatically extracted collocations, select the relevant sense for the example (and the collocation).

Although automatic data extraction and ranking are never perfectly accurate, the results are useful to dictionary users even before lexicographic post-processing. This has been confirmed by resources and tools for other languages (e.g. Merriam-Webster, the Digital Dictionary of German DWDS), which include automatically extracted data in their entries. An example of a language resource for Slovene that was successfully generated automatically is the Thesaurus of Modern Slovene. The effectiveness of automatic collocation extraction has also been confirmed by a linguistic evaluation, in which the automatically extracted collocations in the ten most frequent syntactic structures of 333 headwords were rated as adequate or inadequate. For the second version of the dictionary, the method of automatic collocation extraction was improved, enabling the inclusion of collocations from additional syntactic structures in dictionary entries.

The Collocations Dictionary of Modern Slovene is part of an organized effort to establish an infrastructure for Slovene that is comparable to the infrastructures of larger languages. We believe that, in terms of methodology, the construction of language resources should follow the contemporary zeitgeist and that all data prepared through publicly financed initiatives and projects should be openly accessible for the further development of language technologies, considering the actual needs of modern language users in the digital age. The process of constructing the Collocations Dictionary of Modern Slovene thus also puts considerable effort into establishing a dedicated community that not only uses the dictionary but also contributes to its development.

New Features

Version 2.1 contained 2,890 manually compiled entries with collocations distributed under senses (800 new ones). Version 2.2 contains 4,377 manually compiled entries (1,487 new ones); furthermore, the dictionary includes an additional 2,549 sense-divided entries containing a combination of manually reviewed collocations, distributed under senses, and automatically extracted collocations. During the development of version 2.2, the greatest attention was paid to collocation analysis: the number of manually confirmed collocations now exceeds 268,000, which is twice as many as in the previous version. At the same time, we removed more than 3,000 headwords from the dictionary after identifying and eliminating bad collocation candidates and/or headwords. Versions 2.1 and 2.2 were developed as part of the project Data Upgrade and Gamification of Dictionary Resources at CJVT UL (PODVIG), which was funded by the Ministry of Culture of the Republic of Slovenia.

Publications

PORI, Eva, KOSEM, Iztok, ČIBEJ, Jaka, ARHAR HOLDT, Špela. Evalvacija uporabniškega vmesnika Kolokacijskega slovarja sodobne slovenščine. In: KOSEM, Iztok (ed.). Kolokacije v slovenščini. 1st ed. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Pp. 235-268, illus. Sporazumevanje series. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6973-1.

GANTAR, Polona, KREK, Simon, KOSEM, Iztok. Opredelitev kolokacij v digitalnih slovarskih virih za slovenščino. In: KOSEM, Iztok (ed.). Kolokacije v slovenščini. 1st ed. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Pp. 15-41, illus. Sporazumevanje series. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6969-1.

KREK, Simon, GANTAR, Polona, KOSEM, Iztok, DOBROVOLJC, Kaja. Opis modela za pridobivanje in strukturiranje kolokacijskih podatkov iz korpusa. In: ARHAR HOLDT, Špela (ed.). Nova slovnica sodobne standardne slovenščine : viri in metode. 1st ed. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Pp. 160-194, illus. Sporazumevanje series. ISBN 978-961-06-0547-8. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/325/477/7313-1.

KOSEM, Iztok, LOGAR, Nataša, DOBROVOLJC, Kaja, LJUBEŠIĆ, Nikola. Razvrščanje in relevantnost kolokatorjev v slovenščini : novi pristopi. In: KOSEM, Iztok (ed.). Kolokacije v slovenščini. 1st ed. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Pp. 79-124, illus. Sporazumevanje series. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6971-1.

LJUBEŠIĆ, Nikola, LOGAR, Nataša, KOSEM, Iztok. Collocation ranking : frequency vs semantics. Slovenščina 2.0 : empirične, aplikativne in interdisciplinarne raziskave. 2021, vol. 9, no. 2, pp. 41-70, illus. ISSN 2335-2736. https://revije.ff.uni-lj.si/slovenscina2/article/view/10365/9997, DOI: 10.4312/slo2.0.2021.2.41-70.

ARHAR HOLDT, Špela. Razvrstitev kolokacij v slovarskem vmesniku : uporabniške prioritete. In: KOSEM, Iztok (ed.). Kolokacije v slovenščini. 1st ed. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Pp. 125-157, illus. Sporazumevanje series. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6974-1.

PORI, Eva, ČIBEJ, Jaka, ARHAR HOLDT, Špela, KOSEM, Iztok. The attitude of dictionary users towards automatically extracted collocation data: a user study. In: KOSEM, Iztok (ed.), GANTAR, Polona (ed.). Kolokacije v leksikografiji : obstoječe rešitve in izzivi za prihodnost = Collocations in lexicography : existing solutions and future challenges. Ljubljana: Znanstvena založba Filozofske fakultete, 2020. Vol. 8, no. 2, pp. 168-201, illus. Slovenščina 2.0, 2, 2020. ISBN 978-961-06-0360-3. ISSN 2335-2736. https://revije.ff.uni-lj.si/slovenscina2/article/view/9143/9075, DOI: 10.4312/slo2.0.2020.2.168-201.

KOSEM, Iztok, KREK, Simon, GANTAR, Polona. Defining collocation for Slovenian lexical resources. In: KOSEM, Iztok (ed.), GANTAR, Polona (ed.). Kolokacije v leksikografiji : obstoječe rešitve in izzivi za prihodnost = Collocations in lexicography : existing solutions and future challenges. Ljubljana: Znanstvena založba Filozofske fakultete, 2020. Vol. 8, no. 2, pp. 1-27, illus. Slovenščina 2.0, 2, 2020. ISBN 978-961-06-0360-3. ISSN 2335-2736. https://revije.ff.uni-lj.si/slovenscina2/article/view/9338/9069, DOI: 10.4312/slo2.0.2020.2.1-27.

KOSEM, Iztok, KREK, Simon, GANTAR, Polona, ARHAR HOLDT, Špela, ČIBEJ, Jaka, LASKOWSKI, Cyprian. Kolokacijski slovar sodobne slovenščine. In: FIŠER, Darja (ed.), PANČUR, Andrej (ed.). Zbornik konference Jezikovne tehnologije in digitalna humanistika / Proceedings of the conference on Language Technologies & Digital Humanities, 20.-21. september 2018, Ljubljana. Ljubljana: Znanstvena založba Filozofske fakultete v Ljubljani. 2018, pp. 133-139, http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Kosem-et-al_Kolokacijski-slovar-sodobne-slovenscine.pdf.

KOSEM, Iztok, KREK, Simon, GANTAR, Polona, ARHAR HOLDT, Špela, ČIBEJ, Jaka, LASKOWSKI, Cyprian. Collocations dictionary of modern Slovene. In: ČIBEJ, Jaka (ed.), et al. Proceedings of the 18th EURALEX International Congress: lexicography in global contexts, 17-21 July 2018, Ljubljana. Ljubljana: Ljubljana University Press, Faculty of Arts. 2018, pp. 989-997, illus. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1.

KOSEM, Iztok, KOPPEL, Kristina, ZINGANO KUHN, Tanara, MICHELFEIT, Jan, TIBERIUS, Carole. Identification and automatic extraction of good dictionary examples: the case(s) of GDEX. International journal of lexicography, https://academic.oup.com/ijl/advance-article/doi/10.1093/ijl/ecy014/5075863.

GANTAR, Polona, GORJANC, Vojko, KOSEM, Iztok, KREK, Simon. Going semi-automatic and crowdsourced: collocation dictionary of Slovene. In: KOSEM, Iztok (ed.). Electronic lexicography in the 21st century: linking lexical data in the digital age. Ljubljana: Trojina, Institute for Applied Slovene Studies; Brighton: Lexical Computing. 2015, p. 37.

GORJANC, Vojko, GANTAR, Polona, KOSEM, Iztok, KREK, Simon (eds.) Slovar sodobne slovenščine: problemi in rešitve. Ljubljana: Znanstvena založba Filozofske fakultete. 2015. Partially translated in: GORJANC, Vojko, GANTAR, Polona, KOSEM, Iztok, KREK, Simon (eds.) Dictionary of modern Slovene: problems and solutions. Ljubljana: Ljubljana University Press, Faculty of Arts, 2017. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/book/15

GANTAR, Polona, KOSEM, Iztok, KREK, Simon. Discovering automated lexicography: the case of the Slovene lexical database. International journal of lexicography, 2016, vol. 29, issue 2, pp. 200-225. https://academic.oup.com/ijl/article/29/2/200/2413284/Discovering-Automated-Lexicography-The-Case-of-the?guestAccessKey=95f18766-f10f-4994-a6fa-448cf75ac55e

KOSEM, Iztok, GANTAR, Polona, KREK, Simon. Avtomatizacija leksikografskih postopkov. In: ERJAVEC, Tomaž (ed.), ŽGANEC GROS, Jerneja (ed.). Jezikovne tehnologije, Slovenščina 2.0, vol. 1, no. 2. Ljubljana: Trojina, zavod za uporabno slovenistiko. 2013, pp. 139-164. http://www.trojina.org/slovenscina2.0/arhiv/2013/2/Slo2.0_2013_2_07.pdf

ČIBEJ, Jaka, FIŠER, Darja, KOSEM, Iztok. The role of crowdsourcing in lexicography. In: KOSEM, Iztok (ed.), et al. Electronic lexicography in the 21st century: linking lexical data in the digital age. Ljubljana: Trojina, Institute for Applied Slovene Studies; Brighton: Lexical Computing. 2015, pp. 70-83. https://elex.link/elex2015/proceedings/eLex_2015_05_Cibej+Fiser+Kosem.pdf

ARHAR HOLDT, Špela, ČIBEJ, Jaka, ZWITTER VITEZ, Ana. Value of language-related questions and comments in digital media for lexicographical user research. International journal of lexicography, 2017, vol. 30, issue 3, pp. 285-308. http://ijl.oxfordjournals.org/content/early/2016/04/20/ijl.ecw017.full.pdf?keytype=ref&ijkey=SP5Yb4PHvfykRkk.

ARHAR HOLDT, Špela, KOSEM, Iztok, GANTAR, Polona. Dictionary user typology: the Slovenian case. In: MARGALITADZE, Tinatin (ed.), MELADZE, George (ed.). Lexicography and linguistic diversity: proceedings of the XVII EURALEX International Congress. Tbilisi: Ivane Javakhishvili Tbilisi State University. 2016, pp. 179-187. http://euralex2016.tsu.ge/publication2016.pdf

KOSEM, Iztok, GANTAR, Polona, KREK, Simon. Automation of lexicographic work: an opportunity for both lexicographers and crowd-sourcing. In: KOSEM, Iztok (ed.), et al. Electronic lexicography in the 21st century: thinking outside the paper. Ljubljana: Trojina, Institute for Applied Slovene Studies; Tallinn: Eesti Keele Instituut. 2013, pp. 32-48. http://eki.ee/elex2013/proceedings/eLex2013_03_Kosem+Gantar+Krek.pdf

KOSEM, Iztok, HUSAK, Milos, MCCARTHY, Diana. GDEX for Slovene. In: KOSEM, Iztok (ed.), KOSEM, Karmen (ed.). Electronic lexicography in the 21st century: new applications for new users. Ljubljana: Trojina, Institute for Applied Slovene Studies. 2011, pp. 150-159. http://www.trojina.si/elex2011/elex2011_proceedings.pdf

LOGAR, Nataša, GRČAR, Miha, BRAKUS, Marko, ERJAVEC, Tomaž, ARHAR HOLDT, Špela, KREK, Simon. Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES : gradnja, vsebina, uporaba. Ljubljana: Trojina, zavod za uporabno slovenistiko: Fakulteta za družbene vede, 2012.

KREK, Simon, GANTAR, Polona, ARHAR HOLDT, Špela, GORJANC, Vojko. Nadgradnja korpusov Gigafida, Kres, ccGigafida in ccKres. In: ERJAVEC, Tomaž (ed.), FIŠER, Darja (ed.). Zbornik konference Jezikovne tehnologije in digitalna humanistika. Ljubljana: Znanstvena založba Filozofske fakultete. 2016, pp. 200-202. http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Krek-et-al_Nadgradnja-korpusov-Gigafida-Kres-ccGigafida-ccKres.pdf

KREK, Simon, KOSEM, Iztok, GANTAR, Polona. Predlog za izdelavo Slovarja sodobnega slovenskega jezika. Edition 1.1. Ljubljana: s. n., 2013. http://www.sssj.si/datoteke/Predlog_SSSJ_v1.1.pdf

KREK, Simon, LASKOWSKI, Cyprian, ROBNIK-ŠIKONJA, Marko. From translation equivalents to synonyms: creation of a Slovene thesaurus using word co-occurrence network analysis. In: KOSEM, Iztok (ed.) et al., Proceedings of eLex 2017: Lexicography from Scratch, 19-21 September 2017, Leiden, Netherlands. https://elex.link/elex2017/wp-content/uploads/2017/09/paper05.pdf

Impressum

Kolokacije 2.2

Kolokacijski slovar sodobne slovenščine

Online dictionary at viri.cjvt.si
Viri CJVT
ISSN 2630-4015

Ljubljana, 2023

This work is licensed under a Creative Commons licence:
Creative Commons Attribution-ShareAlike 4.0 International.

Edited by
Iztok Kosem (member of editorial board, author)
Špela Arhar Holdt (member of editorial board, author)
Simon Krek (member of editorial board, author)
Polona Gantar (member of editorial board, author)
Eva Pori (member of editorial board, author)
Urška Kamenšek (author)
Primož Ponikvar (author)
Rebeka Roblek (author)
Jure Šešet (author)
Petra Zaranšek (author)
Karolina Zgaga (author)
Jaka Čibej (member of editorial board)
Bojan Klemenc (member of editorial board)
Cyprian Laskowski (member of editorial board)
Kaja Dobrovoljc (member of editorial board)
Vojko Gorjanc (member of editorial board)
Nikola Ljubešić (member of editorial board)

Interface design
Gašper Uršič
Gregor Makovec
(Studio Kruh)

Interface development
Leon Noe Jovan

Issued by
Centre for Language Resources and Technologies, University of Ljubljana
Ljubljana University Press, Faculty of Arts

For the issuer
Mojca Schlamberger Brezar, Dean of the Faculty of Arts, University of Ljubljana

Published by
Ljubljana University Press, Faculty of Arts
(until 2022) University of Ljubljana Academic Press

For the publisher
Gregor Majdič, Rector of the University of Ljubljana

Citation
Kolokacije 2.2: Collocations Dictionary of Modern Slovene, viri.cjvt.si/kolokacije, accessed on 07. 07. 2026.

Versions

Collocations Dictionary of Modern Slovene 2.2

Date of publication: 28. 11. 2025
Number of headwords: 0
Number of collocations: 0
Number of examples: 0

URL: http://hdl.handle.net/11356/2090

Collocations Dictionary of Modern Slovene 2.1

Date of publication: 24. 11. 2024
Number of headwords: 81,442
Number of collocations: 4,462,007
Number of examples: 14,542,117

Collocations Dictionary of Modern Slovene 2.0

Date of publication: 15. 11. 2022
Number of headwords: 81,443
Number of collocations: 4,491,958
Number of examples: 14,595,325

URL: http://hdl.handle.net/11356/1933

Collocations Dictionary of Modern Slovene 1.0

Date of publication: 16. 10. 2018
Number of headwords: 35,989
Number of collocations: 7,338,801
Number of examples: 34,935,880

URL: http://hdl.handle.net/11356/1250