About the Collocations Dictionary

Collocations are typical co-occurrences of words and represent an important part of the language. By providing information on what is typical in language, collocations dictionaries are useful in language production and acquisition. The Collocations Dictionary of Modern Slovene, which in its second version contains 81,443 headwords and 4,491,958 collocations, is the first dictionary of collocations for Slovene and represents the first step towards filling the gap in the field of language resources for Slovene, particularly those aimed at facilitating language production.

The dictionary has been compiled using advanced computational methods for the automatic extraction of Slovene collocations, which have already been tested and evaluated and are constantly being improved. Computer-assisted data preparation is significantly cheaper and faster than manual processing. This enables regular updates and upgrades to the resource, making the dictionary a dynamic source of language information.

The Collocations Dictionary of Modern Slovene is the second responsive dictionary published in Slovenia (the first being the Thesaurus of Modern Slovene). With responsive dictionaries, the compilation of the dictionary database is immediately followed by providing the language community with access to a large amount of relevant, albeit somewhat noisy, language information. One of the main advantages of responsive dictionaries is that the data can be updated quickly in response to both progress in the database and changes in language use.

Based on the findings of various user studies, we have introduced a new format of entry presentation. The five stages of development used in the first version, indicated by different levels of colouration of the pyramid icon, have been replaced by three stages, which are easily identifiable to the users through the section titles:

  • Stage 1: contains only the section “Automatically extracted collocations”. For these entries, sense division and collocation analysis have not yet been conducted.
  • Stage 2: contains both the “Collocations” and “Automatically extracted collocations” sections. These entries already contain sense division. Some of the collocations have already been manually analysed and distributed under the relevant senses, while the remaining ones are still listed as automatically extracted.
  • Stage 3: contains only the section “Collocations”. These entries contain sense division and manually analysed collocations.
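
The rule described in the list above can be sketched in a few lines of Python. This is only an illustration of the logic, not the dictionary's actual implementation: the function name `entry_stage`, the `Stage` enumeration and the idea of passing in a list of section titles are assumptions made for the example; only the two section titles themselves come from the description above.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the three stages of entry development."""
    STAGE_1 = 1
    STAGE_2 = 2
    STAGE_3 = 3


# Section titles as they are named in the list above.
ANALYSED = "Collocations"
AUTOMATIC = "Automatically extracted collocations"


def entry_stage(section_titles):
    """Infer the stage of an entry from the titles of its sections.

    `section_titles` is assumed to be any iterable of strings; how the
    titles are obtained from a real entry is outside this sketch.
    """
    titles = set(section_titles)
    has_analysed = ANALYSED in titles
    has_automatic = AUTOMATIC in titles

    if has_analysed and has_automatic:
        return Stage.STAGE_2  # partly analysed, partly automatic
    if has_analysed:
        return Stage.STAGE_3  # fully analysed
    if has_automatic:
        return Stage.STAGE_1  # automatic data only
    raise ValueError("entry contains neither collocation section")


print(entry_stage(["Automatically extracted collocations"]))
```

Under these assumptions, an entry with both sections is classified as Stage 2, and an entry with neither section is reported as an error rather than silently assigned a stage.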

The timestamp, along with archived previous versions of the dictionary database, enables tracking any changes in dictionary entries.

An important addition is a new, adapted method of user participation in improving the dictionary. This method has been introduced based on research and user studies. The users can now evaluate automatically selected good dictionary examples, focussing on whether the selected corpus sentence suitably exemplifies the selected collocation. If the entry already contains senses, the users can also evaluate whether the example is listed under the correct sense or, in the case of automatically extracted collocations, select the relevant sense for the example (and the collocation).

Although automatic data extraction and ranking are never perfectly accurate, the results are useful to dictionary users even before lexicographic post-processing. This has been confirmed by resources and tools for other languages (e.g. Merriam-Webster, the Digital Dictionary of German DWDS), which include automatically extracted data in their entries. A successful example of an automatically generated language resource for Slovene is the Thesaurus of Modern Slovene. The effectiveness of automatic collocation extraction has also been confirmed by a linguistic evaluation, in which the automatically extracted collocations in the ten most frequent syntactic structures of 333 headwords were rated as adequate or inadequate. For the second version of the dictionary, the method of automatic collocation extraction has been improved, enabling the inclusion of collocations from additional syntactic structures in dictionary entries.

The Collocations Dictionary of Modern Slovene is part of an organized effort to establish an infrastructure for Slovene that is comparable to the infrastructures of larger languages. We believe that, in terms of methodology, the construction of language resources should follow the contemporary zeitgeist and that all data prepared through publicly financed initiatives and projects should be openly accessible for the further development of language technologies, considering the actual needs of modern language users in the digital age. The process of constructing the Collocations Dictionary of Modern Slovene thus also puts considerable effort into establishing a dedicated community that not only uses the dictionary but also contributes to its development.

Publications

PORI, Eva, KOSEM, Iztok, ČIBEJ, Jaka, ARHAR HOLDT, Špela. Evalvacija uporabniškega vmesnika Kolokacijskega slovarja sodobne slovenščine. V: KOSEM, Iztok (ur.). Kolokacije v slovenščini. 1. izd. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Str. 235-268, ilustr. Zbirka Sporazumevanje. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6973-1.

GANTAR, Polona, KREK, Simon, KOSEM, Iztok. Opredelitev kolokacij v digitalnih slovarskih virih za slovenščino. V: KOSEM, Iztok (ur.). Kolokacije v slovenščini. 1. izd. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Str. 15-41, ilustr. Zbirka Sporazumevanje. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6969-1.

KREK, Simon, GANTAR, Polona, KOSEM, Iztok, DOBROVOLJC, Kaja. Opis modela za pridobivanje in strukturiranje kolokacijskih podatkov iz korpusa. V: ARHAR HOLDT, Špela (ur.). Nova slovnica sodobne standardne slovenščine : viri in metode. 1. izd. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Str. 160-194, ilustr. Zbirka Sporazumevanje. ISBN 978-961-06-0547-8. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/325/477/7313-1.

KOSEM, Iztok, LOGAR, Nataša, DOBROVOLJC, Kaja, LJUBEŠIĆ, Nikola. Razvrščanje in relevantnost kolokatorjev v slovenščini : novi pristopi. V: KOSEM, Iztok (ur.). Kolokacije v slovenščini. 1. izd. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Str. 79-124, ilustr. Zbirka Sporazumevanje. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6971-1.

LJUBEŠIĆ, Nikola, LOGAR, Nataša, KOSEM, Iztok. Collocation ranking : frequency vs semantics. Slovenščina 2.0 : empirične, aplikativne in interdisciplinarne raziskave. 2021, letn. 9, št. 2, str. 41-70, ilustr. ISSN 2335-2736. https://revije.ff.uni-lj.si/slovenscina2/article/view/10365/9997, DOI: 10.4312/slo2.0.2021.2.41-70.

ARHAR HOLDT, Špela. Razvrstitev kolokacij v slovarskem vmesniku : uporabniške prioritete. V: KOSEM, Iztok (ur.). Kolokacije v slovenščini. 1. izd. Ljubljana: Znanstvena založba Filozofske fakultete, 2021. Str. 125-157, ilustr. Zbirka Sporazumevanje. ISBN 978-961-06-0537-9. ISSN 2738-4527. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/318/465/6974-1.

PORI, Eva, ČIBEJ, Jaka, ARHAR HOLDT, Špela, KOSEM, Iztok. The attitude of dictionary users towards automatically extracted collocation data: a user study. V: KOSEM, Iztok (ur.), GANTAR, Polona (ur.). Kolokacije v leksikografiji : obstoječe rešitve in izzivi za prihodnost = Collocations in lexicography : existing solutions and future challenges. Ljubljana: Znanstvena založba Filozofske fakultete, 2020. Letn. 8, št. 2, str. 168-201, ilustr. Slovenščina 2.0, 2, 2020. ISBN 978-961-06-0360-3. ISSN 2335-2736. https://revije.ff.uni-lj.si/slovenscina2/article/view/9143/9075, DOI: 10.4312/slo2.0.2020.2.168-201.

KOSEM, Iztok, KREK, Simon, GANTAR, Polona. Defining collocation for Slovenian lexical resources. V: KOSEM, Iztok (ur.), GANTAR, Polona (ur.). Kolokacije v leksikografiji : obstoječe rešitve in izzivi za prihodnost = Collocations in lexicography : existing solutions and future challenges. Ljubljana: Znanstvena založba Filozofske fakultete, 2020. Letn. 8, št. 2, str. 1-27, ilustr. Slovenščina 2.0, 2, 2020. ISBN 978-961-06-0360-3. ISSN 2335-2736. https://revije.ff.uni-lj.si/slovenscina2/article/view/9338/9069, DOI: 10.4312/slo2.0.2020.2.1-27.

KOSEM, Iztok, KREK, Simon, GANTAR, Polona, ARHAR HOLDT, Špela, ČIBEJ, Jaka, LASKOWSKI, Cyprian. Kolokacijski slovar sodobne slovenščine. V: FIŠER, Darja (ur.), PANČUR, Andrej (ur.). Zbornik konference Jezikovne tehnologije in digitalna humanistika / Proceedings of the conference on Language Technologies & Digital Humanities, 20.-21. september 2018, Ljubljana. Ljubljana: Znanstvena založba Filozofske fakultete v Ljubljani. 2018, str. 133-139, http://www.sdjt.si/wp/wp-content/uploads/2018/09/JTDH-2018_Kosem-et-al_Kolokacijski-slovar-sodobne-slovenscine.pdf.

KOSEM, Iztok, KREK, Simon, GANTAR, Polona, ARHAR HOLDT, Špela, ČIBEJ, Jaka, LASKOWSKI, Cyprian. Collocations dictionary of modern Slovene. V: ČIBEJ, Jaka (ur.), et al. Proceedings of the 18th EURALEX International Congress: lexicography in global contexts, 17-21 July 2018, Ljubljana. Ljubljana: Ljubljana University Press, Faculty of Arts. 2018, str. 989-997, ilustr. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1.

KOSEM, Iztok, KOPPEL, Kristina, ZINGANO KUHN, Tanara, MICHELFEIT, Jan, TIBERIUS, Carole. Identification and automatic extraction of good dictionary examples: the case(s) of GDEX. International journal of lexicography, https://academic.oup.com/ijl/advance-article/doi/10.1093/ijl/ecy014/5075863.

GANTAR, Polona, GORJANC, Vojko, KOSEM, Iztok, KREK, Simon. Going semi-automatic and crowdsourced: collocation dictionary of Slovene. V: KOSEM, Iztok (ur.). Electronic lexicography in the 21st century: linking lexical data in the digital age. Ljubljana: Trojina, Institute for Applied Slovene Studies; Brighton: Lexical Computing. 2015, str. 37.

GORJANC, Vojko, GANTAR, Polona, KOSEM, Iztok, KREK, Simon (ur.) Slovar sodobne slovenščine: problemi in rešitve. Ljubljana: Znanstvena založba Filozofske fakultete. 2015. Partly translated in: GORJANC, Vojko, GANTAR, Polona, KOSEM, Iztok, KREK, Simon (ur.) Dictionary of modern Slovene: problems and solutions. Ljubljana: Ljubljana University Press, Faculty of Arts, 2017. https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/book/15

GANTAR, Polona, KOSEM, Iztok, KREK, Simon. Discovering automated lexicography: the case of the Slovene lexical database. International journal of lexicography, 2016, vol. 29, issue 2, str. 200-225. https://academic.oup.com/ijl/article/29/2/200/2413284/Discovering-Automated-Lexicography-The-Case-of-the?guestAccessKey=95f18766-f10f-4994-a6fa-448cf75ac55e

KOSEM, Iztok, GANTAR, Polona, KREK, Simon. Avtomatizacija leksikografskih postopkov. V: ERJAVEC, Tomaž (ur.), ŽGANEC GROS, Jerneja (ur.). Jezikovne tehnologije, Slovenščina 2.0, letn. 1, št. 2. Ljubljana: Trojina, zavod za uporabno slovenistiko. 2013, str. 139-164. http://www.trojina.org/slovenscina2.0/arhiv/2013/2/Slo2.0_2013_2_07.pdf

ČIBEJ, Jaka, FIŠER, Darja, KOSEM, Iztok. The role of crowdsourcing in lexicography. V: KOSEM, Iztok (ur.), et al. Electronic lexicography in the 21st century: linking lexical data in the digital age. Ljubljana: Trojina, Institute for Applied Slovene Studies; Brighton: Lexical Computing. 2015, str. 70-83. https://elex.link/elex2015/proceedings/eLex_2015_05_Cibej+Fiser+Kosem.pdf

ARHAR HOLDT, Špela, ČIBEJ, Jaka, ZWITTER VITEZ, Ana. Value of language-related questions and comments in digital media for lexicographical user research. International journal of lexicography, 2017, vol. 30, issue 3, str. 285-308. http://ijl.oxfordjournals.org/content/early/2016/04/20/ijl.ecw017.full.pdf?keytype=ref&ijkey=SP5Yb4PHvfykRkk.

ARHAR HOLDT, Špela, KOSEM, Iztok, GANTAR, Polona. Dictionary user typology: the Slovenian case. V: MARGALITADZE, Tinatin (ur.), MELADZE, George (ur.). Lexicography and linguistic diversity: proceedings of the XVII EURALEX International Congress. Tbilisi: Ivane Javakhishvili Tbilisi State University. 2016, str. 179-187. http://euralex2016.tsu.ge/publication2016.pdf

KOSEM, Iztok, GANTAR, Polona, KREK, Simon. Automation of lexicographic work: an opportunity for both lexicographers and crowd-sourcing. V: KOSEM, Iztok (ur.), et al. Electronic lexicography in the 21st century: thinking outside the paper. Ljubljana: Trojina, Institute for Applied Slovene Studies; Tallinn: Eesti Keele Instituut. 2013, str. 32-48. http://eki.ee/elex2013/proceedings/eLex2013_03_Kosem+Gantar+Krek.pdf

KOSEM, Iztok, HUSAK, Milos, MCCARTHY, Diana. GDEX for Slovene. V: KOSEM, Iztok (ur.), KOSEM, Karmen (ur.). Electronic lexicography in the 21st century: new applications for new users. Ljubljana: Trojina, Institute for Applied Slovene Studies. 2011, str. 150-159. http://www.trojina.si/elex2011/elex2011_proceedings.pdf

LOGAR, Nataša, GRČAR, Miha, BRAKUS, Marko, ERJAVEC, Tomaž, ARHAR HOLDT, Špela, KREK, Simon. Korpusi slovenskega jezika Gigafida, KRES, ccGigafida in ccKRES : gradnja, vsebina, uporaba. Ljubljana: Trojina, zavod za uporabno slovenistiko: Fakulteta za družbene vede, 2012.

KREK, Simon, GANTAR, Polona, ARHAR HOLDT, Špela, GORJANC, Vojko. Nadgradnja korpusov Gigafida, Kres, ccGigafida in ccKres. V: ERJAVEC, Tomaž (ur.), FIŠER, Darja (ur.). Zbornik konference Jezikovne tehnologije in digitalna humanistika. Ljubljana: Znanstvena založba Filozofske fakultete. 2016, str. 200-202. http://www.sdjt.si/wp/wp-content/uploads/2016/09/JTDH-2016_Krek-et-al_Nadgradnja-korpusov-Gigafida-Kres-ccGigafida-ccKres.pdf

KREK, Simon, KOSEM, Iztok, GANTAR, Polona. Predlog za izdelavo Slovarja sodobnega slovenskega jezika. Izd. 1.1. Ljubljana: s. n., 2013. http://www.sssj.si/datoteke/Predlog_SSSJ_v1.1.pdf

KREK, Simon, LASKOWSKI, Cyprian, ROBNIK-ŠIKONJA, Marko. From translation equivalents to synonyms: creation of a Slovene thesaurus using word co-occurrence network analysis. V: KOSEM, Iztok (ur.) et al., Proceedings of eLex 2017: Lexicography from Scratch, 19-21 September 2017, Leiden, Netherlands. https://elex.link/elex2017/wp-content/uploads/2017/09/paper05.pdf

Impressum

Kolokacije 2.0

Kolokacijski slovar sodobne slovenščine

Online dictionary at viri.cjvt.si
Viri CJVT
ISSN 2630-4015

Ljubljana, 2023

This work is licensed under a Creative Commons licence:
Creative Commons Attribution-ShareAlike 4.0 International.

Edited by
Iztok Kosem
Špela Arhar Holdt
Simon Krek
Polona Gantar
Eva Pori
Jaka Čibej
Bojan Klemenc
Cyprian Laskowski
Kaja Dobrovoljc
Vojko Gorjanc
Nikola Ljubešić

Interface design
Gašper Uršič
Gregor Makovec
(Studio Kruh)

Interface development
Leon Noe Jovan

Issued by
Centre for Language Resources and Technologies, University of Ljubljana

Published by
Ljubljana University Press, Faculty of Arts

For the publisher
Mojca Schlamberger Brezar, Dean of the Faculty of Arts, University of Ljubljana

Citation
Kolokacije 2.0: Collocations Dictionary of Modern Slovene, viri.cjvt.si/kolokacije, accessed on 19. 04. 2024.

Versions

Collocations Dictionary of Modern Slovene 2.0

Date of publication: 15. 11. 2022
Number of headwords: 81,443
Number of collocations: 4,491,958
Number of examples: 14,595,325


Collocations Dictionary of Modern Slovene 1.0

Date of publication: 16. 10. 2018
Number of headwords: 35,989
Number of collocations: 7,338,801
Number of examples: 34,935,880

URL: http://hdl.handle.net/11356/1250