Language Monitor

The bibliographical data for the Language Monitor can be found at the link below.

Current version

1.0

Date of last update:
24. 1. 2021



About the Language Monitor

The Language Monitor shows temporal trends in the use of words and N-grams. It draws on two sources: the Gigafida 2.0 reference corpus of Modern Slovene (Krek et al. 2020) for the period up to 2018, and, from 2019 onwards, the IJS NewsFeed service, which extracts texts from over 100 different Slovenian online sources. From these two resources, two corpora are compiled: a focus corpus containing texts from the period under analysis (e.g. 2020) and a larger corpus containing texts from the longer preceding period (e.g. 1991-2019). For each of the two corpora, word lists and N-gram lists are then extracted using the LIST 1.2 software (Krsnik et al. 2019).
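The split into a focus corpus and a larger preceding corpus, followed by list extraction, can be illustrated with a minimal Python sketch. This is not how LIST works internally. The function names, the (year, text) input format and the naive whitespace tokenization are all assumptions made for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_lists(documents, focus_year, max_n=2):
    """Build frequency lists for a focus corpus and a preceding corpus.

    documents: iterable of (year, text) pairs (hypothetical input format).
    Texts from focus_year go to the focus list, earlier texts to the
    reference list, and later texts are ignored.
    """
    focus, reference = Counter(), Counter()
    for year, text in documents:
        if year > focus_year:
            continue
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        tokens = text.lower().split()
        target = focus if year == focus_year else reference
        for n in range(1, max_n + 1):
            target.update(ngrams(tokens, n))
    return focus, reference
```

Words are stored as 1-tuples and longer N-grams as n-tuples, so a single counter per corpus can hold both kinds of list.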

For identifying the keywords of the period under analysis, the Simple Maths statistical method (Kilgarriff 2009) is used. The method compares relative frequencies of words and N-grams in the focus corpus with those in the larger corpus. The words with the highest Simple Maths scores thus exhibit the highest increase in usage compared to previous years.
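As described by Kilgarriff (2009), the Simple Maths score is the ratio of the two normalized frequencies (per million tokens), with a smoothing constant added to both. The sketch below illustrates that calculation. The function name and the default constant of 1 are assumptions, since the value used by the Language Monitor is not stated here.

```python
def simple_maths(focus_count, focus_size, ref_count, ref_size, n=1.0):
    """Simple Maths keyword score (Kilgarriff 2009).

    focus_count / ref_count: raw frequency of the item in each corpus.
    focus_size / ref_size: total number of tokens in each corpus.
    n: smoothing constant (assumed default; not taken from the Language Monitor).
    """
    fpm_focus = focus_count * 1_000_000 / focus_size  # frequency per million
    fpm_ref = ref_count * 1_000_000 / ref_size
    return (fpm_focus + n) / (fpm_ref + n)


# A word seen 50 times in a 1M-token focus corpus and
# 10 times in a 10M-token preceding corpus:
print(simple_maths(50, 1_000_000, 10, 10_000_000))  # 25.5
```

Adding the constant keeps the score defined for items that do not occur in the larger corpus at all, and a larger constant shifts the ranking towards more frequent items.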

Each word featured in the Language Monitor is accompanied by a graph showing its usage in the selected time period. Users can also compare the temporal trends of several different words. For the top 100 words, their most frequent N-grams are also provided.



Bibliography

KILGARRIFF, Adam. Simple maths for keywords. In: Mahlberg, M., González-Díaz, V. & Smith, C. (eds.), Proceedings of Corpus Linguistics Conference CL2009, University of Liverpool, UK, July 2009. https://www.sketchengine.eu/wp-content/uploads/2015/04/2009-Simple-maths-for-keywords.pdf

KREK, Simon, ARHAR HOLDT, Špela, ERJAVEC, Tomaž, ČIBEJ, Jaka, REPAR, Andraž, GANTAR, Polona, LJUBEŠIĆ, Nikola, KOSEM, Iztok, DOBROVOLJC, Kaja. Gigafida 2.0: The Reference Corpus of Written Standard Slovene. In: Proceedings of the 12th Language Resources and Evaluation Conference, 2020. European Language Resources Association, pp. 3340-3345. https://www.aclweb.org/anthology/2020.lrec-1.409

KRSNIK, Luka, ARHAR HOLDT, Špela, ČIBEJ, Jaka, DOBROVOLJC, Kaja, KLJUČEVŠEK, Aleksander, KREK, Simon, ROBNIK ŠIKONJA, Marko. Corpus extraction tool LIST 1.0 (CLARIN.SI data & tools). Ljubljana: Centre for Language Resources and Technologies: Faculty of Computer and Information Science; Jožef Stefan Institute, 2019. https://www.clarin.si/repository/xmlui/handle/11356/1227

TRAMPUŠ, Mitja, NOVAK, Blaž. The Internals Of An Aggregated Web News Feed. In: Proceedings of the 15th Multiconference on Information Society 2012 (IS-2012). http://ailab.ijs.si/dunja/SiKDD2012/Papers/Trampus_Newsfeed.pdf