{"id":3237,"date":"2025-11-24T10:11:44","date_gmt":"2025-11-24T08:11:44","guid":{"rendered":"http:\/\/158.129.51.247:8888\/?p=3237"},"modified":"2025-11-24T10:11:45","modified_gmt":"2025-11-24T08:11:45","slug":"key-events-the-vectorized-lithuanian-language-model","status":"publish","type":"post","link":"https:\/\/clarin-lt.lt\/?p=3237","title":{"rendered":"Key events. The vectorized Lithuanian language model"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"19\" src=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1024x19.png\" alt=\"\" class=\"wp-image-3232\" srcset=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1024x19.png 1024w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-300x5.png 300w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-768x14.png 768w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1536x28.png 1536w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-100x2.png 100w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-150x3.png 150w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-200x4.png 200w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-450x8.png 450w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-600x11.png 600w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-900x16.png 900w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15.png 1650w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>On 3 November 2025, the <em>Lithuanian Language Vector Model<\/em> (<em>LT-MLKM-modernBERT<\/em>) developed by the <a href=\"https:\/\/vssa.lrv.lt\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">State Digital Solutions Agency<\/a> (SDSA) in collaboration with <a 
href=\"https:\/\/www.vdu.lt\/en\/\" target=\"_blank\" rel=\"noreferrer noopener\">Vytautas Magnus University (VMU)<\/a>, <a href=\"https:\/\/www.neurotechnology.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">UAB Neurotechnology<\/a>, <a href=\"https:\/\/tilde.ai\/lt\/\" target=\"_blank\" rel=\"noreferrer noopener\">UAB Tilde Lietuva<\/a>, and <a href=\"https:\/\/card-ai.eu\/\" target=\"_blank\" rel=\"noreferrer noopener\">MB Krilas<\/a> was publicly released as part of the project &#8220;Development of the General Lithuanian Language Corpus and Vectorised Lithuanian Language Models&#8221;. Further details about the project are available on the website of the <a href=\"https:\/\/sitti.vdu.lt\/bendrasis-lietuviu-kalbos-tekstynas-ir-vektorizuoti-modeliai\/\" target=\"_blank\" rel=\"noreferrer noopener\">Institute of Digital Resources and Interdisciplinary Research (SITTI)<\/a>.<\/p>\n\n\n\n<p>The SDSA project manager is A. Rakauskas, and the supplier group leader is Assoc. Prof. Dr. 
<a href=\"https:\/\/www.vdu.lt\/cris\/entities\/person\/andrius-utka\/events\">Andrius Utka<\/a>.<\/p>\n\n\n\n<p><em>LT-MLKM-modernBERT<\/em> is a Lithuanian masked language model (MLM) built on the <em>ModernBERT<\/em> architecture and pre-trained on the <em>BLKT Lithuanian Text Corpus Stage 3<\/em>, comprising over&nbsp;1.87 billion words and 49 billion training tokens from diverse Lithuanian sources, including news, legal, academic, and public-sector texts. With a context length of 8,192 tokens, the model can process long documents while preserving linguistic accuracy and textual coherence.<\/p>\n\n\n\n<p>The <em>LT-MLKM-modernBERT<\/em> model is a high-quality Lithuanian language resource featuring pre-trained neural models designed to advance research and development in artificial intelligence and to facilitate the practical adaptation of digital innovations to real-world requirements.<\/p>\n\n\n\n<p>The complete model information, including the model description, instructions for getting started, usage options, training details, limitations, and more, can be found <a href=\"https:\/\/huggingface.co\/VSSA-SDSA\/LT-MLKM-modernBERT\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<p>The <em>LT-MLKM-modernBERT<\/em> model is openly available on the <a href=\"https:\/\/huggingface.co\/VSSA-SDSA\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face<\/a> platform.<\/p>\n\n\n\n<p>You can also read about the <em>LT-MLKM-modernBERT<\/em> model in the following sources:<\/p>\n\n\n\n<p><a href=\"https:\/\/vssa.lrv.lt\/lt\/naujienos\/sukurtas-pirmasis-lietuviu-kalbos-dirbtinio-intelekto-modelis-lietuviu-tyreju-zingsnis-i-di-ateiti-yx9\/?fbclid=IwZXh0bgNhZW0CMTAAYnJpZBEwb2F6OHpOT05MNHJ1c2ZNVnNydGMGYXBwX2lkEDIyMjAzOTE3ODgyMDA4OTIAAR5l-nO2Et3-8WgIqBe4DHOCm6IiiwqPAwyOKMpdP4fWM7RSD7DxMa-H4O1_rg_aem_se7T12rb8vzXeIqqy4xXvg\" target=\"_blank\" rel=\"noreferrer noopener\">VSSA article 
1<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/data.gov.lt\/datasets\/3923\/\" target=\"_blank\" rel=\"noreferrer noopener\">VSSA article 2<\/a><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" href=\"https:\/\/www.lrytas.lt\/it\/dirbtinis-intelektas\/2025\/11\/10\/news\/lietuviu-tyreju-zingsnis-i-di-ateiti-sukurtas-pirmasis-lietuviu-kalbos-dirbtinio-intelekto-modelis-40221793\" target=\"_blank\">Lietuvos Rytas article<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"19\" src=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1024x19.png\" alt=\"\" class=\"wp-image-3232\" srcset=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1024x19.png 1024w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-300x5.png 300w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-768x14.png 768w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1536x28.png 1536w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-100x2.png 100w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-150x3.png 150w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-200x4.png 200w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-450x8.png 450w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-600x11.png 600w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-900x16.png 900w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15.png 1650w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>On 3 November 2025, the Lithuanian Language Vector Model (LT-MLKM-modernBERT) developed by the State Digital Solutions Agency (SDSA) in collaboration with Vytautas Magnus University (VMU), UAB Neurotechnology, UAB Tilde Lietuva, and MB Krilas, 
was publicly released as a part of<span class=\"ellipsis\">&hellip;<\/span><\/p>\n<div class=\"read-more\"><a href=\"https:\/\/clarin-lt.lt\/?p=3237\">Read more &#8250;<\/a><\/div>\n<p><!-- end of .read-more --><\/p>\n","protected":false},"author":7,"featured_media":3233,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3237","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts\/3237","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3237"}],"version-history":[{"count":1,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts\/3237\/revisions"}],"predecessor-version":[{"id":3238,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts\/3237\/revisions\/3238"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/media\/3233"}],"wp:attachment":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3237"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3237"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3237"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}