{"id":3231,"date":"2025-11-24T10:04:00","date_gmt":"2025-11-24T08:04:00","guid":{"rendered":"http:\/\/158.129.51.247:8888\/?p=3231"},"modified":"2025-11-24T10:04:01","modified_gmt":"2025-11-24T08:04:01","slug":"svarbus-ivykiai-mazasis-lietuviu-kalbos-vektorizuotas-modelis","status":"publish","type":"post","link":"https:\/\/clarin-lt.lt\/?p=3231","title":{"rendered":"Important Events: Small Lithuanian Language Vectorized Model"},
"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"19\" src=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1024x19.png\" alt=\"\" class=\"wp-image-3232\" srcset=\"https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1024x19.png 1024w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-300x5.png 300w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-768x14.png 768w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-1536x28.png 1536w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-100x2.png 100w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-150x3.png 150w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-200x4.png 200w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-450x8.png 450w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-600x11.png 600w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15-900x16.png 900w, https:\/\/clarin-lt.lt\/wp-content\/uploads\/2025\/11\/image-15.png 1650w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>On 3 November 2025, the <em>Small Lithuanian Language Vectorized Model<\/em> (<em>LT-MLKM-modernBERT<\/em>) became publicly available and open for use. It was developed by the <a href=\"https:\/\/vssa.lrv.lt\/lt\/\">State Digital Solutions Agency (VSSA)<\/a> together with <a href=\"https:\/\/www.vdu.lt\/lt\/\">Vytautas Magnus University (VDU)<\/a>, <a href=\"https:\/\/www.neurotechnology.com\/\">UAB Neurotechnology<\/a>, <a href=\"https:\/\/tilde.ai\/lt\/\">UAB Tilde Lietuva<\/a> and <a href=\"https:\/\/card-ai.eu\/\">MB Krilas<\/a> within the project \u201cDevelopment of the General Lithuanian Language Corpus and Vectorized Models\u201d. More information about the project is available on the website of the <a href=\"https:\/\/sitti.vdu.lt\/bendrasis-lietuviu-kalbos-tekstynas-ir-vektorizuoti-modeliai\/\">Institute of Digital Resources and Interdisciplinary Research<\/a>.<\/p>\n\n\n\n<p>The VSSA project manager is A. Rakauskas, and the head of the supplier group is Assoc. Prof. Dr. <a href=\"https:\/\/www.vdu.lt\/cris\/entities\/person\/andrius-utka\/events\">Andrius Utka<\/a>.<\/p>\n\n\n\n<p><em>LT-MLKM-modernBERT<\/em> is a Lithuanian neural language model of the masked language model (<em>MLM<\/em>) type. It is built on the <em>ModernBERT<\/em> architecture and pre-trained on the <em>General Lithuanian Language Corpus<\/em> (<em>BLKT Lithuanian Text Corpus Stage 3<\/em>). The corpus comprises more than 1.87 billion words and 49 billion training tokens drawn from a variety of Lithuanian-language sources, including news, legal, academic and public-discourse texts. The model's context length is 8,192 tokens, so it can process long documents efficiently while preserving linguistic accuracy and coherence. As a high-quality pre-trained neural model for Lithuanian, <em>LT-MLKM-modernBERT<\/em> will support research in artificial intelligence and help apply digital innovations to real-world needs.<\/p>\n\n\n\n<p>Full details of the <em>LT-MLKM-modernBERT<\/em> model are provided on its <a href=\"https:\/\/huggingface.co\/VSSA-SDSA\/LT-MLKM-modernBERT\">model page<\/a>. These include its architecture, getting-started instructions, possible uses, pre-training data, license type and other useful information. You can also read about the <a href=\"https:\/\/github.com\/VSSA-AtvirasKodas-LT\/LT_AI-NER?tab=readme-ov-file\">MLKVM validation solution<\/a>.<\/p>\n\n\n\n<p>The <em>Small Lithuanian Language Vectorized Model<\/em> (<em>LT-MLKM-modernBERT<\/em>) is openly available on the <a href=\"https:\/\/huggingface.co\/VSSA-SDSA\">Hugging Face<\/a> platform.<\/p>\n\n\n\n<p>Below is an explanation of <em>masked language modeling<\/em> and of the difference between the <em>BERT<\/em> and <em>ModernBERT<\/em> models.<\/p>\n\n\n\n<p><em>Masked language modeling<\/em> is widely used in <em>natural language processing<\/em> (NLP). It is a self-supervised pre-training technique, typically applied to Transformer encoder networks. Some tokens in a text are hidden (masked), and the model learns to predict them from the known part of the sentence or its wider context. Because the model sees the words on both sides of each masked position, this training captures bidirectional context and encourages a deeper understanding of syntax and semantics in text.<\/p>\n\n\n\n<p>BERT stands for <em>Bidirectional Encoder Representations from Transformers<\/em>. ModernBERT is an updated version of the BERT architecture, designed to improve performance across natural language processing tasks. Its changes include rotary positional embeddings, alternating local and global attention, and a longer context window (8,192 tokens instead of the original BERT's 512). Together, these improve language processing and contextual understanding.<\/p>\n\n\n\n<p>Read more about the <em>LT-MLKM-modernBERT<\/em> model in the following sources:<\/p>\n\n\n\n<p>the <a href=\"https:\/\/vssa.lrv.lt\/lt\/\">State Digital Solutions Agency<\/a> article <a href=\"https:\/\/vssa.lrv.lt\/lt\/naujienos\/sukurtas-pirmasis-lietuviu-kalbos-dirbtinio-intelekto-modelis-lietuviu-tyreju-zingsnis-i-di-ateiti-yx9\/?fbclid=IwZXh0bgNhZW0CMTAAYnJpZBEwb2F6OHpOT05MNHJ1c2ZNVnNydGMGYXBwX2lkEDIyMjAzOTE3ODgyMDA4OTIAAR5l-nO2Et3-8WgIqBe4DHOCm6IiiwqPAwyOKMpdP4fWM7RSD7DxMa-H4O1_rg_aem_se7T12rb8vzXeIqqy4xXvg\"><em>The first Lithuanian language artificial intelligence model has been created: Lithuanian researchers' step into the AI future<\/em><\/a>;<\/p>\n\n\n\n<p>the <a href=\"https:\/\/vssa.lrv.lt\/lt\/\">State Digital Solutions Agency<\/a> dataset entry <a href=\"https:\/\/data.gov.lt\/datasets\/3923\/\">Small Lithuanian Language Vectorized Model<\/a> on data.gov.lt;<\/p>\n\n\n\n<p>the <a href=\"https:\/\/www.lrytas.lt\/\">Lietuvos rytas<\/a> article <a href=\"https:\/\/www.lrytas.lt\/it\/dirbtinis-intelektas\/2025\/11\/10\/news\/lietuviu-tyreju-zingsnis-i-di-ateiti-sukurtas-pirmasis-lietuviu-kalbos-dirbtinio-intelekto-modelis-40221793\">Lithuanian researchers' step into the AI future: the first Lithuanian language artificial intelligence model has been created<\/a>.<\/p>\n\n\n\n<p>The project contributes to the implementation of the activity \u201cDevelopment of language resources for the needs of artificial intelligence technology solutions\u201d. This activity falls under progress measure No. 05-002-01-07-08, \u201cDevelop technological solutions and tools that enable safe and convenient use of services\u201d, of the 2021\u20132030 State Digitalisation Development Programme of the Ministry of the Economy and Innovation of the Republic of Lithuania.<\/p>\n\n\n\n<p>The project is funded by the Recovery and Resilience Facility (RRF).<\/p>\n","protected":false},
"excerpt":{"rendered":"<p>On 3 November 2025, the Small Lithuanian Language Vectorized Model (LT-MLKM-modernBERT) became publicly available and open for use. It was developed by the State Digital Solutions Agency (VSSA) together with Vytautas Magnus University (VDU), UAB Neurotechnology, UAB Tilde Lietuva and MB Krilas within the project<span class=\"ellipsis\">&hellip;<\/span><\/p>\n","protected":false},
"author":7,"featured_media":3234,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3231","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts\/3231","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3231"}],"version-history":[{"count":2,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts\/3231\/revisions"}],"predecessor-version":[{"id":3236,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/posts\/3231\/revisions\/3236"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=\/wp\/v2\/media\/3234"}],"wp:attachment":[{"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3231"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3231"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/clarin-lt.lt\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3231"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}