The First in the Baltic Region: A Propaganda Corpus Exposing Hostile Narratives

Researchers at Vilnius University have developed the first large-scale propaganda corpus in the Baltic region and much of Eastern Europe, now published in “Scientific Data” (“Nature” Portfolio). The project captures what is often invisible to the public: systematic hostile narratives, their linguistic structure and the strategies used to influence emotions and attitudes. The unique dataset of 1,000 articles offers a foundation for analysing how propaganda constructs worldviews and may later support AI-based detection tools. The interdisciplinary team includes Prof. Virginijus Marcinkevičius, PhD candidate Ieva Rizgelienė, Assoc. Prof. Vilma Zubaitienė and Dr Nerijus Maliukevičius.
The researchers emphasise that propaganda’s impact often lies between the lines: emotional tone, repeated storylines and subtle ideological oppositions. The idea for an automatic detection tool arose before the war, but after February 2022 it became urgent. “When the war started, it became clear that a systematic effort was being made to influence society. I couldn’t stand aside anymore,” says Rizgelienė.
Narratives That Intensify in Times of Crisis
The project focuses specifically on hostile propaganda. These are texts designed to divide society, undermine institutions, discredit democratic processes and promote the image of authoritarian regimes. In the corpus, the recorded narratives align with threats identified in Lithuania’s State Security Department reports: undermining statehood, attacking support for Ukraine, discrediting Western institutions such as NATO and the EU, and amplifying the image of Russia and other authoritarian states. Emotional expression is one of the most frequently used tools.
Quarterly data reveal sharp increases in hostile narratives at key moments: the start of the COVID-19 pandemic, the beginning of Russia’s invasion of Ukraine and the approach to the NATO Summit in Vilnius. The analysis also reveals clear semantic patterns: recurring words such as ‘state’, ‘Lithuania’, ‘Russia’, ‘West’ and ‘people’ are used in contrasting ways across narratives. In texts undermining Lithuanian institutions, state-related terms often appear in negative frames, while narratives supportive of Russia or critical of the West pair similar concepts with positive language. According to Rizgelienė, these contrasts show how language is mobilised to shape emotional and ideological oppositions.
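The quarterly view described above can be illustrated with a minimal Python sketch. It assumes each annotated article is reduced to a publication date and a narrative label; the function names, label strings and sample records are hypothetical and are not taken from the published corpus.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def quarter_label(published: date) -> str:
    """Map a date to a label such as '2022-Q1'."""
    return f"{published.year}-Q{(published.month - 1) // 3 + 1}"


def quarterly_counts(articles: Iterable[Tuple[date, str]]) -> Counter:
    """Count articles per (quarter, narrative) pair."""
    counts: Counter = Counter()
    for published, narrative in articles:
        counts[(quarter_label(published), narrative)] += 1
    return counts


# Hypothetical sample records: (publication date, narrative label).
sample = [
    (date(2022, 2, 24), "war_in_ukraine"),
    (date(2022, 3, 5), "war_in_ukraine"),
    (date(2023, 7, 1), "anti_nato"),
]
counts = quarterly_counts(sample)
```

Plotting such counts over time would show the kind of peaks the researchers report, although the actual analysis may have been carried out differently.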
An Exception in the Region
The researchers examined whether similar tools existed for the languages of other countries neighbouring Russia, including Latvian, Estonian, Kazakh and Georgian, but identified none. The sole exception was a project developed in Czech. “This confirms that we are the first in the Baltic region, and among only a handful across the former Eastern Bloc, to systematically examine not only the content of the information war but also its underlying linguistic mechanisms,” explains Rizgelienė.
Long-established Western democracies host more propaganda-related corpora, but these largely reflect transatlantic political communication patterns centred on elections, party competition or social media. In Central and Eastern Europe, however, the dominant hostile information flow comes from Russia, pursues entirely different aims and is structured differently.
Inside the Making of the Corpus
Sources for the corpus were selected using Lithuania’s State Security Department threat assessments from 2018 to 2024, in cooperation with investigative journalists from national media outlets. The team analysed content from six websites consistently identified as disseminating hostile or manipulative narratives, along with material produced by individuals publicly recognised as disinformation actors. Articles from the national public broadcaster were included as a control sample, and annotation was conducted blindly – coders did not know where each text originated.
Rizgelienė remarks that almost no propagandistic content was detected in the control sample, confirming the reliability of the selection criteria.
“The final dataset contains 1,000 articles, all manually annotated. Each article was annotated independently by two different annotators. After individual annotations were completed, the annotations for each article were merged, and pairs of annotators held weekly discussions to resolve any conflicts, thereby preparing a finalized, consensus-based annotation for each article,” points out Prof. Marcinkevičius.
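The double-annotation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each annotator's work is represented as a mapping from a hypothetical article ID to a set of labels, and the label names are invented for the example. The team's actual tooling and consensus procedure are not described in this article.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: article ID -> set of labels assigned.
Annotations = Dict[str, Set[str]]


def merge_annotations(
    first: Annotations, second: Annotations
) -> Tuple[Annotations, Annotations]:
    """Merge two independent annotations of the same articles.

    Returns (agreed, conflicts): labels both annotators assigned, and
    labels only one of them assigned, which need discussion.
    """
    agreed: Annotations = {}
    conflicts: Annotations = {}
    for article_id in sorted(set(first) | set(second)):
        labels_a = first.get(article_id, set())
        labels_b = second.get(article_id, set())
        agreed[article_id] = labels_a & labels_b
        disputed = labels_a ^ labels_b  # assigned by exactly one annotator
        if disputed:
            conflicts[article_id] = disputed
    return agreed, conflicts


# Hypothetical example with invented label names.
annotator_1 = {"a1": {"distrust_institutions", "anti_west"}, "a2": set()}
annotator_2 = {"a1": {"distrust_institutions"}, "a2": set()}
agreed, conflicts = merge_annotations(annotator_1, annotator_2)
```

Here the disputed label on article "a1" would go to the annotators' discussion, while article "a2", left unlabelled by both, needs no resolution.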
Recurring Storylines and Dark Rhetoric
Rizgelienė and the team identified eleven recurring narratives. The strongest include distrust in Lithuanian institutions, attacks on statehood, delegitimisation of the West, disinformation about the war in Ukraine and the so-called “new world order.” These narratives often merge, reinforcing emotional oppositions between the “degenerate West” and the supposedly “order-preserving” East, represented by states such as Russia or China. Conspiracy theories are frequent: from the “new world order” and 5G “dangers” to portraying homosexuality as a symbol of a “collapsing Europe.”
Propagandistic texts also attempt to diminish Ukraine’s struggle, justify Russia’s actions and question Western support. Lithuania is often portrayed as lacking sovereignty, as a puppet state or as a “colony” governed from abroad. Institutions are depicted as corrupt and acting against citizens. Quarterly graphs show peaks in these narratives during the early invasion of Ukraine and ahead of the NATO Summit in Vilnius. The “new world order” theme intensified after the arrival of COVID-19.
According to Prof. Marcinkevičius, ten propaganda techniques were analysed, and the most common is emotional rhetoric: “Propaganda often relies not on arguments but on emotions: criticism aimed at one individual is extended to an entire institution; hyperbole, sarcasm and labelling are used, and sometimes even calls for violence.”
Unpacking the Language of Propaganda
The team now plans to examine the grammatical, lexical and stylistic devices that shape propagandistic discourse, in close collaboration with linguists. “These linguistic tools directly affect emotions, highlight contrasts and strengthen the overall persuasive effect. Grammatical, lexical and stylistic devices often overlap; the category depends on the analytical perspective. Slogans, for example, can be seen as grammatical, lexical or stylistic. Epithets are usually described as stylistic, but grammatically they may function as attributes or modifiers. It is difficult to place them into rigid categories,” notes Assoc. Prof. Zubaitienė.
The corpus is used to develop three types of algorithms: models that detect narratives, models that identify techniques and models that classify whether a text can be considered propaganda. “Technical solutions alone are not enough. One must understand society, language and communication practices, and working with Lithuanian adds a further challenge, since it is a low-resource language in global AI development,” remarks Prof. Marcinkevičius.
By April 2026, the team aims to produce an accessible tool: users would upload a text and receive an automated analysis. “Such a tool would be highly valuable in schools, universities, newsrooms and everywhere critical thinking matters,” summarises Rizgelienė.
How to Recognise Hostile Websites
Researchers point out that propagandistic websites often imitate the layout and tone of legitimate media: familiar sections from sports to politics, but with the underlying aim of weakening democratic trust.
“It is crucial to verify sources. First, check whether the listed editors and journalists actually exist or whether their profiles are fabricated. We increasingly see AI-generated articles attributed to fictional authors with synthetic portrait photos,” warns Dr Maliukevičius.
He explains that such content typically manipulates strong emotions: “Anger, fear, panic, apocalyptic language – all of this is characteristic of propaganda, whose aim is not to inform but to provoke an emotional reaction.”
Prof. Marcinkevičius adds that propaganda constructs an alternative reality with clear-cut villains and heroes, a seductive simplicity that distorts reality while offering a false sense of clarity. He compares this to criminal narratives: “They create their own world: complete with alibis, fabricated evidence and a neatly arranged story that makes them appear innocent. When new facts emerge, they rewrite the story again.”
The researchers underline that, at its core, resisting manipulation depends not on arguing with false narratives but on verifying facts and strengthening critical thinking. Only objective, evidence-based information can build resilience.