Homepage News Russian media on Wikimedia Commons could influence future AI models

Russian media on Wikimedia Commons could influence future AI models

Kremlin Moscow Russian flag
Shutterstock

Artificial intelligence learns about our world by scraping data from the internet every day.

Most people assume this digital diet consists of neutral facts and harmless pictures. But what happens when the images training the next generation of tech are quietly twisted by political agendas, as reported by 02.pl.

AI’s digital diet

Tech companies need massive amounts of media to train their latest models. This hunger has sent automated bots scrambling to gather open-source photos, videos, and audio files.

The Wikimedia Foundation recently noted a huge spike in this activity. Since 2024, the bandwidth used to download files from Wikimedia Commons has jumped by 50 percent.

This traffic does not come from humans. Instead, automated programs vacuum up these open-licensed files to feed AI systems.

Tainted training data

Some experts warn this harvest has a darker side. A report by the Polish Press Agency, known as PAP, reveals that Russian sources are filling Wikimedia Commons with materials about the annexation of Crimea.

Searching the term “annexation of Crimea” in the English database yields over 40 Russian state results out of 51 total. These include the official website of the Russian president.

Marcin Żabiński, an expert from the Kybernetes Institute, believes this is no accident. He warns that Russia is actively using the open platform to spread its own political narrative directly into AI models.

Danger in description

The threat goes far beyond the images themselves. File descriptions and creator information, known as metadata, play a huge role in how AI reads a file.

“A much more interesting meta-information is a brief description of what’s in the photo. And this is already very open to abuse, as it can influence how the recipient or the language model interprets the file,” Żabiński emphasized.

There are also fears about steganography, which involves hiding secret instructions inside digital images. Żabiński warned, “These could be instructions specifically ordering a change in the analysis method or intended to trigger a specific action on the language model.”

Open door policy

The Wikimedia Foundation did not respond to questions from PAP before publication. However, Wikimedia Polska released a statement addressing the concerns.

The organization stated that, based on volunteer feedback, they “see no basis for speaking of documented, coordinated Russian interference.” They noted that many Russian materials end up there simply because they use open licenses.

Still, the reality remains that AI models will keep drinking from this open pool. If the water is tainted, the machine’s worldview might suffer too.

Sources: 02.pl, PAP

Ads by MGDK