Research paper finds AI can bust pseudonymous accounts on a large scale

For almost as long as the Internet has existed, users have been able to speak their minds through pseudonymous accounts that protect them from being spoofed or stalked.

But with the advent of sophisticated AI, exposing pseudonymous users on the internet has become alarmingly easy.

As detailed in a paper that has not yet been peer-reviewed, a team of researchers from ETH Zurich and AI company Anthropic found that "large language models can be used to perform deanonymization at scale."

In a series of experiments, the researchers showed that their AI agent could "re-identify" users on the popular platforms Hacker News and Reddit, doing something that "would take a dedicated human investigator hours," based on their "pseudonymous online profiles and conversations alone."

The results were worrying: the AI agent unmasked a surprising two-thirds of users.

"Our results suggest that the practical anonymity that protects pseudonymous users online no longer holds and that threat models to online privacy need to be reconsidered," the researchers warned.

"Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to thousands of candidates," co-author and ETH Zurich AI engineer Simon Lerman wrote in a blog post accompanying the paper.

The implications for online privacy could be substantial.

"The average online user has long operated under an implicit threat model, where they have assumed that pseudonymity provides sufficient protection because targeted deanonymization would require extensive effort," they wrote. "LLMs invalidate this assumption."

In their experiments, the team collected datasets from public social media sites to test their deanonymization AI. They linked Hacker News posts to LinkedIn profiles using references in user profiles, then anonymized the dataset by removing any identifying references from the posts.

Finally, they gave an LLM the dataset and asked it to link posts back to their original authors.
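The article doesn't reproduce the researchers' pipeline, but the general shape of the task, matching an anonymized post against a pool of candidate authors, can be illustrated with a toy lexical-similarity baseline. Everything below (the names, the texts, the scoring function) is hypothetical; the paper's method uses an LLM agent, not this word-overlap matcher.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_match(anonymous_post, candidate_profiles):
    """Rank candidate authors by lexical similarity to an anonymized post."""
    query = Counter(tokenize(anonymous_post))
    scores = {
        name: cosine(query, Counter(tokenize(history)))
        for name, history in candidate_profiles.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical candidate writing histories and an "anonymized" post.
profiles = {
    "alice": "I love functional programming and haskell type systems",
    "bob": "my sourdough starter and baking bread every weekend",
}
post = "anonymous rant about haskell type inference and functional style"
match, scores = best_match(post, profiles)
```

In this toy example the post's distinctive vocabulary links it to "alice". The paper's point is that LLM agents go far beyond such surface word overlap, reasoning over biographical clues scattered across free text.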

"We found that these AI agents can do something that was previously very difficult: starting from free text (like an anonymized interview transcript) they can work their way up to the full identification of a person," Lerman told Ars Technica. "This is a very new capability; previous approaches to re-identification typically required structured data, and two datasets with a similar schema that could be joined together."

As Lerman explained in his post, the team had to tread carefully, because "you don't want to deanonymize truly anonymous individuals." Instead, the team "came up with two types of deanonymization proxies that allow us to study the effectiveness of LLMs at these tasks."

Even when the data given to the AI was extremely general, such as responses to Anthropic questionnaires about how people use AI in their daily lives, the LLM could pick up on clues to identify people about seven percent of the time.

Although this may seem low, Lerman pointed out to Ars that it's notable "AI can do this at all."

The researchers also found that when given comments from different film communities on Reddit, an AI could identify users with a surprising rate of accuracy. The more users discussed movies, the easier it was for the AI to deanonymize them.

However, Lerman also pointed out several limitations. For example, the sample sets "are small because they require verified identity links," he wrote.

It is also difficult to separate the LLM's own contribution from what its web searches supply.

The researchers acknowledged, “The attack relies on opaque web search systems, making it difficult to distinguish what the LLM agent contributes versus what the search engine embedding contributes.”

Nonetheless, the team warned that their findings paint a worrying picture of the future of online anonymity. "LLMs democratize deanonymization," they concluded, which could potentially allow governments to "link pseudonymous accounts to real identities to monitor dissidents, journalists or activists."

They added that "corporations can link anonymous forum posts to customer profiles for hyper-targeted advertising" and that "attackers can create sophisticated profiles" of targets at scale "to launch highly personalized social engineering scams."

In short, the advent of AI has ushered in a new era that requires advanced security measures – or it could even be the death knell of online pseudonymity.

“Users, platforms, and policymakers must recognize that the privacy assumptions underlying much of today’s Internet are no longer valid,” the paper reads.

More on doxxing: Elon Musk’s Grok AI is doxing everyday people’s home addresses
