If you have a secret Reddit alt, AI may have made it too easy to expose you. That’s the conclusion of a recently published study, which points to some uncomfortable consequences of trying to stay private online – even if it’s not time to hold a funeral for anonymity just yet.
The finding, which has not been peer-reviewed, comes from researchers at ETH Zurich, Anthropic, and the Machine Learning Alignment and Theory Scholars program. They built an automated system of AI agents – capable of searching the web and interacting with information like a human investigator – to test how effectively large language models can re-identify the authors of anonymous content. The system “significantly outperforms” traditional computational techniques for de-anonymizing accounts, scouring text for personal details at scale.
The system works by treating a post or other text as a set of clues. It analyzes the text for patterns – writing quirks, stray biographical details, the frequency and timing of posts – that may point to someone’s identity. It then scans other accounts, potentially millions of them, looking for the same mix of signals. Possible matches are flagged, compared in more detail, and whittled down to a short list of candidate identities.
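The paper doesn’t disclose implementation details, but the clue-matching idea can be illustrated with a toy sketch: compare a target text’s character n-gram profile (a crude stand-in for writing quirks) against candidate accounts and rank the closest matches. Every name and the similarity measure here are invented for illustration, not taken from the study.

```python
# Illustrative sketch only -- NOT the researchers' system. Ranks candidate
# accounts by stylistic similarity to a target text using character n-grams.
from collections import Counter
import math

def ngrams(text, n=3):
    """Character n-gram counts: a crude proxy for writing quirks."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(target_text, candidates, top_k=3):
    """Flag the top_k candidate accounts most similar to the target text."""
    target = ngrams(target_text)
    scores = [(name, cosine(target, ngrams(text)))
              for name, text in candidates.items()]
    return sorted(scores, key=lambda s: -s[1])[:top_k]
```

The study’s agents reportedly combine many more signal types, with LLM reasoning layered on top; this only shows the flag-and-rank step in miniature.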
Instead of targeting unsuspecting users, the team evaluated the system on a dataset built from publicly available posts, including content from Hacker News and LinkedIn, transcripts of interviews with Anthropic scientists about how they use AI, and Reddit accounts that were deliberately split into two anonymous halves for testing. The paper reports that, across these settings, the LLM-based approach correctly identified 68 percent of matching accounts with 90 percent accuracy. In contrast, comparable non-LLM methods, such as linking scattered data points across large datasets, identified almost none.
The results were not uniform across datasets, and, predictably, the model performed better when it had more information to work with. In an experiment linking Reddit users who posted about films in both the main r/movies subreddit and smaller film communities, the system matched accounts that mentioned just one film about 3 percent of the time with 90 percent accuracy. When users mentioned 10 or more movies, the success rate climbed to nearly half.
Meanwhile, an experiment using a survey of Anthropic scientists identified nine of 125 respondents, a recall rate of about 7 percent. In that test, the system built a profile of each respondent from clues in their answers and then searched publicly available information on the web for possible matches. In one example match, the researchers highlight how a reference to a “supervisor” can suggest a PhD student and the use of British English can indicate a UK affiliation. Combined with a physics background and a mention of current work in biology research, those clues let the system narrow the field to a single candidate.
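The narrowing step described above can be pictured as constraint intersection: each clue extracted from the text becomes a filter, and only candidates surviving every filter remain. The records and predicates below are entirely made up for illustration; the actual system reasons over web search results rather than a tidy table.

```python
# Toy sketch of clue-based narrowing -- all records here are invented.
candidates = [
    {"name": "A", "degree": "PhD student", "dialect": "British",
     "field": "physics", "now": "biology"},
    {"name": "B", "degree": "postdoc", "dialect": "American",
     "field": "physics", "now": "biology"},
    {"name": "C", "degree": "PhD student", "dialect": "British",
     "field": "chemistry", "now": "biology"},
]

# Each lambda encodes one clue from the example in the text.
clues = [
    lambda c: c["degree"] == "PhD student",  # mentions a "supervisor"
    lambda c: c["dialect"] == "British",     # British English spellings
    lambda c: c["field"] == "physics",       # physics background
    lambda c: c["now"] == "biology",         # current biology research
]

# Intersect the constraints: only candidates passing every clue survive.
matches = [c["name"] for c in candidates if all(clue(c) for clue in clues)]
```

Each clue on its own is weak, but their intersection can shrink a pool of 125 respondents to one – which is why even mundane details add up.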
Still, the researchers argue that the ability to identify any respondent from unstructured text in minutes – work that would have taken a human investigator hours – is remarkable. They also told The Verge that as AI systems become more capable and gain access to larger pools of data, performance is likely to improve. More broadly, they warn that it is no longer safe to assume that posting under a pseudonym will protect online identities, past or future.
“Everything that was found by an LLM could theoretically be found by a human investigator.”
“Information on the Internet is forever,” said Daniel Paleka, a researcher at ETH Zurich and one of the study’s authors. Researchers warn that this persistence could translate into real-world risks for journalists, dissidents and activists who rely on pseudonyms, as well as enable “ultra-targeted advertising” and “highly personalized” scams.
The risks of account de-anonymization are not new, nor are they unique to AI. “Everything that was found by an LLM could theoretically be found by a human investigator,” Paleka told The Verge.
What is new, Paleka argues, is end-to-end automation. The work that once required a diligent investigator, patiently sifting through posts in search of small bits of information, can now be done with far greater ease and on a far greater number of targets.
It is also cheap. The researchers said their experiments cost less than $2,000 in total, or between $1 and $4 for each profile the AI agent was run on. “The economics are completely different now,” co-author Simon Lerman told The Verge, warning that the lower barrier to entry may expand who has the ability – and the incentive – to try to pierce online anonymity. Groups that have historically “flown under the radar,” he said, may find it difficult to continue to do so.
People “might misinterpret this important research and conclude that privacy is dead.” It’s not.
It is important not to overstate the findings. “Although these algorithms are improving, they are a far cry from what humans can do,” Luc Rocher, associate professor at the Oxford Internet Institute, told The Verge. The work does not map cleanly onto the real world: the experiments were conducted under laboratory conditions, using datasets that were carefully curated and anonymized for testing purposes. Rocher said he worried that people “might misinterpret this important research and conclude that privacy is dead.” Not so, he argued.
Despite years of incremental progress in technologies designed to expose anonymous users, “the identity of Bitcoin’s inventor Satoshi Nakamoto remains a mystery more than a decade later,” Rocher said. Whistleblowers can still communicate with journalists without being exposed, he added, and tools like Signal “have been successful so far in protecting our collective privacy.”
In the paper, the researchers said they avoided testing their system on real pseudonymous users due to ethical concerns. For similar reasons, they did not publish full technical details of their approach and declined to provide demonstrations when asked. The team also wouldn’t say whether they tested the system outside the scope of the study, again citing ethical concerns, leaving open questions about how reliably it would perform against real-world accounts.
For those who are already deeply committed to anonymity, the practical impact may be limited. Basic precautions — keeping accounts separate, limiting personal details, avoiding recognizable patterns like posting only during waking hours in your time zone — are still important.
For more casual pseudonymous users, Paleka and Lerman advised thinking carefully about what you post on public forums, even on accounts that appear anonymous, and keeping in mind that what is already out there can be pieced together more easily than many people realize.
The researchers argue that the responsibility should not rest entirely on users. Lerman said AI labs should monitor how their tools are being used and build safeguards to prevent them from being used to de-anonymize people. Social media platforms, he said, could crack down on the scraping and large-scale data extraction that make such efforts possible.
In other words, Satoshi is probably safe from AI spies. Your boring AITA posts on Reddit? Those could be another story.
