An activist group claims to have scraped millions of tracks from Spotify and is preparing to release them online.
Observers said the apparent leak could boost AI companies looking for material to develop their own technology.
A group called Anna’s Archive said it scraped 86 million music files and 256 million rows of metadata, such as artist and album names, from Spotify. Spotify, which hosts more than 100m tracks, confirmed that the leak does not represent its entire catalogue.
The Stockholm-based company, which has more than 700 million users worldwide, said it had “identified and disabled nefarious user accounts engaged in illegal scraping”.
“An investigation into the unauthorized access revealed that a third party scraped public metadata and used illegal tactics to circumvent DRM (digital rights management) to access some of the platform’s audio files,” Spotify said.
Spotify does not believe that the music taken by Anna’s Archive has been released yet. Anna’s Archive, known for providing links to pirated books, said in a blog post that it wanted to create a “‘preservation archive’ for music”.
The group claimed that the audio files represent 99.6% of all music heard by Spotify users and will be shared via “torrents”, a means of sharing large digital files online.
“Spotify doesn’t have all the music in the world, but it’s a great start,” said Anna’s Archive, which describes its mission as “the preservation of the knowledge and culture of humanity”.
The group said, “With your help, humanity’s musical heritage will be forever protected from destruction by natural disasters, wars, budget cuts, and other calamities.”
Ed Newton-Rex, a musician and campaigner for protecting artists’ copyright, said the leaked music would likely be used to develop AI models.
He said, “Training on pirated content is sadly common in the AI industry, so this stolen music is almost certain to be used for training AI models. That’s why governments should insist on AI companies revealing the training data they use.”
The Anna’s Archive site references Libgen, a massive online collection of pirated books that has reportedly been used by Mark Zuckerberg’s Meta to train its AI models. According to a filing in a US court, Zuckerberg, Meta’s founder and chief executive, approved the use of the Libgen dataset despite warnings within the company’s AI executive team that it was a dataset “we know is pirated”.
Meta successfully defended a copyright infringement claim brought by authors, but the plaintiffs in the case are seeking to amend their claim.
Yoav Zimmerman, the co-founder of the AI startup Third Chair, wrote on LinkedIn that members of the public could theoretically “create their own personal free version of Spotify”. He said the scrape could also allow tech companies to “train on modern music at scale”.
He added that the only things stopping them are copyright law and the threat of enforcement.
Spotify said it has introduced new security measures “for these types of anti-copyright attacks” since the Anna’s Archive announcement and is “actively monitoring suspicious behavior”.
Copyright has become a battleground between artists, writers and creators on one side and AI companies on the other. AI tools such as chatbots and music generators are trained on vast amounts of data taken from the open web, including copyright-protected works.
In the UK, creative professionals have protested against a government proposal to let AI companies use copyright-protected works without permission unless the owners of those works indicate they do not want them used. Almost every respondent to a government consultation on the proposal has backed artists’ concerns.
Liz Kendall, the secretary of state for science, innovation and technology, told parliament this month there was “no clear consensus” on the issue, adding that ministers would “take the time to get this right”. The government has promised to make policy proposals on AI and copyright by March 18 next year.