Nearly 200,000 books are being used to train artificial intelligence systems by some of the biggest tech companies. The problem? Nobody told the authors.
The dataset is called Books3, and according to an investigation by The Atlantic, it is based on a collection of pirated e-books spanning every genre from erotic fiction to prose poetry. Books help generative AI systems learn how to communicate information.
Some AI training text can be taken from articles published on the internet, but high-quality AI requires high-quality text to absorb language, according to The Atlantic, and that’s where books come in. Books3 is already the subject of several lawsuits against Meta and other companies that used the dataset to train AI.
Now, thanks to a database drawing on Books3 that The Atlantic published last week, authors can see whether their books specifically are being used to train these AI systems. And many are not happy.
“I am completely gutted and broken. I am outraged and at the same time I feel completely helpless,” wrote Mary H.K. Choi on social media, after discovering that her work was being used. “I’m furious and I want to fight but I’m also very tired.”
Choi, whose first novel “Emergency Contact” appeared in the database, further explained her feelings in an email. The book, centered on a young Korean-American woman navigating a new relationship, was “deeply personal,” and Choi was initially told that her story was “too quiet and too particular.” The book went on to become a New York Times bestseller and found an audience around the world.
“A book summarizes infinite choices, unlimited permutations and even faults of the author at the time. To think that all of this life can be thrown into a vast churning pool to be extruded into a giant algorithmic, generative sausage machine reduces so much so quickly,” she said. “Not just financially for authors, but it puts a strain on booksellers, librarians and readers of so many intimacies.”
Min Jin Lee, author of the novels “Pachinko” and “Free Food for Millionaires,” expressed similar thoughts on social media, bluntly calling the use of her books “theft.”
“I spent three decades of my life writing my books,” she said. “Large AI language models did not ‘ingest’ or ‘scrape’ ‘data.’ Every company stole my work, my time and my creativity. They stole my stories. They stole a part of me.”
Nora Roberts, the prolific novelist, has 206 books in the Books3 database, according to The Atlantic. That is the highest number for any living author, and second only to William Shakespeare. She called the database and its use by tech companies “all kinds of bad.”
“We are human beings, we are writers, and we are exploited by people who want to use our work, again without permission or compensation, to ‘write’ books, screenplays, essays because it is easy and cheap,” Roberts said in a statement to CNN.
This exploitation of writers did not shock author Nik Sharma, whose cookbook “Season” was found in the database.
“I am horrified but not surprised that I am being taken advantage of,” he said in a post on social media. “Clearly, I was not even asked for permission or received any compensation for using my work to train the AI.”
AI is inevitable, Sharma later said in an email – hence his lack of surprise. What’s most aggravating, he says, is that no one has been contacted regarding usage or payment. After all, education isn’t free in the United States, he said; teachers are paid and textbooks are purchased.
“It’s the Wild West right now with AI, and government policy on it is in its infancy,” Sharma said. “And as a result, tech companies are taking full advantage of it while they can. I’m glad this is just one cookbook and not my other ones.”
Meta, which used the Books3 database according to The Atlantic, did not respond to a request for comment.
A Bloomberg spokesperson noted in a statement that the company “used a number of different data sources,” including Books3, to train its initial BloombergGPT model, an AI model for the financial sector. But, according to the spokesperson, Bloomberg “will not include the Books3 dataset among the data sources used to train future commercial versions of BloombergGPT.”
Not all authors are upset about their work being used by AI. James Chappel, whose academic book on the modern Catholic Church was included in the database, said on social media that he doesn’t mind at all.
“I want my book (to be) read!” he wrote. “I want it to educate!”
Chappel did not respond to requests for additional comment.
AI, in the hands of big companies, has become a major concern for many writers. The Writers Guild of America went on strike this summer, in part over demands for limits on the use of AI in writing for films and television shows. ChatGPT in particular has been used for everything from writing homework to legal briefs.
Writers are not alone in their concerns. With the popularity of text-to-image AI systems, visual artists were in the same situation last year, discovering that their work was being used to train AI without authorization. Together, these two examples highlight concerns about AI’s growing reach across all art forms, where work can sometimes be intensely personal or intimate.
The conversation raised by Books3 comes just as US President Joe Biden announced plans to introduce an executive order on AI this fall, saying the country will lead “the path to responsible AI innovation.”
For writers, however, the constant battles around AI and their work can be deflating. For Choi, discovering that her book had been used in the midst of the WGA strike, during which AI was a hotly debated topic, was “surreal.”
“I was gutted,” she said via email. “It really felt like any gains or traction that needed to be made in one area could be so easily wiped out in another.”
And yet, Choi says she knows her book, among thousands of others, is “insulting and inconsequential,” despite its importance to her.
“I think what’s most unfortunate about all of this is that in my most desperate moments, everything seems absolutely inevitable,” she said.
Choi is not alone in feeling this sense of inevitability. Roberts called for unity among writers and the public to combat these issues.
“We who create stories must come together to fight this abuse of our talent and hard work,” she said. “We must defend our work and that of everyone. I hope readers and viewers will stand with us on this vital issue.”