Salesforce Used Pirated Books to Train AI, Authors Allege

Mogin Law LLP

Salesforce, the enterprise software powerhouse behind tools like Slack and Tableau, is the latest in a string of companies facing litigation over the use of copyrighted materials to train artificial intelligence. In this case, authors allege their works were used to train the company's XGen AI models without their consent.

The suit was brought in federal court in San Francisco by writers Molly Tanzer and Jennifer Gilmore, who claim Salesforce continues to infringe their copyrights by storing and processing their works.

Specifically, Salesforce allegedly used datasets like RedPajama and The Pile, which include the controversial Books3 corpus. Books3 is a collection of over 196,000 books reportedly scraped from the private torrent tracker Bibliotik. The authors claim Salesforce initially listed "RedPajama-Books" as part of its training data when launching XGen in June 2023, but later scrubbed references to it from its documentation after questions arose about its legality.

Tanzer and Gilmore are asking the court to certify a class of all U.S. copyright holders whose works Salesforce has used since October 2022. They seek statutory damages, destruction of infringing copies, return of profits, and a declaration of willful infringement.


CRM Market

Salesforce competes with other major enterprise software and cloud computing companies, including Microsoft, Oracle, SAP, and Adobe. In the customer relationship management (CRM) market, Salesforce is the industry leader, consistently ranking as the top provider by market share, followed by Microsoft Dynamics 365 and Oracle CRM. In May 2025, International Data Corp. named Salesforce the biggest CRM provider in the world for the 12th year in a row.

Salesforce claims it maintains its position through robust product offerings, frequent innovation, and a strong ecosystem of integrations and partners. While competitors like Microsoft and Oracle have significant resources and global reach, Salesforce’s focus on cloud-based solutions and AI-powered tools has helped it retain its status as the dominant force in the CRM sector.

Salesforce reported $37.9 billion in annual revenue for fiscal year 2025, which ended on Jan. 31, 2025. This represents a 9% year-over-year increase and is $5 billion more than its four nearest competitors combined. Microsoft is the nearest rival, with $5.45 billion in CRM revenue in 2024. Salesforce projects growth will continue, predicting fiscal year 2026 revenue of just under $41 billion. It serves more than 150,000 businesses globally (source: ascendix.com). These range from small startups to Fortune 500 companies, including Amazon Web Services, Walmart, American Express, and NASA.

Its customer base is diverse:

  • 49% are small businesses (fewer than 50 employees)
  • 40% are mid-sized companies
  • 11% are large enterprises (more than 1,000 employees)

(Source: ibirdsservices.com)


Growth Through Acquisition

In addition to its popularity with customers, Salesforce has grown through an extensive acquisition campaign.

2025

  • Informatica – $8B – Data integration and management
  • Moonhub – Undisclosed – AI-powered recruiting platform
  • Convergence – Undisclosed – AI agent development

2024

  • Zoomin – $450M – Content delivery and documentation platform
  • Own Company – $1.9B – Backup and recovery for SaaS data
  • Tenyx – Undisclosed – Conversational AI
  • PredictSpring – Undisclosed – Mobile commerce platform
  • Spiff – $374M – Sales commission software

2021

  • Acumen Solutions – $570M – Professional services firm

2020

  • Slack – $27.7B – Collaboration and messaging platform
  • Vlocity – $1.33B – Industry-specific cloud apps

2019

  • Tableau – $15.7B – Data visualization and analytics
  • ClickSoftware – $1.35B – Field service management
  • MapAnything – $225M – Location-based services
  • Bonobo AI – $45M – Conversational analytics

2018

  • MuleSoft – $6.5B – API integration platform
  • Datorama – $800M – Marketing analytics

2016

  • Demandware – $2.8B – E-commerce platform
  • Krux – $800M – Data management platform
  • Quip – $750M – Collaboration and productivity suite

2013

  • ExactTarget – $2.5B – Marketing automation

AI Copyright Suits Piling Up

In addition to the Salesforce lawsuit, the Mogin Law Blog has covered several other prominent AI copyright cases involving major technology companies. One notable case is the lawsuit against OpenAI and Microsoft, in which authors allege their books were used without consent to train the large language models behind ChatGPT. Another widely discussed case involves Meta, which faces similar claims related to the use of copyrighted books in training its AI systems. The blog has also reported on litigation brought by visual artists against Stability AI, Midjourney, and DeviantArt, alleging that their artwork was scraped and used to train generative image models without permission.

These cases highlight the growing legal scrutiny of how AI companies source and use copyrighted material for training.

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations. Attorney Advertising.

© Mogin Law LLP

Written by:

Mogin Law LLP
