AI Insights: Online Terms of Use and the Training of AI Models

Artificial intelligence (AI) large language models (LLMs) are built by training on vast amounts of content and data. In many cases, this material is amassed by running bots or other automated programs that extract information from the web. For example, GPT-3, an earlier version of GPT, was trained in part on filtered data from Common Crawl, an open, but unpermissioned, repository of data extracted through web crawling. Other methods that programs may employ to extract data include “web scraping” and “bulk downloading.” Importantly, nearly all of these programs are run without obtaining authorization to extract and use the content and data in this manner.


DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations.

© Skadden, Arps, Slate, Meagher & Flom LLP | Attorney Advertising

Written by:

Skadden, Arps, Slate, Meagher & Flom LLP