July 24, 2023

AI Insights: Online Terms of Use and the Training of AI Models

Mana Ghaemmaghami, Stuart Levi, MacKinzie Neal

Skadden, Arps, Slate, Meagher & Flom LLP

A defining feature of artificial intelligence (AI) large language models (LLMs) is that they are trained on vast amounts of content and data. In many cases, this material is amassed by running bots or other automated programs that extract information from the web. For example, an earlier version of GPT (GPT-3) was trained in part on filtered data from Common Crawl, an open, but unpermissioned, repository of data extracted through web crawling. Programs may also extract data through similar methods, such as “web scraping” or “bulk downloading.” Importantly, nearly all of these programs are run without authorization to extract and use the content and data in this manner.

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations.