Book Review: Jim Sullivan, “The Book on AI Doc Review”

EDRM - Electronic Discovery Reference Model

Book Review: Jim Sullivan, “The Book on AI Doc Review” by Michael Berman, E-Discovery LLC.
Image: Holley Robinson, EDRM.

The thesis of the book is that "computers are capable of reviewing and classifying document[s] better than humans. And that's a big deal in eDiscovery." As its title suggests, the book focuses on AI document review and contrasts it with TAR and predictive coding. While technology-assisted review relies on humans to train the machine, AI review comes already trained and uses prompts to tell it what to look for; it does not use "training examples." Mr. Sullivan provides a sample instruction:

“All documents where an Acme employee suggests that pricing of widgets should be modified.”

You’ll notice the instructions read like a Request for Production, which is exactly what they are. In most cases, we simply copy the exact language from the Request for Production to start our instructions.

Jim Sullivan, The Book on AI Doc Review (eDiscovery AI, 2024).

Mr. Sullivan writes that “AI-powered review… can easily find 95%+ of the relevant documents.” I thought the “how to” chapters were among the most interesting. The book walks through a relevancy review, step-by-step, using random sampling to “QC,” or quality control, the results.

As a validation process, Mr. Sullivan follows the tried-and-true path of classifying documents as true positives, true negatives, false positives, and false negatives to compute metrics such as recall and precision. In my experience, these techniques have long been used on, for example, keyword searches. Here, they are applied to AI. The book provides the simple formulae:

Recall = TP/(TP + FN)

Precision = TP/(TP + FP)
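As a quick illustration (mine, not the book's), the two formulae can be computed directly; the counts below are hypothetical:

```python
def recall(tp: int, fn: int) -> float:
    """Share of all truly relevant documents that the review found."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Share of documents the review marked relevant that truly are."""
    return tp / (tp + fp)

# Hypothetical QC counts: 95 true positives, 5 false negatives,
# 10 false positives.
print(recall(95, 5))      # 0.95
print(precision(95, 10))  # ~0.905
```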

The author describes preparation of an "answer key" by a subject-matter expert. He uses a term I had not heard before to describe the process of calculating metrics with that key – a "confusion matrix." The process applies standard techniques, such as sampling the "discard pile," to iteratively improve queries. Frankly, this blog shortchanges Mr. Sullivan's excellent chapters because it would take too long to summarize them.
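For readers who share my unfamiliarity with the term, a confusion matrix is simply the four-cell tally of those classifications. A minimal sketch, assuming a hypothetical answer key and set of AI calls keyed by document ID:

```python
from collections import Counter

def confusion_matrix(answer_key: dict, ai_calls: dict) -> Counter:
    """Tally TP/FN/FP/TN by comparing AI calls against the SME's answer key.
    Both dicts map document ID -> True (relevant) / False (not relevant)."""
    counts = Counter()
    for doc_id, truth in answer_key.items():
        pred = ai_calls[doc_id]
        if truth and pred:
            counts["TP"] += 1
        elif truth and not pred:
            counts["FN"] += 1
        elif not truth and pred:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts

# Toy data: each of the four cells gets one document.
key   = {"D1": True, "D2": True,  "D3": False, "D4": False}
calls = {"D1": True, "D2": False, "D3": True,  "D4": False}
print(confusion_matrix(key, calls))
```

The four counts feed directly into the recall and precision formulae above.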


As to defensibility, Mr. Sullivan wrote: “The only thing that matters is how you validate the results and demonstrate high-quality output.” While (in my opinion) validation may not be the “only” thing, its importance cannot be overstated. The author explains:

So, what does a defensible AI Review look like? It’s a lot like any Predictive Coding review. We need to use sampling to validate the results. Let’s walk through how we can do that. The general process for predictive coding has become pretty straightforward:

  1. Identify the review set.
  2. Train the machine.
  3. Run the documents through the classifier.
  4. Evaluate the results.

Believe it or not, it’s no different with AI.

Jim Sullivan, The Book on AI Doc Review (eDiscovery AI, 2024).
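The sampling-based validation in step 4 can be sketched as follows; the function name, the sampling approach, and the toy data are my own illustration, not code from the book:

```python
import random

def qc_sample(ai_calls: dict, answer_key: dict, n: int, seed: int = 42):
    """Step 4 sketch: draw a random QC sample of the classified documents
    and score the AI's calls against the SME answer key."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(ai_calls), k=min(n, len(ai_calls)))
    tp = sum(1 for d in sample if ai_calls[d] and answer_key[d])
    fp = sum(1 for d in sample if ai_calls[d] and not answer_key[d])
    fn = sum(1 for d in sample if not ai_calls[d] and answer_key[d])
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

In practice the sample would be far larger than the toy sets here, and the answer key would cover only the sampled documents rather than the whole population.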

The book is full of concrete examples. For step 1, for example, it suggests removal of ROT (redundant, obsolete, or trivial), documents without extracted text, audio files, images, and huge files, as well as deduplication. That is the classic approach to document review.
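A culling pass of that kind might look like this sketch; the extension lists and size cap are stand-ins I chose for illustration (the book sets no specific numbers):

```python
import os

AUDIO_EXTS = {".mp3", ".wav", ".m4a"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".tif"}
MAX_BYTES = 25 * 1024 * 1024  # illustrative size cap for "huge files"

def keep_for_review(doc: dict, seen_hashes: set) -> bool:
    """Step 1 sketch: cull documents without extracted text, audio files,
    images, and huge files, and deduplicate by hash."""
    ext = os.path.splitext(doc["name"])[1].lower()
    if not doc.get("text"):
        return False  # nothing for the classifier to read
    if ext in AUDIO_EXTS or ext in IMAGE_EXTS:
        return False
    if doc["size"] > MAX_BYTES:
        return False
    if doc["hash"] in seen_hashes:
        return False  # exact duplicate
    seen_hashes.add(doc["hash"])
    return True
```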

Mr. Sullivan also suggests "pre-validation." This consists of running prompts against a random sample before running them against the full data set. A subject-matter expert then reviews the "hits" to determine recall and precision. This provides a benchmark analogous to what I have called "richness." Mr. Sullivan suggests pre-validation as a cost-saving measure.
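A richness estimate from such a pre-validation sample might be sketched as follows; the sample size and the SME-review callback are stand-ins of my own:

```python
import random

def estimate_richness(doc_ids, sme_is_relevant, sample_size=200, seed=7):
    """Review a random sample before the full run and estimate richness,
    i.e., the share of the population that is relevant."""
    rng = random.Random(seed)
    pool = list(doc_ids)
    sample = rng.sample(pool, k=min(sample_size, len(pool)))
    relevant = sum(1 for d in sample if sme_is_relevant(d))
    return relevant / len(sample)
```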

Another excellent discussion explains that prompts may be refined with either inclusion or exclusion criteria. An example of an inclusion criterion is that "any discussion about qualifications in hiring should be deemed relevant." An exclusion criterion would be: "Any discussion about hiring anyone other than coaches or management should be considered not relevant."
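Combined into a single instruction, the two criteria from the book's hiring example might read like this sketch (the framing sentence is mine, not the book's):

```python
# Hypothetical prompt combining an inclusion and an exclusion criterion.
PROMPT = """\
Classify each document as Relevant or Not Relevant.

Inclusion: Any discussion about qualifications in hiring should be
deemed relevant.

Exclusion: Any discussion about hiring anyone other than coaches or
management should be considered not relevant.
"""
print(PROMPT)
```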

Mr. Sullivan discusses full AI review, but also posits options such as “AI-Powered Linear Review,” in which batches are selected using AI, and AI/CAL Hybrid Review, in which seed documents are reviewed by AI.

As to confidentiality and security, Mr. Sullivan wrote: “If you aren’t paying for a product, you are the product.” He offers questions to ask the AI provider to ensure security.
