“E-discovery is pervasive. It’s like understanding civil procedure. You’re not going to be a civil litigator without understanding the rules of civil procedure. Similarly, you’re no longer going to be able to conduct litigation of any complexity without understanding e-discovery... The absence of technical knowledge is a distinct competitive disadvantage.” Joe Dysart, Learn or Lose, ABA Journal, April 2014, at 32-33.
Those are the words of Magistrate Judge James C. Francis of the Southern District of New York at the 2014 LegalTech conference in New York City. Despite this admonition, there are attorneys who still print their client’s electronically stored information (“ESI”) onto paper to conduct relevancy and privilege reviews. Not surprisingly, this is now considered a “worst practice” by e-discovery experts. Anne Kershaw and Joe Howie, Judge’s Guide to Cost-Effective E-Discovery 17 (E-Discovery Institute 2010).
A considerable step up from manual review is the use of search terms or “keywords” to locate relevant or privileged documents in an ESI collection. Search terms can be very useful when employed with smaller ESI collections, and can be helpful in, among other things, identifying privileged materials and materials sent to or by a particular custodian.1 However, search terms also have drawbacks, particularly in larger ESI collections, because they often retrieve “too much irrelevant data (poor precision) and too little of the relevant data (poor recall).” William Hamilton, The Elusive Search for the Ideal Search, Litigation, Vol. 38, No. 2, Winter 2012, at 9. This is because the same word can mean multiple things,2 and there can be multiple words that have the same or similar meanings.3 Id. The courts have recognized these drawbacks. See, e.g., United States v. O’Keefe, 537 F. Supp. 2d 14, 24 (D.D.C. 2008) (“[w]hether search terms or ‘keywords’ will yield the information sought is a complicated question involving the interplay, at least, of the sciences of computer technology, statistics, and linguistics.... Given this complexity, for lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread”).
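For readers who want to see how precision and recall are measured, the following is a minimal sketch in Python. The four-document collection, the relevance judgments, and the function names are all hypothetical and serve only as illustration. The keyword “bank” retrieves an irrelevant document about a river bank (polysemy) and misses a relevant document that uses the word “lender” instead (synonymy).

```python
# Hypothetical mini-collection: id -> (text, relevance as judged by a human reviewer).
documents = {
    "doc1": ("Wire the funds to the bank before Friday", True),
    "doc2": ("We walked along the river bank at lunch", False),
    "doc3": ("The lender approved the loan transfer", True),
    "doc4": ("Spring picnic is scheduled for May", False),
}


def keyword_search(docs, keyword):
    """Return the ids of documents containing the keyword as a whole word."""
    keyword = keyword.lower()
    return {
        doc_id
        for doc_id, (text, _) in docs.items()
        if keyword in text.lower().split()
    }


def precision_recall(docs, retrieved):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    relevant = {doc_id for doc_id, (_, is_rel) in docs.items() if is_rel}
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


hits = keyword_search(documents, "bank")
precision, recall = precision_recall(documents, hits)
print(sorted(hits), precision, recall)
```

In this toy collection the search returns two documents, only one of which is relevant, and it finds only one of the two relevant documents, so both precision and recall are 50 percent.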
An improvement on keyword searching for the identification of relevant or privileged documents is the use of “latent semantic indexing.” Programs that have the ability to perform latent semantic indexing recognize other words found in documents that contain a specific keyword and then begin searching for documents that contain those other words. As a result, these programs can identify potentially relevant documents that do not contain the original keyword.
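The behavior described above can be approximated with a short Python sketch. Note the substitution: this is a simple co-occurrence expansion, not the matrix factorization on which latent semantic indexing is generally understood to rest, and it is offered only to show how a document lacking the original keyword can still be retrieved. The collection, the stopword list, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical collection; doc3 never uses the word "attorney".
documents = {
    "doc1": "attorney reviewed the merger agreement",
    "doc2": "attorney drafted the merger memo",
    "doc3": "counsel signed the merger agreement",
    "doc4": "lunch menu for friday",
}

STOPWORDS = {"the", "for", "a", "an", "of"}


def tokens(text):
    """Lowercase the text, split on whitespace, and drop common filler words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def related_terms(docs, keyword, min_docs=2):
    """Words appearing alongside the keyword in at least min_docs documents."""
    counts = Counter()
    for text in docs.values():
        words = set(tokens(text))
        if keyword in words:
            counts.update(words - {keyword})
    return {word for word, n in counts.items() if n >= min_docs}


def expanded_search(docs, keyword):
    """Search for the keyword plus the words that tend to accompany it."""
    terms = {keyword} | related_terms(docs, keyword)
    return {doc_id for doc_id, text in docs.items() if terms & set(tokens(text))}


direct = {d for d, text in documents.items() if "attorney" in tokens(text)}
expanded = expanded_search(documents, "attorney")
print(sorted(direct), sorted(expanded))
```

Here “merger” accompanies “attorney” in two documents, so the expanded search also retrieves doc3, which refers to “counsel” rather than “attorney.”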
The latest technological advance in e-document review is called “technology-assisted review” (“TAR”), or “predictive coding.” TAR is an iterative process that involves alternating human and computer review of e-documents. The first step involves the review of a sample of an ESI collection (known as a “seed set”) by an individual with in-depth knowledge of the case. The results of that review are analyzed by the TAR technology, which then “reviews” a much larger sample from the same collection and provides suggested “coding”4 for those documents. The human reviewer then samples the e-documents that the computer has reviewed and corrects any problems with the suggested coding. The computer “learns” from the feedback provided by the human reviewer, completes the review and coding, and ranks the documents according to its “understanding” of relevance. TAR has been shown to be up to 80 percent accurate – as compared to around 50 percent for manual review by multiple attorneys – and to save up to 86.77 percent of the estimated costs of a manual review. Jenya Moshkovich, Technology-Assisted Document Review, For the Defense, June 2013, at 67-68 (discussing Global Aerospace, Inc. v. Landow Aviation, L.P., et al., CL 61040 (Va. Cir. Ct. Apr. 23, 2012)). Despite understandable hesitancy (because a machine, and not an attorney, is making relevancy judgments), recent decisions indicate that the courts are warming up to the use of TAR by one or both parties. See, e.g., Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012) (permitting consenting parties to engage in computer-assisted review); Global Aerospace, supra (TAR approved over objection); EORHB Inc., et al. v. HOA Holdings, LLC, C.A. No. 7409-VCL (Del. Ch. Oct. 15, 2012) (court requires the use of TAR, unless good cause is shown).
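The iterative loop described above (seed set, suggested coding, human correction, re-ranking) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's product: the word-counting scorer is a deliberately crude stand-in for whatever model a TAR platform actually uses, and the documents, the `truth` table that simulates the human reviewer, and all function names are hypothetical.

```python
from collections import Counter


def tokens(text):
    return text.lower().split()


def train(labeled):
    """Toy model: +1 per word for each relevant document, -1 for each irrelevant one."""
    weights = Counter()
    for text, is_relevant in labeled:
        for word in set(tokens(text)):
            weights[word] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    return sum(weights[word] for word in set(tokens(text)))


def tar_round(labeled, collection, sample_ids, human_review):
    """Suggest coding, let the human check a sample, and record the feedback."""
    weights = train(labeled)
    suggested = {d: score(weights, text) > 0 for d, text in collection.items()}
    corrections = 0
    for doc_id in sample_ids:
        decision = human_review(doc_id)
        if decision != suggested[doc_id]:
            corrections += 1
        labeled.append((collection[doc_id], decision))  # the computer "learns"
    return corrections


# Seed set coded by someone with in-depth knowledge of the case (hypothetical).
labeled = [("merger agreement draft", True), ("lunch menu friday", False)]
collection = {
    "doc1": "signed merger agreement",
    "doc2": "friday lunch order",
    "doc3": "acquisition closing checklist",
    "doc4": "merger closing schedule",
    "doc5": "holiday party menu",
}
# Simulated human judgments, used only when a document is sampled for review.
truth = {"doc1": True, "doc2": False, "doc3": True, "doc4": True, "doc5": False}

corrections = tar_round(labeled, collection, ["doc3", "doc4"], truth.get)
final_weights = train(labeled)
ranking = sorted(
    collection, key=lambda d: score(final_weights, collection[d]), reverse=True
)
print(corrections, ranking)
```

In this run the computer initially codes doc3 as not relevant because it shares no words with the seed set. Once the human reviewer corrects that call, the retrained model ranks doc4, doc3, and doc1 above the two irrelevant documents.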
There are other technological tools that increase the efficiency of reviewing and processing ESI, including “de-duping” and “e-mail threading.” In de-duping, successive copies of the same e-mail or document are removed so that the document does not have to be reviewed multiple times. De-duping can be performed within a single custodian’s collection, but is most effective when it is performed across multiple custodians. In “e-mail threading” (also known as “clustering” or “near grouping”), e-mail threads or documents that are otherwise related are grouped together so that the reviewer can review them all at the same time, thereby increasing the chances that they will be coded consistently. Finally, counsel should be aware that much of this technology is available for rent through the “cloud,” thereby allowing firms to save on the up-front cost of the software, as well as the costs and time associated with maintaining and updating the software. Joe Dysart, Eye in the Sky, ABA Journal, April 2014, at 32.
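Both techniques can be illustrated with a brief Python sketch. The e-mail records, field names, and function names below are hypothetical. The sketch assumes that two e-mails are duplicates when their subject and body are identical, and that e-mails belong to the same thread when their subjects match after reply and forward prefixes are stripped; commercial review platforms may rely on richer metadata.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical e-mails collected from two custodians.
emails = [
    {"id": "A-1", "custodian": "Alice", "subject": "Merger timeline", "body": "Draft attached."},
    {"id": "B-1", "custodian": "Bob", "subject": "Merger timeline", "body": "Draft attached."},
    {"id": "B-2", "custodian": "Bob", "subject": "RE: Merger timeline", "body": "Looks fine to me."},
    {"id": "A-2", "custodian": "Alice", "subject": "Lunch", "body": "Noon?"},
]


def fingerprint(email):
    """Hash the subject and body so identical copies yield identical values."""
    content = (email["subject"] + "\n" + email["body"]).encode("utf-8")
    return hashlib.sha256(content).hexdigest()


def dedupe(items):
    """Keep the first copy of each e-mail, regardless of custodian."""
    seen = set()
    unique = []
    for email in items:
        fp = fingerprint(email)
        if fp not in seen:
            seen.add(fp)
            unique.append(email)
    return unique


def thread_key(subject):
    """Strip leading RE:/FW:/FWD: prefixes and normalize case."""
    stripped = re.sub(r"^((re|fw|fwd):\s*)+", "", subject.strip(), flags=re.IGNORECASE)
    return stripped.lower()


def group_threads(items):
    """Group e-mail ids by normalized subject so related messages are reviewed together."""
    threads = defaultdict(list)
    for email in items:
        threads[thread_key(email["subject"])].append(email["id"])
    return dict(threads)


unique = dedupe(emails)
threads = group_threads(unique)
print([e["id"] for e in unique], threads)
```

Because the de-duping runs across both custodians, Bob's copy of the original message (B-1) is dropped, and his reply (B-2) is grouped with Alice's original so that the two are reviewed together.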
As Magistrate Judge John M. Facciola of the D.C. District Court added at the LegalTech conference, “Lawyers better get crackin’. There’s an awful lot to know.” Joe Dysart, Learn or Lose, ABA Journal, April 2014, at 32.
1 "Custodian" is the term used to describe the individual who had physical possession of the e-document(s) in question prior to collection.
2 Examples include the words "bank" and "spring." This is known as "polysemy."
3 Examples include the words "attorney," "lawyer" and "counselor." This is known as "synonymy."
4 "Coding" is the term used to describe the process by which a particular document is marked as containing information relating to a specific issue.