The reliability and accuracy of the predictive coding process depend heavily on the identity of the documents in the seed set (including documents deemed irrelevant), because the seed set is the primary source used to teach the computer how to recognize patterns of relevance in the larger document universe. Indeed, miscoding just a few thousand documents (mere kilobytes nestled among terabytes of data) could substantially alter the results of the predictive coding that follows. Biased coding of the seed set could lead to large swaths of relevant documents being deemed irrelevant, and a smoking gun could be missed. Recognizing the power of this relatively small bit of data, e-savvy attorneys seek to obtain as much information about their adversary's seed set as possible.
Certainly there is no harm in requesting such information, and sometimes parties will agree to disclose an entire seed set, including those documents the producing party has deemed irrelevant. Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 186-87 (S.D.N.Y. Feb. 24, 2012), dealt with voluntary disclosure of irrelevant documents from the seed set, as well as certain information relating to the human reviewers' methodology for coding the set. But when a request is declined, and the parties disagree on the extent to which seeding data must be shared, is either party entitled to receive such information from its adversary? This article explores this question, which remains unsettled.
To address the question, the nonproducing party would argue that the principle of transparency requires the producing party, which seeks to exploit the cost-efficiency of predictive coding technology, to divulge to its opponent the entire seed set, including documents deemed irrelevant. Only then, the argument goes, could the opponent have a reasonable opportunity to evaluate whether the documents selected by the algorithms in computer-assisted review accurately reflect the entire data set available.
In Da Silva Moore, the court emphasized the importance of transparency and cooperation as major factors in determining whether predictive coding was appropriate to use in discovery. In that case, the parties had agreed that the defendant would divulge the entire set of nonprivileged seed set documents, along with the issue tags coded for each document, regardless of whether those documents were coded as relevant or irrelevant. Short of requiring this level of transparency, the court highly recommended "that counsel in future cases be willing to at least discuss, if not agree to, such transparency in the computer-assisted review process."
In contrast, the producing party would argue that disclosure of seeding data would amount to the unwarranted disclosure of its counsel's attorney work product, per Federal Rule of Civil Procedure 26(b)(3)(B). After all, in the paper-production context, a party need not divulge documents deemed irrelevant to prove the relevancy of those produced. So why impose such a requirement here when the process is computer-assisted?
In response to such a producing party's assertion that the seed set is privileged, an opponent may cite to a number of recent cases compelling disclosure of search terms, including American Home Assurance v. Greater Omaha Packing, No. 8:11-CV-270 (D. Neb. Sept. 11, 2013); Romero v. Allstate Insurance, 271 F.R.D. 96 (E.D. Pa. 2010); Formfactor v. Micro-Probe, No. C–10–03095 PJH (JCS) (N.D. Cal. May 3, 2012); and Apple v. Samsung Electronics, No. 12–CV–0630–LHK (PSG) (N.D. Cal. May 9, 2013).
Although each of these cases concluded that search terms are not subject to the work-product privilege, the usefulness of applying the cases' holdings to the predictive coding context is questionable. In fact, in American Home, the court never addressed whether search terms were privileged under the work-product doctrine. The remaining cases failed to adequately consider the issue, instead improperly relying on an inapposite citation of precedent from Upjohn v. United States, 449 U.S. 383 (1981), which related to the applicability of the attorney-client privilege, not the work-product privilege.
Even without case law support, counsel's development of search terms is arguably akin to the considerations underlying the determination of which documents should constitute the seed set. This argument, however, can be countered. Just because search terms and seed sets are both used to facilitate electronic discovery does not mean that they necessarily implicate work-product privilege issues in the same way. Search terms may be words or phrases copied and pasted directly from a document request; they may be dictated by one's adversary and implemented neatly without further consideration or any complicated analysis.
In contrast, the producing party would argue, tagging documents for relevance to develop the seed set may involve greater complexity; actively culling through a seed set to determine which specific documents are relevant (and which are not) arguably demands more of an application of the attorney's mental impressions of the claims than coming up with search terms for documents not yet reviewed. Moreover, the argument goes, revealing search terms used in the discovery process does not rise to the level of intrusiveness and unfairness inherent in disclosing documents that bear no relevance to the claims at issue, which might include documents containing information that is commercially valuable or personally embarrassing. Indeed, a lesser burden is likely imposed by the disclosure of search terms, all of which, by definition, are relevant to the issues in the case. Thus, the producing party concludes that a seed set consisting of all documents, whether relevant or not, is much more likely than a set of search terms to be inextricably intertwined with, and thereby reflective of, an attorney's thought processes that are protected by the work-product doctrine.
Even though nearly two years have passed since the Da Silva Moore opinion was issued, how much transparency the rules require remains an open question. Indeed, an Indiana judge recently held, in an opinion quite contrary to the spirit of Da Silva Moore, that, under the current rules, a party need not even identify to its adversary which of the produced documents were a part of the seed set, much less turn over those documents that had been deemed irrelevant and therefore remained unproduced. That case was In re Biomet M2a Magnum Hip Implant Products Liability Litigation, Case No. 3:12-MD-2391, 2013 U.S. Dist. LEXIS 172570 (N.D. Ind. Aug. 21, 2013).
What's more, it is unclear where courts may look for guidance in resolving the issue of having to produce the entire seed set used in a production based on predictive coding. One thought leader on the subject of e-discovery, the Sedona Conference, often suggests that cooperation and transparency go hand-in-hand. Although not squarely addressing the issue, the Sedona Conference recommends that parties "reach agreement on automated search methodology ... [to] locate and produce the most relevant ESI," including keeping records and comparing results while experimenting with different search methods in an effort to agree on which is the most suitable.
Likewise, the recently proposed changes to the Federal Rules of Civil Procedure clearly contemplate a greater role for cooperation, albeit in proportion to the needs of the underlying litigation.
The principle favoring transparency in these sources suggests that a court might be inclined to order the production of an entire seed set. But neither the proposed rules nor the Sedona Conference has directly addressed the counterbalancing issue of the protections required by the attorney work-product doctrine. If both parties benefit from the use of computer-assisted technology, perhaps there is a way to reach an agreement as to the parameters surrounding its use. But if the parties have disparate interests and needs with regard to disclosure of privileged information, they will have little incentive to design cooperative solutions, and they will be left with uncertainty until the court weighs in on the issue.
It's clear that, for now, parties that choose to use predictive coding cannot answer the basic question of whether they will have to produce an entire seed set, including the documents they have concluded are nonresponsive. The uncertainty surrounding this important strategic issue may serve to stymie the use of predictive coding, at least until the courts provide a clear answer.