Editor’s Note: Early case assessment (ECA) has reached a critical inflection point, where traditional methods that prioritize speed over strategic insight can no longer keep pace with the explosion of data volumes and compressed timelines. In a recent webcast, legal technology experts tackled fundamental ECA challenges, including unvalidated keyword strategies, suboptimal technology timing, and measurement gaps that leave teams with incomplete visibility into their processes. The panel demonstrated how generative AI’s (GenAI) contextual intelligence represents a shift from rigid rule-based systems to nuanced, language-aware document evaluation that brings human-like reasoning to eDiscovery’s earliest stages. Rather than simply critiquing current practices, the experts provided actionable approaches for integrating emerging AI capabilities with established legal standards while maintaining litigation defensibility. What emerges is both practical insight and strategic perspective for legal teams grappling with increasingly complex data landscapes. The discussion addresses how GenAI might transform early matter assessment approaches while navigating the rapidly evolving intersection of legal strategy and artificial intelligence.
Expert Panelists
+ Esther Birnbaum
Executive Vice President of Legal Data Intelligence, HaystackID
+ Jim Sullivan
Chief Executive Officer, eDiscovery AI
+ Young Yu
VP of Advanced Analytics and Strategic Solutions, HaystackID
[Webcast Transcript] Make Your ECA Process Work for You: GenAI’s Role in Enhanced Legal Decision-Making
By HaystackID Staff
Legal teams often face an impossible equation: shrinking timelines, exploding data volumes, and sky-high stakes. Make the wrong call early, and the entire case suffers as a result. Yet traditional ECA tools dump more data on already overwhelmed teams instead of delivering the strategic intelligence they desperately need.
During the recent HaystackID® webcast, “Make Your ECA Process Work for You: GenAI’s Role in Enhanced Legal Decision-Making,” legal technology experts walked attendees through how GenAI is rewriting the rules, turning data chaos into strategic clarity and impossible deadlines into achievable goals. The panel began the presentation by examining current ECA practices and their limitations, including the fact that keyword-based workflows often lack proper validation. Key topics included:
- The importance of expert involvement in keyword development
- The timing considerations for continuous active learning (CAL) deployment
- How the data reduction sequence affects outcomes
The timing of everything matters in ECA: when you deploy search terms, how you sequence email threading, and even whether you include folder paths in your searches. Throughout the conversation, the experts highlighted a crucial point often overlooked: the real culling process begins much earlier than most teams realize, starting with initial data collection and custodian decisions. The panel emphasized that successful ECA requires both technical expertise and strategic thinking, particularly since conventional approaches often overlook important data sources and organizational nuances. “To do an ECA properly, you need to understand the structure of your data,” Birnbaum explained.
The panel discussed the practical realities of eDiscovery decision-making, where budget considerations naturally influence strategy choices. Legal teams often strive to strike a balance between cost predictability and document completeness, a common challenge in complex matters. Yu identified an opportunity for improvement: current recall and precision metrics typically cover reviewed documents rather than full datasets, suggesting room for enhanced measurement approaches. As GenAI becomes more prevalent, the panel explored how to effectively integrate statistical AI capabilities with established legal standards and workflows. The speakers made a compelling point on how GenAI’s language-agnostic capabilities could complement existing methods, offering enhanced flexibility for diverse datasets and multilingual content.
GenAI represents an evolution in ECA, moving beyond traditional keyword and TAR approaches to enable more sophisticated document evaluation. This shift introduces contextual, language-aware analysis that functions more like intelligent human review than rigid rule application. “GenAI is taking the task of assessing each document to determine whether or not it passes muster to make it to the review phase, which is just a different consideration than anything we’ve done before with TAR or search terms,” Sullivan said during the webcast. “And now, with GenAI, we are doing significantly more to assess the document because GenAI is considering the context, the words, and what’s being said.”
Watch the webcast recording and read the transcript to learn about the changing industry standards and practices around GenAI and how it has the potential to deliver stronger and more comprehensive results for your ECA processes.
Transcript
Moderator
Hello everyone, and welcome to today’s webinar. We have a great session lined up for you today. Before we get started, there are just a few general housekeeping points to cover. First and foremost, please use the online question tool to post any questions you have, and we will share them with our speakers. Second, if you experience any technical difficulties today, please use the same question tool, and a member of our admin team will be on hand to support you. And finally, just to note, this session is being recorded, and we’ll be sharing a copy of the recording with you via email in the coming days. So, without further ado, I’d like to hand it over to our speakers to get us started.
Esther Birnbaum
Hey, everybody. Okay, I’m going to kick us off. And okay. Welcome to another HaystackID webcast. I hope you all have caught some in the past, but if it’s your first time, welcome. I’m Esther Birnbaum. I will be moderating this conversation, but it will be more of a conversation than a panel with a moderator. This session is called “Make Your ECA Process Work for You,” and it’s going to talk about GenAI’s role in enhanced legal decision-making. This webcast is part of HaystackID’s ongoing educational series designed to help you stay ahead of the curve in achieving your cybersecurity, information governance, and eDiscovery objectives. We are recording today’s webcast for future on-demand viewing, and we’ll make the recording, along with a complete presentation transcript, available on Haystack’s website at haystackid.com. Today, we’ll explore the use of GenAI to streamline and enhance ECA, enabling you to better organize your data, identify patterns, and take action. Before we get into today’s agenda, we’ll start with some quick speaker introductions. I’m Esther Birnbaum. I have been practicing eDiscovery and technology law for far too long. I have both a law firm background and spent several years as in-house counsel. Now, I am at HaystackID, where I run our legal data intelligence department. Jim, do you want to go ahead?
Jim Sullivan
Hey, guys. I’m Jim Sullivan. I am an attorney as well and have been in eDiscovery for a long time. I have always focused on using analytics, predictive coding, and technology to make document review more efficient, faster, and better.
Young Yu
Thanks, Jim. Hi, I’m Young. I’ve been with HaystackID for about seven years now. I’m the VP of Advanced Analytics and Strategic Solutions, focusing primarily on analytics, the use of GenAI workflows, methodology, and sensitivity. I’ve been doing this for a very, very long time, I don’t even want to say. But Esther, back to you.
Esther Birnbaum
Okay. We’re just going to jump right into this conversation. We have a few slides to help guide us and level-set the discussion, providing some background information. Many people will be familiar with the term ECA. When we say ECA, we’re referring to early case assessment: the work at the outset of a matter, when you have a large dataset and need to identify the actually relevant population. I’ll turn it over to you guys to talk about some of the processes that we currently use to deal with workflows in early case assessment.
Jim Sullivan
And Young, I am curious: when a client comes to you with a new project and wants to run ECA, what’s your baseline recommendation for how they should start, let’s say without GenAI?
Young Yu
Okay. So I think the sort of baseline, and it’s not really so much of a recommendation that we’re making. Right? Most clients come to us with search terms, date filters, and possibly custodial guidelines. We process the data and apply the search terms, whether it’s at the ECA level or the processing level. Again, search terms are a mixed bag. Right? You can have very broad search terms, you can have very narrow search terms, but that’s the accepted practice. The other thing here is that we have seen clients do targeted collections. For those clients who have engaged with, let’s say, a forensic collection company or have a very robust internal IT department, we’ve received pre-filtered data sets. Our general recommendation is to get a baseline on your full scope of data to see what your relevance rate or your richness is going to be. Taking the broadest measure of your data is likely to yield the most accurate results. Unfiltered. And I know there’s a reluctance to do that. Jim, I’m sure you’ve experienced that in the past. You and I have had many conversations about this, but there are opportunities to focus on specific data sets. And then there are particular matters that don’t really allow you to apply any sort of subjective filters, right? Let’s say search terms are fairly subjective to the matter. In those cases, realistically, it’s much harder to get a handle on what your review population should be.
Jim Sullivan
Are you seeing keywords in a majority of projects now? Is it still the primary way? We always use date filters. We’re going to cull out junk and try to eliminate, maybe de-dupe, de-thread, whatever. Are keywords still the primary method people use?
Young Yu
I think it’s a 50/50 split, whether it’s pre-filtered on collection or whether we’re applying search terms post-processing in ECA; I’d call it about a 50/50 split. Where we’re seeing those larger, unfiltered data sets primarily is in regulatory settings, or, I would say, mostly regulatory-driven, right? Where regulators are like, “Hey, if you want to go with the search term workflow, we’re going to expand the universe. If you don’t do search terms, we’ll stay here.”
Jim Sullivan
Does using continuous active learning change that logic at all, where a lot of the junk documents will be eliminated with a CAL workflow, certainly more efficiently than what search terms would do? Have you actually seen the impact of that?
Young Yu
Everybody’s caught on to CAL, which is a great thing. We have leveraged CAL in ECA. I wouldn’t say it’s a general practice, but for some of our clients who’ve gone down that road before and have used CAL extensively, that is a work track that has been requested, and it’s one that we do offer. It’s really dependent upon your dataset, and richness always plays a factor, meaning the percentage of documents that are actually relevant out of your entire population. However, the thing with CAL is that you have a set of responsive documents, and you probably have one or two defining characteristics of responsiveness that are very low in richness. So, in a traditional active learning workflow, you typically don’t find those until you’ve exhausted most of the other responsive documents. You probably have a broad category for responsiveness. Let’s say that, of all the responsive documents, it covers 80% of that population. Then you have the disparate, I don’t want to call them issues, let’s say buckets of responsiveness that are less prevalent. In a typical active learning workflow, you won’t see those until the very end of a prioritized review. You can get lucky.
Jim Sullivan
But what I’m saying here is that the benefit of using search terms is to reduce the volume of documents you need to review. So, it makes a lot of sense to bring the whole set into a CAL workflow for review, and then many of the documents that you would have culled out would not have made it to review anyway. You might pay for the additional hosting on those documents, but from a culling method, I think you’ll get a much better result if you put the whole data set in and then use CAL to decide what gets reviewed in the review phase.
Young Yu
Yeah, I’d agree with that. I don’t think there’s a very large delta, right? In terms of how much training you need to do, if you focus on keywords or if you go straight to active learning. I know there were a couple of studies done many years ago by other industry professionals, and that talk track remains true. The flip side to that with search terms is that unless you’re validating extensively on your, let’s say, non-search term hit population, you don’t know what you’re missing.
Jim Sullivan
Let’s talk about that. What kind of validation legitimately is happening in most ECA projects? I mean, obviously, we run search terms, test them, receive reports, and make adjustments if things aren’t looking right. But at the end of the day, I’m running my search terms; what am I doing to validate them? What process are you seeing as an ordinary course of business to validate search term results?
Esther Birnbaum
That question goes to where in the process search terms are being decided. If an attorney goes into a meet and confer and agrees to search terms without consulting with the discovery team or discovery counsel, you’re often left with search terms that may be overly limiting or overly broad. It depends on where in the workflow the search terms are coming in.
Jim Sullivan
So, would you say, though, that the way search terms are used, and when in the process they’re used, varies dramatically? And wouldn’t that be ripe for gaming the system in a way that steers the result toward your favor, either in eliminating documents or saving money on certain things? When you’re saying that you could do it in different ways, a lot of that seems like it would be ripe for gaming the system.
Young Yu
You might be a little harsh there, Jim. The focus of search terms is to minimize data populations. Nobody’s looking at those search terms that aren’t yielding a ton of hits. Right? Objectively speaking, from past experience, everybody focuses on those overly broad terms or those terms that yield a large number of results.
Esther Birnbaum
Always.
Young Yu
A lot of the time, you’re trying to figure out how you can limit the impact of that overly broad term. On the flip side, right? Validation: if you have agreed-upon terms with opposing counsel, typically there really isn’t much validation done on what’s been missed. It’s essentially, these are the agreed-upon terms; let’s move forward with them.
Jim Sullivan
So you’re saying that when you agree on terms, people don’t even check to see if those terms are returning anything they want or missing anything?
Young Yu
That’s more of an observation than a statement from me.
Esther Birnbaum
We all know what the shortcomings of search terms are. It’s not a scientific process, and I always found it frustrating because we don’t do any validation. The majority of people don’t validate search terms, especially if they’re agreed upon, because then they’re treated as beyond the scope of something you would have to validate. And then we spend so much of our focus on this review population we’ve identified, running metrics and validation on it, but we don’t know what could be left behind. With the introduction of new technology, we can view data differently. We have to not just look at how we can fit the technology into what we’re currently doing, but take a bigger picture and ask: what are the processes we should be doing now that we can do them better? And I honestly think that it’s our ethical duty to do that. We can’t continue relying on the same processes we’ve used for 20 years. Data has changed, data has grown, and our technology has evolved.
Young Yu
I’ll add to that, Esther. You’re right. There’s a layer of semantics. It’s about where and when you apply any technology or methodology. For instance, say you’re looking to leverage multiple data reduction methodologies, such as search terms and email threading. Right? The order of operations matters more than anything else. If you run your search term hits first, bring in families, promote those for review, and then perform email threading, the result is much different than, say, running email threading first, then running the search terms, and bringing in all threads with families for those search term hits. So if your search term hit is in the middle of an email thread…
Esther Birnbaum
Have you ever seen that directed in an ESI protocol, that it has to be done in that way?
Young Yu
I’ve seen it with federal regulators, and I’ve seen it be an argument point in civil litigation, where it’s a little more contentious because you always want to put that burden of review on the opposing side. There’s an argument that, yes, you can use search terms, but you have to make this concession. Or if you want to stack multiple data-reducing methodologies or technologies, there’s a give-and-take that needs to take place between both parties.
Jim Sullivan
Can you explain that a little bit more? With threading, though, isn’t it the case that, by definition, the most inclusive email in a thread will contain every keyword that any other document in that thread contains?
Young Yu
Say you run your search terms first, bringing in the whole family that you are promoting. If your search term is, let’s say, in an email chain of 10 emails and it hits in the fifth email, the promotion for review will be the fifth through the tenth. When you thread, you will get five through 10; you will not get anything earlier in the chain. If you do threading first, apply your search terms, and then promote those threads that have search term hits, you will get one through four plus five through 10. Contextually, you may have more information to review and some additional background when you’re conducting your review of the entire email chain. There’s an inclusive email that happens before number five, and in that example you’ll have all of that context. In the other example, you’ll be reviewing the search term hit and the forwards, as those will presumably exist in inclusive emails after email number five. Again, it’s semantics; it’s a logical progression that you need to consider when deploying these technologies. That is important to note because regardless of the methodology you’re deploying, it’s where, when, and how you deploy it, in addition to what’s being examined. Typically, search terms are run on document text, meaning the extracted text of the document. Another factor here, one that we could discuss, is whether you run that on your folder paths in addition to your extracted text.
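To make the sequencing point concrete, here is a minimal, illustrative Python sketch of the two orders of operations Yu describes. The data model is an assumption made for illustration (a single 10-email chain in which each reply quotes everything before it), not a description of any review platform’s threading logic.

```python
# Minimal sketch contrasting the two sequencing choices described above.
# Assumptions: a 10-message chain where each reply quotes the full history
# before it, and only message 5 introduces the hit term "code-red".

def build_chain(term="code-red", length=10, hit_at=5):
    messages = [f"routine status update {i}" for i in range(1, length + 1)]
    messages[hit_at - 1] = f"please escalate, {term}"
    chain = []
    for i in range(1, length + 1):
        # Reply i contains its own message plus the quoted history before it.
        chain.append({"id": i, "thread_id": "T1", "text": " ".join(messages[:i])})
    return chain

def terms_then_threading(docs, terms):
    """Run terms first and promote hits; threading afterward only organizes what was promoted."""
    return {d["id"] for d in docs if any(t in d["text"] for t in terms)}

def threading_then_terms(docs, terms):
    """Thread first, then promote every email in any thread containing a hit."""
    by_thread = {}
    for d in docs:
        by_thread.setdefault(d["thread_id"], []).append(d)
    promoted = set()
    for members in by_thread.values():
        if any(any(t in d["text"] for t in terms) for d in members):
            promoted.update(d["id"] for d in members)
    return promoted

chain = build_chain()
print(sorted(terms_then_threading(chain, ["code-red"])))   # [5, 6, 7, 8, 9, 10]
print(sorted(threading_then_terms(chain, ["code-red"])))   # [1, 2, ..., 10] -- full chain with context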
Esther Birnbaum
Are we talking about post-collection?
Young Yu
Post-collection.
Esther Birnbaum
Okay.
Young Yu
You could have someone create a folder named after each project they work on, with a corresponding code name. The document text for the documents within that folder would not include that code name. If that code name is one of your search conditions, a search on document text alone won’t pick up those documents. However, if you search for it in the folder path, you’ll probably find that hit.
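A small, hypothetical sketch of the folder-path scenario: the project code name appears only in the path, so a search limited to extracted text misses the document, while a search that also covers the folder path finds it. The example documents and field names are invented for illustration.

```python
# Illustration of the folder-path point: a code name appearing only in the
# folder path is invisible to a search run on extracted text alone.
# The documents and field names here are hypothetical.

docs = [
    {"id": 1, "folder_path": "/home/jdoe/Project Nightingale/budget.xlsx",
     "extracted_text": "Q3 budget figures and headcount projections."},
    {"id": 2, "folder_path": "/home/jdoe/misc/notes.txt",
     "extracted_text": "Nightingale kickoff notes from Tuesday."},
]

term = "nightingale"

text_only_hits = [d["id"] for d in docs if term in d["extracted_text"].lower()]
text_or_path_hits = [d["id"] for d in docs
                     if term in d["extracted_text"].lower()
                     or term in d["folder_path"].lower()]

print(text_only_hits)     # [2] -- the budget file inside the project folder is missed
print(text_or_path_hits)  # [1, 2] -- searching the path as well picks it up
```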
Jim Sullivan
Do you often see [legal teams] negotiate what metadata fields to search? Obviously, the file name and folder path from…Is that stipulated? Do you do what feels right?
Young Yu
I’ve seen it both ways, where it’s either not discussed or discussed expressly. It’s a matter of understanding your client’s data and how you negotiate with the opposing party. Knowing your data is key now at this point in eDiscovery. If you don’t know your data or have someone who understands the architecture behind your data structure, that lift of culling becomes more difficult.
Jim Sullivan
Yeah. What kind of objective goals are you trying to hit? With TAR, we talk about recall and precision. It has to be 70% or 80% recall, whatever it is. With ECA, is there a goal? Is there a metric that we’re trying to hit or not?
Young Yu
I don’t think in terms of precision or recall. I would say the overarching goal is to limit the number of documents you review and potentially produce. That’s true of any of these methodologies, whether it’s search terms, TAR, CAL, or GenAI; the goal is to minimize the number of documents you ultimately review and produce. And in some instances, let’s say for true ECA, you’re also limiting what you’re hosting. The largest portion of the cost here will be the review, so you’re really trying to limit what you review. How do we measure that? There are a couple of things here. Precision and recall are great metrics, but without understanding how many documents you’ve reviewed to get to your precision or recall goals, it’s hard to quantify.
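For readers who want the metrics Yu references spelled out, here is a brief refresher with made-up counts, showing why a recall or precision figure is hard to interpret without knowing how many documents were reviewed to reach it.

```python
# Quick refresher on the two metrics, with illustrative (made-up) counts.
# Recall    = relevant documents found / all relevant documents in the measured population.
# Precision = relevant documents found / all documents retrieved for review.

def recall(found_relevant, total_relevant):
    return found_relevant / total_relevant

def precision(found_relevant, total_retrieved):
    return found_relevant / total_retrieved

# Example: 9,000 of 10,000 relevant docs found, but 60,000 docs pulled into review.
print(f"recall:    {recall(9_000, 10_000):.0%}")      # 90%
print(f"precision: {precision(9_000, 60_000):.0%}")   # 15%
# The same 90% recall could also be reached by reviewing only 12,000 docs (75% precision);
# without the review volume alongside the metric, the effort behind it is invisible.
```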
Esther Birnbaum
I want to take a step back. In order to do an ECA properly, you need to understand the structure of your data. But I actually believe it goes before that. The question is: should we be talking about how we’re collecting the data and what parameters we’re applying? We look at ECA as this phase where it’s like, “Okay, here’s the data, now we can do what we need to do with it.” But I think that we’re missing the bigger picture of whether we even know if the correct data was collected. Data collections are not easy.
Jim Sullivan
You’re choosing filtering throughout the entire process. At many points, you dictate which documents are included or excluded, through the process you have in place to determine whose data gets collected and over which date range. I always found the discussion around this interesting. I think there was a Tracy Greer article that said the FTC wasn’t going to allow keywords with TAR. And then a lot of questions came up around whether culling is allowed before using TAR. Is it appropriate to cull before TAR? I wondered, as you said, because throughout the process we’re culling the data by dictating which custodians we collect from, which means what we’re getting is not the full data set of all data in the universe’s history. We are culling it significantly before it even gets to the ECA process. So, is more culling allowed, or is the culling we’ve already done the only thing that’s allowed? But that’s true; you’re culling it before it gets to the database.
Esther Birnbaum
There’s a bigger picture where we need to start looking at data as a whole before deciding on different collections. And then you need an expert to understand what your data is. I think back to when I was a lawyer at a law firm, serving as outside counsel, and we would conduct custodian interviews. When I went in-house, I realized we weren’t asking the right questions because corporations’ data structures are so complicated that there are questions you don’t even know to ask when you’re outside counsel, such as proprietary data sources or platforms, etc. I know this webinar is supposed to focus on early case assessment, but I think we need to start looking at it beyond the traditional perspective. And Young, if you want to go to the next slide.
Jim Sullivan
I think this next slide talks about what Young was saying about how it’s cost-driven in a lot of cases, where a client wants to get to a price point they’re comfortable with, and accuracy, or not excluding relevant material, is less of a factor. I’ve seen that in a lot of matters.
Esther Birnbaum
And burden.
Jim Sullivan
I mean, and burden. In some way, it is fair in that regard. But yeah, how often is accuracy, or not excluding good documents, a huge priority? And do we know if we are missing things?
Young Yu
Well, Jim, we’ve had this conversation too. Let’s say you run search terms, you promote your search term hits plus families for review, and then you run active learning. When you present metrics for active learning, you’re reporting on what’s been part of that review corpus. You’re not reporting against what’s been collected. Do we add a layer there, or is that metric good enough? Right now, for the status quo, that metric we’re reporting is good enough. We put X many documents into active learning, and we reached 90% recall with X percent elusion. We’re fine with that. But again, with search terms, we’re not accounting for anything that’s been missed. If you stack layers, for example, TAR 1.0 on TAR 2.0, how do you report that? The biggest thing here is that people who understand math and stats will make the argument one way, and then you have the legal perspective where it’s like, “We did enough.” “This is accepted.” We’re at a juncture with GenAI where there’s a clash. How do you reconcile the math with accepted practice or intended practice? And that’s what’s coming next, right? Wherever precedent is reviewed and accepted, I believe that will give us insight into how the future will be shaped. And I think that’s why a lot of people are saying, “I want to use GenAI; it’s really good at what it does, or I get better results than, let’s say, search terms.” Right? But how do you present those numbers? After presenting those numbers, how do you move forward?
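The gap Yu describes between the reported metric and the full collection can be illustrated with simple, assumed numbers: if an unvalidated search-term step captures only a portion of the relevant material, the recall later reported for active learning applies only to what was promoted.

```python
# Illustrative (made-up) numbers for the point that active learning metrics are
# typically reported against the promoted review corpus, not everything collected.

relevant_in_collection = 50_000   # relevant docs across the full collection
search_term_recall = 0.70         # assume the agreed terms capture 70% of them
cal_recall = 0.90                 # the recall figure reported for the CAL workflow

relevant_promoted = relevant_in_collection * search_term_recall   # 35,000 reach review
relevant_found = relevant_promoted * cal_recall                   # 31,500 actually found

effective_recall = relevant_found / relevant_in_collection
print(f"reported CAL recall:           {cal_recall:.0%}")         # 90%
print(f"recall against the collection: {effective_recall:.0%}")   # 63%
# The 90% figure is accurate for the review corpus, but the unvalidated
# search-term step has already capped what that 90% can ever recover.
```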
Jim Sullivan
Well, and what’s the baseline? I mean, what’s the goal here? How accurate? When you present this as two options, stating that if we use keywords, we will achieve this result and incur this cost; if we use GenAI, we will achieve this result and incur this cost. What is our result with keywords? Are we getting 90% of the relevant docs? Are we getting 50% of the relevant docs? Do we know?
Esther Birnbaum
We don’t know. There’s no requirement to validate very frequently, so how would we know if we’re getting the best results? But at the same time, looking at it from the perspective of GenAI, it’s going to be really hard to set that baseline on what you did because there are so many different ways that you can use GenAI over your data. But the question, and we’ve discussed this, is at some point, whose burden is it to figure this out? If I have to produce documents, then the burden is on me to make sure that I’m giving you the relevant documents. Part of my issue is that, having seen the differences in data between various corporations, it will be challenging for anyone to tell me what to do with my data to achieve what is actually responsive. Because it’s not just the technical structure of the data, it’s also internal conversations. So, how do we handle that in the future?
Jim Sullivan
With keywords, basically all the things you said are true: we don’t really know, and every case is different, but we’re currently just throwing keywords at the data and then closing our eyes to whether or not they’re right. So our current method isn’t great.
Esther Birnbaum
Our current method is that we get to this ECA phase, where we’ve already collected the data and may have run threading at who knows what point. There’s not necessarily a specific process. Now we take search terms. Usually, they’re a shot in the dark, and we’re like, “Now this population is the holy grail of where our relevant documents are.” Now all the scrutiny is on that set, which has never been validated. And very often, we don’t even know the collection procedures that went into it. And suddenly, we’re running recall and precision rates and every statistic over this set of data. On one hand, I understand, because there’s the burden, there’s a time element, et cetera. But at what point do we step back and say, “Maybe we need to really look at this entire process and do it differently”?
Jim Sullivan
I mean, that’s the thing. If we’re missing a lot of things at the ECA stage, we just don’t know. And then, no amount of validation at the review phase is going to correct that.
Young Yu
Right. And you’re only reporting on that secondary work track. That’s sort of why I have anxiety today.
Jim Sullivan
I come up with a crazy, narrow search term. I return one document, and I review that one document. I have 100% recall and precision, but that doesn’t mean that I have a good result. And I find it strange the amount of concern and pressure around the validation process in review when the risk for harm is far greater in ECA, and the validation process is non-existent. I can’t reconcile that.
Esther Birnbaum
Yeah. To make it even more complicated, when we have one set of search terms, usually, you can have a couple, but different datasets will respond differently to the same search terms. With so many communication channels, the way you apply search terms in different channels may vary significantly. And none of that’s contemplated.
Jim Sullivan
Right. And I think short messages are the biggest change we’ve had in data, where the biggest impact on search terms, the way people talk in short messages, is going to be different. The language used is different. The number of misspellings, slang, and those types of things is going to make search terms much less effective on particular types of data. And I’m not seeing many people accounting for that when they’re putting together strategies.
Young Yu
There’s also the interjection of multinational business now. Every business has been affected, to some extent, by the availability of the Internet or simply by the passage of time and the increasing globalization of business. I don’t think we’ve encountered a dataset in recent years that didn’t include foreign language content. How many people are adjusting search terms for foreign languages encountered in your dataset?
Jim Sullivan
I just put all my bad stuff in a foreign language. If I want to keep any secrets, I simply put them in a foreign language, and then I’m not at risk of having them disclosed because they won’t hit on the search terms. Right?
Young Yu
Don’t hire Jim. That’s my advice now. I’m joking. To be transparent, that was a joke. I’m going to move to the next slide, Esther. Is that all right with you?
Esther Birnbaum
Go ahead.
Jim Sullivan
But I am curious, though, specifically about the foreign language aspect. Are you coming up with two sets of search terms for different languages, or is that being contemplated?
Esther Birnbaum
Also, when you’re coming up with search terms, do you even know if there’s a foreign language in your dataset?
Young Yu
If search terms are agreed upon before we receive data, typically, foreign language is not contemplated. It’s only after we run language ID or a reviewer finds a document in a foreign language that foreign language becomes a consideration. Nobody ever goes back to the well and says, “Hey, we promoted these search term hits; they turn out to be, let’s say, Italian or French. Are we going back to the well and running additional search terms in other languages to make sure that we have that corpus?” I have not seen that as a very common practice. I could probably count on my hands how many times that’s actually happened. That being said, it’s essential to understand the datasets your client or corporation uses, to comprehend what could exist or actually does exist within them, and how the data is stored. That all comes into strategy. How do you deploy any sort of culling or search terms or any… going down to the objective filters? How do you do that?
Esther Birnbaum
My favorite type of eDiscovery workflow is in, let’s say, a regulatory investigation where it’s a fact-finding mission, and it’s your burden to figure out what the facts are and then tell the regulator. So you’re not starting with search terms or anything like that when you have the opportunity to say, “This is the team, these are the custodians who are involved, these are the data sources we have.” When the burden is on you to respond appropriately, and you’re not limited by the parameters of anything decided before you, you’re free to explore your data or conduct internal interviews to find out who would have the information, and so on.
Young Yu
It makes me very anxious that this is your favorite. I agree with what you said.
Esther Birnbaum
No one asked me to do an internal investigation here. So you’re good.
Young Yu
That’s a best-case scenario for deploying culling. Because you can run searches and work through your searches. If the burden is on you, it’s really up to you in terms of how you identify pockets of non-relevant and relevant material. I’m sure everybody has their own way of doing this, especially if you’re working on the compliance side. Let’s take an easy example: a CPA. If you’re doing it on the compliance side, I’m sure you have a list of terms that you just run over and over and over again across whatever data set is thrown at you. If we pivot to GenAI, you gain efficiencies. There is a little bit more education and legwork that needs to happen on the front end. Right? But natural language processing, interpretation, and understanding are baked into GenAI. You can put a prompt in English, and it will return foreign language hits, right? Because it’s fairly language-agnostic for the languages supported by the GenAI model. These large language models have been trained on millions, billions, if not trillions, of documents and have a pretty good grasp of natural language. With some kind of free-form search, you have that natural language understanding play out instead of being fixed into, let’s say, search terms, where even if you use stemming or proximity, you’re trying to bridge the way you think people speak. With a model that understands how people generally communicate, you would yield better results, with far fewer false positives and false negatives.
Jim Sullivan
Can we take a step back to discuss what it means to use GenAI in ECA? It’s not just that I click a button and apply GenAI to my project. Specifically, here, we refer to a process where GenAI reviews documents to determine whether they belong in your review population. GenAI is taking the task of assessing each document to determine whether or not it passes muster to make it to the review phase, which is just a different consideration than anything we’ve done before with TAR or search terms. With search terms, if a document hits on this word, it’s in; if it doesn’t, it’s out. We’re not factoring in a lot of misspellings, slang, abbreviations, or phrases that people might use, such as the context of saying, “Hey, let’s do this.” With TAR, it’s like search terms in that we’re looking at documents that discuss the core issues of our case and hitting on those words or phrases or combinations of things that match what we’re generally looking for. And now, with GenAI, we are doing significantly more to assess the document because GenAI considers the context, the words, and what is being said. It can understand foreign language, so it doesn’t have to use any of those search terms that you’re talking about or that you know of. It doesn’t have to use any of the same things. It can have one narrow reference, and the AI can now understand things that other technologies are simply not capable of. It’s more like having generative AI review the documents than what we see with search terms or TAR; it’s just a different type of assessment.
Young Yu
I agree with that. It’s more subjective than deterministic, which is what you have, for instance, with keywords. With keywords, it’s essentially a binary yes or no, right? With GenAI, it’s less binary, less of a zero-one or a yes-no. I do think that the user inputs that go into the criteria are going to matter a lot more. And going back to your reference to TAR, the model only knows what you’ve used for training. If you’re not covering all aspects of responsiveness, how can you say that you found everything? The model doesn’t know what you haven’t presented.
Esther Birnbaum
That goes back to the question of what’s the benchmark. How do we define it? With GenAI, there are so many different use cases and ways to use it. That’s going to be a hard thing for us to agree upon in this industry.
Young Yu
I agree with that, too. All the methodologies to date for validation will likely come into play for GenAI validation. In terms of what is the right method or vehicle for validation, it’s really going to be a collective sort of agreement. Right? Everybody’s going to have to sit down and really examine their options. Sedona might have a recommendation as soon as this year, but again, it’s sort of a wait-and-see game for those who want to go first. And for those that do want to deploy or, let’s say, get ahead of precedence here, you can just validate against your whole [dataset], right?
Esther Birnbaum
I’m talking about validation on your ECA. No one’s talking about that. We’re talking about validation for review.
Young Yu
You can use similar methodologies for ECA. Search term calibration and validation is something that HaystackID has done and does. You’re not only reporting on, let’s say, “This term is bringing back this amount of documents, and for those hits, here’s recall and precision.” You’re also looking at it from the opposite end, where [they say]: “Hey, this is our set of documents that didn’t hit on any search terms, the null set. You want to sample these to make sure that you didn’t miss anything.” Right? Our clients who are calibrating and validating search terms are absolutely doing that at the ECA phase. Often, those sample sizes are disproportionate, though.
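The null-set sampling Yu describes is often called elusion testing. Below is a minimal sketch of the estimate, using illustrative counts and a simple normal approximation; the specific sample sizes and the statistical approach are assumptions for illustration, not a statement of HaystackID’s methodology.

```python
# Minimal sketch of null-set (elusion) sampling: draw a random sample from the
# documents that hit no search term and estimate how much relevant material the
# terms left behind. All counts below are illustrative assumptions.
import math

def estimate_elusion(sample_relevant, sample_size, z=1.96):
    """Point estimate and normal-approximation margin for the null-set relevance rate."""
    p = sample_relevant / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

null_set_size = 400_000     # docs that hit no agreed-upon term
sample_size = 2_000         # random sample pulled for eyes-on review
sample_relevant = 30        # sampled docs a reviewer marked relevant

p, margin = estimate_elusion(sample_relevant, sample_size)
low, high = max(p - margin, 0.0), p + margin
print(f"estimated elusion rate: {p:.2%} (+/- {margin:.2%})")
print(f"estimated relevant docs left behind: {int(null_set_size * p):,} "
      f"(roughly {int(null_set_size * low):,} to {int(null_set_size * high):,})")
```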
Esther Birnbaum
These search terms should be validated and calibrated. They’re just often not. A best practice is to validate your search terms. I think that that’s going to be a lot harder when it comes to validating GenAI for an ECA.
Jim Sullivan
Can’t we just do it the same way we’ve done it for relevance for the last decade? You’re trying to determine whether a set of data captured what you’re looking for. We can use the same processes of control sets, calculating recall and precision with the random sample, and determine essentially what we hit on and what we missed. The biggest problem I see is that we don’t have a baseline today to know what we’re getting today. So we don’t know, is 80% recall good in an ECA context? But wouldn’t the process just be the same?
Esther Birnbaum
Yeah. And I always bring any work we do in discovery with GenAI back to a comparison with the human process: human validation or the validation of human work. That’s really important. But to your point, we don’t always have that. We don’t have the standards in ECA. We might conduct search term validation, and I’m certain Young applies meticulous standards, but we don’t do that across the board. There’s no agreed-upon [standard]: “This is what you have to do for ECA.” We’ve conducted extensive testing together, where we can run GenAI across a set of data and also run search terms across it, comparing the results. But maybe we have to start by saying, “Hey, we’ll use search terms, and we’ll use GenAI, so that we can validate and we can start getting more comfortable with it.” From day one, I always say let’s test it before you talk about it. Until you understand the context of GenAI in an ECA setting, you won’t grasp why it might be better or how it can improve things. And then, once you’re seeing what we can do with it, we can run search terms against it. We can do validation in that way. So, if you have a set of search terms and data, you may not want to do it on an active matter once the search terms are agreed upon, but there are ways to get comfortable with it.
Jim Sullivan
Right. I feel like the easiest way in every case, if you want to test GenAI, is to run your keywords across the dataset, run GenAI across the dataset, and then review a small sample of the docs that GenAI hit on and the keywords didn’t, and of the docs the keywords hit on and GenAI didn’t, to understand what you would capture if you were using GenAI versus keywords. And I’m very confident that GenAI is going to blow it out of the water in every situation, because it has the ability to do so many things that we know keywords can’t, and keywords have notoriously been bad for a long time.
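One way to picture the comparison Sullivan suggests is to sample the two disagreement pools: documents GenAI flags that the keywords miss, and documents the keywords hit that GenAI rejects. The sketch below is illustrative only; classify_with_genai is a hypothetical stand-in for whatever GenAI review tool is in use, not a real API.

```python
# Sketch of the keywords-versus-GenAI comparison: run both over the same dataset,
# then pull random samples from the two disagreement pools for human review.
# classify_with_genai is a placeholder callable, not a real product API.
import random

def keyword_hits(docs, terms):
    return {d["id"] for d in docs if any(t in d["text"].lower() for t in terms)}

def genai_hits(docs, classify_with_genai):
    return {d["id"] for d in docs if classify_with_genai(d)}

def disagreement_samples(docs, terms, classify_with_genai, sample_size=100, seed=42):
    kw, ai = keyword_hits(docs, terms), genai_hits(docs, classify_with_genai)
    ai_only = [d for d in docs if d["id"] in ai - kw]   # GenAI found, keywords missed
    kw_only = [d for d in docs if d["id"] in kw - ai]   # keywords found, GenAI rejected
    rng = random.Random(seed)
    return (rng.sample(ai_only, min(sample_size, len(ai_only))),
            rng.sample(kw_only, min(sample_size, len(kw_only))))

# Toy usage with a stand-in classifier that also catches a paraphrase with no keyword:
toy_docs = [{"id": 1, "text": "Widget recall scheduled for Q3"},
            {"id": 2, "text": "Lunch order for the team"},
            {"id": 3, "text": "Please pull the defective units back from shelves"}]
ai_only, kw_only = disagreement_samples(
    toy_docs, terms=["recall"],
    classify_with_genai=lambda d: "recall" in d["text"].lower() or "defective" in d["text"].lower())
print([d["id"] for d in ai_only])  # [3] -- conceptual hit the keyword missed
print([d["id"] for d in kw_only])  # []
```

Reviewing both samples by hand shows what each method would have captured or dropped before committing an active matter to either approach.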
Young Yu
A lot of food for thought.
Esther Birnbaum
We have a lot of food for thought. I keep looking at the big picture. We need to consider ECA in the context of the data source, data collection, and data structure. We’re falling behind there because search terms are an objective measure, and everything else isn’t. But does that mean that’s best practice? Does that mean we’re fulfilling our ethical responsibilities? Are we properly doing defensible work? And the answer is yes, but the question is, should that be our standard?
Young Yu
Right.
Esther Birnbaum
We move on to the next slide.
Young Yu
We were getting ahead of our talk track here. We’ve covered most of these objectives and the path you take in terms of evaluating your data through the ECA process. I do think it’s very, very important to set goals early on. Right? And I made light of it earlier in this conversation, about the goal of ECA being to reduce your spending, right? That population of documents that you’re going to review. It’s a worthwhile endeavor for the end client, counsel, and service providers to keep that as a goal. I think clear objectives mean keeping the following in mind: What are your goals? Do you have a specific number you want to hit or a budget you need to meet, as Esther stated? Or are you more geared toward what it is you’re measuring, right? That will become more important in the days to come.
Jim Sullivan
I mean, when you say it out loud, is it defensible to say that we have a budget that we’re going to focus on, and we’re going to find a way to get the doc count under that budget? Now, I understand that proportionality is going to be a factor, but just hearing it out loud, it essentially means that our focus is on getting the price under this dollar amount, when it should be on identifying the relevant material with the proportionality balance. And I know that that’s what a lot of people go into it saying, but is that proper?
Young Yu
Proportionality and burden are tied, right? If this matter is worth $3 million and it’s going to cost us $5 million to review documents, then proportionally it doesn’t make sense. Proportionality and burden are tied and go hand in hand in decision-making. I do think it’s within ethical duties. I’m not a lawyer, but I’m sure you wouldn’t recommend, as an attorney, pursuing a path that would cost more than your matter is worth, right?
Jim Sullivan
Well, so I guess let’s lay out two options. We have a budget of $100,000. One option is that we refine our keywords in the ECA process and narrow them down until we have a dataset that we feel we can review for $100,000. I think that’s the norm, that’s the standard, that’s what we’re doing on a regular basis today. Option two would be to say, “Let’s put everything into a CAL workflow and have our review team start with the best documents first, working our way down, and then stopping the review once we spend $100,000.” Would you agree that the second option is going to get you a better and more complete result? You’re going to get the better documents, but it certainly would never be allowed to be done that way. I can’t imagine somebody agreeing to that type of method. But in the end, again, I think that we’re oftentimes falling back on, “This is the way that it’s always been done, so it’s okay,” rather than whether it’s actually good.
Young Yu
I agree that your second methodology would yield better results. That’s why GenAI is scary for so many people. It is a game changer. The cost versus human review is a fraction. That’s why it’s frightening to the legal industry. And the sort of thing there is, “Okay, if I had $100,000 and let’s say it cost me a dollar a doc for human review, I can review 100,000 documents.” Let’s say GenAI, and I am using round numbers here, costs 50 cents per document. Then, I can review 200,000 documents through GenAI. What is the better approach? Are you doing a human review versus a GenAI review? That’s the question we’re posing to the general public. Where does this sort of overlap, and where does it make sense? How do we move forward here? And to the extent that it’s accepted practice or will become accepted practice, what do we need to report on?
Jim Sullivan
What you just described, though, is the way that I see it: you’re very knowledgeable about the tools available, the methods that you can use, and the approaches. And if I said to you, Young, “I’m reviewing documents and producing them to you, I have this volume of data, and I have this budget, tell me how you want me to spend the budget. I’ll spend it however you want. You can control it.” What would be the optimal strategy for you to uncover the documents you’re looking for in the budget that you have to work with? I mean, isn’t that kind of what a lawyer’s job is to essentially do that, right? Your job is to provide the best output at a reasonable cost while accomplishing everything you’re trying to achieve.
Esther Birnbaum
Well, even if you take costs out of that, and the objective is to respond to what you have to respond to, how do you do it in the most reasonable way?
Jim Sullivan
But is the most reasonable way supposed to be the way that gets you the best result within that budget or goal? What I’m trying to get at is: if I were producing to another party, I would try to negotiate keywords as narrowly as I could to get my dataset down further, because that’s going to reduce my costs in a way that is going to yield the worst result, but it accomplishes the goals that I have more clearly. However, if another party were producing for me, that method would absolutely be something I would not accept. Why aren’t we seeing more people saying, “No, you have to use the method that’s going to get us the best results”? Why are they not demanding that?
Young Yu
It’s a double-edged sword. If you demand it, it can be requested of you. Again, you ask good questions, Jim; I thought we promised each other that we would not ask each other these questions during this, but I see you went there.
Jim Sullivan
I don’t remember promising it.
Young Yu
I don’t disagree with your thought process. I think it’s why with a regulatory investigation, whether it’s the CID or, let’s say, a second request, the government gets to dictate what you produce, and they normally say, “Hey, I want this.” Right? “If you’re going to use TAR, I want you to use TAR 1.0. A 75% recall rate is your goal. That’s what I want.” Right? I don’t see a problem with that. But understanding sort of, “Okay, how can I make this proportional?” That is what we will run up against. What’s best versus what it costs and how much, and what the burden is. And the burden is not proportionate when it comes to civil. Typically, let’s say an individual is suing a corporation or a smaller entity suing a larger one. The review burden on each side is not proportional. Typically, individuals will have far fewer documents to review and produce as opposed to corporations. Contextually, it will differ, but in terms of what you’re saying, it does make sense to me. I agree with the thought process.
Jim Sullivan
And it seems that the side with the most interest in using GenAI should be the one that’s getting the data. Even though we flip it around and say that GenAI is a way to save money and time in your review when producing content for others, you can also find the most relevant documents. What we’re really seeing is that if you’re having somebody else produce data for you, demanding they use GenAI is going to get you a much better production set and a much more complete production set. And now it’s going to cost the opposing party less, which might be against your interest for whatever reason. But that seems like really an avenue where you’re talking about ECA, and you’re talking about negotiating keywords, that if I’m on the other side of the table, I’m going to say, “No, I want you to use GenAI.” We can discuss how this approach not only saves you money, but also ensures that I receive a production that contains the material I need.
Esther Birnbaum
Well, a controversial take here. In a lot of ways, GenAI is scary for lawyers because it can be an equalizer in litigation, especially when we talk about asymmetric litigation. GenAI gives the underdog the ability to review documents at a lower cost and at a faster rate. GenAI might be, in some ways, taking out the weaponization of discovery. That’s scary, and nobody wants to say that, but it’s the reality. We have more transparency in our data than we ever did. Let’s start talking about what we should be doing, how we leverage it, where in the process we leverage it, what parameters we can put around it, and how we can standardize it moving forward.
Young Yu
Right.
Esther Birnbaum
Any final thoughts, or is that a mic drop moment?
Young Yu
I’m going to have to think about this for a very long time before I have any final thoughts.
Jim Sullivan
This is super cool and exciting. There are a lot of things changing, but I see most of it as improvement. Things are getting better, cheaper, and faster, and that’s really what we’re trying to do here: improve processes, improve the quality of results, improve the speed, and make everything work better. If we look at it that way, GenAI has already proven to be very successful, and I think it’s going to do even more in the future as we continue to come up with new ways to use it. But that is the exciting part: it’s only getting better across the board.
Young Yu
If you choose to go down this road, I would partner with a provider that has experience and can relay defensibility and minimize risk. I would take a lot of time upfront to walk through the process and outline the methodology. What you’re seeing sort of on the screen now: setting your objectives, understanding the dataset, and analyzing and validating your dataset. Get acceptance from whoever sits on the other side. But I do think it will provide a lot of insight into your data. It will minimize your burden. I do understand the sort of hesitation and anxiety that comes with using GenAI at this stage. Again, if you have the budget or a partner willing to provide you with free clicks, test. It’s the only thing that we can do.
Esther Birnbaum
I’ll piggyback on that a little bit and say we can discuss agreements and disclosures, etc., but there are also many ways you can leverage GenAI to help you in the processes we already have. If you need to identify relevant data that you don’t know about, or you don’t even know what keywords to apply, there are different ways you can leverage GenAI. If you’re going to use active learning, GenAI can be used to facilitate that process. There are numerous different use cases. Go out there, educate yourself, and explore them, even before we get to the questions of agreements or disclosures. If you can utilize GenAI as a tool to enhance your skills as a lawyer or eDiscovery professional, I highly recommend it, provided you approach it in a thoughtful, informed, and measured manner. I would like to thank everyone for joining the webcast today. We value your time and interest in our educational series. We hope you learned something from this. Our next webcast is going to be August 6th, “Detecting the Undetectable: Deepfakes Under the Digital Forensic Microscope.” We’re looking through a GenAI lens there as well, unsurprisingly. During that webinar, we’ll dig into the risks posed by deepfakes to truth, trust, and digital integrity, and how forensics can identify, analyze, and combat deepfakes in legal and investigative contexts. Check out our website, HaystackID.com, to learn more, register for upcoming webcasts, and explore our extensive library of on-demand webcasts. Thank you again for joining. Have a great day.
Moderator
That wraps up our master class. Thank you all for joining us today. A special thanks to our speakers, Esther, Jim, and Young, for their time and efforts in preparing and delivering this session. As mentioned earlier, the session was recorded, and we’ll be sharing a copy of the recording with you in the coming days. Thank you once again, and enjoy the rest of your day.
Expert Panelists
+ Esther Birnbaum
Executive Vice President of Legal Data Intelligence, HaystackID
With a robust background in complex litigation and regulatory compliance, Esther brings a wealth of knowledge and practical experience to the table. She uses her unique expertise at the intersection of technology, data, and law to develop best practices and drive innovative workflows across many areas of the business. She enjoys sharing her insights with the wider eDiscovery community and frequently speaks at conferences, webinars, and podcasts on topics related to law and technology.
+ Jim Sullivan
Chief Executive Officer, eDiscovery AI
Jim Sullivan is an accomplished attorney and a leading expert in legal technology. As the co-founder of eDiscovery AI, he is at the forefront of transforming how the legal industry leverages advanced artificial intelligence in document review. With two decades of experience, Jim has become a recognized authority on integrating AI into legal workflows, playing a key role in modernizing eDiscovery practices. Throughout his career, Jim has consulted on thousands of predictive coding projects, utilizing AI to efficiently identify relevant documents in complex, large-scale legal matters. His expertise has made him a sought-after speaker at legal technology conferences, webinars, and meetings, where he advocates for the adoption of AI to improve productivity, accuracy, and defensibility in legal proceedings. Known for his forward-thinking approach, Jim encourages legal professionals to embrace AI as a means to future-proof their careers. In addition to his practical contributions, Jim has co-authored The Book on Predictive Coding: A Simple Guide to Understanding Predictive Coding in e-Discovery and authored The Book on AI Doc Review: A Simple Guide to Understanding the Use of AI in eDiscovery, both of which serve as essential resources for understanding the impact of AI in legal practices.
+ Young Yu
VP of Advanced Analytics and Strategic Solutions, HaystackID
Young Yu joined HaystackID in 2018 and is currently the Vice President of Advanced Analytics and Strategic Solutions. Prior to his current role, Yu was the Director of Advanced Analytics and Strategic Solutions at HaystackID. In this role, Young was the primary strategic and operational adviser to HaystackID clients in matters relating to the planning, execution, and management of eDiscovery activities.
Assisted by GAI and LLM technologies.
SOURCE: HaystackID