[co-authors: Michael Sarlo, Adam Rubinger, Anya Korolyov, Seth Curt Schechtman, Young Yu]
Editor’s Note: On January 13, 2021, HaystackID shared an educational webcast designed to inform and update legal and data discovery professionals on the industry’s most advanced analytics technologies and to highlight recommended workflows and appropriate use cases for achieving quantifiably impactful increases in document review efficiency and accuracy during the use of Technology-Assisted Reviews. While the full recorded presentation is available for on-demand viewing via the HaystackID website, provided below is a transcript of the presentation as well as a PDF version of the accompanying slides for your review and use.
TAR in the Real World: From Promise to Practicality
eDiscovery experts and commentators have championed the promise of technology-assisted review (TAR) since Judge Andrew Peck’s Da Silva Moore decision in February of 2012. But exactly how is TAR faring in the real world of complex discovery? More importantly, how are the latest generation of structured and conceptual analytics tools being used to increase efficiencies and drive positive outcomes, translating TAR’s promise into practical results?
In this practical presentation, eDiscovery analytics and legal review experts will share an overview of the industry’s most advanced analytics technologies and highlight recommended workflows and appropriate use cases for achieving quantifiably impactful increases in document review efficiency and accuracy.
+ Structured Analytics: Threading the Email Needle
+ Conceptual Analytics: From Choices (TAR 1.0 v. 2.0) to Clusters
+ Brains and Brawn: Considering Brainspace and Relativity
+ A Good Stopping Point: The Why and When of Workflow Decisions with Continuous Active Learning
+ Michael Sarlo, EnCE, CBE, CCLO, RCA, CCPA – Michael is a Partner and Senior EVP of eDiscovery and Digital Forensics for HaystackID.
+ Adam Rubinger, JD. – As an EVP with HaystackID, Adam brings more than 20 years of experience and applied expertise in advising, consulting, and managing large-scale eDiscovery projects.
+ Anya Korolyov, Esq. – As Director of Project Management with HaystackID, Anya has 12 years of experience in eDiscovery with extensive expertise with Second Requests as an attorney and consultant.
+ Seth Curt Schechtman, Esq. – As Senior Managing Director of Review Services for HaystackID, Seth has extensive review experience, including class actions, MDLs, and Second Requests.
+ Young Yu – As Director of Client Service with HaystackID, Young is the primary strategic and operational advisor to clients in eDiscovery matters.
Hello, and I hope you’re having a great week. My name is Rob Robinson and on behalf of the entire team at HaystackID, I’d like to thank you for attending today’s presentation titled TAR in the Real World: From Promise to Practicality. Today’s webcast is part of HaystackID’s monthly series of educational presentations conducted on the BrightTALK network and designed to ensure listeners are proactively prepared to achieve their computer forensics, eDiscovery, and legal review objectives during investigations and litigation. Our expert presenters for today’s webcast include five of the industry’s foremost subject matter experts and legal review authorities with extensive experience in supporting technology-assisted reviews.
The first introduction I’d like to make is that of Michael Sarlo. Mike is the Chief Innovation Officer and President of Global Investigations for HaystackID. In this role, Michael facilitates all operations and innovation related to eDiscovery, digital forensics, and litigation strategy both in the US and abroad.
Secondly, I’d like to introduce Adam Rubinger. Adam serves as the Chief Client Experience Officer with HaystackID. He brings more than 20 years of experience and applied expertise in advising, consulting, and managing large-scale eDiscovery projects in this role.
Next I’d like to welcome Anya Korolyov, who is the Director of Project Management with HaystackID. Anya has 12 years of experience in eDiscovery with extensive expertise in Second Requests as an attorney and consultant.
I’d also like to highlight Seth Schechtman as a senior managing director of Review Services for HaystackID. Seth has extensive review experience, including class actions, MDLs, and Second Requests.
Finally, I’d like to introduce Young Yu. Young is the Director of Client Services with HaystackID. In his role, Young is the primary strategic and operational advisor to clients in eDiscovery matters.
HaystackID will record today’s presentation for future viewing, and a copy of presentation materials will be available for all attendees. You can access these materials directly beneath the presentation viewing window on your screen by selecting the Attachments tab on the toolbar’s far-left position beneath the viewing window. Additionally, we do have several poll questions today. These questions will appear under the Vote tab.
At this time, I’d like to turn the mic over to our expert presenters, led by Mike Sarlo, for their comments and considerations on Technology-Assisted Review, and its practical use in the real world. Mike?
Thanks so much, Rob, and thanks, everybody, for joining this month’s webcast. We’re really happy to have you. We’ve got some new speakers on the circuit. Anya and Young, in particular, from an operational standpoint, spend a lot of time dealing with analytics, both from a technology-assisted review standpoint, from a continuous active learning standpoint, and structured analytics for some of our most complex matters. Adam Rubinger as well has been advising clients for years on the effective use of these technologies, which we’ve all come to know and love so much, and I myself have been dealing with data analytics for many years as well. So, we often find there’s misinformation or disconnects regarding how different features and tools and workflows should be used when you start to hear the analytics word or the technology-assisted review word, and as a vendor, we’re uniquely positioned to work with many different clients through their workflow expectations, and then on the delivery side.
So, we’re going to start off with a discussion about structured analytics, we’re going to move into conceptual analytics and really break down the differences between TAR 1.0 and TAR 2.0, we’re going to highlight some of the differences and similarities between Brainspace and Relativity, and then we’re going to really take a dive into when you really stop from more of a CAL standpoint, using that technology to cut a review short.
So, fundamentally eDiscovery has been transformed and is being transformed every day, by the practical application of analytics and from my standpoint, all the cost savings aside, the real goal here is to get the relevant facts to the case teams faster and earlier on in any given matter, and I know Adam has quite a bit of feedback here as well, just from his experience dealing with very large corporations who are leveraging these tools.
Thanks, Mike, and as eDiscovery has matured over the years, the volumes have gotten to the point where it is almost impossible to really do eDiscovery without the use of analytics and technology-assisted review. We’re seeing the rise and adoption at a pretty swift pace. From a client perspective, we’re seeing it’s almost becoming rote now for clients to use TAR, continuous active learning in particular, to assist in both cost savings and getting to the information sooner, as Mike said. From our perspective, we’re seeing clients who use analytics in ways that while they’re intended for that use, they’re using them in ways to really take huge amounts of data and make it more accessible, make it available sooner to the litigators to build their case in chief, for the review teams to be able to get to the most important information quickest, and then ultimately, cost savings is the ultimate goal, which from the perspective of the total cost of doing reviews, and looking at documents and sifting through data, having these tools available, we’re seeing very, very measurable and extensive cost savings and efficiency gains using it. So, from our perspective, eDiscovery is being transformed by the use of analytics, and it is becoming part of the statement or part of the workflow that’s occurring on a day-to-day basis. Almost all of our clients are using analytics in one way or another in just about every case.
Anya, why don’t we start talking about structured analytics next.
Thank you, Adam. So, as Adam and Mike mentioned, the days of linear review, just straight linear review, are pretty much long gone. All of the cases use, at the very least, the structured analytics, and just to go over really quickly some of the basic ones, the language ID, I know it seems a given, but even in those cases where the client comes to us and says all custodians are US, there’s no chance we’d have any foreign language, we still like to run it just to give us a fuller picture, and to know, once we do get to the machine learning part, what we’re dealing with: do we need a subject matter expert who can speak the foreign language, do we need to do translations, just really quick, get that out of the way. For the near-duplicate analysis, we of course use it for the purpose it was intended, to identify near-duplicates and make sure that they’re coded the same way, all of the good stuff, but we also use it to help us train the model when we get to machine learning. Sometimes we get cases and we simply just don’t know where to start. All we have is a pleading, we have some exhibits, so we create documents and we feed them into our population, and we use the near-duplicate analysis to help us identify key documents earlier, and help us get a clearer picture and maybe take us from identifying similar documents to also going into name normalization and the communication tool in Brainspace, and with that, once we run it, we get a much clearer picture than we have with just using the metadata at the top of the email, from/to. We get the full range of who is communicating with whom, on what subjects, and combining the near-duplicate analysis and name normalization really does take us a step closer to the machine learning and to having our key documents that we can use to train the system, and of course, everybody’s familiar with email threading.
Everybody, I’m sure, has at least seen it, and email threading is when we have a group of seemingly unrelated emails and we run it, and we get to our inclusive emails, and our inclusive emails are any emails with unique content, so any unique attachment or the last email in the chain, and absolutely, we use it for the purpose as intended and we do have cases where we have agreements right off the bat that we’re only going to review the inclusive emails, but there are many other ways that we have learned to use email threading and incorporate it into our workflow with analytics across the board, and with that, I’d like to hand it over to Seth, and to our first poll.
Seth Curt Schechtman
Thanks, Anya, I appreciate that. So, the first poll question of the day: over the past year, how often have you made use of threading to organize the review and assist with quality control? Now, Rob will open up the poll for us. As the results come in, I’ll talk a little bit about, as Anya already mentioned, how you may have ESI agreements in place that allow for suppression of non-inclusives, so emails that are part of other emails, meaning that they are lesser-included. So, if you exclude them from a review, you won’t be excluding the content in them from production. That’s not to say that you’ll always get that. It may be the case with the government, they may not allow it, they may only allow it in certain situations. One in particular that we’ve seen on some Second Requests is that you might not have to log the non-inclusives if all of their inclusives are coded as privileged, saving some time on the privilege logging side.
I’ll mention a couple of caveats there on when you may not want to suppress. So, as I mentioned, suppress from production, that is. We have seen arguments from attorneys, in particular for complicated cases, maybe during depositions, where you don’t want to show the deponent the replies to certain emails, so all you want to show is the lesser-included, and if you’ve suppressed them from review and production, you may not be able to do that. Also, you may lose some context on the privilege log. There are certainly some ways around it that HaystackID has developed, but if you’re not cutting a document and logging it as privileged, you may lose To, From, CC information, but if you can roll up that information from those thread groups, from the lesser-included, which we do have systems, processes, and tools to do, you don’t lose that content.
So, looking at the poll results, it looks like we have a plurality for using threading on most reviews, every review is 30%, so that’s all good to see, and then 12% not regularly used. We use it on every single matter. Even if you’re not suppressing those documents from review, you want the documents at minimum sorted by those thread groups when they go to the review team. A lot of our reviews, and we’ll talk about this later, involve TAR or CAL, or cutting off the review, meaning we’re not reviewing every producible or potentially producible document. You will be setting some documents aside that go straight to production. Now, there are certain emails where you may lose certain search term hits or unpublished search term hits, in particular for Gmail data, but we have seen it with Microsoft as well, where you lose header information on those lesser-included, and so if you’re only reviewing documents with privilege hits, you certainly want to make sure that you’re bringing in full threads if there’s a privilege hit on that email just so you don’t lose potentially… produce a privileged document thinking that it didn’t have an inherited privileged [inaudible].
The other thing that we use it for, and have developed scripts and tools for, is QC purposes. We have seen regulators and others, the opposing side, attack redaction inconsistencies across thread groups. Obviously, we’ve seen that for years across MD5s, individual copies of documents that are different, or that are similar, but in terms of thread groups, we’re seeing that more and more. The difficult part with spotting those traditionally has been that you only have a thread group and, as we all know, conversations can branch off in multiple different directions, and a seemingly not privileged document can transform into a partially privileged document, which has been forwarded on to an attorney. What our tools are able to do is pin down where those discrepancies are occurring across an individual stem across a thread and find out where you have a not privileged document or not privileged part of the stem going to fully privileged, where most likely that fully priv or priv withhold should have been coded as priv redact, or where you’d have a partial priv or a full priv, priv withhold, going to non-privileged within the stem, and most likely those underlying privileged documents have been released and deemed as not privileged. So, some great tools out there. I certainly recommend threading at a minimum on every single case for those reasons, and one thing I didn’t mention, when we say for sorting, it speeds up the review, it makes sure that the same attorneys are reviewing the same conversations over and over and over again, and are familiar with the context and not having to relearn it or having a new person learning it.
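Editor’s Note: The stem-level privilege check Seth describes can be pictured with a minimal sketch. This is not HaystackID’s tooling. The coding labels (`not_priv`, `priv_redact`, `priv_withhold`), the function name `flag_stem`, and the representation of a stem as an earliest-to-latest list of coding calls are all assumptions made for illustration.

```python
# Hypothetical privilege codes; real coding panels will differ.
NOT_PRIV, PRIV_REDACT, PRIV_WITHHOLD = "not_priv", "priv_redact", "priv_withhold"


def flag_stem(stem):
    """Flag suspicious privilege transitions along one stem of a thread.

    `stem` is a list of (doc_id, privilege_code) pairs ordered from the
    earliest email to the latest, where each later email is assumed to
    contain the text of the earlier ones.
    """
    flags = []
    for (_, prev_code), (doc_id, code) in zip(stem, stem[1:]):
        if prev_code == NOT_PRIV and code == PRIV_WITHHOLD:
            # Earlier content was not privileged, so a full withhold may be too broad.
            flags.append((doc_id, "earlier content not privileged; consider priv_redact"))
        elif prev_code in (PRIV_REDACT, PRIV_WITHHOLD) and code == NOT_PRIV:
            # Later email carries earlier privileged content but is coded not privileged.
            flags.append((doc_id, "contains earlier privileged content but coded not_priv"))
    return flags


stem = [("A1", NOT_PRIV), ("A2", PRIV_WITHHOLD), ("A3", NOT_PRIV)]
flags = flag_stem(stem)
```

In a sketch like this, the flags are only leads for a second-level reviewer to look at, not coding determinations.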
Thread visualization, some great tools out there as well. For those visual learners, it helps you pin down where those inconsistencies are seen, but again, an inconsistency on its face may not be without being able to thread down and stem down to see where that inconsistency is taking place across
Thank you, Seth, and again, we wouldn’t use all of these structured analytics, the analytics that are strictly based on text without any concepts, just what’s available to us. We use all of them to help us get to the point where we start machine learning and also at the end as the QC; all of them combined really make for a great tool for QC. And with that, we’re going to move into the machine learning, and I’d like to hand this over to Young to introduce us.
Thank you, Anya. When it comes to machine learning, or conceptual analytics, there are two types: supervised and unsupervised learning. You want to think of these as objective and subjective methods that the system utilizes to categorize similar pools of documents. Unsupervised learning will cover clustering and concept searching. These tools provide insight into the conceptual makeup of the document collection without any human reviewer input. It’s a very good way to take a top-level look at the unknowns in your data set, or to confirm any assumptions you may have had going into the start of the project, because it doesn’t require any human inputs.
Supervised learning, that covers your TAR models, and it does require human input. The decisions you’re making for responsiveness, the system will categorize documents and score them. Depending on the model that you pick, the scores will be set in stone, or they’re constantly updated, but the scores indicate a proximity of conceptual similarity to the decisions that you’ve made. Typically, higher scores will be more conceptually similar to a responsive document, and the lower scores will be further away from the responsive decisions you’ve made there.
Anya, do you want to speak to clustering?
Yes, thank you, Young. So, to go over some of the unsupervised learning concepts, clustering is a great tool, and I know some people have not had much luck with it, but I think the way it was intended and its real-world use kind of differ. So, we do like to cluster everything right off the bat, because we have found that it does help us get to know our data, even if it’s as basic a step as finding that what we have is a whole bunch of Outlook appointments that we need to deal with, just to get to know not even the concepts, but the data itself, and of course, the concepts as well. If we’ve identified, using the structured analytics, what the key documents are, it helps us to know where they are in the concept search and to home in on who the communicators are and what they’re talking about. It really helps us visualize everything right off the bat. It also helps us cut down the data that we need to worry about. The example we have here is part of the Enron data. So, if you just run a simple concept search for spam, you very clearly get 62,000 documents, and then with Brainspace capabilities, you get all the similar concepts listed as part of the cluster, and you can go through them and you can very quickly make a decision to cut 62,000 documents out of your review and never have to worry about them, never have to look at them again, and it’s a great tool to get us to a narrower population of documents.
Also, to go back to the Brainspace use of concept searching, it is a little bit different from Relativity’s concept searching, where you just get similar concept documents. Brainspace does provide actual similar concepts. So, again, this is going back to everybody’s favorite Enron data. Searching for “minority investor”, you can very quickly see what the similar concepts are in the documents that come back for minority investor, and anybody who’s done any investigation knows the language “friend of” anything is usually code for something. So, we can select that one, and we can go into those documents and see what they’re talking about and get to the point where we’ve identified the key people and the key concepts very quickly utilizing Brainspace.
And that brings us to actual supervised learning, but before we get there, we want to cover really quickly that not all data goes into machine learning. So, we’d like to talk about data that doesn’t make it in and what issues and solutions we have for those.
Right, and when you’re analyzing datasets for TAR, whether it’s TAR 1.0 or TAR 2.0, your guidelines are going to be fairly similar, and there are typical document types that are recommended to be excluded from your analytics index. Those will include documents with too little or too much text. You have CSV files; your Outlook calendar items, the replies or even the invitations that don’t have message body content; audio, video, or image-based files, and CAD files fall into that category; source code; and spreadsheets. When you’re exploring these pools of documents, there are ways to include them, there are ways to vet these documents. If we want to speak to spreadsheets here, typically, your normal spreadsheet will be numbers-based. We have seen instances where it is very text-heavy, and we can do an analysis to see how the ratio of alpha characters stands against numeric characters. I mean, these are all things that you can do to include or exclude various pockets of documents. With audio files, if you have them transcoded or transcribed, that text can actually go in.
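Editor’s Note: As a rough illustration of the alpha-versus-numeric analysis Young mentions for spreadsheets, here is a minimal sketch. The function names, the 0.6 ratio threshold, and the 50-character minimum are hypothetical values chosen for the example, not settings taken from the presentation or from any review platform.

```python
def alpha_numeric_ratio(text):
    """Return the share of letters among all letters and digits in the text."""
    alpha = sum(ch.isalpha() for ch in text)
    digits = sum(ch.isdigit() for ch in text)
    total = alpha + digits
    return alpha / total if total else 0.0


def is_text_heavy(text, threshold=0.6, min_chars=50):
    """Guess whether extracted spreadsheet text is worth adding to the index.

    threshold and min_chars are illustrative assumptions and would need
    tuning against a sample of the actual data.
    """
    counted = sum(ch.isalpha() or ch.isdigit() for ch in text)
    return counted >= min_chars and alpha_numeric_ratio(text) >= threshold


narrative_sheet = "Quarterly revenue commentary from the sales team " * 3
numeric_sheet = "1024 2048 4096 " * 10
```

Under these assumed settings, `is_text_heavy(narrative_sheet)` would keep the first sheet as a candidate for the index and `is_text_heavy(numeric_sheet)` would leave the second on the exclusion list.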
There is another bucket here, which we run into very frequently, and that would be short message format. And, Mike, I think you have a great solution here, and do you want to speak to that a little bit here.
Happy to do so, Young, and thank you. So, alternative data types, everybody’s new favorite subject. In eDiscovery, I think we’re being bombarded by new data sources that fall outside of your typical paradigms as far as email is concerned, and just typical e-documents from network shares and computers. These types of platforms like Slack and Teams, and just chat applications, and other types of data that don’t necessarily lend themselves to containing a nice, packaged border around the ideas inside, the way a Word document or an email string would, have become so much more prevalent since the start of the pandemic, and now that we’re about a year into it, almost every organization big or small is using these tools to enhance the ability of their remote teams to work together. Well, one of the big problems here is that with short message format data, like chats and texts, we don’t typically write the same way as we do for an email. They’re short, sometimes we don’t use the noun, sometimes there are emoticons. The fundamental issue here is just not having enough of what I like to call conceptual density in a single text file for analytics engines to understand and to learn from an individual text string.
So, from a collection standpoint and a production standpoint, we typically would always recommend to our clients using Slack or Teams to try to bundle channels and channel content on a 24-hour basis. However, when we start to think about getting these types of communications through any type of analytics platform, that typically may not be enough text. So, we have some proprietary tools and code that we designed to basically measure and test the efficacy around creating what I would call analytics-ready text files, using separate relational fields, where we may have a text file that’s specific for loading into Brainspace, Relativity, NexLP, or any tool that’s going to read text, that serves as a secondary reference point for these engines to have a little bit more conceptual density, and then these can go through a TAR process, and we get pretty good results here. When we go to produce, we can then actually start to produce on any frequency that our clients would like, insomuch as we use that secondary relational field to backtrack those decisions.
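Editor’s Note: The idea of bundling chat content by channel and day, while keeping a relational field that ties each message back to its bundle, can be sketched as follows. This is not the proprietary tooling Mike refers to. The message fields (`id`, `channel`, `author`, `timestamp`, `text`), the use of calendar days for the 24-hour window, and the bundle ID format are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def bundle_by_channel_day(messages):
    """Build one text document per channel per calendar day.

    Returns (texts, message_to_bundle): the bundled text keyed by a
    hypothetical bundle ID, and a lookup from message ID to bundle ID
    that plays the role of the relational field.
    """
    bundles = defaultdict(list)
    message_to_bundle = {}
    for msg in sorted(messages, key=lambda m: m["timestamp"]):
        bundle_id = f'{msg["channel"]}_{msg["timestamp"]:%Y-%m-%d}'
        bundles[bundle_id].append(
            f'{msg["timestamp"]:%H:%M} {msg["author"]}: {msg["text"]}'
        )
        message_to_bundle[msg["id"]] = bundle_id
    texts = {bundle_id: "\n".join(lines) for bundle_id, lines in bundles.items()}
    return texts, message_to_bundle


messages = [
    {"id": "m1", "channel": "deal-team", "author": "ana",
     "timestamp": datetime(2021, 1, 12, 9, 5), "text": "draft ready?"},
    {"id": "m2", "channel": "deal-team", "author": "raj",
     "timestamp": datetime(2021, 1, 12, 9, 7), "text": "yes, sending now"},
    {"id": "m3", "channel": "deal-team", "author": "ana",
     "timestamp": datetime(2021, 1, 13, 8, 30), "text": "thx"},
]
texts, message_to_bundle = bundle_by_channel_day(messages)
```

A coding decision made on a bundle could then be carried back to individual messages through `message_to_bundle`, which is one way to picture producing at a different granularity than the one used for analytics.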
You’ve got to be careful here. Obviously, any time you introduce complexity into the technology-assisted review process, you have to be prepared to attest to the quality of that actual workflow. So, we do a lot here with statistical sampling on responsive and non-responsive populations post-TAR on these data types, to then be able to work with outside counsel to establish comfort that the process is working the way they would expect.
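Editor’s Note: One common way to summarize a post-TAR sample is to estimate the rate of responsive documents found in it along with a confidence interval. The sketch below uses the Wilson score interval as an illustrative choice; the presentation does not say which statistical method is used, and the function name and the sample figures are hypothetical.

```python
import math


def wilson_interval(hits, sample_size, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p = hits / sample_size
    denom = 1 + z * z / sample_size
    center = (p + z * z / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical sample: 15 responsive documents found among 500 sampled
# from the population the model scored as non-responsive.
low, high = wilson_interval(15, 500)
```

The interval bounds, rather than the point estimate alone, are the kind of figure a team might discuss with outside counsel when deciding whether the process is working as expected.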
Likewise, for mobile phone chats, it’s very similar, and we’re always trying to make sure that those analytics-ready text files are along the lines of specific participants, and the same thing for chats like Bloomberg, or anything else, which will be handled in the same way. This has been huge in some matters for us, where we’ve had [spot] populations totaling tens of millions of 24-hour communication strings; multi-terabyte is becoming more common in large enterprises. Being able to work with this data through a technology-assisted review workflow in a Second Request was fairly unprecedented. The DOJ actually has worked with us on these workflows, and they’ve been happy with it, which I understand is a first. So, we’re doing this more in civil litigation, we’re doing it more generally, and we have just a lot of the documentation pre-built for our clients to have, really, a defensibility report delivered to them fluidly, and on a repeatable basis as datasets move and expand through the lifetime of a matter, which is important because sometimes you start with one population, and you end up adding more, and that’s something that I’m sure that Anya and Young are really going to dig into once we start to break down the workflows in TAR 1.0 and TAR 2.0.
Thank you, Mike. I think the short message format is a very exciting area right now, like you said; the DOJ getting involved in making decisions, what’s acceptable, what’s not, and using TAR on short messages, it’s a very exciting time for that. I do want to go back just for a second to clustering and say we have used clustering, and that has helped us quite a bit with the short-format messages, because of the way they’re structured, and so many for Teams, for Slack, so many people entering the room and leaving the room, that a lot of times the names of the people become concepts. So, it’s a great… clustering, it really is a great tool to help us identify that.
And with that, let’s move into the other exciting part of this presentation, which is supervised learning, and our next poll. Over the past year, for what percentage of matters that have required review have you used TAR 1.0 or TAR 2.0? Everybody has their own preference. I’m just going to start going over what TAR 1.0 is and TAR 2.0 for those people that have joined us that don’t know, and some of the challenges that we face with both workflows.
And with that, our very first challenge is always defining relevance and Young is going to take us over that one.
So, as you begin any TAR project, whether it’s TAR 1.0 with sample-based learning or TAR 2.0 with active learning, you have to define relevance. It has to be a binary decision, meaning it’s a yes or no choice. You don’t want to be overly narrow because you will miss peripherally or portion… of documents that are partially responsive there, and then it swings the same the other way. If you’re overly broad in your definition of responsiveness, the system will just be over-inclusive and bring back almost anything that touches on the decisions that you’re making. As you’re going through the process, when you define responsiveness, you really have to think of the conceptual relationships between documents, and it’s a deviation from linear review, where you’re not looking at an entire document family. Each document should be considered a standalone record, and that decision for responsiveness needs to be made at face value, within the four corners of the respective document that you’re looking at. And as you’re going through the process as well, that definition of responsiveness or relevance, it’s huge, because it’s the measure of the TAR process. In TAR 1.0, precision is going to be measured against your definition of responsiveness. The scores all correlate directly to that definition of responsiveness. Unfortunately, if you do have a shift in scope for that definition of responsiveness, let’s say you learn something later down the line, or you’ve completed your project, and now the ask is different from a regulator or from opposing counsel, you have to learn how to shift or morph that definition of responsiveness. Sometimes you can just pick up from where you left off, and broaden the scope, and there will be times where you might have to start that entire project over. It just really depends on how well you define relevance and responsiveness very early on.
I agree with you that that is one of the most important decisions, and even once you’ve made the decision which one to go with, I think that still continues to kind of [inaudible] over where you are in your project.
Thank you, everybody, for joining the poll, and it looks like half have used it. I’m still going to go over the definition and just the general workflow. So, we have here our TAR 1.0 flowchart, and again, I do want to reiterate that there will be documents that Young discussed that will not be part of the entire workflow; the exclusion documents, the JPGs, potentially the spreadsheets, things like that. So, once we have the index without exclusion documents that will still potentially need to be reviewed, we have a subject matter expert that will need to review the control set, and once the control set is reviewed, we will get to the point where the margin of error level has been achieved or not, and that’s where the defining of relevance really comes into play. Because if you have super low richness data, the subject matter expert is going to spend a lot of time in this loop, where we will have to review additional documents for the control set to be closed out.
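Editor’s Note: To give a feel for how a margin of error translates into control set size, here is a minimal sketch using the textbook sample-size formula for estimating a proportion, with an optional finite population correction. This is a general statistics illustration, not the calculation Relativity or Brainspace performs internally; the function name and default values are assumptions.

```python
import math


def control_set_size(margin_of_error, z=1.96, expected_richness=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin_of_error.

    z=1.96 corresponds to roughly 95% confidence. expected_richness=0.5 is
    the most conservative assumption. If population is given, a finite
    population correction is applied.
    """
    n = (z ** 2) * expected_richness * (1 - expected_richness) / margin_of_error ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


size_5pct = control_set_size(0.05)                         # 385
size_2_5pct = control_set_size(0.025)                      # 1537
size_small_pop = control_set_size(0.05, population=10000)  # 370
```

The formula alone does not capture the low-richness problem Anya describes. At 1% richness, for example, a 385-document sample would be expected to contain only about four responsive documents, which is one way to see why the subject matter expert can end up reviewing many more documents before the control set is closed out.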
So, this is very important, and actually this, I think, is the part where, even here you might say, you know what, maybe TAR 1.0 was not the best option for me. I need to move into TAR 2.0. But once you’ve achieved that margin of error level, you move into training rounds, and normally we see somewhere between one and five training rounds, and they usually range somewhere between 300 and 500 documents. Again, all of this depends on the data. If we started off with 10 million documents, the training rounds are going to be a little bit different. And you keep going with the training rounds until you get to your desired precision and stability, and what that means depends on the case. There might be opposing counsel that wants to see the reports and wants to see where you are. There might be the Department of Justice and they want to know where you are, and you might never get to the point that everybody recommends. Relativity, Brainspace, all the experts recommend getting to 65, 70%. You might never get there. You might be at 40% or something like that, but you’re just not moving, you will potentially stay continuously around 40%, and that’s where you are, and that point is when you make the decision to stop and go ahead, and code your documents as responsive, not responsive, and move on to the privilege review. So, really you have to make sure that you’re looking at the data, you’re looking at your reports, and you’re making informed decisions with TAR 1.0.
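Editor’s Note: The “precision and stability” check Anya describes can be pictured with a short sketch that computes precision and recall against a coded control set and tests whether precision has stopped moving across recent training rounds. The function names, the three-round window, and the one-point tolerance are hypothetical choices for illustration, not platform settings.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall from parallel lists of booleans.

    actual holds the subject matter expert's calls on the control set;
    predicted holds the model's calls on the same documents.
    """
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def has_plateaued(precision_by_round, window=3, tolerance=0.01):
    """Return True if precision moved by no more than tolerance over the last rounds."""
    if len(precision_by_round) < window:
        return False
    recent = precision_by_round[-window:]
    return max(recent) - min(recent) <= tolerance


actual = [True, True, False, False, True]
predicted = [True, False, True, False, True]
precision, recall = precision_recall(actual, predicted)

# Hypothetical precision history that stalls near 40%.
stalled = has_plateaued([0.25, 0.38, 0.40, 0.405, 0.40])
```

A plateau at 40% in a sketch like this would not by itself mean the project failed. It is the kind of signal that prompts the informed stop-or-continue decision described above.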
As far as training rounds are concerned, we at HaystackID use Relativity and Brainspace, which in our opinion are some of the best products out there for TAR 1.0 workflow, and with Relativity, you have some options. You have your basic statistical, which will usually pull about 300 documents; you have your percentage, where you tell the system what’s the percentage of the documents that you want to use for the training rounds; and of course, you have your fixed sample. You have the stratified, which is probably one of the best ones, because what it will do is it will identify documents. It will identify the documents that are mostly related in concepts to the documents you’ve already coded as part of the previous training rounds, and they will also make sure that it covers the biggest population of documents. So, with every training round, it will give you the documents that will carry out the concepts to the largest pools of the documents that you still have left.
In our opinion, what Brainspace has done is it took it one step further. It has three different kinds of training rounds for the Relativity stratified. So, you have your influential, which is going to be the one you use most, and is the most similar one to stratified. The same thing; it is just going to pick the documents that are closest, and it’s going to try to cover as much as possible of the population you have left. And then you have your fast active and your diverse active, and in our experience, we have found that the larger datasets get, the more results we get with fast active and diverse active. We have used influential several times and seen absolutely no movement, and then switched over and got huge jumps. So, again, it’s always the data that speaks to you, it’s always what’s in front of you. You have to really read these reports and analyze them, and not just say, OK, well, this is the recommended approach and I’m going to go with it.
There’s also the random, which again is the fixed sample. And with Brainspace, it does allow you to create a notebook, and by creating a notebook, you can put in the documents that you think are most important in your case. So, if you discovered something, you can put them in there, the most not responsive, most responsive, but you have to be very careful because especially when you’re dealing with a government entity, there will be a certification to the process. So, you have to be very careful in how the certification is phrased and which one you’re using.
Really quickly just to go over again between Brainspace and Relativity, which might help you decide which one you want to try. The reporting in Brainspace is kind of – I don’t want to say the word “basic”, but it just gives you the information of where you are in the process. So, it will give you an Excel spreadsheet that will list the control rounds, the training rounds, and with every round you run, you just pull the report for that round. Relativity’s reporting is a little fancier. These are just two of the things that it provides, and you can kind of get a little bit better, especially if you’re a legal support person and you have the legal team asking you, where are we? How many more documents? How many are uncategorized? It’s a little bit easier to just get that information right off the bat with Relativity.
Again, a lot of it depends on whether you’re going to be passing reports on to the opposing side or to the Government entity, so you kind of have to make that decision. I think both tools are great. In our experience, we have used both for TAR 1.0 with large datasets, and we think we’ve got pretty good results even when we moved on to the privilege review and did a little bit of QC of what was considered not responsive. We’ve gotten very good results using both tools.
Anya, one of the questions we received from the audience is, are there instances where you would recommend TAR 1.0 over TAR 2.0?
I can think of a couple that I would want to mention, and anyone else can certainly chime in.
I just wanted to cover the TAR 2.0 workflow, and then we’re definitely going to go over that and say the pros and cons and when we recommend one or the other just a little bit later.
So, really quickly, TAR 2.0 Continuous Active Learning. Again, you will always have your documents that are the exclusions. You will still have to review them, keep that in mind. But instead of having a control set, training rounds, ideally, you would like to have a subject matter expert or somebody or have key documents that will kick it off. Preferably, 100-500 documents depending on your population. And then you have your review team that starts to teach the model, what is responsive, what is not responsive. So, it continuously learns from every decision that is made.
And then you get to the point where you either see a clear break between responsive and not responsive, or you get to the point where you no longer see any responsive documents and you say, ‘OK, I think I’m done and I’m going to perform my QC elusion test and see if there’s anything responsive’, and then you close out the project, and you either move on to the privilege review or the project is done.
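As a rough illustration of the elusion test mentioned above, the sketch below draws on the common definition: sample the documents the model ranks as not responsive, measure how many in the sample turn out to be responsive, and project that rate onto the whole discard pile. The function name and all numbers are hypothetical, and this is not a description of how Relativity or Brainspace compute their reports.

```python
def elusion_estimate(found_responsive, discard_pile_size, sample_size, sample_responsive):
    """Estimate elusion rate, missed documents, and recall from a discard-pile sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Share of the sampled "not responsive" documents that were actually responsive.
    elusion_rate = sample_responsive / sample_size
    # Project that rate onto every unreviewed document below the cutoff.
    estimated_missed = elusion_rate * discard_pile_size
    total = found_responsive + estimated_missed
    estimated_recall = found_responsive / total if total else 0.0
    return elusion_rate, estimated_missed, estimated_recall


# Hypothetical matter: 30,000 responsive documents found in review, 700,000
# documents left unreviewed, and a 1,500-document sample with 15 responsive.
rate, missed, recall = elusion_estimate(30_000, 700_000, 1_500, 15)
print(f"elusion rate {rate:.2%}, about {missed:,.0f} missed, recall about {recall:.1%}")
```

The point estimate carries sampling error, so the sample size and the confidence level behind it should be documented alongside the result.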
So, again, in our experience with CAL, Relativity has been kind of a better platform, because it’s all in Relativity, but we have also seen very good results with Brainspace, just a little bit more work on the vendor part and we really don’t mind. And we have used TAR 2.0 in the very traditional sense in the workflow that you see in front of you (the recommended workflow), meaning review until you get to the point where you no longer see any responsive documents at all.
And with that, I would like to move to our next poll question, which is our last poll question, which is “What percentage of matters that have used TAR 2.0 employ a workflow where the learning algorithm is trained, and the review is cut off prior to placing eyes on all responsive documents that are produced?”
So, meaning – to just go back to my slide – this is your traditional… if there were any reviews where you used an alternative solution, where you started looking at what the system thinks are not responsive, or you just kept going with the recommended workflow.
Seth Curt Schechtman
I think the key there, Anya, is when you have large volumes of data and you’re running it through CAL, do you want to keep reviewing if the algorithm has been trained. The question becomes do you want to review a million documents, even if it’s a low [inaudible], because you have such a large set to begin with. Why continue if the documents don’t need to be issue coded or reviewed for [inaudible] or for other reasons. Why not stop?
Definitely, definitely. There are many considerations with TAR 2.0 that you have to keep in the back of your mind, and they are listed here. And some of them are families and privilege, which kind of go hand in hand. Are we concerned that privilege needs to be carried out across the family? Are we going to do a separate privileged review, or do we just kind of trust that the privilege is based on the four corners of the document? That is definitely a consideration.
Another that Seth just brought up is how many documents are we starting out with. In our experience, and everything I’ve read out there, all the whitepapers say with CAL, it’s usually going to end up reviewing somewhere between 15-20% of your population, of course depending on the richness, to go back to what Young said about relevance. But what if you’re starting out with 10 million documents? 15-20% of that is still quite large. Do you have the time to go through all of those documents? Do you have the resources to have all of those documents reviewed? Or do you look at the data at some point and say, these are my facts, this is where I am, I have this many documents that the system already thinks are responsive, I have this much money that my client is willing to pay, and what decision do I make at this point? Do I continue or do I cut it off?
This part of cutting off or starting to go to what the system thinks is not responsive documents is a conversation that we have with our clients very often, because they want to be done. They want to close it out. They are ready to go. It’s kind of a struggle for us to recommend one or the other, because we can present them with the facts, but they have to make that decision for themselves, and where they are in the litigation.
Young, Seth, I know you guys have a lot of experience here recommending the cutoff and kind of deciding what are we going to do here.
There are various methodologies you can employ here. With any active learning model, you’re going to see a precipitous drop or, let’s say in an ideal case, right. But the name of this presentation is TAR in the Real World, you might not ever see that precipitous drop. You might have a steadily climbing score, no gaps in the middle, no clear break from responsive and not. So, what do you do?
Let’s say… I’ll just throw out numbers. Let’s say, you have a score of 65 and we’re considering that borderline responsive, the recommendation from us would be, ‘hey, why don’t you sample from 55 through 64 and see what the rate of responsiveness is there, we’ll do a random sampling out of that pool or it doesn’t have to be random, you can employ any sort of methodology, so long as it’s documented and repeatable’. You do the sampling, and if the numbers all make sense and you can say, OK, this 65 is a good number, we’ve sampled around it, we’ve gone over this and it all makes sense, there’s no reason why you couldn’t stop at 65.
Let’s say, it’s the other way around, though, and you’ve sampled from 55-64 and your rate of responsiveness is higher than it should be, you’re going to have to just keep going and either continue reviewing or say, ‘OK, we can’t use 65 as the cutoff, what happens if we drop down to 60?’ Those decisions all have to be factored and weighed. You have to consider what your estimated richness or richness being the percentage of responsive documents in your dataset. Its estimated… because if we knew what it was… all of this would be, you push the button and you’re done.
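The sampling approach described here, pulling documents from the score band just below a proposed cutoff and measuring how many are responsive, can be sketched as follows. This is a minimal illustration with hypothetical names (`sample_band`, `responsiveness_rate`) and a fixed random seed so the draw is documented and repeatable; the rate that counts as "higher than it should be" is a judgment call for the case team and is not set here.

```python
import random


def sample_band(scores, low, high, sample_size, seed=0):
    """Randomly pick document IDs whose score falls in [low, high].

    `scores` maps document ID to model score. A fixed seed makes the
    draw repeatable so the methodology can be documented.
    """
    band = sorted(doc_id for doc_id, score in scores.items() if low <= score <= high)
    rng = random.Random(seed)
    return rng.sample(band, min(sample_size, len(band)))


def responsiveness_rate(sampled_ids, coding):
    """Fraction of sampled documents that reviewers coded responsive."""
    if not sampled_ids:
        return 0.0
    return sum(1 for doc_id in sampled_ids if coding[doc_id]) / len(sampled_ids)


# Hypothetical scores; in practice these come from the review platform.
scores = {f"DOC{i:03d}": i for i in range(100)}
to_review = sample_band(scores, 55, 64, sample_size=5, seed=42)
print(to_review)
```

Once reviewers have coded the sampled documents, `responsiveness_rate` gives the figure to weigh against the proposed cutoff; if it is too high, the same steps can be repeated on a lower band, such as 50 through 59.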
I’m looking at the poll results and I see quite a few people do cut it off before they lay their eyes on everything. That’s great to know. Thank you, Young.
This is just kind of TAR 1.0/TAR 2.0 overview, and now that you have done all this work, in your case, it has finished, what do you do with the results and do you keep them for the future use. And that brings us to Portable Models.
Right, so depending on the application that you’re using, you may be able to reuse all that work product that’s gone into this process, but that’s caveated by a few things here. Typically, what we recommend here, if you are going to build a reusable model, is to build it around specific topics. If you have serial litigants that are always involved in the same type of litigation. If it’s a specific type of litigation such as employment or FCPA, antitrust, or creating a model to identify junk or auto-replies, potentially privileged. These are all very, very specific to what you want to build. But if you build one that works, you can apply it over and over again.
Now, every dataset is different, and all of these factors have to be weighed, but if you have a repeat client and you’re intimate with their data and privilege is always going to be the same, junk is always going to be the same. Or here are the five types of litigation that this one client faces day-in/day-out, you should be able to build a model around that. And once you have that model, it’s a great place to start. You’ve already done the work behind the scenes; you can apply that model to that dataset, and it gives you a place to start. It doesn’t mean that the process will be 100% completed, but it gives you great insight, and also as you refine that model, because you’re going to continue to work in there, you can refine that model and truly build something that will get you 60-70% of the way there from day one.
Seth Curt Schechtman
Hey, Adam, why don’t you break in here and talk about how clients are reaching out for this stuff, asking about our abilities here.
Yes, and I think clients are certainly looking at ways to reuse work product from MD5 hash databases to the use of portable models when you do have these kinds of repeat custodians, repeat issues. There’s a great opportunity, I think, to drive further savings by the reuse of these types of – this type of information, classifiers specifically, and we’re starting to see that take place. It’s sort of just starting. I think the technology is evolving to the point where it’s becoming useful and capable, so I do believe that we’re going to see a lot more usage of reuse of data and things like that.
Thank you, Adam and Young. And now, this brings us to the difference between TAR 1.0 and TAR 2.0 and which ones would we normally recommend, which I believe is one of the questions that we’re being asked.
And I kind of covered some of the differences. Again, TAR 1.0, you have one or two subject matter experts that are coding the documents. The cost is minimal there on the one point. On the other hand, the subject matter expert costs a little bit more than a managed review solution.
There is Continuous Active Learning. You have many people that are making decisions, and that’s not always the best thing. The more people you have, the more interpretation of what is responsive for this project is. But at the same time, it allows you to learn new responsiveness. Documents come up during the review, and that could change. Something that wasn’t considered responsive, you find an email and you say, ‘oh, that’s what they’re talking about, oh yes, that’s definitely responsive’, and you update the system, and it lets you learn.
So, again, it all depends on what kind of case are you dealing with and what kind of deadline are you dealing with. What is your budget? All of these questions need to be asked and we always ask them of our clients because that’s going to drive the decision. If you have a Second Request of epic proportions and you have three months [inaudible], if you have even three months to go through millions of documents and you have to consider reviewing not only for responsiveness, but you have to consider privilege and you have to consider finding key documents about the merger, you’re kind of going to go with the TAR 1.0, because you just want to get in there, you want to code it out, you want to say, ‘this is the population that’s responsive, I am substantially complying, I am clear, I am good, I don’t want to continuously learn, I don’t want to know, I wash my hands off of this’.
In another case, you have a case that’s not in a rush and is a little bit lower on the data side and you have no idea what you’re even looking for, you have very little key documents to start with and you want to know what the data is going to show and you expect the responsiveness to change, and it’s an investigative matter. In that case, we will definitely recommend TAR 2.0.
We don’t really say one is better than the other. We have had instances, and Seth and Young can speak to that, where we started with TAR 1.0 because that’s what the client preferred, and then we got to the point where we were making absolutely no progress with 1.0 because the richness was so low. And we said, ‘at this point, we feel like you’re just spending money for no reason having a subject matter expert review these documents, let’s move this into a review, when all is said and done, we think we will actually save you money going to TAR 2.0 and kind of actually doing a hybrid model there between the two’.
Just to circle back to the question here, instances where we would recommend TAR 1.0 over TAR 2.0, each case is different, the timeline, definitely, any sorts of deadlines weigh heavily into the decision that’s made, but I think more importantly, it’s going to be the richness of that dataset. If you’re saying 40-50% of that dataset is going to be responsive, do you really want to go into an active learning model and try to figure out where to cut off or get through that percentage.
If richness is very low, let’s say it’s under 10%, there’s a chance that during your control set, you might not find any responsive documents, so you have to keep pulling extra documents into your control set. That control set can go on for a very long time, until you have the proper number of responsive documents.
That sort of goes to that other question there. “What’s the difference between a control set and a training round?”
A control set is going to be a random pull of documents, and it’s the measure that the training rounds are compared against. It’s a random pull of documents. The number of documents is going to depend on your confidence and your margin of error. It’s implied in some applications that richness affects it. Brainspace certainly takes it into account. There are other applications that don’t. But in terms of the recommendation between TAR 1.0 and TAR 2.0, it’s really getting to know your data. The rate of responsiveness, the conceptual diversity there, your timeline, your budget, all of that is going to factor into that decision.
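To make the link between confidence, margin of error, and control set size concrete, here is a minimal sketch using the standard sample-size formula for estimating a proportion, with a finite population correction. It is a textbook calculation offered as an illustration, not the formula any particular platform uses; the function name and the default values are assumptions. The `proportion` argument defaults to 0.5, the most conservative case.

```python
import math
from statistics import NormalDist


def control_set_size(population, confidence=0.95, margin=0.025, proportion=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Uses n0 = z^2 * p * (1 - p) / margin^2, then applies the finite
    population correction n = n0 / (1 + (n0 - 1) / population).
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for margin in (0.05, 0.025, 0.01):
    print(margin, control_set_size(1_000_000, confidence=0.95, margin=margin))
```

Tightening the margin of error grows the sample roughly with the inverse square of the margin, which is why a higher confidence level and a lower margin of error mean many more documents to review. Note that this formula only sizes the random pull; as described above, a low-richness dataset may still need extra documents before the control set contains enough responsive examples.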
Seth Curt Schechtman
And one thing, Young, that I’ll add from the review perspective, do all documents need [issue checks]? If you are, then you have to review them all either way you look at it. We don’t see it often. Second Requests and the larger matters that we have, they will just go straight out the door without eyes on them, something [inaudible] for privilege or PII or hot terms, but you’ve got to issue tag, you’ve got to lay eyes on all of them. So, you’re using TAR 1.0 or trying to use 2.0 to cut off the review before you’ve looked at everything potentially producible, it’s not going to happen.
Thank you both. And with that, I would like to go into what a lot of people are interested in, and one of the major decisions that plays a role is the cost. How much do you save using each and every one of these tools? I’m going to hand it back to Mike and Adam to talk about that.
The cost of review. For example, we have a case here where we had almost 2.9 million documents that were in the review population. Being able to only look at 12,000 of those to train a model, to identify responsiveness, the cost savings are in the millions on the high end, $8 million in some cases for more complex matters.
In almost all cases, you’ll always realize cost savings with either TAR or CAL, and these presentations will be available for download, everything is recorded if you’re interested in some good metrics and we’re happy to get into the granularities of any of these case studies.
In particular, Case Number 1 here is one of those cases where we actually were able to run Slack data through the TAR model in an agreed-upon protocol with the Department of Justice. So, a lot of these documents actually contain many, many, many more smaller communications, because again, we were working with those merged up secondary analytics-ready text files that we create through our custom algorithms over here.
I always encourage my clients to consider using TAR or using CAL and just going back to really just, in general… you can use these tools certainly to QC. You can use these tools to find more documents that you’re interested in. You can do feeds. You can batch documents at different levels of a [conference like] interval. There are so many different ways to use pieces of Technology-assisted Review to enhance any review both from a quality standpoint, QC, and to reduce risk, and to help you find those needle-in-the-HaystackID-type documents.
And even going back to classifiers and things like that, being able to take those out and move them from case to case at an individual client level is great, but we’re also now being asked to use these types of tools proactively from a compliance standpoint for organizations who are trying to identify risk as it’s happening. We are really using the underlying text to aid in more compliance workflows, analyzing email on a weekly basis or live for key concepts.
I encourage everybody to really think outside the box here as well, because there’s a lot of value you can provide your clients when you start thinking about the extended applications of Technology-assisted Review.
Thanks, Mike. And I just want to throw it back to Adam to cover the CAL costs as well.
And one of the interesting differences between these two tables is the TAR 1.0 theoretical train-the-database and then sort of stop review and the system predicts. With CAL, as Anya and Young and Seth have outlined, you’re putting a team of reviewers on the matter and they’re starting to review, and as the system learns, you go from very responsive documents down to very non-responsive documents. And these real-world examples here illustrate how we start with some fairly large corpus of data, and there are a few outliers that are interesting. As the numbers are lower, you can see that it’s taking longer to have the system stabilize and find a point where you can cut off the review. And as the larger numbers illustrate, you can see some significant savings by using this strategy, especially when you really have low richness, and you still have to do substance review to build your Case in Chief and be able to classify documents and look at them throughout – as you’re building your production sets and things like that.
In all cases, we’re seeing both TAR 1.0 and TAR 2.0 save considerable amounts of money that make it absolutely worth it in even the smallest of cases.
Thank you, Adam. Just with that, and I think this next slide will also in our talking about it will answer some of the questions. What’s Next in Analytics?
Now that we’re here, what’s coming up next? And I think the first thing of a hybrid model kind of goes to what is TAR 3.0 and I think Brainspace is actually making great waves there, so Young, if you would like to take over your favorite topic.
So, Brainspace in the latest release has introduced the implementation of a control set regardless of whether it’s active learning or TAR 1.0. What that really means for active learning is you can get very easily… you can easily measure recall and precision. Typically, it’s a little more difficult, the math can definitely be worked out, it’s a little manual, but it gives you the same look and feel as if you were running a predictive coding or a TAR 1.0 project.
For TAR 1.0, what does it mean? If you have a shift in responsiveness or if responsiveness changes over time, you can drop in another control set to act as a second measure. It does allow flexibility to go from TAR 1.0 to TAR 2.0, or in any sort of odd scenario, go from TAR 2.0 to TAR 1.0, but it does give you more visibility into the metrics.
There’s a couple of questions out there that touch on this. Anya, do you mind if I just go through them very quickly?
Of course, go ahead.
OK, so the first one is “Accepted recall of about 80% is defensible”.
80% is kind of high. Typically, we recommend about 75% recall. It’s a seesaw, the higher the recall, you have a trade-off of precision. 75% is typically accepted. Going higher than that with a higher margin of error, or let’s say, a higher confidence level and lower margin of error, that’s not really conducive to TAR 1.0. It means you will have to review a lot more documents.
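The recall and precision seesaw can be seen with a small worked example. The sketch below computes both metrics from a coded control set at a chosen score cutoff; the control set is a made-up list of (score, responsive) pairs and the function name is hypothetical.

```python
def precision_recall(control_set, cutoff):
    """Precision and recall if every document scoring >= cutoff is treated as responsive.

    `control_set` is a list of (model_score, coded_responsive) pairs.
    """
    tp = sum(1 for score, responsive in control_set if score >= cutoff and responsive)
    fp = sum(1 for score, responsive in control_set if score >= cutoff and not responsive)
    fn = sum(1 for score, responsive in control_set if score < cutoff and responsive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up control set: model score and the reviewer's responsiveness call.
control = [
    (90, True), (80, True), (70, False), (60, True), (50, False),
    (40, False), (30, True), (20, False), (10, False),
]

for cutoff in (65, 25):
    precision, recall = precision_recall(control, cutoff)
    print(f"cutoff {cutoff}: precision {precision:.0%}, recall {recall:.0%}")
```

Lowering the cutoff from 65 to 25 in this toy data raises recall while precision falls, which is the trade-off being described: chasing higher recall means reviewing more non-responsive documents.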
For the suggested data sizes, how many documents you need to train, it’s proportional. When you’re looking at these numbers, richness definitely comes into play. To the extent that you have an agreement with opposing or regulators, it’s proportionality. If you have 3 million documents, how many decisions do you need to make for it to seem reasonable to say we’ve trained the model. In a TAR 1.0 scenario, you’re looking to hit stabilization. Stabilization is where you’re no longer seeing huge changes in precision and [depth] for recall. So, what’s happening there is the decisions you’ve made are consistent, you’re not seeing precision go from 55-60% to 70%, you have almost like a straight-line average.
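One simple way to express "no longer seeing huge changes" is to look at the spread of a metric over the most recent training rounds. The sketch below is an assumption about how such a check could be written, with a hypothetical window of three rounds and a two-point tolerance; it is not the stabilization test used by any specific tool.

```python
def has_stabilized(metric_by_round, window=3, tolerance=0.02):
    """True when the metric moved by no more than `tolerance` over the last `window` rounds."""
    if len(metric_by_round) < window:
        return False
    recent = metric_by_round[-window:]
    return max(recent) - min(recent) <= tolerance


# Precision still climbing round over round: not stable yet.
print(has_stabilized([0.30, 0.45, 0.55, 0.60, 0.70]))
# Precision flat in the low 20s: stable, even though it is far below 65%.
print(has_stabilized([0.20, 0.23, 0.24, 0.235, 0.24]))
```

The second series shows why stabilization and target precision are separate questions: a model can flatten out well below the recommended figure, and the decision to stop then rests on the reports and the agreement with the other side.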
In active learning, the industry sort of reads that as somewhere in between 10 and 15% of your population before you’ve sufficiently trained the model. The caveat there is always going to be conceptual diversity. So, you only know what you know. When you’re judging responsiveness on a concept that you have not encountered before, how many of those concepts exist? So, the clustering and the concept searching that we recommend upfront plays heavily into this. If you can say that you’ve done your spread, you’ve done your coverage and we know 90-95% of the concepts within our data population, 10% can work. Again, every dataset is different, and I hate to give an “it depends” answer, but there are a couple of factors you need to take into account. It’s also the reason why, as you’re going through the process, having a subject matter expert who can attest to the process and document the process and present that, that’s very important to have.
Anya, I didn’t mean to go off-topic here, go ahead.
No, no, you’re fine. Since we’re already in the questions, I’m just going to take, “Is the near duplicate the same as find similar?”
It is not the same. The near duplicate is only based on the text of the document. The analysis will take the actual text of the document and compare the actual words across the document. It will find the document that has the most text and then rank all the other documents on a percentage similar to those. Find similar is more of an “analytics” concept, where it will find conceptually similar documents, not necessarily textually similar documents.
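To illustrate what "based only on the text" means, the sketch below scores two documents by the overlap of their word sequences (three-word shingles compared with Jaccard similarity). This is a generic textbook technique chosen for illustration, and an assumption on our part; it is not the algorithm Relativity or Brainspace use for near-duplicate detection.

```python
def shingles(text, k=3):
    """Set of k-word sequences in the text, lowercased."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def textual_similarity(a, b, k=3):
    """Jaccard similarity of the two documents' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "the merger closes on friday per the board"
edited = "the merger closes on monday per the board"
paraphrase = "the deal will be finalized at the end of the week"

print(textual_similarity(original, edited))      # shares most of its wording
print(textual_similarity(original, paraphrase))  # same idea, different words
```

The paraphrase scores zero here even though it is about the same event, which is the gap that conceptual "find similar" is meant to fill: it looks for documents about the same ideas rather than documents with the same words.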
I think Seth would be great to answer the relevant and responsive question.
Seth Curt Schechtman
I’ll get to that. Just one question that Young had answered, so I think there was a question between human manual review and computer-assisted review.
Assuming humans were [inaudible] to any review, to every document and whether you ran search terms or not and they find 100%, then you’re going to say the algorithms are going to find 80%, maybe at best, 90% may be the best you’ve ever seen. What’s the cost of finding those other documents? That’s what it comes down to. It comes down to proportionality. Are you spending oodles and oodles of money poring through 95, 98, 99 non-responsive documents to find those other responsive ones? That’s what it really comes down to. If the answer is you’ve got to find every single one, whether it’s because – I’ll say it’s a make-or-break case, or because it’s, I don’t know, maybe an internal investigation and one document can make a difference, then maybe you want to review every single one. Maybe you want to find every single one. It depends on what’s the cost and whether you can get the other side or the government to agree to these things. TAR is well accepted in case law. You want to use it to save money, but there are certain instances where you may not want to use it.
Turning to the question of relevance versus responsiveness. One of my all-time favorite questions in review. I’ll say it depends. Relevance is broader. Responsiveness is narrow. When you get a request for production, they’re asking for things that are responsive. That doesn’t mean they left out a whole bunch of stuff that maybe pertains to the case, relevant to the case, relevant to the matter, but they just haven’t asked for it. When you’re training the algorithm, if the end is going to be, we’re producing this set, whether we cut off a review or not, you want to go with responsiveness, right, because that’s what they’re entitled to. You don’t want to give the other side all of this other stuff that may pertain to the case, but may not be responsive. Specifically requested, you don’t want to be overbroad on your productions, but excellent question and I hope I’ve answered it.
So, to the other question out there in terms of negotiating with TAR ESI protocols. Precision and recall. My answer is going to be a non-answer here. I wouldn’t promise anything. Recall, 75% is an acceptable tolerance. For precision, it’s a harder question. Really, it’s going to depend on how much review you want to do. Also, the definition of responsiveness will play heavily there. As for acceptable precision, generally, I would like to see higher than 65% precision, that’s what I look for. Not all cases are built the same. Not all datasets are built the same. We’ve seen as low as low 20s, high teens, and we’re still able to get approval on that process. So, in terms of negotiations, I would not cement precision in stone.
TAR 3.0, we can wait until Mike speaks to some of the other What’s Next in Analytics.
And just to go back to what’s acceptable, what’s not. We’ve certainly had cases where, like Young said, we always, at HaystackID, recommend 65, but we’ve had cases that never got above 23, 24, but we reached stabilization and we’ve… the attorneys were able to take the data, to take the reports, go back and say, ‘look, I know we kind of started out wanting 65, but this is where we are, let’s agree to cut it off, let’s agree to end the review here and just go on to the production’. So, again, it’s always… you always want to look at the data available to you and I know, as lawyers, data is overwhelming and the reports and all of that, but you still want to make sure that you look at what’s in front of you, consider everything, including cost and where you need to be at the end, what makes the most sense for the client.
Unless somebody else has more input, I want to throw it back to Mike to talk about the exciting things that are sentiment analysis, the emojis, financial data, PII, PHI, all of that good stuff.
Sure, thanks so much, Anya, and we’re going to be running short on time, so I’ll make it pretty quick. The key takeaways, and I always tell people this, analytics from an eDiscovery standpoint, and the engines and the tools and underlying technology and the application of it is not as advanced as other industries that may rely on data analytics. We just don’t need many of the applications or many of the customized libraries and tools required, a more nuanced approach that is specific to an organization and their data or a problem you’re trying to solve. We spent quite a bit of time working with these kind of off-market-type analytics tools, be it open source like graph databases, like Neo4j, which can allow you to do some really interesting things.
Where we’re seeing things as well and really where you’re getting much better analytics capabilities from a sampling standpoint and just being able to do more with your data is just more access to hardware. Putting things up to the cloud, it’s very cheap to do big data lake calculations from a computational standpoint, and ultimately, from a cost standpoint. Sometimes I think about how long this stuff would have taken three, four, five years ago as you get into more advanced features to analyze your data. We’re using graph databases to analyze much larger financial datasets, like call logs. We are tying together user activity across a broad array of systems to actual documents that are being created in a timeline, just more investigative services.
And really, for everybody here too, dealing with all of the PHI, PII, GDPR, data privacy and being able to identify that as… we actually are doing quite a bit of work with our own homegrown engines and then also relying on APIs from Google and from Microsoft and from Amazon that all do different pieces of the PII detection puzzle, so that’s something that we’re offering to clients today actually as well. Really, in our post-breach discovery, like cyber, like reviewing practice, but all of our multinational matters where we’re dealing with data that may be in APAC or in Europe, and with GDPR issues, being able to identify PII early on is so important. Keyword searches only work so well, so I would encourage everybody to do some exploration here. There’s a lot of open-source tools and just really great resources on the internet in these domains.
Thanks, Mike. I know we’re running out on time here, but I did want to address the TAR 3.0 question. TAR 3.0, I don’t want to say it’s a throwback to TAR 1.0, but it takes a similar approach. There is additional layering here, so traditionally what you’ll see in clustering is you’ll get a central layer or cluster and then it goes out into the outer arms. So, with TAR 3.0, think of it more as a Venn diagram where a document can live in multiple Venn diagrams just sitting on top of each other. You can have a document that actually lives in 40,000 clusters.
What it’s doing is it taking cluster cores, sending them to you. When you make a decision for responsiveness, it’s drilling a layer down and then asking you to code the subsequent underlying layer. It’s very hard to visualize. There is a very good blog about this. If you just type in “TAR 3.0”, you’ll be able to do some light reading. Alternatively, you can definitely reach out to us and we can give you a consultation there.
Yes, and I would add that it’s very much a workflow. You can simulate the effects of TAR 3.0 through the tactical use of the different training round capabilities in Brainspace, coupled with strategic sampling upfront: doing some search term analysis and then frontloading some of those results into the model, almost like a pre-training feed, and using those to get documents you know are hot or relevant early on in the TAR 2.0 process. So, we can jumpstart the models that way, and you are oftentimes getting very much the same effects here. I think TAR 3.0 is really very much that hybrid workflow, depending on who you talk to. There are other platforms out there that have started to try to brand this more algorithmically. It’s an entirely different process, but I would say that all the major analytics platforms offer some capabilities in this domain.
All right, well, thank you all very much. I’m going to kick it off to Rob Robinson to close this out. We really appreciate all of you joining today. Feel free to reach out with any questions. We’ll be happy to answer them. We’re always available. Just shoot an email or hit us on our website.
Thank you very much, Mike. And thank you to the entire team for the excellent information and insight today. We also want to take the time to thank each and every one of you who attended today’s webcast. We truly know how valuable your time is and we appreciate you sharing it with us today.
Lastly, I do want to highlight the fact that we hope you have a chance to attend our monthly webcast scheduled for February 17th at 12 p.m. Eastern, and it will be on the topic of data breach, discovery, and review. In this upcoming presentation, we’ll have cybersecurity experts, privacy experts, and legal discovery experts who will share how organizations can prepare to respond to a cyber-related incident, and we hope you can attend.
Thank you again for attending today. Be safe and healthy. And this concludes today’s webcast.