More than ever, legal departments face intense resource and budget pressure when it comes to eDiscovery. These challenges are exacerbated by new forms of risk, including cybersecurity threats, escalating data volumes, the rise of new business communication channels such as chat, heightened regulatory and data privacy mandates, and a whistleblower culture.
While “do more with less” pressures aren’t new, the need for efficiency has never been more acute. Workload has increased by 51 percent while budgets have shrunk by 23 percent in the past two years. Seventy percent of legal department leaders cite a renewed focus on adopting new or better technologies to simplify workflows and reduce manual processes to cut costs. [1]
Tools with front-loaded analytics can shave time and cost off an eDiscovery or investigation project, and they shouldn’t be left off the table.
What are front-loaded analytics in eDiscovery?
Front-loaded analytics means making automated analytics tools available earlier in the eDiscovery process. Doing so dramatically increases the depth of insight that can be gained quickly for early case assessment (ECA) and investigations, where the goal is to find facts faster rather than to conduct a full review. Front-loaded analytics also play a critical role in expediting the process when defensible, proportional review is required.
Why bring machine learning and analytics earlier into the review process?
The primary objective of front-loaded analytics in litigation review is to narrow the volume of data that requires eyes-on review. Because review time scales linearly with the volume of data, less data to review means a lower cost of eDiscovery. Applying automated analytics before review quickly creates review sets with higher concentrations of responsive data and filters out more irrelevant data early on.
Traditional eDiscovery tools (date histograms, communications hypergraphs, search filters, concept groups, phrase analytics and the like) are typically used to narrow review sets by finding the relevant custodians and responsive issues within the timelines of the matter. Front-loaded analytics augment these features with predictive search and newer forms of text analytics, providing a richer and faster way to extract relevant data from large volumes of irrelevant data and to create review sets with even higher proportions of responsive content.
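The linear relationship between data volume and review cost can be illustrated with a small sketch. The review speed and hourly rate below are hypothetical placeholders chosen only for illustration, not figures from any study or vendor.

```python
def review_cost(doc_count, docs_per_hour=50.0, hourly_rate=60.0):
    """Estimate eyes-on review cost under a simple linear model.

    docs_per_hour and hourly_rate are hypothetical placeholder values.
    """
    hours = doc_count / docs_per_hour
    return hours * hourly_rate


def savings_from_culling(doc_count, cull_fraction,
                         docs_per_hour=50.0, hourly_rate=60.0):
    """Cost avoided when a fraction of documents is filtered out before review."""
    if not 0.0 <= cull_fraction <= 1.0:
        raise ValueError("cull_fraction must be between 0 and 1")
    remaining = doc_count * (1.0 - cull_fraction)
    full = review_cost(doc_count, docs_per_hour, hourly_rate)
    reduced = review_cost(remaining, docs_per_hour, hourly_rate)
    return full - reduced
```

Under these assumed rates, reviewing 100,000 documents would cost 120,000, and culling half of them before review would avoid 60,000 of that. Because the model is linear, the saving is always the culled fraction of the full cost.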
Analytic tools that help establish optimal review sets
The following tools are all automated. The machine does the heavy lifting for each and exposes the results with minimal effort needed to initiate the analysis. For example, the names of people to search for do not have to be known in advance and entered in a lengthy search; specialized algorithms automatically detect all names within the data.
Predictive search
Use data known to be highly relevant as examples for the machine automation tools to compare against the entire corpus, quickly surfacing data with similar content, constructs and context.
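To make the idea of comparing known-relevant examples against a corpus concrete, here is a minimal sketch. It uses plain word-count cosine similarity as a stand-in technique; commercial predictive search tools use more sophisticated models, and the function names here are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(seed_docs, corpus):
    """Rank corpus documents by similarity to the known-relevant seeds.

    seed_docs: list of texts already known to be highly relevant.
    corpus: dict mapping a document id to its text.
    Returns (doc_id, score) pairs, most similar first.
    """
    seed_profile = Counter()
    for text in seed_docs:
        seed_profile.update(tokenize(text))
    scored = [(doc_id, cosine(seed_profile, Counter(tokenize(text))))
              for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Documents at the top of the ranking are candidates for the review set, while documents that share no vocabulary with the seeds score zero and can be deprioritized.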
Detection of people, places and organizations
Automated entity detection makes it easy to identify additional custodians and to narrow the review set to only the locations involved (e.g., specific facilities) or the organizations involved (e.g., specific subsidiaries).
Sentiment analysis
All data is automatically ranked by algorithms that draw on extensive libraries of terms suggesting sentiment and intent, such as “like,” “love” and “hate.” Focusing on the positive and negative outliers helps to quickly home in on the key concepts and phrases of interest and to flag essential custodians.
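The ranking-by-term-library approach can be sketched in a few lines. The tiny word lists and function names below are hypothetical, chosen only for illustration; real tools rely on far larger libraries and more nuanced scoring.

```python
import re

# Hypothetical, deliberately tiny term libraries for illustration only.
POSITIVE = {"like", "love", "great", "pleased"}
NEGATIVE = {"hate", "angry", "terrible", "worried"}


def sentiment_score(text):
    """Score text from -1 (all negative terms) to +1 (all positive terms)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def outliers(docs, top_n=2):
    """Return the ids of the most negative and most positive documents.

    docs: dict mapping a document id to its text. top_n must be at least 1.
    """
    ranked = sorted(docs, key=lambda doc_id: sentiment_score(docs[doc_id]))
    most_negative = ranked[:top_n]
    most_positive = ranked[-top_n:][::-1]
    return most_negative, most_positive
```

Reviewing the two ends of the ranking first is one simple way to focus on the positive and negative outliers.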
Fact vs. opinion analysis
All data is automatically assessed by algorithms that draw on extensive libraries of terms suggesting whether assertions are grounded in fact (“the evidence shows…”) or pure speculation (“this is just my opinion but…”). Fact vs. opinion analysis provides another vector to surface key custodians, concepts and phrases, helping to separate relevant data from background data.
Document summaries
Document summaries aid in the creation of refined review sets by helping to identify highly relevant documents for use with predictive search and to home in on the custodians, concepts and phrases of interest. To be effective, document summaries need to be based on deep lexical analysis of the concepts and context of the data within, not just an aggregation of headers or cover page summaries.
Used individually or in combination, front-loaded automated analytics are a game changer for efficient litigation review. A modest effort applying automated analytics before review begins cuts review time and cost in direct proportion to the volume of data identified early as unresponsive. Don’t leave these tools on the table when seeking the most efficient path to cost-effective review.