March 1, 2021

Kangaroo Court: Supervised Learning Algorithms for Litigation

Association of Certified E-Discovery Specialists (ACEDS)

+ Follow Contact

Send

Embed

Association of Certified E-Discovery Specialists (ACEDS)

What do you say if somebody asks you what the algorithm is doing with your data? How does it learn? What is it looking at and why? Defensible use of AI is important, whether during litigation or through some other application such as monitoring for various forms of risk. Achieving a collective understanding of how the technology you are using works will become significantly more important as these tools become ubiquitous with legal data. In artificial intelligence there are several ways to skin a cat. Given the growing complexity of data in both type and size, it is important for legal teams to understand a little bit more about the algorithms themselves.

Before we jump into any complexity, let us review some baseline concepts. Supervised learning is a form of machine learning commonly used in predictive coding. LegalTech vendors focused on building AI solutions have made impressive use of supervised learning techniques in the pursuit of more speed to insight during document review. In supervised learning, we are interested in developing a model to predict a class label given an example of input variables. This predictive modeling task is called classification. Classification is also traditionally referred to as discriminative modeling.

A model must discriminate examples of input variables across classes; it must choose or decide as to what class a given example belongs. This is what happens during a supervised predictive coding process when a review attorney codes documents for relevance. However, in order to understand exactly how the data is modeled, it’s important to dig a little deeper and familiarize ourselves with the underlying algorithms most commonly used to achieve these goals. Today, we are going to review one commonly used algorithm: support vector machines.

Application of SVM

Many LegalTech firms with AI based predictive coding functions utilize SVMs because they are one of the most robust prediction methods. This is due to their foundation in statistical learning frameworks. SVMs are consistently the go-to method for a high-performing algorithm with little tuning. This means less work is required in the creation of a reliable AI model. SVMs are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. Like many classification and prediction methods, SVMs classify binary outcomes (e.g., reviewing documents as responsive versus nonresponsive) by estimating a separation boundary within the space defined by a set of predictor variables. A given observation, defined by the values of the predictor variables is classified based on which side of the boundary it falls.

The objective of SVMs is to find a hyperplane in an N-dimensional space (N – number of features of the data) that distinctly classifies the data points. In the context of litigation and document review, features within the data could be represented by any number of factors including metadata, emotional signals, and relevant entities such as the names people, places, organizations, or things. Support vectors are the data points that lie closest to the decision surface (or hyperplane). They are the data points most difficult to classify and have a direct bearing on the optimum location of the decision surface. Hyperplanes are decision boundaries that help classify the data points. Data points falling on either side of the hyperplane can be attributed to different classes.

Hyperplanes

To separate the two classes of data points, there are many possible hyperplanes that could be chosen. Our objective is to find a plane that has the maximum margin, i.e the maximum distance between data points of both classes. SVMs maximize the margin around the separating hyperplane, with the decision function fully specified by a very small subset of training samples (the support vectors). Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence. This makes SVMs a highly preferred algorithm for data scientists because it produces significant accuracy with less computation power.

We can go into more detail, but there is little need to go into greater depth for the average eDiscovery professional. The important part is understanding the problem of supervised learning and the most utilized algorithms, like SVMs. Remember that in supervised learning we may be interested in developing a model to predict (classify) a class label (data point) given an example of input variables (training data). The model must discriminate examples of input variables across classes; it must choose or decide as to what class a given example belongs. When review attorneys are facing a discovery process involving several terabytes of data, it helps to understand exactly how the algorithm is looking at your data. Remember, what if someone asks you to explain which algorithm you have used and what the algorithm is doing? Best be prepared.

Send Print Report

Written by:

Association of Certified E-Discovery Specialists (ACEDS)

Contact + Follow

Chip Delany

+ Follow

less

PUBLISH YOUR CONTENT ON JD SUPRA NOW

Increased visibility
Actionable analytics
Ongoing guidance

Learn More

Published In:

Algorithms

+ Follow

Artificial Intelligence

+ Follow

Discovery

+ Follow

Document Review

+ Follow

e-Discovery Professionals

+ Follow

Electronically Stored Information

+ Follow

Legal Technology

+ Follow

Machine Learning

+ Follow

Electronic Discovery

+ Follow

Professional Practice

+ Follow

Science, Computers & Technology

+ Follow

less

Association of Certified E-Discovery Specialists (ACEDS) on:

Kangaroo Court: Supervised Learning Algorithms for Litigation

Related Posts

Written by:

PUBLISH YOUR CONTENT ON JD SUPRA NOW

Published In:

Association of Certified E-Discovery Specialists (ACEDS) on:

"My best business intelligence, in one easy email…"