Ediscovery Keyword Filtering: The Human Role

Nextpoint, Inc.


Ediscovery is automated in many important ways, but attorney judgment is still vital to the process. Legal teams need to make smart decisions up front about how to filter, what to cull, and what to keep.

If you are too cautious about processing, you will have too much evidence to review. If you are too aggressive, you can make a mess of your case or even get into serious legal jeopardy.

Ediscovery Data Filtering in Three Parts

According to Michael Arkfeld, approximately 80 to 98 percent of initial data collected in response to an ediscovery request will be eliminated as non-responsive. With a little strategy and planning, you can become smart about reducing ediscovery data. If you filter by keyword with the right techniques, you can drastically reduce the expenses of review.

This is the second of three posts describing the processing and filtering of digital evidence in ediscovery. In part one of our series, we described the technical and automated processes you can use to limit the scope and cost of discovery. Today, we are focusing on the decisions and judgment calls human reviewers need to make in order to win the ediscovery battle.

Get Your Hands Dirty with Ediscovery Data

The only way to make informed and intelligent decisions about filtering document collections is to get hands-on with the data. To limit and refine the scope of your ediscovery review, interview key players and ask them who else is likely to have potentially relevant ESI. Interview the IT personnel who manage the computer systems being investigated and identify what types of data they retain.

The next decision is which custodians (individuals from whose file systems a group of records is extracted) to include in an ediscovery data collection. You may put many custodians on hold, but that does not mean you have to load all of the data collected into a database for review.

This is where interviews and knowledge of the case should allow you to know who the key custodians are. Rank them by the likely importance of the data they hold to the facts disputed in the case and choose only those likely to own relevant documents.
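The ranking step can be sketched in a few lines of Python. This is only an illustration: the `Custodian` record, the 0 to 5 `relevance` score, and the cutoff of 3 are all assumptions made for the example, and the scores themselves would come from your interviews and case knowledge, not from software.

```python
from dataclasses import dataclass


@dataclass
class Custodian:
    name: str
    relevance: int        # hypothetical 0-5 score assigned by the legal team
    on_hold: bool = True  # being on hold does not mean being loaded for review


def select_custodians(custodians, min_relevance=3):
    """Return custodians at or above the cutoff, most important first."""
    ranked = sorted(custodians, key=lambda c: c.relevance, reverse=True)
    return [c for c in ranked if c.relevance >= min_relevance]


custodians = [
    Custodian("Claims adjuster", 4),
    Custodian("Receptionist", 1),
    Custodian("Project manager", 5),
]

for c in select_custodians(custodians):
    print(c.name, c.relevance)
```

Everyone in the list stays on hold; only the custodians returned by `select_custodians` would have their data loaded for review.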

Request Ediscovery Data in Proper Format

Also, remember that once processed and filtered, documents still need to be reviewed. ESI must be produced in specified forms of production, either in native form (being the form stored and used in the ordinary course of business) or in a static image format (a screenshot of each page plus a load file holding text and metadata). There are also near-native forms of production, such as when email inboxes are produced as individual messages called MSGs or EMLs.

Requesting produced data in a “uniform” image format (PDF, JPG or TIFF) is the most common practice. Images represent the “cleanest” (low risk of anomalies) and most universal format for a multitude of review software platforms.

However, firms that specialize in a particular area of law may have another consideration to make regarding production format. Certain types of cases (construction law, for example) may involve proprietary file types such as AutoCAD drawings, making it important to request these files in their native form. These files can be imaged, but often they are better viewed in their source software, because an image may not show the ‘hidden lines/layers’ the software suppresses. Native files can be converted to JPG, TIFF or PDF later for Bates stamping, redaction, and production.

Know What Evidence You Are Hoping to Find

Imagine you are involved in a major lawsuit involving insurance claims for injuries involving young people. How do you find relevant documents?

Searching for the words “young people” or “juvenile” will probably not return very many relevant or interesting documents. However, if you can find words for sports or activities correlated with young people, like football or basketball, you will likely return highly relevant results.

The keywords used to search a collection cannot be based on guesswork. They must be tested and refined through trial and error. But how do you test possible keywords without first collecting and ingesting all of the documents to determine which might be relevant? The best answer is that you ask the witnesses, and do some partial reviews before collection.

When trying to narrow search queries, the only way to find the most effective keywords or phrases is to test and measure your results.
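One way to "test and measure" is to run candidate terms against a small set of documents that a reviewer has already coded as relevant or not, then compare the terms. The sketch below is a minimal illustration under that assumption. The function name `evaluate_terms`, the sample documents, and the relevance calls are all invented for the example and do not come from any particular review platform.

```python
import re


def evaluate_terms(sample, terms):
    """Score each term against a hand-reviewed sample.

    sample: list of (text, is_relevant) pairs coded by a human reviewer.
    Returns, per term, the hit count, precision (share of hits that are
    relevant) and recall (share of relevant documents that were hit).
    """
    total_relevant = sum(1 for _, relevant in sample if relevant)
    results = {}
    for term in terms:
        # Whole-word, case-insensitive match on the literal term.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = [relevant for text, relevant in sample if pattern.search(text)]
        relevant_hits = sum(hits)
        results[term] = {
            "hits": len(hits),
            "precision": relevant_hits / len(hits) if hits else 0.0,
            "recall": relevant_hits / total_relevant if total_relevant else 0.0,
        }
    return results


# Hypothetical reviewed sample for the youth-injury example above.
sample = [
    ("Claim filed after football practice injury", True),
    ("Basketball tournament waiver form attached", True),
    ("Juvenile court calendar for next week", False),
    ("Quarterly office supply order", False),
]

stats = evaluate_terms(sample, ["football", "basketball", "juvenile"])
for term, s in stats.items():
    print(term, s)
```

A term with many hits but low precision is a candidate for narrowing or negotiation. A term with high precision but low recall suggests you need additional terms alongside it.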

Try to:

  • Negotiate the number of searches that should be run (ranging from dozens to hundreds)
  • Determine the targeted data types and sources (databases, docs, email, spreadsheets, etc.)
  • Brainstorm possible names, events, date ranges, acronyms, and phrases to search for in your set

Be aware that most keyword strategies fail on initial contact with a database. We recommend clients have a contingency plan for when keyword searches fail to successfully return relevant documents.

For example:

  • If data or documents are missing, consider deposing witnesses to determine potential reasons why, or where that data might be
  • If cost is an issue, consider sampling to determine if relevant documents are present in a data set before proceeding
  • Negotiate limits on overbroad or common search terms with opposing counsel
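The sampling idea in the list above can be sketched as follows: pull a random sample from a data source, have a reviewer code it, and estimate what share of the whole source is likely to be relevant. This is a rough illustration, not a validated protocol. The `is_relevant` callback stands in for a human reviewer's decision, the document IDs are made up, and the interval uses a simple normal approximation that becomes unreliable for small samples or very rare documents.

```python
import math
import random


def estimate_prevalence(doc_ids, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents from a random sample.

    Returns (estimate, low, high), where low/high are an approximate
    95% interval based on the normal approximation.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(doc_ids, min(sample_size, len(doc_ids)))
    n = len(sample)
    if n == 0:
        raise ValueError("cannot sample from an empty collection")
    found = sum(1 for doc_id in sample if is_relevant(doc_id))
    p = found / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical collection: 10,000 documents, of which every 20th is relevant.
doc_ids = list(range(10_000))


def reviewer_says_relevant(doc_id):
    return doc_id % 20 == 0


estimate, low, high = estimate_prevalence(doc_ids, reviewer_says_relevant, 400)
print(f"Estimated prevalence: {estimate:.1%} (roughly {low:.1%} to {high:.1%})")
```

If the estimate for a source is close to zero, that supports an argument for excluding it or deferring it before paying for full processing and review.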

Defend Your Ediscovery Data Filtering Methods

Unfortunately, there is a danger that lawyers may delegate too much of the ediscovery processing duties. The lawyer signing the Rule 26(g) statement has a legal and ethical duty to closely supervise document review done in response to a request for production.

As a producing party, it is in your interest to limit the scope of discovery based on the claims or defenses as well as the individuals and time frame involved. For requesting parties, a broad discovery request might be in your interests, but only if you can analyze the collection for patterns and conduct that will lead to relevant information, or even that most coveted find, a smoking gun email.

Contrary to what you may read, legal teams still face sanctions for ediscovery failures. The updated Federal Rule of Civil Procedure 37(e) replaced the old “safe harbor” provision. It applies when electronically stored information (ESI) that should have been preserved is lost because a party failed to take reasonable steps to preserve it. If the lost ESI can be restored or replaced through additional discovery, no sanction will be imposed. However, if it cannot be restored or replaced and the requesting party is prejudiced, the court may order measures “no greater than necessary to cure the prejudice.”

That means a legal team must be able to show what data they have collected, what was discarded, and why. If mistakes are made, it must be possible to reverse those decisions and remedy any errors.
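In practice, that calls for a record of every filtering decision. The sketch below shows one possible shape for such a record: an append-only log in which a reversal is a new entry, so the history of what was culled and why is never overwritten. The class names, fields, and document IDs are hypothetical, and a real review platform will have its own audit features.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class FilterDecision:
    doc_id: str
    action: str       # "keep" or "cull"
    reason: str
    decided_by: str
    timestamp: str = field(default_factory=_now)


class FilterLog:
    """Append-only log: entries are added, never edited or deleted."""

    def __init__(self):
        self.entries = []

    def record(self, doc_id, action, reason, decided_by):
        if action not in ("keep", "cull"):
            raise ValueError("action must be 'keep' or 'cull'")
        self.entries.append(FilterDecision(doc_id, action, reason, decided_by))

    def current_status(self, doc_id):
        """Most recent decision for a document, or None if never logged."""
        for entry in reversed(self.entries):
            if entry.doc_id == doc_id:
                return entry.action
        return None

    def culled(self):
        """Latest entry for every document whose current status is 'cull'."""
        latest = {}
        for entry in self.entries:
            latest[entry.doc_id] = entry
        return [e for e in latest.values() if e.action == "cull"]


log = FilterLog()
log.record("DOC-001", "cull", "outside agreed date range", "Reviewer A")
log.record("DOC-002", "cull", "no keyword hits", "Reviewer A")
# A mistake is remedied by adding a reversing entry, not by erasing the old one.
log.record("DOC-001", "keep", "date range widened by agreement", "Reviewer B")

print(log.current_status("DOC-001"))
print([e.doc_id for e in log.culled()])
```

Because nothing is deleted, the log can answer all three questions at once: what was collected, what was discarded, and the stated reason at each point in time.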

Look for our next blog post covering advanced ediscovery techniques, including how to manage predictive coding, encryption, and foreign languages. And don’t forget about our first post of the series, The Art of eDiscovery Data Filtering and Culling, that outlines the technical aspects of data filtering.
