Ediscovery Keyword Search: Get More Relevant Results in Document Review

Nextpoint, Inc.

This post explains how to craft searches that get the results you're looking for in document review. We dive into the technical aspects of searching and provide ediscovery keyword search examples.

Despite efforts to replace keyword searching in ediscovery, keywords remain the most cost-effective tool available for reviewing documents in litigation. Unfortunately, there is little guidance on how to get the best results from keywords when reviewing evidence.

Keyword searches can fail in two opposite directions – over-inclusive searches that return too many irrelevant documents, or under-inclusive searches that fail to capture what you’re looking for. Given this challenge, how do you narrow down initial searches that produce too many irrelevant documents? How do you refine your searches to find more useful evidence?

These ediscovery keyword search examples and best practices will help you make the most of your keyword searches and find relevant and useful documents in any document review.

The Ediscovery Keyword List

To begin, attorneys must generate a list of keywords (unique terms or phrases deemed critical to their case) at the outset of litigation. This is especially tricky because at the start of a matter you have little insight into case strategy and tactics, let alone which keywords might be involved.

Review teams frequently need to revise the keyword list as the matter matures, which adds complexity and expense. Whenever possible, craft your keywords with input from the custodians of the documents or source data to learn the jargon and abbreviations they use. The proposed search terms MUST be quality-control tested to ensure accuracy. It will likely take three to five iterations to finalize a set of search terms.

Sample Your Data Set

To begin a keyword list, reviewers should sample the data set. Reviewing a sample of the documents helps set realistic expectations for search hits, so you can quickly recognize whether your search terms are working. For example, if you expect 20% of documents to hit and your terms return 90% (or vice versa), you should revise your terms.
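To make the sampling check concrete, here is a minimal Python sketch that measures what fraction of a sample set hits at least one term and flags a term list whose hit rate strays from expectations. The function names, the whole-word matching rule, and the 15-point tolerance are illustrative assumptions, not features of any particular review platform.

```python
import re

def hit_rate(docs, terms):
    """Fraction of sample documents that contain at least one term (whole word, case-insensitive)."""
    if not docs:
        return 0.0
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    hits = sum(1 for doc in docs if any(p.search(doc) for p in patterns))
    return hits / len(docs)

def needs_revision(observed, expected, tolerance=0.15):
    """Hypothetical rule of thumb: revise when the observed rate is far from the expected rate."""
    return abs(observed - expected) > tolerance

sample = [
    "Please review the horse injury report.",
    "Quarterly sales figures attached.",
    "Lunch on Friday?",
    "The vet says the horse needs rest.",
    "Budget meeting moved to Tuesday.",
]
rate = hit_rate(sample, ["horse", "vet"])
print(rate)                        # 0.4
print(needs_revision(rate, 0.20))  # True: 40% observed vs. 20% expected
```

In practice the sample would be drawn from across the collection, and the tolerance is a judgment call for the review team.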

Wildcard Search*

Wildcard searches can help make the terms on your keyword list more inclusive. Use the asterisk to perform multiple-character wildcard searches. For example, searching for bean* will return beans, beanies, beanbags, etc.
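A trailing asterisk behaves much like a regular-expression pattern. The Python sketch below translates a wildcard keyword into a whole-word regex, assuming * stands for zero or more word characters; exact wildcard rules differ between review platforms, so treat this as an illustration rather than a description of any specific tool.

```python
import re

def wildcard_to_regex(keyword):
    """Compile a keyword such as 'bean*' into a case-insensitive whole-word regex.

    Assumption: '*' matches zero or more word characters (letters, digits, underscore).
    """
    literal_parts = [re.escape(part) for part in keyword.split("*")]
    return re.compile(r"\b" + r"\w*".join(literal_parts) + r"\b", re.IGNORECASE)

text = "Order beans, two beanies and a beanbag, but not a bench."
rx = wildcard_to_regex("bean*")
print(rx.findall(text))  # ['beans', 'beanies', 'beanbag']
```

Because the pattern is anchored at a word boundary, bean* in this sketch does not match words that merely contain "bean" in the middle, such as "jellybeans."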

Choose Your Ediscovery Keywords Carefully

It's easy to be too literal when planning your keyword list, but thoughtfully chosen keywords can return a manageable number of relevant documents. For example, consider an insurance case involving an injured racehorse. You can search for “horse” and its synonyms, but you are likely to find irrelevant documents containing words like “horsepower” or phrases like “clothes horse.”

However, words like bridle, veterinarian, saddle, or other keywords related to the animal involved are more likely to return useful hits. No matter the type of case, you should consider all possible meanings of your keywords and think of related words that may garner more contextually relevant results.
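The difference between literal matching and context-aware keywords is easy to demonstrate. In this Python sketch, the four documents and the related-term list are made-up examples: a plain substring search for "horse" catches "horsepower" and "clothes horse," while a whole-word search on animal-specific terms returns only the documents about the horse itself.

```python
import re

docs = {
    "d1": "Engine horsepower figures for the new truck.",
    "d2": "The vet examined the horse after the race.",
    "d3": "She has always been a clothes horse.",
    "d4": "The saddle slipped and the bridle snapped at the gate.",
}

def substring_hits(term):
    """Naive matching: the term can fire inside longer words such as 'horsepower'."""
    return sorted(k for k, text in docs.items() if term in text.lower())

def related_term_hits(terms):
    """Whole-word matching on any of several context-specific terms."""
    rx = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return sorted(k for k, text in docs.items() if rx.search(text))

print(substring_hits("horse"))                                         # ['d1', 'd2', 'd3']
print(related_term_hits(["bridle", "saddle", "vet", "veterinarian"]))  # ['d2', 'd4']
```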

Understand Boolean Search Tools

Legal teams need to become adept at Boolean searches, which use operators such as AND, OR, and NOT to refine results. Once you have a sense of the terminology and phrases used in a sample document set, you can begin building search strings that use parentheses to group variants of the terms you are interested in and to control how those groups combine.

For example, once you understand the job titles your subjects use, a keyword string might look like this:

(“ba” OR “business analyst” OR “project manager” OR “project analyst”) AND NOT (“data warehouse” OR financial OR analysis OR “product roadmap”)
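One way to see how the operators combine is to evaluate the keyword string above against a few sample documents, reading each parenthesized list as an OR group. This Python sketch hard-codes that logic with any() and not; real platforms parse the query text themselves, and the whole-word matching rule here is an assumption.

```python
import re

def has(doc, term):
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", doc, re.IGNORECASE) is not None

TITLES = ["ba", "business analyst", "project manager", "project analyst"]
EXCLUDED = ["data warehouse", "financial", "analysis", "product roadmap"]

def matches(doc):
    # (any TITLES term) AND NOT (any EXCLUDED term)
    return any(has(doc, t) for t in TITLES) and not any(has(doc, t) for t in EXCLUDED)

docs = [
    "Our project manager signed off on the vendor contract.",
    "The business analyst updated the data warehouse schema.",
    "Quarterly financial results are attached.",
]
print([matches(d) for d in docs])  # [True, False, False]
```

Whole-word matching matters for short terms like "ba": without it, the title group would fire on words such as "bank."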

Use Proximity Searches

One of the most powerful tools for zeroing in on relevant documents is a proximity search. Specifying the number of words between two words or phrases helps add context and limit the returns. This is especially useful if one of the words is common and returns too many false positives.

In Boolean search, this tool is called the "W/n connector," with "n" being the number of words you want to specify. So, if you search for "horse" and "injury" with the W/5 connector, you'll find all the results in which these two terms appear within five words of one another.
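A proximity connector comes down to comparing word positions. The Python sketch below assumes W/n means the two terms appear in either order and their word positions differ by at most n; platforms count distance slightly differently (and some offer order-sensitive connectors), so check your tool's documentation before relying on exact counts.

```python
import re

def tokens(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(text, a, b, n):
    """W/n sketch: True if single words a and b occur within n positions of each other, in either order."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

near = "The horse suffered a leg injury during training."
far = "The horse was fine; months later an unrelated injury was reported."
print(within(near, "horse", "injury", 5))  # True
print(within(far, "horse", "injury", 5))   # False
```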

When searching for personal names, use a proximity search between the first and last names. If you are searching for "John Smith," this helps ensure the results refer to the actual subject, not every document in the collection that mentions "John."

Use the W/3 connector between first and last names to retrieve search results that account for middle names, middle initials, and inverted name order. You should also include nicknames and shortened forms (Robert, Bob, Rob, etc.) of the subject's name.
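Combining a nickname list with a W/3 window can be sketched in a few lines of Python. The variant list below is a hypothetical example for a subject named Robert Smith, and the sketch assumes W/3 means the name parts' word positions differ by at most three, in either order.

```python
import re

FIRST_NAME_VARIANTS = {"robert", "rob", "bob", "bobby"}  # hypothetical nickname list
LAST_NAME = "smith"

def tokens(text):
    """Lowercase word tokens; punctuation such as commas and periods is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def mentions_subject(text, n=3):
    """Any first-name variant W/n the last name, in either order."""
    words = tokens(text)
    firsts = [i for i, w in enumerate(words) if w in FIRST_NAME_VARIANTS]
    lasts = [i for i, w in enumerate(words) if w == LAST_NAME]
    return any(abs(i - j) <= n for i in firsts for j in lasts)

examples = [
    "Robert J. Smith approved the invoice.",  # middle initial
    "Smith, Bob: see attached.",              # inverted order, nickname
    "Robert asked for the file.",             # first name only
    "Bob from accounting emailed Mary, who forwarded it to Dana Smith.",  # too far apart
]
print([mentions_subject(e) for e in examples])  # [True, True, False, False]
```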

Phrase Searches Are More Precise

Phrase searches allow document review teams to be more precise by looking for keywords that appear in a particular sequence. For example, searching "national" AND "defense" AND "contract" may return over-inclusive results. Instead, reviewers can search for "national defense contract," which returns only documents containing that exact phrase.
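The gap between an AND search and a phrase search shows up clearly on two sample sentences. In this Python sketch (simple lowercase tokenization is an assumption), both sentences contain all three words, but only one contains them in sequence.

```python
import re

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def all_terms(text, terms):
    """AND search: every term appears somewhere in the document."""
    words = set(tokens(text))
    return all(t in words for t in terms)

def phrase(text, terms):
    """Phrase search: the terms appear consecutively, in order."""
    words = tokens(text)
    k = len(terms)
    return any(words[i:i + k] == terms for i in range(len(words) - k + 1))

docs = [
    "The national defense contract was renewed.",
    "A national retailer signed a contract for its defense of the lawsuit.",
]
terms = ["national", "defense", "contract"]
print([all_terms(d, terms) for d in docs])  # [True, True]
print([phrase(d, terms) for d in docs])     # [True, False]
```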

Be Aware of Keyword Noise

Before you even begin crafting keyword strings, find out how your source documents are being indexed. For example, some search engines index certain characters as spaces. This is especially problematic if the @ symbol is treated as a space, because searches for email addresses may then fail to return the documents that contain them.

Other applications don’t index “noise words” such as “it” or “up,” so a search for the key phrase “pick up” will fail. If your search technology uses a noise-word list, customize it or turn it off to avoid such failures. Using specialized ediscovery software like Nextpoint will minimize indexing issues like these.
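The effect of these indexing choices can be simulated. The Python sketch below mimics an indexer that treats @ as a space and drops a small, hypothetical noise-word list; with those settings neither the email address nor the phrase "pick up" can be found, while an index that keeps both succeeds. The address and settings are made-up placeholders, not any product's defaults.

```python
NOISE_WORDS = {"a", "it", "of", "the", "up"}  # hypothetical noise-word list

def index_terms(text, at_is_space=True, drop_noise=True):
    """Simulate an indexer: optionally treat @ as a space and discard noise words."""
    text = text.lower()
    if at_is_space:
        text = text.replace("@", " ")
    words = [w.strip(".,;:!?") for w in text.split()]
    words = [w for w in words if w]
    if drop_noise:
        words = [w for w in words if w not in NOISE_WORDS]
    return words

def phrase_in_index(words, phrase):
    """True if the phrase's words appear consecutively in the indexed terms."""
    terms = phrase.lower().split()
    k = len(terms)
    return any(words[i:i + k] == terms for i in range(len(words) - k + 1))

msg = "Email jdoe@example.com to pick up the signed copy."
lossy = index_terms(msg)
full = index_terms(msg, at_is_space=False, drop_noise=False)
print("jdoe@example.com" in lossy, phrase_in_index(lossy, "pick up"))  # False False
print("jdoe@example.com" in full, phrase_in_index(full, "pick up"))    # True True
```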

Protecting Attorney-Client Privilege

The most important task when using keyword searches is to identify documents likely to contain privileged material. However, it is easy to miss privileged material without careful planning.

According to recent research, the best generic terms to ferret out privileged documents are "counsel" and "attorney," or those terms with root expanders ("counsel*" and "attorney*”). "Complainant" and "statute" are also effective terms for detecting privileged material.

The research also found that "confidential" was not a useful keyword in privilege review and that terms like "legal," "priv*," and "lawyer*" were about half as effective as "counsel" or "attorney." While these words may appear in privileged communications, those communications almost always include other, more specific terms, such as attorney email addresses, firm domains, or words like "counsel."
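A privilege screen built on these findings might look like the Python sketch below. It flags documents containing the generic terms discussed above (with root expanders) or case-specific identifiers, and reports which patterns fired. The attorney address and firm domain are hypothetical placeholders; you would substitute the actual addresses and domains for your matter.

```python
import re

PATTERNS = {
    "counsel*": r"\bcounsel\w*",
    "attorney*": r"\battorney\w*",
    "complainant": r"\bcomplainant\b",
    "statute": r"\bstatute\b",
    "attorney address": r"jdoe@lawfirm\.example\b",  # hypothetical
    "firm domain": r"@lawfirm\.example\b",           # hypothetical
}

def privilege_flags(doc):
    """Return the label of every pattern that matches, so reviewers can see why a document was flagged."""
    return [label for label, p in PATTERNS.items() if re.search(p, doc, re.IGNORECASE)]

docs = {
    "d1": "Per outside counsel, hold the draft until Monday.",
    "d2": "From: jdoe@lawfirm.example  Re: settlement terms",
    "d3": "CONFIDENTIAL: quarterly sales figures",
}
print(privilege_flags(docs["d1"]))  # ['counsel*']
print(privilege_flags(docs["d2"]))  # ['attorney address', 'firm domain']
print(privilege_flags(docs["d3"]))  # []
```

Document d3 is not flagged because "confidential" is deliberately left off the list, in line with the research described above.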

The Final Word on Ediscovery Keyword Search

Here's a summary of the ediscovery keyword search examples we covered:

  • Wildcard searches: Search "bean*" to get results that include "beans," "beanies," "beanbags," etc.
  • Related keywords: Don't just search "horse" – this will bring up unrelated results on topics like "horsepower." Search for specific words like bridle, veterinarian, and saddle.
  • Boolean search: Use commands like AND, OR and NOT to refine results. When searching for a subject with multiple job titles, try a search like this: (“ba” OR “business analyst” OR “project manager”)
  • Proximity searches: Use the W/n connector to specify the maximum number of words between two search terms. When searching for "John Smith," use the W/3 connector (written as “John Smith”~3 in some platforms) to include results like "Smith, John" and "John R. Smith."
  • Phrase searches: Phrase searches will give more precise results. For example, search "national defense contract" instead of "national" AND "defense" AND "contract”
  • Noise words: Make sure your review tool indexes all words and characters necessary to your search.
  • Protect privilege: "Counsel," "attorney," "complainant," and "statute" are the most effective generic terms for detecting privileged material. Searching for attorney email addresses or firm domains can be equally effective. "Confidential," "legal," "priv*," and "lawyer*" are not as effective.

As mentioned earlier, choosing words that are too broad will create a high number of false positives, requiring costly and unnecessary manual review. Choosing words that are too narrow can result in an incomplete review that misses relevant documents or inadvertently discloses privileged material.

However, with the right strategies, any legal team can craft keyword lists that not only work, but help win cases. For more document review tips, check out our comprehensive document review eGuide.
