A Fresh Comparison of TAR and Keyword Search: eDiscovery Best Practices

by CloudNine
Contact

Bill Dimm of Hot Neuron (the company that provides the product Clustify that provides document clustering and predictive coding technologies, among others) is one of the smartest men I know about technology assisted review (TAR).  So, I’m always interested to hear what he has to say about TAR, how it can be used and how effective it is when compared to other methods (such as keyword searching).  His latest blog post on the Clustify site talks about an interesting exercise that did exactly that: compared TAR to keyword search in a real classroom scenario.

In TAR vs. Keyword Search Challenge on the Clustify blog, Bill challenged the audience during the NorCal eDiscovery & IG Retreat to create keyword searches that would work better than technology-assisted review (predictive coding) for two topics.  Half of the room was tasked with finding articles about biology (science-oriented articles, excluding medical treatment) and the other half searched for articles about current law (excluding proposed laws or politics).  Bill then ran one of the searches against TAR in Clustify live during the presentation (the others he couldn’t do during the session due to time constraints, but did afterward and covered those on his blog, providing the specific searches to which he compared TAR).

To evaluate the results, Bill measured the recall from the top 3,000 and top 6,000 hits on the search query (3% and 6% of the population respectively) and also included the recall achieved by looking at all docs that matched the search query, just to see what recall the search queries could achieve if you didn’t worry about pulling in a ton of non-relevant docs.  For the TAR results he used TAR 3.0 (which is like Continuous Active Learning, but applied to cluster centers only) trained with (a whopping) two seed documents (one relevant from a keyword search and one random non-relevant document) followed by 20 iterations of 10 top-scoring cluster centers, for a total of 202 training documents.  To compare to the top 3,000 search query matches, the 202 training documents plus 2,798 top-scoring documents were used for TAR, so the total document review (including training) would be the same for TAR and the search query.

The result: TAR beat keyword search across the board for both tasks.  The top 3,000 documents returned by TAR achieved higher recall than the top 6,000 documents for any keyword search.  Based on this exercise, TAR achieved better results (higher recall) with half as much document review compared to any of the keyword searches.  The top 6,000 documents returned by TAR achieved higher recall than all of the documents matching any individual keyword search, even when the keyword search returned 27,000 documents.

Bill acknowledges that the audience had limited time to construct queries, they weren’t familiar with the data set, and they couldn’t do sampling to tune their queries, so the keyword searching wasn’t optimal.  Then again, for many of the attorneys I’ve worked with, that sounds pretty normal.  :o)

One reader commented about email headers and footers cluttering up results and Bill pointed out that “Clustify has the ability to ignore email header data (even if embedded in the middle of the email due to replies) and footers” – which I’ve seen and is actually pretty cool.  Irrespective of the specifics of the technology, Bill’s example is a terrific fresh example of how TAR can outperform keyword search – as Bill notes in his response to the commenter “humans could probably do better if they could test their queries, but they would probably still lose”.  Very interesting.  You’ll want to check out the details of his test via the link here.

So, what do you think?  Do you think this is a valid comparison of TAR and keyword searching?  Why or why not? 

[View source.]

Written by:

CloudNine
Contact
more
less

CloudNine on:

Readers' Choice 2017
Reporters on Deadline

"My best business intelligence, in one easy email…"

Your first step to building a free, personalized, morning email brief covering pertinent authors and topics on JD Supra:
Sign up using*

Already signed up? Log in here

*By using the service, you signify your acceptance of JD Supra's Privacy Policy.
Custom Email Digest
Privacy Policy (Updated: October 8, 2015):
hide

JD Supra provides users with access to its legal industry publishing services (the "Service") through its website (the "Website") as well as through other sources. Our policies with regard to data collection and use of personal information of users of the Service, regardless of the manner in which users access the Service, and visitors to the Website are set forth in this statement ("Policy"). By using the Service, you signify your acceptance of this Policy.

Information Collection and Use by JD Supra

JD Supra collects users' names, companies, titles, e-mail address and industry. JD Supra also tracks the pages that users visit, logs IP addresses and aggregates non-personally identifiable user data and browser type. This data is gathered using cookies and other technologies.

The information and data collected is used to authenticate users and to send notifications relating to the Service, including email alerts to which users have subscribed; to manage the Service and Website, to improve the Service and to customize the user's experience. This information is also provided to the authors of the content to give them insight into their readership and help them to improve their content, so that it is most useful for our users.

JD Supra does not sell, rent or otherwise provide your details to third parties, other than to the authors of the content on JD Supra.

If you prefer not to enable cookies, you may change your browser settings to disable cookies; however, please note that rejecting cookies while visiting the Website may result in certain parts of the Website not operating correctly or as efficiently as if cookies were allowed.

Email Choice/Opt-out

Users who opt in to receive emails may choose to no longer receive e-mail updates and newsletters by selecting the "opt-out of future email" option in the email they receive from JD Supra or in their JD Supra account management screen.

Security

JD Supra takes reasonable precautions to insure that user information is kept private. We restrict access to user information to those individuals who reasonably need access to perform their job functions, such as our third party email service, customer service personnel and technical staff. However, please note that no method of transmitting or storing data is completely secure and we cannot guarantee the security of user information. Unauthorized entry or use, hardware or software failure, and other factors may compromise the security of user information at any time.

If you have reason to believe that your interaction with us is no longer secure, you must immediately notify us of the problem by contacting us at info@jdsupra.com. In the unlikely event that we believe that the security of your user information in our possession or control may have been compromised, we may seek to notify you of that development and, if so, will endeavor to do so as promptly as practicable under the circumstances.

Sharing and Disclosure of Information JD Supra Collects

Except as otherwise described in this privacy statement, JD Supra will not disclose personal information to any third party unless we believe that disclosure is necessary to: (1) comply with applicable laws; (2) respond to governmental inquiries or requests; (3) comply with valid legal process; (4) protect the rights, privacy, safety or property of JD Supra, users of the Service, Website visitors or the public; (5) permit us to pursue available remedies or limit the damages that we may sustain; and (6) enforce our Terms & Conditions of Use.

In the event there is a change in the corporate structure of JD Supra such as, but not limited to, merger, consolidation, sale, liquidation or transfer of substantial assets, JD Supra may, in its sole discretion, transfer, sell or assign information collected on and through the Service to one or more affiliated or unaffiliated third parties.

Links to Other Websites

This Website and the Service may contain links to other websites. The operator of such other websites may collect information about you, including through cookies or other technologies. If you are using the Service through the Website and link to another site, you will leave the Website and this Policy will not apply to your use of and activity on those other sites. We encourage you to read the legal notices posted on those sites, including their privacy policies. We shall have no responsibility or liability for your visitation to, and the data collection and use practices of, such other sites. This Policy applies solely to the information collected in connection with your use of this Website and does not apply to any practices conducted offline or in connection with any other websites.

Changes in Our Privacy Policy

We reserve the right to change this Policy at any time. Please refer to the date at the top of this page to determine when this Policy was last revised. Any changes to our privacy policy will become effective upon posting of the revised policy on the Website. By continuing to use the Service or Website following such changes, you will be deemed to have agreed to such changes. If you do not agree with the terms of this Policy, as it may be amended from time to time, in whole or part, please do not continue using the Service or the Website.

Contacting JD Supra

If you have any questions about this privacy statement, the practices of this site, your dealings with this Web site, or if you would like to change any of the information you have provided to us, please contact us at: info@jdsupra.com.

- hide
*With LinkedIn, you don't need to create a separate login to manage your free JD Supra account, and we can make suggestions based on your needs and interests. We will not post anything on LinkedIn in your name. Or, sign up using your email address.