Unstructured Data: The Black Hole of Ediscovery



Big Data, Structured Data, Unstructured Data – these terms are becoming the buzzwords of ediscovery, but what do they mean?

Structured data refers to information residing inside complex applications, such as transactional and financial databases. You access this data in a variety of ways, depending on how the application presents it. For example, you might have several similar yet distinct finance reports that hold the same structured data but simply present it in different visual formats. Ultimately, structured data exists as segments of information inside a larger system, one that is often quite complex and contains many parts. While this type of data does continue to grow, and its format can make it challenging to handle as electronically stored information (ESI), it isn’t causing quite the same volume problems as we are seeing with “unstructured data.”

“Unstructured” or “loose” data might not be what you call it, but it’s what you are generally working with as ESI. These terms refer to all of the standalone, common files that make up the work done every day in corporations around the world. All of those e-mail messages, word processing documents, spreadsheets, and presentations, among other things, that are commonly sought as potentially relevant ESI in discovery are considered unstructured data.

That unstructured data is the harbinger of Big Data and the root cause of a jump of roughly 46% in enterprise storage volume from 2010 to 2012 (from 2,175 terabytes to 3,183 terabytes), as profiled in a recent infographic on ediscovery.com. But the scariest thing about unstructured data is that it’s a silent killer; most organizations don’t even know a problem exists until litigation is underway and (not surprisingly) something goes missing. Yikes.

While “Big Data” and the growing mass of “unstructured data” can make traditional manual ESI review completely cost-prohibitive, something often can be done. Predictive coding, for example, can provide a much-needed backbone for unstructured data by detecting linguistic patterns in documents and ranking them according to predicted relevance. Moreover, depending on the capabilities of a provider’s technology, it is possible for a vendor to host these unstructured documents in a cheaper “nearline” storage location, in case serial litigation summons them again. Thus, once a document has been tethered to a custodian or date range in one project, you can leverage that information in future matters.
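To make the idea of ranking by predicted relevance a little more concrete, here is a minimal sketch in Python. It is an illustration only and does not represent any vendor's actual predictive coding technology. It assumes a small, hypothetical seed set of documents that reviewers have already coded as relevant or not relevant, and it scores each unreviewed document with a simple naive Bayes style sum of word log-odds. All function names and sample documents are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_docs):
    """Count word frequencies separately for relevant and non-relevant documents."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in labeled_docs:
        counts[is_relevant].update(tokenize(text))
    return counts


def relevance_score(text, counts):
    """Sum smoothed log-odds per word; higher means more like the relevant set."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    total_rel = sum(counts[True].values())
    total_non = sum(counts[False].values())
    score = 0.0
    for tok in tokenize(text):
        # Add-one smoothing so unseen words do not produce a zero probability.
        p_rel = (counts[True][tok] + 1) / (total_rel + vocab_size + 1)
        p_non = (counts[False][tok] + 1) / (total_non + vocab_size + 1)
        score += math.log(p_rel / p_non)
    return score


def rank(unreviewed, counts):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda d: relevance_score(d, counts), reverse=True)


# Hypothetical seed set coded by human reviewers (True = relevant).
seed_set = [
    ("Please revise the pricing terms in the supply contract", True),
    ("Attached is the signed contract amendment on pricing", True),
    ("Lunch menu for the team offsite on Friday", False),
    ("Reminder: parking garage closed Friday", False),
]

# Hypothetical unreviewed documents to be prioritized for review.
unreviewed = [
    "Team lunch moved to Friday",
    "Draft contract with updated pricing terms",
    "Quarterly newsletter",
]

counts = train(seed_set)
ranked = rank(unreviewed, counts)
for doc in ranked:
    print(f"{relevance_score(doc, counts):6.2f}  {doc}")
```

In this toy run the contract draft rises to the top of the review queue and the lunch message falls to the bottom. Real predictive coding tools work on far larger seed sets, use richer features, and are validated against human review, but the basic pattern is the same: learn from coded examples, then rank the rest.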

DISCLAIMER: Because of the generality of this update, the information provided herein may not be applicable in all situations and should not be acted upon without specific legal advice based on particular situations.

© Kroll Ontrack Inc. | Attorney Advertising

Written by: Kroll Ontrack Inc.
