Long gone are days when the majority of discovery records were kept in paper format. Documents, invoices, and other related evidence needed to be scanned and printed in the tens (if not hundreds) of thousands. Today, a huge number of discovery efforts (internal or external) revolve around digital content. Ergo, this article will highlight the collection of digital evidence and how to best prepare your case when it comes to preservation and collections as well as processing and filtering.
But, before we get into that, one of the core factors to keep in mind here is time, which will always be there irrespective of what we have at hand. It is especially complicated if multiple parties are involved, such as vendors, multiple data locations, outside counsels, reviewers, and more. For the purposes of this blog, I have divided everything into the following actionable groups - preservation and collection as well as processing and filtering.
Preservation and Collection
In an investigation or litigation there could be a number of custodians involved, for example, people who have or had access to data. Whenever there are more than a handful of custodians the location may vary. It is imperative to consider where and what methods to use for data collection. Sometimes an in-person collection is more feasible than a remote collection. Other times, a remote collection is the preferred method for all those concerned. A concise questionnaire along with answers to frequently asked questions is the best approach to educate the custodian. Any consultative service provider must ensure samples are readily available to distribute that will facilitate the collection efforts.
Irrespective of how large the collection is, or how many custodians there are, it is best to have a designated coordinator. This will make the communication throughout the project manageable. They can arrange the local technicians for remote collections and ship and track the equipment.
The exponential growth in technology presents new challenges in terms of where the data can reside. An average person, in today’s world, can have a plethora of potential devices. Desktops and laptops are not the only media where data can be stored. Mobile devices like phones and tablets, accessories such as smartwatches, the IoT (everything connected to the internet), cars, doorbells, locks, lights…you name it. Each item presents a new challenge and must be considered when scoping the project.
User-generated data is routinely stored and shared on the Cloud using a variety of platforms. From something as ancient as email servers to “new” rudimentary storage locations, such as OneDrive, Google Drive, Dropbox, and Box.com. Others include collaborative applications, such as SharePoint, Confluence, and the like.
Corporate environments also heavily rely on some sort of common exchange medium like Slack, Microsft Teams, and email servers. These applications also present their own set of challenges. We have to consider, not just what and how to collect, but equally important is how to present the data collected from these new venues.
The amount of data collected for any litigation can be overwhelming. It is imperative to have a scope defined based on the need. Be warned, there are some caveats to setting limitations beforehand, and it will vary based on what the filters are. The most common and widely acceptable limitation is a date range. In most situations, a period is known and it helps to set these parameters ahead of time. In doing so, only the obvious date metadata will be used to filter the contents. For example, in the case of emails, you are limited to either the sent or received date. The attachment's metadata will be ignored completely. Each cloud storage presents its own challenges when it comes to dates.
Data can be pre-filtered with keywords that are relevant to the matter at hand. It can greatly reduce the amount of data collected. However, it is solely dependent on indexing capabilities of the host, which could be non-existent. The graphical contents and other non-indexable items could be excluded unintentionally, even if they are relevant.
The least favored type of filter among the forensics community is a targeted collection, where the user is allowed to guide where data is stored and only those targeted locations are preserved. This may not be cost effective, however, it can restrict the amount of data being collected. This scope should always be expected to be challenged by other parties and may require a redo.
Processing and Filtering
Once the data collected goes through the processing engine the contents get fully exposed. This allows the most thorough, consistent, and repetitive filtering of data. In this stage, filtering relies on the application vetted by the vendor and accompanied by a process that is tested, proven, and updated (when needed).
The most common filtering in ediscovery matters is de-NIST-ing, which excludes the known “system” files from the population. Alternatively, an inclusion filter can be applied, which only pushes forward contents that typically a user would have created, such as office documents, emails, graphic files, etc. In most cases, both de-NIST-ing and inclusion filters are applied.
Once the data is sent through the meat grinder (the core processing engine) further culling can be done. At this stage, the content is fully indexed and extensive searches and filters will help limit the data population even further to a more manageable quantity. The processing engine will mark potentially corrupt items, which are likely irrelevant. It will also identify and remove any duplicate items from all collected media from the entire matter data population. Experts can then apply relevant keyword searches on the final product and select the population that will be reviewed and potentially produced.
I hope this article has shed some light on how to best prepare your case when it comes to preservation and collections as well as processing and filtering.