[author: Jason Schroeder**]
Dealing with a cyber incident is incredibly stressful for clients and counsel. Beyond the stress of the initial breach, there is pressure to review the compromised data quickly. Having seasoned cyber incident professionals available to handle the project can help alleviate some of this distress. Before an incident occurs, it is important to have the right team in place, one that works consultatively and iteratively across counsel, client, and the service provider. This team approach is integral to achieving the best results in the most cost-efficient and timely manner.
To understand the specific processing considerations for cyber incident matters after a data breach, it helps to start with the requested deliverable and how it will be created. The deliverable is a de-duplicated list of all individuals who might have been affected, their addresses, and any exposed personal information (PI) that must be identified on the deliverable and/or in the notification letters. This PI is extracted from the reviewed items. These considerations differ markedly from those in standard eDiscovery projects, and they drive the processing and workflow approach for cyber review matters.
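To make the deliverable concrete, here is a minimal Python sketch of collapsing reviewed entries into one row per individual. It assumes a hypothetical `ReviewEntry` record and treats a normalized name plus address as the identity key; real matters typically need more careful entity resolution (nicknames, misspellings, changed addresses), so this illustrates the shape of the output rather than a production method.

```python
from dataclasses import dataclass


@dataclass
class ReviewEntry:
    """Hypothetical record: one row per person found in one reviewed item."""
    name: str
    address: str
    pi_types: set[str]  # e.g. {"SSN", "DOB"}
    doc_id: str


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def build_notification_list(entries):
    """Merge review entries into one row per individual.

    Assumes normalized name + address identifies a person. Each row keeps
    every PI type found and the items it came from.
    """
    people = {}
    for e in entries:
        key = (normalize(e.name), normalize(e.address))
        row = people.setdefault(
            key,
            {"name": e.name, "address": e.address, "pi_types": set(), "doc_ids": set()},
        )
        row["pi_types"] |= e.pi_types
        row["doc_ids"].add(e.doc_id)
    return list(people.values())


entries = [
    ReviewEntry("Jane Doe", "1 Main St", {"SSN"}, "DOC-1"),
    ReviewEntry("jane  doe", "1 Main St", {"DOB"}, "DOC-7"),
    ReviewEntry("John Roe", "9 Elm Ave", {"Passport"}, "DOC-3"),
]
notification_list = build_notification_list(entries)
```

Here the two Jane Doe entries merge into one row listing both PI types, which is the behavior a notification list needs: each person appears once, with every exposed data element attached.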
As in any electronic data project, the best way to contain costs is to reduce the review set to the smallest population possible. Global de-duplication removes exact duplicates, identified by matching hash values, across all custodians and sources. Although these records are withheld from the hosting environment, they are not deleted and can be brought in at any time if needed.
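A minimal sketch of hash-based global de-duplication follows. The function names are hypothetical, and SHA-256 over raw file bytes is an assumption for illustration: processing platforms commonly use MD5 or SHA-1, and emails are often hashed on selected metadata fields rather than raw bytes. Duplicates are logged rather than discarded so they can be restored later.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 of the raw bytes (an illustrative choice of hash)."""
    return hashlib.sha256(data).hexdigest()


def global_dedupe(items):
    """Split items into a review set and a log of withheld duplicates.

    items: iterable of (item_id, raw_bytes) across all custodians.
    Returns (review_ids, withheld) where withheld lists
    (duplicate_id, original_id) pairs. Nothing is deleted, so any
    withheld item can be brought back into review later.
    """
    first_seen = {}
    review_ids, withheld = [], []
    for item_id, data in items:
        h = content_hash(data)
        if h in first_seen:
            withheld.append((item_id, first_seen[h]))
        else:
            first_seen[h] = item_id
            review_ids.append(item_id)
    return review_ids, withheld


review_ids, withheld = global_dedupe(
    [("CUST1-001", b"payroll.xlsx bytes"),
     ("CUST1-002", b"memo bytes"),
     ("CUST2-001", b"payroll.xlsx bytes")]
)
```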
Common computer-generated files, such as operating system and application files, do not contain PI, so they can be removed (de-NISTed) to decrease the initial population. De-NISTing compares file hash values against the National Software Reference Library (NSRL), a list of known software files published by the National Institute of Standards and Technology (NIST), as a starting point for exclusions. It is crucial to analyze all remaining filetypes with counsel to identify additional filetypes specific to the data set that can also be removed.
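The sketch below illustrates the two-stage exclusion: a hash match against known system files, then counsel-approved extensions. It assumes the NSRL hashes have already been loaded into a set of upper-case SHA-1 hex strings (loading is not shown), and the `denist` helper and its parameters are hypothetical.

```python
import hashlib
from pathlib import PurePath


def sha1_hex(data: bytes) -> str:
    """Upper-case SHA-1 hex digest, to compare against the loaded hash set."""
    return hashlib.sha1(data).hexdigest().upper()


def denist(items, known_file_hashes, excluded_extensions):
    """Remove known system/application files and counsel-approved file types.

    items: iterable of (file_name, raw_bytes)
    known_file_hashes: set of upper-case SHA-1 hex strings, assumed to be
        loaded from the NSRL beforehand (not shown)
    excluded_extensions: lower-case extensions, e.g. {".exe", ".dll"},
        agreed with counsel for this data set
    Returns (kept_names, removed_names).
    """
    kept, removed = [], []
    for name, data in items:
        ext = PurePath(name).suffix.lower()
        if sha1_hex(data) in known_file_hashes or ext in excluded_extensions:
            removed.append(name)
        else:
            kept.append(name)
    return kept, removed


known = {sha1_hex(b"known system file")}  # stand-in for real NSRL hashes
kept, removed = denist(
    [("kernel.sys", b"known system file"),
     ("benefits.docx", b"employee data"),
     ("setup.EXE", b"installer")],
    known,
    {".exe"},
)
```

Keeping the hash check and the extension list separate mirrors the workflow: the NSRL list is generic, while the extension exclusions are decided per matter with counsel.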
Some files, such as image-only PDFs, do not contain extractable text. Some of these items can be OCR’d to maximize the population of possible search term hits. Multimedia files and pictures may not lend themselves to OCR, but they should still be accounted for with counsel: an MP3 might be a recording of a phone call, and a picture might be of a passport. Because these files will not appear on search term reports, it is important to agree on their special handling ahead of time.
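Because items without text are invisible to search reports, it can help to route them explicitly. This hypothetical `triage_no_text` helper sorts no-text items by extension into OCR candidates, audio/video, and everything else; the extension groupings are assumptions to be adjusted per matter. Items that still yield no text after OCR (a photographed passport, for example) would go back to counsel for special handling.

```python
from pathlib import PurePath

# Hypothetical extension groupings; adjust for each matter.
OCR_CANDIDATES = {".pdf", ".tif", ".tiff", ".png", ".jpg", ".jpeg"}
MEDIA_TYPES = {".mp3", ".wav", ".m4a", ".mp4", ".mov"}


def triage_no_text(items):
    """Route items that have no extracted text.

    items: iterable of (file_name, extracted_text); text may be None.
    Returns a dict with three lists:
      'ocr'   - image-type files to send for OCR
      'media' - audio/video to discuss with counsel
      'other' - anything else with no text, also for counsel review
    Items with text are skipped because search terms already cover them.
    """
    routes = {"ocr": [], "media": [], "other": []}
    for name, text in items:
        if text and text.strip():
            continue
        ext = PurePath(name).suffix.lower()
        if ext in OCR_CANDIDATES:
            routes["ocr"].append(name)
        elif ext in MEDIA_TYPES:
            routes["media"].append(name)
        else:
            routes["other"].append(name)
    return routes


routes = triage_no_text([
    ("scan.pdf", ""),
    ("call.MP3", None),
    ("notes.txt", "SSN 123-45-6789"),
    ("blob.bin", "   "),
])
```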
Data Mining-Initial Culling
After processing and before search term reports are run, initial culling can further reduce the population. Analysis of email domains with counsel may uncover emails from specific senders that will not contain any PI. For example, an ESPN newsletter can be safely removed from the starting data set. Sampling at this phase may also uncover additional blocks of removable data, such as auto-generated responses from IT departments and no-reply messages. A discussion with counsel about what can be removed is important, as it will vary by matter.
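A sender-domain report is a simple way to start this conversation with counsel. The sketch below, using hypothetical helper names and placeholder domains, counts emails per sender domain and then drops only the domains counsel has approved. Domains are matched exactly rather than by suffix, a deliberately conservative choice, so a subdomain must be approved on its own.

```python
from collections import Counter


def sender_domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def domain_report(emails):
    """Count emails per sender domain, most frequent first, for counsel review."""
    return Counter(sender_domain(e["from"]) for e in emails).most_common()


def cull_by_domain(emails, excluded_domains):
    """Drop emails whose sender domain counsel has approved for exclusion."""
    excluded = {d.lower() for d in excluded_domains}
    return [e for e in emails if sender_domain(e["from"]) not in excluded]


emails = [
    {"id": 1, "from": "digest@news.example.com"},
    {"id": 2, "from": "alice@client.example"},
    {"id": 3, "from": "Alerts@NEWS.example.com"},
    {"id": 4, "from": "no-reply@it.client.example"},
]
report = domain_report(emails)
culled = cull_by_domain(emails, {"News.Example.com"})
```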
Search Term Application
Search terms applied for a lawsuit may be specific to a legal question. Cyber incident terms, however, are crafted to capture the types of items most likely to contain PI. While the team may know a few actual social security numbers (SSNs), the goal is to find all possible SSNs. The best way to accomplish this is to combine “regular expression” (pattern) searches with keyword searches: searching for the known SSNs OR “social security number” OR the pattern ###-##-#### accomplishes this task.
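The combined search can be sketched with Python's `re` module. The pattern and keyword below are illustrative assumptions rather than a complete term list: the pattern matches the ###-##-#### shape (so it will also hit non-SSN numbers with that format, which is acceptable for a recall-oriented search), and known SSNs are matched as literal strings in whatever format the team has them.

```python
import re

# ###-##-#### shape; \b stops matches inside longer digit runs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Keyword search, case-insensitive, tolerant of extra whitespace.
SSN_KEYWORD = re.compile(r"social\s+security\s+number", re.IGNORECASE)


def ssn_hits(text, known_ssns):
    """Return which components of the SSN search hit in this text:
    'known_ssn' (literal match on a known number), 'keyword', 'pattern'."""
    hits = set()
    if any(ssn in text for ssn in known_ssns):
        hits.add("known_ssn")
    if SSN_KEYWORD.search(text):
        hits.add("keyword")
    if SSN_PATTERN.search(text):
        hits.add("pattern")
    return hits
```

An item is a hit for the combined search when the returned set is non-empty. Recording which component fired also feeds the term-level reporting used when refining terms.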
A reviewer must find both a name and related PI in the same document to include that individual in the notification list. A name in an email with the PI in an attachment does not qualify as reportable. Therefore, search term hit reports count individual items (each email or attachment on its own), not entire email families.
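The same-item rule can be illustrated with a short sketch. `reportable_items`, the `known_names` list (for example, a client-supplied roster), and the item fields are hypothetical; the point is that each email and attachment is tested on its own text, so a name in a parent email and an SSN in its attachment yield nothing reportable.

```python
import re

# Example PI pattern for this sketch (SSN-shaped numbers).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def reportable_items(items, known_names, pi_pattern):
    """Return ids of items whose own text contains a known name AND a PI match.

    items: list of dicts with 'id', 'family_id', and 'text'.
    known_names: hypothetical list of names to look for.
    'family_id' is intentionally ignored: an email and its attachments are
    never combined to satisfy the name-plus-PI rule.
    """
    result = []
    for item in items:
        text = item["text"]
        has_name = any(n.lower() in text.lower() for n in known_names)
        if has_name and pi_pattern.search(text):
            result.append(item["id"])
    return result


items = [
    {"id": "E1", "family_id": "F1", "text": "Attached is the form for Jane Doe."},
    {"id": "E1.1", "family_id": "F1", "text": "SSN: 123-45-6789"},
    {"id": "E2", "family_id": "F2", "text": "Jane Doe, SSN 123-45-6789"},
]
reportable = reportable_items(items, ["Jane Doe"], SSN_PATTERN)
```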
Pay special attention to terms with a high unique hit count, that is, the number of documents in which that term is the only term present. Searches apply to single data elements, while (as noted above) a notification entry requires a name plus a data element from the same item. As a result, many documents in the review population will not contain reportable PI. Focusing on high-unique-hit terms is the most effective way to decide which terms to remove or refine to reduce the overall review population.
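Unique hits are straightforward to compute once each item's term hits are known. In this hypothetical sketch, `hits_by_item` maps item IDs to the set of terms that hit. A term's unique count is also the number of items that would leave the review set if that term were dropped, which is why it guides term refinement.

```python
from collections import Counter


def term_hit_report(hits_by_item):
    """Compute per-term document hits and unique hits.

    hits_by_item: dict mapping item_id -> set of terms that hit in that item.
    A unique hit is an item where that term is the only term that hit.
    Returns {term: (hits, unique_hits)}, ordered by unique hits, descending.
    """
    hits, unique = Counter(), Counter()
    for terms in hits_by_item.values():
        hits.update(terms)          # each term counted once per item
        if len(terms) == 1:
            unique.update(terms)    # sole term in this item
    report = {t: (hits[t], unique[t]) for t in hits}
    return dict(sorted(report.items(), key=lambda kv: kv[1][1], reverse=True))


report = term_hit_report({
    "D1": {"ssn_pattern"},
    "D2": {"ssn_pattern", "dob"},
    "D3": {"passport"},
    "D4": {"ssn_pattern"},
    "D5": set(),
})
```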
As there may be many rounds of searching, a consultative, iterative approach is required to achieve the desired results. Iterations may include sampling of term hits, reporting on context, and ensuring that both the end client and their counsel share any matter-specific information to assist in identification of the final review set.
Secondary Culling-Email Threading and Item-Level De-duplication
Once the search terms are finalized, secondary culling procedures can further reduce the population. Only the most inclusive email in a thread needs to be reviewed to capture any potential PI. Attachments can be de-duplicated at the item level using the hash values calculated during processing. This step is markedly different from standard eDiscovery workflows, where email family relationships are kept intact. Because the PI only needs to be identified once, there is no need to review duplicate attachments.
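A simplified sketch of both secondary culling steps follows. The inclusiveness rule here is an assumption for illustration: an email is skipped only when a longer email in the same thread contains its full body text and all of its attachments. Commercial threading engines use more signals (message headers, reply and forward structure), and the pairwise comparison is quadratic per thread, so this shows the idea rather than a scalable implementation.

```python
from collections import defaultdict


def inclusive_emails(emails):
    """Return ids of inclusive emails.

    emails: list of dicts with 'id', 'thread_id', 'body' (normalized text
    including quoted history) and 'attachments' (set of attachment hashes).
    An email is non-inclusive when a longer email in the same thread
    contains its full body and all of its attachments.
    """
    by_thread = defaultdict(list)
    for e in emails:
        by_thread[e["thread_id"]].append(e)
    keep = []
    for thread in by_thread.values():
        for e in thread:
            superseded = any(
                o is not e
                and len(o["body"]) > len(e["body"])
                and e["body"] in o["body"]
                and e["attachments"] <= o["attachments"]
                for o in thread
            )
            if not superseded:
                keep.append(e["id"])
    return keep


def dedupe_attachments(attachments):
    """Keep the first attachment seen for each processing hash value.

    attachments: iterable of (attachment_id, hash_value).
    """
    seen, keep = set(), []
    for att_id, h in attachments:
        if h not in seen:
            seen.add(h)
            keep.append(att_id)
    return keep


emails = [
    {"id": "E1", "thread_id": "T1", "body": "hi", "attachments": set()},
    {"id": "E2", "thread_id": "T1", "body": "re: hi\nhi", "attachments": set()},
    {"id": "E3", "thread_id": "T1", "body": "fw: re: hi\nre: hi\nhi", "attachments": set()},
    {"id": "E4", "thread_id": "T1", "body": "re: hi with file\nhi", "attachments": {"H1"}},
]
to_review = inclusive_emails(emails)
unique_attachments = dedupe_attachments([("A1", "H1"), ("A2", "H2"), ("A3", "H1")])
```

In this example E4 stays in review because it branches off the thread with its own attachment, while E1 and E2 are fully quoted in E3.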
Final Reviewable Set
The final reviewable set is a product of:
- Data mining and culling by cyber incident professionals
- Close consultation with counsel and clients to identify the unique characteristics of each data set
- Cyber incident search terms as the basis of an iterative, collaborative process
This streamlined data set is then transferred to the review management team and the manual review begins.
The common adage about cyber security risk is, “It’s not if you are breached, it’s when.” To avoid the when, companies across the globe are taking steps to harden their networks against unauthorized access. Response preparation is the next step in risk mitigation. A response plan should emphasize close collaboration between counsel, client, and cyber incident professionals, combining automated processing, human experience, and case knowledge to identify the data most likely to contain personal information.
For more information on how to handle cyber incident reviews, consider reading Tips for Handling a Cyber Incident Review.
**Jason Schroeder is a client services manager for Epiq’s Cyber Incident Response team. He brings decades of experience across all phases of eDiscovery to this leadership role.