In-depth merger control reviews are intense and time-pressured matters. As a result, merging companies and their advisors are challenged to meet regulators’ ever-growing demands for internal documents.
When a regulator refers a matter to Phase 2, requests for information (RFIs) require parties to identify and disclose internal documents relating to the proposed transaction. This allows the regulator to gauge the parties’ understanding of the relevant markets and the true rationale for the transaction. These requests can involve huge numbers of documents – for example, the European Commission (EC) publicised that its decision to clear Bayer’s acquisition of Monsanto in 2018 involved a review of more than 2.7 million internal documents.
In today's increasingly global business landscape, disclosure obligations are made even more complicated by the reality that merging companies’ internal communications will likely involve multiple languages. However, navigating the multilingual obstacle need not be a painful exercise if merging parties and their lawyers know what to look out for and plan accordingly.
During TransPerfect Legal Solutions’ (TLS) 2023 event, The Future of EU & UK Competition Regulation, a panel of legal technologists discussed the challenges, best practices, and trends of multilingual disclosures in the world of antitrust. The expert panel included Adam Smith, Head of EU Antitrust eDiscovery at Freshfields Bruckhaus Deringer; Robert Wagner, Global Director of Multilingual Discovery at TLS; and Preeti Sharma, Director of Consulting at TLS.
This blog post summarises the highlights of that discussion to help you prepare for your next multilingual document review.
Why Language Nuances Matter:
Every language possesses unique idiosyncrasies. When multilingual data comes into play during merger reviews, it’s crucial to tailor the process to address these nuances, rather than treat each dataset as one would English-only data. Neglecting language nuances can lead to:
- Time and Cost Overruns: Underestimation of the time and cost needed to create translations for production;
- Quality Concerns: Machine translations failing to meet the regulator's standards; and
- Search Term Issues: (Mis)translated search terms simultaneously over- and under-including documents.
Translation Requirements Can Have Severe Time and Cost Implications:
Time is often the scarcest element in any merger review. RFIs typically give parties only two to three weeks to collect, process, promote, review, and produce relevant documents. If these deadlines aren’t met, there is a risk the regulator will exercise its power to ‘stop the clock,’ suspending the statutory review timeframe until the RFI obligation has been met.
Multilingual data complicates these already stretched timelines with added steps, such as the need to construct equivalent search terms in other languages or staff multilingual review teams. Further, regulators may require parties to translate documents into English or another language officially accepted by the regulator. For example, while the EC has the means to receive and review documents in common EU Member State languages, the UK’s CMA and the USA’s DOJ or FTC will typically require high-quality English translations.
In instances when translations are required to facilitate disclosure, parties need to plan for the timing and cost implications because the component steps of translation, translation engine customisation, and human translation and review cannot always be compressed into a shorter timeframe. In particular, depending on the scale of the disclosure, human translation might be entirely unfeasible or disproportionate due to the cost and time required.
Even when a regulator permits the use of machine translations, parties often underestimate the time it takes to translate large numbers of documents for submission. This is in part due to the experience users have with online tools like Google Translate, which can translate a single document almost instantaneously. Scaling this to one million formatted documents, potentially containing billions of words, could take over a month, or two months if the data involves a high count of Excel sheets or PowerPoint presentations.
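The scaling problem is simple arithmetic, and it can help to run the numbers before agreeing a timetable. The sketch below is a back-of-the-envelope estimator rather than a benchmark: the function name and the throughput figure are hypothetical assumptions chosen purely for illustration, and real rates depend on the translation engine, the file formats, and the infrastructure.

```python
def estimate_translation_days(total_words: int, words_per_hour: int) -> float:
    """Return elapsed days to machine-translate `total_words` at a steady rate.

    Assumes uninterrupted, round-the-clock processing (24 hours per day).
    """
    if words_per_hour <= 0:
        raise ValueError("words_per_hour must be positive")
    hours = total_words / words_per_hour
    return hours / 24


# Hypothetical inputs: one billion words at an assumed one million words per hour.
days = estimate_translation_days(1_000_000_000, 1_000_000)
print(f"Estimated duration: {days:.1f} days")
```

Rerunning the estimate with a lower assumed rate for spreadsheet- or presentation-heavy data shows how quickly the timeline can stretch.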
Generic Machine Translations Struggle to Generate Quality Translations for Industry-Specific Datasets:
Beyond the time and cost considerations, the question of translation quality is paramount. Even when a regulator accepts the use of pure machine translation workflows over human translators, the output quality must meet the regulator’s standards. This generally requires that the translations:
- are error-free so they can be easily understood; and
- use industry- and party-specific terminology.
Unfortunately, off-the-shelf machine translation services made available as Relativity plug-ins or via third-party technology platforms do not always meet these requirements, which can lead to the regulator rejecting the translated submissions. The inherent problem is that these machine translation engines are trained to be generalists and are good at translating clear, everyday communications. As a result, they often struggle with industry-specific and contextually nuanced terms.
For example, in a review concerning the finance industry, a typical machine translation engine might struggle with jargon like ‘bips’ (basis points) or acronyms such as ‘LIBOR’ (the London Interbank Offered Rate).
In these circumstances, parties should consider language consultants who can offer machine translation engines trained on industry-specific data and glossaries, and which can be further customised on the parties’ own data to increase the fluency of the output. These customisable machine translation engines can address the inherent limitations of generic, one-size-fits-all translation engines and help build regulator confidence in the translation quality.
A critical benefit of appointing a dedicated language service provider early in the multilingual disclosure process is that it allows for early regulator engagement to agree on the required quality for machine-translated submissions. The sooner issues with early samples are identified, the earlier machine translation engines can be customised to meet the regulator’s criteria, which can avoid delays down the line.
Poor Search Term Translations Can Lead to Over- and Under-Inclusive Search Results:
The panel also touched upon search term translation, which is among the most difficult types of translation and can cause significant problems when done poorly. Common pitfalls include:
Missing Context: To generate search terms, legal advisors will draft each term with the very specific company, industry, and legal contexts in mind. They may also go through multiple rounds of negotiation with the regulator to finalise these terms. To take an English example, lawyers might choose the term ‘motorway’ over synonyms ‘highway’ or ‘interstate’ because they know the relevant custodians are based in the UK, rather than the USA. However, the translators only see a context-free list of words and will typically have little knowledge about the underlying case, increasing the risk of poor search term construction.
Search Operators: Search operators are an essential focus of any keyword search exercise, and linguistic nuances in each language affect how these must be applied. For example:
- Wildcards – The best use of the wildcard function (‘*’) in a search term depends on the intended result and the language in question. To take an English example, buy* will find instances of ‘buys’ or ‘buying’ but won’t capture the past tense ‘bought’.
- Proximity Operators – Language expands and contracts when translated, so a proximity operator (e.g., W/5) set to find English documents where two words are within five words of each other might need to be increased to W/6 in Italian, which is a more verbose language.
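Both operator effects can be reproduced in a few lines. The sketch below is illustrative only: it emulates a wildcard with a Python regular expression rather than any review platform’s own search syntax, and the helper name and the 20% expansion figure are hypothetical assumptions.

```python
import re

# Wildcard: `buy*` behaves like a prefix match on the stem "buy".
buy_wildcard = re.compile(r"\bbuy\w*", re.IGNORECASE)
text = "She buys shares, he is buying more, and they bought last year."
matches = buy_wildcard.findall(text)
print(matches)  # ['buys', 'buying']; the irregular past tense 'bought' is missed


def scale_proximity(distance: int, expansion_percent: int) -> int:
    """Scale a W/n distance by a language expansion percentage, rounding up."""
    return (distance * expansion_percent + 99) // 100


# Hypothetical assumption: the Italian text runs about 20% longer than the English.
print(scale_proximity(5, 120))  # 6
```

Capturing ‘bought’ would need a separate term, which is the kind of gap a linguist reviewing the translated term list is well placed to spot.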
Diacritics and Other Morphological Language Considerations: For example, in Relativity, ‘Sebastien’ and ‘Sébastien’ are treated as the same term by the indexes, because the diacritics (e versus ë, è, é, ê) are flattened. Searching for either yields both in the results. However, in many other hosting platforms and most processing platforms, they are indexed as wholly different words. So, from a search perspective, searching for one will return only that exact form, while the other remains unidentified. There are dozens of similar, important nuances across the globe’s languages.
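The difference between the two indexing behaviours can be sketched as follows. This is a minimal illustration using Python’s standard `unicodedata` module, not a description of how Relativity or any other platform implements its indexes; the `fold_diacritics` helper is a hypothetical name.

```python
import unicodedata


def fold_diacritics(term: str) -> str:
    """Strip combining marks so that 'Sébastien' becomes 'Sebastien'."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


documents = ["Sebastien", "Sébastien"]

# Exact matching: the two spellings are treated as different words.
exact_hits = [d for d in documents if d == "Sebastien"]

# Folded matching: both sides are normalised before comparison.
folded_hits = [d for d in documents if fold_diacritics(d) == fold_diacritics("Sébastien")]

print(exact_hits)   # ['Sebastien']
print(folded_hits)  # ['Sebastien', 'Sébastien']
```

Where a platform does not fold diacritics, one option is to add each accented and unaccented variant as its own search term.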
These pitfalls leave a wide margin for linguistic deviation. Poorly translated search terms create twin risks: over-including irrelevant documents, which unnecessarily adds to review costs, and missing relevant documents, which risks misleading the regulator.
Parties Should Address Multilingual Data Risks as Early as Possible:
Handling multilingual data in merger control proceedings involves numerous process deviations and additional workflows, both of which necessitate diligent planning to ensure timely completion.
Unfortunately, translation is often the last consideration in a disclosure process – and then it’s too late. For timelines not to slip, eDiscovery and language consultants should be involved at the outset, so that the risks concerning the identification of relevant languages, search term translation, staffing multilingual review teams, and customising machine translation engines can be scoped and addressed at an early stage.