Adam Pollitt (Europe) - Lost for Words: When East Meets West

In a globalised environment, where legal and regulatory matters involve corporations and subsidiaries spanning multiple jurisdictions, potentially relevant electronically stored information ("ESI") becomes more complex, which has implications for timing, cost, and liability. This is particularly the case when a matter involves ESI from the East that must be disclosed in the West. If potentially relevant ESI becomes "lost in translation" as a result of improper handling, the impact on a company's liability and reputation may be significant.

Because certain software platforms are omnipresent in the West, legal practitioners there often assume that those platforms are equally ubiquitous in the East. When they then encounter software such as Becky!, Hidemaru, and other platforms virtually non-existent in the West, they find that they cannot effectively and efficiently meet their legal burden for these data sources or, worse yet, discover too late that they altered or missed entire swaths of potentially relevant data during disclosure.

Software companies and service providers tend to highlight "Unicode compliance" as a bellwether of their offerings' ability to accommodate the intricacies of ESI generated in the East. The Unicode standard, while it exists specifically to mitigate many of the technical issues inherent in the historical use of multiple encoding formats, is at the end of the day but one character encoding standard among many.

It is therefore important to understand that while "Unicode compliance" signifies an ability to accommodate data that has been encoded using Unicode, it in and of itself signifies no such ability to accommodate ESI that has been encoded using any of the many other formats. While the Unicode standard becomes more prevalent with each passing year, the reality is that when it comes to the East in particular, the Unicode standard for character encoding is not universal.
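To make the distinction concrete, the following minimal Python sketch encodes one short Japanese string under several encodings and checks whether a strict UTF-8 decoder accepts the result. The sample text and the helper name `decodes_as_utf8` are illustrative assumptions and are not drawn from any particular product.

```python
# Illustrative sketch: the same text yields different bytes under different
# encodings, and a strict UTF-8 decoder rejects the legacy byte sequences.

SAMPLE = "日本語"  # assumed sample text ("Japanese language")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Encode the same text three ways: one Unicode encoding, two legacy ones.
encoded = {name: SAMPLE.encode(name) for name in ("utf-8", "shift_jis", "euc_jp")}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}  valid UTF-8: {decodes_as_utf8(data)}")
```

In this sketch all three byte sequences represent the same text, yet a tool that understood only UTF-8 would reject or garble two of them.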

In the East, it is common for users to generate ESI using software applications designed for character encoding formats specific to the language and character set of their country, such as Shift_JIS or EUC-JP in Japan, GBK in China, Big5 in Taiwan, or EUC-KR in Korea. However, they may also access or receive data that cannot be displayed correctly because their machine or software does not support the character encoding format originally used to create the ESI.

The Japanese term for this phenomenon is mojibake, which roughly translates to "unintelligible sequence of characters." Mojibake that occurs in the normal course of business and cannot be corrected may be inconvenient, but it is not likely to be of legal consequence. However, mojibake that occurs as a result of improper handling of evidence and that cannot be quickly remedied may roughly translate in the West to a different word: spoliation.
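The difference between recoverable and unrecoverable mojibake can be sketched in a few lines of Python. This is a simplified illustration under assumed conditions (a short sample string, Shift_JIS as the original encoding, and Latin-1 and ASCII as the wrongly applied decoders), not a description of how any particular review platform behaves.

```python
# Illustrative sketch of mojibake: correct bytes, wrong decoder.

original = "文字化け"                 # assumed sample text (the word "mojibake")
raw = original.encode("shift_jis")   # bytes as a legacy application might store them

# Wrong decoder: every byte is rendered as a Western European character.
garbled = raw.decode("latin-1")

# Latin-1 maps each byte to exactly one character, so this particular mistake
# can be undone as long as the garbled text has not been altered since.
repaired = garbled.encode("latin-1").decode("shift_jis")

# A lossy step is different: undecodable bytes are replaced with U+FFFD and
# the original byte values are no longer present in the result.
lossy = raw.decode("ascii", errors="replace")

print(repr(garbled))
print(repaired == original)
print(repr(lossy))
```

The first mistake only affects how the bytes are displayed. The second discards information, which is the kind of alteration that becomes a problem if the lossy copy is the only one that survives.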

Far Eastern languages are inherently different from their Latin character-based counterparts. A single character may represent an entire word, phrase, or concept, and spacing is not required to indicate where words start and stop. Many ESI products for the legal market in the West rely upon spacing to define indexing for subsequent searching and support neither language detection nor tokenisation, which is the ability to index character-based languages at the level of the individual character irrespective of the presence or absence of spacing.

An index of ESI generated in the East without the use of tokenisation amounts to the Western equivalent of multiple words or phrases indexed as a single word. To make use of such an index, a legal practitioner has no choice but to employ wildcards around each and every character to be searched lest he or she run the significant risk of missing relevant results due to the fact that they were indexed as part of a longer sequence of characters. Searching in such a fashion is inefficient and rarely necessary for data generated in the West.
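The effect on searching can be illustrated with a toy index. The sketch below is a deliberately simplified assumption of how the two approaches differ: the document texts, function names, and the per-character positional index are all hypothetical, and production tools typically use more sophisticated n-gram or dictionary-based segmentation.

```python
from collections import defaultdict

# Hypothetical two-document corpus.
DOCS = {
    "doc1": "東京地方裁判所に契約書を提出した",  # no spaces between words
    "doc2": "The contract was filed in Tokyo",
}


def whitespace_index(docs):
    """Map each space-delimited token to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


def character_index(docs):
    """Map each non-space character to its (document, position) occurrences."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, char in enumerate(text):
            if not char.isspace():
                index[char].add((doc_id, pos))
    return index


def character_search(index, query):
    """Return documents where the query's characters appear consecutively."""
    if not query:
        return set()
    hits = set()
    for doc_id, start in index.get(query[0], set()):
        if all((doc_id, start + i) in index.get(c, set())
               for i, c in enumerate(query)):
            hits.add(doc_id)
    return hits


ws_index = whitespace_index(DOCS)
char_index = character_index(DOCS)

print(ws_index.get("contract", set()))         # English term found as a token
print(ws_index.get("契約書", set()))            # whole sentence was one token
print(character_search(char_index, "契約書"))   # found via character positions
```

With the space-based index, the entire Japanese sentence is stored as a single token, so an exact search for the word meaning "contract" finds nothing unless wildcards are used. The character-level index finds it directly.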

As is the case in the West with the disclosure of any data, complications and liabilities further down the line can be avoided by mapping that ESI effectively and employing the necessary measures for defensible preservation, collection, indexing, searching, and review. Additional challenges, such as encryption and data privacy, must also be addressed. Even so, when legal practitioners in the West face ESI generated in the East, it is essential to gain an early and complete understanding of the software systems in use, the character encoding formats contained within that ESI, how those formats will be correctly handled, and how that data can most efficiently be indexed and searched.
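One way to picture the early assessment described above is a small routine that fingerprints collected bytes and records which encodings could plausibly apply, without ever rewriting the data. This is only a sketch under stated assumptions: the candidate list, the function names, and the sample bytes are hypothetical, and a clean decode shows only that an encoding is possible, not that it is the right one.

```python
import hashlib

# Assumed shortlist of encodings to test; a real matter would tailor this.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp", "iso2022_jp", "gbk", "big5", "euc_kr")


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest so later copies can be compared to the original."""
    return hashlib.sha256(data).hexdigest()


def plausible_encodings(data: bytes, candidates=CANDIDATES):
    """Return the candidates that decode the bytes without error.

    Several legacy encodings may accept the same bytes, so the result is a
    shortlist for human review rather than a definitive answer.
    """
    accepted = []
    for name in candidates:
        try:
            data.decode(name)  # read-only: the input bytes are never modified
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


collected = "契約書".encode("shift_jis")  # stand-in for a collected file's bytes
before = fingerprint(collected)
report = plausible_encodings(collected)
after = fingerprint(collected)

print(report)
print(before == after)
```

Recording the digest before and after analysis gives a simple, checkable statement that the examination itself did not change the evidence.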

Adam Pollitt is Vice President of Client Development at First Advantage Litigation Consulting.
