Data Mining

Exclusive: Inside the technology behind the Panama Papers scandal

The ‘Panama Papers’ scandal, in which leaked documents from Panamanian law firm Mossack Fonseca exposed the use of tax havens by world leaders and others, is making headlines everywhere today.

The electronic discovery software that helped make those documents so valuable and understandable was provided by Australia-headquartered company Nuix. I spoke this morning to Carl Barron, Nuix’s senior solutions consultant, to get an inside track on how the story unfolded.

Barron told me Nuix has had a strong relationship with the International Consortium of Investigative Journalists (ICIJ) for over five years, including a tie-up relating to the Australian Firepower corporate fraud probe. The ICIJ recommended Nuix to German newspaper Süddeutsche Zeitung (SZ), which it worked alongside on the Panama Papers investigation.

“SZ consulted us and we worked with them on hardware and workflow processes,” Barron said. “We were involved with both sides [SZ and the ICIJ]. The actual investigation when we started getting involved was around September last year but I don’t believe the data came in one large batch. It would only take us about 1.5 days to index the 11.5 million files of the 2.6 terabytes [collection].”
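To put Barron's figures in perspective, the arithmetic below works out the average indexing rate they imply. It is only a back-of-the-envelope sketch: it assumes decimal terabytes (1 TB = 10^12 bytes) and a steady rate over the full 1.5 days, and the function name `indexing_throughput` is made up for illustration.

```python
# Figures quoted by Barron: 11.5 million files, 2.6 TB, about 1.5 days.
FILES = 11_500_000
TOTAL_BYTES = 2.6e12            # assumption: decimal terabytes
SECONDS = 1.5 * 24 * 60 * 60    # 1.5 days expressed in seconds


def indexing_throughput(files, total_bytes, seconds):
    """Return average rates implied by a file count, a volume and a duration."""
    return {
        "files_per_second": files / seconds,
        "megabytes_per_second": total_bytes / seconds / 1e6,
        "average_file_kilobytes": total_bytes / files / 1e3,
    }


rates = indexing_throughput(FILES, TOTAL_BYTES, SECONDS)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

Under those assumptions the quoted numbers come to roughly 89 files and 20 MB per second, with an average file size of about 226 KB.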

The data held by the ICIJ and Süddeutsche Zeitung contained some optical character recognition (OCR) challenges.

“This was electronic data but [some of it] was ex-paper based. There were large amounts of emails. It’s broken down into 11.5 million different files and nearly five million emails. There were PDFs and images that needed to be OCRed for bringing text into electronic formats.”

Nuix was used to index documents, search them and identify relationships across files.

“Nuix is a very powerful indexing engine that will extract text from these files and the metadata, and you can then run very simple or very complex queries and it can also be used to see relationships such as names in documents or the sender of other emails.”
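The general idea Barron describes (extract text, index it, query it, then link documents by names and senders) can be illustrated with a toy inverted index. This is a minimal sketch in plain Python and not Nuix's implementation or API; the sample records, addresses and function names are all hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical sample records; not taken from the leaked data.
DOCS = [
    {"id": 1, "sender": "alice@example.com",
     "text": "Invoice for Acme Holdings sent to Bob Smith"},
    {"id": 2, "sender": "carol@example.com",
     "text": "Bob Smith asked about Acme Holdings shares"},
    {"id": 3, "sender": "alice@example.com",
     "text": "Meeting notes for Zenith Trust"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for term in tokenize(doc["text"]):
            index[term].add(doc["id"])
    return index


def search_all(index, query):
    """Return ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


def co_mentions(docs, names):
    """Count, for each pair of names, the documents that mention both."""
    counts = defaultdict(int)
    for doc in docs:
        text = doc["text"].lower()
        present = sorted(n for n in names if n.lower() in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return dict(counts)


def by_sender(docs):
    """Group document ids by the sender metadata field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["sender"]].append(doc["id"])
    return dict(groups)


index = build_index(DOCS)
hits = search_all(index, "acme holdings")
links = co_mentions(DOCS, ["Acme Holdings", "Bob Smith", "Zenith Trust"])
senders = by_sender(DOCS)
```

Here `hits` holds the documents matching a simple query, `links` shows which names appear together, and `senders` groups messages by a metadata field. A production engine would add OCR, file-format parsing and far more scalable storage.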

The project started on a small desktop machine, and SZ later bought a Windows server to process the data. The data volume was “fairly routine”, Barron said. “It’s not a vast amount of data but I understand there was a fair amount of OCR so there’s always the challenge of [reading data] so it’s not ‘garbage in, garbage out’.”

After some initial deployment consultancy on hardware and workflow, SZ and ICIJ staff were able to analyse the trove behind a firewall to keep their work private.

Barron says there is scope for further revelations as journalists and researchers will be able to add more search criteria and build out relationships across names and data.

Nuix CEO Eddie Sheehy has blogged about the project.


Martin Veitch

Martin Veitch is Contributing Editor for IDG Connect
