Data Mining

Machine learning innovator, Simon Chan, tempers the white noise

Artificial Intelligence and machine learning have suddenly become some of the biggest industry buzz terms around. If a solution claims to utilise these techniques, it suddenly becomes red hot, blazing smoke and mirrors in its impressive wake.

For this reason I was interested to hear Simon Chan – who started machine learning platform PredictionIO – add a spot of pragmatism to the story. PredictionIO itself has proved something of a success. Very popular with developers, it was bought by Salesforce in February this year, then donated to Apache over the summer.

Chan, speaking at ApacheCon Europe in Seville, tells an audience of about 12 how he started as a software developer when he was 14, founded a few companies of varying success rates – many of which used AI – then went on to conduct machine learning research.

He described the talk as “a summary of what I learned over the last decade.” This, in turn, is my very brief précis of his hour-long talk.

Chan begins by reiterating that there is a lot of “cool stuff” going on in the field of machine learning but he warns “we need to move beyond this”.

“At the end of the day it is about customers. They need to see the value.”

It is certainly true that machine learning projects are often quite big picture and conceptual in nature, while Chan explains that for developers they can be made to sound as easy to create as one, two, three: read textbook, create model, train data.

Most machine learning teaching works from a static data set, he adds. Yet in reality you need an application that is learning from new user data continuously and automatically. “This is scary,” he says, because it is live, in the field and represents a change in mindset from traditional development. Some of the skills necessary blend more naturally with data engineering.

Firstly, machine learning requires real data – not a nicely cleaned CSV file – but your bog-standard business data, which is often inaccurate and badly labelled. Secondly, says Chan, it needs to process an uncertain amount of data, much of which is contextual, and needs to be captured in real time. Finally, it needs to learn continuously – constantly retraining the model – and because it is learning automatically on the job it removes the standard QA testing phase.
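To make those three requirements a little more concrete, here is a minimal sketch in plain Python of a model that learns from one event at a time and quietly drops records it cannot parse. It is an illustration only, not something Chan showed and not how PredictionIO works: the field names (`visits`, `basket_size`, `bought`), the `OnlineLogisticModel` class and the learning rate are all assumptions invented for the example.

```python
import math


class OnlineLogisticModel:
    """Tiny logistic regression updated one event at a time (hypothetical example)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.events_seen = 0

    def predict_proba(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, features, label):
        # One stochastic gradient step on the log loss for a single event.
        error = self.predict_proba(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
        self.events_seen += 1


def clean_event(raw):
    """Return (features, label), or None if the record is unusable.

    The field names are assumptions standing in for messy business data.
    """
    try:
        features = [float(raw["visits"]), float(raw["basket_size"])]
        label = int(raw["bought"])
    except (KeyError, TypeError, ValueError):
        return None
    if label not in (0, 1):
        return None
    return features, label


def consume(model, stream):
    """Feed events to the model as they arrive; return how many were skipped."""
    skipped = 0
    for raw in stream:
        parsed = clean_event(raw)
        if parsed is None:
            skipped += 1
            continue
        model.learn_one(*parsed)
    return skipped


# A stand-in for live traffic: two usable events and three broken ones.
events = [
    {"visits": "1", "basket_size": "0", "bought": "0"},
    {"visits": "5", "basket_size": "3", "bought": "1"},
    {"visits": "n/a", "basket_size": "2", "bought": "1"},
    {"visits": "4", "basket_size": None, "bought": "1"},
    {"basket_size": "1", "bought": "0"},
]
model = OnlineLogisticModel(n_features=2)
skipped = consume(model, events)
print(model.events_seen, "events learned,", skipped, "skipped")
```

Because the model changes with every event, there is no fixed build to hand to QA, which is the shift Chan is pointing at. A real system would typically monitor prediction quality on live traffic instead.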

Chan adds that, technicalities aside, “there is also a product side of the chasm”. This revolves around creating useful applications that provide genuine business value.

“The AI chasm is a change of mindset from science-first to customer-first,” he says.

“Customers are always looking for the latest technology.” They might say something as nonsensical as “I want to buy some machine learning,” he explains. Yet all this needs to be translated by the product and development team into something sensible.

In the end, machine learning is no exception to the standard IT rule. Applications may sound exciting and cutting edge but they still need to have a proper business purpose.

