Automation tools promise to accelerate machine learning

Automated machine learning promises to help companies make the most of their resources.

Dreaming up obscure insults might be a good way to pass the time in a bar, but it’s a strange day job. Nonetheless, it’s a serious business if you are trying to train a machine to spot unacceptable online behaviour. Data scientists not only need to provide training data; they also need to mark which language within that data is likely to offend. This process, known as annotation, is just one of the laborious tasks facing data scientists that IT firms are promising to make easier with automation. Amazon, Microsoft, Google and IBM offer a raft of technologies to automate machine learning processes (see box), while smaller firms provide more specialised tools.


Automating annotation

Explosion AI provides Prodigy, software that automates parts of the annotation process. It can extrapolate a corpus of relevant terms from a few seed words, and it helps data scientists quickly confirm or reject the suggested language using a Tinder-like graphical interface.
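To make the idea concrete, here is a minimal sketch of seed-word expansion followed by quick accept-or-reject review. It is not Prodigy's code or API: the word vectors are invented toy values, and the names `TOY_VECTORS`, `suggest_terms` and `review` are hypothetical, chosen only for illustration. It assumes candidate terms are ranked by cosine similarity to the average of the seed vectors, which is one common way to do this and not necessarily the method Prodigy uses.

```python
import math

# Hypothetical toy word vectors, invented for this sketch.
# A real system would load embeddings trained on a large text corpus.
TOY_VECTORS = {
    "idiot": (0.9, 0.1, 0.0),
    "moron": (0.8, 0.2, 0.1),
    "fool": (0.7, 0.3, 0.0),
    "clown": (0.6, 0.3, 0.2),
    "invoice": (0.0, 0.1, 0.9),
    "refund": (0.1, 0.0, 0.8),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_terms(seeds, vectors, top_n=3):
    """Rank non-seed terms by similarity to the average seed vector."""
    known = [vectors[s] for s in seeds if s in vectors]
    if not known:
        return []
    centroid = tuple(sum(dim) / len(known) for dim in zip(*known))
    scored = [
        (term, cosine(vec, centroid))
        for term, vec in vectors.items()
        if term not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:top_n]]


def review(candidates, decide):
    """Collect one binary accept/reject decision per candidate term."""
    accepted, rejected = [], []
    for term in candidates:
        (accepted if decide(term) else rejected).append(term)
    return accepted, rejected


# Two seed words produce a ranked list of candidates for a human to check.
suggestions = suggest_terms(["idiot", "moron"], TOY_VECTORS, top_n=3)

# Stand-in for the annotator's swipe: here "refund" is rejected as off-topic.
accepted, rejected = review(suggestions, decide=lambda term: term != "refund")
```

In this sketch the `decide` callback stands in for the human annotator. Reducing each judgement to a single yes or no is what keeps the review step fast.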

Co-founder Ines Montani has demonstrated Prodigy's efficiency in annotating insulting language to help moderate online behaviour, for example on social media or in e-commerce feedback comments. The tools have also been used to build text analysis applications in financial services, she says.

“The bottleneck is training data. Companies are amassing data, hoping they can do something with it. While machine learning might provide some good applications, you still have to document and label the data to use it for training machine learning models,” Montani says.

For the most part, annotation is farmed out to people working remotely on a piecemeal basis, for example via the Amazon Mechanical Turk marketplace. The problem is that it can take weeks to get the data back, and only then can data scientists spot problems with their initial assumptions. By partly automating annotation, Prodigy promises to slash the cycle time for producing machine learning models, Montani says.
