Automation tools promise to accelerate machine learning

Dreaming up obscure insults might be a good way to pass the time in a bar, but it’s a strange day job. Nonetheless, it’s a serious business if you are trying to train a machine to spot unacceptable online behaviour. Data scientists not only need to provide training data; they also need to label which language within that data is likely to offend. This process, known as annotation, is just one of the laborious tasks that IT firms are promising to ease with automation. Amazon, Microsoft, Google and IBM offer a raft of technologies to automate machine learning processes (see box), but smaller firms are also providing more niche tools.


Automating annotation

Explosion AI provides Prodigy, software that automates some parts of annotation. It can extrapolate a list of relevant terms from a few seed words, and it helps data scientists quickly confirm or reject the suggested language using a Tinder-like graphical interface.
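The article doesn't say how Prodigy expands its seed words, so the following is only a minimal sketch of the general idea, assuming word vectors are available: average the seed words' vectors, rank the rest of the vocabulary by cosine similarity to that average, and pass the top candidates to a human who accepts or rejects each one. The toy `VECTORS` table, the `expand_terms` and `review` functions and the `min_sim` threshold are hypothetical illustrations, not Prodigy's API.

```python
import math

# Hypothetical toy 3-dimensional word vectors; a real system would load
# pretrained embeddings with hundreds of dimensions.
VECTORS = {
    "idiot":   [0.9, 0.1, 0.0],
    "moron":   [0.85, 0.15, 0.05],
    "fool":    [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.9, 0.3],
    "refund":  [0.05, 0.85, 0.4],
    "lovely":  [0.1, 0.1, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def expand_terms(seeds, vectors, top_n=3, min_sim=0.5):
    """Suggest new terms ranked by similarity to the mean of the seed vectors."""
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return []
    dims = len(seed_vecs[0])
    centroid = [sum(v[i] for v in seed_vecs) / len(seed_vecs) for i in range(dims)]
    scored = [
        (word, cosine(centroid, vec))
        for word, vec in vectors.items()
        if word not in seeds
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Keep the top_n candidates, dropping any below the similarity threshold.
    return [(w, s) for w, s in scored[:top_n] if s >= min_sim]

def review(candidates, decide):
    """Simulate the accept/reject step: keep only terms the annotator accepts."""
    return [word for word, _ in candidates if decide(word)]

# Usage: grow a list of insults from one seed word, then let a
# (simulated) annotator reject one of the suggestions.
candidates = expand_terms(["idiot"], VECTORS)
accepted = review(candidates, lambda word: word != "fool")
print(candidates)
print(accepted)
```

Each accepted term could become a new seed for the next round, which is where a fast accept/reject interface saves time compared with writing the list by hand.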

Co-founder Ines Montani has demonstrated Prodigy’s efficiency in annotating insulting language to help moderate online behaviour, for example in social media posts or ecommerce feedback comments, but the tools have also been used to build applications analyzing text in financial services, she says.

“The bottleneck is training data. Companies are amassing data, hoping they can do something with it. While machine learning might provide some good applications, you still have to document and label the data to use it for training machine learning models,” Montani says.

For the most part, annotation is farmed out to people working remotely on a piecemeal basis, for example via the Amazon Mechanical Turk marketplace. The problem is that it can take weeks to get the data back, and only then can data scientists spot problems with their initial assumptions. By partly automating annotation, Prodigy promises to slash the cycle time for producing machine learning models, Montani says.

“The big problem with machine learning is most things don’t work. Data scientists need to be able to experiment and iterate quickly to find the best models,” Montani says.

Explosion AI’s tools are aimed at developers and data scientists building applications. For example, it provides spaCy, a free, open-source library for advanced natural language processing.

Demand for such tools is set to increase. According to research from KPMG, seven in 10 CIOs say machine learning and AI are part of their investment plans, and one-quarter of organizations are currently making at least a moderate investment. Perhaps as a result, larger IT firms are starting to focus on higher-level pre-trained services to help accelerate the development of applications based on machine learning.


AI services on tap for business users

Alex Monty, Microsoft Azure application innovation lead, says the tools Microsoft offers are designed to make data scientists more productive but also to help involve business users in the process. “We publish 35 cognitive services, which support speech recognition, computer vision and sentiment analysis, for example. You might find business users take those models and build on their pre-trained capability.”

IBM offers Watson Studio, which provides a set of AI services created for specific activities such as image recognition, speech recognition and natural language comprehension. All services are trained around a general domain but can be tailored to suit a specific application.

IBM Watson chief technology officer Rob High says: “We have designed the tooling and methodology to make it accessible and useful for clients without experts in artificial intelligence and data science.”


Tools to focus on feature engineering

While Amazon, Google, Microsoft and IBM have developed their automation tools from a technology background, H2O has grown from the grassroots of data science itself, says CEO and founder Sri Ambati. Its Driverless AI tools focus on feature engineering: the process by which data scientists define which features of the data will be important in developing models.

“Making features domain-specific is very labour-intensive. What we’ve done is automate that aspect of doing data science. The machine can build and evolve feature sets, build a model, score new events and see how accurate the model is, and then promote the successful new features that were involved. The process helps make a very good feature set in less time than doing it manually,” Ambati says.

As well as helping data scientists become more productive, the tools help those less experienced in data science operate at an expert level, Ambati says.
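Driverless AI's internals aren't described in the article, so here is only a toy sketch of one round of the generate, score and promote loop Ambati outlines: generate candidate features from raw columns (pairwise products and ratios, an assumed choice), fit a one-feature linear model to each on training rows, score it on held-out rows, and promote the candidates that beat a predict-the-mean baseline. The function names and the synthetic data are hypothetical; a real system would evolve features over many rounds and use far richer models.

```python
import itertools
import random

def generate_candidates(rows, names):
    """Create candidate features: raw columns plus pairwise products and ratios."""
    feats = {n: [r[i] for r in rows] for i, n in enumerate(names)}
    for (i, a), (j, b) in itertools.combinations(enumerate(names), 2):
        feats[f"{a}*{b}"] = [r[i] * r[j] for r in rows]
        # Guard against division by zero with a fallback value of 0.0.
        feats[f"{a}/{b}"] = [r[i] / r[j] if r[j] != 0 else 0.0 for r in rows]
    return feats

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return slope, my - slope * mx

def mse(preds, ys):
    """Mean squared error between predictions and true values."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def promote_features(rows, names, target, split=0.7, keep=3):
    """Score each candidate with a one-feature model on held-out rows and
    promote up to `keep` candidates that beat a predict-the-mean baseline."""
    cut = int(len(rows) * split)
    feats = generate_candidates(rows, names)
    y_train, y_test = target[:cut], target[cut:]
    baseline = mse([sum(y_train) / len(y_train)] * len(y_test), y_test)
    scores = {}
    for name, col in feats.items():
        slope, intercept = fit_line(col[:cut], y_train)
        preds = [slope * x + intercept for x in col[cut:]]
        scores[name] = mse(preds, y_test)
    ranked = sorted(scores, key=scores.get)
    return [name for name in ranked if scores[name] < baseline][:keep]

# Usage: synthetic data whose target depends on the product of x1 and x2,
# so the "x1*x2" candidate should rank first.
rng = random.Random(0)
rows = [[rng.uniform(1, 10) for _ in range(3)] for _ in range(50)]
target = [r[0] * r[1] for r in rows]
print(promote_features(rows, ["x1", "x2", "x3"], target))
```

Scoring on held-out rows rather than training rows is the design choice that matters here: it rewards features that generalise to new events, which is the accuracy check Ambati describes.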

H2O is also working on natural language processing tools for machine learning, which it plans to include in Driverless AI in October 2018.


Meta-learning tackles tough machine learning problems

Automation is not only being applied to make data scientists more productive. DataRPM has developed techniques to help machine learning address problems that were previously very difficult to solve. The company specializes in building machine learning models for predictive maintenance of large, expensive assets, such as oil rigs and gas turbines. Because each asset faces different environmental conditions and has a different maintenance history, each requires its own predictive model. Given that some companies maintain thousands of assets, building all those models is impractical with traditional data science techniques.

To overcome the problem, DataRPM uses meta-learning: using machine learning to help select which predictive models will be most effective for each specific asset. The company says the method increases prediction quality and accuracy by more than 300 percent, in one-thirtieth of the time previously needed.
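DataRPM's actual method isn't described here. As a minimal sketch of the general meta-learning idea (often called algorithm selection), each asset can be summarised by a few meta-features of its sensor data, and a new asset is assigned the model family that performed best on the most similar previously seen asset. The meta-features, model names and past results below are hypothetical, and the unscaled Euclidean distance is a deliberate simplification.

```python
import math
import statistics

def meta_features(series):
    """Summarise a sensor series (at least two readings) with simple descriptors:
    mean level, spread, and average change per step as a crude trend."""
    n = len(series)
    mean = statistics.fmean(series)
    stdev = statistics.pstdev(series)
    trend = (series[-1] - series[0]) / (n - 1)
    return (mean, stdev, trend)

def select_model(new_series, history):
    """Pick the model that worked best on the most similar past asset.

    history: list of (meta_feature_tuple, best_model_name) pairs from earlier
    assets. Similarity is unscaled Euclidean distance, a simplification.
    """
    target = meta_features(new_series)
    _, best = min(history, key=lambda item: math.dist(item[0], target))
    return best

# Hypothetical experience from previously modelled assets.
past_assets = [
    ((50.0, 1.0, 0.0), "seasonal_naive"),      # stable turbine
    ((70.0, 8.0, 0.9), "linear_trend"),        # steadily degrading pump
    ((40.0, 15.0, 0.1), "gradient_boosting"),  # noisy, irregular rig sensor
]

# Usage: a new asset whose readings drift upwards steadily.
drifting = [60 + 0.8 * t for t in range(20)]
print(select_model(drifting, past_assets))
```

Because selection only needs the meta-features and a lookup, it scales to thousands of assets without a data scientist hand-picking a model for each one.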

But Erick Brethenoux, Gartner research director, adds a note of caution to the idea that automation, in its various forms, can produce a radical leap in productivity in machine learning. Automation may help with understanding and preparing data, and with data modelling. But the first stage of the cross-industry standard process for data mining (CRISP-DM), the foundation of machine learning, is business understanding.

“Business understanding is not automated. Understanding what data you have, what data you can use and what data you might need are different questions. Modern data scientists are creative, and you cannot automate all those decisions. But there are places where automation can help,” he says.

While automation will help accelerate the application of machine learning, it will not alleviate the shortage of data scientists. As machine learning spreads to more use cases, demand for data scientists will continue to rise, Brethenoux says.

IT giants in push to automate machine learning

Google builds on AI history with Cloud AutoML

Google launched TensorFlow, an open-source software library for building machine learning applications, in 2015, although it had offered proprietary tools since 2011. Cloud AutoML is a suite of products that Google says enables developers with limited machine learning expertise to train models using its transfer learning and neural architecture search technology. Cloud AutoML’s graphical user interface helps developers train, evaluate, improve and deploy models based on their own data, the company says.

Microsoft offers Azure Workbench

Azure Workbench is an online studio that automates some of the processes needed to develop machine learning models. It supports Jupyter Notebook, a web-based interactive environment that combines code, text and mathematical expressions, supports dozens of programming languages and is commonly used in open-source machine learning projects.

Scripts and algorithms can be defined in the studio and then made available to other users, who can drag templates into the model using workbench wizards, building models without coding, says Alex Monty, Microsoft Azure application innovation lead.

AI services from IBM Watson

IBM offers Watson Studio, an integrated environment designed to make it easy to develop, train and manage models and to deploy AI-powered applications. It also provides a set of AI services created for specific activities such as image recognition, speech recognition and natural language comprehension. All services are trained around a general domain but can be tailored to suit a specific application.

Amazon offers pre-trained services within AWS

Within AWS, Amazon provides developers with the ability to add intelligence to applications through an API call to pre-trained services including computer vision, speech, language analysis, and chatbot functionality. Amazon SageMaker includes hosted Jupyter notebooks that make it easy to explore and visualize training data in Amazon S3 storage, the company says.



Lindsay Clark

Lindsay Clark is a freelance journalist specialising in business IT, supply chain management, procurement and business transformation. He has worked as news editor at Computer Weekly and several other leading trade magazines. He has also written for The Guardian, The Financial Times and supplements to The Times. 
