Although advanced neural networks continue to dramatically improve the capabilities of artificial intelligence systems, they come with substantial energy costs. In an effort to address this problem, a growing number of organisations are developing technologies designed to reduce the energy consumed in training and operating such systems.
Winning ticket
The recent record-breaking predictive performance achieved by deep neural networks (DNNs) has prompted a growing demand to bring DNN-powered intelligence into numerous applications and devices. However, training a state-of-the-art DNN model consumes considerable energy and carries a number of additional financial and environmental costs. For example, a recent report shows that training a single DNN can cost over $10,000 and emit as much carbon as five cars over the course of their lifetimes – limiting the rapid development of DNN innovations and raising a variety of environmental concerns.
In an attempt to remedy this situation, several organisations are developing more energy-efficient approaches. One of the most interesting recent initiatives is a joint Rice University and Texas A&M University project, which has developed a novel energy-efficient method for training DNNs, called ‘EB Train’, based on so-called ‘Early-Bird’ (EB) tickets. As Dr. Zhangyang ‘Atlas’ Wang, Assistant Professor of Electrical and Computer Engineering at The University of Texas at Austin (until recently, Assistant Professor of Computer Science and Engineering at Texas A&M University), explains, the Early-Bird Ticket algorithm leverages an important recent finding called the lottery ticket hypothesis, which holds that a dense, randomly initialised DNN contains a small but critical subnetwork, known as a ‘winning ticket,’ that can be ‘trained alone to achieve a comparable accuracy to the former in a similar number of iterations.’
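To make the hypothesis concrete, the sketch below shows the classic train-prune-rewind loop it implies, in PyTorch-style pseudocode. The magnitude-based pruning criterion, the pruning ratio, and the `train_fn` placeholder are illustrative assumptions for exposition; this is not the Rice/Texas A&M implementation.

```python
import copy
import torch

def find_winning_ticket(model, train_fn, prune_ratio=0.8):
    """Sketch of the lottery-ticket procedure: train a dense network,
    prune the smallest-magnitude weights, rewind the survivors to their
    original initialisation, and return the sparse 'ticket'."""
    init_state = copy.deepcopy(model.state_dict())  # remember the random init

    train_fn(model)  # 1. train the dense network to convergence (the costly step)

    # 2. build a mask that keeps only the largest-magnitude weights
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune weight matrices/convs, not biases
            threshold = p.abs().flatten().kthvalue(
                int(prune_ratio * p.numel())).values
            masks[name] = (p.abs() > threshold).float()

    # 3. rewind surviving weights to their initial values
    model.load_state_dict(init_state)
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])  # during retraining, re-apply after each update

    return model, masks  # the ticket: a sparse subnetwork plus its init
```

Note that step 1 – training the full dense network before any ticket can be drawn – is exactly the expense that Early-Bird tickets attack.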
Figure 4 from ‘Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks’
Experiments using state-of-the-art benchmarks and models show that the EB Train system can achieve substantial energy savings (reported at 580-1,000 per cent, i.e. a roughly six- to ten-fold reduction in training energy) while ‘maintaining the same or even better accuracy compared to directly training the original using standard algorithms.’
Utilising redundancy
According to Wang, while it is ‘straightforward’ to expect that training such small subnetworks will consume fewer resources than training the original model, identifying the winning ticket usually requires developers to run the costly ‘train-prune-retrain’ process, which incurs ‘heavy overhead’ and limits the practical benefits for efficient training.
“Our main contribution is to demonstrate that a winning ticket can actually be identified at a very early training stage, and with drastically lower overheads, via a few low-cost training schemes,” says Wang.
“We call such discovered winning tickets Early-Bird tickets. Taking advantage of that, we propose an efficient DNN training scheme to quickly discover an Early-Bird ticket of the target DNN and then to focus on training the ticket only. To our best knowledge, this is the first step taken towards exploiting winning tickets for a realistic efficient training goal,” he adds.
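One way to read ‘identified at a very early training stage’ is as an early-stopping criterion on the pruning mask itself: the paper draws channel-pruning masks from the batch-normalisation scaling factors as training proceeds and stops once consecutive masks stop changing. The sketch below illustrates that idea in PyTorch; the pruning ratio, the stability threshold, and the simple consecutive-epoch comparison are illustrative simplifications, not the paper’s exact procedure.

```python
import torch

def bn_channel_mask(model, prune_ratio=0.5):
    """Rank channels by |gamma| of the BatchNorm layers and mask out the
    smallest fraction -- a cheap proxy for channel importance."""
    gammas = torch.cat([m.weight.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, torch.nn.BatchNorm2d)])
    threshold = gammas.kthvalue(int(prune_ratio * gammas.numel())).values
    return gammas > threshold  # boolean mask over all BN channels

def train_until_early_bird(model, train_one_epoch, eps=0.1, max_epochs=50):
    """Stop dense training as soon as the channel mask stabilises, i.e.
    the Hamming distance between consecutive masks falls below eps."""
    prev_mask = None
    for epoch in range(max_epochs):
        train_one_epoch(model)
        mask = bn_channel_mask(model)
        if prev_mask is not None:
            distance = (mask != prev_mask).float().mean().item()
            if distance < eps:
                return mask, epoch  # Early-Bird ticket found; prune and train it alone
        prev_mask = mask
    return mask, max_epochs
```

The appeal of this scheme is that only a few inexpensive early epochs of dense training are ever paid for; the bulk of training happens on the much smaller ticket.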
The Early-Bird Ticket project forms part of the team’s systematic research effort to develop ultra-efficient deep learning, supported by the NSF Real-Time Machine Learning (RTML) program launched in 2019. The program seeks to lay the foundation for next-generation co-design of RTML algorithms and hardware, with a principal focus on developing novel hardware architectures and learning algorithms in which all stages of training (including incremental training, hyperparameter estimation, and deployment) can be performed in real time.
“The success of Early-Bird Ticket re-affirms the huge potential of digging and utilizing the redundancy in deep learning for more resource savings than currently. We now plan to integrate Early-Bird Ticket, as well as other efficient learning techniques developed by us, into a few robotics applications where the continual learning capability in resource-constrained conditions is often critically demanded,” adds Wang.
Once-for-all network
Elsewhere, a research team at the Massachusetts Institute of Technology (MIT) has developed an automated AI system for training and running neural networks that significantly lowers the energy required to train each specialised neural network for new platforms – which can include billions of internet of things (IoT) devices. As Dr. Song Han, Assistant Professor in the Department of Electrical Engineering and Computer Science at MIT, explains, deep learning (DL) models need to be deployed on a diversity of hardware platforms, ranging from cloud servers with trillions of FLOPs/s to mobile phones and microcontrollers with orders of magnitude less computation and memory.
“To achieve the best performance, many deep learning experts are required to carefully tune the architecture of the DL model for each hardware and efficiency constraint. The process of training DL models also requires vast computational resources, causing excessive energy consumption,” says Han.
In an effort to address this issue, the MIT team investigated an efficient AutoML technique called ‘Once-for-All’ to reduce the cost of deep learning by decoupling model training from the architecture search process. To this end, the team trained a once-for-all (OFA) network that supports a range of architectural settings, including depth, width, kernel size and resolution – with inference performed by selecting different parts of the OFA network without retraining.
Figure: Train a once-for-all network
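The weight-sharing idea behind OFA can be illustrated with a toy example: one oversized convolution from which smaller kernels and narrower channel counts are sliced at inference time, so many subnetworks share a single set of trained weights. This PyTorch sketch is a drastic simplification – the real OFA also uses learned kernel transformations and a ‘progressive shrinking’ training schedule – and all shapes and names here are invented for illustration.

```python
import torch
import torch.nn as nn

class ElasticConv(nn.Module):
    """Toy 'elastic' convolution in the spirit of Once-for-All: one set of
    shared weights from which smaller kernels and fewer output channels
    can be sliced without any retraining."""
    def __init__(self, in_ch=32, out_ch=64, max_kernel=7):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(out_ch, in_ch, max_kernel, max_kernel) * 0.01)

    def forward(self, x, kernel=7, width=64):
        k_off = (self.weight.shape[-1] - kernel) // 2      # centre-crop the kernel
        w = self.weight[:width, :x.shape[1],                # slice output channels
                        k_off:k_off + kernel, k_off:k_off + kernel]
        return nn.functional.conv2d(x, w, padding=kernel // 2)

layer = ElasticConv()
x = torch.randn(1, 32, 56, 56)
full  = layer(x, kernel=7, width=64)   # the largest sub-network
small = layer(x, kernel=3, width=16)   # a much cheaper one, same shared weights
print(full.shape, small.shape)         # (1, 64, 56, 56) and (1, 16, 56, 56)
```

Because every subnetwork is just a view into the same trained weights, supporting a new depth/width/kernel/resolution combination costs a slicing decision rather than a training run.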
According to Han, another key area for the development of ‘efficient and green AI inference’ is TinyML. While machine learning models have traditionally run on cutting-edge GPUs drawing around 300 watts – and recent techniques have brought AI to mobile devices – he points out that machine learning on tiny IoT devices running at milliwatt power levels is far less explored. To expand work in this area, he and his team propose MCUNet to enable ‘practical deep learning on microcontrollers.’
“Our work demonstrates that it is feasible to enable ImageNet-scale deep learning applications on microcontrollers, showing that the era of low-cost, low-energy TinyML has arrived,” he adds.
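What makes microcontrollers such hard targets is their tiny on-chip memory: the peak activation memory of a network must fit in a few hundred kilobytes of SRAM, orders of magnitude below even a phone. The back-of-envelope checker below illustrates the constraint; the layer shapes and the 320 KB budget are illustrative stand-ins, not MCUNet’s actual numbers.

```python
# Back-of-envelope check of whether a model's activations fit an MCU's SRAM.
# Layer shapes and the 320 KB budget (typical of an STM32-class part) are
# illustrative placeholders, not taken from MCUNet itself.
LAYERS = [            # (name, out_channels, out_height, out_width)
    ("stem",   16, 112, 112),
    ("block1", 24,  56,  56),
    ("block2", 40,  28,  28),
    ("head",  160,   7,   7),
]

SRAM_BUDGET = 320 * 1024  # bytes
BYTES_PER_ACT = 1         # int8-quantised activations

peak = 0
prev = 3 * 224 * 224 * BYTES_PER_ACT  # the input image held in memory
for name, c, h, w in LAYERS:
    out = c * h * w * BYTES_PER_ACT
    peak = max(peak, prev + out)  # a layer needs input and output live at once
    prev = out

print(f"peak activation memory: {peak/1024:.0f} KB "
      f"({'fits' if peak <= SRAM_BUDGET else 'exceeds'} {SRAM_BUDGET//1024} KB budget)")
```

Run as written, the early high-resolution layers blow the budget (343 KB versus 320 KB) even though later layers are tiny – which is why MCUNet co-designs the architecture search and the inference engine around peak memory rather than total compute alone.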
Edge devices
In Han’s view, the high energy use involved in training and running neural networks is particularly problematic when applied to ‘energy-constrained’ edge devices like mobile phones – which will rapidly run out of battery power if energy use is too high. He observes that low energy use is also ‘essential’ for IoT devices relying on a button battery designed to last for several months and energy-harvesting devices that take a small amount of energy from the environment for running networks.
“Training a neural network is energy-intensive. According to a report by researchers at the University of Massachusetts at Amherst, training and searching a certain neural network architecture involves the emission of roughly 626,000 pounds of carbon dioxide. That’s equivalent to nearly five times the lifetime emissions of the average US car,” he says.
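The arithmetic behind that ‘nearly five times’ figure checks out against the commonly cited baseline of roughly 126,000 pounds of CO2 for an average US car over its lifetime, fuel included (an assumed reference value, not stated in the article):

```python
nas_emissions = 626_000              # lbs CO2, architecture search (quoted above)
car_lifetime  = 126_000              # lbs CO2, avg US car incl. fuel (assumed baseline)
print(nas_emissions / car_lifetime)  # ~4.97, i.e. "nearly five times"
```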
For Han, this issue becomes ‘even more severe’ in the model deployment phase, where developers need many different DNNs to fit diverse hardware platforms, each with ‘different properties and computational resources.’
“As the number of intelligent edge devices is growing dramatically, this issue will worsen over time. Additionally, large energy consumption will translate to large CO2 emissions, thereby leading to environmental issues,” he adds.
In contrast to existing approaches that use the same neural network for all hardware platforms, Han and his team’s approach customises the neural network architecture for each target hardware and efficiency constraint. This means it can provide better hardware efficiency without sacrificing accuracy and, more importantly for Han, ‘does not require extra marginal training cost when adding new target hardware devices.’
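In this setting, specialising for a new device reduces to a search over subnetwork configurations scored by predictors, with no gradient training at all. The sketch below uses random search with hypothetical accuracy and latency predictors; OFA itself uses a learned accuracy predictor, measured per-device latency tables, and typically evolutionary rather than random search, so every function and constant here is a placeholder.

```python
import random

# Hypothetical stand-ins: in practice these would be a learned accuracy
# predictor and a latency lookup table built from on-device measurements.
def predicted_accuracy(cfg):                      # placeholder predictor
    return 0.7 + 0.001 * (cfg["depth"] * cfg["width"] + cfg["kernel"])

def predicted_latency_ms(cfg, device="phone"):    # placeholder latency table
    scale = {"phone": 1.0, "mcu": 8.0}[device]
    return scale * cfg["depth"] * cfg["width"] * cfg["kernel"] / 40

def specialise(device, budget_ms, trials=1000):
    """Random search over sub-network configs: no retraining, just pick the
    most accurate configuration that meets the device's latency budget."""
    best = None
    for _ in range(trials):
        cfg = {"depth":  random.choice([2, 3, 4]),
               "width":  random.choice([16, 32, 64]),
               "kernel": random.choice([3, 5, 7])}
        if predicted_latency_ms(cfg, device) <= budget_ms:
            if best is None or predicted_accuracy(cfg) > predicted_accuracy(best):
                best = cfg
    return best

print(specialise("phone", budget_ms=20))  # config to slice from the OFA network
print(specialise("mcu",   budget_ms=20))  # a smaller config for the same budget
```

Because the search touches only cheap predictors, adding one more target device costs seconds of lookup rather than a fresh training run – the ‘no extra marginal training cost’ Han describes.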
Looking ahead, Han also reveals that the MIT team intends to continue rolling out the new system for additional applications.
“Our MCUNet project can already achieve real-life vision and audio applications on commercial microcontrollers costing five dollars or less. We will extend the framework to support more complicated tasks like object detection on embedded tiny devices,” he adds.