Deep Learning has a data problem

Despite being told we’re now in the age of big data, data explosions, and exabytes, Artificial Intelligence still suffers from a lack of data.

“There are people saying the same things they were in the 90s,” Professor Neil Lawrence of the University of Sheffield said at the Re:Work Deep Learning Summit in London. “The only difference is scale.”

At the summit, various experts in AI [Deep Learning is a branch of AI where the learning systems are loosely inspired by the human brain] showed off an impressive array of uses, ranging from robots interacting with the world to intuitive health diagnostics and intelligent chatbots. Currently, these systems are often very good at one specified task: identifying certain images, translating text, playing Mario, and so on. But Google’s DeepMind is now working on ways to train systems to transfer learning: not only mastering one Atari classic, but applying those lessons to other Atari games, and then using the same methods to teach robots how to better interact with the world.

However, speakers repeatedly highlighted the problems around both gathering the necessary data and training these highly complex systems with that information. SwiftKey CTO Ben Medlock – whose intelligent keyboard app was acquired by Microsoft earlier this year – warned that current learning systems are still “oceans apart” from the efficiency of the human brain.

“It’s very easy to look at the successes of Machine Learning, you could believe we’re racing towards human-level intelligence,” he said. “But if you look at how the learning occurs, it’s very different. We still require learning from vast quantities of data and fundamentally the trained human brain learns from very few data samples.”

Collecting large data sets might be easy for the likes of Google and Facebook, but they rarely share, meaning startups are often left to do the legwork themselves. And even once you have enough data that is properly labelled – still often a manual and time-intensive job despite advances in unsupervised learning – it can take a large amount of computing power to train systems.

Simon Edwardsson, co-founder of computer vision startup Aipoly, says that while the AI community is a very open one, there’s still trouble with data sharing. While an increasing number of data sets are being made open for training systems (ranging from autonomous vehicle data to pictures of torsos), they are often small, of poor quality, or released under non-commercial licenses.

One way some companies are getting around that lack of data is through simulation. 3D rendering is now at a point where systems can be trained in virtual environments created with something like the Unity engine, and the results are good enough to be applied in real-world situations. Driverless cars are even being trained using Grand Theft Auto V.
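The appeal of simulation is that the ground-truth labels come for free: the simulator already knows where everything is, so no human annotator is needed. As a toy illustration only (not any company’s actual pipeline), the sketch below stands in a tiny 2D grid for a rendered scene and generates perfectly labelled samples on demand:

```python
import random

def render_square(grid_size, size, x, y):
    """A stand-in for a 3D renderer: draw a filled square on a binary grid."""
    grid = [[0] * grid_size for _ in range(grid_size)]
    for row in range(y, y + size):
        for col in range(x, x + size):
            grid[row][col] = 1
    return grid

def synthesize_dataset(n, grid_size=16, seed=0):
    """Generate n samples; the simulator supplies the ground-truth label."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        size = rng.randint(2, 5)
        # Place the square so it always fits inside the grid.
        x = rng.randint(0, grid_size - size)
        y = rng.randint(0, grid_size - size)
        samples.append({"image": render_square(grid_size, size, x, y),
                        "label": (x, y, size)})
    return samples

dataset = synthesize_dataset(1000)
```

The names here (`render_square`, `synthesize_dataset`) are invented for this example; the point is simply that each sample arrives already paired with an exact label, sidestepping the manual annotation bottleneck described above.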

More data in every sense of the word – the number of data sets, the scale, the quality – being made open and available, better training methods, and the relentless march of computing efficiency are required if we’re ever to fully reach the kind of AI we’ve seen in science fiction for over 50 years.

Dan Swinhoe

Staff Writer at IDG Connect.


Comments


barry dennis on October 02 2016

Google Search: Cellular Biology = 1,483,000 RESULTS IN .08 SECONDS. Great! Enough to satisfy the most curious, right? Maybe. There are problems. 1. These results do not include the most authentic and credible results from many places: proprietary research generated at universities and think tanks around the world, research at corporate labs, foundation-funded projects, and field reports of projects underway or completed. 2. Many other individual efforts generated by the curious, privately funded, never published, secreted away on systems around the world; the da Vincis of the modern era, thinkers and solutionists who labor as much for personal satisfaction as for personal or societal gain. The point is THAT THE 1.4MM SEARCH RESULTS ARE ONLY THE FIRST STEP. SO LONG AS INFORMATION OF ANY TYPE IS CONSIDERED AS POTENTIALLY PROFITABLE THROUGH SALE OR LICENSE, AND NOT AVAILABLE TO DEEP LEARNING ALGORITHMS FOR MANAGEMENT AND ANALYSIS, THE CONCEPT OF DEEP LEARNING WILL BE HINDERED IN FULFILLMENT.
