Meet Michael Stonebraker, database pioneer

If you were to compile a database of every type of database and set a team of business intelligence experts onto it, I can guarantee one outstanding pattern would emerge. The databases would all, in some way, bear detectable traces of Michael Stonebraker, a professor and inventor whose research has been the foundation for dozens of companies.

However, business intelligence only tells us about the past. A study using data science, on the other hand, could give us hints about the future of databases. Sadly, data scientists are in short supply in these parts and, if we could find one, we couldn’t afford their fees, so we shall have to use another technique to model our forecasts. Below, we shall examine some of the companies that Stonebraker has spawned and glean some unstructured data from a phone interview with the great man.

As a computer scientist specialising in database research at the University of California, Berkeley in the 1970s, Stonebraker created prototypes and technologies central to many relational database systems on the market today. Among the companies he has founded are Ingres, Illustra, Cohera, StreamBase Systems, Vertica, VoltDB, Tamr and Paradigm4.

Stonebraker spent 29 years as a Professor of Computer Science at Berkeley, developing the Ingres and Postgres relational database systems. These have been the foundations for relational concepts used by Sybase, Microsoft and Computer Associates. He has won silos full of awards, ranging from the IEEE John von Neumann Medal in 2005 to the Google-sponsored ACM Turing Award in 2014. At Berkeley he inspired students such as Sybase founder Robert Epstein, VMware co-founder Diane Greene, Tibco co-founder Dale Skeen, CalSoft founder and CEO Anupam Bhide, and Cloudera co-founder Mike Olson.

Now in his seventies, he is currently an adjunct professor at MIT, where he has been involved in the development of the Aurora, C-Store, H-Store, Morpheus, and SciDB systems.

Despite all these successes, he doesn’t seem to have an exit strategy – possibly because he’s already doing the things he’d love to do if money were no object. Google put up $1m as a prize for the Turing Award, which has been described as the Nobel Prize of the IT industry, but that still didn’t drive him out of the MIT labs.

What would a business intelligence expert conclude from the data on Stonebraker’s career?

“The industry has changed. It’s wildly different and bigger now,” he says. “Everyone needs a database now and hardware changes the rules every 10 years.”

Current projects include VoltDB, Tamr and Paradigm4. These are all specialised database management systems, created in response to today’s cloudy conditions. Paradigm4’s SciDB, for example, addresses a database usage problem that most scientists face, which is that they have to spend 90% of their time extracting and preparing the data that they want to look through. In other words, scientists only dedicate 10% of their working time to analysis. The purpose of SciDB is to reverse those proportions, by making data easier to find.

One of Stonebraker’s inventions was a system for analysing data in real time. The company StreamBase arose out of the Aurora project that Stonebraker researched, which looked at ways of analysing streaming data.

What would a StreamBase analysis make of events in the database industry now, which seems to be in a state of rapid flux?

There are three major strands in the industry: data warehousing, transaction processing and everything else, he suggests. There is no real pattern emerging yet among the myriad technologies being spawned for the fastest-growing element: the ‘everything else’ segment.

“The most interesting phenomenon to watch will be how data science will replace business intelligence,” says Stonebraker. “As soon as we can train enough data scientists, they’ll take over from the business intelligence people.”

That sounds a bit harsh. What’s the difference between the two, and are the differences so insurmountable that we can’t convert business intelligence experts into data scientists?

The difference, according to Stonebraker, is partly technical and lies partly in outlook. Business intelligence systems are easy-to-use front ends onto a database that aggregate the figures held in its tables. This can give you flat historical information about, say, the effect that hurricanes in Florida had on the sales of certain items in all the local Walmart stores.

A data scientist will look at far more diverse sources of information, in an eclectic range of forms and dimensions, sourcing information from graphs, arrays and all manner of unstructured files. This enables the data scientist to move on to predictive modelling, to be proactive and to anticipate events.
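For readers who like to see the distinction spelled out, here is a minimal sketch of the two outlooks. It is an illustration only, not anything Stonebraker described: the sales figures, the store names and the functions `total_by_week`, `fit_line` and `forecast_next_week` are all invented for the example, and a straight-line fit stands in for the far richer models a real data scientist would use.

```python
from collections import defaultdict

# Hypothetical weekly sales records: (week, store, units_sold).
# These numbers are invented purely for illustration.
sales = [
    (1, "store_a", 100),
    (1, "store_b", 80),
    (2, "store_a", 120),
    (2, "store_b", 90),
    (3, "store_a", 140),
    (3, "store_b", 100),
]


def total_by_week(rows):
    """BI-style view: aggregate what has already happened."""
    totals = defaultdict(int)
    for week, _store, units in rows:
        totals[week] += units
    return dict(totals)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_next_week(rows):
    """Predictive view: extrapolate the weekly totals one week ahead."""
    totals = total_by_week(rows)
    weeks = sorted(totals)
    slope, intercept = fit_line(weeks, [totals[w] for w in weeks])
    return slope * (weeks[-1] + 1) + intercept


if __name__ == "__main__":
    print("Historical totals:", total_by_week(sales))
    print("Forecast for next week:", forecast_next_week(sales))
```

The first function only summarises the past; the last one uses that summary to make a guess about a week that has not happened yet, which is the shift in outlook being described.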

A business intelligence expert can tell you how much money they made in the past, but they wouldn’t see their own demise coming. A data scientist would, if only the database industry could create enough of them. The shortage of good data scientists and the huge demand for them means that this is the industry graduates should be getting into if they want to cash in.

Business intelligence types lack the necessary skills and empathy for a career in predictive analytics, then. That, surely, is the type of conventional wisdom that the business intelligence people shouldn’t take lying down. Maybe soon a new pattern will emerge, one that shows an uptick on the charts as BI experts surge into the data science industry.

Meanwhile, in the US every university is setting up a data science programme, taking graduates from business schools and computer science backgrounds. Stonebraker says the successful candidates in this field could be identified by three characteristics: a critical mass of understanding about business administration, memory and how databases work, which will give candidates a natural feel for the possibilities of science.

Such as? “Genomics is going to be huge. Soon you will be able to get your complete genetic sequencing for a few thousand dollars,” says Stonebraker.

Is there any last hurdle that needs to be cleared?

“The biggest problem is that scientists need to be convinced that they need better databases. This is part of what SciDB is about. The poster boy for the wrong way of doing things is the Hadron Collider.”

To be fair, when he says ‘wrong way of doing things’ I think he is referring to the organisation of the data, not the actual experiment. Still, criticising the Hadron Collider: which data scientist could have predicted that outcome?


Nick Booth

Nick Booth worked in IT in the UK’s National Health Service, financial services and The Met Police, witnessing at first hand the disruptive effects of new technology. As a journalist and analyst, his mission is to stop history repeating itself.
