Top Tips: How to Bring Big Data into your Data Warehouse

Hannah Smalltree has been a self-professed "data nerd" in the technology industry for over 15 years. As a former technology journalist and editorial director for publisher TechTarget, she has interviewed hundreds of companies about their data and analytics projects. Hannah is currently a Director with Mountain View, California-based Treasure Data.

Hannah shares five top tips for bringing Big Data into your data warehouse.

You already have a great data warehouse (or not) that’s functioned well for a number of years (or not), providing you with an information repository for business intelligence. But now comes Big Data, and you’re being told that everything needs to change. You can’t dump billions of events from clickstreams or remote sensors into your data warehouse, so you need to stage it elsewhere.

So all of a sudden, you need to build an Apache Hadoop cluster, and you’ve got to hire experts with the right skillset and invest in new hardware and software. That is, you need to invest in all this and then wait. Wait for HR to hire experts, wait for the hardware to be configured, wait for the software to be configured. What will it take? Six months? A year? Meanwhile, departments across the company are complaining that they can’t beat the competition until they have access to all data and the actionable insights that come from it.

Fortunately, there’s a much faster way. Today you can rely on one of several Big Data managed services providers to load, store and transform data so that you funnel only subsets of your data into your warehouse. This new generation of providers operates in the cloud, often in multi-tenant database environments, which helps deliver cloud economics and elastic scalability to customers. The other advantages of cloud-based managed services providers are similar to those of SaaS offerings: no need to install and maintain hardware and software, no need to hire specialized personnel or take training courses, a monthly subscription fee for predictable budgeting, very fast time to value and 24x7 support.

Here are five tips to consider when evaluating managed service providers and their offerings:

1. Know which import strategy matches your needs
All of the major cloud-based Big Data managed services providers make it relatively simple to get data into the service, but it’s important to understand which import strategy matches your needs: streaming data in near-real-time from the source (which is often also in the cloud), using bulk import tools to upload data in batches, or sending disks through the mail, which is still a surprisingly common strategy. Keep in mind that with any batch import, you may need to stage the data elsewhere before uploading it to the cloud, which adds architectural complexity and latency. You’ll also need to take into account your requirements for the speed, reliability and security of data transfers.


2. Assess the data storage style
This is a critical part of your assessment. To avoid the cost and complexity of storing all this data on-premises, your managed services provider must provide highly secure, highly reliable and easily scalable infrastructure. Carefully assess whether the data storage style matches your needs, whether you’re more focused on semi-structured “Big Data” or on more structured relational data from business applications. A key advantage of a managed service over your own infrastructure is that you should never run out of space or processing power. Make sure that you can scale up cost-effectively, that your provider can handle the amount of data you plan to upload within the timeframes you require, and that you can scale down if needed.



3. Verify analytic functions, tools and performance
Beyond basic transformation, most managed service providers enable you to perform some analytics on data stored in the cloud. This accelerates time to value for some analytical processes, eases the processing burden on the data warehouse and reduces the need for data movement. Since you don’t want to move all the Big Data stored with the service provider into your data warehouse for processing, make sure the managed services provider offers all the aggregation and filtering tools you’ll need to create the valuable subset of data. Then you can embrace a “logical data warehouse” (Gartner) or similar model, using the cloud-based managed service as a data lake or reservoir that supplements the data warehouse. However, the SQL functions that providers support vary from one provider to the next, so when you’re evaluating providers, make sure they support the functions you will need.
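To make the idea of a "valuable subset" concrete, here is a minimal sketch of the kind of aggregation and filtering described above. It is an illustration only: Python's built-in sqlite3 module stands in for a provider's SQL engine, and the events table, its columns, the sample rows and the summarize_page_views function are all hypothetical names invented for this example. A real provider's SQL dialect and supported functions may differ, which is exactly what this tip suggests you check.

```python
import sqlite3

# Hypothetical raw clickstream events: (user_id, page, event_date).
# In practice these would be billions of rows held by the provider.
RAW_EVENTS = [
    ("u1", "/home", "2014-01-01"),
    ("u1", "/pricing", "2014-01-01"),
    ("u2", "/home", "2014-01-01"),
    ("u2", "/home", "2014-01-02"),
    ("u3", "/pricing", "2014-01-02"),
    ("u3", "/pricing", "2014-01-02"),
]


def summarize_page_views(events, page_filter=None):
    """Reduce raw events to one summary row per (date, page).

    sqlite3 is only a stand-in for the provider's SQL engine; the
    table and column names are assumptions made for illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (user_id TEXT, page TEXT, event_date TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
        query = (
            "SELECT event_date, page, COUNT(*) AS views, "
            "COUNT(DISTINCT user_id) AS visitors "
            "FROM events "
            "WHERE (? IS NULL OR page = ?) "  # optional filter on page
            "GROUP BY event_date, page "
            "ORDER BY event_date, page"
        )
        return conn.execute(query, (page_filter, page_filter)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Only these summary rows, not the raw events, would be loaded
    # into the data warehouse.
    for row in summarize_page_views(RAW_EVENTS):
        print(row)
```

The point of the design is that the heavy work (grouping and counting over raw events) happens where the raw data lives, and only the much smaller summary travels to the warehouse. Passing a page such as "/pricing" as page_filter narrows the subset further.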



4. Assess management and governance
Carefully assess what management services each provider offers. For example, PaaS and IaaS implementations provide cloud platforms, but still require your staff to install, manage, scale and monitor the technology. In contrast, some managed services include all setup, management and monitoring of the service. Also consider available governance features, such as access controls and monitoring services.



5. Make sure data is protected
Security is always a concern when storing data in the cloud. Consider, though, that most Big Data sources, including clickstreams, media streams and sensor data, are already outside the firewall and nowhere near the data warehouse, so the goal is to protect the data as it traverses the internet. Also compare the relative risks: a massive new on-premises infrastructure maintained by internal staff, who may not be security experts, versus the security offered by a managed services provider that treats security as a core competency.


Big Data analytics is in your company’s future. A cloud-based managed services provider is a faster and easier way to make it happen than an on-premises deployment. It’s also a better long-term strategy because the provider’s infrastructure and core competencies will evolve along with the industry. That means you won’t find yourself, two or three years from now, saddled once again with an aging system that delivers no value and puts you at a competitive disadvantage.


Hannah Smalltree is a Director at Treasure Data.


IDG Connect
