Beyond AWS: Tips for startups outgrowing commodity public cloud

Engin Akyol, CTO of Distil Networks, shares his story about outgrowing AWS

This is a contributed piece by Engin Akyol, CTO of Distil Networks


Soon after founding Distil Networks in 2012, we began to grow quickly. With malicious website traffic posing a constant threat to web-oriented businesses, demand was high for a technology like ours that could protect organizations from threats like malicious bots, web and screen scraping, spam, click fraud, and brute force attacks.

As a startup, we initially relied on the convenience and agility of commodity public cloud services to support our business, but as we grew, we ran into some unexpected challenges with that approach fairly quickly. This might seem surprising, considering the widespread perception that public cloud is a magic bullet, but these obstacles are actually pretty common – especially for companies like ours that need to easily scale their environment while ensuring consistently high performance for their applications.

This is a look at the obstacles we faced, how we overcame them, and considerations for other startups as they evolve and potentially outgrow public cloud environments.

Growing up fast

The popularity of our bot detection and mitigation services is based on the ability to monitor web requests and build a digital fingerprint of every incoming connection, identifying and blocking bots across all of the other websites protected by the platform. Over the last year, we raised $21 million in Series B funding and quadrupled our business, serving Fortune 500 and Alexa Global 10,000 customers like CrunchBase, easyJet, Glassdoor, Orbitz, Staples, StubHub, Wayfair and Yelp. But it was this rapid expansion of our customer base and service portfolio that ultimately forced us to confront the limitations of the Amazon Web Services (AWS) public cloud services we were using and pushed us to consider IaaS alternatives that were more finely tuned to our specific and evolving needs.

Early on, AWS was a great solution for our needs, especially since we were mainly testing proof of concept and trying to keep costs low with no capital investment. We got the flexibility, convenience and value we were looking for, but as we continued to grow, we very quickly ran into issues with performance and latency, which were unacceptable since our business is based on the ability to deliver bot protection on every single request within microseconds.

Here’s a rundown of the top roadblocks we faced with AWS:

Larger cloud instances – To achieve the performance and reliability we needed, we had to purchase increasingly larger cloud instances, which was cost-prohibitive given that we also needed to scale our environment substantially to meet growing demand.

Costly support – We learned quickly that talk actually isn’t cheap. We were spending about 10% of our large monthly bill with AWS on customer support. This was a real problem since any cloud issues that affect us also affect our customers. We needed to be able to communicate with AWS whenever necessary without focusing on extra costs.

Lots of “other stuff” – As we began to closely dissect our monthly invoices, we realized we were getting billed for lots of miscellaneous items, many of which were associated with trying to achieve the performance levels we needed to support our platform, including storage, IOPS and bandwidth. Our costs actually spiraled to the point that we had to limit our dev team’s use of AWS.

We needed a different approach to infrastructure: an environment that would give us the automation and elasticity to scale on demand, just like public cloud, but with significantly higher power and throughput to deliver the performance our customer-facing platform required. It also needed to be cost-efficient. We ultimately moved our entire production environment from AWS to a global bare-metal IaaS environment with Internap that included locations in Atlanta, Dallas, Los Angeles, New York, Seattle and Silicon Valley, as well as Amsterdam, London and Singapore.

Once this high-IOPS environment was up and running, we immediately saw a double-digit performance increase, which optimized our bot blocking platform, and we also reduced our costs by 20% per month. We also don’t pay extra for support, so we can deliver and maintain the service levels needed to protect our customers without worrying about racking up an additional support bill.

The lessons we learned apply readily to other businesses that rely on delivering performance-intensive web applications. Startups facing a similar transition as they expand in size and scope may want to keep in mind these three considerations that defined our infrastructure journey:

Balance performance with agility: We needed scalable, high-performance compute, and moving our core applications to non-virtualized bare-metal IaaS gave us the best of both worlds. Since each server is dedicated to us, we avoid the ‘noisy neighbor’ effect of sharing hardware with other users and get significantly higher power and throughput for our applications. We also get to keep the on-demand scalability and instant provisioning benefits of public cloud with options for monthly, hourly or even per-second billing.

Connectivity matters: Our current infrastructure environment includes global, route-optimized internet connectivity with nearly 80 points of presence worldwide. This is an often-overlooked but really important piece of the overall performance puzzle since it allows us to reduce latency and get closer to our customer base.

Burst as necessary: Deploying a mixture of bare-metal cloud and public cloud can balance performance and costs with global scale in a way that public cloud alone cannot. Allocating bare-metal IaaS for performance-reliant applications can reduce the number of servers you need to utilize, compared to what a shared, virtual server environment would require in order to reach the same performance levels. At Distil, we’ve achieved this optimal mix by running all of our baseline traffic on bare-metal IaaS, with the ability to easily burst into public cloud for immediate scale-up capacity during periods when traffic surges. We use this capability, for example, around Black Friday, when bot activity tends to escalate dramatically.
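To make the bursting idea concrete, here is a minimal Python sketch of the split described above: baseline traffic stays on dedicated servers, and only the overflow is sent to public cloud. It is an illustration under stated assumptions, not Distil's actual implementation. The function name, the field names and every number in the usage note are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CapacityPlan:
    bare_metal_rps: int   # requests/sec served by dedicated bare-metal servers
    cloud_rps: int        # overflow requests/sec sent to public cloud
    cloud_instances: int  # public cloud instances needed to absorb the overflow


def plan_capacity(demand_rps: int,
                  bare_metal_capacity_rps: int,
                  rps_per_cloud_instance: int) -> CapacityPlan:
    """Split demand between a fixed bare-metal baseline and burst capacity.

    All parameters are hypothetical inputs chosen for illustration.
    """
    if demand_rps < 0 or bare_metal_capacity_rps < 0:
        raise ValueError("demand and capacity must be non-negative")
    if rps_per_cloud_instance <= 0:
        raise ValueError("rps_per_cloud_instance must be positive")

    # Fill the dedicated servers first; they carry the baseline traffic.
    baseline = min(demand_rps, bare_metal_capacity_rps)
    # Whatever does not fit on bare metal bursts into public cloud.
    overflow = demand_rps - baseline
    # Round up: a partially used instance still has to be provisioned.
    instances = math.ceil(overflow / rps_per_cloud_instance)
    return CapacityPlan(baseline, overflow, instances)
```

With a hypothetical bare-metal capacity of 10,000 requests per second and 500 requests per second per cloud instance, a normal day at 8,000 requests per second needs no cloud instances, while a surge to 12,600 leaves 2,600 requests per second of overflow and calls for six instances. Filling the dedicated servers first reflects the point made above: the steady load runs on the higher-throughput hardware, and public cloud is paid for only during surges.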

Non-human internet traffic can lead to content theft and fraud, and it can impose a business-impacting performance tax on its victim organizations. Distil helps these organizations fight back, and we’re better equipped than ever to do so based on a thoughtful, strategic approach to IaaS performance and scale. As with so many other businesses, commodity public cloud provided exactly what we needed to get started, but moving to a more specialized, high-performance infrastructure is helping us get to where we need to go.