The world may be moving to the cloud, but the growth of compute- and I/O-intensive analytics has kept many new big data workloads on-premises in corporate data centers rather than in the cloud. At least for now.
Net-new workloads built around mobile devices, cloud services, social technologies and big data -- the so-called third platform, as described by analyst firms such as IDC -- can quickly overpower existing data center infrastructure. These workloads scale unpredictably, have dispersed components, and can generate, process and store copious amounts of sometimes sensitive data. At the same time, the maturity and cost of the public cloud have yet to fully meet the needs of organizations experimenting with these new workload types, especially big data analytics.
That dynamic is reflected in recent server market numbers. The worldwide server market enjoyed positive growth in 2014, IDC reported, despite the bottom falling out from under IBM's high-end server business (down by 17.2% in the fourth quarter). Overall, 2014 server revenue increased 2.3% to $50.9 billion when compared to 2013, and unit shipments increased 2.9% from the prior year to 9.2 million units. In the volume sector, investments by hyperscale IT companies and service providers to support new workloads drove growth of 4.9% to $10.8 billion for the fourth quarter. Meanwhile, enterprises investing in new scalable systems for third-platform workloads helped the midrange server market swell by 21.1% to $1.4 billion year-over-year in Q4 2014.
That so many analytics workloads are still running on-premises is no surprise to companies peddling those third-platform applications -- even those that fully support cloud computing.

Take Cloudera, a data management company with software based on Apache Hadoop. "When we were founded, we thought that the main deployment model was going to be the cloud," said Charles Zedlewski, Cloudera's vice president of products. "We were quickly disabused of that notion," he said, and today, upwards of 90% of its deployments run on-premises. Why didn't cloud take off for Cloudera? First, a cloud deployment model for analytics immediately ruled out the many customers that didn't need to move to the cloud, or couldn't for security reasons.
"When you look at who has a lot of data -- the federal government, financial services, telcos -- they all have enormous investments in their data centers," Zedlewski said, and thus no need to outsource that capacity. Meanwhile, many of the users who would consider a hosted offering expected Cloudera to go way beyond just hosted Hadoop. "They wanted us to host their websites; we had a lot of mission creep," he said.
Taking advantage of Amazon Web Services (AWS) wasn't really an option. The public cloud service was in its infancy, with limited instance types and few enterprise customers. So Cloudera was on the hook for building and maintaining its own data center -- a pricey proposition.
"But a lot has changed since then," Zedlewski said. There's now a critical mass of large infrastructure as a service providers on which Cloudera can host. Both AWS and Windows Azure support robust instance types for demanding data-processing workloads. There are also more data and applications in the cloud to analyze.
No right way to host workloads
Despite the flexibility of cloud, the infrastructure that organizations are considering for their big data analytics workloads is predominantly on-premises, said Nik Rouda, senior analyst with Enterprise Strategy Group.
In a survey, ESG found that when it comes to new big data infrastructure, 18% of respondents said they are planning to use dedicated (non-virtualized) servers for analytics workloads; 30% are looking to traditional virtualized infrastructure; and 21% are considering dedicated analytics appliances from the likes of Oracle and Teradata. Only 21% are considering public cloud, while another 10% are thinking about a public/private hybrid deployment.
"There are still a wide variety of deployment options out there, which says to me that people are still experimenting," Rouda said.
Often, the decisions around big data servers end up being based on things other than the workloads' needs. "Sometimes, the thinking is, 'We've always done it this way,' and so people go to their built-in biases or best practices," Rouda said.
That said, there´s an awareness that analytics applications have different requirements than other workloads. "There are a lot of changes going on, and few people are saying that their existing infrastructure is entirely adequate for their new needs," Rouda said.
Any new infrastructure, he said, should be evaluated for its ability to support big data attributes, such as:
- Scalability up or out.
- Adequate performance independent of location.
- Cost effectiveness. The assumption that public cloud is always cheaper doesn't necessarily hold for analytics workloads.
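That cost-effectiveness caveat is easy to sanity-check with rough arithmetic. The sketch below compares a month of public-cloud spend against amortized on-premises hardware for a data-heavy analytics cluster. Every rate and figure here is an assumed placeholder, not real vendor pricing; the only point it illustrates is that per-terabyte egress charges can dominate cloud bills when a workload moves a lot of data.

```python
# Back-of-the-envelope cost comparison for an analytics workload.
# All rates are hypothetical placeholders -- substitute real quotes
# from your vendors before drawing any conclusions.

def monthly_cloud_cost(tb_stored, tb_egress, compute_hours,
                       storage_rate=23.0,   # $/TB-month (assumed)
                       egress_rate=90.0,    # $/TB transferred out (assumed)
                       compute_rate=0.50):  # $/instance-hour (assumed)
    """Estimate one month of public-cloud spend for a data-heavy workload."""
    return (tb_stored * storage_rate
            + tb_egress * egress_rate
            + compute_hours * compute_rate)

def monthly_onprem_cost(capex, amortization_months, opex_per_month):
    """Amortize the hardware purchase over its service life, plus power/admin."""
    return capex / amortization_months + opex_per_month

# A hypothetical 100 TB cluster that pushes 50 TB of results out each month:
cloud = monthly_cloud_cost(tb_stored=100, tb_egress=50, compute_hours=2000)
onprem = monthly_onprem_cost(capex=150_000, amortization_months=36,
                             opex_per_month=1_500)
print(f"cloud:   ${cloud:,.0f}/month")
print(f"on-prem: ${onprem:,.0f}/month")
```

With these made-up numbers, egress alone ($90/TB x 50 TB) exceeds the storage and compute line items combined, which is why the "cloud is cheaper" assumption deserves a workload-specific check rather than a blanket answer.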
Go where the data lives
For big data workloads, what matters most is where the data being processed lives.
"If your social platform is cloud-based, it makes sense for your analytics platform to be cloud-based," Rouda said. If the data already exists in-house, process it on data center servers, to minimize networking charges. This also speeds access times and time-to-analysis.
Providing good data access times is particularly important, given the huge increase in employees using analytics databases, Rouda said. Where once only a handful of people -- business analysts, data scientists, and the occasional executive -- needed those databases, now up to 40% of employees in some organizations rely on them. Use cases can be as diverse as a truck driver looking at an optimized route for package delivery, or a sales rep looking at updated inventory and pricing, Rouda said.
Indeed, data location is the North Star of many analytics providers. "Wherever you create data, it tends to stay there, because it's such a pain to move it," said Cloudera's Zedlewski.