VMware Cloud is built to provide a general-purpose compute platform for Infrastructure as a Service (IaaS). Other hyperscaler platforms, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, also provide generic infrastructure services. Each of these companies has built its platform to be massively scalable, with its own unique service offerings running atop its own unique architecture, reflecting the design decisions each company has made in order to operate its environment at scale.
VMware Cloud is unique, however, in its ability to provide a common set of infrastructure services with a common management plane – across all of these platforms. Over the past 20 years, hundreds of thousands of customers have come to rely on VMware infrastructure for its ability to provide seamless availability – at the infrastructure layer – to any application deployed on it. Applications can tolerate hardware failures thanks to the resilience built into VMware infrastructure, without needing resilience built into the application itself. A key design criterion for VMware Cloud, therefore, is ensuring that customers continue to get the same availability and reliability they have come to expect of the platform. We cannot fundamentally change that experience simply because customers choose to deploy their applications in a cloud deployment of VMware infrastructure.
It is unrealistic to expect that all cloud platforms will provide exactly the same hardware configurations and capabilities, yet VMware Cloud services for compute, networking, and storage must be comparable across all providers. A second design criterion, therefore, is ensuring that these services provide similar – if not identical – options for managing workload performance.
Finally, customers must be able to continue to maintain operational visibility into any workloads running on VMware Cloud. A final design criterion, then, dictates that tooling for operational management should continue to operate for workloads as they are migrated into VMware Cloud services.
These three criteria – consistency of availability, performance, and operations – drive the decisions surrounding how VMware Cloud infrastructure software is built and deployed across VMware Cloud service providers.
Importance of simplicity
Hyperconverged infrastructure arose largely due to the influx of reasonably priced solid-state storage devices into the market. Once solid-state drives became available as a reasonable replacement for spinning hard disk drives (HDDs), it suddenly became possible to achieve levels of storage performance that were once only available through the acquisition of expensive, high-end storage arrays. The democratization of that performance enabled new architectures that eliminated expensive, complex storage area networks. Suddenly it was not only possible but also economical to lift the intelligence and capabilities of advanced storage software higher in the architecture, and to implement features such as deduplication, advanced caching, or encryption through software-only appliances. By leveraging the on-board capacity (direct-attached storage, or DAS) of a fleet of standardized x86 hosts, and implementing advanced storage features in software, the expensive, complex storage network could be eliminated entirely. This drove complexity out of the data center, reduced overall costs, and ushered in a new wave of innovation in the industry – bringing hyperconverged architectures to general data center use cases for private enterprises.
In fact, this is the same approach public cloud vendors such as Amazon and Google have pursued for years – although in their case they were building highly specific platforms that required a hyper-standardized environment upon which to build automation systems. Introducing hyperconverged infrastructure into the private enterprise data center means those organizations, too, now have the means to build ultra-homogeneous, scalable infrastructure for general compute workloads. Public cloud providers have shown the world what is possible by building data centers with a simple building-block approach and using automation to deliver infrastructure services. Managing anything at scale requires that the system be built from repeatable, standardized units – not only for the initial build, but also for managing the system post-deployment. Many would argue that these ‘day-two’ operations are in fact more important than the initial build, simply because (ideally) you only need to build a system once, yet it must be managed for a long time afterward.
Building data center services – and by extension, cloud services – using hyperconverged infrastructure (HCI) provides multiple benefits. First, the environment is hyper-standardized, which means one can leverage economies of scale for repairs and replacements. Because all the systems are (theoretically) identical, it is easy to stock replacement parts, and just as easy to replace entire systems. Parts and units may be procured in bulk and at scale, driving down the cost of the system. Second, it enables automation: since all the ‘building blocks’ are identical, a script or system that automates a task on one node can immediately be applied to every node in the fleet.
This is not to belittle the task of automation – automating any one thing in a data center first requires someone to decide which 999 other things they are not going to automate (at least not yet). For most customers, however, the data center comprises a myriad of systems, each with its own unique form of administration, many without any kind of application programming interface (API) upon which to build an automation system. This makes automation extraordinarily difficult. A data center built from identical blocks of infrastructure paves the way for an automated approach to management.
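The "automate once, apply everywhere" property of identical building blocks can be sketched as follows. This is an illustrative sketch only – the node names and the health-check task are hypothetical placeholders for whatever a real management API would expose:

```python
# A minimal sketch of fleet-wide automation over identical nodes.
# Node names and the task body are hypothetical; on real infrastructure
# the task would call a management API rather than return canned data.

def check_node(node: str) -> dict:
    """Run one standardized task against a single node.

    Because every node is built from the same template, the same task
    definition applies to all of them with no per-node special cases.
    """
    # Placeholder logic: a real check might query firmware version,
    # NTP sync, or disk health via the node's API.
    return {"node": node, "status": "ok"}

def automate_fleet(nodes: list) -> list:
    # The whole point of standardized blocks: a task written for one
    # node is the task for the entire fleet.
    return [check_node(n) for n in nodes]

results = automate_fleet([f"esx-{i:02d}.example.local" for i in range(4)])
print(results)
```

In a heterogeneous data center, `check_node` would instead need a branch per vendor and per management interface – which is precisely the complexity a standardized fleet removes.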
In the case of VMware Cloud, this new hyper-standardized infrastructure is built on well-understood technologies and solutions from VMware. Customers don’t have to sacrifice the availability, performance, and operational models they have built over the years for managing their workloads. They can continue to use:
- Reservations and limits to guarantee access to CPU and memory
- Affinity and anti-affinity rules for colocation or separation of workloads
- Networking and security policies to define how workloads communicate on the network
- Storage policies to define how data will be protected
- vSphere HA, DRS, and vMotion for both proactive and reactive protection of workloads
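To make the first of these controls concrete, the toy model below sketches the semantics of a CPU reservation (a guaranteed floor, checked at admission time) and a limit (a hard cap on consumption). It is a conceptual sketch only – not the actual vSphere admission-control or scheduling algorithm – and all names and numbers are hypothetical:

```python
from dataclasses import dataclass

# A toy model of CPU reservation/limit semantics, in MHz. Illustrative
# only: NOT the real vSphere admission-control or DRS algorithm.

@dataclass
class VM:
    name: str
    reservation: int  # MHz guaranteed to this VM
    limit: int        # MHz this VM may never exceed, even if capacity is idle

def admit(host_capacity: int, running: list, candidate: VM) -> bool:
    """Admission check: the sum of reservations may not exceed capacity,
    so every admitted VM can always receive its guaranteed floor."""
    reserved = sum(vm.reservation for vm in running)
    return reserved + candidate.reservation <= host_capacity

def entitlement(vm: VM, demand: int) -> int:
    """A VM's effective allocation is its demand, clamped between its
    reservation (floor) and its limit (ceiling)."""
    return max(vm.reservation, min(demand, vm.limit))

host = 10_000  # a hypothetical 10 GHz host
running = [VM("db01", reservation=4_000, limit=6_000)]

print(admit(host, running, VM("web01", 2_000, 4_000)))  # True: 6 GHz total reservation fits
print(admit(host, running, VM("big01", 7_000, 8_000)))  # False: reservations would exceed capacity
print(entitlement(running[0], demand=8_000))            # 6000: demand capped by the limit
```

The real platform layers shares and dynamic entitlement on top of these two bounds, but the floor/ceiling intuition above is the part that carries over unchanged into VMware Cloud.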
As a result, a ubiquitous cloud operating system and model is now available for customers. Whether deploying in an on-premises data center, a partner colocation facility, or a public cloud hyperscaler, VMware Cloud offers a coherent, unified experience for data center and application operators, without requiring any conversion of workloads or major modifications to the operational model. Workloads may be migrated from a customer-owned and -managed facility to a cloud data center – using either hot or cold migration techniques – to gain the benefits of cloud scaling, financial transparency, and operational efficiency.
PS: A Note About Resilience
It should be pointed out, of course, that as customers adopt such a software-defined data center (SDDC) model – particularly with a hyperscaler such as AWS, Azure, or Google – this infrastructure will be running in someone else’s data center. These data centers are typically built with the point of view that infrastructure is disposable and easily replaced; the assumption is that the applications running there either provide their own resilience or don’t need it. VMware Cloud offers something in between: the ability to provide that familiar resilience without building it into the application. Still, because the infrastructure is running in someone else’s data center, particular care should be taken to examine and accommodate different outage scenarios. When you are not in control of the physical plant, power distribution, racks, and other gear, accommodating such data-center-level events becomes an important aspect of your design. More on this in another post.