In this guide we’ll survey GPU cloud offerings, comparing the GPU instances of the large public clouds to independent alternatives, startups, and everything in between. By the end of this guide you’ll have a good understanding of the GPU cloud options available today – especially for machine learning and deep learning applications.
What’s the best GPU cloud? Are there viable alternatives to the GPU cloud instances offered by AWS, Azure, and GCP? What GPU instances are available with minimal setup and with helpful starters, templates, and guides?
Why does every GPU cloud make it so difficult to determine total cost?
If you’ve got questions like these about GPU cloud providers, we’ve got you covered in the Paperspace Guide to GPU Cloud Providers. For each provider we’ll talk about availability, performance, price, and general ease of use.
Let’s get started!
GPU cloud providers often use different units of measurement with different sensible defaults. GPU machine specs can vary wildly from cloud to cloud – with different instance or machine sub-groupings and different pricing conventions.
Our mission in this guide is to provide a simplified overview of the GPU cloud providers to make it easier to understand what GPU cloud providers are selling in comparative terms and what various services offer as part of their GPU cloud offerings.
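One concrete way to compare providers that price by instance rather than by GPU is to normalize everything to a per-GPU hourly rate. A minimal sketch of that normalization – every provider name and price below is an illustrative placeholder, not a current rate:

```python
# Normalize instance pricing to a per-GPU hourly rate so offerings
# from different clouds can be compared on equal terms.
# All providers and prices below are illustrative placeholders.

instances = [
    # (provider, instance, GPUs per instance, hourly price for the whole instance)
    ("Provider A", "8x A100", 8, 24.00),
    ("Provider B", "1x A100", 1, 3.10),
    ("Provider C", "4x V100", 4, 9.80),
]

for provider, name, gpu_count, hourly in instances:
    per_gpu = hourly / gpu_count  # divide by GPU count to get per-GPU rate
    print(f"{provider:10s} {name:8s} ${per_gpu:.2f}/GPU/hr")
```

This is the same arithmetic we apply throughout the guide when a provider only lists, say, 8-way instance pricing: divide the instance price by eight to get a comparable per-GPU figure.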
First we need to understand what we’re talking about.
Let’s jump into the analysis!
Paperspace is a Series B cloud infrastructure company based in New York City focused on accelerated computing applications.
With more than 500,000 users, Paperspace operates data centers across the US and Europe.
Paperspace is known for having a wide selection of high-performance GPU machines, especially for machine learning and deep learning applications.
The company has two key products: Core, which provides GPU-backed VMs in Windows and Linux, and Gradient, which provides Notebooks, Workflows, and Deployments for machine learning users.
Benefits of Paperspace include a wide selection of machines, very fast start-up time, and free GPU options via Gradient. Drawbacks include limitations on types of free instances.
Linode, acquired by Akamai in 2022 for $900M, was one of the pioneers of the cloud computing industry. When Linode launched in 2003, cloud computing and cloud infrastructure were just coming into focus as concepts. Linode was one of the first companies on the internet to make it really easy to spin up a server in the cloud for hosting applications.
At the time of writing, Linode has only a single GPU type. Although a single GPU instance type places Linode ahead of scores of general cloud hosting companies, the lack of instance types is a negative among GPU cloud providers as it leaves little room for growth and experimentation.
That said, compared to the A100 offered by single-GPU-vendor Vultr and the V100 offered by single-GPU-vendor OVH, the RTX 6000 offered by Linode is an excellent value play as it is far less expensive with substantial GPU memory.
It would be a welcome sight for Linode to continue adding more GPU types.
Amazon Elastic Compute Cloud or EC2 is one of the oldest products in the AWS portfolio. At the time of the public beta release in 2006 there were no GPUs in the lineup but over the years EC2 has adopted more GPU support in more regions, especially with the rise of AWS SageMaker.
Critics have pointed out that EC2 has very few GPU options available given the market dominance that AWS enjoys in cloud computing generally. But what AWS lacks in options and configuration speed, they make up for in pricing power and volume discount. EC2 really shines when it comes to top-end 8-way clusters operating under reserved contracts of 1-3 years.
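To see why reserved contracts matter at this scale, it helps to run the numbers for a long-running workload. A rough sketch – the hourly rate and discount percentage here are assumptions for illustration, since actual AWS reserved and savings-plan discounts vary by instance type and term:

```python
# Rough comparison of on-demand vs. reserved pricing for a GPU workload
# running around the clock. The rate and discount are assumptions for
# illustration; actual discounts vary by instance type and commitment term.

on_demand_hourly = 32.77   # placeholder rate for an 8-way GPU instance
reserved_discount = 0.40   # assumed ~40% discount for a 1-year commitment
hours_per_year = 24 * 365

on_demand_annual = on_demand_hourly * hours_per_year
reserved_annual = on_demand_annual * (1 - reserved_discount)

print(f"On-demand: ${on_demand_annual:,.0f}/yr")
print(f"Reserved:  ${reserved_annual:,.0f}/yr")
print(f"Savings:   ${on_demand_annual - reserved_annual:,.0f}/yr")
```

Even under conservative discount assumptions, six figures of annual savings is plausible for a single always-on 8-way cluster – which is why teams with predictable, large-scale workloads tolerate EC2’s setup friction.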
In addition, AWS SageMaker provides a layer on top of EC2 for machine learning and deep learning use cases. This includes SageMaker Studio Notebooks and other tools.
That said, AWS is known neither for simplicity nor for ease of use. Like other AWS products, it can be extremely time consuming to get up and running on GPU instances via EC2. Since this is the primary tradeoff, AWS is usually best for teams bringing large-scale GPU computing projects into production.
CoreWeave, founded in 2017, is a New York City company founded by a team from the asset management and cryptocurrency space. The team began its life building sophisticated cryptomining operations and parlayed skills learned building cost-efficient infrastructure for finance into a GPU cloud computing platform.
Today CoreWeave operates around 8,000 servers across 7 data centers in the US. CoreWeave has one of the better GPU catalogs, offering a number of Ampere-series GPUs such as the A100 and RTX A4000, RTX A5000, and RTX A6000.
Google’s GCP offers six different GPU types which are available to add on to new or existing VMs. Since GCP provides GPU instances as an "add-on" to regular VMs, pricing is a little bit complicated: VM costs need to be added to GPU costs to arrive at the total. On the flip side, the ability to attach GPUs to any of GCP's VMs makes the offering appealing for those who want highly configurable instances.
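The additive pricing model is simple once written down: total hourly cost is the base VM rate plus the GPU add-on rate times the GPU count. A minimal sketch, where every rate is a placeholder and not a current GCP price:

```python
# GCP-style additive pricing: total hourly cost is the base VM cost
# plus the per-GPU add-on cost. All rates below are illustrative
# placeholders, not current GCP prices.

vm_hourly = 0.38    # assumed rate for a mid-size base VM
gpu_hourly = 2.48   # assumed rate for a single GPU add-on
gpu_count = 2

total_hourly = vm_hourly + gpu_hourly * gpu_count
print(f"Total: ${total_hourly:.2f}/hr")
```

The practical upshot is that quoted "GPU prices" on GCP understate what you will actually pay, since the VM underneath is billed separately.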
Many users find the Google GCE interface easy to work with compared to other public clouds like AWS.
In the deep learning world, Google owns and operates Kaggle and Colab, each of which provides free GPUs in the form of Jupyter notebooks. Since these free offerings are extremely popular, Google enjoys the benefits of having a large audience of developers who are already accustomed to working with specific GPUs -- notably the P4, T4, and P100.
Jarvis Labs is an India-based company founded in 2019 that makes it fast and easy to train deep learning models on GPU compute instances.
Jarvis Labs operates data centers within India and is known for making it extremely easy to get up and running quickly.
Jarvis Labs is most popular among data science students, who find the simple interface and access to GPUs helpful. Although the offering is limited when trying to scale, it is perfectly acceptable for data science learning and exploration.
Lambda Labs is a scientific computing company that has been assembling and shipping GPU desktop and server hardware solutions for over a decade.
Although Lambda Labs offers physical hardware with an exciting number of GPU cards and configurations, the Lambda Cloud, which launched in 2018, is limited to V100, A100, RTX 6000, and RTX A6000 GPU types.
Nevertheless, Lambda Cloud is offering the beginnings of an exciting lineup of GPU cards, optimizing for configurations that are well suited for fixed-budget purchasers.
Microsoft Azure has the best selection of GPU instances among the big public cloud providers. Azure outcompetes AWS and GCP when it comes to variety of GPU offerings although all three are equivalent at the top end with 8-way V100 and A100 configurations that are almost identical in price.
One unexpected place where Azure shines is pricing transparency for GPU cloud instances. Although Azure, like AWS EC2 and Google GCP, makes it difficult to compare GPU offerings by detailing each instance on its own page, the end result is that Azure pricing is relatively easy to understand.
Azure is best for production-level GPU computing in which high levels of configuration and scalability are paid for with extensive setup time. Azure has received plenty of criticism for lack of GPU availability, so as with any GPU cloud provider it's important to test the claims of what's offered against what is actually available day to day.
OVH is a French cloud computing company founded in 1999 and is Europe’s largest hosting provider. Like some other GPU providers on this list, OVH has a long history of web hosting dating back to the early 2000s and recently has been dipping its toes into the GPU world.
OVH offers V100 GPUs (both 16 GB and 32 GB flavors) which were, until the rise of the A100, the pre-eminent GPU on the market for machine learning and deep learning.
OVH has the beginnings of a solid GPU offering but will need to increase the number of instance types to compete with its hyperscale cloud computing peers.
Vultr is a Florida-based cloud infrastructure company best known for high-speed SSD hosting. In May 2022, Vultr introduced Talon, which is primarily a service to deliver fractionalized GPUs. The fractionalized GPU service splits a single GPU between multiple VMs and/or service users via NVIDIA Multi-Instance GPU (MIG) technology.
Since this guide is focused on dedicated GPU instances, our consideration of the service is based only on the bare metal GPU offerings, which are the 4-way and 8-way A100 clusters.
Vultr is offering the same best-in-class GPU for deep learning that many other cloud providers are offering. They’re promising to bring big-league computing in the cloud via approachable and sensible defaults.
"For ML applications, I’ve found @HelloPaperspace to have the best UI / UX by far"
"Have been using @HelloPaperspace Gradient Notebooks and it has been an amazing experience so far. ... A true local-like development environment feel 😄"
"I just checked out @HelloPaperspace and wow its soooo beautiful"
"I came across a very exciting feature on Paperspace: they mounted additional storage to every machine for free. That storage has public machine learning datasets. OMG, this is so cool. Great job @HelloPaperspace!!! 👏"
"Trying out @HelloPaperspace after all the problems with colab so far the transparency about what you're getting for your money (and what instances are available) is nice. But all the system information graphs are my favorite."
"Just tried Gradient from @HelloPaperspace. Man that thing is super easy to use. #MachineLearning #CloudComputing"
"First time using @HelloPaperspace. Great way to spend more time learning and practicing ML rather than debugging / setting up a Cloud instance."
"We're testing deployment to @HelloPaperspace GPU cloud. So far it works great! Next week we'll add possibility to launch http://SIML.ai instance on it through Model Engineer - one click and you'll be up-and-running!"