How To Scale Your Azure Cluster Like OpenAI Scales ChatGPT

Karol Krupa

Senior DevOps Engineer

Cloud & DevOps

Tech Leaders Corner

Summary

There has been a lot of hype around AI and specifically OpenAI's ChatGPT.
OpenAI spent $70 million on cloud in 2020, before the exclusive deal with Microsoft began.
ChatGPT was, and is, trained on roughly two-thirds of the internet, all of Wikipedia, and two large datasets of books.
Scaling is a vital component of creating IT systems, as it helps to reduce costs while still allowing people to use our product.
With proper configuration, you can be confident that your infrastructure will be durable even with higher consumption.
We will present a sample Azure Kubernetes Cluster creation in Azure's Portal.

There has been a lot of hype around AI and specifically OpenAI’s ChatGPT. This product has recently captured the collective imagination with its possibilities. Microsoft, which had already invested more than 3 billion dollars in OpenAI, has recently bought a large stake in the company.

Although the exact figure is not known, sources cite $10 billion or more. Setting the large numbers aside, this is extremely beneficial to OpenAI, since infrastructure costs are a major obstacle for AI companies. OpenAI spent $70 million on cloud in 2020, before the exclusive deal with Microsoft began.

Although there is no artificial general intelligence (AGI) on the horizon, what we already have is very useful in our daily work. Everything from searching the internet to writing a draft of this article can be made easier using tools like ChatGPT. Microsoft, whose main target is the enterprise market, would love to pass those efficiencies on to its clients through its app suite.

Teaching the machine

Before ChatGPT could be released to the wider public, the model first had to be trained on a large dataset. ChatGPT was, and is, trained on roughly two-thirds of the internet, all of Wikipedia, and two large datasets of books. The learning process involves adjusting the parameters of the model to better fit the specific task or domain that it will be used for. This is an extremely hardware-intensive process. Microsoft built a custom supercomputer for OpenAI back in 2020:

The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server.

It is hosted in Azure, so it can be easily integrated with Azure services if needed. Some say that the computer could run Cyberpunk 2077 on max settings, although we can’t be sure (jk).

Deploying the model

The trained AI model can subsequently be packaged into a container image and deployed or hosted. Although serving the model is far less computationally intensive than training it, adequate scaling is still required.

"I would have expected maybe one order of magnitude less of everything—one order of magnitude less of hype," said Sam Altman, CEO and co-founder of OpenAI.

This much traffic can utterly derail a launch if we are not equipped with adequate infrastructure.

The most straightforward choice would be to use Azure Kubernetes Service (AKS). OpenAI is undoubtedly using it in some of its workflows, and has open sourced its platform-specific autoscalers.

Assuming proper configuration, you can rely on Microsoft's high availability and disaster recovery services. Azure App Service will be more useful for more ordinary applications, where scaling to hundreds of millions of users in a matter of seconds is not required. It needs significantly less configuration, and scaling can be automated with a few mouse clicks.

Azure Kubernetes Service

Outside of a playground, infrastructure should always be created using Infrastructure as Code (IaC). Because there are multiple ways of writing IaC, we will instead walk through a sample cluster creation in the Azure Portal.
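For comparison with the Portal walkthrough, here is a minimal sketch of one possible IaC route: a small Python script that emits an ARM template for an AKS cluster. The cluster name, region, VM size, node counts and API version are all assumptions chosen for illustration, not values from this article, and the property names should be checked against the current ARM reference for managed clusters before deploying anything.

```python
import json


def aks_template(cluster_name: str, location: str, vm_size: str = "Standard_DS2_v2") -> dict:
    """Build a minimal ARM template for an AKS cluster (illustrative values only)."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.ContainerService/managedClusters",
                # Assumption: replace with an API version your subscription supports.
                "apiVersion": "2023-05-01",
                "name": cluster_name,
                "location": location,
                "identity": {"type": "SystemAssigned"},
                "properties": {
                    "dnsPrefix": f"{cluster_name}-dns",
                    "agentPoolProfiles": [
                        {
                            "name": "system",
                            "mode": "System",
                            "vmSize": vm_size,
                            "count": 3,
                            # Autoscaler bounds for the primary node pool.
                            "enableAutoScaling": True,
                            "minCount": 3,
                            "maxCount": 6,
                            # Spread nodes across three availability zones.
                            "availabilityZones": ["1", "2", "3"],
                        }
                    ],
                },
            }
        ],
    }


# Hypothetical cluster name and region, used only for this example.
template = aks_template("demo-aks", "westeurope")
print(json.dumps(template, indent=2))
```

Saved to a file, the output could then be deployed with `az deployment group create --resource-group <your-group> --template-file <file>`. Bicep, Terraform or Pulumi would work just as well; the point is that the cluster definition lives in version control.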

The cluster name is used only to identify your cluster within the resource group, so there is no need to create a globally unique name. There are four preset configurations:

Note that you are not bound to the Virtual Machine (VM) size suggested by the preset; you can change it later in the node pool configuration.

The next step is to define availability zones. This is where you configure high availability and resiliency at the node pool level. An availability zone is a logical collection of one or more datacenters, each with its own electricity, cooling, and infrastructure.

If one availability zone fails, the remaining two can handle the load. If you want to provide consistent uptime, you should spread your nodes across distinct availability zones. Availability zones are not currently supported in all locations; however, Microsoft is working to address this.
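The remaining zones can only absorb the load if the node pool was sized with that headroom in mind. A minimal sketch of the arithmetic, assuming identical nodes spread evenly across zones (both simplifications; the function name is ours, not an Azure API):

```python
import math


def nodes_needed(required_nodes: int, zones: int = 3, tolerated_zone_failures: int = 1) -> int:
    """Total nodes to provision so that `required_nodes` survive losing some zones.

    Assumes identical nodes that are spread evenly across all zones.
    """
    surviving_zones = zones - tolerated_zone_failures
    if surviving_zones <= 0:
        raise ValueError("at least one zone must survive")
    # Each surviving zone must carry its share of the required capacity.
    per_zone = math.ceil(required_nodes / surviving_zones)
    return per_zone * zones


# A workload that needs 4 nodes at peak, in a three-zone region.
print(nodes_needed(4))
```

In this example a workload that needs four nodes should run on six, two per zone, so that four are still available after one zone goes down.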

The Kubernetes version is normally left at the default, which is fine, but together with automatic upgrades it is a setting that deserves careful consideration. End of life for each version is driven by new releases: each new release ends support for the version three releases behind it. The complete timetable is available in the AKS documentation, which is updated regularly. Microsoft documentation is always full of valuable information, such as:
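The end-of-life rule described above can be sketched as a sliding window over minor versions. This is only an illustration of the rule as stated in this article; the window size and the example version numbers are assumptions, and the AKS documentation remains the authority on actual support dates.

```python
from __future__ import annotations


def supported_minors(latest_minor: int, window: int = 3) -> list[str]:
    """Return the supported 1.x minor versions, newest first."""
    return [f"1.{m}" for m in range(latest_minor, latest_minor - window, -1)]


def is_supported(version: str, latest_minor: int) -> bool:
    """Check whether a version such as '1.25.6' falls inside the support window."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}" in supported_minors(latest_minor)


# Example input only: pretend 1.27 is the newest release available.
print(supported_minors(27))
print(is_supported("1.24.9", latest_minor=27))
```

With 1.27 as the newest release, 1.24 is three releases behind and falls out of the window.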

Azure reserves the right to automatically upgrade your cluster on your behalf.

You might be surprised to return from your break to a new (updated, but perhaps broken) cluster! You can configure automatic upgrades, which is a nice option, but you must remember to review the Kubernetes change notes. Changes to the API occur from time to time, and the Kubernetes YAML files in your GitOps or pipeline configuration may stop working.
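One way to catch this before an upgrade is to scan your manifests for API versions that the target release no longer serves. The sketch below is a deliberately simple, hypothetical checker: it covers only a handful of well-known removals, uses regular expressions rather than a YAML parser, and will miss quoted or unusually formatted values. The sample manifest is made up for the example.

```python
from __future__ import annotations

import re

# Kubernetes minor release in which each (apiVersion, kind) stopped being served.
# A small, non-exhaustive sample; consult the upstream deprecation guide for the full list.
REMOVED_APIS = {
    ("extensions/v1beta1", "Ingress"): 22,
    ("networking.k8s.io/v1beta1", "Ingress"): 22,
    ("batch/v1beta1", "CronJob"): 25,
    ("policy/v1beta1", "PodSecurityPolicy"): 25,
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): 25,
}

# Only top-level (unindented) keys are matched.
API_VERSION = re.compile(r"^apiVersion:\s*(\S+)", re.MULTILINE)
KIND = re.compile(r"^kind:\s*(\S+)", re.MULTILINE)


def removed_in_target(manifest: str, target_minor: int) -> list[str]:
    """List resources in a multi-document YAML string that the target version no longer serves."""
    problems = []
    for doc in re.split(r"^---\s*$", manifest, flags=re.MULTILINE):
        api, kind = API_VERSION.search(doc), KIND.search(doc)
        if not (api and kind):
            continue
        removed = REMOVED_APIS.get((api.group(1), kind.group(1)))
        if removed is not None and target_minor >= removed:
            problems.append(f"{kind.group(1)} ({api.group(1)}) removed in 1.{removed}")
    return problems


SAMPLE = """\
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: nightly-report
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
"""

print(removed_in_target(SAMPLE, target_minor=25))
```

Running a check like this in the pipeline, against the version the cluster is about to move to, turns a surprise outage into a failed build.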

The primary node pool is the last item on the first page of the creation wizard. You can change the node size (which VMs are used) as well as the autoscaler settings. Keep in mind that the cluster autoscaler does not consider your current CPU or RAM consumption. It looks for pending pods and scales to satisfy their resource requests. This is why it is critical to set resource requests and limits on all of your deployments.
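To make the "pending pods, not utilization" point concrete, here is a toy model of that decision. It is not the real cluster autoscaler algorithm: it packs the resource requests of pending pods onto empty nodes with a first-fit-decreasing heuristic and ignores system-reserved capacity, affinity rules and everything else the real scheduler considers. All names and sizes are assumptions for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def nodes_to_add(pending: list[Resources], node: Resources) -> int:
    """Estimate how many new nodes are needed to place the pending pods' requests."""
    free: list[list[int]] = []  # remaining [cpu, memory] on each newly added node
    ordered = sorted(pending, key=lambda p: (p.cpu_millicores, p.memory_mib), reverse=True)
    for pod in ordered:
        if pod.cpu_millicores > node.cpu_millicores or pod.memory_mib > node.memory_mib:
            raise ValueError(f"{pod} can never fit on this node size")
        for slot in free:
            if slot[0] >= pod.cpu_millicores and slot[1] >= pod.memory_mib:
                slot[0] -= pod.cpu_millicores
                slot[1] -= pod.memory_mib
                break
        else:
            # No new node has room left, so another one is required.
            free.append([node.cpu_millicores - pod.cpu_millicores,
                         node.memory_mib - pod.memory_mib])
    return len(free)


# Five pending pods requesting 500m CPU / 512 MiB each, on 2 vCPU / 4 GiB nodes.
node_size = Resources(cpu_millicores=2000, memory_mib=4096)
pending_pods = [Resources(cpu_millicores=500, memory_mib=512)] * 5
print(nodes_to_add(pending_pods, node_size))
```

Actual usage never enters the calculation: only the declared requests do. If the requests are missing or unrealistic, the scaling decisions will be unrealistic too.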

You can create more node pools on the second page. Spot instances can be evicted at any time, but if your components are entirely stateless and your workload is highly bursty, they are much cheaper and a clear win from a FinOps perspective!
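As a rough sketch of what a stateless workload aimed at a spot node pool might look like, the script below builds a Deployment manifest with explicit requests and limits plus a toleration for the taint that AKS places on spot node pools. The deployment name, image and resource figures are hypothetical, and the taint key should be verified against the current AKS documentation.

```python
import json


def spot_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Build a Deployment manifest that is allowed to run on spot nodes."""
    container = {
        "name": name,
        "image": image,
        "resources": {
            # Requests drive scheduling and autoscaling; limits cap consumption.
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"cpu": "500m", "memory": "512Mi"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [container],
                    "tolerations": [
                        {
                            # Taint applied to AKS spot node pools (verify in the AKS docs).
                            "key": "kubernetes.azure.com/scalesetpriority",
                            "operator": "Equal",
                            "value": "spot",
                            "effect": "NoSchedule",
                        }
                    ],
                },
            },
        },
    }


# Hypothetical workload name and image.
manifest = spot_deployment("worker", "example.azurecr.io/worker:1.0")
print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed output can be applied directly or committed alongside the rest of your configuration.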

On the next screen, you can choose whether to authenticate to the cluster with local accounts or Azure Active Directory (AAD), and whether to authorize with Kubernetes RBAC or Azure RBAC. We'd say that authenticating with AAD is convenient because it lets you manage all your access in one place. We believe that using Kubernetes RBAC within the cluster is a wise decision because it makes a later migration simpler. You should be keeping all config in code regardless.

You can choose to integrate your Kubernetes network with existing network infrastructure (Azure CNI) or use the standard kubenet. The debate between the two is entertaining but off topic here, so suffice it to say that if you need direct access to your pods from the VNet, use Azure CNI; otherwise, use regular kubenet.

Integrations, the final page we'll discuss here, lets us connect Azure Container Registry, which is similar to Docker Hub. It's quite handy, and it's great that we have that integration right out of the box. Only use it if you need to store your own images.

Another quality-of-life option is monitoring, where Azure Log Analytics can serve as the primary monitoring tool. Consolidated log access is essential. Log Analytics integrates with Application Insights, so you can have all your infrastructure and application logs in one place, although we imagine most low-budget setups will go for a self-hosted Grafana instance.

The End

Scaling is a vital component of creating IT systems, as it helps to reduce costs while still allowing people to use our product. We can never perfectly forecast how our new product will be consumed, nor do we need to.

Proper scalability enables us to provide services to multi-million-person crowds, and once the euphoria surrounding a viral breakthrough fades, so will our infrastructure expenditures. With proper configuration, you can be confident that your infrastructure will be durable even with higher consumption.
