The `kubectl scale` command is used to immediately scale your application by adjusting the number of running pods. It’s the quickest and easiest way to adjust a deployment’s replica count, and it can be used to react to spikes in demand or prolonged quiet periods.
In this article, you’ll see how to use `kubectl scale` to scale a simple deployment. You’ll also learn about the options you can use when you need a more sophisticated change. Finally, you’ll look at the best practices for running `kubectl scale`, as well as some alternative methods for adjusting Kubernetes replica counts.
Kubectl scale use cases
The `kubectl scale` command is used to change the number of running replicas inside Kubernetes deployment, replica set, replication controller, and stateful set objects. When you increase the replica count, Kubernetes will start new pods to scale up your service. Lowering the replica count will cause Kubernetes to gracefully terminate some pods, freeing up cluster resources.
You can run `kubectl scale` to manually adjust your application’s replica count in response to changing service capacity requirements. Increased traffic loads can be handled by raising the replica count, providing more application instances to serve user traffic. When the surge subsides, the number of replicas can be reduced again, helping keep your costs low by not consuming unneeded resources.
Using Kubectl
The most basic usage of `kubectl scale` looks like this:
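```bash
kubectl scale --replicas=3 deployment/demo-deployment
```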
Executing this command will adjust the deployment called `demo-deployment` so it has three running replicas. You can target a different kind of resource by substituting its name instead of `deployment`:
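For example, you can scale a stateful set instead (the `demo-statefulset` name is purely illustrative):

```bash
kubectl scale --replicas=3 statefulset/demo-statefulset
```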
Basic scaling
Now we’ll look at a complete example of using `kubectl scale` to scale a deployment. Here’s a YAML file defining a simple deployment:
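A minimal manifest like the following works for this walkthrough; the `nginx:latest` image and `app: demo` labels are illustrative choices:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: nginx
          image: nginx:latest
```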
Save this YAML to `demo-deployment.yaml` in your working directory. Next, use kubectl to add the deployment to your cluster:
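```bash
kubectl apply -f demo-deployment.yaml
```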
Now run the `get pods` command to view the pods that have been created for the deployment:
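```bash
kubectl get pods
```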
NAME | READY | STATUS | RESTARTS | AGE |
---|---|---|---|---|
demo-deployment-86897ddbb-jl6r6 | 1/1 | Running | 0 | 33s |
Only one pod is running. This is expected, as the deployment’s manifest declares one replica in its `spec.replicas` field.
A single replica isn’t sufficient for a production application. You could experience downtime if the node hosting the pod goes offline for any reason. Use `kubectl scale` to increase the replica count and provide more headroom:
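```bash
kubectl scale --replicas=5 deployment/demo-deployment
```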
Repeat the `get pods` command to confirm that the deployment has been scaled successfully:
NAME | READY | STATUS | RESTARTS | AGE |
---|---|---|---|---|
demo-deployment-86897ddbb-66lzc | 1/1 | Running | 0 | 46s |
demo-deployment-86897ddbb-66s9d | 1/1 | Running | 0 | 46s |
demo-deployment-86897ddbb-jl6r6 | 1/1 | Running | 0 | 3m33s |
demo-deployment-86897ddbb-sgcjb | 1/1 | Running | 0 | 46s |
demo-deployment-86897ddbb-tgvnw | 1/1 | Running | 0 | 46s |
There are now five pods running for the `demo-deployment` deployment. You can see from the `AGE` column that the `scale` command retained the original pod and added four new ones.
After further consideration, you might decide five replicas are unnecessary for this application. It’s only running a static NGINX web server, so resource consumption per user request should be low. Use the `scale` command again to lower the replica count and avoid wasting cluster capacity:
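```bash
kubectl scale --replicas=3 deployment/demo-deployment
```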
Repeat the `get pods` command:
NAME | READY | STATUS | RESTARTS | AGE |
---|---|---|---|---|
demo-deployment-86897ddbb-66lzc | 1/1 | Terminating | 0 | 3m21s |
demo-deployment-86897ddbb-66s9d | 1/1 | Terminating | 0 | 3m21s |
demo-deployment-86897ddbb-jl6r6 | 1/1 | Running | 0 | 6m8s |
demo-deployment-86897ddbb-sgcjb | 1/1 | Running | 0 | 3m21s |
demo-deployment-86897ddbb-tgvnw | 1/1 | Running | 0 | 3m21s |
Kubernetes has marked two of the running pods for termination, reducing the running replica count to the requested three pods. The pods selected for eviction are sent a SIGTERM signal and allowed to gracefully terminate. They’ll be removed from the pod list once they’ve stopped.
Conditional scaling
Sometimes you might want to scale a resource, but only if there’s a specific number of replicas already running. This avoids unintentional overwrites of previous scaling changes, such as those made by other users in your cluster.
Include the `--current-replicas` flag in the command to use this behavior:
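```bash
kubectl scale --current-replicas=3 --replicas=5 deployment/demo-deployment
```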
This example scales the `demo-deployment` deployment to five replicas, but only if there are currently three replicas running. The `--current-replicas` value is always matched exactly; you can’t express the condition as “less than” or “greater than” a particular count.
Scaling multiple resources
The `kubectl scale` command can scale several resources at once when you supply more than one name as arguments. Each of the resources will be scaled to the same replica count set by the `--replicas` flag:
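```bash
kubectl scale --replicas=5 deployment/app deployment/database
```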
This command scales the `app` and `database` deployments to five replicas each.
You can scale every resource of a particular type by supplying the `--all` flag, such as this example, which scales all the deployments in your `default` namespace:
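```bash
# the target of three replicas is an arbitrary example
kubectl scale --replicas=3 deployment --all
```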
This selects every matching resource inside the currently active namespace. The objects that were scaled are shown in the command’s output.
You can obtain granular control over the objects that are scaled with the `--selector` flag. This lets you use standard selection syntax to filter objects based on their labels. Here’s an example that scales all the deployments with an `app-name=demo-app` label:
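```bash
# the target of three replicas is an arbitrary example
kubectl scale --replicas=3 deployment --selector=app-name=demo-app
```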
Changing the timeout
The `--timeout` flag sets the time kubectl will wait before it gives up on a scale operation. By default, there’s no waiting period. The flag accepts time values in a human-readable format, such as `5m` or `1h`:
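```bash
# wait up to one minute (an illustrative value) for the operation to complete
kubectl scale --replicas=5 --timeout=1m deployment/demo-deployment
```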
This lets you avoid lengthy terminal hangs if a scaling change can’t be immediately fulfilled. Although `kubectl scale` is an imperative command, scaling changes can sometimes take several minutes to complete while new pods are scheduled to nodes.
Best practices
Using `kubectl scale` is generally the fastest and most reliable way to scale your workloads. However, there are some best practices to remember for safe operations. Here are a few tips.
- **Avoid scaling too often.** Changes to replica counts should be made in response to specific events, such as congestion that’s causing requests to run slowly or be dropped. It’s best to analyze your current service capacity, estimate the capacity needed to satisfactorily handle all the traffic, then add an extra buffer on top to anticipate any future growth. Avoid scaling your application too often, as each operation can cause delays while pods are scheduled and terminated.
- **Scaling down to zero will stop your application.** You can run `kubectl scale --replicas=0`, which will remove all the pods across the selected objects. You can scale back up again by repeating the command with a positive value.
- **Make sure you’ve selected the correct objects.** There’s no confirmation prompt, so pay attention to the objects you’re selecting. Manually selecting objects by name is the safest approach, and prevents you from accidentally scaling other parts of your application, which could cause an outage or waste resources.
- **Use `--current-replicas` to avoid accidents.** Using the `--current-replicas` flag increases safety by ensuring the scale only changes if the current count matches your expectation. Otherwise, you might unintentionally overwrite scaling changes applied by another user or by the Kubernetes autoscaler.
Alternatives to kubectl scale
Running `kubectl scale` is an imperative operation that has a direct effect on your cluster: you’re instructing Kubernetes to supply a specific number of replicas as soon as possible. This is logical if you created the object with the imperative `kubectl create` command, but it’s inappropriate if you originally ran `kubectl apply` with a declarative YAML file, as shown above. After you run the `scale` command, the number of replicas in your cluster will differ from the count defined in your YAML’s `spec.replicas` field. It’s better practice to modify the YAML file instead, then re-apply it to your cluster.
First, change the `spec.replicas` field to your new desired replica count:
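Only the `replicas` value in `demo-deployment.yaml` needs to change; three replicas is shown here as an example, with the rest of the manifest left as-is:

```yaml
spec:
  replicas: 3
```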
Now repeat the `kubectl apply` command with the modified file:
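```bash
kubectl apply -f demo-deployment.yaml
```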
Kubectl will automatically diff the changes and take action to evolve the state of your cluster towards what’s declared in the file. Pods will be created or terminated automatically so the number of running instances matches the `spec.replicas` field again.
Another alternative to `kubectl scale` is Kubernetes’ support for autoscaling. Configuring this mechanism allows Kubernetes to automatically adjust replica counts between a configured minimum and maximum, based on metrics such as CPU usage and network activity.
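As a quick, illustrative sketch, the `kubectl autoscale` command creates a horizontal pod autoscaler imperatively; the bounds and CPU target shown here are arbitrary examples:

```bash
kubectl autoscale deployment/demo-deployment --min=2 --max=10 --cpu-percent=80
```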
Final thoughts
The `kubectl scale` command is an imperative mechanism for scaling your Kubernetes deployments, replica sets, replication controllers, and stateful sets. It targets one or more objects on each invocation and scales them so that a specified number of pods are running. You can optionally set a condition, so the scale is only changed when there’s a specific number of existing replicas, avoiding unintentional resizes in the wrong direction.
You can track the number of replicas in your cluster by using a dedicated Kubernetes monitoring platform. However, if you’re looking to move away from managing your own infrastructure and would prefer a serverless solution that handles scaling for you, consider Airplane.
With Airplane, you get a robust, performant, maintenance-free platform that can run all the sorts of tools you might run with Kubernetes. Shell scripts, Python functions, JavaScript files, REST API calls, and SQL queries can all be turned into Tasks on Airplane, allowing you to build tooling for your team quickly and easily. If you need a frontend, Airplane comes with a UI framework called Views that helps even non-technical stakeholders understand and use your team’s internal tools.
To start building internal tools that work for your whole team without the complexities of managing your own infrastructure, sign up for a free Airplane account or book a demo.