A complete guide to Kubernetes jobs

Jul 12, 2021
6 min read

Countless organizations have turned to Kubernetes as a way to manage their microservices. Microservice applications are undeniably intricate, featuring many more moving parts than their monolithic predecessors, so Kubernetes’ structure is a crucial element in managing them. Clusters encompass nodes (worker machines); nodes contain pods (the smallest deployable objects, each representing a single instance of a process); and pods host containers (packaged application software). That hierarchy can be confusing to Kubernetes novices, and even seasoned professionals must wrangle these resources into good working order.

Naturally, DevOps teams can’t just implement and forget a Kubernetes backend. With companies reportedly running 250+ Kubernetes containers on average, there are a multitude of processes happening in production at any given time, and keeping tabs on them helps with gauging overall cluster performance. That’s where jobs come into play: when pods carry out processes, a job’s role is to supervise those processes from start to finish.

Additionally, a job creates pods and manages them through to termination; that is, a job governs process execution until “a specified number of successful completions is reached.” Deleting a job removes its associated pods, and suspending a job deletes its active pods until the job is resumed.

In this article, we’ll cover some job use cases and best practices, and show you how to create jobs of your own.

Job use cases

At its simplest, one job runs one pod. You can also increase a job’s scope to encompass a greater number of pods or active processes. Should a pod fail or be deleted, the job will replace that troublesome pod according to its instructions. Jobs are objects within the Kubernetes universe; per the Kubernetes documentation, objects are persistent entities that represent the state of your cluster. They describe which containerized apps are running, the policies governing their behavior, and the resources available to them at a given time. That gives jobs immense introspective value on the infrastructure side. Because a job has a status, you can continually assess its current state and that of its pods, whether Pending, ContainerCreating, Running, or Completed. You can also view pod attributes such as READY, RESTARTS, and AGE.
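For instance, a couple of standard kubectl listing commands expose that state:

```bash
# List jobs along with their completion counts and age
kubectl get jobs

# List the pods those jobs created, with READY, STATUS, RESTARTS, and AGE columns
kubectl get pods
```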

You can run many jobs in Kubernetes. Here’s a shortened list of what they can oversee:

  • Countdowns
  • Computations
  • Prints
  • Message processing
  • Work queues
  • Node behaviors
  • Resource consumption

You can kick off a job by using a kubectl command. Your job’s definition resides within a YAML or JSON file; pass that file’s path or URL to the command, as follows (originating from an official Kubernetes source, in this case):

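Here is a representative form of that command, pointing at the sample job manifest hosted alongside the official Kubernetes documentation:

```bash
# Create the job from a manifest hosted in the official Kubernetes examples
kubectl apply -f https://kubernetes.io/examples/controllers/job.yaml
```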

It’s then helpful to check your job’s status via the same command interface. kubectl describe prints a human-readable summary (you can also request raw YAML with the -o yaml flag on kubectl get), though some output is more machine-oriented; a pod’s logs, for example, may be nothing but long sequences of digits, depending on what the container prints. Say we’re checking in on a job named “inspect.” This might be our output after running a command like kubectl describe jobs/inspect.

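The exact fields vary by cluster and Kubernetes version, but a trimmed describe output for this hypothetical inspect job might look roughly like the following (names, times, and the pod suffix are illustrative):

```
Name:           inspect
Namespace:      default
Labels:         job-name=inspect
Parallelism:    1
Completions:    1
Start Time:     Mon, 12 Jul 2021 10:15:21 -0400
Pods Statuses:  0 Running / 1 Succeeded / 0 Failed
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  2m    job-controller  Created pod: inspect-xk2vd
  Normal  Completed         90s   job-controller  Job completed
```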

Understanding the definition file

You cannot expect a job to execute properly without some predefined parameters. Accordingly, every job originates from a definition file (or an inline resource definition) that configures the job object and determines how it will behave in support of your overall objective. It may be structured like the following, for a job named random-job:

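A minimal sketch of such a file, assuming a throwaway busybox container that simply prints a random number (the image and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: random-job
spec:
  # Give up and mark the job as failed after 4 retries (the default is 6)
  backoffLimit: 4
  template:
    spec:
      containers:
        - name: random-job
          image: busybox:1.36
          # Print a pseudo-random number, then exit successfully
          command: ["sh", "-c", "awk 'BEGIN { srand(); print int(rand() * 100000) }'"]
      # For jobs, restartPolicy must be either Never or OnFailure
      restartPolicy: Never
```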

What if you want to apply that definition to a certain Kubernetes cluster? Use the following command:

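Assuming the definition above was saved locally as random-job.yaml (a hypothetical filename), that looks like:

```bash
# Apply the job definition to the cluster targeted by your current kubectl context
kubectl apply -f random-job.yaml
```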

By checking the status of your new job, you can track the job’s life cycle as its pod moves from ContainerCreating to Completed. Pay careful attention to restartPolicy: for jobs, the only valid values are Never and OnFailure, as there’s no sense in always restarting a pod following successful completion.
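One way to follow that life cycle, again using the hypothetical random-job:

```bash
# Show the job's completion status
kubectl get job random-job

# Watch the pods the job created move from ContainerCreating to Completed
kubectl get pods --selector=job-name=random-job --watch
```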

Job types

There are three principal job types in Kubernetes, defined according to how one or more processes are handled:

  1. Multiple Parallel Jobs: Often called a work queue, this pattern has a single job run multiple pods concurrently, typically coordinating through an external queue service. In many instances, it’s not practical to wait for one piece of work to finish before starting another, and parallel processing is highly efficient when computing resources can support it.
  2. Parallel Jobs with Fixed Completion Count: These jobs also run pods concurrently, but terminate successfully once a set number of pods complete. By setting .spec.completions to a value greater than one, you tell the job how many successful pods it needs (see the example manifest at the end of this section). You may also run these jobs in indexed mode, meaning each pod is assigned an index, and with it a portion of the overall task to complete.
  3. Non-parallel Jobs: This describes a job that runs a single pod on its own. Only one pod is started at a time, with a replacement pod created only if the first fails. Once a pod terminates successfully, that job is complete.

Non-parallel jobs automatically default to one completion when .spec.completions isn’t defined. Work-queue jobs must leave .spec.completions unset, and neither field accepts negative integers (a job cannot run fewer than zero times).

As a final note, you may control the degree of parallelism for parallel job types. Setting the .spec.parallelism attribute to 0 pauses the job until that number is increased; leaving the field unset means Kubernetes defaults to 1.
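As a rough sketch, the manifest below (a hypothetical variant of random-job) asks for five successful completions with at most two pods running at once:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: random-job-parallel
spec:
  # The job succeeds once 5 pods have terminated successfully
  completions: 5
  # Run at most 2 pods at the same time; setting this to 0 would pause the job
  parallelism: 2
  template:
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "echo processing one unit of work"]
      restartPolicy: OnFailure
```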

Handling failures

Not every job runs smoothly, but thankfully there are ways to counteract failures within Kubernetes. We touched briefly on restartPolicy earlier; we’ll add more nuance here. Pod failures and container failures can cause problems across your ecosystem, and when a container fails, your application may suffer. By stipulating .spec.template.spec.restartPolicy = “OnFailure”, pods remain on their nodes while the failed container is restarted in place. Conversely, a container failure when restartPolicy = “Never” causes the whole pod to fail, and the job starts a new pod to replace it. Applications must therefore be smart enough to cope with being relaunched in a new pod, handling temporary files, locks, and incomplete output gracefully.

In parallel workloads, your pods should also handle concurrency well. Note, too, that the same program can occasionally be started twice, even when parallelism and completions are both set to 1 and restartPolicy = “Never”.

What if a job continues to fail time after time? When this happens, a programming or configuration issue is usually at fault, and there’s no benefit in subjecting your pods to endless retries. The .spec.backoffLimit attribute limits those retries to a specific number; should a job’s pods fail six times (the default), the job is automatically considered failed. After each pod failure, the job controller waits before creating a replacement, backing off exponentially (10s, 20s, 40s, and so on) up to a cap of six minutes.
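In manifest form, the two failure-handling settings discussed above live at different levels of the spec. A hedged fragment showing where each one goes (the container itself is a placeholder):

```yaml
spec:
  # Mark the whole job as failed after 4 failed retries (the default is 6)
  backoffLimit: 4
  template:
    spec:
      # Restart failed containers in place on the same node
      restartPolicy: OnFailure
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "exit 0"]
```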

Quick notes on CronJobs

The CronJob is quite useful when it comes to running automated, recurring jobs. Promoted to a stable API in Kubernetes 1.21, it lets you execute jobs on a schedule. You define when, and how frequently, a job runs using standard cron syntax (minute, hour, day of month, month, day of week), with one-minute granularity at the finest.

Accordingly, cron jobs are to Kubernetes what orchestration tools are to IT. If there’s a task you run habitually, automate it. These types of jobs can otherwise be tedious or easy to forget. A cron job ensures this doesn’t happen while removing job orchestration from your plate. Say you want to create periodic backups, generate timely emails, or even schedule jobs around user-activity periods. Cron jobs help make that possible via configuration files and available kubectl commands.
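For instance, a CronJob that runs a hypothetical backup task every night at 2:00 AM might be declared like this (the name, image, and command are illustrative placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  # Standard cron syntax: minute hour day-of-month month day-of-week
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: busybox:1.36
              command: ["sh", "-c", "echo running nightly backup"]
          restartPolicy: OnFailure
```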

You can manually check the status of a cron job following its creation, and delete any cron jobs that are no longer useful to you.
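Using the hypothetical nightly-backup CronJob above, that might look like:

```bash
# Check the cron job and the jobs it has spawned
kubectl get cronjob nightly-backup
kubectl get jobs --watch

# Remove the cron job once it is no longer needed
kubectl delete cronjob nightly-backup
```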

Conclusion

There’s little denying the integral role that jobs have in the Kubernetes world. By gaining an intricate understanding of these objects, it’s possible to unlock greater visibility and control over your ecosystem. As always, following best practices and adhering to the Kubernetes documentation is a surefire way to succeed with job utilization. Additionally, many third-party tutorials exist which can help flatten that learning curve.

If you're looking for an alternative job scheduling platform that is maintenance-free and serverless, then consider Airplane. Airplane is the developer platform for building custom internal workflows and UIs that offers a built-in, easy-to-use job scheduling feature. Airplane also offers strong defaults, such as audit logs, permissions setting, and notifications, making it secure and simple to automate your most critical tasks.

Creating a schedule takes minutes and you can get started for free. Book a demo if you'd like to learn more and schedule your first task within minutes.

Tyler Charboneau
Tyler is a hardware-software devotee and researcher. He specializes in simplifying the complex while speaking effectively to all audiences.
