Kubernetes is a useful tool for deploying and managing containerized applications. Applications deployed on Kubernetes are often stateless, meaning that they do not rely on any previously saved data in the cluster to function properly. However, there are scenarios where you need to deploy applications that collect and store data that must always be available, even after the pod is terminated. This is where Kubernetes persistent volumes come in.
Kubernetes persistent volumes provide persistent storage for your containerized applications: even after a restart, the application pod still has access to the previously stored data. Their most important function is providing storage that outlives any individual pod. Use cases include storage for database applications, long-term storage of application logs, and storage for user-generated files in content management applications.
In this article, we'll explore Kubernetes persistent volumes, including some relevant use cases and how they can be effectively set up and used.
What are persistent volumes?
A persistent volume (PV) is a piece of storage in a Kubernetes cluster, provisioned either by an administrator or dynamically through a StorageClass. When a developer needs persistent storage for an application in the cluster, they request it by creating a persistent volume claim (PVC) and then mounting the claimed volume to a path in the pod. An administrator can create multiple PVs with different capacities and configurations. It is up to the developer to create a PVC, and Kubernetes then binds the PVC to a PV that matches its requirements (such as size, access mode, and so on). If no existing PV matches and the PVC refers to a StorageClass that has a provisioner, that StorageClass dynamically creates a PV and binds it to the PVC. We will go into more detail about StorageClass later in this article.
It is important to note that PVs are cluster-scoped resources and do not belong to a namespace, so a PVC in any namespace can be bound to a PV. PVCs, by contrast, are namespaced: a pod can only use a PVC that lives in its own namespace.
Persistent volume use cases
The common use cases for persistent volumes are providing storage for database applications and storage beyond the regular lifecycle of a pod.
Providing storage for database applications
Database applications require persistent storage that outlasts the lifecycle of any single pod. A database can collect and store millions of records from different sources while running, and it becomes a huge problem if that data no longer exists after you restart or stop the database. Therefore, to host a database application in a Kubernetes cluster, you need to configure its pod to use a PV so that the data is still available even when the pod is no longer active.
Storage beyond regular pod lifecycle
Apart from database applications, several other types of applications require long-term storage. For example, applications that store error logs for future analysis require the logs to be available for an extended period, even if you terminate or replace the pod.
Understanding persistent volumes (PV)
Below is an example of a PersistentVolume YAML file used for creating persistent volume storage:
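A manifest matching the description in this section could look like the following minimal sketch. The 10Gi capacity and the hostPath type come from the surrounding text; the name demo-pv, the manual storage class name, the access mode, the reclaim policy, and the /mnt/data path are illustrative assumptions.

```yaml
# Sketch of a statically provisioned PersistentVolume.
# Names, path, and class are hypothetical; adjust them for your cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv                  # hypothetical name
spec:
  capacity:
    storage: 10Gi                # size of the volume (10 gibibytes)
  volumeMode: Filesystem         # mount as a directory rather than a raw block device
  accessModes:
    - ReadWriteOnce              # read-write from a single node
  persistentVolumeReclaimPolicy: Retain   # keep the volume after its claim is deleted
  storageClassName: manual       # hypothetical class; claims must request the same class
  hostPath:
    path: /mnt/data              # hypothetical directory on the node
```

Saving this to a file and running kubectl apply -f on that file creates the PV.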
You can only create a PV resource declaratively, from a manifest file. kubectl doesn’t provide an imperative command for creating a PV.
Now, let’s look at the specifications of this YAML file.
Capacity
When creating a PV, you indicate its storage size in the capacity attribute. In the example above, you are creating a PV of 10 gibibytes (10Gi).
Access modes
There are currently four access modes for PVs in Kubernetes:
ReadWriteOnce: This allows only a single node to access the volume in read-write mode. All pods on that node can still read and write to the volume.
ReadWriteMany: Multiple nodes can read and write to the volume.
ReadOnlyMany: The volume is in read-only mode and accessible by multiple nodes.
ReadWriteOncePod: Only a single pod can access the volume in read-write mode.
However, not all storage providers support all four access modes, so the available modes will vary. The Kubernetes documentation on persistent volumes lists the access modes that each storage provider supports.
Types of PV
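Next, specify the PV type you want to use. Several storage types are provided as Kubernetes plug-ins, and the Kubernetes documentation lists them. The YAML file above uses hostPath, with a path on the node where the volume should read and write data. Because hostPath stores data on a single node's filesystem, it is suitable only for single-node test clusters.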
StorageClassName
The storageClassName is the name of the StorageClass that the PV belongs to. A PV of a given class can only be bound to PVCs that request that class. When a developer needs storage, they request it by creating a PVC.
What are StorageClasses?
A StorageClass defines a class of storage offered in the Kubernetes cluster. It abstracts the underlying storage provider.
For instance, suppose you have 100Gi of Amazon Elastic Block Store (EBS) storage that you want to make available in the cluster. Think of the StorageClass as the link that makes that storage available in your cluster. However, you wouldn’t want applications in your cluster to gain access to all of the storage. Therefore, you create PVs to designate pieces of storage for the applications that need it.
StorageClass is also useful for creating PVs dynamically. Suppose a pod requires 10Mi of storage, and you create a PVC for it that you associate with a StorageClass. If Kubernetes cannot find a PV to match your PVC, the StorageClass automatically creates a PV for the PVC.
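The dynamic provisioning scenario can be sketched as a claim like the one below. It assumes a StorageClass named standard exists in the cluster and has a provisioner; both the class name and the claim name are hypothetical.

```yaml
# Sketch of a PVC that relies on dynamic provisioning.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: small-claim              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard     # assumed StorageClass with a provisioner
  resources:
    requests:
      storage: 10Mi              # the 10Mi from the scenario
```

If no existing PV of that class satisfies the request, the class's provisioner creates one. The provisioned volume may be larger than requested, since some storage providers enforce a minimum volume size.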
Most cloud providers’ managed Kubernetes services supply a default storage class when you set up a Kubernetes cluster. To check the default and available storage classes in your cluster, run kubectl get storageclass. The default class is marked (default) in the output.
The following is an example of a StorageClass manifest file used in creating a StorageClass:
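A manifest containing the keys discussed in this section might look like the sketch below. It assumes the AWS EBS CSI driver is installed in the cluster, which registers the ebs.csi.aws.com provisioner; the class name fast-ebs and the chosen values are illustrative.

```yaml
# Sketch of a StorageClass; values are assumptions, not recommendations.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs                 # hypothetical name
provisioner: ebs.csi.aws.com     # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                      # provisioner-specific: EBS volume type
reclaimPolicy: Delete            # what happens to a dynamic PV when its PVC is deleted
allowVolumeExpansion: true       # allow PVCs of this class to be resized upward
mountOptions:
  - noatime                      # passed to mount; Kubernetes does not validate it
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until a pod uses the PVC
```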
We can better understand the above manifest file by explaining some of its keys.
Provisioner
The provisioner determines which volume plug-in the StorageClass uses to provision PVs. Plug-ins are available for different storage providers, such as AWS EBS and GCE PD.
Parameters
The parameters key holds provisioner-specific settings. Which parameters are accepted depends on the provisioner.
Reclaim policy
The reclaimPolicy value can be either Retain or Delete. If it is not specified, it defaults to Delete. The StorageClass uses it when creating a dynamic PV. The Retain reclaim policy allows for manual reclamation of the PV resource after the PVC has been deleted. However, the PV cannot be claimed again until the administrator cleans up the data in it and makes it available. The Delete reclaim policy removes the PV and the associated external storage asset once its PVC is deleted.
Allow volume expansion
When you set the allowVolumeExpansion value to true, the PVs created from the StorageClass can be expanded. To expand a PV, edit the configuration of the PVC to request the new capacity you need.
It is important to note that allowVolumeExpansion is only used for volume expansion and not for shrinking.
Mount options
PVs that are dynamically created by the StorageClass will have the mount options specified in the mountOptions key.
Volume binding mode
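The volumeBindingMode key controls when volume binding and dynamic provisioning of PVs should occur. With Immediate, the default, both happen as soon as the PVC is created. With WaitForFirstConsumer, they are delayed until a pod that uses the PVC is created. Binding is discussed further in the Lifecycle of persistent volume and persistent volume claim section of this article, and further information can be found in the Kubernetes documentation on storage classes.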
Persistent volume claims
A persistent volume claim (PVC) is a request for storage made by a developer who uses the cluster.
Here is an example of a persistent volume claim manifest file:
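A claim of this kind might look like the following sketch. The name demo-pvc, the 5Gi request, and the manual storage class name are assumptions chosen for illustration.

```yaml
# Sketch of a PersistentVolumeClaim; names and sizes are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc                 # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce              # must be supported by the PV that gets bound
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi               # minimum capacity the bound PV must provide
  storageClassName: manual       # hypothetical class; only PVs of this class can match
```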
AccessModes, volumeMode, and resources
accessModes, volumeMode, and resources follow the same conventions as in persistent volumes.
StorageClassName
You use this key to specify the StorageClass you want to use for storage. When you apply the manifest file, a PV that satisfies the requested storage and uses the same storageClassName will be bound to the claim. If no matching PV can be found, the StorageClass named in storageClassName will dynamically provision one.
Lifecycle of persistent volume and persistent volume claim
Now that you have a deeper understanding of persistent volumes and persistent volume claims, we can look at the interaction between PVs and PVCs.
Provisioning
There are two ways of provisioning PVs in a Kubernetes cluster, static and dynamic.
With static provisioning, cluster administrators create several persistent volumes ahead of time using PV manifest files, which carry the details of the storage available for consumption in the cluster.
With dynamic provisioning, a StorageClass creates a PV on demand for a PVC because none of the PVs created by the cluster administrator match the PVC's requirements.
Binding
When you create a PVC, you specify the amount of storage your application needs. A control loop in the control plane watches for newly created PVCs. Once this loop detects a new PVC, it finds a PV that matches the PVC's requests and binds the two. However, if an appropriate PV for the PVC does not exist, the StorageClass dynamically creates a PV for the PVC.
In some scenarios, a PVC can remain indefinitely unbound because a matching PV does not exist or the associated StorageClass cannot create the PV.
Using
You configure your application pod to use the PVC as a volume. Once you deploy the pod, Kubernetes looks for the PV associated with the PVC and mounts it to the pod. Once the claim is bound, the PV belongs to you as long as you need it, and no other developer in the cluster can use it.
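A pod that uses a claim could be declared as in the sketch below. It assumes a PVC named demo-pvc already exists in the pod's namespace; the pod name, image, and mount path are likewise illustrative.

```yaml
# Sketch of a pod that mounts a PVC as a volume.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                 # hypothetical name
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: app-storage      # must match a volume name under spec.volumes
          mountPath: /data       # hypothetical path inside the container
  volumes:
    - name: app-storage
      persistentVolumeClaim:
        claimName: demo-pvc      # assumed existing PVC in the same namespace
```

The pod refers only to the claim, never to the PV directly, which keeps the pod definition independent of the underlying storage.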
Storage object in use protection
The cluster uses this protection to ensure that the system does not delete PVs and PVCs currently being used by a pod. For instance, if you accidentally delete a PVC while a pod is still using it, the system won’t remove it until it’s no longer in use by the pod. Likewise, if a cluster administrator deletes a PV that is already bound to a PVC, the system won’t delete the PV until Kubernetes unbinds the PVC from the PV.
Reclaiming
Once you are done with a PV, you can release it by deleting the PVC object. The reclaim policy defined in the PV (persistentVolumeReclaimPolicy) tells the cluster what to do with the volume after Kubernetes unbinds it from a PVC. The reclaim policy can have one of the following values: Retain, Recycle, or Delete. Recycle is deprecated in favor of dynamic provisioning.
Expanding persistent volume claims
There are scenarios where your application might require a larger volume, especially when it is approaching the capacity limit. To increase the storage, edit the PVC object and specify a capacity larger than the current one. This only works if the PVC's StorageClass has allowVolumeExpansion set to true.
It is important to note that you shouldn’t directly edit the capacity of the PV, but rather that of the PVC. Furthermore, if you edit the capacity of both the PV and the PVC to have the same size, the Kubernetes control plane will assume that the backing volume size has been manually increased and that it doesn’t need to resize it.
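As a sketch, expanding a claim amounts to raising the storage request in its manifest and applying it again. The claim name, class name, and sizes below are hypothetical, and the example assumes the class has allowVolumeExpansion set to true.

```yaml
# Sketch of a PVC after its storage request has been raised.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logs-pvc                 # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ebs     # assumed class with allowVolumeExpansion: true
  resources:
    requests:
      storage: 8Gi               # raised from an earlier request of 5Gi
```

Only the request under resources changes; the PV object itself is left for Kubernetes to update.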
Bringing it together
Let’s create a demo Nginx application that brings together the concepts you have learned in this article. For this demo, you will use Minikube to create a local Kubernetes cluster on your computer.
Minikube comes with a default StorageClass after installation, so you will use that StorageClass for this demo. To check the StorageClass, run kubectl get storageclass.
Minikube uses the virtual node’s filesystem for storage. Now, you will create a dummy file on the virtual node. Open a shell on the node by running minikube ssh, then create a file that contains the text Hello from ContainIQ testing Persistent Volume in the directory that the PV will expose.
Once the file is created, you can confirm it by printing its contents with cat.
Exit the Minikube node by running the exit command.
Then create a file called nginx-pv.yaml and paste the following:
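One possible version of this file is sketched below: a PV backed by a directory on the Minikube node, a PVC that claims it, and an Nginx pod that mounts the claim as its web root. The object names, the 1Gi size, and the /mnt/data path are hypothetical, and the sketch assumes that Minikube's default StorageClass is named standard.

```yaml
# Sketch of nginx-pv.yaml; names, size, and paths are assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: standard     # assumed name of Minikube's default StorageClass
  hostPath:
    path: /mnt/data              # hypothetical directory on the Minikube node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard     # same class as the PV so the two can bind
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - containerPort: 80
      volumeMounts:
        - name: web-content
          mountPath: /usr/share/nginx/html   # directory Nginx serves by default
  volumes:
    - name: web-content
      persistentVolumeClaim:
        claimName: nginx-pvc
```

Under these assumptions, the dummy file created on the Minikube node would need to live at /mnt/data/index.html for the pod to see it at /usr/share/nginx/html/index.html.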
Then run kubectl apply -f nginx-pv.yaml to create the objects.
You can check them by running kubectl get pv,pvc,pods.
Now open a shell in the Nginx pod by running kubectl exec -it POD_NAME -- /bin/sh, replacing POD_NAME with the name of your pod.
Inside the pod, change to the directory where the volume is mounted and print the dummy file with cat.
This outputs the content (Hello from ContainIQ testing Persistent Volume) you saved on the Minikube virtual node.
Final thoughts
Kubernetes persistent volumes give applications deployed in a Kubernetes cluster storage that outlives any individual pod. This article introduced you to the concept of PVs and their relationship to PVCs and StorageClasses. Furthermore, you learned about some PV use cases and how to create and configure PVs, PVCs, and StorageClasses.