Deep dive into OSE 2.1 Kubernetes Backup & Restore
Date: 01/11/2021 Categories: VMware

We are delighted to announce that OSE 2.1 is now available, with many new features and enhancements that you can start benefiting from immediately.

The new capabilities in OSE 2.1 range from a global bucket sync policy that applies to single-site or multi-site tenants, through app and subordinate user role customization, to Kubernetes backup and restore.

Undoubtedly, the highlight of this release is Kubernetes backup and restore. This new feature complements the previously available OSE capabilities for storing unstructured data and backing up Cloud Director tenants’ vApps and Catalogs in S3 storage.

What is Kubernetes Backup and Restore?

Kubernetes backup and restore comes in handy when you need to recover a Kubernetes cluster that has experienced data loss, an infrastructure problem, or a defective service. Other use cases include restoring a Kubernetes namespace that was accidentally deleted, or replicating a whole cluster for debugging or staging.

The Kubernetes clusters that can be backed up and restored include CSE-managed Kubernetes clusters (TKG and CSE-managed native clusters) and external vanilla Kubernetes clusters. CSE-managed clusters are discovered automatically, while external Kubernetes clusters require their kubeconfig.yaml files to be uploaded in the OSE 2.1 UI. In the second case, OSE connects to the selected cluster through that YAML file and deploys the backup and restore agent if the cluster is selected for protection.

How does the Kubernetes backup work?

At the core of Kubernetes backup and restore is the Velero agent, an open-source tool that came to VMware with its 2018 acquisition of Heptio. This backup and restore agent is installed transparently in the Kubernetes cluster that has been selected for protection in the OSE 2.1 UI. The Velero backup and restore agent works in the following way:

  1. An app deployer, part of the OSE package, installs the Velero Helm chart in the Kubernetes cluster selected for protection (CSE-managed or external K8S cluster).
  2. Velero backs up the selected Kubernetes data on demand or on a specified schedule, e.g. every two weeks.
  3. Velero picks the selected Kubernetes data and stores it in a specified S3 bucket.

The following diagram illustrates the workflow.

After the backup and restore agent is installed in the cluster, you can trigger an on-demand or scheduled backup. When you select a scheduled backup, there are a few options you need to specify:

  • The S3 bucket where the backup will be stored.
  • Whether the backup will be encrypted at rest or not – Encryption at rest helps you protect your backed-up data from unauthorized access, as the backup data is stored encrypted in the selected S3 bucket.
  • The backup frequency – This is to specify when the backup will be performed, e.g., every 14 days at 14:32:58. The backup timeline appears in the information of the protected Kubernetes cluster and is a convenient way to track your past backups.
  • The retention period – The captured backup data can be kept in the specified S3 bucket for a specific number of days and then automatically deleted. This helps you optimize storage usage, especially when using a public cloud.
  • The scope of the scheduled policy – Unlike the on-demand backup, which backs up the entire Kubernetes cluster, the scheduled backup lets you select what to back up: the entire Kubernetes cluster, or only selected namespaces and labels.
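The OSE UI hides how these options reach the agent, and this post does not describe that mapping. As a rough illustration only, the options above resemble the fields of a Velero Schedule resource. The sketch below builds such a manifest as a Python dictionary. The helper name `build_schedule`, the `velero` namespace, the storage location name, and the cron expression are assumptions made for the example, not values taken from OSE. Note that cron has no seconds field and cannot express "every 14 days" exactly, so the 1st and 15th of each month stand in for a two-week cadence.

```python
import json


def build_schedule(name, storage_location, cron, retention_days,
                   namespaces=None, labels=None):
    """Hypothetical helper: return a Velero Schedule manifest as a dict."""
    template = {
        # Name of a Velero BackupStorageLocation that points at the S3 bucket.
        "storageLocation": storage_location,
        # Retention: Velero expresses it as a duration string (ttl).
        "ttl": f"{retention_days * 24}h0m0s",
        # Scope: every namespace unless a subset is given.
        "includedNamespaces": list(namespaces) if namespaces else ["*"],
    }
    if labels:
        # Scope: only objects carrying all of these labels.
        template["labelSelector"] = {"matchLabels": dict(labels)}
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumed namespace; an OSE-managed install may use a different one.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {"schedule": cron, "template": template},
    }


if __name__ == "__main__":
    manifest = build_schedule(
        name="biweekly-shop",
        storage_location="default",
        cron="32 14 1,15 * *",  # 14:32 on the 1st and 15th of each month
        retention_days=30,
        namespaces=["shop"],
        labels={"tier": "backend"},
    )
    print(json.dumps(manifest, indent=2))
```

Building the manifest as plain data keeps the sketch free of third-party dependencies; the printed JSON is the kind of document a Kubernetes client could submit to a cluster where Velero is installed.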

Here is an example of a Kubernetes backup schedule:

What is the Kubernetes data that is backed up?

The data that is backed up includes:

  • The Kubernetes state stored in etcd. To restore the cluster, etcd and all relevant assets must be backed up.
  • Application data, i.e., persistent volumes
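To make the two kinds of data more concrete, here is a minimal sketch of a whole-cluster Velero Backup manifest, which is roughly what an on-demand backup covers. Velero reads cluster state as Kubernetes API objects and handles persistent volumes separately, which is why the two appear as distinct settings. The helper name `build_backup`, the `velero` namespace, and the storage location name are assumptions for illustration; how OSE actually configures the agent is not described here.

```python
import json


def build_backup(name, storage_location, include_volumes=True):
    """Hypothetical helper: return a whole-cluster Velero Backup manifest."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumed namespace; an OSE-managed install may use a different one.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            # Name of a Velero BackupStorageLocation backed by the S3 bucket.
            "storageLocation": storage_location,
            # Cluster state: all namespaces plus cluster-scoped resources.
            "includedNamespaces": ["*"],
            "includeClusterResources": True,
            # Application data: snapshot persistent volumes as well.
            "snapshotVolumes": include_volumes,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_backup("on-demand-example", "default"), indent=2))
```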

How does the Kubernetes restore work?

The Kubernetes restore process takes a selected backup and restores it on the Kubernetes cluster whose data was backed up. The restored data replaces the lost data on that cluster. The process is fast and easy: all you need to do is pick the backup instance you wish to restore and trigger the restore from the OSE 2.1 UI. The Velero agent then restores the backup transparently on the source Kubernetes cluster.
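For readers curious about what a restore looks like at the Velero level, the sketch below builds a Velero Restore manifest that references a backup by name. It is an illustration under assumptions, not a description of what OSE sends: the helper name `build_restore`, the `velero` namespace, and the backup name are all hypothetical.

```python
import json


def build_restore(backup_name, namespaces=None, restore_volumes=True):
    """Hypothetical helper: return a Velero Restore manifest as a dict."""
    spec = {
        # The backup instance to restore from.
        "backupName": backup_name,
        # Also restore persistent volumes captured in the backup.
        "restorePVs": restore_volumes,
    }
    if namespaces:
        # Limit the restore to specific namespaces, e.g. one deleted by accident.
        spec["includedNamespaces"] = list(namespaces)
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        # Assumed namespace; an OSE-managed install may use a different one.
        "metadata": {"name": f"{backup_name}-restore", "namespace": "velero"},
        "spec": spec,
    }


if __name__ == "__main__":
    print(json.dumps(build_restore("biweekly-shop-example", ["shop"]), indent=2))
```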