Using KubeStash with Azure AD Workload Identity

This guide shows how to use KubeStash to back up and restore volumes of a Kubernetes workload running in Azure Kubernetes Service (AKS) with Azure AD Workload Identity. We are going to back up a volume of a Deployment into Azure Blob Storage, and then show how to restore the backed-up data.

Before You Begin

To keep everything isolated, we are going to use a separate namespace called demo throughout this tutorial.

$ kubectl create ns demo
namespace/demo created

Create User-assigned Managed Identity

You need to create an AAD application or a user-assigned managed identity and grant it permission to access the Azure Blob Storage. To create a user-assigned managed identity, run the following commands:

$ export RESOURCE_GROUP=<resource-group-name>
$ export USER_ASSIGNED_IDENTITY_NAME=<user-assigned-identity-name>
$ az identity create --name $USER_ASSIGNED_IDENTITY_NAME --resource-group $RESOURCE_GROUP

Next, assign the Storage Blob Data Contributor role to the user-assigned managed identity by running the following commands:

$ export STORAGE_NAME=<your-blob-storage-name>
$ export STORAGE_ID=$(az storage account show --name $STORAGE_NAME --resource-group $RESOURCE_GROUP --query id -o tsv)
$ export USER_ASSIGNED_IDENTITY_CLIENT_ID=$(az identity show --name $USER_ASSIGNED_IDENTITY_NAME --resource-group $RESOURCE_GROUP --query 'clientId' -otsv)
$ az role assignment create \
   --assignee $USER_ASSIGNED_IDENTITY_CLIENT_ID \
   --role 'Storage Blob Data Contributor' \
   --scope $STORAGE_ID
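
As a quick sanity check, the sketch below lists the role names assigned to the identity at the storage account scope. It assumes the Azure CLI is installed and logged in and that the variables exported above are set; the function name check_blob_role is hypothetical and not part of KubeStash or the Azure CLI.

```shell
# Hypothetical helper: print the role names assigned to an identity at a scope.
# Assumes the Azure CLI (az) is installed and logged in.
check_blob_role() {
  assignee="$1"   # client ID of the user-assigned managed identity
  scope="$2"      # resource ID of the storage account
  az role assignment list \
    --assignee "$assignee" \
    --scope "$scope" \
    --query '[].roleDefinitionName' \
    -o tsv
}
```

Calling check_blob_role "$USER_ASSIGNED_IDENTITY_CLIENT_ID" "$STORAGE_ID" should list Storage Blob Data Contributor once the assignment has propagated, which can take a short while.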

Prepare KubeStash Operator

When installing KubeStash in a cluster that uses Azure AD Workload Identity, you need to provide the labels and annotations described here. In this guide, we have installed KubeStash with the required pod label azure.workload.identity/use: "true" and the service account annotation azure.workload.identity/client-id: <user-assigned-managed-identity-client-ID> by running the following commands:

$ export USER_ASSIGNED_IDENTITY_CLIENT_ID=$(az identity show --name $USER_ASSIGNED_IDENTITY_NAME --resource-group $RESOURCE_GROUP --query 'clientId' -otsv)
$ helm install kubestash oci://ghcr.io/appscode-charts/kubestash \
      --version <kubestash-version> \
      --namespace <kubestash-namespace> --create-namespace \
      --set-file global.license=/path/to/the/license.txt \
      --set-string kubestash-operator.podLabels."azure\\.workload\\.identity/use"="true" \
      --set-string kubestash-operator.serviceAccount.annotations."azure\\.workload\\.identity/client-id"=$USER_ASSIGNED_IDENTITY_CLIENT_ID \
      --wait --burst-limit=10000 --debug

Now, we are going to create a federated identity credential for the KubeStash operator’s service account:

$ export CLUSTER_NAME=<your-aks-cluster-name>
$ export SERVICE_ACCOUNT_ISSUER=$(az aks show -n $CLUSTER_NAME -g $RESOURCE_GROUP --query "oidcIssuerProfile.issuerUrl" -otsv)
$ az identity federated-credential create \
     --name "operator-cred" \
     --identity-name $USER_ASSIGNED_IDENTITY_NAME \
     --resource-group $RESOURCE_GROUP \
     --issuer $SERVICE_ACCOUNT_ISSUER \
     --subject system:serviceaccount:kubestash:kubestash-kubestash-operator
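
The --subject value follows the form system:serviceaccount:<namespace>:<service-account-name>. The command above appears to assume that the operator runs in the kubestash namespace with the service account kubestash-kubestash-operator; if you installed the chart into a different namespace or under a different release name, the subject has to change accordingly. The following is a minimal sketch of building that string; the function name federated_subject is hypothetical.

```shell
# Hypothetical helper: build the federated-credential subject for a service account.
# The format system:serviceaccount:<namespace>:<name> is how Kubernetes
# identifies a service account in its issued tokens.
federated_subject() {
  namespace="$1"
  service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}
```

For example, federated_subject kubestash kubestash-kubestash-operator produces the subject used above, and the result can be passed to --subject directly.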

Prepare Deployment

Here, we are going to deploy a Deployment with a PVC. This Deployment will automatically generate some sample data in the PVC. Then, we are going to back up this sample data using KubeStash.

Create Deployment

First, let’s deploy the workload whose volumes we are going to back up. Here, we are going to create a PVC and deploy a Deployment that uses this PVC.

Create PVC:

Below is the YAML of the sample PVC that we are going to create,

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubestash-pvc
  namespace: demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

Let’s create the PVC we have shown above,

$ kubectl apply -f https://github.com/kubestash/docs/raw/v2024.9.30/docs/guides/platforms/aks/examples/pvc.yaml
persistentvolumeclaim/kubestash-pvc created

Deploy Deployment:

Now, we are going to deploy a Deployment that uses the above PVC. This Deployment will automatically generate sample data (a text.txt file) in the /source/data directory, where we have mounted the PVC.

Below is the YAML of the Deployment that we are going to create,

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kubestash-demo
  name: kubestash-demo
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kubestash-demo
  template:
    metadata:
      labels:
        app: kubestash-demo
      name: busybox
    spec:
      containers:
        - image: busybox
          command: ["/bin/sh", "-c","echo dummy_data > /source/data/text.txt && sleep 3000"]
          imagePullPolicy: IfNotPresent
          name: busybox
          volumeMounts:
            - mountPath: /source/data
              name: source-data
      restartPolicy: Always
      volumes:
        - name: source-data
          persistentVolumeClaim:
            claimName: kubestash-pvc

Let’s create the Deployment we have shown above.

$ kubectl apply -f https://github.com/kubestash/docs/raw/v2024.9.30/docs/guides/platforms/aks/examples/deployment.yaml
deployment.apps/kubestash-demo created

Now, wait for the pod of the Deployment to go into the Running state.

$ kubectl get pod -n demo
NAME                              READY   STATUS    RESTARTS   AGE
kubestash-demo-77f9c4cb8c-4l26t   1/1     Running   0          3m25s

To verify that the sample data has been created in /source/data directory, use the following command:

$ kubectl exec -it -n demo kubestash-demo-77f9c4cb8c-4l26t -- cat /source/data/text.txt
dummy_data

From the above, we can see the sample data is set successfully.
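
Pod names carry a generated suffix and change whenever the pod is recreated, so hard-coding them in commands is fragile. The sketch below waits for the pods by label and looks up the current pod name instead. It assumes kubectl is configured for the cluster; the function names wait_for_pods and first_pod are hypothetical.

```shell
# Hypothetical helper: block until every pod matching a label selector is Ready.
wait_for_pods() {
  namespace="$1"
  selector="$2"
  kubectl wait pod -n "$namespace" -l "$selector" \
    --for=condition=Ready --timeout=120s
}

# Hypothetical helper: print the name of the first pod matching a label selector.
first_pod() {
  namespace="$1"
  selector="$2"
  kubectl get pod -n "$namespace" -l "$selector" \
    -o jsonpath='{.items[0].metadata.name}'
}
```

With these defined, the check above could be written as kubectl exec -n demo "$(first_pod demo app=kubestash-demo)" -- cat /source/data/text.txt.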

Prepare Backup

In this section, we are going to prepare the necessary resources before backup.

Prepare ServiceAccount

We are going to create a Kubernetes service account and attach the annotation azure.workload.identity/client-id: <user-assigned-managed-identity-client-ID> to it.

Let’s create a ServiceAccount in the demo namespace,

$ kubectl create serviceaccount -n demo bucket-user
serviceaccount/bucket-user created

Now, let’s attach the annotation,

$ kubectl annotate sa -n demo bucket-user azure.workload.identity/client-id=$USER_ASSIGNED_IDENTITY_CLIENT_ID
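
To confirm the annotation landed on the service account, a small sketch like the one below can read it back. The function name sa_client_id is hypothetical; it assumes kubectl is configured for the cluster.

```shell
# Hypothetical helper: print the workload identity client ID annotated on a service account.
sa_client_id() {
  namespace="$1"
  service_account="$2"
  # Dots inside the annotation key are escaped so jsonpath treats it as a single key.
  kubectl get sa -n "$namespace" "$service_account" \
    -o jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}'
}
```

The output of sa_client_id demo bucket-user should match the value of $USER_ASSIGNED_IDENTITY_CLIENT_ID.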

Now, we are going to create a federated identity credential for our newly created service account,

$ az identity federated-credential create \
     --name "demo-cred" \
     --identity-name $USER_ASSIGNED_IDENTITY_NAME \
     --resource-group $RESOURCE_GROUP \
     --issuer $SERVICE_ACCOUNT_ISSUER \
     --subject system:serviceaccount:demo:bucket-user

Prepare Backend

We are going to store our backed-up data in Azure Blob Storage. As we are using a workload-identity-enabled cluster, we don’t need a storage secret to access the Azure Blob Storage.

Create BackupStorage:

Now, let’s create a BackupStorage with the information of our desired Azure Blob Storage. Below is the YAML of the BackupStorage CR we are going to create,

apiVersion: storage.kubestash.com/v1alpha1
kind: BackupStorage
metadata:
  name: azure-storage
  namespace: demo
spec:
  storage:
    provider: azure
    azure:
      prefix: demo
      container: ishtiaq
      storageAccount: kubestash
  usagePolicy:
    allowedNamespaces:
      from: All
  default: true 
  deletionPolicy: WipeOut 
  runtimeSettings:
    pod:
      podLabels: 
        azure.workload.identity/use: "true"
      serviceAccountName: bucket-user

Notice the spec.runtimeSettings section. Here, we have to provide the label azure.workload.identity/use: "true" in podLabels and also provide the serviceAccountName. These values will be set in the cleaner job created by the KubeStash operator. To learn more about spec.runtimeSettings, visit here.

Let’s create the BackupStorage we have shown above,

$ kubectl apply -f https://github.com/kubestash/docs/raw/v2024.9.30/docs/guides/platforms/aks/examples/backupstorage.yaml
backupstorage.storage.kubestash.com/azure-storage created

Now, we are ready to backup our sample data into this backend.

Create RetentionPolicy:

Now, let’s create a RetentionPolicy to specify how the old Snapshots should be cleaned up.

Below is the YAML of the RetentionPolicy object that we are going to create,

apiVersion: storage.kubestash.com/v1alpha1
kind: RetentionPolicy
metadata:
  name: demo-retention
  namespace: demo
spec:
  default: true
  failedSnapshots:
    last: 2
  maxRetentionPeriod: 2mo
  successfulSnapshots:
    last: 5
  usagePolicy:
    allowedNamespaces:
      from: All

Let’s create the above RetentionPolicy,

$ kubectl apply -f https://github.com/kubestash/docs/raw/v2024.9.30/docs/guides/platforms/aks/examples/retentionpolicy.yaml
retentionpolicy.storage.kubestash.com/demo-retention created

Backup

To schedule a backup, we have to create a BackupConfiguration object targeting the respective Deployment. KubeStash will then create a CronJob for each session to periodically back up the Deployment.

First, we need to create a secret with a Restic password for backup data encryption.

Create Secret:

Let’s create a secret called encrypt-secret with the Restic password,

$ echo -n 'changeit' > RESTIC_PASSWORD
$ kubectl create secret generic -n demo encrypt-secret \
    --from-file=./RESTIC_PASSWORD
secret "encrypt-secret" created
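
The password changeit above is only a placeholder for this tutorial. Outside a demo, you would likely want a random password; the sketch below is one way to generate it. The function name gen_restic_password is hypothetical, and it assumes head, base64 and /dev/urandom are available on your machine.

```shell
# Hypothetical helper: print a random 32-character password
# (24 random bytes, base64-encoded, without a trailing newline).
gen_restic_password() {
  head -c 24 /dev/urandom | base64 | tr -d '\n'
}
```

Running gen_restic_password > RESTIC_PASSWORD before creating the secret would replace the echo step. Keep a copy of the password somewhere safe, since the backed-up data cannot be decrypted without it.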

Create BackupConfiguration:

Below is the YAML of the BackupConfiguration object we are going to use to back up the kubestash-demo Deployment we deployed earlier,

apiVersion: core.kubestash.com/v1alpha1
kind: BackupConfiguration
metadata:
  name: sample-backup-dep
  namespace: demo
spec:
  target:
    apiGroup: apps
    kind: Deployment
    name: kubestash-demo
    namespace: demo
  backends:
    - name: azure-backend
      storageRef:
        name: azure-storage
        namespace: demo
      retentionPolicy:
        name: demo-retention
        namespace: demo
  sessions:
    - name: demo-session
      scheduler:
        schedule: "*/5 * * * *"
        jobTemplate:
          backoffLimit: 1
      repositories:
        - name: azure-demo-repo
          backend: azure-backend
          directory: /dep
          encryptionSecret:
            name: encrypt-secret
            namespace: demo
      addon:
        name: workload-addon
        tasks:
          - name: logical-backup
            targetVolumes:
              volumeMounts:
                - name: source-data
                  mountPath: /source/data
            params:
              paths: /source/data
              exclude: /source/data/lost+found
        jobTemplate:
          metadata:
            labels:
              azure.workload.identity/use: "true"
          spec:
            serviceAccountName: bucket-user
      retryConfig:
        maxRetry: 2
        delay: 1m

Here, spec.sessions[*].addon.jobTemplate.spec.serviceAccountName refers to the name of the ServiceAccount to use in the backup job and spec.sessions[*].addon.jobTemplate.metadata.labels refers to the labels to use in the backup job.

Let’s create the BackupConfiguration CR we have shown above,

$ kubectl apply -f https://github.com/kubestash/docs/raw/v2024.9.30/docs/guides/platforms/aks/examples/backupconfiguration.yaml
backupconfiguration.core.kubestash.com/sample-backup-dep configured

Verify Backup Setup Successful:

If everything goes well, the phase of the BackupConfiguration should be Ready. The Ready phase indicates that the backup setup is successful. Let’s verify the Phase of the BackupConfiguration,

$ kubectl get backupconfiguration -n demo
NAME                PHASE   PAUSED   AGE
sample-backup-dep   Ready            3m

Additionally, we can verify that the Repository specified in the BackupConfiguration has been created using the following command,

$ kubectl get repo -n demo
NAME               INTEGRITY   SNAPSHOT-COUNT   SIZE     PHASE   LAST-SUCCESSFUL-BACKUP   AGE
azure-demo-repo                0                0 B      Ready                            3m

KubeStash keeps the backup for Repository YAMLs. If we navigate to the Azure Blob Container, we will see the Repository YAML stored in the demo/dep directory.

Wait for BackupSession:

Now, wait for the next scheduled backup to run. Run the following command to watch for a BackupSession object,

$ watch kubectl get backupsession -n demo
Every 2.0s: kubectl get backupsession -n demo                           AppsCode-PC-03: Wed Jan 10 16:52:25 2024

NAME                                        INVOKER-TYPE          INVOKER-NAME        PHASE       DURATION   AGE
sample-backup-dep-demo-session-1705907281   BackupConfiguration   sample-backup-dep   Succeeded              6m

Here, the phase Succeeded means that the backup process has been completed successfully.
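
In scripts, watching interactively is not practical. The sketch below blocks until a resource reports a given phase instead. It assumes the phase shown in the PHASE column is exposed at .status.phase and that your kubectl supports --for=jsonpath (v1.23 or newer); the function name wait_for_phase is hypothetical.

```shell
# Hypothetical helper: block until a resource reports the given phase.
# Assumes the phase is exposed at .status.phase (an assumption, not confirmed
# by this guide) and that kubectl is v1.23 or newer.
wait_for_phase() {
  kind="$1"
  name="$2"
  phase="$3"
  namespace="$4"
  kubectl wait "--for=jsonpath={.status.phase}=$phase" \
    "$kind/$name" -n "$namespace" --timeout=300s
}
```

For the session shown above, this would be wait_for_phase backupsession sample-backup-dep-demo-session-1705907281 Succeeded demo. Note that the command times out rather than failing fast if the session ends in a different phase.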

Verify Backup:

Now, we are going to verify whether the backed up data is present in the backend or not. Once a backup is completed, KubeStash will update the respective Repository object to reflect the backup completion. Check that the repository azure-demo-repo has been updated by the following command,

$ kubectl get repository -n demo azure-demo-repo
NAME              INTEGRITY   SNAPSHOT-COUNT   SIZE    PHASE   LAST-SUCCESSFUL-BACKUP   AGE
azure-demo-repo   true        1                801 B   Ready   8m                       9m

At this moment we have one Snapshot. Run the following command to check the respective Snapshot which represents the state of a backup run for an application.

$ kubectl get snapshots -n demo -l=kubestash.com/repo-name=azure-demo-repo
NAME                                                        REPOSITORY        SESSION        SNAPSHOT-TIME          DELETION-POLICY   PHASE       AGE
azure-demo-repo-sample-backup-dep-demo-session-1705907281   azure-demo-repo   demo-session   2024-01-22T07:08:07Z   Delete            Succeeded   29m

Note: KubeStash creates a Snapshot with the following labels:

  • kubestash.com/app-ref-kind: <target-kind>
  • kubestash.com/app-ref-name: <target-name>
  • kubestash.com/app-ref-namespace: <target-namespace>
  • kubestash.com/repo-name: <repository-name>

These labels can be used to watch only the Snapshots related to our desired Workload or Repository.
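
As an illustration, the sketch below combines the app-ref labels listed above into a single selector to list only the Snapshots of one workload. The function name snapshots_for_target is hypothetical, and it assumes the Snapshots live in the same namespace as the target, as they do in this tutorial.

```shell
# Hypothetical helper: list the Snapshots that belong to one target workload.
# Assumes Snapshots are stored in the target's own namespace.
snapshots_for_target() {
  kind="$1"
  name="$2"
  namespace="$3"
  kubectl get snapshots -n "$namespace" \
    -l "kubestash.com/app-ref-kind=$kind,kubestash.com/app-ref-name=$name,kubestash.com/app-ref-namespace=$namespace"
}
```

For this tutorial, the call would be snapshots_for_target Deployment kubestash-demo demo.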

If we check the YAML of the Snapshot, we can find the information about the backed up components of the Deployment.

$ kubectl get snapshots -n demo azure-demo-repo-sample-backup-dep-demo-session-1705907281 -oyaml
apiVersion: storage.kubestash.com/v1alpha1
kind: Snapshot
metadata:
  labels:
    kubestash.com/app-ref-kind: Deployment
    kubestash.com/app-ref-name: kubestash-demo
    kubestash.com/app-ref-namespace: demo
    kubestash.com/repo-name: azure-demo-repo
  name: azure-demo-repo-sample-backup-dep-demo-session-1705907281
  namespace: demo
spec:
  ...
status:
  components:
    dump:
      driver: Restic
      duration: 1.474354769s
      integrity: true
      path: repository/v1/demo-session/dump
      phase: Succeeded
      resticStats:
        - hostPath: /source/data
          id: c5a88e95e476161b3594ffb2630513a4d3a59007419f618e2aae62995e118eca
          size: 11 B
          uploaded: 1.041 KiB
      size: 801 B
  ...

For Deployment, KubeStash takes backup from only one pod of the Deployment. So, only one component has been backed up. For logical backup, KubeStash uses dump as the component name for Deployment.

Now, if we navigate to the Azure Blob Container, we will see the backed up data stored in the demo/dep/repository/v1/demo-session/dump directory. KubeStash also keeps backup for Snapshot YAMLs, which can be found in the demo/dep/snapshots directory.
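
The directory above is the BackupStorage prefix (demo), the repository directory (/dep), and the component path reported in the Snapshot status (repository/v1/demo-session/dump) joined together. The sketch below reproduces that composition; it is inferred from this example only and is not a documented layout guarantee. The function name component_blob_path is hypothetical.

```shell
# Hypothetical helper: compose the blob path of a backed-up component.
# The layout is inferred from the paths shown in this guide.
component_blob_path() {
  prefix="$1"      # spec.storage.azure.prefix of the BackupStorage
  directory="$2"   # repository directory from the BackupConfiguration
  session="$3"     # session name
  component="$4"   # component name, e.g. dump
  # Strip a leading and a trailing slash so the parts join cleanly.
  directory="${directory#/}"
  directory="${directory%/}"
  printf '%s/%s/repository/v1/%s/%s\n' "$prefix" "$directory" "$session" "$component"
}
```

The result can be passed to az storage blob list --account-name kubestash --container-name ishtiaq --auth-mode login --prefix "$(component_blob_path demo /dep demo-session dump)" to list the stored blobs, assuming your own Azure login has read access to the container.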

Note: KubeStash stores all dumped data encrypted in the backup directory, meaning it remains unreadable until decrypted.

Restore

In this section, we are going to show you how to restore data into the same Deployment, which may be necessary when you have accidentally deleted some data.

Simulate Disaster:

Now, let’s simulate an accidental deletion scenario. Here, we are going to exec into the Deployment pod kubestash-demo-77f9c4cb8c-fc5qh and delete the text.txt file from /source/data.

$ kubectl exec -it -n demo kubestash-demo-77f9c4cb8c-fc5qh -- sh
/ # 
/ # rm /source/data/text.txt
/ # cat /source/data/text.txt
cat: can't open '/source/data/text.txt': No such file or directory
/ # exit

Create RestoreSession:

To restore the Deployment, you have to create a RestoreSession object pointing to the Deployment.

Here is the YAML of the RestoreSession object that we are going to use for restoring our kubestash-demo Deployment.

apiVersion: core.kubestash.com/v1alpha1
kind: RestoreSession
metadata:
  name: sample-restore
  namespace: demo
spec:
  target:
    apiGroup: apps
    kind: Deployment
    name: kubestash-demo
    namespace: demo
  dataSource:
    repository: azure-demo-repo
    snapshot: latest
    encryptionSecret:
      name: encrypt-secret
      namespace: demo
  addon:
    name: workload-addon
    tasks:
      - name: logical-backup-restore
    jobTemplate:
      metadata:
        labels:
          azure.workload.identity/use: "true"
      spec:
        serviceAccountName: bucket-user

Here,

  • spec.addon.jobTemplate.spec.serviceAccountName refers to the name of the ServiceAccount to use in the restore job(s).
  • spec.addon.jobTemplate.metadata.labels refers to the labels to use in the restore job(s).
  • spec.dataSource.snapshot specifies which Snapshot to restore from. Here, latest restores from the most recent Snapshot.

Let’s create the RestoreSession object we have shown above,

$ kubectl apply -f https://github.com/kubestash/docs/raw/v2024.9.30/docs/guides/platforms/aks/examples/restoresession.yaml
restoresession.core.kubestash.com/sample-restore created

Once you have created the RestoreSession object, KubeStash will create restore Job(s). Run the following command to watch the phase of the RestoreSession object,

$ watch kubectl get restoresession -n demo
Every 2.0s: kubectl get restores... AppsCode-PC-03: Wed Jan 10 17:13:18 2024

NAME             REPOSITORY        FAILURE-POLICY   PHASE       DURATION   AGE
sample-restore   azure-demo-repo                    Succeeded   3s         53s

The Succeeded phase means that the restore process has been completed successfully.

Verify Restored Data:

Now, let’s exec into the Deployment pod and verify whether the actual data was restored or not,

$ kubectl exec -it -n demo kubestash-demo-77f9c4cb8c-fc5qh -- cat /source/data/text.txt
dummy_data

Hence, we can see from the above output that the deleted data has been restored successfully from the backup.
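
The same check can be scripted so that it returns a success or failure status. The sketch below compares the file content inside the workload with an expected value. It targets deploy/kubestash-demo rather than a pod name, so it does not depend on the generated pod suffix; the function name check_file_content is hypothetical.

```shell
# Hypothetical helper: succeed only if a file inside the workload has the expected content.
check_file_content() {
  namespace="$1"
  target="$2"     # e.g. deploy/kubestash-demo or a pod name
  path="$3"
  expected="$4"
  actual="$(kubectl exec -n "$namespace" "$target" -- cat "$path")"
  [ "$actual" = "$expected" ]
}
```

For example, check_file_content demo deploy/kubestash-demo /source/data/text.txt dummy_data && echo restored prints restored only when the file holds the original sample data.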

Cleaning Up

To clean up the Kubernetes resources created by this tutorial, run:

kubectl delete -n demo deployment kubestash-demo
kubectl delete -n demo backupconfiguration sample-backup-dep
kubectl delete -n demo restoresession sample-restore
kubectl delete -n demo backupstorage azure-storage
kubectl delete -n demo retentionpolicy demo-retention
kubectl delete -n demo secret encrypt-secret
kubectl delete -n demo serviceaccount bucket-user
kubectl delete -n demo pvc --all

Next Steps

  1. See a step-by-step guide to back up and restore volumes of a StatefulSet here.
  2. See a step-by-step guide to back up and restore volumes of a DaemonSet here.
  3. See a step-by-step guide to back up and restore a stand-alone PVC here.