Introduction to Cloud Custodian
Dependence on the cloud keeps growing worldwide and continues to shape the technology people use. As more enterprises and clients migrate to cloud vendors such as AWS, Microsoft Azure, and GCP, the need for governance grows, mainly around cost and security. This blog looks at Cloud Custodian, an open-source tool that helps ensure compliance with security and tag policies, garbage-collect unused resources, and manage costs.
Below are the prerequisites for implementing governance with Cloud Custodian policies (here we look only at the prerequisites for the AWS cloud provider):
- An AWS Account and user
- IAM access for the user to create policies and roles
- On Windows, a virtual machine or Docker setup can be used. On Linux or other distributions, no additional setup is needed.
Installation: The installation steps are straightforward and are documented step by step, for the cloud provider of your choice, in the Cloud Custodian documentation.
Discussing a Use Case:
To get a better idea of how Cloud Custodian works, we will look at a general use case that deletes unattached/unused volumes:
The approach taken here is to first mark the resource for a particular action at a later time. Commonly, the resource is marked for an operation in n days; for example, if the operation is delete, the resource is deleted once n days have passed.
One essential thing to note is that, for the operation to take effect on the scheduled date, the policy has to be run again on or after that date. Running the policy once at the start is not enough, because it does not act on its own afterwards.
In this particular example, we explain the policy part by part below:
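One way to avoid re-running the policy by hand is Cloud Custodian's periodic execution mode, which deploys a policy as an AWS Lambda function triggered on a schedule. The fragment below is a minimal sketch rather than part of the policy discussed in this blog: the policy name, the daily schedule, and the role ARN (a placeholder) are assumptions to replace with your own values.

```yaml
policies:
  - name: ebs-delete-marked-daily   # hypothetical name for this sketch
    resource: ebs
    mode:
      type: periodic                # run as a scheduled Lambda function
      schedule: "rate(1 day)"       # assumed schedule: once a day
      # Placeholder role ARN (assumption); the Lambda function runs with it
      role: arn:aws:iam::123456789012:role/custodian-lambda-role
    filters:
      - type: marked-for-op         # volumes whose scheduled date has arrived
        op: delete
    actions:
      - delete
```

Scheduling the custodian run command with a tool such as cron on the machine where Cloud Custodian is installed would be another option.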
Mark Unattached EBS Volumes:
    policies:
      - name: ebs-mark-unattached-deletion
        resource: ebs
        comments: |
          Mark any unattached EBS volumes for deletion in 30 days.
          Volumes set to not delete on instance termination do have
          valid use cases as data drives, but 99% of the time they
          appear to be just garbage creation.
        filters:
          - Attachments: []
          - "tag:maid_status": absent
        actions:
          - type: mark-for-op
            op: delete
            days: 30
- First, give this policy a name using the ‘name’ field:
    - name: ebs-mark-unattached-deletion
- Select the resource type we want to work with:
    resource: ebs
- ‘comments’ is optional.
- Then filter for the EBS volumes that have no attachments and on which a particular tag is absent. In this case the tag is “maid_status”. So, the filters section will look like:
    filters:
      - Attachments: []
      - "tag:maid_status": absent
The empty square brackets indicate no attachments. More generic filters for EBS are listed in the Cloud Custodian documentation.
- Now decide what action to take. We first use mark-for-op, which marks the EBS volume for an action at a later time. More actions for EBS are listed in the Cloud Custodian documentation. The action here will be:
    actions:
      - type: mark-for-op
        op: delete
        days: 30
Here, the unattached EBS volume identified by the filters is marked for the delete operation after 30 days.
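The mark-for-op action also accepts optional tag and msg settings for cases where the default maid_status tag is not wanted. The sketch below assumes a custom tag named custodian_cleanup and a custom message; both, like the policy name, are hypothetical choices for illustration. If the tag name is changed, the absent filter here and any later marked-for-op filter need to refer to the same tag.

```yaml
policies:
  - name: ebs-mark-unattached-custom-tag   # hypothetical name for this sketch
    resource: ebs
    filters:
      - Attachments: []
      - "tag:custodian_cleanup": absent    # must match the tag used below
    actions:
      - type: mark-for-op
        tag: custodian_cleanup             # assumed custom tag name
        # Keep the {op}@{action_date} suffix so the mark can be read back later
        msg: "Unattached volume: {op}@{action_date}"
        op: delete
        days: 30
```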
Delete Marked EBS Volumes:
      - name: ebs-delete-marked
        resource: ebs
        comments: |
          Delete any attached EBS volumes that were scheduled for deletion
        filters:
          - type: marked-for-op
            op: delete
        actions:
          - delete
- Just like the previous section, first give the policy an appropriate name:
    - name: ebs-delete-marked
- Select the resource type:
    resource: ebs
- Select the EBS volumes that were marked and previously scheduled for deletion:
    filters:
      - type: marked-for-op
        op: delete
The type: marked-for-op filter finds the EBS volumes that were marked by the first part of the policy, and op: delete restricts the match to volumes marked for the delete operation whose scheduled date has arrived.
- For the action, delete the matched EBS volumes:
    actions:
      - delete
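A volume could be attached again between the day it was marked and the day the delete policy runs. One cautious variant, sketched here as an assumption and not as the policy used in this blog, repeats the Attachments filter so that only volumes that are still unattached are deleted. The policy name is hypothetical.

```yaml
policies:
  - name: ebs-delete-marked-still-unattached   # hypothetical name
    resource: ebs
    filters:
      - Attachments: []          # re-check: the volume is still unattached
      - type: marked-for-op      # and its scheduled delete date has arrived
        op: delete
    actions:
      - delete
```

Filters listed together like this are combined with a logical AND, so a volume has to satisfy both conditions to be deleted.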
The Final Policy:
    policies:
      - name: ebs-mark-unattached-deletion
        resource: ebs
        comments: |
          Mark any unattached EBS volumes for deletion in 30 days.
          Volumes set to not delete on instance termination do have
          valid use cases as data drives, but 99% of the time they
          appear to be just garbage creation.
        filters:
          - Attachments: []
          - "tag:maid_status": absent
        actions:
          - type: mark-for-op
            op: delete
            days: 30
      - name: ebs-delete-marked
        resource: ebs
        comments: |
          Delete any attached EBS volumes that were scheduled for deletion
        filters:
          - type: marked-for-op
            op: delete
        actions:
          - delete
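A mark-and-delete pair like this is often accompanied by a further policy that removes the mark from volumes that have been attached again in the meantime. The fragment below is a sketch of how such a policy could look, offered as an assumption and not as part of the policy above; the entry would be appended under the same “policies:” key.

```yaml
policies:
  - name: ebs-unmark-attached-deletion   # assumed name for this sketch
    resource: ebs
    comments: |
      Unmark any EBS volumes that were scheduled for deletion
      if they are currently attached
    filters:
      - type: value
        key: "Attachments[0].Device"     # set only when the volume is attached
        value: not-null
      - "tag:maid_status": not-null      # the volume carries the mark
    actions:
      - unmark                           # removes the maid_status tag
```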
Running the Policy
Before we actually run the policy, we first create a YAML file and paste in the policy from above. Always remember to add “policies:” at the start of the file.
- First, for the purposes of this use case, we create two EBS volumes: one tagged with the key “maid_status” and another tagged with the value NoTag on the Name key.
This volume is tagged maid_status
The above volume will be ignored when the policy is run.
- Before actually executing the policy, always dry-run the policy document first using:
    custodian run --dryrun -s out ebs-delete.yml
Here ebs-delete.yml is the name of the policy file and -s out sets the output directory. Once the dry run succeeds, we can move on to actually running the policy.
Successful dry run output:
- Now, run the policy using the command:
custodian run -s out ebs-delete.yml
Successful output will look like this:
Now, if we go back to the policy YAML document, each policy defined in it is mentioned in the output.
Note that the count is “0” for all of them when there are no EBS volumes that match the filters. If there are EBS volumes that match a policy's filters, the specified action takes place on them (marking or unmarking, depending on the policy) and the count changes accordingly, for example to “1”.
- After running the policy, we see that the volume that initially had no maid_status tag is now marked for deletion with that tag.
Here, ‘days’ in the mark-for-op action was set to 1 for this run, so the volume marked when the policy was run on 2021/12/15 is scheduled for deletion on 2021/12/16.
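For reference, a one-day mark like the one used in this run could be written as in the fragment below. The tag value shown in the comment assumes the default tag name and the default message format of mark-for-op; treat it as an illustration and not as captured output.

```yaml
actions:
  - type: mark-for-op
    op: delete
    days: 1     # shortened from 30 for the demonstration
# Tag expected on a volume marked on 2021/12/15, assuming the defaults:
#   maid_status: "Resource does not meet policy: delete@2021/12/16"
```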
A Couple More Points to Remember
- When you install Cloud Custodian, remember that it is installed on the local machine or the VM.
- The best practice is to dry-run a policy before running it (custodian run --dryrun -s out <File Name>).
- These policies can’t share log groups, Lambda functions, or event rules. There is no sharing as there is with CloudFormation templates; it is purely a one-to-one relationship between the policy and the resource.
Written by Chetan Melhotra