Open Policy Agent (OPA) focuses on creating a single declarative policy language (rego) that can enforce compliance and promote security. Different projects focused on a range of areas can use Open Policy Agent, so users have one familiar language to use, and projects don’t have to invent their policy language. One project using OPA is Gatekeeper. Gatekeeper is a Kubernetes-aware policy enforcer and auditor. It can audit deployed resources in a cluster, while also denying resources to be deployed at all.
Let’s deploy Gatekeeper and experiment with creating a policy to forbid using the latest
tag in images.
Update (December 06, 2020)
- Use kind v0.9.0 instead of v0.8.1
- Use kubectl v1.19.4 instead of v1.18.5
How does Gatekeeper work?
Gatekeeper has three components: a controller for creating policies, an auditor, and a validating webhook.
The controller creates Constraint CustomResourceDefinitions for each ConstraintTemplate created in the cluster. The ConstraintTemplates define policies using OPA’s rego language. Constraints inform Gatekeeper the Kubernetes resources policies should be applied (pods, namespaces, etc.) to and any required parameters.
The auditor will scan the cluster’s resources to find any policy violations. Any violations will appear in a Constraint’s status.
Creating or updating resources in the cluster invokes the validating webhook. If a resource violates a Constraint, then the resource creation or modification is denied; otherwise, it’s allowed.
Constraints can either be enforced to deny or dryrun. Deny means the webhook will reject, while dryrun will let it pass. Deny is the more secure thing, but dryrun is excellent for testing out new policies.
Be aware that just because the webhook is enforcing a Constraint, any resources created in the cluster before the validating webhook exists will not be deleted or rejected. The auditor will report the resource as violating, though.
Deploy Gatekeeper
Like almost every post around Kubernetes I write, let’s create a kind cluster to play with Gatekeeper.
Install kind if not installed. I’m using kind version v0.9.0.
Run the following to create a cluster:
|  |  | 
Install kubectl so that we can
deploy Gatekeeper. I’m using kubectl version v1.19.4.
Deploy Gatekeeper’s components by running:
|  |  | 
The above commands will deploy Gatekeeper v3.2.1 and wait for the gatekeeper-audit and
gatekeeper-controller-manager deployments to be available.
Create ConstraintTemplate
Now that Gatekeeper is deployed, we can begin creating ConstraintTemplates. Let’s make a policy
that enforces images not to use the latest tag.
Create a file named latest-image-constraint-template.yaml with the following content:
|  |  | 
A couple of gotchas to be aware of:
- name of ConstraintName must be the lowercase name of spec.crd.spec.names.kind
- target must be admission.k8s.gatekeeper.sh
- targets list may only have one target
- if ConstraintTemplate has invalid rego, the ConstraintTemplate’s status shows build errors
The rego code must have a violation block defined. Gatekeeper will execute this rego for every
matching Kubernetes resource. During execution, the input.review.object is the Kubernetes resource
under evaluation.
In this case, we’re assuming the object under review is a pod. From there, we iterate over each
container in the pod’s spec. The _ is a nice feature of rego that creates a new iterator.
Then for each container found, we check that the image ends with :latest. If the image name
ends with:latest, we create a message as part of the violation.
A neat thing about rego is the block within violation continues processing if the statement is true.
The assignment statements (:=) evaulate as true. endswith only evaluates as true if
the string does, in fact, end with :lastest. If this is false, our violation will be resolved to false and not create a message. It’s a different way of thinking than most of
us are used to, but it’s a pretty nice approach to policies.
Finally, to submit the ConstraintTemplate run:
|  |  | 
Gatekeeper will detect the newly created ConstraintTemplate and create a new CustomResourceDefinition
named LatestImage.
At this point, Gatekeeper isn’t enforcing this policy. To enforce the policy, we’ll need to create a Constraint.
Create Constraint
ConstraintTemplates define a policy, while Constraints dictate which Kubernetes resources to apply the policy to and enforce the policy (dryrun or deny as described previously).
Create a new file named latest-image-constraint.yaml with the content:
|  |  | 
It’s a good practice to provide a match. If a match is missing, then Gatekeeper will apply the Constraint to each resource kind. Providing a match improves performance.
Now, create the Constraint via:
|  |  | 
We can see current violations in the status of the Constraint by running:
|  |  | 
to see something like:
Status:
  ...
  Total Violations: 0
To cause a violation, run:
|  |  | 
Wait about a minute for Gatekeeper to run audits and then describe the constraint again via:
|  |  | 
and now we’ll see:
Status:
  Total Violations: 1
  Violations:
    Enforcement Action:  dryrun
    Kind:                Pod
    Message:             container <nginx> uses an image tagged with latest <nginx:latest>
    Name:                nginx
    Namespace:           default
The offending pod is listed as well as our message from our rego policy.
We now have audit functionality, but this still doesn’t prevent someone from doing something malicious. To stop this completely, we need to leverage Gatekeeper’s Validating Webhook.
Enforce Constraint via webhook
Go ahead and delete the nginx pod we just created by running:
|  |  | 
For Gatekeeper to enforce a Constraint, a Constraint must specify an enforcementAction of deny.
So go ahead and change dryrun to deny in latest-image-constraint.yaml.
Then re-apply by running:
|  |  | 
Then let’s try creating the nginx pod again.
|  |  | 
And immediately our request errors with:
Error from server ([denied by not-allowed] container
uses an image tagged with latest nginx:latest): admission webhook “validation.gatekeeper.sh” denied the request: [denied by not-allowed] container uses an image tagged with latest nginx:latest 
Look at that! Now no one can create a pod using the latest tag.
Add a parameter to the ConstraintTemplate
So far, the value of ConstraintTemplates isn’t visible. Why not just combine ConstraintTemplates
and Constraints into a single resource? Imagine after we’ve created a Constraint to forbid
the latest tag, we get asked to ban a test tag. This reason is where ConstraintTemplates
start to shine.
Our ConstraintTemplate logic is the same for the latest and test tag. We need to fix the
hardcoding of latest, though. Fortunately, ConstraintTemplates support parameters.
Go ahead and delete our existing ConstraintTemplate and Constraint by running:
|  |  | 
Create a new file named image-tag-constraint-template.yaml with the following content:
|  |  | 
Notice we’ve added a validation section, which validates parameters. The hardcoding of :latest
has been replaced by sprintf(":%s", [input.parameters.tag]) to leverage our new parameter named tag.
Create this ConstraintTemplate by running:
|  |  | 
Now we need to create a Constraint for latest tag. Make a new file named latest-image-tag-constraint.yaml with:
|  |  | 
Notice the addition of the tag parameter and the kind change. Once again, we can deploy this by:
|  |  | 
We’ve now refactored our ConstraintTemplate and Constraint to match the existing behavior.
To enforce a test tag isn’t used, we create another file named test-image-tag-constraint.yaml:
|  |  | 
and submit it by:
|  |  | 
And now the following commands will fail:
|  |  | 
as Gatekeeper’s validating webhook will reject both requests.
We could have copied and pasted our original policy to forbid the latest tag and simply change
latest to test. We’d have to remember we duplicated this code for any bug fixes or enhancements
in the future. Using parameterized ConstraintTemplate enables us to de-duplicate our
ConstraintTemplates.
Next steps
In the next post, I’ll cover how to unit test our policies. This process provides a quicker feedback loop by not deploying to a cluster first and catches mistakes like syntax issues earlier.
Update: You can read about unit testing policies here: Open Policy Agent: Unit Testing Gatekeeper Policies