Understanding Kubernetes and Governance With Jim Bugwadia

Host: Hi everyone. Thanks for joining me in today's episode of Scale to Zero. I'm Purusottam, co-founder and CTO of Cloudanix. Today, we have Jim Bugwadia with us. Jim is the co-founder and CEO of Nirmata, the Kubernetes policy and governance company. He's an active contributor in the cloud native community and currently serves as a co-chair of the Kubernetes Policy and Multitenancy working groups. He's also a co-creator and maintainer of Kyverno, the policy engine built for Kubernetes DevSecOps teams.

Jim, it’s wonderful to have you here. For our viewers who may not know you, do you want to briefly share about your journey?

Jim: Absolutely. Thank you, Puru, and thanks for having me. And thanks, everybody, for joining. I started my journey as a software engineer, and I still actively code and get involved in some of the product aspects, especially on some of our open source projects. In terms of background, I started my career working on telecommunications network management systems.

So, building centralized management planes for complex, distributed, mission-critical systems that are 24/7 and always on, with lots of interesting lessons learned in scalability, high availability, and security. Now, if we fast forward to our current domains of cloud and cloud native technologies like Kubernetes and containers, some of these patterns still apply, but in many different ways.

Host: Makes sense. One thing you highlighted is that you still code. Honestly, I was not expecting that, but it's good to see, because as engineers it makes us happy. It makes us feel that we are contributing directly. So I can see that.

So let's get into the episode. The way we do it is we have two sections. For today, the first section focuses on Kubernetes and misconfiguration security, and the second part is the rapid fire. So I want to start with Kubernetes.

Kubernetes has now become the de facto orchestration platform for applications. And when it comes to Kubernetes, there are two ways users can deploy their workloads. One is to use a hyperscaler-managed offering like EKS for AWS, GKE for GCP, or AKS for Azure. The other is to self-host, like going bare metal for edge scenarios and stuff like that. So maybe let's start with cloud, the hyperscaler-managed offerings.

When it comes to the cloud, generally all the clouds have a shared responsibility model, and they publish it. Does that apply to Kubernetes as well?

Jim: Yes, it does. Kubernetes has an interesting architecture. You have the control plane elements: the API server, through which everything in Kubernetes goes, as well as things like the scheduler. At its core, Kubernetes, as you probably know, is a bin-packing scheduler. It handles placement of the resources you're trying to configure onto the machines and nodes that are available. All of these components run in the control plane, and the control plane also has a database; etcd is the default database. In a managed Kubernetes environment, the cloud provider takes care of the control plane, but you're responsible for the worker nodes, and what you typically pay for in your monthly bill is the worker nodes. So everything from configuring and securing the worker nodes to running workloads in Kubernetes, everything above the control plane, is the user's responsibility to secure, manage, and scale. So the shared responsibility model you rightly pointed out does apply to Kubernetes. Very broadly, you can think of Kubernetes as the control plane configuration and the data plane configuration, which includes the worker nodes and the workloads.

Host: That makes a lot of sense because that’s what they abstract anyway, right? But some organizations are often unsure about it, like what is covered, what is not. So how should they think when they are, let’s say, getting into a digital transformation mode and they are moving to cloud and they are trying to deploy their workloads? How should they think about it so that they make sure that they are secure and they are also covering their workloads?

Jim: Yeah, great point, because Kubernetes is insecure by default. Even if you're using a managed service, where the cloud provider secures some control plane components and you might not get access to etcd or the scheduler, the customer is responsible for securing everything else. The best way to think about it is to start with the Pod Security Standards. In Kubernetes, the basic unit of deployment and operation for every workload is a Pod, and a Pod is a group of containers. So start with the basics. First, scan your container images for vulnerabilities before they get deployed. Then, once you have sanitized them through image scanning, make sure that when the container is deployed inside a Pod, it has the correct Pod configuration.

Each Pod has an element called a security context, which has several settings that users are responsible for configuring. If you don't set them, the defaults are not secure, and that's something to remember. But with every version of Kubernetes, the community and the Kubernetes authors publish something called the Pod Security Standards, and you can use admission controllers, policy engines, or other ways of enforcing them. Prior versions of Kubernetes had a built-in object called Pod Security Policies; that was deprecated and has been replaced with Pod Security Admission. But you still have to opt in. First, make sure your service provider enables Pod Security Admission, and make sure it's not set to privileged, which means allow all. You want to set it to a secure default like restricted, or, if you want something a little more relaxed, there's a baseline profile you can use. Those are very basic best practices for Pod security. Beyond that, a Pod is just one or more correlated containers. In Kubernetes you typically have workloads like Deployments and StatefulSets. Those will have Services, you will have an Ingress for external traffic, and you will have RBAC roles. So there are several configuration objects: for a single workload, you might have half a dozen to maybe one or two dozen objects to configure, and making sure all of them are configured correctly is extremely important. That is one of the reasons folks sometimes say Kubernetes seems complex: all these moving pieces. But it's complex because it's solving a complex set of problems.
And there are solutions to each of these that are fairly simple to configure and install, to get secure defaults for your users within the enterprise.
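To illustrate "unset means not secure", here is a minimal Python sketch that checks a Pod manifest (parsed into a dict) against a handful of rules from the Pod Security Standards' baseline and restricted profiles: no host namespaces, no privileged containers, no privilege escalation, run as non-root, a RuntimeDefault or Localhost seccomp profile, and dropping ALL capabilities. It is a simplified subset written for illustration, not Pod Security Admission's implementation, and it deliberately treats unset fields as violations.

```python
# Simplified subset of Pod Security Standards checks (baseline + restricted).
# Not the Pod Security Admission implementation; unset fields count as violations.

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def restricted_violations(pod: dict) -> list[str]:
    """Return human-readable violations for a few hardening rules."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    problems = []

    # Host namespaces must not be shared with the node.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(field, False):
            problems.append(f"spec.{field} must be false")

    for c in spec.get("containers", []) + spec.get("initContainers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext", {})
        if ctx.get("privileged", False):
            problems.append(f"{name}: privileged must be false")
        # Treated as a violation when unset, since the default is not secure.
        if ctx.get("allowPrivilegeEscalation", True):
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        # A container-level setting overrides the Pod-level one.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
        seccomp = ctx.get("seccompProfile", pod_ctx.get("seccompProfile", {}))
        if seccomp.get("type") not in ALLOWED_SECCOMP:
            problems.append(f"{name}: seccompProfile.type must be RuntimeDefault or Localhost")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities must drop ALL")
    return problems


secure_pod = {
    "spec": {
        "securityContext": {"runAsNonRoot": True, "seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [{
            "name": "app",
            "image": "python:3.12-slim",
            "securityContext": {
                "allowPrivilegeEscalation": False,
                "capabilities": {"drop": ["ALL"]},
            },
        }],
    }
}
print(restricted_violations(secure_pod))  # []
```

A container with no security context at all fails four of these checks, which is the practical meaning of insecure defaults.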

Host: The first thing you mentioned is very important for everybody: by default, it's not secure. There are so many configurations, and it's your responsibility to apply the right ones to make it secure. That's a key point. Often we assume it is secure and that we just need to follow what's deployed. The other thing you highlighted is image scanning on the workload side, which we'll talk about in a bit. But one keyword you used quite a bit is configuration, or misconfiguration.

I want to talk about that a little bit. Similar to cloud, Kubernetes is also prone to misconfigurations. I think a recent study said that around 70% of Kubernetes issues are caused by misconfigurations.

So how can that be avoided, and what best practices would you recommend teams follow to set it up the proper way?

Jim: Yeah. It's important to step back and understand why Kubernetes requires so much configuration, and who is responsible for it. One thing you might have heard is that the Kubernetes API, the interface it exposes, is declarative. In programming, there are two ways of doing APIs; even programming languages come in declarative and imperative flavors. In a declarative approach, you tell the system, the compiler, or whatever you're interacting with, the desired state you want to get to, and it's the system's responsibility to figure out how to get there.

So you're just specifying your desired state: here's what I want. In an imperative approach, you give the system one instruction at a time on how to get to your desired state. These are very different approaches to APIs, or interfaces in general, and Kubernetes follows a declarative approach. What that means is, because Kubernetes is a platform for building platforms, it has gobs and gobs of configuration, and not all of it is what developers care about. If you're coming from a PaaS world and have done nothing in operations or security, you're going to look at Kubernetes and it's going to be scary: What is all of this? Why do I care? What is seccomp? Do I need to configure this? Why doesn't somebody else do it for me? But it's all there for a good reason. It's there because sysadmins have been configuring this in traditional PaaS systems. So, if you think about it, Kubernetes is the first platform truly built for DevSecOps. It addresses the concerns of developers, operations, and security, and to do this it requires lots and lots of configuration. And it does this in a manner where you specify the desired state and the system works hard.

The controllers in the control plane we talked about work hard on your behalf to make the current state match your desired state. Given that background, Kubernetes has a lot of configuration, with hundreds of API objects. Plus, it is extensible: it allows add-ons, custom resources, and the like, which add another layer of configuration. This is why Kubernetes can seem so easy to misconfigure, or seem insecure because it doesn't give you secure defaults. So that is a big problem. When we looked at this problem, coming from an operations and management background and having developed systems in telecommunications and elsewhere, we knew policy was always a major part of those systems. Think about how a telecommunications or telephone network works: it's always lights-out management. There are remote offices in places you cannot physically get to easily, and there's a small team.
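The declarative pattern Jim describes can be made concrete with a toy reconcile loop. In this sketch the "cluster" is just an in-memory set of Pod names and the naming scheme is a hypothetical choice; real controllers watch the API server and create or delete objects through it. The caller only states how many replicas it wants, and the loop works out which steps close the gap.

```python
# Toy reconcile loop illustrating the declarative pattern. The "cluster" is an
# in-memory set of pod names (an assumption for illustration); real controllers
# watch the API server and create or delete objects through it.

def reconcile(desired_replicas: int, running: set[str], app: str) -> set[str]:
    """Return a new state whose pod count matches desired_replicas."""
    current = set(running)
    # Scale up: add hypothetically named pods until the count matches.
    index = 0
    while len(current) < desired_replicas:
        current.add(f"{app}-{index}")  # no-op if that name already exists
        index += 1
    # Scale down: drop surplus pods, choosing deterministically by name.
    for name in sorted(current)[desired_replicas:]:
        current.discard(name)
    return current


state = reconcile(3, set(), "web")              # {'web-0', 'web-1', 'web-2'}
state = reconcile(3, state - {"web-1"}, "web")  # the lost pod is recreated
state = reconcile(1, state, "web")              # {'web-0'}
```

The caller never says "create web-1"; it only declares a count, and calling `reconcile` again with the same input changes nothing, which is what lets controllers run continuously.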

It's amazing how small a team manages a global network, and those networks have to stay running; it's life-critical that connectivity and communications are available. To do that, there's always redundancy, and policy-based management is very important. You cannot manually configure these things over and over again across thousands of deployments. You set your policy, and the system takes care of it for you. So when we looked at Kubernetes, as we were trying to solve some key problems for our customers at Nirmata, we asked: why can't we apply the same approach? Why can't we use policies to solve these major problems of misconfiguration? That was the thesis and the idea behind Kyverno, the open source project that Nirmata built and then donated to the CNCF.

Host: Makes a lot of sense. One thing that stood out from what you said is that Kubernetes is the first platform built for DevSecOps.

And when we talk about that, there are developers building capabilities in Kubernetes, security teams trying to secure those workloads, and operations teams trying to keep the platform up and running. Each of them has different goals and objectives.

In your view, how should they work together to make sure their Kubernetes setup is secure?

Jim: Right. Going back to that fundamental problem we were thinking about, let's take the example of a Pod. Within a Pod, as I mentioned, there's a security context that the security team cares very deeply about. They want to make sure each Pod is securely configured: it cannot elevate privileges, it cannot access your host namespace, things like that. But within that Pod, there might also be mount points the operations team cares about. For example, if they are using a custom CA, the operations team might want to inject the certificate roots for that CA, and that's what they're managing. Then the developer cares about their image: if it's a Python app, which version of Python, which image, and so on. So now you have three roles trying to manage one piece of configuration. The best way, we believe, to allow autonomy while keeping these roles aligned is through policies.

So you set up policies where security says, I want to make sure the security context is the way I want it, and the Ops team says, I want to make sure my CA roots are injected, while the developers still manage their parts of that config. The policies overlay on each other, so when the Pod is delivered, it's configured according to the organizational or compliance best practices required for that deployment, or even for that cluster. It could be environment-specific, or configured based on where the deployment is. And of course, regulatory compliance and other organizational best practices may apply.
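The idea of role-specific policies overlaying one Pod can be sketched as a chain of mutation functions: the developer supplies the image, a security policy sets the security context, and an operations policy injects CA roots. The policy functions, the `corp-ca-bundle` ConfigMap name, and the mount path are hypothetical. In practice this layering would be written as mutate rules in a policy engine such as Kyverno, whose real syntax is YAML and looks nothing like this Python.

```python
import copy

# Hypothetical per-role policies. Each returns a mutated copy of the pod dict.


def security_policy(pod: dict) -> dict:
    """Security team: enforce a hardened security context on every container."""
    pod = copy.deepcopy(pod)
    for c in pod["spec"]["containers"]:
        ctx = c.setdefault("securityContext", {})
        ctx["allowPrivilegeEscalation"] = False
        ctx["capabilities"] = {"drop": ["ALL"]}
    return pod


def ops_policy(pod: dict) -> dict:
    """Ops team: inject corporate CA roots (ConfigMap name and path are hypothetical)."""
    pod = copy.deepcopy(pod)
    spec = pod["spec"]
    volumes = spec.setdefault("volumes", [])
    if not any(v["name"] == "ca-roots" for v in volumes):
        volumes.append({"name": "ca-roots", "configMap": {"name": "corp-ca-bundle"}})
    for c in spec["containers"]:
        mounts = c.setdefault("volumeMounts", [])
        if not any(m["name"] == "ca-roots" for m in mounts):
            mounts.append({"name": "ca-roots", "mountPath": "/etc/ssl/certs/corp",
                           "readOnly": True})
    return pod


def apply_policies(pod: dict, policies) -> dict:
    """Apply each policy in order; the developer's own fields pass through untouched."""
    for policy in policies:
        pod = policy(pod)
    return pod


# The developer only cares about the image.
dev_pod = {"spec": {"containers": [{"name": "api", "image": "python:3.12-slim"}]}}
final = apply_policies(dev_pod, [security_policy, ops_policy])
```

Making each policy idempotent (checking before appending) matters because admission may see the same object more than once.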

Host: Yeah. One thing you highlighted is Pod security, focusing on the right configuration of the security context. Are there any other areas teams should keep in mind when they're working together to deploy workloads?

Jim: Absolutely. Every resource or object in the Kubernetes API has similar security-related configuration, and sometimes other configuration can indirectly impact security. For example, if you're not configuring resource quotas, or requests and limits for memory and CPU, one workload can impact the performance of another, or a malicious user could launch a denial-of-service attack just by spinning up workloads that consume all the resources in the cluster. So there are other configuration aspects of a workload that can impact security.
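A policy that closes the resource-exhaustion gap can be as simple as requiring CPU and memory requests and limits on every container. The sketch below is a hypothetical admission-style check written for illustration; Kubernetes does not require these fields by default, so a rule like this (or defaults from a LimitRange) is what enforces them.

```python
# Hypothetical admission-style check: every container must declare CPU and
# memory requests and limits. Kubernetes does not require these by default.

REQUIRED_RESOURCES = ("cpu", "memory")


def missing_resources(pod: dict) -> list[str]:
    """List each requests/limits entry that a container leaves unset."""
    problems = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for resource in REQUIRED_RESOURCES:
                if resource not in resources.get(section, {}):
                    problems.append(f"{container.get('name')}: missing {section}.{resource}")
    return problems


pod = {
    "spec": {
        "containers": [{
            "name": "app",
            "image": "python:3.12-slim",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"memory": "512Mi"},  # no CPU limit set
            },
        }]
    }
}
print(missing_resources(pod))  # ['app: missing limits.cpu']
```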

Similarly, if you're using Ingress, there are CVEs on ingress controllers. Depending on which ingress you're using, you want to make sure you're not allowing things like running scripts that may be able to do a container escape and get to your host or the underlying file systems in your host namespace.

So that applies at every layer. Beyond ingress, if you're running a service mesh, that has to be secured too. And by the way, Kubernetes is no longer just about running apps; it's also about managing and provisioning infrastructure with IaC-type controllers. So if you're doing that, say provisioning an S3 bucket using a project like Crossplane on Kubernetes, you want to make sure that S3 bucket has the right configuration: that it's encrypted, that it's in the regions you allow, things like that. The list goes on and on: at each layer of your system, security best practices have to be applied through configuration, and ideally through policy as code.
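The same policy-as-code idea extends to infrastructure declared as Kubernetes resources. This sketch checks a bucket resource for an allowed region and for encryption. The `spec.forProvider` nesting follows the general Crossplane convention, but the `region` and `serverSideEncryption` field names and the allowed-region list are simplified placeholders, not the exact schema of any Crossplane provider.

```python
# Sketch of a policy check for an S3 bucket declared as a Kubernetes custom
# resource. Field names under spec.forProvider are simplified placeholders,
# not the exact schema of any Crossplane provider.

ALLOWED_REGIONS = {"us-east-1", "eu-west-1"}  # hypothetical organizational policy


def bucket_violations(bucket: dict) -> list[str]:
    params = bucket.get("spec", {}).get("forProvider", {})
    problems = []
    region = params.get("region")
    if region not in ALLOWED_REGIONS:
        problems.append(f"region {region!r} is not in the allowed list")
    if not params.get("serverSideEncryption"):
        problems.append("server-side encryption must be enabled")
    return problems


compliant = {"spec": {"forProvider": {"region": "eu-west-1", "serverSideEncryption": "aws:kms"}}}
non_compliant = {"spec": {"forProvider": {"region": "ap-south-1"}}}
print(bucket_violations(compliant))      # []
print(bucket_violations(non_compliant))  # two violations
```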

Host: One thing you mentioned is that it's not enough to set the security context properly and call it secure. There are other components that can impact the security of the overall cluster.

One of the things you highlighted earlier, and I want to dive a little deeper into it, is workloads. We deploy images to run our workloads, and as part of our code or images we tend to use a lot of open source libraries, third-party dependencies, or base images, with no idea of their source or the vulnerabilities they have. This leads to supply chain attacks.

In one recent study by Anchore, they highlighted that 85% to 97% of enterprise codebases use open source, and 62% of organizations have been impacted by supply chain attacks, which is like three out of five companies. So how should folks address this concern?

Jim: Those statistics are scary, and this is a big concern in the industry. As always, there's bad news and good news. Of course, like you mentioned, everybody should be alarmed by this. What attackers have figured out is that in the last few years CI systems have become extremely powerful, and as much as we have improved at securing production systems, what about that Jenkins server sitting in somebody's data center? When was it last patched, upgraded, and secured? So attackers have figured out that one way, and perhaps an easier way, to get to production systems is through CI systems and CI/CD pipelines. Like you mentioned, if your application is composed of hundreds of open source packages, how do you know a library you're downloading is actually the library you intended, and that nobody has spoofed it or injected malware into it? How do you know the image you're deploying on your cluster actually came from your build system, is secured, and has been scanned?

Supply chain security is not a new problem, but it has escalated dramatically in the last few years. So have the solutions. Within the Cloud Native Computing Foundation, which Nirmata is part of, tools have been developed to help with all of this across many of the projects we contribute to, including Kyverno, as well as Sigstore, which is another organization in the Linux Foundation. There's also OpenSSF, another sub-organization within the Linux Foundation.

The basic first step to securing your software supply chain is to sign and verify images: sign them at build time with a private key or with some form of keyless signing. Sigstore has a very clever keyless signing solution that integrates with OIDC, which can use machine identities. So this doesn't have to be a user identity; it could be a machine identity that the build system, or a worker node within the build system, gets. And you are able to trust it to say, yes, this image was built on a machine that I trust.

Then you also want to attach other signed metadata to create attestations that say: I know where this image was built, these are its attributes, this is the script or the GitHub Action that built it, this is the user who invoked the build, and it was built from the main branch. Once you have all of this metadata as signed attestations, along with the signed image digest, you can verify all of it through policy. This is where Kyverno, the Nirmata project, comes in. Kubernetes allows admission controllers to run before a resource is admitted into your cluster, and Kyverno can do multiple checks on your images: check the signature, check the provenance, check other attestations. You can even require that at least two people in your organization reviewed the code before it was deployed. Checking all of this before allowing the image is the first step in security. So start with signing and verification, and think about which attestations you need. But you also want to run these checks periodically because, as we know, once an image has been deployed, there might be a new vulnerability or a new CVE.
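The admission-time checks described above can be sketched in Python: verify the attestation's signature, require a digest reference that matches the signed digest, then check provenance (trusted builder, main branch) and at least two distinct reviewers. To keep it runnable with only the standard library, the sketch uses an HMAC as a stand-in for a signature. Real Sigstore signing uses asymmetric keys or short-lived certificates bound to an OIDC identity, and verification would be done by a tool such as cosign or a Kyverno `verifyImages` rule, not code like this. The key, the attestation field names, and the builder identity are hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in signing key. HMAC is used only so this runs with the standard library;
# Sigstore/cosign use asymmetric keys or keyless certificates bound to an OIDC identity.
SIGNING_KEY = b"demo-build-system-key"  # hypothetical
TRUSTED_BUILDER = "github-actions"      # hypothetical builder identity


def sign(attestation: dict) -> str:
    """Sign a canonical JSON encoding of the attestation (stand-in for a real signature)."""
    body = json.dumps(attestation, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()


def verify_image(image_ref: str, attestation: dict, signature: str) -> list[str]:
    """Admission-style checks: signature, digest match, provenance, and review."""
    if not hmac.compare_digest(sign(attestation), signature):
        return ["attestation signature is invalid"]
    problems = []
    # Tags are mutable, so require a digest and check it matches the signed one.
    if "@sha256:" not in image_ref:
        problems.append("image must be referenced by digest")
    elif image_ref.split("@", 1)[1] != attestation.get("digest"):
        problems.append("attestation does not match image digest")
    if attestation.get("builder") != TRUSTED_BUILDER:
        problems.append("image was not built by the trusted builder")
    if attestation.get("branch") != "main":
        problems.append("image was not built from the main branch")
    if len(set(attestation.get("reviewers", []))) < 2:
        problems.append("at least two distinct reviewers are required")
    return problems


digest = "sha256:" + hashlib.sha256(b"example image content").hexdigest()
attestation = {"digest": digest, "builder": TRUSTED_BUILDER, "branch": "main",
               "reviewers": ["ana", "raj"]}
signature = sign(attestation)
ref = f"registry.example.com/shop/api@{digest}"
print(verify_image(ref, attestation, signature))  # []
```

Editing any signed field (for example changing `branch`) invalidates the signature, which is why the metadata has to be signed rather than merely attached.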

In that case, you want to run these scans periodically, and Kyverno can pull this data from your OCI registries, make sure there are no new issues, and, if there are, start flagging them. Going beyond the basics, you can even attach SBOMs as signed attestations and verify those. So there are a lot of interesting things being developed, and in fact OpenSSF is developing some standards that define various levels of compliance for these security checks.
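Periodic re-evaluation boils down to comparing fresh scan results against what was already known. The data shapes here (workload name to image digest, and digest to a set of finding IDs) and the placeholder IDs are assumptions for illustration; they are not Kyverno's report format and not real CVE identifiers.

```python
# Hypothetical data shapes: workload name -> image digest, and digest -> finding IDs.
# The IDs are placeholders, not real CVEs.

def new_findings(deployed: dict[str, str],
                 latest_scan: dict[str, set[str]],
                 previously_known: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per workload, findings that appeared since the previous scan."""
    flagged = {}
    for workload, digest in deployed.items():
        fresh = latest_scan.get(digest, set()) - previously_known.get(digest, set())
        if fresh:
            flagged[workload] = fresh
    return flagged


deployed = {"shop/api": "sha256:aaa", "shop/web": "sha256:bbb"}
previously_known = {"sha256:aaa": {"EXAMPLE-0001"}}
latest_scan = {"sha256:aaa": {"EXAMPLE-0001", "EXAMPLE-0002"}, "sha256:bbb": set()}
print(new_findings(deployed, latest_scan, previously_known))  # {'shop/api': {'EXAMPLE-0002'}}
```

Keying findings by digest rather than tag means an image that was clean at admission gets flagged the moment a new vulnerability is reported against it.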

Host: I'd definitely take a look at keyless signing of images, because that helps a lot, and at using attestations, like whether it was built from the main branch and who invoked the build. It all ties back to policy: how you configure policies in your Kubernetes setup. As in your example, you can have a policy in admission control that checks the image signing was done properly, and not just at deployment time but also periodically at runtime through scans. It makes a lot of sense.

Initially we talked about two ways of deploying Kubernetes workloads. We covered hyperscaler-managed Kubernetes; now let's talk about the second way, where you roll your own or self-host. This might be useful when you are learning, or when you have edge or bare-metal use cases.

Do the recommendations you have been sharing apply to self-hosted setups as well? And how should practitioners think about a managed offering versus a self-hosted Kubernetes setup?

Jim: Certainly there are several use cases where, like you mentioned, you might want to install and manage your own control plane. Everything we talked about still applies there: Pod security, software supply chain security, workload configuration best practices. But in addition, I would strongly recommend using the CIS benchmarks for Kubernetes, which cover control plane configuration and best practices. For every component in the control plane, like the API server, the scheduler, and other controllers, the CIS benchmarks have recommended configuration settings. And you can run tools for this. At Nirmata, we run scanning tools that collect all of this information and report on whether the control plane is configured to comply with the CIS benchmarks.
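A CIS-style control plane check often amounts to inspecting component flags. The sketch below looks at a few kube-apiserver flags that CIS Kubernetes Benchmark recommendations address (anonymous auth, authorization mode, profiling, audit logging), assuming an unset flag takes the upstream default. It is illustrative only and not a complete or authoritative benchmark implementation; the benchmark itself is the source for exact recommendations.

```python
# Loosely modeled on a few CIS Kubernetes Benchmark recommendations for kube-apiserver.
# Assumes unset flags take upstream defaults (anonymous auth on, AlwaysAllow,
# profiling on). Illustrative only; not a complete benchmark.

def parse_flags(args: list[str]) -> dict[str, str]:
    """Turn ['--name=value', '--bool-flag'] into {'name': 'value', 'bool-flag': 'true'}."""
    flags = {}
    for arg in args:
        if arg.startswith("--"):
            key, sep, value = arg[2:].partition("=")
            flags[key] = value if sep else "true"
    return flags


def apiserver_findings(args: list[str]) -> list[str]:
    flags = parse_flags(args)
    findings = []
    if flags.get("anonymous-auth", "true") != "false":
        findings.append("--anonymous-auth should be false")
    modes = flags.get("authorization-mode", "AlwaysAllow").split(",")
    if "AlwaysAllow" in modes or "RBAC" not in modes:
        findings.append("--authorization-mode should include RBAC and not AlwaysAllow")
    if flags.get("profiling", "true") != "false":
        findings.append("--profiling should be false")
    if not flags.get("audit-log-path"):
        findings.append("--audit-log-path should be set")
    return findings


hardened = ["kube-apiserver", "--anonymous-auth=false", "--authorization-mode=Node,RBAC",
            "--profiling=false", "--audit-log-path=/var/log/kubernetes/audit.log"]
print(apiserver_findings(hardened))            # []
print(apiserver_findings(["kube-apiserver"]))  # four findings
```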

So that would be a good starting point. And of course, if you're managing the control plane, you most likely manage your worker nodes as well as your control plane nodes. You want to make sure your etcd is secure and not accessible, because that's where all the data lies. You want to make sure you have proper host security, because you don't want attackers to get to your control plane components. So there are more layers to cover. If you're using cloud, you still need to think through all of that, but the cloud provider may provide some tools or help with securing the infrastructure layers and other aspects.

Host: Okay, that makes a lot of sense, because the things hyperscalers were abstracting are now your responsibility. You have to take care of them, so you need to look at the CIS benchmarks and all those areas to improve.

So the last question I have is around culture. Security is mostly seen as a culture.

And what we have seen is that most organizations, when they go through a business transformation, focus on deploying their workloads to the cloud first, and security comes to mind later. We see that with Kubernetes as well.

What would you recommend organizations do to change this mindset? How can teams work with their management to bring awareness around Kubernetes security?

Jim: Yeah, I think in many ways Kubernetes and cloud native do help with this. I don't come from a pure security background; as I was mentioning, I come more from operations and telecommunications, and in that domain security is considered part of the management plane itself and part of the operations team's responsibility. So it always seemed strange to me, stepping into the world of IT, that security was seen as a separate function, and I often asked: why is it a separate function? Why isn't it just part of operations? In the telco world, there's an acronym called FCAPS: fault, configuration, accounting, performance, and security. Those are all the things you want to think about when you're operating and managing a system. So, going back to your question, I feel cloud native is moving us back in that direction, saying: look, security is a platform engineering concern.

It's not some separate team sitting in an ivory tower telling us thou shalt do this or do that. That's not going to work; we know the problems with that approach. That's why we moved from separate teams to DevOps and then DevSecOps, and now everyone in the cloud native space talks about platform engineering, because it brings it all back together and says: this is something fundamental you have to think about as a Day Zero concern, from the beginning, and you cannot just defer it to a few people who are going to run some audits on your system later. That doesn't work. So I feel in cloud native there are a few principles and recurring themes you will see.

One of these themes is everything as code: infrastructure as code, policy as code, and in some ways even security as code and compliance as code. What you can now start doing with Kubernetes-native policy engines like Kyverno is manage security concerns, these policies, just like you manage your workloads and your code. They go through code reviews, automated testing, and a release process, so they're really part of the same system. I believe that is an amazing step forward for the entire industry. And I feel platform engineering teams embracing security as one of their core functions is what's required; delegating it to a separate team is not going to work.
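Treating policy as code means a policy gets unit tests like any other code, and those tests run in CI before the policy is released. Below is a hypothetical policy that rejects images without a pinned tag or digest, together with tests a test runner such as `python -m unittest` could discover. The policy and test names are illustrative and do not come from Kyverno or Nirmata.

```python
import unittest


def disallow_latest_tag(pod: dict) -> list[str]:
    """Hypothetical policy: images need an explicit non-'latest' tag or a digest."""
    problems = []
    for c in pod["spec"]["containers"]:
        image = c["image"]
        if "@" in image:
            continue  # pinned by digest
        # Look only at the last path segment so a registry port is not mistaken for a tag.
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            problems.append(f"{c['name']}: image {image!r} needs a pinned tag or digest")
    return problems


class DisallowLatestTagTest(unittest.TestCase):
    def pod(self, image: str) -> dict:
        return {"spec": {"containers": [{"name": "app", "image": image}]}}

    def test_rejects_latest(self):
        self.assertTrue(disallow_latest_tag(self.pod("nginx:latest")))

    def test_rejects_untagged(self):
        self.assertTrue(disallow_latest_tag(self.pod("nginx")))

    def test_allows_pinned_tag_with_registry_port(self):
        self.assertEqual(disallow_latest_tag(self.pod("registry.local:5000/nginx:1.27")), [])
```

Because the policy is plain code with tests, a change to it can go through the same review and release pipeline as the workloads it governs.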

Host: It's very interesting that, because of your background, you see security not as a separate function but as part of platform engineering. I wish everybody looked at security the same way; we would be in a much more secure world.

But yeah, that makes a lot of sense and that’s a great way to end the security section.

Summary:

Here are a few things that stood out for me.

  1. For managed Kubernetes, hyperscalers are responsible for the control plane, including the key-value store, etcd. But securing the workloads and worker nodes is the user's responsibility.
  2. Best practices for securing Kubernetes include enforcing Pod security and implementing policies for resource quotas and limits, image signing, and checking ingress controllers for CVEs.
  3. To protect workloads against supply chain attacks, one measure is image signing. Sigstore offers a keyless signing approach based on OIDC identities that is worth a look.

Host: So let’s move on to the rapid-fire section.

Rapid Fire:

Host: The first question: if you were a superhero of cybersecurity, which power would you choose to have?

Jim: I think infinite kindness. And not just in cybersecurity, but in everything we do: working with folks from different domains and with different skill sets, you never know what they are going through. So having that patience and understanding is probably the most important power.

Host: Yeah, you are spot on. Having empathy when you are working in a team or across teams makes a huge difference. The next one: what are some of the blogs, books, or websites you go to in order to stay up to date on Kubernetes or Kubernetes security in general?

Jim: Most of the Kubernetes tech community is still fairly active on Twitter, even with some of the recent shifts away, so these days that tends to be one of my number one news sources. There are also tons of great books. I was just reading a book on continuous compliance called Investments Unlimited, and I highly recommend it. It's kind of like The Phoenix Project, but with more of a security angle. It talks about how you can have great DevOps practices and be very agile, but if you ignore security, the business can suffer, and it tells that story from a very interesting storytelling perspective. So I'm definitely enjoying reading that. And we're lucky that, through blog articles and other things people share in the community and through different projects, there's so much knowledge to be gained now.

Host: Thank you for sharing. We'll make sure to tag the books when we publish the video. The last question: a one-liner quote that keeps you going.

Jim: Something I recently saw, on LinkedIn of all places: somebody had shared a quote from Michael Caine, the movie actor, who said one of the things he uses in his career and life is 'Use the difficulty.' I found that very appealing. Whatever we're doing, there are always difficulties and challenges, but what matters is how you use the difficulty to plan your next move, to get ahead, or to embrace it and move to the next phase. So, 'use the difficulty.' I found it a very memorable quote.

Host: That seems very powerful to me as well. Thank you for sharing. Thank you so much, Jim, for joining us and sharing your knowledge. It was very insightful, and I learned many things as well. Thanks for coming to the show.

Jim: Thank you, Puru.

Host: And to our viewers, thanks for watching. I hope you have learned something new. If you have any questions around security or Kubernetes in general, share them at Scale to Zero.

We’ll get those answered by an expert in the space. I’ll see you in the next episode.

Thank you.
