How to secure the Kubernetes API behind a VPN
Earlier this week, the first major vulnerability (CVE-2018-1002105) was discovered in Kubernetes, the container management platform taking the DevOps world by storm. On a default install, the vulnerability allows an attacker with access to the Kubernetes API to gain full administrator access to the cluster and everything running on it. In cyber security terms, it doesn’t get much worse.
Fortunately, public cloud platforms were quick to patch the vulnerability, but for those who care about security it was a reminder that a single layer of defence is rarely enough. What if someone was exploiting this before it was publicly disclosed? What if there’s another vulnerability we don’t yet know about? What if you can’t upgrade your private cluster that quickly? If you’re running anything sensitive on Kubernetes clusters, these questions should matter to you.
Many Kubernetes implementations leave the API server exposed to the internet, and in Google Cloud Platform’s native “public” implementation specifically, you can’t even add a firewall. Even if you could, security by IP whitelisting is rarely ideal: it prevents flexible working locations, and it means an attacker who compromises any device on your office network has direct access to production systems. A VPN is a flexible and secure solution to this problem.
This blog describes a secure architecture for installing a Kubernetes cluster by hiding the Kubernetes API server behind a VPN, while allowing the containers to be accessible from the public internet as normal.
In this case we used the Kubernetes service native to Google Cloud Platform, but the proposed architecture could easily be applied in any other cloud or self-hosted infrastructure.
Secure Kubernetes Architecture
The following image shows our target architecture:
To get started, let’s first create our Kubernetes cluster in its own network. In Google Cloud you can do this by installing Kubernetes in private mode. With this option, the Kubernetes worker nodes use non-publicly-routable IP addresses. In this case, we created a VPC network with one subnet (10.50.40.0/26):
and two Secondary IP ranges (this isn’t mandatory, but we’re trying to stay true to the general Kubernetes setup guides) which will be used during the Kubernetes setup:
- kubernetes-services: 10.0.32.0/20
- kubernetes-pods: 10.4.0.0/14
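As a sketch of this step on the gcloud CLI (the network and subnet names, and the region, are illustrative, not the exact commands we ran):

```shell
# Create a custom-mode VPC network (name and region are illustrative)
gcloud compute networks create kubernetes-vpc --subnet-mode=custom

# Create the node subnet with the two secondary ranges described above
gcloud compute networks subnets create kubernetes-nodes \
  --network=kubernetes-vpc \
  --region=europe-west1 \
  --range=10.50.40.0/26 \
  --secondary-range=kubernetes-services=10.0.32.0/20,kubernetes-pods=10.4.0.0/14
```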
Even in “private mode”, Google Cloud by default still exposes the Kubernetes API to the internet, so we also have to configure the master as private. Now the API can only be accessed from our node subnet, 10.50.40.0/26. The following diagram shows how our cluster is now set up:
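A private cluster with a private master endpoint can be created in one command; the cluster name, zone and master CIDR below are illustrative (the master range matches the 172.16.0.16/28 network used later in this post):

```shell
# Create a private GKE cluster: nodes get no public IPs, and the master
# endpoint is reachable only from inside the VPC
# (cluster name and zone are illustrative)
gcloud container clusters create secure-cluster \
  --zone=europe-west1-b \
  --network=kubernetes-vpc \
  --subnetwork=kubernetes-nodes \
  --enable-ip-alias \
  --cluster-secondary-range-name=kubernetes-pods \
  --services-secondary-range-name=kubernetes-services \
  --enable-private-nodes \
  --enable-private-endpoint \
  --master-ipv4-cidr=172.16.0.16/28
```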
Now you have your Kubernetes cluster securely installed inside your VPC, so it’s only accessible from inside the cloud, but your DevOps team still needs access to the API to control the cluster. This is where the VPN comes in.
For this guide, we installed an OpenVPN access server (from the Google Marketplace) which gives access to the above private subnet. To make it work, we need two network interfaces:
- nic0, for the External IP address
- nic1, for the created VPC network
For some reason, the OpenVPN access server from the Google Marketplace comes with only one network interface, so to add the second we created a new virtual machine using the old one as a template, this time with two interfaces:
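A minimal sketch of creating the replacement VM with both interfaces (the instance name, zone and image are illustrative; in practice we cloned the Marketplace VM rather than building from a raw image):

```shell
# nic0 on the default network keeps an ephemeral external IP;
# nic1 sits on the VPC subnet with no external address
# (instance name, zone and image are illustrative)
gcloud compute instances create openvpn-access-server \
  --zone=europe-west1-b \
  --image=openvpn-as-image \
  --network-interface=network=default \
  --network-interface=subnet=kubernetes-nodes,no-address
```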
Next we configure the access server to allow VPN users access to our cluster subnets. Add two lines to the “Specify the private subnets to which all clients should be given access (one per line)” setting as follows:
- 10.50.40.0/26 (This allows access to all the Kubernetes nodes and in general to all the machines on the created VPC network)
- 172.16.0.16/28 (This allows access to the Kubernetes master API server on Google Cloud’s own private network)
So far, so good. Except the OpenVPN access server doesn’t yet know how to route traffic to the 172.16.0.16/28 network where our Kubernetes API server lives. To allow the server to route this traffic we had to add a new route:
sudo ip route add 172.16.0.16/28 via 10.50.40.1 dev ens5
This routes traffic from any users on the VPN subnet to the API server, via the interface we placed on the VPC network (nic1, which in our case was named ens5 by the OpenVPN access server).
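Note that a route added with `ip route add` is lost on reboot. One way to make it persistent, assuming a systemd-based image (the unit name here is our own), is a small oneshot unit:

```shell
# Re-add the route to the private master range at every boot
# (unit name is illustrative; gateway/interface match the route above)
sudo tee /etc/systemd/system/k8s-master-route.service <<'EOF'
[Unit]
Description=Route to the private Kubernetes master range
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/sbin/ip route add 172.16.0.16/28 via 10.50.40.1 dev ens5

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable k8s-master-route.service
```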
Now we have the cloud part fully configured, we need to install VPN clients on any workstations/laptops that need access to the Kubernetes API server, so the kubectl management tool can connect to the cluster.
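Once the VPN tunnel is up, connecting works the same as for any cluster (the cluster name and zone here are illustrative):

```shell
# Fetch kubeconfig credentials for the private master, then verify
# connectivity over the VPN (cluster name and zone are illustrative)
gcloud container clusters get-credentials secure-cluster --zone=europe-west1-b
kubectl get nodes
```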
Now that everything is secure, our final step is to install an ingress into the cluster, so the public can access the apps and services running on it. This creates a load balancer, and the services can be accessed via the load balancer’s external IP.
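As a minimal sketch, the same effect can be had with a Service of type LoadBalancer (the deployment name and nginx test image are illustrative):

```shell
# Run a test workload and expose it through a cloud load balancer
# (deployment name and image are illustrative)
kubectl create deployment web --image=nginx
kubectl expose deployment web --type=LoadBalancer --port=80

# The external IP appears once the load balancer is provisioned
kubectl get service web --watch
```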
Hey presto, you now have a secure Kubernetes architecture operating inside a VPC, with public services fully accessible and the private ones nicely tucked up behind a VPN.
We hope this helps the DevOps community continue to move fast and break things, without compromising on security. Any feedback, comments or suggestions are welcome - would you have implemented it differently?
Thanks to Chris Wallis.
Red teamers, security researchers, detection engineers, and threat actors have to actively research the type of vulnerability and its location in the vulnerable software, and build an associated exploit.
Tenable release checks for 47.43% of the CVEs they cover in this window, and Greenbone release 32.96%.
Red teamers, security researchers, detection engineers and threat actors now have access to some of the information they were previously having to hunt themselves, speeding up potential exploit creation.
Tenable release checks for 17.12% of the CVEs they cover in this window, and Greenbone release 17.69%.
The likelihood of exploitation in the wild is steadily increasing.
Tenable release checks for 10.9% of the CVEs they cover in this window, and Greenbone release 20.69%.
We’re starting to lose some of the benefit of rapid, automated vulnerability detection.
Tenable release checks for 9.58% of the CVEs they cover in this window, and Greenbone release 12.43%.
Any detection released a month after the details are publicly available is decreasing in value for me.
Tenable release checks for 14.97% of the CVEs they cover over a month after the CVE details have been published, and Greenbone release 16.23%.
With this information in mind, I wanted to check the delay for both Tenable and Greenbone to release a detection for their scanners. The following section focuses on vulnerabilities which:
- Have a CVSSv2 rating of 10
- Are exploitable over the network
- Require no user interaction
These are the ones where an attacker can point their exploit code at your vulnerable system and gain unauthorised access.
We’ve seen previously that Tenable have remote checks for 643 critical vulnerabilities, and OpenVAS have remote checks for 450 critical vulnerabilities. Tenable release remote checks for critical vulnerabilities within 1 month of the details being made public 58.4% of the time, but Greenbone release their checks within 1 month 76.8% of the time. So, even though OpenVAS has fewer checks for those critical vulnerabilities, you are more likely to get them within 1 month of the details being made public. Let’s break that down further.
In Figure 10 we can see the absolute number of remote checks released on a given day after a CVE for a critical vulnerability has been published. What you can immediately see is that both Tenable and OpenVAS release the majority of their checks on or before the day the CVE details are made public: Tenable have released checks for 247 CVEs this way, and OpenVAS for 144. On top of that, since 2010 Tenable have released remote checks for 147 critical CVEs, and OpenVAS for 79, on the same day as the vulnerability details were published. The number of checks then drops off across the first week and drops further after one week, as we would hope for in an efficient time-to-release scenario.
While raw numbers are good, Tenable have a larger number of checks available, so it could be unfair to go on raw numbers alone. It’s potentially more important to understand the likelihood that OpenVAS or Tenable will release a check for a vulnerability on any given day after a critical CVE is published. In Figure 11 we can see that Tenable release 61% of their checks on or before the date that a CVE is published, and OpenVAS release a shade under 50% of theirs.
So, since 2010 Tenable has more frequently released their checks before or on the same day as the CVE details have been published for critical vulnerabilities. While Tenable is leading at this point, Greenbone’s community feed still gets a considerable percentage of their checks out on or before day 0.
I thought I’d go another step further and see if I could identify any trend in each organisation’s release delay: are they getting better year-on-year, or are their releases getting later? In Figure 12 I’ve taken the mean delay for critical vulnerabilities per year and plotted it. The mean as a metric is particularly influenced by outliers in a data set, so I expected some wackiness and limited the mean to checks released no more than 180 days before and no more than 31 days after a CVE being published. These seem like reasonable limits: anything more than 6 months prior to CVE details being released is potentially a quirk of the check details, and anything after a 1-month delay is less important for us.
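The trimming described above can be sketched as a one-liner: given per-check delays in days (negative means the check was released before the CVE details), keep only values in [-180, 31] and average them. The sample delays here are made up for illustration:

```shell
# Trimmed mean release delay: drop outliers outside [-180, 31] days
# (the sample delays are illustrative, not real data)
printf '%s\n' -200 -10 0 5 40 \
  | awk '$1 >= -180 && $1 <= 31 { sum += $1; n++ } END { printf "%.2f\n", sum / n }'
```

Here -200 and 40 fall outside the window and are discarded before averaging.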
What can we take away from Figure 12?
- We can see that between 2011 and 2014 Greenbone’s release delay was better than Tenable’s, by between 5 and 10 days.
- In 2015 things reverse, and for 3 years Tenable is considerably ahead of Greenbone, by a matter of weeks.
- But then in 2019 things get much closer, and Greenbone seem to be releasing on average about a day earlier than Tenable.
- For both, the trend line over the 11-year period is very close, with Tenable marginally beating Greenbone.
- We don’t yet have any 2021 data for OpenVAS checks for critical show-stopper CVEs.
With the larger number of checks, and a greater percentage of their remote checks for critical vulnerabilities released on time, Tenable could win this category. However, with the 2019 and 2020 delay times going to OpenVAS, and the trend lines being so close, I am going to declare this one a tie.
The takeaway from this is that both vendors are getting their checks out the majority of the time either before the CVE details are published or on the day the details are published. This is overwhelmingly positive for both scanning solutions. Over time both also appear to be releasing remote checks for critical vulnerabilities more quickly.