Q&A with Kelsey Hightower Part 3
As some of you may already know, Kelsey Hightower hosted Office Hours at the Netris booth at KubeCon Europe 2023. We talked about the future of infrastructure, when to and when not to use Kubernetes, the future of Kubernetes, and much more.
We are presenting the third part of the interview, which covers “technology talks”. Let’s dive in!
Question: Do you see K3s as a smaller version of Kubernetes? Do you see it gaining popularity?
Kelsey: I mean, you use what’s necessary. If you wanna put Kubernetes in a retail store, and you only have a computer this big (showing with hands), there are your limits of memory and CPU. What are you gonna do? Are you gonna invent a new tool for that? No. You’re gonna take Kubernetes, strip out the parts you don’t need, and use the rest as is. And one day, someone is gonna come around and say, “Hey, we can make a simpler implementation of that idea, right?”
There was a company called Redpanda; they rewrote Kafka in C++ to make it fast. You don’t need the JVM, you don’t need Java, you don’t need ZooKeeper, but you keep the Kafka protocol. We see this all the time. People take something, shrink it down, remove the parts we don’t need, and then we get to move forward. So, yeah, it’s, again, a solution for now. You’re gonna make money for another 10 to 20 years, don’t you worry. But you gotta pay attention to the people looking to replace it. And then you’ll make money with the new thing.
Question: So, I think you guys are saying that you can kind of get the same VPC experience they have in AWS in our own data centers, right? What do you mean? For example, I think one of the great things about EKS in Amazon is that you can set up a single VPC and it spans multiple availability zones, and it’s great for high availability; you know, one zone goes down and you’re covered. It’s the simplest experience: one single VPC, a completely isolated IP address space, spanning several zones. Now, we are a data center company called Net Zero, and we operate two different data centers. It would be really nice if we could set up our own Kubernetes clusters with that single concept: each cluster having its own isolated IP address space in a VPC, but spanning availability zones, spanning data centers…
Kelsey: So I don’t work there, but I am an adviser for the Netris team. You know, I’m at Google Cloud, and Netris had a lot of this technology and software before, and one thing I thought was missing was the concept of a VPC, right? Remember when people were talking about automated networking and SDNs, there were so many moving parts. But then we go to the cloud, and there is this one concept of a VPC, and everything you just asked for is kind of what they do now. They have a concept of a VPC that can span multiple sites and, if necessary, stripe VPNs across various networks, BGP inside of one data center. Ideally, they bring a lot of concepts down to the simplest abstraction: you have VPCs, networks, subnets, and then you assign them to the various sites. When you create your clusters, you basically attach to one of those things, and that becomes your IP space. So it’s very similar to what we have in various cloud providers. I think the trick inside of a cloud, and it’s probably also true when you think about the world as a flat network: if you ever pay the cloud bill and you’re seeing traffic between availability zones, it’s going to get really expensive, so you’ve got to be careful with that flat abstraction. But the capability is there. The founder of the company, Alex, and the team are here and can dive deeper into it; you also have their information here if you want to reach out to the Netris folks. This is exactly what they’ve been working on: this concept of a VPC Anywhere.
Question: We keep seeing that managing Kubernetes clusters in-house is hard, and the market is seeing more and more flavors of managed Kubernetes offerings. Do we need this many flavors of Kubernetes management tools and offerings? What is your take on this?
Kelsey: So we talked about the historical context earlier, in the earlier question. Historically, remember Linux: people used to build their own distribution of Linux. You rolled a kernel, you rolled your userland, and then RPM started to become a popular thing. Instead of trying to make install all these packages, it’s nice to have pre-built packages that work together. And when we go into that world, you start to think of distros. Red Hat is optimized not just for giving you a good Linux distro but for making sure everything works on the hardware you have. When I think about Kubernetes in the cloud, that’s job number one if I’m in GCP: integration with the networking stack, GCE, IAM, and GPUs. I don’t see a difference for on-prem. If you’re on VMware, you need a decent distro that integrates deeply with VMware. So we are definitely, and I used to talk about this years ago, going to be in Kubernetes distro land soon enough. But in the early days, you had no choice but to do it yourself. There was always something missing, right? If you go from GitHub to the data center, there are no drivers for your networking gear. No one knows what kind of load balancer you’re using, so Ingress doesn’t have an implementation. But these days, we’re getting really close to everyone just picking their favorite Kubernetes distro.
Attendee: Kind of bringing the additional features you want.
Kelsey: Now, you know, “kubectl apply” becomes the new yum install. Whatever is missing, you bring it yourself, and then eventually those distros will rebase. So I look at VMware now as a distro. I think that will be the future, because most people have no need to be managing this themselves. You saw that Microsoft now has a Kubernetes LTS. We knew that was going to happen eventually. If you’re a big enterprise, like, dude, upgrading Kubernetes two or three times a year is insane, because with most upgrades you’re still doing almost the exact same thing you were doing before, and taking all this risk while doing so. So it’s no surprise that the Red Hat model has found its way into the Kubernetes world.
Question: You also talked at the Google session today. What’s the most important or interesting thing you learned? The most surprising thing?
Kelsey: I don’t know if I learned much, because I work there, so I kind of knew what they were doing. But I think the thing that resonated most with the audience was that this was an old pattern that Google chose to productize. We’ve had Cloud Shell for a long time, right? This concept of spinning up a jump box in your VPC just to do these things is what most people are used to. Jump box, put the whole team’s SSH keys there, put all your tools on it, and everyone uses it. We looked at that pattern years ago and said, “Alright, we need Cloud Shell.” So a VM comes up; at the time, we were doing it for free, and we have all the right IAM credentials per person. So, no more everyone sharing one jump box. That machine could be tied to your IAM, but it wasn’t very flexible, right? You only got the one that we give you, and every time you blow it away, you have to start over. When I was at CoreOS, we had this concept of a toolbox: when you SSH into CoreOS, the container of your choosing becomes your userland. So, basically, SSH maps to, like, a docker pull, and now you’re inside of the container, at least the user space. Cloud Shell started doing the same thing, and Cloud Workstations is basically taking that old pattern and making it an official product. So now we let the customer say, “Hey, we want to use this image in that VM, and only charge me when I’m using it, and I want a bigger set of CPU and memory, add it on demand.” So it’s an old pattern. And then we added an IDE because a customer said, “Hey, I want, like, an IDE there,” but it’s very far from what (porter?) would do. That’s a whole end-to-end developer experience, whereas I think we are coming from Cloud Shell and optimizing for “lock down my VPC, lock down my console, and no one can do anything.” So instead of having people run a jump box, that’s where I think Cloud Workstations sits.
It’s more like desktop Linux in the cloud; there happens to be a little IDE, but I think it’s really far from a dedicated end-to-end dev environment that thinks about the build process and everything else. Maybe it trends that way, but that was the biggest takeaway: an old pattern turned into a product, and there you have it.
Question: What’s your experience with multi-tenancy compared to separate clusters per tenant, maybe within your company…
Kelsey: We have hundreds of thousands of customers in the cloud, and we’ve seen it all, right?
Attendee: We just learned about virtual clusters…
Kelsey: Yeah, people are trying to do that. If you look at Kubernetes, its security domain isn’t great at the API level, so some people are trying to build virtual API servers to fix some of this.
Attendee: That’s our experience as well. When we deploy multi-tenant clusters, we always have to implement role-based access control, and we have to add extra policies, extra cloud permissions… Everything becomes complicated, but it’s easier because you have one cluster. So it’s a question of which trade-off you choose, or what you prefer…
Kelsey: I mean, look, to me, giving people access to the cluster is like giving people SSH access to the server. It’s the same concept. And many times it’s the first thing you do: “Hey, we’re using Kubernetes now; we don’t have any tooling, and we don’t have a process, so here is kubectl.” It’s like when you buy your first server, before you have Puppet, Chef, or Ansible: we don’t have anything, so here’s your SSH key. So now the goal is to remove the need to mess with the cluster directly. Some people install tools like Lens or Rancher to at least get some visualization: “Hey, if you want to click around and see, you don’t need kubectl for that,” ideally.
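While you still have to hand out cluster access, the usual way to scope it is namespace-bound RBAC. A minimal sketch, with hypothetical namespace, role, and user names:

```yaml
# Confine a tenant's kubectl access to a single namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: tenant-a          # hypothetical tenant namespace
  name: tenant-a-dev
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "deployments"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: tenant-a
  name: tenant-a-dev-binding
subjects:
- kind: User
  name: jane@example.com       # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-a-dev
  apiGroup: rbac.authorization.k8s.io
```

This limits what the API server will allow, but as Kelsey points out, the namespace itself remains a weak security boundary.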
Attendee: We don’t need access to the infrastructure.
Kelsey: Exactly. But until you get those features, people are gonna ask for it. So trying to lock down a cluster to do multi-tenancy via kubectl is a lot of work, and you’re asking for trouble. Any bug in the namespace isolation, and you break out of the namespace, and now you own the whole cluster. It’s like with a VM: if I make a mistake, you own my VM; if I make a big mistake, maybe you get my VM’s IAM access to my whole account. That’s the situation you run into in the Kubernetes world. The namespace is a very weak security boundary. I think that’s the problem with multi-tenancy in Kubernetes: it’s only so strong.
Attendee: It depends a bit on how hard the multi-tenancy needs to be, I guess. If the tenants work roughly the same way and you just don’t want to give them access to each other, it’s fine, but if you need hard multi-tenancy…
Kelsey: If you have a customer that says, “No other customer can access anything of mine,” and you are going to put that in writing, it should probably be their own cluster. If you have a customer that just wants multi-tenancy at the app layer, well, everyone’s going through the same code to the same database, and Kubernetes being shared probably isn’t a big deal because the code is also shared. But when I need hard boundaries, Kubernetes doesn’t give me enough separation.
Kelsey: Yeah, multi-tenancy is a tough one. I think Kubernetes is getting better at runtime multi-tenancy, where you say, “Hey, run this app using something like Firecracker or gVisor.” You can get pretty nice isolation between the workloads.
Attendee: Well, that requires extra work from an administrator…
Kelsey: Well, I mean, the good news is Kubernetes abstractions are pretty good. For example, with GKE at Google, when you create your cluster, you can say, “I would like the runtime to be gVisor,” and that will be installed for you as part of the distribution. Then, when you create your pod, you can select that sandbox runtime, and it will launch the container under gVisor instead of directly on the Linux kernel, so you have this additional layer. My guess is that in other distros you can say “Firecracker,” which gives you a lightweight VM, and do the same thing, so now you have at least harder tenancy at the OS level. But again, if I’m the app and you give me the wrong Kubernetes RBAC, then I’m gonna own the whole cluster. So you just got to be careful: at certain levels there is multi-tenancy, and at other levels it’s just a little bit weak.
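On GKE with Sandbox enabled, the per-pod selection Kelsey describes is expressed through a RuntimeClass. A minimal sketch, with hypothetical pod and image names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-app          # hypothetical pod name
spec:
  # Run under the gVisor sandbox instead of directly on the host kernel.
  runtimeClassName: gvisor
  containers:
  - name: app
    image: nginx               # example workload
```

The same RuntimeClass mechanism is how other distros can plug in alternative sandboxes such as Kata Containers backed by Firecracker.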
Question: A question regarding Netris. I am curious what you see in Netris. What is unique about it, in your opinion?
Kelsey: When I meet founders, as an adviser that also sits on the cap table, you want to be involved in a way where you can help. If I see the perfect product, they don’t need my help, right? Maybe I can speak at their conference or something. But sometimes you see something where you can add some help. So when I met this team and looked at their product, I said, “Listen, there will always be a need for networking tools, especially on-premises. But could you build VPC anywhere? Can you take the concepts and abstractions from the cloud and put them somewhere else?” And when I talked to Alex, he said, “We could do that. We would have to change a few things. But we could do that.” So how do I find out where the gap is and whether it’s possible? You just use the product. So I clicked through it, and he gave me a pretty good demo. Most people are terrible at demos, terrible at telling the story of how their product works. But he gave me this nice demo: “Hey, you know, here is Netris. You can get white box switches, put it in here, and we can create L3 blah blah blah…”
To me, I’ve seen that a million times; okay, it’s cool, but not that interesting. But then he created a Kubernetes cluster, added the integration, and it was working. He wasn’t messing around with ten thousand things, and he also had MetalLB, or their own load balancer, working as well. And I was like, “Oh, that’s interesting, because these are the two most complex problems people have when they just have bare metal and just Kube. No VMware, no NSX, none of that, nothing else. These are the two big (batches?).” And I looked at it and said, “Alright, you have the technology piece, but then you have to have the idea.” So we brainstormed together, and we came up with this: VPC Anywhere. As someone with cloud experience, when you hear “VPC,” you kind of already know what’s going to happen. You expect that there is gonna be a much simpler interface. We’re not talking about BGP, L2, L3, any of that. If I create a VPC, I assume that you’re gonna figure out my subnet allocations. I assume that you have an easy way to attach things to that VPC. And I assume that when you have a load balancer, you can connect things running in the VPC to that load balancer. They have those things. So that’s the thing that interested me: the team’s ability to execute. When I was at CoreOS, every couple of weeks we saw something in the container space, and we had an idea. CNI, right? Container networking wasn’t that great in Docker; we needed something a little more robust. Three days later, we had the first prototype; by Friday, we were talking about it at a meetup. That, to me, is interesting. There are some startups that have really good ideas but no execution speed, but when you see someone that has good execution speed, you know. For a person like me that has good ideas: boy, oh boy. So that’s the thing I look for when I’m advising startups.
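The bare-metal load-balancer piece from that demo maps onto the standard Kubernetes abstraction: a Service of type LoadBalancer, which on bare metal something like MetalLB (or a vendor controller such as Netris’s) has to fulfill by allocating an external IP. A minimal sketch, with hypothetical service and label names:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                    # hypothetical service name
spec:
  # On bare metal there is no cloud provider to honor this type;
  # MetalLB or the vendor's controller assigns the external IP.
  type: LoadBalancer
  selector:
    app: web                   # hypothetical pod label
  ports:
  - port: 80
    targetPort: 8080
```

On a cluster without such a controller, this Service would simply stay in the Pending state, which is exactly the gap these integrations close.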
Attendee: Do you think there are many startups that have on-prem-only solutions, or that think about SaaS and don’t think directly of having a VPC kind of solution? Do you think this could be a good product for them?
Kelsey: Well, you’re saying, if you’re on-prem and you have… Let’s say you’re Rancher before they got acquired. And you say, “Hey, look how cool our Kubernetes distro is.” That customer now has to go and figure out the networking layer. That customer is like, “I have to go buy Juniper switches.” Have you ever tried to order Juniper switches? It’s not easy; it’s “call us.” And then you’ve got to get a license to even manage the thing. And that license is only good for a couple of years, and then it’s unsupported and you can’t upgrade to the next version of their switch OS, whatever they have, right? So for a customer to say, “Hey, I want to do colo. I want to do Kubernetes. But I don’t understand anything about ordering network access, top-of-rack switches, getting the leaf-and-spine set up,” it’s too complicated, and it deters people from wanting to do this. I think this is where VMware has succeeded. VMware just says, “Look, get some machines and install VMware, and we will take care of networking, storage, and all the other layers.” But VMware only gives you the VM. So I think, for all the people in this space, if you’re MinIO and you’re giving people an S3 abstraction, well, that S3 abstraction still needs a network. So now we start to look at the puzzle. We used to say “OpenStack or (Basts?).” Now people are like, “We don’t care about OpenStack, because I don’t really want VMs as my endgame. I actually want to run apps.” And right now, containers, for a lot of people, are the closest thing to apps. And if you use GKE or AKS or EKS, the last thing you want to do is revisit the networking concept. So I do think that if you’re going to be on-prem, you want some of the same abstractions that you had in the cloud. I’m looking for less NetApp and more MinIO, because I probably want an S3 interface, maybe not NFS so much. So yeah, I think for all of these concepts, from networking to storage to compute, we want the same abstractions as the cloud.
Attendee: I talk to a lot of companies, and sometimes they say, “We only have our solution on-prem.” Sometimes progress is wanting something that is a bit simpler, right? So, trying the VPC route.
Kelsey: Yeah, I think if you have some networking needs… I mean, they’ve done an integration with, what’s the colo? They have a bare metal offering. Equinix. Equinix is nice. You go in, you have a colo, you click, and you get some bare metal, but then you have this L2 network, and what are you going to do with it? So they have a plug-in that can add in the VPC layer, and then it feels familiar again. “Oh, I have VPCs, and then multi-site. Sites now look like zones and regions.” And it’s like, “Oh, I know how to deal with that.” Versus gateways, subnets, and then it gets real confusing, trying to create your own tunnels, watching your VPN solution. Just squash all of that into one concept, and ideally the tool deals with the abstraction underneath. That’s how GCP works. There are a lot of moving parts inside a Google data center, but what we present to users are VPCs. So, similar concept.
In case you missed it, part 1 of the interview covered career talks – you can find the post here.
Part 2 was about “future talks”, which covered interesting topics and predictions – find the post here.