Kubernetes basics for Docker users
The aim of this article is to build a high-level mind map of concepts like Kubernetes, Helm and cloud-native applications. It assumes that you have already worked with Docker.
Docker
So you know Docker. A Docker container is like a virtual machine, but lighter. Why lighter? It does not run a full operating system of its own. Instead, it shares the kernel of the host operating system, and the image adds only the application and its dependencies on top. You can run many Docker containers on a single server or virtual machine.
Docker Compose
You may have also heard about Docker Compose. For example, when your application consists of a .NET Core web server, a MySQL database, and a Redis cache, you can define 3 separate containers for it. To run all of them in a virtual network, you define a docker-compose.yml file. Then all 3 can be started with a single docker-compose up command.
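To make the three-container example more concrete, here is a minimal sketch of the kind of structure a docker-compose.yml file describes. A real compose file is written in YAML; the sketch only mirrors its shape as a Python dictionary. The image name example/web:1.0, the port mapping and the password are assumptions made for illustration.

```python
import json

# Hypothetical three-service application. Image names, ports and the
# password are illustrative assumptions, not values from a real project.
compose = {
    "services": {
        "web": {
            "image": "example/web:1.0",  # hypothetical .NET Core web server image
            "ports": ["8080:80"],        # host port 8080 -> container port 80
            "depends_on": ["db", "cache"],
        },
        "db": {
            "image": "mysql:8",
            "environment": {"MYSQL_ROOT_PASSWORD": "change-me"},
        },
        "cache": {"image": "redis:7"},
    }
}


def service_names(spec):
    """Return the names of all services defined in the compose structure."""
    return sorted(spec["services"])


print(json.dumps(compose, indent=2))
```

Each key under services corresponds to one container, and depends_on expresses that the web server needs the database and the cache.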
Scaling the application for production
Now let's imagine you want to scale your application. You introduce a load balancer and 2 additional web server containers. You also add a RabbitMQ container and an instance of a background processing worker. There are also other requirements for a production environment:
- containers need to be distributed across many servers
- containers that do not need many resources can run together on the same server, so that provisioned servers are used cost-effectively
- when a container is not responding, it should be restarted
- when connectivity with a container is lost, it should be replaced with a new instance
- number of containers should autoscale
- number of servers should autoscale
- new containers added to this environment should be auto-discovered
- it should be possible to mount and share storage volumes in a flexible way
Things are getting complicated. To meet all these requirements, we would have to write a lot of code to monitor and manage the infrastructure. This is called container orchestration. Or... we can use Kubernetes (k8s), which has all these features and more built in!
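To give a feeling for what such orchestration code does, here is a minimal sketch of a single "reconcile" pass: it compares the desired number of containers with the ones that are actually healthy and repairs the difference. The Container class and the start_container helper are hypothetical stand-ins. A real orchestrator would talk to a container runtime and run this loop continuously.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    healthy: bool = True


_ids = itertools.count(1)


def start_container(prefix="web"):
    """Hypothetical stand-in for asking a container runtime for a new instance."""
    return Container(f"{prefix}-{next(_ids)}")


def reconcile(desired, running):
    """One pass of a control loop: return the container list after repair."""
    # Drop containers that stopped responding; they are replaced below.
    alive = [c for c in running if c.healthy]
    # Start new containers until the desired count is reached.
    while len(alive) < desired:
        alive.append(start_container())
    # Stop surplus containers if there are too many.
    return alive[:desired]


running = [start_container() for _ in range(3)]
running[1].healthy = False  # simulate a container that stopped responding
running = reconcile(3, running)
print([c.name for c in running])
```

The important idea is that you declare the desired state (3 healthy containers) and the loop keeps moving the actual state towards it.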
Kubernetes concepts
Pod
An abstraction of a single app. It can have one or more containers. If containers are tightly coupled, they may be placed in the same pod. Containers inside a pod share the pod's network address and can share storage volumes. A pod is the unit of deployment and scaling. Each pod is assigned its own IP address, so different pods can use the same port without conflicts.
Node
This is what k8s calls the physical servers or virtual machines that host the containers.
Cluster
The set of nodes available to k8s. An example cluster would be 4 nodes with 20 pods running on them, managed dynamically by k8s.
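What "managed dynamically" means for placing pods on nodes can be illustrated with a small sketch. It assigns each pod to the first node that still has enough free CPU. This is a deliberately simplified first-fit strategy chosen for illustration, not the algorithm of the real Kubernetes scheduler, which filters and scores nodes on many more criteria. Node names, pod names and CPU figures are made up.

```python
def schedule(pods, nodes):
    """Assign each pod to the first node with enough free CPU (first-fit).

    pods:  dict of pod name -> requested CPU in millicores
    nodes: dict of node name -> CPU capacity in millicores
    Returns a dict of pod name -> node name; pods that do not fit map to None.
    """
    free = dict(nodes)
    placement = {}
    # Place the largest pods first so that small ones can fill the gaps.
    for pod, cpu in sorted(pods.items(), key=lambda item: -item[1]):
        target = next((n for n, avail in free.items() if avail >= cpu), None)
        placement[pod] = target
        if target is not None:
            free[target] -= cpu
    return placement


nodes = {"node-a": 2000, "node-b": 1000}
pods = {"web-1": 250, "web-2": 250, "db": 1500, "worker": 1000}
placement = schedule(pods, nodes)
print(placement)
```

Small pods end up sharing a node with a large one, which is the cost-effective packing mentioned in the production requirements.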
Namespace
Most objects within a cluster belong to a namespace. Namespaces make it possible to create many virtual clusters inside a single cluster. This is useful, for example, to model many independent staging environments in a single k8s cluster.
Service
Since each pod has its own IP and pods can be started and shut down at any time, it would not be easy for other pods to keep track of constantly changing IPs. That's why we have Services in k8s. A Service groups a set of pods by the given labels. Pods may come and go, but as long as their labels match the criteria, all matching pods are tracked automatically. A Service has a logical name assigned, so that other pods can use this name to communicate with the pods behind the Service. The Service dynamically routes and load-balances the traffic to the relevant pods.
Services can also be used to point traffic to an endpoint outside the k8s cluster. In this case, instead of defining the service with a pod label selector, it is necessary to specify the IP address of the service backend.
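The label matching that a Service performs can be sketched in a few lines. The pods, IPs and labels below are made up, and the function names are hypothetical. The DNS name format assumes the default cluster domain cluster.local.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector, pods):
    """Return the IPs of pods whose labels satisfy the selector."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


def service_dns_name(service, namespace="default"):
    """Cluster-internal DNS name of a Service, assuming the default cluster domain."""
    return f"{service}.{namespace}.svc.cluster.local"


pods = [
    {"ip": "10.0.0.4", "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.0.0.5", "labels": {"app": "web", "tier": "frontend"}},
    {"ip": "10.0.0.9", "labels": {"app": "worker"}},
]

print(endpoints({"app": "web"}, pods))
print(service_dns_name("web"))
```

When a pod with the label app=web is added or removed, the list of endpoints changes, but callers keep using the same Service name.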
Ingress
Ingress is used to expose services to the outside world via HTTP(S). It can also terminate SSL/TLS.
Deployment
A Deployment specifies a pod and the number of its replicas that should run. The Deployment controller is responsible for rolling out updated pods (e.g. with an updated container image). It starts new pods, shuts down old pods, and then keeps monitoring them to make sure that the desired number of replicas is running.
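The rollout described above can be pictured with a small simulation. It replaces pods in batches and shows the state after each step. This is a simplified sketch: a real Deployment also waits for new pods to become ready and can temporarily run extra pods, which is controlled by its maxUnavailable and maxSurge settings. The version strings are placeholders.

```python
def rolling_update(pods, new_version, max_unavailable=1):
    """Replace pods in batches and yield the pod list after each step.

    max_unavailable must be at least 1; it limits how many pods are
    replaced at the same time.
    """
    pods = list(pods)
    for start in range(0, len(pods), max_unavailable):
        for i in range(start, min(start + max_unavailable, len(pods))):
            pods[i] = new_version
        yield list(pods)


for state in rolling_update(["v1", "v1", "v1"], "v2"):
    print(state)
```

At every step at least two of the three replicas keep serving traffic, so the update does not cause downtime.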
StatefulSet
StatefulSets are used to manage containers that hold data. Such containers cannot simply be removed and replaced, because we cannot lose their data. Pods in a StatefulSet have sticky identities and persistent storage assigned. The persistent storage is not deleted when the pod is deleted.
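The "sticky identity" can be illustrated with the naming scheme Kubernetes uses: pods in a StatefulSet are named after the set plus an ordinal index, and each pod's volume claim is named after the claim template plus the pod name. The sketch below only generates these names. The set name db and the template name data are assumptions for illustration.

```python
def stateful_identities(set_name, claim_template, replicas):
    """List the stable pod names and volume claim names of a StatefulSet."""
    return [
        {
            "pod": f"{set_name}-{i}",
            "volume_claim": f"{claim_template}-{set_name}-{i}",
        }
        for i in range(replicas)
    ]


for identity in stateful_identities("db", "data", 3):
    print(identity)
```

When the pod db-1 is replaced, the new pod gets the same name and is attached to the same claim again, which is how it finds its data.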
It is worth mentioning that in many scenarios managing persistence is simpler outside the Kubernetes cluster. Many cloud providers have SQL and NoSQL as-a-service offerings, which usually take care of things like backups, availability and replication.
Monitoring containers
Each container can have a liveness probe and a readiness probe defined. Typically those are HTTP endpoints called by Kubernetes to check whether the container is healthy. K8s calls the endpoints periodically, e.g. every 10 seconds, depending on configuration. When the liveness probe fails, the container is restarted. When the readiness probe fails, traffic is no longer routed to this instance. Health check endpoints should therefore be implemented in every service. A simple liveness endpoint could just return status code 200. A readiness endpoint could additionally check things like the database connection, cache readiness or the amount of currently used resources to verify that the service is really ready to process new requests. When the readiness endpoint detects a problem that could be solved by restarting the container, it could switch a variable that forces the liveness endpoint to fail, causing a restart.
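Here is a minimal sketch of such health check endpoints using only the Python standard library. The paths /healthz and /ready, the port 8080 and the database_is_reachable helper are assumptions for illustration. Which paths Kubernetes actually calls is defined in the probe configuration of the pod.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Shared flag: the readiness logic can set it to force a restart
# through a failing liveness probe.
state = {"needs_restart": False}


def database_is_reachable():
    """Placeholder dependency check; a real service would ping its database."""
    return True


def probe_status(path):
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":  # liveness: is the process still working?
        return 500 if state["needs_restart"] else 200
    if path == "/ready":    # readiness: can we accept traffic right now?
        return 200 if database_is_reachable() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(probe_status(self.path))
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep probe requests out of the log output


def serve(port=8080):
    """Start the probe server; blocks until the process is stopped."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling serve() starts the server. Kubernetes treats any status code from 200 to 399 as success, so returning 500 or 503 marks the corresponding probe as failed.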
Helm
An application targeting Kubernetes is configured in a set of YAML files for deployments, services etc. Those sets of YAML files can grow pretty complex. We also need a way to version them. A common approach is to package all k8s files into a Helm package called a chart. What gets deployed to the k8s cluster is then the chart.
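The core idea behind a chart, templates plus a set of default values that can be overridden per environment, can be sketched as follows. Helm itself uses Go templates and a values.yaml file. The sketch uses Python's string.Template instead, purely to illustrate the principle, and the manifest is an abbreviated fragment rather than a complete Deployment.

```python
from string import Template

# Abbreviated manifest fragment with placeholders, standing in for a chart template.
MANIFEST = Template(
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  replicas: $replicas\n"
)


def render(defaults, overrides):
    """Merge default values with per-environment overrides and fill the template."""
    values = {**defaults, **overrides}
    return MANIFEST.substitute(values)


defaults = {"name": "web", "replicas": 1}
print(render(defaults, {"replicas": 3}))
```

The same template can then produce a small setup for staging and a larger one for production just by changing the values.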
Cloud native applications
Kubernetes is sometimes described as an operating system for cloud-native applications. Cloud-native applications are usually designed in a microservices architecture and aim to be cloud-agnostic: they can run in many public clouds or in a private/hybrid cloud. There are already many predefined Helm charts available in public repositories if you'd like to pull in a scalable k8s setup, e.g. for Cassandra or Prometheus, in your cloud-native setup.
Sources
- https://kubernetes.io/docs/home/
- https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes
- https://medium.com/kubecost/understanding-kubernetes-cluster-autoscaling-675099a1db92
- https://helm.sh/docs/
- https://platform9.com/blog/kubernetes-helm-why-it-matters/
- https://thenewstack.io/10-key-attributes-of-cloud-native-applications/