Getting to know Docker… and God knows what else [Part 2]
Following on from the last part, we will now look at Docker in production.
Within production many problems arise: automated scheduling, scaling, understanding usage, minimising downtime, updating code, A/B testing and more. Production is a bitch.
To save the day there are many solutions. In the diagram below, we already know everything up to the third layer: using Docker to create containers.
In this course I am going to learn about Docker Swarm, but I do plan to look at Kubernetes (a widely used open-source product) in the future.
Lab 1: Generate a swarm
First we jumped over to play-with-docker.com, a site that gives you a UI to create and control multiple nodes (computers), and hence the containers running on them.
Within this site we generate multiple nodes at the click of a button. We do this with the intent of creating a swarm: essentially a group of nodes/computers that can work together and hence pool resources.
We do this by making one a manager node and then connecting it with others.
docker swarm init --advertise-addr eth0
Looking at the documentation,
--advertise-addr tells the manager which address to advertise, i.e. where ‘workers’ should contact it.
eth0 means the first Ethernet interface; Docker uses the IP address assigned to that interface, so eth0 is a shortcut for typing out the full address.
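If you are running outside Play with Docker, you can pass the node’s IP address directly instead of an interface name. A small sketch (the address here is just an example; substitute your manager’s own IP):

```shell
# Find the address currently assigned to eth0.
ip addr show eth0

# Passing the interface name and passing its address explicitly
# are interchangeable ways to do the same thing.
docker swarm init --advertise-addr 192.168.0.23
```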
After you have run this command, the node responds saying it is now a manager and gives directions on how to connect other nodes. Its response will look like this:
Swarm initialized: current node (zfjd0jzs9ikzpcpqzcyqwa9q2) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-0w342gm5h9wz43d84nr35dkg0ir8xvk8tg6muc7hr9ndtuchi4-3o72xfut0lk6b43a2l5a0v4ec 192.168.0.23:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
You can copy this join command and run it in the command lines of the other nodes. The token is essentially a key/password that lets nodes you control connect securely, stopping anyone else jumping onboard too.
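Putting that together, joining workers and checking the swarm looks something like this (the token placeholder stands in for whatever your manager printed):

```shell
# On each worker node, paste the join command the manager printed:
docker swarm join --token <token-from-manager> 192.168.0.23:2377

# Back on the manager, list the members of the swarm:
docker node ls

# Lost the token? The manager can reprint it at any time:
docker swarm join-token worker
```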
So what are we doing here? We are joining together many devices/computers/nodes so they can work together and hence execute programs better. Read this brilliant post to understand where we are in the chain of terminology: https://medium.com/google-cloud/kubernetes-101-pods-nodes-containers-and-clusters-c1509e409e16
To summarise: a node is one computer, a piece of hardware we want to use. A cluster is a group of these devices. We can attach shared data storage to help them run consistently (so nothing is lost if the machine saving all the data goes down).
Now our hardware is in place, we can add containers to be run. A group of containers is typically needed to run an app. In Kubernetes, we could describe this group as a pod and then replicate the pod when demand gets high. You might then have another component overseeing this creation of pods and load balancing, and connect that to a port so traffic can reach your application.
Back to it
Now that we have the cluster/swarm of machines working together, we should put some containers onto it.
To run containers on a swarm/cluster we need to create a service. A service acts as the coordinator (and load balancer) for the containers it runs across the nodes.
We will do the same as the diagram and install nginx:
docker service create --detach=true --name nginx1 --publish 80:80 --mount source=/etc/hostname,target=/usr/share/nginx/html/index.html,type=bind,ro nginx:1.12
Yeah that monster is one command. So
docker service create is a standard command. We want a service so we need to do this.
--detach=true though is weird. Detached services don’t take input from your terminal or display output on it. That is the preference here as it is simpler.
--name allows us to name it nginx1.
--publish assigns the port to interact with on every node.
--mount is an odd one. Essentially it bind-mounts the host’s /etc/hostname file over nginx’s default index.html, so the container serves the name of the host it is running on. Why? We need it later. That’s all.
nginx:1.12 is the definition of what image to use to create the service’s containers.
The rest, I don’t know and have struggled to find out.
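That hostname mount is easy to check straight away. A quick sketch, assuming you run it from one of the swarm nodes: asking nginx for its default page should return the name of the host whose container answered.

```shell
# Hit the published port; the body of the response is the contents
# of /etc/hostname on the node that served the request.
curl localhost:80
```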
Now that we have a service running, how do we make it create new instances of the service?
docker service update --replicas=5 --detach=true nginx1
This one is a bit clearer. Again with
--detach stopping the command line filling up with stuff.
--replicas is self explanatory.
nginx1 is the name of the service.
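As an aside, there is a dedicated shorthand for changing only the replica count, which does the same thing as the update command above:

```shell
# Equivalent to: docker service update --replicas=5 nginx1
docker service scale nginx1=5
```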
Now let’s check:
docker service ps nginx1
And boom, it should print out all the replicas and their names.
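With five replicas spread across the nodes, the swarm’s routing mesh balances incoming requests between them. A sketch of how to see this, run from any node: thanks to the hostname mount from earlier, repeated calls should return different node names.

```shell
# Hit the published port a few times; the routing mesh spreads the
# requests, so the hostname in the response should vary between calls.
for i in 1 2 3 4 5; do
  curl -s localhost:80
done
```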
Now presume one of your team has completed their work and created an updated image. How the hell are we going to move that across to all of the replicas on all of the nodes?
docker service update --image nginx:1.13 --detach=true nginx1
Done. “Docker, update the service called nginx1 with the image nginx version 1.13, and I don’t want to see it say anything on my command line”.
This will automatically go through the replicas and update them one by one (a rolling update), meaning you have no downtime.
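If you want more control over that rolling behaviour, docker service update exposes a few knobs (the values here are just illustrative):

```shell
# Update two replicas at a time, pause 10s between batches, and
# roll the service back automatically if the new tasks fail to start.
docker service update \
  --image nginx:1.13 \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-failure-action rollback \
  --detach=true \
  nginx1
```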
Once again, let’s check what we have with
docker service ps nginx1. This will show the replicas it has shut down and the ones running.
So at certain times various components may fail. We will want to know this.
If we use the command
watch -n 1 docker service ps nginx1
then docker service ps nginx1 will be called every second, giving you a sort of live view. To get out of this view, simply press Ctrl+C.
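You can keep a similar live eye on the nodes themselves. A sketch, run from a manager:

```shell
# Lists every node in the swarm, its availability, and its manager
# status (Leader / Reachable / Unreachable), refreshed every second.
watch -n 1 docker node ls
```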
If the manager node goes down or freezes or breaks in our example above we may be fucked. No container can speak to another; no container is told to do work.
If we had two and one froze, half our containers would stop working because they wouldn’t be told what to do. This could cause huge issues. Especially if the work being sent is sequential.
For example, if I said start with 5, add 2, minus 1 and divide by 2, you should end up with 3. If we had a manager that wasn’t working and only some of our instructions got through, we wouldn’t. Imagine a scenario where Manager 2 is frozen, but our service doesn’t know and so still sends work to it, and there is a database which the various containers share and write their results to:
- Start with 5 — Manager 1 — Saved in the database
- Add 2 — Manager 1 — Saved in the database
- Minus 1 — Manager 2 — Not saved in the database
- Divide by 2 — Manager 1 — Saved in the database
The result? 3.5. Wrong.
How do we get around this? There is something called Raft. (The link is a nice visual way to explain it, but I am still struggling.) I will do my best to summarise.
What we want is consensus between all managers on the tasks that need doing. To do this, you want all the managers to be constantly comparing their records. Clusters that use Raft do this by always electing a leader, which becomes the main point of contact for incoming requests and informs its followers of the logs (log replication).
Raft achieves consensus via an elected leader. A server in a “raft” cluster is either a leader or a follower, and can be a candidate in the precise case of an election (leader unavailable). The leader is responsible for log replication to the followers. It regularly informs the followers of its existence by sending a heartbeat message. Each follower has a timeout (typically between 150 and 300 ms) in which it expects the heartbeat from the leader. The timeout is reset on receiving the heartbeat. If no heartbeat is received the follower changes its status to candidate and starts a leader election. … [Wikipedia]
What does this all boil down to?
Essentially, Docker Swarm uses Raft consensus. This means you should run an odd number of managers; an even number gains you no extra fault tolerance.
- Use 3 managers and they will survive 1 breaking
- Use 5 managers and they will survive 2 breaking
- Use 7 managers and they will survive 3 breaking
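The pattern in that list is simple integer arithmetic: a cluster of N managers keeps quorum as long as a majority survives, so it tolerates (N − 1) / 2 failures (integer division). A one-liner makes the even-number waste obvious:

```shell
# Fault tolerance of a Raft cluster with N managers: (N - 1) / 2.
# Note that 3 and 4 managers both tolerate only a single failure.
for n in 3 4 5 6 7; do
  echo "$n managers tolerate $(( (n - 1) / 2 )) failure(s)"
done
# → 3 managers tolerate 1 failure(s)
# → 4 managers tolerate 1 failure(s)
# → 5 managers tolerate 2 failure(s)
# → 6 managers tolerate 2 failure(s)
# → 7 managers tolerate 3 failure(s)
```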
If you use an even number of managers you spend more resources for the same resilience. More than 7 and there is a huge amount of chatter going on. Remember also that a single manager can look after a huge number of containers/workers.
So in these two parts we have gone from not knowing what a blue whale means to starting to understand the complexity of manager nodes using Raft!
But seriously, we know what containers are, how to create and update them, how Docker helps install them (at a developer level and in a production environment, updating through a service) and what a service actually is.