Russell Bateman
March 2019
Consul is a distributed (clusterable) service mesh that connects, secures and configures services across any runtime platform and in public or private clouds. Think of it as shared, clusterable configuration files on steroids: at its core a performant key-value store, surrounded by service-discovery, health-checking and service-mesh machinery.
Consul is available on Linux Mint from the Software Manager but, like most software packaged for Ubuntu and its derivatives, that version is badly out of date, so you'll want to get the binary from HashiCorp's website instead.
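For reference, here's a rough sketch of fetching and unpacking the binary; the URL follows HashiCorp's usual releases.hashicorp.com pattern, but check the downloads page for the current version:

wget https://releases.hashicorp.com/consul/1.4.3/consul_1.4.3_linux_amd64.zip
unzip consul_1.4.3_linux_amd64.zip     # the archive contains a single executable named 'consul'

Then put the binary somewhere on $PATH, as below: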
russ@tirion:~/Downloads$ ll consul_1.4.3_linux_amd64.zip
-rw-rw-r-- 1 russ russ 34777003 Mar 19 09:19 consul_1.4.3_linux_amd64.zip
russ@tirion:~/Downloads$ sudo mv consul /usr/local/bin
russ@tirion:~/Downloads$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
russ@tirion:~/Downloads$ consul --version
Consul v1.4.3
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking\
to compatible agents)
russ@tirion:~/Downloads$ consul agent -dev
==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.4.3'
Node ID: 'a3007bbc-cff7-ae19-2c86-2fd22e8ad6f0'
Node name: 'tirion'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2019/03/19 09:31:07 [DEBUG] agent: Using random ID "a3007bbc-cff7-ae19-2c86-2fd22e8ad6f0" as node ID
2019/03/19 09:31:07 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:a3007bbc-cff7-ae19-2c86-2fd22e8ad6f0 Address:127.0.0.1:8300}]
2019/03/19 09:31:07 [INFO] raft: Node at 127.0.0.1:8300 [Follower] entering Follower state (Leader: "")
2019/03/19 09:31:07 [INFO] serf: EventMemberJoin: tirion.dc1 127.0.0.1
2019/03/19 09:31:07 [INFO] serf: EventMemberJoin: tirion 127.0.0.1
2019/03/19 09:31:07 [INFO] consul: Handled member-join event for server "tirion.dc1" in area "wan"
2019/03/19 09:31:07 [INFO] consul: Adding LAN server tirion (Addr: tcp/127.0.0.1:8300) (DC: dc1)
2019/03/19 09:31:07 [DEBUG] agent/proxy: managed Connect proxy manager started
2019/03/19 09:31:07 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2019/03/19 09:31:07 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2019/03/19 09:31:07 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
2019/03/19 09:31:07 [INFO] agent: started state syncer
2019/03/19 09:31:07 [INFO] agent: Started gRPC server on 127.0.0.1:8502 (tcp)
2019/03/19 09:31:08 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/03/19 09:31:08 [INFO] raft: Node at 127.0.0.1:8300 [Candidate] entering Candidate state in term 2
2019/03/19 09:31:08 [DEBUG] raft: Votes needed: 1
2019/03/19 09:31:08 [DEBUG] raft: Vote granted from a3007bbc-cff7-ae19-2c86-2fd22e8ad6f0 in term 2. Tally: 1
2019/03/19 09:31:08 [INFO] raft: Election won. Tally: 1
2019/03/19 09:31:08 [INFO] raft: Node at 127.0.0.1:8300 [Leader] entering Leader state
2019/03/19 09:31:08 [INFO] consul: cluster leadership acquired
2019/03/19 09:31:08 [INFO] consul: New leader elected: tirion
2019/03/19 09:31:08 [INFO] connect: initialized primary datacenter CA with provider "consul"
2019/03/19 09:31:08 [DEBUG] consul: Skipping self join check for "tirion" since the cluster is too small
2019/03/19 09:31:08 [INFO] consul: member 'tirion' joined, marking health alive
2019/03/19 09:31:08 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
2019/03/19 09:31:08 [INFO] agent: Synced node info
2019/03/19 09:31:08 [DEBUG] agent: Node info in sync
2019/03/19 09:31:10 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
2019/03/19 09:31:10 [DEBUG] agent: Node info in sync
Examine the nodes in the cluster (there is just one node, obviously). Consul offers both an HTTP API (port 8500) and a DNS interface (port 8600) for this.
russ@tirion:~/Downloads$ consul members
Node    Address         Status  Type    Build  Protocol  DC   Segment
tirion  127.0.0.1:8301  alive   server  1.4.3  2         dc1  <all>
russ@tirion:~/Downloads$ curl localhost:8500/v1/catalog/nodes
[
    {
        "ID": "a3007bbc-cff7-ae19-2c86-2fd22e8ad6f0",
        "Node": "tirion",
        "Address": "127.0.0.1",
        "Datacenter": "dc1",
        "TaggedAddresses": {
            "lan": "127.0.0.1",
            "wan": "127.0.0.1"
        },
        "Meta": {
            "consul-network-segment": ""
        },
        "CreateIndex": 9,
        "ModifyIndex": 10
    }
]
russ@tirion:~/Downloads$ dig @127.0.0.1 -p 8600 tirion.node.consul
; <<>> DiG 9.11.3-1ubuntu1.5-Ubuntu <<>> @127.0.0.1 -p 8600 tirion.node.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58074
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;tirion.node.consul. IN A
;; ANSWER SECTION:
tirion.node.consul. 0 IN A 127.0.0.1
;; ADDITIONAL SECTION:
tirion.node.consul. 0 IN TXT "consul-network-segment="
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Mar 19 09:36:09 MDT 2019
;; MSG SIZE rcvd: 99
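As an aside, the key-value claim made at the top is easy to verify while the dev agent is up. This isn't part of the getting-started steps and the key name below is made up:

consul kv put demo/greeting hello                       # write a key
consul kv get demo/greeting                             # prints: hello
curl http://localhost:8500/v1/kv/demo/greeting?raw      # the same value over the HTTP API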
Stop the agent with Ctrl-C. The guide calls this stopping the agent gracefully, though, as the log shows, the dev server simply exits without performing a graceful leave:
.
.
.
2019/03/19 09:38:08 [DEBUG] consul: Skipping self join check for "tirion" since the cluster is too small
2019/03/19 09:38:08 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
2019/03/19 09:38:08 [DEBUG] agent: Node info in sync
^C 2019/03/19 09:39:06 [INFO] agent: Caught signal: interrupt
2019/03/19 09:39:06 [INFO] agent: Graceful shutdown disabled. Exiting
2019/03/19 09:39:06 [INFO] agent: Requesting shutdown
2019/03/19 09:39:06 [WARN] agent: dev mode disabled persistence, killing all proxies since we can't recover them
2019/03/19 09:39:06 [DEBUG] agent/proxy: Stopping managed Connect proxy manager
2019/03/19 09:39:06 [INFO] consul: shutting down server
2019/03/19 09:39:06 [WARN] serf: Shutdown without a Leave
2019/03/19 09:39:06 [WARN] serf: Shutdown without a Leave
2019/03/19 09:39:06 [INFO] manager: shutting down
2019/03/19 09:39:06 [INFO] agent: consul server down
2019/03/19 09:39:06 [INFO] agent: shutdown complete
2019/03/19 09:39:06 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
2019/03/19 09:39:06 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
2019/03/19 09:39:06 [INFO] agent: Stopping HTTP server 127.0.0.1:8500 (tcp)
2019/03/19 09:39:06 [INFO] agent: Waiting for endpoints to shut down
2019/03/19 09:39:06 [INFO] agent: Endpoints down
2019/03/19 09:39:06 [INFO] agent: Exit code: 1
Now we're getting somewhere useful. Still following the getting-started guide, we create a configuration directory, drop a definition for a "web" service into it, and restart the agent pointed at that directory:
russ@tirion:~/Downloads$ sudo ls -alg -d /etc/consul*
ls: cannot access '/etc/consul*': No such file or directory
russ@tirion:~/Downloads$ sudo mkdir /etc/consul.d
russ@tirion:~/Downloads$ sudo bash
root@tirion:~/Downloads# cd /etc/consul.d/
root@tirion:/etc/consul.d# echo '{"service": {"name": "web", "tags": ["rails"], "port": 80}}' > /etc/consul.d/web.json
root@tirion:/etc/consul.d# ll
total 20
drwxr-xr-x   2 root root  4096 Mar 19 09:53 ./
drwxr-xr-x 144 root root 12288 Mar 19 09:42 ../
-rw-r--r--   1 root root    60 Mar 19 09:53 web.json
root@tirion:/etc/consul.d# consul agent -dev -config-dir=/etc/consul.d
==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.4.3'
Node ID: '07f0fffe-08c5-7ca5-9de0-55729bd459b6'
Node name: 'tirion'
Datacenter: 'dc1' (Segment: '<all>')
Server: true (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 127.0.0.1 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2019/03/19 09:56:13 [DEBUG] agent: Using random ID "07f0fffe-08c5-7ca5-9de0-55729bd459b6" as node ID
2019/03/19 09:56:13 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:07f0fffe-08c5-7ca5-9de0-55729bd459b6 Address:127.0.0.1:8300}]
2019/03/19 09:56:13 [INFO] raft: Node at 127.0.0.1:8300 [Follower] entering Follower state (Leader: "")
2019/03/19 09:56:13 [INFO] serf: EventMemberJoin: tirion.dc1 127.0.0.1
2019/03/19 09:56:13 [INFO] serf: EventMemberJoin: tirion 127.0.0.1
2019/03/19 09:56:13 [INFO] consul: Adding LAN server tirion (Addr: tcp/127.0.0.1:8300) (DC: dc1)
2019/03/19 09:56:13 [DEBUG] agent/proxy: managed Connect proxy manager started
2019/03/19 09:56:13 [WARN] agent/proxy: running as root, will not start managed proxies
2019/03/19 09:56:13 [INFO] consul: Handled member-join event for server "tirion.dc1" in area "wan"
2019/03/19 09:56:13 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2019/03/19 09:56:13 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2019/03/19 09:56:13 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
2019/03/19 09:56:13 [INFO] agent: started state syncer
2019/03/19 09:56:13 [INFO] agent: Started gRPC server on 127.0.0.1:8502 (tcp)
2019/03/19 09:56:13 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/03/19 09:56:13 [INFO] raft: Node at 127.0.0.1:8300 [Candidate] entering Candidate state in term 2
2019/03/19 09:56:13 [DEBUG] raft: Votes needed: 1
2019/03/19 09:56:13 [DEBUG] raft: Vote granted from 07f0fffe-08c5-7ca5-9de0-55729bd459b6 in term 2. Tally: 1
2019/03/19 09:56:13 [INFO] raft: Election won. Tally: 1
2019/03/19 09:56:13 [INFO] raft: Node at 127.0.0.1:8300 [Leader] entering Leader state
2019/03/19 09:56:13 [INFO] consul: cluster leadership acquired
2019/03/19 09:56:13 [INFO] consul: New leader elected: tirion
2019/03/19 09:56:13 [INFO] connect: initialized primary datacenter CA with provider "consul"
2019/03/19 09:56:13 [DEBUG] consul: Skipping self join check for "tirion" since the cluster is too small
2019/03/19 09:56:13 [INFO] consul: member 'tirion' joined, marking health alive
2019/03/19 09:56:14 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
2019/03/19 09:56:14 [INFO] agent: Synced service "web"
2019/03/19 09:56:14 [DEBUG] agent: Node info in sync
2019/03/19 09:56:14 [DEBUG] agent: Service "web" in sync
2019/03/19 09:56:14 [DEBUG] agent: Node info in sync
2019/03/19 09:56:16 [DEBUG] agent: Skipping remote check "serfHealth" since it is managed automatically
2019/03/19 09:56:16 [DEBUG] agent: Service "web" in sync
2019/03/19 09:56:16 [DEBUG] agent: Node info in sync
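An aside, not part of the guide's flow: a service can also be registered at runtime through the agent's HTTP API instead of a configuration file. The service name and port here are made up:

curl -X PUT --data '{"Name": "web2", "Tags": ["rails"], "Port": 8080}' \
     http://localhost:8500/v1/agent/service/register

A service registered this way lives only in the dev agent's memory, whereas the file in /etc/consul.d is re-read every time the agent starts.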
Querying services using the DNS interface: as you may have guessed from the node query above, the DNS name for a node is node-name.node.consul, and the DNS name for a service is service-name.service.consul.
root@tirion:/etc/consul.d# dig @127.0.0.1 -p 8600 web.service.consul
; <<>> DiG 9.11.3-1ubuntu1.5-Ubuntu <<>> @127.0.0.1 -p 8600 web.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32369
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;web.service.consul. IN A
;; ANSWER SECTION:
web.service.consul. 0 IN A 127.0.0.1
;; ADDITIONAL SECTION:
web.service.consul. 0 IN TXT "consul-network-segment="
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Mar 19 10:43:52 MDT 2019
;; MSG SIZE rcvd: 99
An A record was returned with the IP address of the node making the service available (A records hold only IP addresses). To get the port as well, ask the DNS interface for an SRV record...
root@tirion:/etc/consul.d# dig @127.0.0.1 -p 8600 web.service.consul SRV
; <<>> DiG 9.11.3-1ubuntu1.5-Ubuntu <<>> @127.0.0.1 -p 8600 web.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19275
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;web.service.consul. IN SRV
;; ANSWER SECTION:
web.service.consul. 0 IN SRV 1 1 80 tirion.node.dc1.consul.
;; ADDITIONAL SECTION:
tirion.node.dc1.consul. 0 IN A 127.0.0.1
tirion.node.dc1.consul. 0 IN TXT "consul-network-segment="
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Mar 19 10:46:34 MDT 2019
;; MSG SIZE rcvd: 141
...and another to filter services (we have only one here) by tag. Our service's only tag is rails, specified in /etc/consul.d/web.json:
root@tirion:/etc/consul.d# dig @127.0.0.1 -p 8600 rails.web.service.consul
; <<>> DiG 9.11.3-1ubuntu1.5-Ubuntu <<>> @127.0.0.1 -p 8600 rails.web.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44674
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;rails.web.service.consul. IN A
;; ANSWER SECTION:
rails.web.service.consul. 0 IN A 127.0.0.1
;; ADDITIONAL SECTION:
rails.web.service.consul. 0 IN TXT "consul-network-segment="
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Mar 19 10:50:50 MDT 2019
;; MSG SIZE rcvd: 105
Back to the HTTP API, we can query services too. And we can limit our query to return only "healthy" (passing) services:
root@tirion:/etc/consul.d# curl http://localhost:8500/v1/catalog/service/web
[
    {
        "ID": "07f0fffe-08c5-7ca5-9de0-55729bd459b6",
        "Node": "tirion",
        "Address": "127.0.0.1",
        "Datacenter": "dc1",
        "TaggedAddresses": {
            "lan": "127.0.0.1",
            "wan": "127.0.0.1"
        },
        "NodeMeta": {
            "consul-network-segment": ""
        },
        "ServiceKind": "",
        "ServiceID": "web",
        "ServiceName": "web",
        "ServiceTags": [
            "rails"
        ],
        "ServiceAddress": "",
        "ServiceWeights": {
            "Passing": 1,
            "Warning": 1
        },
        "ServiceMeta": {},
        "ServicePort": 80,
        "ServiceEnableTagOverride": false,
        "ServiceProxyDestination": "",
        "ServiceProxy": {},
        "ServiceConnect": {},
        "CreateIndex": 10,
        "ModifyIndex": 10
    }
]
root@tirion:/etc/consul.d# curl http://localhost:8500/v1/catalog/service/web?passing
(ibid)
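Two related queries, not captured above: the catalog endpoint accepts a ?tag filter, and it is the health endpoint that actually understands ?passing (the catalog endpoint appears simply to ignore the parameter, which is why the output above was identical):

curl 'http://localhost:8500/v1/catalog/service/web?tag=rails'   # filter by tag
curl 'http://localhost:8500/v1/health/service/web?passing'      # only instances whose health checks pass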
To change or update a service, you can modify its configuration file, then send SIGHUP to Consul:
root@tirion:/etc/consul.d# ps -ef | grep [c]onsul
root     24034 24003  0 09:56 pts/0    00:00:24 consul agent -dev -config-dir=/etc/consul.d
root@tirion:/etc/consul.d# kill -l | grep SIGHUP
 1) SIGHUP    2) SIGINT    3) SIGQUIT    4) SIGILL    5) SIGTRAP
root@tirion:/etc/consul.d# kill -1 24034
.
.
.
2019/03/19 10:58:30 [DEBUG] agent: Node info in sync
2019/03/19 10:59:13 [DEBUG] consul: Skipping self join check for "tirion" since the cluster is too small
2019/03/19 10:59:27 [DEBUG] manager: Rebalanced 1 servers, next active server is tirion.dc1 (Addr: tcp/127.0.0.1:8300) (DC: dc1)
2019/03/19 10:59:37 [INFO] agent: Caught signal: hangup
2019/03/19 10:59:37 [INFO] agent: Reloading configuration...
2019/03/19 10:59:37 [DEBUG] agent: removed service "web"
2019/03/19 10:59:37 [INFO] agent: Synced service "web"
2019/03/19 10:59:37 [DEBUG] agent: Node info in sync
2019/03/19 10:59:37 [DEBUG] agent: Service "web" in sync
2019/03/19 10:59:37 [DEBUG] agent: Node info in sync
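If hunting down the PID feels tedious, pkill can deliver the same signal by process name. A one-liner, not from the guide (it matches every process named consul, so be careful on a box running more than one):

sudo pkill -HUP consul     # send SIGHUP to the consul process(es) by name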
For this demonstration, you'll need socat installed. We're going to use it as a stand-in service to demonstrate communication through Consul Connect.
russ@tirion:~$ which socat
russ@tirion:~$ sudo apt-get install socat
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  socat
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 342 kB of archives.
After this operation, 1,034 kB of additional disk space will be used.
Get:1 http://mirrors.xmission.com/ubuntu bionic/main amd64 socat amd64 1.7.3.2-2ubuntu2 [342 kB]
Fetched 342 kB in 1s (651 kB/s)
Selecting previously unselected package socat.
(Reading database ... 308541 files and directories currently installed.)
Preparing to unpack .../socat_1.7.3.2-2ubuntu2_amd64.deb ...
Unpacking socat (1.7.3.2-2ubuntu2) ...
Setting up socat (1.7.3.2-2ubuntu2) ...
Processing triggers for doc-base (0.10.8) ...
Processing 1 added doc-base file...
Registering documents with scrollkeeper...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
socat will accept TCP connections and echo back any data sent over them. Here's what happens when we run socat and talk to it using nc, a utility that reads and writes data across arbitrary TCP and UDP connections. Using nc, we type a greeting to socat, which is echoed back to us; in socat's console, we see it receive and echo the greeting:
Console on which we launch socat:
russ@tirion:~$ socat -v tcp-l:8181,fork exec:"/bin/cat"
> 2019/03/19 11:08:02.857124 length=13 from=0 to=12
Hello socat!
< 2019/03/19 11:08:02.857313 length=13 from=0 to=12
Hello socat!
|
Second console using nc to send greeting:
russ@tirion:~$ nc 127.0.0.1 8181
Hello socat!
Hello socat!
|
Registering the service with Consul and connecting. First, let's create a configuration file, socat.json, from the command line:
root@tirion:/etc/consul.d# cat <<EOF | tee /etc/consul.d/socat.json
> {
> "service" :
> {
> "name" : "socat",
> "port" : 8181,
> "connect" : { "sidecar_service" : {} }
> }
> }
> EOF
{
"service" :
{
"name" : "socat",
"port" : 8181,
"connect" : { "sidecar_service" : {} }
}
}
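Before reloading, it's worth letting Consul parse the directory itself; consul validate is a standard subcommand, though this step isn't in the guide:

consul validate /etc/consul.d     # complains if any file in the directory fails to parse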
Then, with this new (it could just as well have been a changed) service definition in place, we reload Consul, either by sending SIGHUP as we did before or by using a Consul command that forces it to re-read its configuration:
root@tirion:/etc/consul.d# consul reload
Configuration reload triggered
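A quick sanity check, not in the guide: list the catalog and make sure the new service registered (Connect also registers the sidecar as its own service):

consul catalog services     # expect consul, web, socat and socat-sidecar-proxy in the list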
(You can kill the nc session in that other console now; it's no longer needed. Separately, until the sidecar proxy is actually started, Consul's health check on it fails and you'll see warnings like this in the agent's output:
2019/03/19 11:22:50 [WARN] agent: Check "service:socat-sidecar-proxy:1" socket connection failed: dial tcp 127.0.0.1:21000: \ connect: connection refused
The check is probing the proxy's public listener on port 21000, where nothing is listening yet.)
As noted in the getting-started guide, the sidecar service registration only tells Consul that a proxy is (or will be) running; Consul doesn't run it for you. Here's the command that starts the proxy using the socat configuration we just set up. Once it's running, the warnings stop and you'll see happy messages in Consul's output.
russ@tirion:~$ consul connect proxy -sidecar-for socat
==> Consul Connect proxy starting...
Configuration mode: Agent API
Sidecar for ID: socat
Proxy ID: socat-sidecar-proxy
==> Log data will now stream in as it occurs:
2019/03/19 11:25:14 [INFO] Proxy loaded config and ready to serve
2019/03/19 11:25:14 [INFO] TLS Identity: spiffe://88f508f8-3a3b-ee7d-8c32-3c4942baa537.consul/ns/default/dc/dc1/svc/socat
2019/03/19 11:25:14 [INFO] TLS Roots : [Consul CA 7]
2019/03/19 11:25:14 [INFO] public listener starting on 0.0.0.0:21000
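To confirm that the earlier warning has cleared, you can ask the agent for its checks over the HTTP API; the sidecar proxy's check should now report passing. This step isn't in the guide:

curl http://localhost:8500/v1/agent/checks     # JSON map of checks; look for "service:socat-sidecar-proxy:1"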
Now, let's connect the web service (the one we set up first) to socat (set up second) by starting a proxy that represents web and exposes socat as an upstream on local port 9191:
russ@tirion:~$ consul connect proxy -service web -upstream socat:9191
==> Consul Connect proxy starting...
Configuration mode: Flags
Service: web
Upstream: socat => :9191
Public listener: Disabled
==> Log data will now stream in as it occurs:
2019/03/19 11:46:36 [INFO] 127.0.0.1:9191->service:default/socat starting on 127.0.0.1:9191
2019/03/19 11:46:36 [INFO] Proxy loaded config and ready to serve
2019/03/19 11:46:36 [INFO] TLS Identity: spiffe://88f508f8-3a3b-ee7d-8c32-3c4942baa537.consul/ns/default/dc/dc1/svc/web
2019/03/19 11:46:36 [INFO] TLS Roots : [Consul CA 7]
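The guide's actual test at this point is to talk to socat through the proxies by pointing nc at the upstream listener on 9191 rather than at 8181 directly; something like this (the greeting text is arbitrary):

russ@tirion:~$ nc 127.0.0.1 9191
Hello via Connect
Hello via Connect

The first line is typed; the second is socat's echo, carried over mutual TLS through both proxies.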
To recap, because the guide is murky here: socat is still listening on 8181, but the demonstration's traffic enters at 127.0.0.1:9191, the local upstream listener opened by the web-side proxy, travels over mutual TLS to the socat sidecar proxy on 21000, and only then reaches socat on 8181. The other sore point is that the guide never says which executables to terminate before proceeding to the next demonstration and which to leave running; it's easy to end up with stray socat, nc and consul processes and errors spilling out of the original Consul instance, so terminate each demo's extra consoles before moving on.