Set up Private DNS-over-TLS/HTTPS

Domain Name System (DNS) is a crucial part of Internet infrastructure. It is responsible for translating a human-readable, memorizable domain (like leaseweb.com) into a numeric IP address (such as 89.255.251.130).

In order to translate a domain into an IP address, your device sends a DNS request to a special DNS server called a resolver (which is most likely managed by your Internet provider). The DNS requests are sent in plain text so anyone who has access to your traffic stream can see which domains you visit.

There are two recent Internet standards that have been designed to solve the DNS privacy issue:

  • DNS over TLS (DoT)
  • DNS over HTTPS (DoH)

Both of them provide secure and encrypted connections to a DNS server.

DoT/DoH feature compatibility matrix:

        Firefox   Chrome   Android 9+   iOS 14+
DoT     no        no       yes          yes
DoH     yes       yes      no           yes

iOS 14 will be released later this year.

In this article, we will set up a private DoH and DoT recursor using Pi-hole in a Docker container, and dnsdist as a DNS frontend with Let's Encrypt SSL certificates. As a bonus, our DNS server will block tracking and malware domains while resolving names for us.

Installation

In this example we use Ubuntu 20.04 with Docker and docker-compose installed, but you can choose your favorite distro (you might need to adapt the commands a bit).

You may also need to disable systemd-resolved because it occupies port 53 of the server:

# Check which DNS resolvers your server is using:
systemd-resolve --status
# look for "DNS servers" field in output

# Stop systemd-resolved
systemctl stop systemd-resolved

# Then mask it to prevent it from being started again
systemctl mask systemd-resolved

# Delete the symlink systemd-resolved used to manage
rm /etc/resolv.conf

# Create /etc/resolv.conf as a regular file with nameservers you've been using:
cat <<EOF > /etc/resolv.conf
nameserver <ip of the first DNS resolver>
nameserver <ip of the second DNS resolver>
EOF
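
If you are unsure which resolvers you were using, a safe fallback is to point at well-known public resolvers, for example Cloudflare and Google:

cat <<EOF > /etc/resolv.conf
nameserver 1.1.1.1
nameserver 8.8.8.8
EOF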

Install dnsdist and certbot (for letsencrypt certificates):

# Install dnsdist repo
echo "deb [arch=amd64] http://repo.powerdns.com/ubuntu focal-dnsdist-15 main" > /etc/apt/sources.list.d/pdns.list
cat <<EOF > /etc/apt/preferences.d/dnsdist
Package: dnsdist*
Pin: origin repo.powerdns.com
Pin-Priority: 600
EOF
curl https://repo.powerdns.com/FD380FBB-pub.asc | apt-key add -

apt update
apt install dnsdist certbot

Pi-hole

Now we create our docker-compose project:

mkdir ~/pihole
touch ~/pihole/docker-compose.yml

The contents of docker-compose.yml file:

version: '3'
services:
  pihole:
    container_name: pihole
    image: 'pihole/pihole:latest'
    ports:
    # The DNS server will listen on localhost only, the ports 5300 tcp/udp.
    # So the queries from the Internet won't be able to reach pihole directly.
    # The admin web interface, however, will be reachable from the Internet.
      - '127.0.1.53:5300:53/tcp'
      - '127.0.1.53:5300:53/udp'
      - '8081:80/tcp'
    environment:
      TZ: Europe/Amsterdam
      VIRTUAL_HOST: dns.example.com # domain name we'll use for our DNS server
      WEBPASSWORD: super_secret # Pihole admin password
    volumes:
      - './etc-pihole/:/etc/pihole/'
      - './etc-dnsmasq.d/:/etc/dnsmasq.d/'
    restart: unless-stopped

Start the container:

docker-compose up -d

After the container is fully started (it may take several minutes) check that it is able to resolve domain names:

dig +short @127.0.1.53 -p5300 one.one.one.one
# Expected output
# 1.0.0.1
# 1.1.1.1

Letsencrypt Configuration

Issue the certificate for our dns.example.com domain:

certbot certonly

Follow the instructions on the screen (e.g. select the authentication method suitable for you, and fill in the domain name).

After the certificate is issued, it can be found under the following paths:

  • /etc/letsencrypt/live/dns.example.com/fullchain.pem – certificate chain
  • /etc/letsencrypt/live/dns.example.com/privkey.pem – private key

By default only the root user can read certificates and keys. Dnsdist, however, is running as user and group _dnsdist, so permissions need to be adjusted:

chgrp _dnsdist /etc/letsencrypt/live/dns.example.com/{fullchain.pem,privkey.pem}
chmod g+r /etc/letsencrypt/live/dns.example.com/{fullchain.pem,privkey.pem}

# We should also make archive and live directories readable.
# That will not expose the keys since the private key isn't world-readable
chmod 755 /etc/letsencrypt/{live,archive}

The certificates are periodically renewed by Certbot, so dnsdist should be restarted after that happens since it is not able to detect the new certificate. In order to do so, we put a so-called deploy script into /etc/letsencrypt/renewal-hooks/deploy directory:

mkdir -p /etc/letsencrypt/renewal-hooks/deploy
cat <<EOF > /etc/letsencrypt/renewal-hooks/deploy/restart-dnsdist.sh
#!/bin/sh
systemctl restart dnsdist
EOF
chmod +x /etc/letsencrypt/renewal-hooks/deploy/restart-dnsdist.sh
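
You can test the renewal process itself with a dry run. Note that certbot skips deploy hooks during dry runs, so don't be surprised if the restart script does not fire:

certbot renew --dry-run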

Dnsdist Configuration

Create dnsdist configuration file /etc/dnsdist/dnsdist.conf with the following content:

addACL('0.0.0.0/0')

-- path for certs and listen address for DoT ipv4,
-- by default listens on port 853.
-- Set X(int) for tcp fast open queue size.
addTLSLocal("0.0.0.0", "/etc/letsencrypt/live/dns.example.com/fullchain.pem", "/etc/letsencrypt/live/dns.example.com/privkey.pem", { doTCP=true, reusePort=true, tcpFastOpenSize=64 })

-- path for certs and listen address for DoH ipv4,
-- by default listens on port 443.
-- Set X(int) for tcp fast open queue size.
-- 
-- In this example we listen directly on port 443. However, since the DoH queries are simple HTTPS requests, the server can be hidden behind Nginx or Haproxy.
addDOHLocal("0.0.0.0", "/etc/letsencrypt/live/dns.example.com/fullchain.pem", "/etc/letsencrypt/live/dns.example.com/privkey.pem", "/dns-query", { doTCP=true, reusePort=true, tcpFastOpenSize=64 })

-- set X(int) number of queries to be allowed per second from an IP
addAction(MaxQPSIPRule(50), DropAction())

-- drop ANY queries sent over UDP
addAction(AndRule({QTypeRule(DNSQType.ANY), TCPRule(false)}), DropAction())

-- set X number of entries to be in dnsdist cache by default
-- memory will be preallocated based on the X number
pc = newPacketCache(10000, {maxTTL=86400})
getPool(""):setCache(pc)

-- server policy to choose the downstream servers for recursion
setServerPolicy(leastOutstanding)

-- Here we define our backend, the pihole dns server
newServer({address="127.0.1.53:5300", name="127.0.1.53:5300"})

setMaxTCPConnectionsPerClient(1000)    -- set X(int) for number of tcp connections from a single client. Useful for rate limiting the concurrent connections.
setMaxTCPQueriesPerConnection(100)    -- set X(int) for the maximum number of queries allowed over a single TCP connection
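
Check the configuration for syntax errors and restart dnsdist to apply it (dnsdist ships with a built-in config check):

dnsdist --check-config
systemctl restart dnsdist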

Checking if DoH and DoT Work

Check if DoH works using curl with doh-url flag:

curl --doh-url https://dns.example.com/dns-query https://leaseweb.com/

Check if DoT works using kdig program from the knot-dnsutils package:

apt install knot-dnsutils

kdig -d @dns.example.com +tls-ca leaseweb.com

Setting up Private DNS on Android

Currently only Android 9+ natively supports encrypted DNS queries by using DNS-over-TLS technology.

In order to use it go to: Settings -> Connections -> More connection settings -> Private DNS -> Private DNS provider hostname -> dns.example.com

Conclusion

In this article we’ve set up our own DNS resolving server with the following features:

  • Automatic TLS certificates using Letsencrypt.
  • Supports both modern encrypted protocols: DNS over TLS, and DNS over HTTPS.
  • Implements rate-limit of incoming queries to prevent abuse.
  • Automatically updated blacklist of malware, ad, and tracking domains.
  • Easily upgradeable by simply pulling a new version of Docker image.

Using Correlation IDs in API Calls

Over the years, the IT industry has moved from a single domain, monolithic architecture to a microservice architecture. In a microservice architecture, complex processes are split into smaller and simpler sub-processes. While this kind of architecture has many benefits, there are also some downsides – for example, if you send one request to a Leaseweb API, it ends up in multiple requests in other backend systems [FIGURE 1]. How do you keep track of requests and responses processed by multiple systems? This is where Correlation IDs come into play.

[FIGURE 1: Example request/response flow]

Using a Correlation ID

A Correlation ID is a unique, randomly generated identifier value that is added to every request and response. In a microservice architecture, the initial Correlation ID is passed to your sub-processes. If a sub-system also makes sub-requests, it will also pass the Correlation ID to those systems.

How you pass the Correlation ID to other systems depends on your architecture. At Leaseweb we are using REST APIs a lot, with HTTP headers to pass on the Correlation ID. As a rule, we assign a Correlation ID as soon as possible, and always use a Correlation ID if it is passed on. Our public API only accepts Correlation IDs from internally trusted clients. For any other client (such as an employee or customer API clients) a new Correlation ID is generated for the request.
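
For example, here is a minimal sketch of that rule in plain PHP (the header name X-My-Correlation-ID matches the examples below; the UUID v4 format is a common choice, but any unique value works):

<?php

/**
 * Return the incoming Correlation ID if present, otherwise generate a new one.
 */
function ensureCorrelationId(?string $incoming): string
{
    if (!empty($incoming)) {
        return $incoming;
    }

    // Generate a random UUID v4 using PHP's CSPRNG
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set the version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set the RFC 4122 variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}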

Real Value of Correlation IDs

The real value of Correlation IDs is realized when you also log the Correlation IDs. Debugging or tracing requests becomes much easier, as you can search all of your logs for the same Correlation ID. Combined with central logging solutions (such as the ELK stack), searching logs becomes even easier and can be done by non-technical colleagues. Providing tools to your colleagues to troubleshoot issues allows them to have more responsibility and gives you more time to work on more technical projects.

We mainly use Correlation IDs at Leaseweb for debugging purposes. When an error occurs, we provide the Correlation ID to the client/customer. If users provide the Correlation ID when submitting a support ticket, we can visualize the entire process needed to fulfil the client’s initial intent. This has significantly improved the time it takes us to fix bugs.

[FIGURE 2: Example of one Correlation ID with multiple requests]

Debugging issues is a time-consuming process if Correlation IDs are not used. When your environment scales, you will need to find solutions to group transactions happening in your systems. By using a Correlation ID, you can easily group requests and events in your systems, allowing you to spend more time fixing the problem and less time trying to find it.

Practical examples on how to implement Correlation IDs

The following examples use Symfony, a popular web application framework. These concepts can also be applied to any other framework, such as Laravel, Django, Flask or Ruby on Rails.

If you are unfamiliar with the concept of Service Containers and Dependency Injection, we recommend reading the excellent Symfony documentation about it here: https://symfony.com/doc/current/service_container.html

Using Monolog to append Correlation IDs to your application logs

When processing an HTTP request, your application often logs some information – such as when an error occurs, or when an important change is made in your system that you want to keep track of. When using the Monolog logging library in PHP (https://seldaek.github.io/monolog/), you can use the concept of “Processors” (read more about that here on symfony.com).

One way to do this is by creating a Monolog Processor class:

<?php

namespace App\Monolog\Processor;

use Symfony\Component\HttpFoundation\RequestStack;

class CorrelationIdProcessor
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function processRecord(array $record)
    {
        $request = $this->requestStack->getCurrentRequest();

        // A Monolog processor must always return the record,
        // even when it has nothing to add
        if (!$request) {
            return $record;
        }

        $correlationId = $request->headers->get('X-My-Correlation-ID');

        if (empty($correlationId)) {
            return $record;
        }

        // If we have a correlation id include it in every monolog line
        $record['extra']['correlation_id'] = $correlationId;

        return $record;
    }
}

Then register this class on the service container as a monolog processor in services.yml:

# app/config/services.yml

services:
  App\Monolog\Processor\CorrelationIdProcessor:
    arguments: ["@request_stack"]
    tags:
      - name: monolog.processor
        method: processRecord

Now, every time you log something in your application with Monolog:

$this->logger->info('shopping_cart_emptied', ['cart_id' => 123]);

You will see the Correlation ID of the HTTP Request in your log files:

$ grep 'shopping_cart_emptied' var/logs/prod.log

[2020-07-03 12:14:45] app.INFO: shopping_cart_emptied {"cart_id": 123} {"correlation_id":"d135d5f1-3dd0-45fa-8f26-55d8d6a44876"}

You can utilize the same pattern to log the name of the user that is currently logged in, the remote IP address of the API client, or anything else that makes troubleshooting faster for you.

Using Guzzle to append Correlation IDs when making sub-requests

If your API makes API calls to other microservices (and you use Guzzle to do this) you can make use of Handlers and Middleware.

Some teams at Leaseweb depend on many downstream microservices, and can therefore have multiple guzzle clients as services on the service container. While each Guzzle client is configured with its own base URL and/or authentication, it is possible for all of the Guzzle clients to share the same HandlerStack.

First, create the middleware:

<?php

namespace App\Guzzle\Middleware;

use Symfony\Component\HttpFoundation\RequestStack;
use Psr\Http\Message\RequestInterface;

class CorrelationIdMiddleware
{
    protected $requestStack;

    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function __invoke(callable $handler)
    {
        return function (RequestInterface $request, array $options = []) use ($handler) {
            // $request is the outgoing Guzzle request; fetch the incoming
            // HTTP request under a different name so we don't overwrite it
            $currentRequest = $this->requestStack->getCurrentRequest();

            if (!$currentRequest) {
                return $handler($request, $options);
            }

            $correlationId = $currentRequest->headers->get('X-My-Correlation-ID');

            if (empty($correlationId)) {
                return $handler($request, $options);
            }

            $request = $request->withHeader('X-My-Correlation-ID', $correlationId);

            return $handler($request, $options);
        };
    }
}

Define this middleware as a service on the service container and create a HandlerStack:

# app/config/services.yml

services:
  correlation_id_middleware:
    class: App\Guzzle\Middleware\CorrelationIdMiddleware
    arguments: ["@request_stack"]

  correlation_id_handler_stack:
    class: GuzzleHttp\HandlerStack
    factory: ['GuzzleHttp\HandlerStack', 'create']
    calls:
      - [push, ["@correlation_id_middleware", "correlation_id_forwarder"]]

With these two services defined, you can now configure all your Guzzle clients using the HandlerStack so that the Correlation ID of the current HTTP request is forwarded to downstream HTTP requests:

# app/config/services.yml

services:
  my_downstream_api:
    class: GuzzleHttp\Client
    arguments:
      - base_uri: https://my-downstream-api.example.com
        handler: "@correlation_id_handler_stack"

Now every API call that you make to https://my-downstream-api.example.com will include the HTTP request header 'X-My-Correlation-ID' with the same value as the Correlation ID of the current HTTP request. You can also apply the same Monolog and Guzzle tricks described here to the downstream API.
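
As a usage sketch (the service name my_downstream_api comes from the configuration above; /v1/status is just a hypothetical endpoint), fetching the client from the container is all it takes – the middleware adds the header transparently:

<?php

// $container is the Symfony service container
$client = $container->get('my_downstream_api');

// The middleware copies X-My-Correlation-ID from the current HTTP
// request onto this outgoing request automatically
$response = $client->request('GET', '/v1/status');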

Expose Correlation IDs in error responses

The missing link between these processes is to now expose your Correlation IDs to your users so they can also log them or use them in support cases they report to your organization.

Symfony makes this easy using Event Listeners. You can define Event Listeners in Symfony to pre-process HTTP requests as well as to post-process HTTP responses just before they are returned by Symfony to the API caller. In this example, we will create an HTTP Response listener and add the Correlation ID of the current HTTP request as an HTTP header in the HTTP response.

First, we create a service on the Service Container:

<?php
 
namespace App\Listener;
 
use Symfony\Component\HttpFoundation\RequestStack;
use Symfony\Component\HttpKernel\Event\FilterResponseEvent;

class CorrelationIdResponseListener
{
    protected $requestStack;
 
    public function __construct(RequestStack $requestStack)
    {
        $this->requestStack = $requestStack;
    }

    public function onKernelResponse(FilterResponseEvent $event)
    {
        $request = $this->requestStack->getCurrentRequest();

        if (!$request) {
            return;
        }

        $correlationId = $request->headers->get('X-My-Correlation-ID');

        if (empty($correlationId)) {
            return;
        }

        $event->getResponse()->headers->set('X-My-Correlation-ID', $correlationId);
    }
}

Now configure it as a Symfony Event Listener:

# app/config/services.yml

services:
  correlation_id_response_listener:
    class: App\Listener\CorrelationIdResponseListener
    arguments: ["@request_stack"]
    tags:
      - { name: kernel.event_listener, event: kernel.response, method: onKernelResponse }

Every response that is generated by your Symfony application will now include an X-My-Correlation-ID HTTP response header with the same Correlation ID as the HTTP request.
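
A quick way to verify this behavior (assuming your application trusts the caller to supply its own Correlation ID; api.example.com is a placeholder for your own app):

curl -si -H 'X-My-Correlation-ID: test-123' https://api.example.com/some/endpoint | grep -i x-my-correlation-id
# X-My-Correlation-ID: test-123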

The Value of Correlation IDs

Using Correlation IDs throughout your whole stack gives you more insight into all (sub)requests during a transaction. Using the right tools allows others to debug issues, giving your developers more time to work on new awesome features.

Implementing Correlation IDs isn’t hard to do, and can be achieved quickly depending on your software stack. At Leaseweb, the use of Correlation IDs has saved us hours of time while debugging issues on numerous occasions.

Technical Careers at Leaseweb

We are searching for the next generation of engineers and developers to help us build infrastructure to automate our global hosting services! If you are interested in finding out more, check out our Careers at Leaseweb.


Measuring IP Connectivity with RIPE Atlas

One of the most important but often overlooked criteria when choosing a hosting provider is the performance of its network. Poor network performance for any kind of online service leads to low customer satisfaction. What is the best way to measure network performance that most accurately simulates a customer's experience with a network? We found this question to be an interesting idea to explore during a Leaseweb hackathon!

RIPE Atlas

RIPE Atlas is the RIPE NCC's main Internet data collection system. It's a global network of devices, called probes and anchors, that actively measure Internet connectivity. Anyone can access this data via Internet traffic maps, streaming data visualizations, and an API. RIPE Atlas users can also perform customized measurements to gain valuable data about their own networks.

Due to the size and reach of the Atlas project, it's one of the most important Internet measurement initiatives internationally. What's great is that the project is run by RIPE, but driven by the global community of RIPE members (and non-members!) who contribute some of their resources to help the project. This improves visibility into the inner workings of the global Internet. It's used by Internet professionals all over the world to improve the quality of their networks, debug issues, and learn. Leaseweb contributes to RIPE Atlas by hosting 7 'anchors' in various data centers all over the globe.

Since Leaseweb already contributes to RIPE Atlas with these anchors, it’s an obvious choice as a source of random probes to be used against those anchors and compared with other infrastructure providers’ anchors. (By the way, if you would like to do your own measurements and contribute to the RIPE Atlas project at the same time, you can request a probe for your home or office right here!). You can read more on the structure of the Atlas network (and how probes, anchors and the RIPE backend work together) in various posts on the RIPE labs pages.

The main elements needed to use RIPE Atlas are measurements, sources and targets. These elements all have their own function to facilitate experiments.

Getting The Data

Triggering one-off measurements and fetching their results can be done using the RIPE Atlas API. The RIPE NCC also developed a Python wrapper around the RIPE Atlas API, maintained by the RIPE Atlas developers, which makes it the best choice for consuming their API. Using it requires an API key. The wrapper is open-source and available on GitHub.

Installation is very simple using pip:

$ pip install ripe.atlas.cousteau

Import the classes we need in the Python script:

from datetime import datetime

from ripe.atlas.cousteau import (
    Ping,
    Traceroute,
    AtlasSource,
    AtlasCreateRequest,
    AtlasLatestRequest,
)

Measurement Types

RIPE Atlas offers several measurement types – Ping, Traceroute, DNS, HTTP, SSL, NTP, and WiFi. Creating a measurement allows you to specify a type of test (or ‘experiment’) to perform. Typically, these have something to do with latency for a service, but there are also options to check things like resolving a domain name and checking DNS propagation. These measurements can be one-off or recurring. For our Hackathon project we used one-off measurements.

Sources

Probes defined as sources are used to trigger the measurement. They can be defined explicitly or taken from a pool (area, country, prefix or AS number). Below, the defined source takes 50 random probes from Europe to be used as a source of measurement.

source = AtlasSource(
    type="area",
    value="North-Central",
    requested=50
)

Targets

Targeted IPs that the measurements are run against. In this example, ping is run against one of Leaseweb’s anchors.

pingLSW = Ping(
    af=4,
    target="5.79.112.97", 
    description="Ping nl-haa-as60781.anchors.atlas.ripe.net"
)

Create Request

Defining a request combines all of the elements together: measurements and sources. It also requires a start time, API key and – in this case – a one-off flag.

atlas_request = AtlasCreateRequest(
    start_time=datetime.utcnow(),
    key=ATLAS_API_KEY,
    # pingOVH, pingAzure, pingAWS and pingUni are defined just like
    # pingLSW above, each with a different target address
    measurements=[pingLSW, pingOVH, pingAzure, pingAWS, pingUni],
    sources=[source],
    is_oneoff=True
)

To summarize what this request will do:

  • We specify a number of ping tests to various endpoints
  • We specify where we want to have those request come from and how many we want
  • …and we bundle those tests into a single one-off request.

Calling the RIPE Atlas API is now simple:

is_success, response = atlas_request.create()

After calling this function, Atlas will launch tests from 50 random probes in the area we designated towards our targets, and will store the results.

The values stored in response are the IDs of the created measurements. In this case, there are 5 IDs as there are 5 measurements defined. The actual results have to be retrieved separately, so the next step is fetching the measurement data. Here, the AtlasLatestRequest class is used:

is_success, results = AtlasLatestRequest(msm_id=measurement_id).create()

The results variable now has stored all of the measurement details needed to calculate and compare latencies. It’s clear to see how powerful Atlas is. In a few simple lines we’ve generated latency information from 50 end points to multiple targets!
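
As a sketch of what can be done with the fetched data (assuming the standard ping result format, where each probe entry carries min/avg/max RTT fields in milliseconds, with avg being -1 when a probe received no replies):

# Summarize the round-trip times reported by all probes
rtts = [probe["avg"] for probe in results if probe.get("avg", -1) > 0]

if rtts:
    print("%d probes answered; mean RTT %.1f ms (best %.1f, worst %.1f)" % (
        len(rtts), sum(rtts) / len(rtts), min(rtts), max(rtts)))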

Visualizing Data

Visualizing data that was fetched from RIPE Atlas API was done with pygal Python library that supports various chart types and styles. For this Hackathon project pygal.Bar() was used to draw out the comparison results. (Pygal usage is out of the scope of this blog post). Two charts below show the data from measurements taken from Europe and from Russia.

Conclusion and Going Forward

This Hackathon project showed the basic features of RIPE Atlas and what it can accomplish. RIPE Atlas also maintains a parsing library named Sagan, available on GitHub, that handles format changes in measurement results and always returns a native Python object.

RIPE Atlas has a huge amount of functionality and can be easily used in your own experiments and measurements. Remember to always use the Atlas infrastructure responsibly.

For permanent measurements, streaming data, and building dashboards, there is also the RIPE Prometheus exporter, which exports collected RIPE Atlas measurement results as Prometheus metrics. Grafana is a common tool that works well with Prometheus to create dashboards with useful metrics gathered from RIPE Atlas measurements.

Comments? Questions? Other ways to use RIPE Atlas and receive measurements? I’d love to hear your feedback! 


Measuring and Monitoring With Prometheus and Alertmanager

As one of the most successful projects of the Cloud Native Computing Foundation (CNCF), it is highly likely that you have heard of Prometheus. Initially built at SoundCloud in 2012 to fulfil their monitoring needs, Prometheus is now one of the most popular solutions for time-series based monitoring.

At Leaseweb, we use Prometheus for a variety of purposes – from basic system monitoring of our internal systems, to blackbox monitoring from several of our network locations, to cloud data usage and capacity monitoring.

Whether you have one or several servers, it is always good to have insight into what your systems are doing and how they are performing. In this article, we will show you how to set up a basic Prometheus server and expose system metrics using node_exporter.

For later blogs in this series, we will add Alertmanager to our Prometheus server and use Grafana to graph our recorded metrics.

This is an overview of the components involved and their role:

  • Prometheus: Scrapes metrics from external data sources (or 'exporters'), stores them in its time-series database, and exposes them through an API.
  • node_exporter: Exposes several system metrics, such as CPU & disk usage
  • Alertmanager: Handles alerts generated by the Prometheus server. Takes care of deduplicating, grouping, and routing alerts to the correct alert channel such as email, Telegram, PagerDuty, Slack, etc.
  • Grafana: Uses Prometheus as a datasource to graph the recorded metrics.

For this tutorial, we are going to use three servers running Ubuntu 18.04 LTS. However, the instructions can be easily adapted for any other recent Linux distribution. These can either be bare metal servers or cloud instances. When your Prometheus setup grows and you start to scrape more and more metrics, it is advisable to have SSD based storage in your Prometheus server.

If you want to start out small or experiment, you can also combine several components on one system.

A Note on Security

Since Prometheus was designed to be run in a private network/cloud setting, it does not offer any authentication or access control out of the box. Because of this, be careful not to expose any of the services to the outside world. There are several ways you can achieve this (implementation of which is outside of the scope of this tutorial).

To achieve this, you could use the Leaseweb private networking feature and bind the Prometheus related services to your private networking interface. Other options are to use a reverse proxy that implements basic authentication, or using firewall rules to only allow certain IP addresses to connect to your Prometheus-related services.

Installing Prometheus

To start off, we will install the Prometheus server. The prometheus package is part of the standard Ubuntu distribution repositories, but unfortunately the version (2.1.0) is quite old. At the time of writing this blog post, the latest version is 2.16.0, which is what we will be using.

On the system that will be our Prometheus server, we start off by creating a user and group called prometheus:

useradd -M -r -s /bin/false prometheus

Next, we create the directories that will contain the configuration and the data of Prometheus:

mkdir /etc/prometheus /var/lib/prometheus

Download Prometheus server and verify its integrity:

cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.16.0/prometheus-2.16.0.linux-amd64.tar.gz
wget -O - -q https://github.com/prometheus/prometheus/releases/download/v2.16.0/sha256sums.txt | grep linux-amd64 | shasum -c -

The last command should result in  prometheus-2.16.0.linux-amd64.tar.gz: OK. If it doesn’t, the downloaded file is corrupted. Next we unpack the file and move the various components into place:

tar xzf prometheus-2.16.0.linux-amd64.tar.gz
cp prometheus-2.16.0.linux-amd64/{prometheus,promtool} /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/{prometheus,promtool}
cp -r prometheus-2.16.0.linux-amd64/{consoles,console_libraries} /etc/prometheus/
cp prometheus-2.16.0.linux-amd64/prometheus.yml /etc/prometheus/prometheus.yml

chown -R prometheus:prometheus /etc/prometheus
chown prometheus:prometheus /var/lib/prometheus

And clean up our downloaded files in /tmp

rm -f /tmp/prometheus-2.16.0.linux-amd64.tar.gz
rm -rf /tmp/prometheus-2.16.0.linux-amd64

The default prometheus.yml we copied into place already configures Prometheus to scrape its own metrics, so there is nothing to add for now.

To be able to start and stop our prometheus server, we will create a systemd unit file. Use your favorite editor to create the file /etc/systemd/system/prometheus.service and add the following to it:

[Unit]
Description=Prometheus Time Series Collection and Processing Server
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target

Activate and start the service with the following commands:

systemctl daemon-reload
systemctl start prometheus
systemctl enable prometheus

The command systemctl status prometheus should now indicate that our service is up and running:

You should be able to access the web interface of the prometheus server now on http://<server IP>:9090:

If we go to Status > Targets we can see that the Prometheus server itself has already been added as a scraping target for metrics. This default target collects metrics about the performance of the Prometheus server. You can view the metrics that are being recorded under http://<server IP>:9090/metrics.

Prometheus provides two convenient endpoints for monitoring its health and status. You can use these to add to any other monitoring system you might have.

root@HRA-blogtest:~# curl localhost:9090/-/healthy
Prometheus is Healthy.
root@HRA-blogtest:~# curl localhost:9090/-/ready
Prometheus is Ready.

Monitor System Metrics with the Node Exporter

To make things a little more interesting, we are going to add a target to obtain system metrics of the Prometheus server. For this, we need to install the node exporter first.

Installing the node exporter

Download Prometheus node exporter and verify its integrity:

cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
wget -O - -q https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt | grep linux-amd64 | shasum -c -

The last command should result in node_exporter-0.18.1.linux-amd64.tar.gz: OK. If it doesn’t, the downloaded file is corrupted.

Next we unpack the file and move the node exporter into place:

tar xzf node_exporter-0.18.1.linux-amd64.tar.gz
cp node_exporter-0.18.1.linux-amd64/node_exporter /usr/local/bin/
chown prometheus:prometheus /usr/local/bin/node_exporter

And clean up our downloaded files in /tmp

rm -f /tmp/node_exporter-0.18.1.linux-amd64.tar.gz
rm -rf /tmp/node_exporter-0.18.1.linux-amd64

Create a unit file /etc/systemd/system/node_exporter.service for the node exporter using your favorite editor.

[Unit]
Description=Prometheus Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Reload the systemd configuration to activate our unit file, start the service, and enable the service to start at boot time:

systemctl daemon-reload
systemctl start node_exporter.service
systemctl enable node_exporter.service

The node exporter should now be running. You can verify this with systemctl status node_exporter

The node exporter listens on TCP port 9100. You should be able to see the node exporter metrics now at http://<server IP>:9100/metrics.

Adding the node exporter target to Prometheus

Now that the node exporter is running, we need to adapt the configuration of the Prometheus server so it can start scraping our node exporter metrics.

Open /etc/prometheus/prometheus.yml in your editor and adapt the scrape config section to look like the following:

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
    - targets: ['localhost:9100']
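
Before restarting, you can check the file for syntax errors using promtool, which we copied into /usr/local/bin earlier alongside the prometheus binary:

promtool check config /etc/prometheus/prometheus.yml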

Save the changes and restart the prometheus server with systemctl restart prometheus

The Prometheus server web interface should show a new target now under Status > Targets:

Querying and Graphing the Recorded Metrics

Now that everything is set up, it is time to start looking into some of the things we are now measuring! Switch to the Graph tab in the Prometheus server web interface.

Enter node_memory_MemAvailable_bytes and click Execute. The Console tab will show you the current amount of memory free in bytes.

Switch to the Graph tab and you will see a graph of the free memory in bytes over the course of the last hour. You can increase and decrease the time range with the plus and minus on the top left of the graph.

There is another metric that records the total amount of memory in the system. It is called node_memory_MemTotal_bytes. We can use this to calculate the percentage of memory free in the system. Enter the following in the query area and click execute:

(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

The graph will now show the percentage of free memory over time.

We can make this even more accurate by taking into account buffered and cached memory:

((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100

Or turn it around and show the percentage of used memory instead:

(node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100

The CPU usage is recorded in the metrics under node_cpu_seconds_total. This metric has several modes of the CPU recorded:

  • user: Time spent in userland
  • system: Time spent in the kernel
  • iowait: Time spent waiting for I/O
  • idle: Time the CPU had nothing to do
  • irq & softirq: Time spent servicing interrupts
  • guest: If you are running VMs, the CPU they use
  • steal: If you are a VM, time other VMs “stole” from your CPUs

These metrics are recorded as counters, so to get the per second values we will use the irate function:

irate(node_cpu_seconds_total{job="node"}[5m])

As you can see, when you have multiple CPUs in your server, it will return metrics for each CPU individually. To get the overall value across all CPUs we can use PromQL's aggregation features using sum by:

sum by (mode, instance) (irate(node_cpu_seconds_total{job="node"}[5m]))

We can also calculate the percentage of CPU used by taking the per second idle rate and multiplying it by 100 (to get the percent CPU idle), and then subtracting it from 100%:

100 - (avg by (instance) (irate(node_cpu_seconds_total{job="node",mode="idle"}[5m])) * 100)

And finally, to get the amount of data sent or received by our server, we can use irate(node_network_transmit_bytes_total{device!="lo"}[1m]) and irate(node_network_receive_bytes_total{device!="lo"}[1m]). This gives us a bytes-per-second graph; the [1m] window only determines how far back irate looks for the two most recent samples. The device!="lo" makes sure we exclude the local loopback interface.

To turn this into megabits, we will have to do some math:

(sum(irate(node_network_receive_bytes_total{device!="lo"}[1m])) by (instance, device) * 8 / 1024 / 1024)

To get a full idea of the possibilities of the PromQL querying language, see the documentation. By investigating the metrics available in the node exporter, you can create a lot more graphs like these – for example, for the amount of available disk space, the amount of file descriptors used, and a lot more.
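
For example, a sketch of a free disk space percentage query (the fstype filter is our own addition, to exclude in-memory pseudo-filesystems):

100 * node_filesystem_avail_bytes{fstype!~"tmpfs|ramfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|ramfs"}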

In the next part of this blog, we will go deeper into visualizing the metrics using Grafana, and will also define alerting rules to receive alerts through Alertmanager.


How to create a highly available web hosting platform using Floating IPs

In this Leaseweb Labs post, we're going step-by-step through a proof of concept of a (very basic) highly available web hosting platform. Using Floating IPs and keepalived, we'll create an active/standby setup on two different dedicated servers, with automatic failover through the Leaseweb API, so your application stays up even when a server fails. We'll use 2 dedicated servers and 1 Floating IP address from Leaseweb to make this happen.

What are Floating IPs?

Floating IPs are a kind of virtual IP address that can be dynamically routed to any server in the same network. Some hosting providers call this Elastic IPs or Virtual IPs.

Multiple servers can own the same Floating IP address, but it can only be active on one server at any given time.

Floating IPs can be used to implement features such as:

  • Failover in a high-availability cluster
  • Zero-downtime Continuous Deployment

Using Floating IPs

Using Floating IPs is quite simple: with Leaseweb, you can order them through the Customer Portal and set them up on your server as an additional IP address. But the real power lies in automation. By using the Leaseweb API, it's possible to use any script or even some 3rd party software to automatically control Floating IPs.

When paired with free software such as keepalived, which can detect when a server is down and take action accordingly, it becomes possible to create a fully automated highly available platform for any application.

Step one: Set up the servers and Floating IPs

First, let’s set up the two servers with a simple HTTP web server and use a Floating IP address to access the website of either one server.

  • Server A (Leaseweb Server Id 20483) has IP address 212.32.230.75 and is pre-installed with CentOS 7
  • Server B (Leaseweb Server Id 37089) has IP address 212.32.230.66 and is pre-installed with Ubuntu 18.04
  • 89.149.192.0 is the Floating IP address

Setting up the Floating IP address in the Customer Portal

If you don’t have a Floating IP yet, then from the Floating IPs page in the Leaseweb Customer Portal click the  button to order Floating IPs. Once delivered, you will see an entry like this:

Click on the range to open its detail page:

Here it is possible to set up a relationship between a Floating IP and an Anchor IP. Leaseweb calls this a “Floating IP Definition”, and can be done with the  button.

Let’s create a new definition to link Floating IP 89.149.192.0 to the Anchor IP 212.32.230.75 of server A:

Once saved, there will be one Floating IP Definition visible:

Setting up the Floating IP address and a demonstration webpage on the servers

On a server, a Floating IP can be set up as any other additional IP address. A gateway address is not necessary, and the subnet mask is always 255.255.255.255, or /32 in CIDR notation.

To add an additional IP address to an interface in Linux without making the change persistent, we can simply use the ip -4 address show command to show which device the main IP address is configured on, and then do ip address add <Floating IP address>/32 dev <Device> to add the Floating IP to the same device.

We also install a HTTP server and create a simple demonstration webpage:

# Check which device we need to add the IP address to
ip -4 address show
ip address add 89.149.192.0/32 dev eno1

# The Floating IP address should now be visible on the device
ip -4 address show

# Install a web server and create a basic default webpage
yum install -y httpd
systemctl start httpd
cat <<EOF > /var/www/html/index.html
<!DOCTYPE html>
<html>
<head><title>This is test server A</title></head>
<body><h1>This is test server A</h1></body>
</html>
EOF

Result:

tim@laptop:~$ ssh root@20483.lsw
[root@servera ~]# ip -4 address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 212.32.230.75/26 brd 212.32.230.127 scope global eno1
       valid_lft forever preferred_lft forever

[root@servera ~]# ip address add 89.149.192.0/32 dev eno1

[root@servera ~]# ip -4 address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 212.32.230.75/26 brd 212.32.230.127 scope global eno1
       valid_lft forever preferred_lft forever
    inet 89.149.192.0/32 scope global eno1
       valid_lft forever preferred_lft forever

[root@servera ~]# yum install -y httpd

[...]

[root@servera ~]# systemctl start httpd

[root@servera ~]# cat <<EOF > /var/www/html/index.html
> <!DOCTYPE html>
> <html>
> <head><title>This is test server A</title></head>
> <body><h1>This is test server A</h1></body>
> </html>
> EOF

[root@servera ~]#

(note: ssh root@20483.lsw is a neat little trick explained here: https://gist.github.com/timwb/1f95737d54563aedd7c97d5e671667cc)

You should now already be able to ping the Floating IP address, and opening it in a browser loads the demo webpage:
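
For example, from any machine with curl (the grep just extracts the page title):

$ curl -s http://89.149.192.0/ | grep title
<head><title>This is test server A</title></head>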

Next, add the same Floating IP address to server B, install an HTTP web server and create a simple demo webpage:

# Check which device we need to add the IP address to
ip -4 address show
ip address add 89.149.192.0/32 dev enp32s0

# The Floating IP address should now be visible on the device
ip -4 address show

# Install a web server and create a basic default webpage
apt install -y nginx
cat <<EOF > /var/www/html/index.html
<!DOCTYPE html>
<html>
<head><title>This is test server B</title></head>
<body><h1>This is test server B</h1></body>
</html>
EOF

Result:

tim@laptop:~$ ssh root@37089.lsw
root@serverb:~# ip -4 address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp32s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 212.32.230.66/26 brd 212.32.230.127 scope global enp32s0
       valid_lft forever preferred_lft forever
3: enp34s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.32.18.208/27 brd 10.32.18.223 scope global enp34s0
       valid_lft forever preferred_lft forever

root@serverb:~# ip address add 89.149.192.0/32 dev enp32s0

root@serverb:~# ip -4 address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp32s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 212.32.230.66/26 brd 212.32.230.127 scope global enp32s0
       valid_lft forever preferred_lft forever
    inet 89.149.192.0/32 scope global enp32s0
       valid_lft forever preferred_lft forever
3: enp34s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.32.18.208/27 brd 10.32.18.223 scope global enp34s0
       valid_lft forever preferred_lft forever

root@serverb:~# apt install -y nginx

[...]

root@serverb:~# cat <<EOF > /var/www/html/index.html
> <!DOCTYPE html>
> <html>
> <head><title>This is test server B</title></head>
> <body><h1>This is test server B</h1></body>
> </html>
> EOF

root@serverb:~#

FLIP’ing a Floating IP

Initially, we’ve setup Floating IP 89.149.192.0 with Anchor IP 212.32.230.75, which belongs to server A.

Suppose we’ve developed an updated web application on server B and after months of testing, it’s finally ready.

To direct users visiting 89.149.192.0 to server B, we need to update the Anchor IP of Floating IP 89.149.192.0, changing (FLIP’ing) it from 212.32.230.75 (server A) to 212.32.230.66 (server B).

To do this manually, click  in the Customer Portal and change the Anchor IP:

Now, when you refresh your browser, the page from server B is shown:

Congratulations, you've just done a zero-downtime deployment, and taken your first step towards a highly available, continuous deployment web hosting cluster.

Step two: Using the API to manage Floating IPs

Of course, using the Leaseweb Customer Portal is a convenient way to set up and play with Floating IPs, but the real power is in automation.

The official documentation of the Floating IPs API can be found on developer.leaseweb.com

In the following examples we'll use curl to perform HTTP requests and the jq tool to pretty-print the API responses, but you can use any tool or library for interacting with a RESTful API. You can find your API key (X-Lsw-Auth) in the Customer Portal under API.

Floating IPs and Floating IP ranges have a prefix length and are always written in CIDR notation. In the context of API calls, the forward slash "/" is replaced with an underscore "_" for compatibility in URLs (e.g. 89.149.192.0/29 becomes 89.149.192.0_29). For a single Floating IP address (/32), the prefix length may be omitted.

List Floating IP ranges

To list Floating IP ranges, make a GET request to /floatingIps/v2/ranges:
curl --silent --request GET --url https://api.leaseweb.com/floatingIps/v2/ranges --header 'X-Lsw-Auth: 213423-2134234-234234-23424' |jq

{
  "ranges": [
    {
      "id": "89.149.192.0_29",
      "range": "89.149.192.0/29",
      "customerId": "12345678",
      "salesOrgId": "2000",
      "pop": "AMS-01"
    }
  ],
  "_metadata": {
    "limit": 20,
    "offset": 0,
    "totalCount": 1
  }
}

List the Floating IP definitions in a Floating IP range

To list the Floating IP definitions within a certain Floating IP range, make a GET request to /floatingIps/v2/ranges/{rangeId}/floatingIpDefinitions.
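For example, using the same placeholder API key as before:
curl --silent --request GET --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions --header 'X-Lsw-Auth: 213423-2134234-234234-23424' |jq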

{
  "floatingIpDefinitions": [
    {
      "id": "89.149.192.0",
      "rangeId": "89.149.192.0_29",
      "pop": "AMS-01",
      "customerId": "12345678",
      "salesOrgId": "2000",
      "floatingIp": "89.149.192.0/32",
      "anchorIp": "212.32.230.66",
      "status": "ACTIVE",
      "createdAt": "2019-06-17T14:15:11+00:00",
      "updatedAt": "2019-06-26T09:26:52+00:00"
    }
  ],
  "_metadata": {
    "totalCount": 1,
    "limit": 20,
    "offset": 0
  }
}

Create a Floating IP definition

To create a new Floating IP definition, make a POST request to the same floatingIpDefinitions endpoint, with the new Floating IP and its Anchor IP in the request body. The response shows the new definition, initially with status CREATING:

{
  "id": "89.149.192.0",
  "rangeId": "89.149.192.0_29",
  "pop": "AMS-01",
  "customerId": "12345678",
  "salesOrgId": "2000",
  "floatingIp": "89.149.192.3/32",
  "anchorIp": "212.32.230.66",
  "status": "CREATING",
  "createdAt": "2019-06-26T14:30:40+00:00",
  "updatedAt": "2019-06-26T14:30:40+00:00"
}

Listing the Floating IP definitions again a few seconds later shows both definitions with status ACTIVE:

{
  "floatingIpDefinitions": [
    {
      "id": "89.149.192.0",
      "rangeId": "89.149.192.0_29",
      "pop": "AMS-01",
      "customerId": "12345678",
      "salesOrgId": "2000",
      "floatingIp": "89.149.192.0/32",
      "anchorIp": "212.32.230.66",
      "status": "ACTIVE",
      "createdAt": "2019-06-17T14:15:11+00:00",
      "updatedAt": "2019-06-26T14:23:58+00:00"
    },
    {
      "id": "89.149.192.3",
      "rangeId": "89.149.192.0_29",
      "pop": "AMS-01",
      "customerId": "12345678",
      "salesOrgId": "2000",
      "floatingIp": "89.149.192.3/32",
      "anchorIp": "212.32.230.66",
      "status": "ACTIVE",
      "createdAt": "2019-06-26T14:30:40+00:00",
      "updatedAt": "2019-06-26T14:30:45+00:00"
    }
  ],
  "_metadata": {
    "totalCount": 2,
    "limit": 20,
    "offset": 0
  }
}

Update a Floating IP definition

To change the Anchor IP of a Floating IP definition, make a PUT request. Here we are updating 89.149.192.0 with Anchor IP 212.32.230.75, so we're directing traffic back to server A again:
curl --silent --request PUT --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions/89.149.192.0_32 --header 'X-Lsw-Auth: 213423-2134234-234234-23424' --header 'content-type: application/json' --data '{
    "anchorIp": "212.32.230.75"
}' |jq

{
  "id": "89.149.192.0",
  "rangeId": "89.149.192.0_29",
  "pop": "AMS-01",
  "customerId": "12345678",
  "salesOrgId": "2000",
  "floatingIp": "89.149.192.0/32",
  "anchorIp": "212.32.230.66",
  "status": "UPDATING",
  "createdAt": "2019-06-17T14:15:11+00:00",
  "updatedAt": "2019-06-26T14:35:57+00:00"
}

Note that in the response, the old anchorIp is still listed and the status has changed to UPDATING. The update process is very fast, but not instantaneous. When making another GET request to the floatingIpDefinitions endpoint, you can see that the update has been processed seconds later:

{
  "floatingIpDefinitions": [
    {
      "id": "89.149.192.0",
      "rangeId": "89.149.192.0_29",
      "pop": "AMS-01",
      "customerId": "12345678",
      "salesOrgId": "2000",
      "floatingIp": "89.149.192.0/32",
      "anchorIp": "212.32.230.75",
      "status": "ACTIVE",
      "createdAt": "2019-06-17T14:15:11+00:00",
      "updatedAt": "2019-06-26T14:36:01+00:00"
    }
  ],
  "_metadata": {
    "totalCount": 1,
    "limit": 20,
    "offset": 0
  }
}

Delete a Floating IP definition

Deleting a Floating IP definition is as easy as making a DELETE call to /floatingIps/v2/ranges/{rangeId}/floatingIpDefinitions/{floatingIp}:
curl --silent --request DELETE --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions/89.149.192.3 --header 'X-Lsw-Auth: 213423-2134234-234234-23424' |jq

{
  "id": "89.149.192.3",
  "rangeId": "89.149.192.0_29",
  "pop": "AMS-01",
  "customerId": "12345678",
  "salesOrgId": "2000",
  "floatingIp": "89.149.192.3/32",
  "anchorIp": "212.32.230.66",
  "status": "REMOVING",
  "createdAt": "2019-06-26T14:30:40+00:00",
  "updatedAt": "2019-06-26T14:39:34+00:00"
}

Just like with the POST and PUT calls, it will take a couple of seconds to process.

Step three: Putting it all together – creating a highly available web hosting platform with Keepalived

Keepalived is a versatile piece of software that can be used to implement automatic failover using the Leaseweb Floating IPs API. We’ll demonstrate how to create a simple active/backup setup where the Floating IP is automatically routed to server B in the event that server A fails.

Keepalived can do many more things, but keep in mind this is a proof-of-concept example only, meant to demonstrate in the simplest possible way how to be highly available with automatic failover and Floating IPs.

The keepalived configuration

After installing, the configuration of keepalived resides in the /etc/keepalived/keepalived.conf file. In this file, we’ll instruct keepalived to:

  • Create a “vrrp” instance named webservers with id 123:
    Note: the id can be any random number between 0-255, but it needs to be the same between all servers.
    vrrp_instance webservers { ... }
    virtual_router_id 123
  • Setup server A to be the master, with priority 200:
    state MASTER
    priority 200
  • Setup server B to be the backup, with priority 100:
    state BACKUP
    priority 100
  • Communicate with each other using a shared secret:
    interface <interface name> (see the instructions under Setting up the Floating IP address on the servers)
    unicast_src_IP <server's IP address>
    unicast_peer { <other server's IP address> }
    authentication { ... }
  • Run a script to update the Anchor IP when either server becomes master
    notify_master /etc/keepalived/becomemaster.sh
  • Run a command to check if the web server is still running. On server A (CentOS) this is the httpd process; on server B (Ubuntu) it is the nginx process, where we wrap the check command in a small script.
    track_script { ... }

So, we run the following commands to setup server A:

# Install keepalived
yum install -y keepalived

# Write keepalived config
cat <<EOF > /etc/keepalived/keepalived.conf
vrrp_instance webservers {
    virtual_router_id 123
    state MASTER
    priority 200
    interface eno1
    unicast_src_ip 212.32.230.75
    unicast_peer {
        212.32.230.66
    }
    authentication {
        auth_type PASS
        auth_pass supersecret
    }
    notify_master /etc/keepalived/becomemaster.sh
    track_script {
        chk_apache
    }
}

vrrp_script chk_apache {
    script "/usr/sbin/pidof httpd"
    interval 2
}
EOF

# Write script that calls floating IP API to update the Floating IP with this server as Anchor IP
cat <<EOF > /etc/keepalived/becomemaster.sh
#!/bin/sh
curl --silent --request PUT --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions/89.149.192.0_32 --header 'X-Lsw-Auth: '"213423-2134234-234234-23424" --header 'content-type: application/json' --data '{ "anchorIp": "212.32.230.75" }'
EOF
chmod +x /etc/keepalived/becomemaster.sh

# Restart keepalived
systemctl restart keepalived

# Check keepalived status
systemctl status keepalived

Result:

tim@laptop:~$ ssh root@20483.lsw
[root@servera ~]# yum install -y keepalived

[...]

[root@servera ~]# cat <<EOF > /etc/keepalived/keepalived.conf
> vrrp_instance webservers {
>     virtual_router_id 123
>     state MASTER
>     priority 200
>     interface eno1
>     unicast_src_ip 212.32.230.75
>     unicast_peer {
>         212.32.230.66
>     }
>     authentication {
>         auth_type PASS
>         auth_pass supersecret
>     }
>     notify_master /etc/keepalived/becomemaster.sh
>     track_script {
>         chk_apache
>     }
> }
>
> vrrp_script chk_apache {
>     script "/usr/sbin/pidof httpd"
>     interval 2
> }
> EOF

[root@servera ~]# cat <<EOF > /etc/keepalived/becomemaster.sh
> #!/bin/sh
> curl --silent --request PUT --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions/89.149.192.0_32 --header 'X-Lsw-Auth: '"213423-2134234-234234-23424" --header 'content-type: application/json' --data '{ "anchorIp": "212.32.230.75" }'
> EOF

[root@servera ~]# chmod +x /etc/keepalived/becomemaster.sh

[root@servera ~]# systemctl restart keepalived

[root@servera ~]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-07-23 11:27:03 UTC; 30s ago
  Process: 1346 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 1347 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─1347 /usr/sbin/keepalived -D
           ├─1348 /usr/sbin/keepalived -D
           └─1349 /usr/sbin/keepalived -D

Jul 23 11:27:03 servera Keepalived_vrrp[1349]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 23 11:27:03 servera Keepalived_vrrp[1349]: WARNING - default user 'keepalived_script' for script execution does not exist ...reate.
Jul 23 11:27:03 servera Keepalived_vrrp[1349]: Truncating auth_pass to 8 characters
Jul 23 11:27:03 servera Keepalived_vrrp[1349]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Jul 23 11:27:03 servera Keepalived_vrrp[1349]: Using LinkWatch kernel netlink reflector...
Jul 23 11:27:03 servera Keepalived_vrrp[1349]: VRRP sockpool: [ifindex(2), proto(112), unicast(1), fd(10,11)]
Jul 23 11:27:03 servera Keepalived_vrrp[1349]: VRRP_Script(chk_apache) succeeded
Jul 23 11:27:04 servera Keepalived_vrrp[1349]: VRRP_Instance(webservers) Transition to MASTER STATE
Jul 23 11:27:05 servera Keepalived_vrrp[1349]: VRRP_Instance(webservers) Entering MASTER STATE
Jul 23 11:27:05 servera Keepalived_vrrp[1349]: Opening script file /etc/keepalived/becomemaster.sh
Hint: Some lines were ellipsized, use -l to show in full.

[root@servera ~]#

Then we set up server B:

# Install keepalived
apt install -y keepalived

# Write keepalived config
cat <<EOF > /etc/keepalived/keepalived.conf
vrrp_script chk_nginx {
    script "/etc/keepalived/chk_nginx.sh"
    interval 2
}

vrrp_instance webservers {
    virtual_router_id 123
    state BACKUP
    priority 100
    interface enp32s0
    unicast_src_ip 212.32.230.66
    unicast_peer {
        212.32.230.75
    }
    authentication {
        auth_type PASS
        auth_pass supersecret
    }
    notify_master /etc/keepalived/becomemaster.sh
    track_script {
        chk_nginx
    }
}
EOF

# Write script that calls floating IP API to update the Floating IP with this server as Anchor IP
cat <<EOF > /etc/keepalived/becomemaster.sh
#!/bin/sh
curl --silent --request PUT --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions/89.149.192.0_32 --header 'X-Lsw-Auth: '"213423-2134234-234234-23424" --header 'content-type: application/json' --data '{ "anchorIp": "212.32.230.66" }'
EOF
chmod +x /etc/keepalived/becomemaster.sh

# Write the nginx check script referenced in track_script
cat <<EOF > /etc/keepalived/chk_nginx.sh
#!/bin/sh
/bin/pidof nginx
EOF
chmod +x /etc/keepalived/chk_nginx.sh

# Restart keepalived
systemctl restart keepalived

# Check keepalived status
systemctl status keepalived

Result:

tim@laptop:~$ ssh root@37089.lsw
[root@serverb ~]# apt install -y keepalived

[...]

[root@serverb ~]# cat <<EOF > /etc/keepalived/keepalived.conf
> vrrp_instance webservers {
>     virtual_router_id 123
>     state BACKUP
>     priority 100
>     interface enp32s0
>     unicast_src_ip 212.32.230.66
>     unicast_peer {
>         212.32.230.75
>     }
>
>     authentication {
>         auth_type PASS
>         auth_pass supersecret
>     }
>
>     notify_master /etc/keepalived/becomemaster.sh
>
>     track_script {
>         chk_nginx
>     }
> }
>
> vrrp_script chk_nginx {
>     script "/etc/keepalived/chk_nginx.sh"
>     interval 2
> }
> EOF

[root@serverb ~]# cat <<EOF > /etc/keepalived/becomemaster.sh
> #!/bin/sh
> curl --silent --request PUT --url https://api.leaseweb.com/floatingIps/v2/ranges/89.149.192.0_29/floatingIpDefinitions/89.149.192.0_32 --header 'X-Lsw-Auth: '"213423-2134234-234234-23424" --header 'content-type: > application/json' --data '{ "anchorIp": "212.32.230.66" }'
> EOF

[root@serverb ~]# cat <<EOF > /etc/keepalived/chk_nginx.sh
> #!/bin/sh
> /bin/pidof nginx
> EOF

[root@serverb ~]# chmod +x /etc/keepalived/becomemaster.sh

[root@serverb ~]# systemctl restart keepalived

[root@serverb ~]# systemctl status keepalived
● keepalived.service - Keepalive Daemon (LVS and VRRP)
   Loaded: loaded (/lib/systemd/system/keepalived.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-07-23 11:27:12 UTC; 48s ago
  Process: 24346 ExecStart=/usr/sbin/keepalived $DAEMON_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 24355 (keepalived)
    Tasks: 3 (limit: 4574)
   CGroup: /system.slice/keepalived.service
           ├─24355 /usr/sbin/keepalived
           ├─24357 /usr/sbin/keepalived
           └─24358 /usr/sbin/keepalived

Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: Registering Kernel netlink command channel
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: Registering gratuitous ARP shared channel
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: Truncating auth_pass to 8 characters
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: Using LinkWatch kernel netlink reflector...
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: VRRP_Instance(webservers) Entering BACKUP STATE
Jul 23 11:27:12 serverb Keepalived_healthcheckers[24357]: Opening file '/etc/keepalived/keepalived.conf'.
Jul 23 11:27:12 serverb Keepalived_vrrp[24358]: VRRP_Script(chk_nginx) succeeded

[root@serverb ~]# 

Watching keepalived in action

So now we have our redundant setup, and server A is the master. If we visit the Floating IP address in our browser, we see that it's being served from server A:

Let’s simulate a failure on server A by shutting down the Apache web server with the and watch server B take over.

On server A, run:
systemctl stop httpd

Within a couple of seconds, you’ll see it failover to server B. Feel free to hammer F5 like your life depends on it!

Looking at the logs of keepalived on server B, you can see that it detected the failure on server A and automatically executed the script to update the Anchor IP:

journalctl -u keepalived |tail

[ ... ]

Jul 23 11:51:43 diy-dhcp-ams01-nl Keepalived_vrrp[24358]: VRRP_Instance(webservers) Transition to MASTER STATE
Jul 23 11:51:44 diy-dhcp-ams01-nl Keepalived_vrrp[24358]: VRRP_Instance(webservers) Entering MASTER STATE
Jul 23 11:51:44 diy-dhcp-ams01-nl Keepalived_vrrp[24358]: Opening script file /etc/keepalived/becomemaster.sh

That’s it, you now have your own (minimal implementation of) a highly available web hosting platform!
