Notes from the field: Highly Available L7 Load Balancing for Exchange 2013 with HAProxy

Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 1 - Introduction and lab description
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 2 - Deploy and configure the PKI infrastructure
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 3 - Configure and test the Exchange 2013 Client Access role
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 4 - Install CentOS 7
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 5 - Install and configure HAProxy
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 6 - Make HAProxy highly available (this page)
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 7 - Demo

In part 5 we installed and fully configured HAProxy. Technically we would be good to go, but we take it one step further: we want our HAProxy servers to be highly available.

In this part we will install and configure keepalived and will make HAProxy highly available. Part 6 is organised into the following sections:

Install and configure keepalived.
Testing keepalived.

Install and Configure keepalived

We log on to lab-hap01 via Putty. By default we'll download the source tarball to root's home directory:

cd ~

wget http://www.keepalived.org/software/keepalived-1.2.13.tar.gz

We then uncompress the tarball, change to the uncompressed directory, configure the installation, compile the program, and install it. Lots of fast scrolling output, so no screenshots. Here are the commands:

tar -zxvf keepalived-1.2.13.tar.gz

cd keepalived-1.2.13

./configure

make

make install

We need to tell the kernel to allow binding to non-local addresses, so we open the /etc/sysctl.conf file and add the following line:

net.ipv4.ip_nonlocal_bind=1
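If you prefer to script this step instead of editing the file by hand, here is a minimal sketch. The ensure_sysctl_line helper is a name I made up for illustration; it only adds the line if the key is not already set, so it is safe to run more than once.

```shell
# Hypothetical helper (not part of any tool): set "key=value" in a
# sysctl-style file, appending the line only if the key is not there yet.
ensure_sysctl_line() {
  file=$1 key=$2 value=$3
  # Escape the dots so the key is matched literally.
  key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
  if grep -q "^[[:space:]]*${key_re}[[:space:]]*=" "$file" 2>/dev/null; then
    # Key already present: rewrite its value in place.
    tmp=$(mktemp) || return 1
    sed "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp" &&
      cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root):
#   ensure_sysctl_line /etc/sysctl.conf net.ipv4.ip_nonlocal_bind 1
#   sysctl -p                          # load /etc/sysctl.conf now
#   sysctl net.ipv4.ip_nonlocal_bind   # should report the value 1
```

Running sysctl -p is optional here because we reboot the server later in this part anyway.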

We create the /etc/keepalived/keepalived.conf file (note that the file will be created and written to by vi when we save it)...

mkdir /etc/keepalived

vi /etc/keepalived/keepalived.conf

...and add the following content:

global_defs {

  notification_email {

    administrator@digitalbrain.com.au

  }

  notification_email_from lab-hap01@digitalbrain.com.au

  smtp_server 10.30.1.11

  smtp_connect_timeout 30

}

vrrp_script check_haproxy {

  script "killall -0 haproxy"

  interval 2

  weight 2

}

vrrp_instance VI_1 {

  interface ens160

  state MASTER

  virtual_router_id 10

  priority 101

  virtual_ipaddress {

    10.30.1.15

  }

  track_script {

    check_haproxy

  }

  smtp_alert

}

In your lab, update the interface name in the interface ens160 line with your server’s interface, for example interface eth0. If you are not sure what your interface name is, run ifconfig on your server (or ip a, if ifconfig is not installed).

Also, if you still remember from Part 1, the HAProxy virtual IP in my lab is 10.30.1.15. In yours, replace the virtual_ipaddress value with one that’s valid in your environment.

Our keepalived configuration also sends SMTP (email) notifications in case something happens. In your implementation, change the recipient in the notification_email directive, and change the sender address on the notification_email_from line to hostname@yourdomain, where hostname is the host part of the server’s FQDN and yourdomain is valid for your environment. Technically the sender can be anything you like, but using the hostname makes it obvious which node sent the alert.

Due to a coding issue in keepalived which returns a blank host name under certain conditions, we need to add the following line to the /etc/hosts file, otherwise email notifications will fail:

10.30.1.13   lab-hap01.localdomain

It is important that we add the FQDN of the server, and not just the hostname.
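A quick way to check that the entry is really an FQDN is sketched below. The hosts_has_fqdn helper is a hypothetical name of mine; it simply treats any name containing a dot as an FQDN, which is a simplification.

```shell
# Hypothetical helper: succeeds if FILE maps IP to at least one name
# that contains a dot (treated here as an FQDN).
hosts_has_fqdn() {
  file=$1 ip=$2
  awk -v ip="$ip" '
    $1 == ip {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break        # the rest of the line is a comment
        if ($i ~ /\./) found = 1    # name with a dot: count it as an FQDN
      }
    }
    END { exit(found ? 0 : 1) }
  ' "$file"
}

# Usage on lab-hap01:
#   hosts_has_fqdn /etc/hosts 10.30.1.13 || echo "FQDN entry missing"
#   getent hosts "$(uname -n)"    # should print the line we just added
```

The getent line assumes that the server's node name (what uname -n prints) is the FQDN we put into /etc/hosts. If it prints nothing, the name still does not resolve.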

For those interested, I found in my lab that the gethostbyname(name.nodename) call in /root/keepalived-1.2.13/lib/utils.c (remember that we extracted the sources to /root/keepalived-1.2.13) returns NULL, so keepalived greets Exchange with HELO (null). Exchange doesn't know who (null) is and drops the connection, causing SMTP notifications to fail.

I also want to make the point that in my lab the SMTP server is a single point of failure: email notifications go to the IP address of a single server as opposed to a clustered/HA SMTP agent. In real life I would send notifications to a system that is always up and not affected by the failure of a single mail server.

For additional safety in terms of monitoring, SNMP support can also be built into keepalived and integrated into your enterprise monitoring system of choice. Not in this lab.

We now make the keepalived daemon start automatically:

cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/

chmod +x /etc/init.d/keepalived

chkconfig keepalived on

cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/

What these commands do:

cp - copies the keepalived init script from the default installation location to /etc/init.d.
chmod - makes the script executable.
chkconfig - enables the keepalived service to run at startup.
cp - copies the default keepalived sysconfig file (daemon start-up options, not keepalived.conf) to /etc/sysconfig/. The /etc/sysconfig directory contains system configuration files, including this one.

The default daemon line in /etc/init.d/keepalived looks like this:

daemon keepalived ${KEEPALIVED_OPTIONS}

Open /etc/init.d/keepalived in your favorite text editor and change the daemon line as follows so that keepalived can actually start:

daemon /usr/local/sbin/keepalived ${KEEPALIVED_OPTIONS}
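If you would rather not edit the init script by hand, the change can be scripted. This is only a sketch: fix_daemon_line is a name I made up, and it changes nothing but the path of the keepalived binary on the daemon line.

```shell
# Hypothetical helper: rewrite "daemon keepalived ..." to
# "daemon /usr/local/sbin/keepalived ..." in the given init script.
# Lines that already use the full path are left untouched.
fix_daemon_line() {
  script=$1
  tmp=$(mktemp) || return 1
  sed 's|^\([[:space:]]*daemon[[:space:]]\{1,\}\)keepalived\([[:space:]]\)|\1/usr/local/sbin/keepalived\2|' \
    "$script" > "$tmp" && cat "$tmp" > "$script"   # cat keeps the file mode
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (as root):
#   fix_daemon_line /etc/init.d/keepalived
#   grep -n 'daemon ' /etc/init.d/keepalived   # review the result
```

The full path is needed because make install put the binary into /usr/local/sbin, which is where the edited line above points.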

Our CentOS 7 minimal install doesn’t include killall. We need it in our keepalived config script to test whether the haproxy service is running. We install it as part of the psmisc package:

yum install psmisc -y

Also, by default, the CentOS 7 firewall blocks VRRP traffic, which keepalived needs in order to function. We allow VRRP traffic with the following command:

firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'

Now we restart our server:

shutdown -r now

Log back on as root and run these commands to do a basic health check:

service keepalived status
The service is running.

cat /var/log/messages | grep VRRP_Instance
keepalived started in MASTER mode.

ip a | grep "inet 10"
We have the virtual IP bound to our ens160 interface on lab-hap01...

ping 10.30.1.15 (run it on another machine, e.g. LAB-WS01)
...and it is communicating on the VIP.

firewall-cmd --list-rich-rules
We confirm that our firewall rule survived the restart.

Awesome, we are looking good!
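Since we will repeat the VIP check several times in the rest of this part, it can be wrapped into a small function. This is a sketch only; vip_is_bound is a hypothetical name, and it reads the output of ip a from standard input.

```shell
# Hypothetical helper: succeeds if the given IPv4 address appears as an
# "inet <address>/<prefix>" entry in the `ip a` output read from stdin.
vip_is_bound() {
  # Escape the dots so that 10.30.1.15 is matched literally.
  vip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -q "inet ${vip_re}/"
}

# Usage:
#   if ip a | vip_is_bound 10.30.1.15; then
#     echo "VIP is bound on this node"
#   else
#     echo "VIP is NOT bound on this node"
#   fi
```

Matching on the trailing slash means that an address such as 10.30.1.150 is not mistaken for the VIP.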

Now we repeat these steps on lab-hap02, with a couple of important differences.

In the /etc/keepalived/keepalived.conf file we change the priority to a lower value than the master, for instance to 100.

While still in the /etc/keepalived/keepalived.conf file, we also change the notification_email_from line to lab-hap02@digitalbrain.com.au.
This one is obvious, but we need to make sure it doesn't slip through the cracks: in the /etc/hosts file we enter the correct FQDN (and IP address) for lab-hap02.

When it’s all done and lab-hap02 has been rebooted, we repeat the same tests:

service keepalived status
The service is running.

cat /var/log/messages | grep VRRP_Instance
keepalived started in BACKUP mode.

Note that the server entered the BACKUP state because it received a higher-priority advert, and it removed the VIP from its network card because the VIP is supposed to live on the MASTER.

ip a | grep "inet 10"
We do NOT have the virtual IP bound to our ens160 interface on lab-hap02 because lab-hap02 is the BACKUP node.

firewall-cmd --list-rich-rules
We confirm that our firewall rule survived the restart.

We skipped the ping test as the VIP is bound to lab-hap01 and therefore it hasn’t got anything to do with lab-hap02 testing.

Testing keepalived

Time for some HA testing. To recap:

haproxy is running on both lab-hap01 and lab-hap02.
keepalived is running on both lab-hap01 and lab-hap02.
lab-hap01 is the MASTER and lab-hap02 is the BACKUP.
lab-hap01 holds the VIP.

Let’s confirm. On lab-hap01:

ps -A | grep haproxy

ps -A | grep keepalived

ip a | grep "inet 10"

Same check on lab-hap02:

On lab-hap01 we stop haproxy and we check its IP addresses:

systemctl stop haproxy.service

ip a | grep "inet 10"

Then we confirm that lab-hap01 is no longer the MASTER (as expected, since the VIP is no longer bound to its network card):

cat /var/log/messages | grep VRRP_Instance

On lab-hap02 we confirm that the VIP has been bound to the NIC:

ip a | grep "inet 10"

Then we confirm that lab-hap02 is now the new MASTER:

cat /var/log/messages | grep VRRP_Instance
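The grep lists every state change since the log was started, so after a few failovers it is the last line that matters. Here is a small sketch that prints only the most recent state. The last_vrrp_state name is mine, and I am assuming the log wording "VRRP_Instance(VI_1) Entering MASTER STATE" that I see with keepalived 1.2.x; other versions may word it differently.

```shell
# Hypothetical helper: read syslog lines on stdin and print the instance
# name and the most recent state it entered, e.g. "VI_1 MASTER".
# Assumes lines of the form: ... VRRP_Instance(VI_1) Entering BACKUP STATE
last_vrrp_state() {
  sed -n 's/.*VRRP_Instance(\([^)]*\)) Entering \([A-Z]*\) STATE.*/\1 \2/p' |
    tail -n 1
}

# Usage (as root):
#   last_vrrp_state < /var/log/messages
```

If the function prints nothing, either keepalived has not logged a state change yet or the message format on your system differs from the one assumed here.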

Up to this point we confirmed that stopping haproxy on lab-hap01 was correctly detected and the VIP has been transferred to lab-hap02. Therefore, if we point our Exchange DNS records to the VIP, continued service is assured.
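Why the VIP moves comes down to simple arithmetic on the values in our keepalived.conf files. As far as I understand keepalived, a positive weight on a vrrp_script is added to the node's priority for as long as the script succeeds, and the node with the highest resulting priority holds the VIP. The sketch below models just that with our numbers (priority 101 on lab-hap01, 100 on lab-hap02, weight 2). The function names are made up for illustration, and this is a model of the election, not keepalived's code.

```shell
# Effective priority: the base priority, plus the script weight while the
# tracked check (killall -0 haproxy) succeeds. Models a positive weight only.
effective_priority() {
  base=$1 weight=$2 check_ok=$3
  if [ "$check_ok" = "yes" ]; then
    echo $((base + weight))
  else
    echo "$base"
  fi
}

# Print which node should hold the VIP, given whether haproxy is running
# ("yes" or "no") on lab-hap01 and on lab-hap02.
elect_master() {
  p1=$(effective_priority 101 2 "$1")
  p2=$(effective_priority 100 2 "$2")
  if [ "$p1" -gt "$p2" ]; then echo lab-hap01; else echo lab-hap02; fi
}

elect_master yes yes   # 103 vs 102: lab-hap01
elect_master no  yes   # 101 vs 102: lab-hap02
elect_master yes no    # 103 vs 100: lab-hap01
```

Note that the weight (2) has to be larger than the gap between the two priorities (1). Otherwise a failed haproxy on lab-hap01 would not drop its priority below that of lab-hap02, and the VIP would stay where it is.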

Now we start the haproxy service on lab-hap01 and check the IP address:

systemctl start haproxy.service

ip a | grep "inet 10"

Checking the IP address on lab-hap02 shows that the VIP has been removed from it:

And last, we want to know how long it takes for the VIP to fail over once a service failure is detected. For this we kick off a continuous PING to the VIP from LAB-WS02:

ping 10.30.1.15 -t

Then we stop the haproxy service on lab-hap01 and watch how many pings are lost while service failure is detected and the VIP is moved to lab-hap02:

systemctl stop haproxy.service

Finally we start the haproxy service on lab-hap01 and, again, we watch the pings:

systemctl start haproxy.service

The screenshot shows that failover is virtually instantaneous, with only one ping lost during service failover:

Impressive!
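If you want a number rather than a screenshot, save the ping output on the workstation to a file and count the lines. The sketch below assumes the usual Windows ping wording ("Reply from ..." and "Request timed out."); ping_summary and the ping.txt file name are hypothetical.

```shell
# Hypothetical helper: summarise saved Windows ping output read from stdin.
ping_summary() {
  awk '
    /Reply from/        { ok++ }     # successful echo replies
    /Request timed out/ { lost++ }   # pings lost during the failover
    END { printf "replies=%d lost=%d\n", ok, lost }
  '
}

# Usage: on the workstation run "ping 10.30.1.15 -t > ping.txt", stop it
# with Ctrl+C after the test, copy the file to a Linux box, then:
#   ping_summary < ping.txt
```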

In this part we installed, configured and tested keepalived, the bit which makes HAProxy highly available, on both HAProxy servers. Technically we've almost reached the end of our journey, with only one last step left: confirm that client access actually works, traffic is load balanced, and service level failure is correctly detected and handled.

In part 7, our last part, we will test various client access methods and we’ll confirm that load balancing, error detection and high availability actually work from a client’s perspective too.

5 comments:

Ronald, 30 November 2017 at 13:38
Hola
I really like your write-up very much, very appreciated. I think one thing was missing and that is to add the line net.ipv4.ip_nonlocal_bind=1 to /etc/sysctl.conf. This allows HAProxy to bind to a non-existing IP, so when the VIP is on the other node, HAProxy can still startup. Otherwise it fails.
Further, really great article! Many thanks!
Zoli, 27 December 2017 at 16:16
Ronald, this is already addressed. Please see the section "Install and Configure keepalived" at the top of this article.
Billy, 1 March 2018 at 10:56
Hi Zoltan.

I am constrained by the configuration in the backup node at the time of the test by turning off the haproxy service in the master node the result of virtual ip can not move in the backup node. please also show for the configuration in the backup node in this discussion so I can do the testing again.

Many Thx.
Unknown, 25 October 2022 at 23:08
Thank you so much.

Monday, 27 October 2014

Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 6
