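As the externally exposed entry point, Nginx must be highly available in order to keep serving traffic. With a single Nginx instance, one crash makes every service routed through Nginx unreachable, so we need to ensure HA (high availability) for the Nginx service itself.
How does keepalived + LVS + Nginx keep Nginx highly available?
keepalived is a lightweight high-availability solution for clusters. I won't introduce it in detail here, since plenty of material is one search away. This post focuses on how it keeps Nginx highly available.
A single machine cannot be highly available, so we need a master/backup pair or a cluster. Nginx itself does not provide this, and keepalived is one way to solve the problem. With keepalived we can build a master/backup architecture: when the master fails, failover takes place and the backup is elected as the new master. Combined with the health-check mechanism keepalived provides, this keeps Nginx highly available.
Based on my understanding, I drew the architecture diagram below. Let's walk through it.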
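- First, the external request. The client accesses the virtual IP exposed by the VRRP configuration in keepalived and reaches server1, where keepalived-service-master runs. At this point keepalived-service-backup is on standby and serves no external traffic.
- Following the routing configuration in keepalived-service-master, keepalived routes the request to the service that actually handles it. In the HA Nginx architecture that service is Nginx, and it must run on the same server as keepalived (the reason is explained later). One keepalived service effectively "monitors" one Nginx service. In the diagram, every request handled by keepalived-service-master is routed to Nginx service1 (several Nginx services can be configured here) and never reaches Nginx service2.
- After Nginx service1 receives the request, it dispatches the request according to its load-balancing configuration.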
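That is the normal path of a request through keepalived. At this point keepalived and Nginx on server2 handle no requests and act purely as standby. Next, let's look at how high availability is preserved when Nginx service1 becomes unavailable.
As the diagram above shows, if Nginx service1 goes down, the standby Nginx service has to take over, and this is where keepalived comes in.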
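- In keepalived, configure a script that runs periodically and checks whether the Nginx service is available. In the diagram above, if Nginx service1 becomes unavailable, all requests need to move to Nginx service2. When the script detects that Nginx service1 is down, it automatically stops the keepalived service master on server1. PS: because a shell script has to check Nginx availability and stop the keepalived service, the two must run on the same server.
- Once the keepalived service master on server1 has stopped, the keepalived service backup on server2 is automatically elected as the new master, and every request sent to the virtual IP goes to the keepalived service on server2. Likewise, the keepalived service on server2 only forwards requests to Nginx service2 on the same machine. This completes the failover and keeps the Nginx service available.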
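The failover flow described above can be sketched as a small shell function. This is a simplified model, not keepalived's actual election code: it assumes each node is described by a hypothetical `name:priority:state` string, treats a node whose keepalived has been stopped as `down`, and ignores VRRP details such as tie-breaking. The names `elect_master`, `server1`, and `server2` are illustrative; the priorities 100 and 50 are the ones used in the configuration later in this post.

```shell
#!/bin/bash
# Pick the node that would hold the virtual IP: the highest priority among
# the nodes whose keepalived is still running.
# Each argument has the (hypothetical) form "name:priority:state".
elect_master() {
    local best_name="" best_prio=-1 node name prio state
    for node in "$@"; do
        IFS=: read -r name prio state <<< "$node"
        if [ "$state" = "up" ] && [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    echo "$best_name"
}

# Normal operation: both keepalived services are running.
elect_master server1:100:up server2:50:up      # prints "server1"
# Nginx on server1 failed, so the check script stopped keepalived there.
elect_master server1:100:down server2:50:up    # prints "server2"
```

The second call mirrors the scenario in the diagram: stopping keepalived on server1 removes it from the election, so server2 takes over even though its priority is lower.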
Installing and configuring keepalived
Installation steps
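- Download the source archive from the official site.
- Extract the archive.
- Create an installation directory. I use /usr/local/lib/keepalived.
- On Ubuntu, install the OpenSSL dependency with apt-get install libssl-dev. Other systems are similar.
- Before installing, enter the extracted directory and run the configure step:
./configure --prefix=/usr/local/lib/keepalived --sysconfdir=/etc
- With the install path configured, build and install:
make && make install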
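The steps above can be collected into one function. This is only a sketch under a few assumptions: the archive name `keepalived-2.0.11.tar.gz` is inferred from the directory name used later in this post, the archive is expected to be in the current directory, and the `run` helper and `DRY_RUN` variable are hypothetical additions that let you preview the commands before running them as root.

```shell
#!/bin/bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install keepalived from an already-downloaded source archive.
# $1: archive file name (assumed default: keepalived-2.0.11.tar.gz)
install_keepalived() {
    local archive=${1:-keepalived-2.0.11.tar.gz}
    local prefix=/usr/local/lib/keepalived
    local src_dir
    src_dir=$(basename "$archive" .tar.gz)

    run apt-get install -y libssl-dev || return 1   # OpenSSL headers (Ubuntu)
    run mkdir -p "$prefix" || return 1              # installation directory
    run tar -xzf "$archive" || return 1             # extract the sources
    run cd "$src_dir" || return 1
    run ./configure --prefix="$prefix" --sysconfdir=/etc || return 1
    run make || return 1
    run make install
}
```

Running `DRY_RUN=1 install_keepalived` prints the seven commands without executing anything; calling `install_keepalived` as root performs the installation.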
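After these steps keepalived is installed. For convenience, we can also make the keepalived commands available system-wide.
- Create a symlink for the keepalived binary. The link target should be an absolute path; a relative target such as sbin/keepalived would leave a dangling link in /sbin:
ln -s /usr/local/lib/keepalived/sbin/keepalived /sbin/keepalived
- Copy the init.d script from the extracted source directory into the system:
cp /usr/local/download/keepalived-2.0.11/keepalived/etc/init.d/keepalived /etc/init.d
- On Ubuntu, list the registered system services and check that keepalived is among them:
sysv-rc-conf --list
- Enable the keepalived service:
sysv-rc-conf keepalived on
After that, keepalived can be managed with commands such as service keepalived [start|stop|restart].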
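Configuration steps
Basic configuration
Here the configuration file lives under /etc/keepalived/. We need to configure one VRRP virtual router with two nodes, a master and a backup. The official site documents the full configuration file in detail. Run vim /etc/keepalived/keepalived.conf and trim the default configuration file. The master configuration is as follows.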
! Configuration File for keepalived
global_defs {
    # String identifying the machine (doesn't have to be hostname).
    # (default: local host name)
    router_id LVS_DEVEL_15
}
# VRRP (Virtual Router Redundancy Protocol) instance definition
vrrp_instance VI_1 {
    # Initial state, MASTER|BACKUP
    # As soon as the other machine(s) come up,
    # an election will be held and the machine
    # with the highest priority will become MASTER.
    # So the entry here doesn't matter a whole lot.
    state MASTER
    # interface for inside_network, bound by vrrp
    interface eth0
    # arbitrary unique number from 0 to 255
    # used to differentiate multiple instances of vrrpd
    # running on the same NIC (and hence same socket).
    virtual_router_id 51
    # The priority decides which node becomes master
    priority 100
    # VRRP advert interval in seconds (e.g. 0.92); how often master and backup check on each other
    advert_int 1
    authentication {
        auth_type PASS
        # Should be the same on all machines.
        auth_pass 1111
    }
    # Virtual IP(s) exposed to clients; several can be listed
    virtual_ipaddress {
        192.168.0.16
    }
}
# Map the virtual IP to the real server IP(s)
virtual_server 192.168.0.16 80 {
    # Health-check interval in seconds
    delay_loop 6
    # Load-balancing algorithm (rr = round robin); more than one real_server can be configured
    lb_algo rr
    lb_kind NAT
    # Session persistence timeout in seconds
    persistence_timeout 50
    protocol TCP
    # Route to the Nginx service that actually dispatches requests; the master forwards to Nginx on .15
    real_server 192.168.0.15 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3       # connection timeout
            delay_before_retry 3    # delay between retries
            connect_port 80         # port to check
        }
    }
}
The backup configuration is as follows.
! Configuration File for keepalived
global_defs {
    # String identifying the machine (doesn't have to be hostname).
    # (default: local host name)
    router_id LVS_DEVEL_15
}
# VRRP (Virtual Router Redundancy Protocol) instance definition
vrrp_instance VI_1 {
    # Initial state, MASTER|BACKUP
    # As soon as the other machine(s) come up,
    # an election will be held and the machine
    # with the highest priority will become MASTER.
    # So the entry here doesn't matter a whole lot.
    state BACKUP
    # interface for inside_network, bound by vrrp
    interface eth0
    # arbitrary unique number from 0 to 255
    # used to differentiate multiple instances of vrrpd
    # running on the same NIC (and hence same socket).
    virtual_router_id 51
    # The priority decides which node becomes master
    priority 50
    # VRRP advert interval in seconds (e.g. 0.92); how often master and backup check on each other
    advert_int 1
    authentication {
        auth_type PASS
        # Should be the same on all machines.
        auth_pass 1111
    }
    # Virtual IP(s) exposed to clients; several can be listed
    virtual_ipaddress {
        192.168.0.16
    }
}
# Map the virtual IP to the real server IP(s)
virtual_server 192.168.0.16 80 {
    # Health-check interval in seconds
    delay_loop 6
    # Load-balancing algorithm (rr = round robin); more than one real_server can be configured
    lb_algo rr
    lb_kind NAT
    # Session persistence timeout in seconds
    persistence_timeout 50
    protocol TCP
    # Route to the Nginx service that actually dispatches requests; the backup forwards to Nginx on .13
    real_server 192.168.0.13 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3       # connection timeout
            delay_before_retry 3    # delay between retries
            connect_port 80         # port to check
        }
    }
}
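The configuration above defines one VRRP instance, VI_1, with a master (192.168.0.15) and a backup (192.168.0.13). The virtual IP exposed to clients is 192.168.0.16.
Clients access 192.168.0.16:80, and every request is routed to the master for forwarding. If keepalived on the master goes down, keepalived on the backup is promoted to master and keeps serving. So far the configuration contains neither the periodic Nginx availability check nor the automatic shutdown of keepalived on a failed master, so an Nginx failure does not yet trigger failover, although normal operation works.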
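Before starting the services, it can help to confirm that the two files agree where they must. The sketch below assumes both configuration files have been copied to one machine under hypothetical names such as `master.conf` and `backup.conf`; the helper names `conf_value` and `check_vrrp_pair` are made up for this example, and only simple `keyword value` lines are compared (block settings such as `virtual_ipaddress` are not).

```shell
#!/bin/bash
# Print the value of the first "keyword value" line for the given keyword.
# $1: config file, $2: keyword
conf_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Compare the settings both nodes of one VRRP instance should share,
# then make sure the master has the higher priority.
# $1: master config file, $2: backup config file
check_vrrp_pair() {
    local master=$1 backup=$2 key
    for key in virtual_router_id advert_int auth_type auth_pass; do
        if [ "$(conf_value "$master" "$key")" != "$(conf_value "$backup" "$key")" ]; then
            echo "mismatch: $key"
            return 1
        fi
    done
    if [ "$(conf_value "$master" priority)" -le "$(conf_value "$backup" priority)" ]; then
        echo "master priority should be higher than backup priority"
        return 1
    fi
    echo "ok"
}
```

For the two files shown above, `check_vrrp_pair master.conf backup.conf` should print `ok`, since both use `virtual_router_id 51`, `advert_int 1`, and the same authentication settings, with priorities 100 and 50.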
- Run service keepalived start on both 13 and 15 to start the keepalived service.
- Run ip addr on each machine.
IP information on 13 (backup):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:97:9b:fd brd ff:ff:ff:ff:ff:ff
inet 192.168.0.13/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe97:9bfd/64 scope link
valid_lft forever preferred_lft forever
IP information on 15 (master):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:42:db:66 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.15/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.0.16/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe42:db66/64 scope link
valid_lft forever preferred_lft forever
The virtual IP 192.168.0.16 is clearly attached to the master's network interface.
- Run service keepalived stop on 15 (master), then run ip addr on 13 (backup). The IP information is now:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:97:9b:fd brd ff:ff:ff:ff:ff:ff
inet 192.168.0.13/24 brd 192.168.0.255 scope global eth0
valid_lft forever preferred_lft forever
inet 192.168.0.16/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe97:9bfd/64 scope link
valid_lft forever preferred_lft forever
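After the backup becomes master, the virtual IP 192.168.0.16 is attached to its network interface as well, which means the backup is now the one serving external traffic.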
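Reading the `ip addr` output by eye works, but the same check is easy to script. A minimal sketch, where the function name `has_vip` is hypothetical and the input is expected to be `ip addr` output on stdin:

```shell
#!/bin/bash
# Succeed if the address in $1 appears as an IPv4 "inet" entry in the
# `ip addr` output read from stdin. The trailing "/" keeps 192.168.0.16
# from matching longer addresses such as 192.168.0.160.
has_vip() {
    grep -Fq "inet $1/"
}
```

On either node, `ip addr show eth0 | has_vip 192.168.0.16 && echo "this node holds the virtual IP"` reports whether that machine is currently acting as master.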
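Configuration pitfalls
- The vrrp_strict option needs to be commented out; otherwise you may not be able to ping the exposed virtual IP, 192.168.0.16.
- If the virtual IP still cannot be pinged after commenting out vrrp_strict, run iptables --list and check whether the virtual IP already appears in the iptables rules. If it does, that can also make the virtual IP unreachable.
- Another possible cause is the firewall. The blunt fix is to turn it off.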
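The first two checks can be scripted. This is a rough sketch with hypothetical helper names (`vrrp_strict_enabled`, `rules_for_ip`); it only looks for an uncommented `vrrp_strict` line and filters rule listings by plain text, so treat the result as a hint rather than a diagnosis.

```shell
#!/bin/bash
# Succeed if the config file in $1 contains an active (uncommented)
# vrrp_strict line. Commented lines start with "#" or "!" and do not match.
vrrp_strict_enabled() {
    grep -Eq '^[[:space:]]*vrrp_strict([[:space:]]|$)' "$1"
}

# Print the rules (read from stdin) that mention the address in $1.
rules_for_ip() {
    grep -F -- "$1"
}
```

Typical use would be `vrrp_strict_enabled /etc/keepalived/keepalived.conf && echo "vrrp_strict is still active"` and `iptables --list -n | rules_for_ip 192.168.0.16`.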
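Periodic check script: automatic failover on failure
The basic configuration set up forwarding and backup takeover but no health check, so failover is not automatic: you still have to run service keepalived stop by hand to stop the keepalived paired with the failed Nginx. Below we configure a periodic check of Nginx availability that decides whether to stop keepalived automatically, which gives us automatic failover.
Change the configuration on the initial master to the following. New parts are marked [newadd].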
! Configuration File for keepalived
global_defs {
    # String identifying the machine (doesn't have to be hostname).
    # (default: local host name)
    router_id LVS_DEVEL_15
    # Don't run scripts configured to be run as root if any part of the path
    # is writable by a non-root user. [newadd]
    enable_script_security
}
# Script to be run by VRRP [newadd]
vrrp_script nginx_check_for_keepalived {
    script "/usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh"
    # Run every 2 seconds
    interval 2
    user root
}
# VRRP (Virtual Router Redundancy Protocol) instance definition
vrrp_instance VI_1 {
    # Initial state, MASTER|BACKUP
    # As soon as the other machine(s) come up,
    # an election will be held and the machine
    # with the highest priority will become MASTER.
    # So the entry here doesn't matter a whole lot.
    state MASTER
    # interface for inside_network, bound by vrrp
    interface eth0
    # arbitrary unique number from 0 to 255
    # used to differentiate multiple instances of vrrpd
    # running on the same NIC (and hence same socket).
    virtual_router_id 51
    # The priority decides which node becomes master
    priority 100
    # VRRP advert interval in seconds (e.g. 0.92); how often master and backup check on each other
    advert_int 1
    authentication {
        auth_type PASS
        # Should be the same on all machines.
        auth_pass 1111
    }
    # Virtual IP(s) exposed to clients; several can be listed
    virtual_ipaddress {
        192.168.0.16
    }
    # Attach the check script [newadd]
    track_script {
        nginx_check_for_keepalived
    }
}
# Map the virtual IP to the real server IP(s)
virtual_server 192.168.0.16 80 {
    # Health-check interval in seconds
    delay_loop 6
    # Load-balancing algorithm (rr = round robin); more than one real_server can be configured
    lb_algo rr
    lb_kind NAT
    # Session persistence timeout in seconds
    persistence_timeout 50
    protocol TCP
    # Route to the Nginx service that actually dispatches requests
    real_server 192.168.0.15 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3       # connection timeout
            delay_before_retry 3    # delay between retries
            connect_port 80         # port to check
        }
    }
}
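After changing the configuration, create the script at the configured path: a shell script named nginx-check-for-keepalived.sh under /usr/local/lib/keepalived/script/. Roughly, the script checks whether Nginx is alive and, if it is not, stops the local keepalived service so that keepalived on the backup takes over. Below is a simple script. It may not be rigorous and is for reference only (I'm not that fluent in shell... emmmm).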
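#! /bin/bash
# Count processes named exactly "nginx"; pgrep never counts itself or this script.
count=$(pgrep -c -x nginx)
if [ "$count" -eq 0 ]; then
    echo "nginx service is dead"
    # Stop the local keepalived so the backup node takes over the virtual IP.
    service keepalived stop
else
    echo "nginx service is healthy"
fi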
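- Once the above is configured, restart the keepalived service on the master.
- Run the command that stops the Nginx service.
- Watch whether the keepalived service stops automatically as well.
- After keepalived on the original master has stopped, requests are forwarded to the backup automatically, and the failover has succeeded.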
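A different technique from the one used in this post is to let the check script report Nginx health only through its exit status instead of stopping keepalived itself. keepalived treats a non-zero exit from a `vrrp_script` as a failed check; when the script is tracked without a weight, the VRRP instance goes into the FAULT state and the backup takes over, and the node can return once the check passes again, without restarting keepalived by hand. A minimal sketch, where the function name `process_alive` is hypothetical:

```shell
#!/bin/bash
# Succeed (exit status 0) when at least one process with exactly this
# name is running, fail otherwise.
process_alive() {
    pgrep -x "$1" > /dev/null 2>&1
}

# A check script built on this helper would end with:
#   process_alive nginx
# so that the script's exit status is the result of the check.
```

Which approach fits better depends on whether you want the failed node to rejoin on its own. The stop-keepalived approach in this post keeps the node out until someone restarts the service.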
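Pitfalls encountered
After everything was configured, the script never ran successfully. Running cat /var/log/syslog |grep keepalived to inspect the keepalived log showed the following errors.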
Jan 9 22:32:55 zhoujy-VirtualBox Keepalived_vrrp[4368]: Error exec-ing command '/usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh', error 8: Exec format error
Jan 9 22:32:57 zhoujy-VirtualBox Keepalived_vrrp[4369]: Error exec-ing command '/usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh', error 8: Exec format error
Jan 9 22:32:59 zhoujy-VirtualBox Keepalived_vrrp[4370]: Error exec-ing command
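It turned out that the first line of the script was wrong. Running the script by hand worked, but keepalived failed when executing it. The correct line
#! /bin/bash
had been written as
# !/bin/bash
Once the line was corrected, keepalived ran the script without errors.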
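As far as I can tell, the difference comes from how the script is launched. Linux only treats a file as an interpreter script when its first two bytes are `#!`. With `# !/bin/bash` the first line is an ordinary comment, so executing the file directly fails with ENOEXEC (error 8, "Exec format error"), which is what keepalived logged. An interactive shell usually hides the problem because it falls back to interpreting the file itself. A quick check for this mistake, with a hypothetical function name `has_shebang`:

```shell
#!/bin/bash
# Succeed when the file in $1 starts with the two bytes "#!".
has_shebang() {
    [ "$(head -c 2 "$1")" = '#!' ]
}
```

For example, `has_shebang /usr/local/lib/keepalived/script/nginx-check-for-keepalived.sh || echo "missing or malformed shebang"` would have flagged the broken script before keepalived tried to run it.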
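Summary
In short, keeping Nginx highly available with keepalived relies on a master/backup architecture in which keepalived switches to the backup automatically when a failure occurs. Typically one keepalived service and one Nginx service together form a node (the master), and the backup node is built the same way. keepalived in effect monitors Nginx, and its own election mechanism indirectly provides failover for Nginx.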