文章目录
# PostgreSQL
https://iothub.org.cn/docs/middleware/
https://iothub.org.cn/docs/middleware/postgresql/postgres-ha/
一、概述
1.PostgreSQL高可用
PostgreSQL是一款功能,性能,可靠性都可以和国际上成熟的商业数据库相媲美的开源数据库。而且PostgreSQL的许可和生态完全开放,不被任何一个单一的公司或国家所操控,保证了使用者没有后顾之忧。国内越来越多的企业开始用PostgreSQL代替原来昂贵的国外商业数据库。
在部署PostgreSQL到生产环境中时,选择适合的高可用方案是一项必不可少的工作。
PostgreSQL的开源HA工具有很多种,下面几种算是比较常用的
- PAF(PostgreSQL Automatic Failomianver)
- repmgr
- Patroni
2.Patroni
Patroni基于Python开发的模板,结合DCS(例如 ZooKeeper, etcd, Consul )可以定制PostgreSQL高可用方案。
Patroni并不是一套拿来即用的PostgreSQL高可用组件,涉及较多的配置和定制工作。
Patroni接管PostgreSQL数据库的启停,同时监控本地的PostgreSQL数据库,并将本地的PostgreSQL数据库信息写入DCS。
Patroni的主备端是通过是否能获得 leader key 来控制的,获取到了leader key的Patroni为主节点,其它的为备节点。
Patroni + Etcd 方案优点
最近在在看基于 Patroni + Etcd 的高可用方案,此方案使用Patroni管理本地库,并结合Etcd作为数据存储和主节点选举
具有以下优势:
- 健壮性: 使用分布式key-value数据库作为数据存储,主节点故障时进行主节点重新选举,具有很强的健壮性
- 支持多种复制方式: 基于内置流复制,支持同步流复制、异步流复制、级联复制
- 支持主备延迟设置: 可以设置备库延迟主库WAL的字节数,当备库延迟大于指定值时不做故障切换
- 自动化程度高:
- 支持自动化初始PostgreSQL实例并部署流复制
- 当备库实例关闭后,支持自动拉起
- 当主库实例关闭后,首先会尝试自动拉起
- 支持switchover命令,能自动将老的主库进行角色转换
- 避免脑裂: 数据库信息记录到 ETCD 中,通过优化部署策略(多机房部署、增加实例数)可以避免脑裂
二、PostgreSQL高可用部署
1.安装简介
# 安装环境
node1: 172.51.216.81
node2: 172.51.216.82
node3: 172.51.216.83
操作系统:centos7.5
数据库: PostgreSQL12
# 部署方案:
PostgreSQL12:node1、node2
etcd: node1、node2、node3
Patroni: node1、node2
haproxy: node1、node2
keepalived: node1、node2
VIP: 172.51.216.88
# 三台服务器允许挂掉一个节点,etcd挂掉两个节点就无法工作
2.安装准备
1.所有节点设置时钟同步
yum install -y ntpdate
ntpdate time.windows.com && hwclock -w
2.关闭防火墙
setenforce 0
sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
systemctl disable firewalld.service
systemctl stop firewalld.service
iptables -F
3.安装ETCD
- 安装
yum install -y gcc python-devel epel-release
yum install -y etcd
- 修改配置文件
配置文件:/etc/etcd/etcd.conf
# node1
ETCD_DATA_DIR="/var/lib/etcd/etcd1.etcd"
ETCD_LISTEN_PEER_URLS="http://172.51.216.81:2380"
ETCD_LISTEN_CLIENT_URLS="http://172.51.216.81:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.51.216.81:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.51.216.81:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://172.51.216.81:2380,etcd2=http://172.51.216.82:2380,etcd3=http://172.51.216.83:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
# node2
ETCD_DATA_DIR="/var/lib/etcd/etcd2.etcd"
ETCD_LISTEN_PEER_URLS="http://172.51.216.82:2380"
ETCD_LISTEN_CLIENT_URLS="http://172.51.216.82:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd2"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.51.216.82:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.51.216.82:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://172.51.216.81:2380,etcd2=http://172.51.216.82:2380,etcd3=http://172.51.216.83:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
# node3
ETCD_DATA_DIR="/var/lib/etcd/etcd3.etcd"
ETCD_LISTEN_PEER_URLS="http://172.51.216.83:2380"
ETCD_LISTEN_CLIENT_URLS="http://172.51.216.83:2379,http://127.0.0.1:2379"
ETCD_NAME="etcd3"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.51.216.83:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://172.51.216.83:2379"
ETCD_INITIAL_CLUSTER="etcd1=http://172.51.216.81:2380,etcd2=http://172.51.216.82:2380,etcd3=http://172.51.216.83:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
- 修改 etcd.service
vim /usr/lib/systemd/system/etcd.service
# 直接删除原有内容,替换为以下配置
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=/etc/etcd/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/bin/etcd \
--name=\"${ETCD_NAME}\" \
--data-dir=\"${ETCD_DATA_DIR}\" \
--listen-peer-urls=\"${ETCD_LISTEN_PEER_URLS}\" \
--listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\" \
--initial-advertise-peer-urls=\"${ETCD_INITIAL_ADVERTISE_PEER_URLS}\" \
--advertise-client-urls=\"${ETCD_ADVERTISE_CLIENT_URLS}\" \
--initial-cluster=\"${ETCD_INITIAL_CLUSTER}\" \
--initial-cluster-token=\"${ETCD_INITIAL_CLUSTER_TOKEN}\" \
--initial-cluster-state=\"${ETCD_INITIAL_CLUSTER_STATE}\""
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
- 运行ETCD
# 依次启动 node1、node2、node3 节点的 etcd
#启动Etcd
systemctl daemon-reload && systemctl enable etcd && systemctl start etcd
systemctl start etcd
systemctl enable etcd
systemctl status etcd
- 测试
# 任意节点查看集群状态
etcdctl cluster-health
etcdctl member list
[root@localhost ~]# etcdctl cluster-health
member 77b98c1c4d35f89e is healthy: got healthy result from http://172.51.216.83:2379
member 9ecc9898244a78c8 is healthy: got healthy result from http://172.51.216.81:2379
member e63b126eea6bfc6c is healthy: got healthy result from http://172.51.216.82:2379
cluster is healthy
[root@localhost ~]# etcdctl member list
77b98c1c4d35f89e: name=etcd3 peerURLs=http://172.51.216.83:2380 clientURLs=http://172.51.216.83:2379 isLeader=true
9ecc9898244a78c8: name=etcd1 peerURLs=http://172.51.216.81:2380 clientURLs=http://172.51.216.81:2379 isLeader=false
e63b126eea6bfc6c: name=etcd2 peerURLs=http://172.51.216.82:2380 clientURLs=http://172.51.216.82:2379 isLeader=false
4.安装PostgreSQL
1.安装PostgreSQL 12
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12-server postgresql12-contrib
2.创建PostgreSQL数据目录
mkdir -p /var/lib/pgsql/12/data
chown postgres:postgres -R /var/lib/pgsql/12
chmod -R 700 /var/lib/pgsql/12/data
3. 配置环境变量
解决pg_ctl的问题
-bash: pg_ctl: command not found
[root@localhost ~]# su - postgres
-bash-4.2$ pwd
/var/lib/pgsql
-bash-4.2$ vim .bash_profile
-bash-4.2$ source .bash_profile
增加配置信息
PGDATA=/var/lib/pgsql/12/data
export PGDATA
export PATH=/usr/pgsql-12/bin:$PATH
pg_ctl stop -m fast
pg_ctl --help
5.安装Patroni
- 安装Patroni
yum install libffi-devel openssl-devel
# 安装gcc软件包
yum install gcc
wget https://www.python.org/ftp/python/3.7.4/Python-3.7.4.tgz
tar zxvf Python-3.7.4.tgz
cd Python-3.7.4/
./configure
make
su
make install
python3 -m pip install --upgrade pip
python3 -m pip install psycopg2_binary
python3 -m pip install patroni[etcd]
# 验证是否安装成功
which patroni
patronictl --help
patronictl version
[root@localhost Python-3.7.4]# patronictl version
patronictl version 2.0.2
- 创建Partoni service配置文件
/etc/systemd/system/patroni.service
vim /etc/systemd/system/patroni.service
[Unit]
Description=Runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target
[Service]
Type=simple
User=postgres
Group=postgres
#StandardOutput=syslog
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process
TimeoutSec=30
Restart=no
[Install]
WantedBy=multi-user.target
# 位置
/usr/local/bin/patroni
- 创建Patroni配置文件
配置文件/etc/patroni.yml
vim /etc/patroni.yml
# node1
scope: pgsql
namespace: /service/
name: pg1
restapi:
listen: 0.0.0.0:8008
connect_address: 172.51.216.81:8008
etcd:
host: 172.51.216.81:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
listen_addresses: "0.0.0.0"
port: 5432
wal_level: logical
hot_standby: "on"
wal_keep_segments: 100
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
initdb:
- encoding: UTF8
- locale: C
- lc-ctype: zh_CN.UTF-8
- data-checksums
pg_hba:
- host replication repl 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
postgresql:
listen: 0.0.0.0:5432
connect_address: 172.51.216.81:5432
data_dir: /var/lib/pgsql/12/data
bin_dir: /usr/pgsql-12/bin
authentication:
replication:
username: repl
password: "123456"
superuser:
username: postgres
password: "postgres"
basebackup:
max-rate: 100M
checkpoint: fast
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
# node2
scope: pgsql
namespace: /service/
name: pg2
restapi:
listen: 0.0.0.0:8008
connect_address: 172.51.216.82:8008
etcd:
host: 172.51.216.82:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
listen_addresses: "0.0.0.0"
port: 5432
wal_level: logical
hot_standby: "on"
wal_keep_segments: 100
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
initdb:
- encoding: UTF8
- locale: C
- lc-ctype: zh_CN.UTF-8
- data-checksums
pg_hba:
- host replication repl 0.0.0.0/0 md5
- host all all 0.0.0.0/0 md5
postgresql:
listen: 0.0.0.0:5432
connect_address: 172.51.216.82:5432
data_dir: /var/lib/pgsql/12/data
bin_dir: /usr/pgsql-12/bin
authentication:
replication:
username: repl
password: "123456"
superuser:
username: postgres
password: "postgres"
basebackup:
max-rate: 100M
checkpoint: fast
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
- 启动Patroni
先在node1上启动Patroni
systemctl start patroni
systemctl status patroni
systemctl enable patroni
systemctl restart patroni
systemctl stop patroni
- 测试
# patronictl -c /etc/patroni.yml list
# 执行手动切换
patronictl -c /etc/patroni.yml switchover
curl -s "http://172.51.216.81:8008/cluster1" | jq .
create table test (id int, name varchar(100));
insert into test values ( 1,'1');
select * from test;
# 在主机node1上停从库,如下:
[pg12@ydtf01 ~]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
6.安装haproxy
- 安装haproxy
node1: 172.51.216.81
node2: 172.51.216.82
yum install -y haproxy
- 修改haproxy配置文件
#配置文件:/etc/haproxy/haproxy.cfg
vim /etc/haproxy/haproxy.cfg
global
maxconn 100
log 127.0.0.1 local2
defaults
log global
mode tcp
retries 2
timeout client 30m
timeout connect 4s
timeout server 30m
timeout check 5s
listen stats
mode http
bind *:7000
stats enable
stats uri /
listen pgsql
bind *:5000
option httpchk
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server postgresql_172.51.216.81_5432 172.51.216.81:5432 maxconn 100 check port 8008
server postgresql_172.51.216.82_5432 172.51.216.82:5432 maxconn 100 check port 8008
listen pgsql_read
bind *:6000
option httpchk GET /read-only
http-check expect status 200
default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
server postgresql_172.51.216.81_5432 172.51.216.81:5432 maxconn 100 check port 8008
server postgresql_172.51.216.82_5432 172.51.216.82:5432 maxconn 100 check port 8008
- 启动haproxy
systemctl start haproxy
systemctl enable haproxy
# haproxy部署后,可以通过它的web接口查看统计数据
http://172.51.216.81:7000/
http://172.51.216.82:7000/
http://172.51.216.88:7000/
# master:
psql "host=172.51.216.81 port=5000 password=postgres"
# slave:
psql "host=172.51.216.81 port=6000 password=postgres"
7.安装keepalived
- 安装keepalived
node1: 172.51.216.81
node2: 172.51.216.82
# yum -y install keepalived
- 修改keepalived配置文件
# 配置文件: /etc/keepalived/keepalived.conf
vim /etc/keepalived/keepalived.conf
# node1
! Configuration File for keepalived
global_defs {
router_id haproxy1
}
vrrp_script check_haproxy {
script "/etc/keepalived/haproxy_chk.sh"
interval 5
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 80
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
172.51.216.88/24
}
track_script {
check_haproxy
}
}
# node2
! Configuration File for keepalived
global_defs {
router_id haproxy2
}
vrrp_script check_haproxy {
script "/etc/keepalived/haproxy_chk.sh"
interval 5
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 80
priority 50
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
172.51.216.88/24
}
track_script {
check_haproxy
}
}
- 健康检测脚本
# 主从配置相同
vim /etc/keepalived/haproxy_chk.sh
#!/usr/bin/env bash
# test haproxy server running
systemctl status haproxy.service &>/dev/null
if [ $? -ne 0 ];then
systemctl start haproxy.service &>/dev/null
sleep 5
systemctl status haproxy.service &>/dev/null
if [ $? -ne 0 ];then
systemctl stop keepalived
fi
fi
# 修改权限
chmod +x /etc/keepalived/haproxy_chk.sh
- 开启服务验证是VIP
systemctl start keepalived
systemctl enable keepalived
ip a
- 测试
http://172.51.216.88:7000/
psql "host=172.51.216.88 port=5000 password=postgres"
psql "host=172.51.216.88 port=6000 password=postgres"
create table test (id int, name varchar(100));
insert into test values ( 1,'1');
select * from test;
8.测试
- 查看集群状态
# 查看集群状态
# patronictl -c /etc/patroni.yml list
- 手动切换 switchover
# 手动切换
# patronictl -c /etc/patroni.yml switchover
- etcd
# 任意节点查看集群状态
etcdctl cluster-health
etcdctl member list
- 数据库操作
# haproxy
http://172.51.216.88:7000/
# psql
# 主
psql "host=172.51.216.88 port=5000 password=postgres"
# 从
psql "host=172.51.216.88 port=6000 password=123456"
create table test (id int, name varchar(100));
insert into test values ( 1,'1');
select * from test;
#停止数据库
$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
# PostgreSQL
https://iothub.org.cn/docs/middleware/
https://iothub.org.cn/docs/middleware/postgresql/postgres-ha/