Phil Vacca
Enterprise Data Architect for Mitel Networks
Outages, corruption, and human error happen in the real world
Untargeted (and targeted) bitlocker-style ransom attacks are increasingly common
Generally we are referring to a Primary node (often called master) and at least one standby node (replica, slave)
A standby server can be cold, warm, or hot
A very little, promise ;-)
The Write-Ahead Log ensures that transactional changes are written to disk only after being written to a transaction log
This is what makes your database recoverable, and another word for this is durable(the D in ACID)
A logical backup (from pg_dump
) won't be enough
Use the pg_basebackup
command to seed a binary replica of the running primary database
The command can write the backup directly to another remote host. See the docs for more options.
on primary
postgresql.conf
wal_level = archive
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'
on standby host
recovery.conf
standby_mode = true
restore_command = 'cp /path_to/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /path_to/archive %r'
on primary
CREATE ROLE replicator WITH LOGIN REPLICATION;
pg_hba.conf
# allow the standby host to connect
host replication replicator 192.168.0.42/32 md5
postgresql.conf
wal_level = hot_standby
full_page_writes = on
wal_log_hints = on
max_wal_senders = 5
wal_keep_segments = 32
on standby host
postgresql.conf
hot_standby = on
recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.0.23 port=5432 user=replicator'
What if there's a gap in processing - maybe a network problem - and the WAL files turn over before replication has read the transactions?
ERROR: requested WAL segment 00000001000000010000002D
has already been removed
With a streaming replica, you can always enable log-shipping as a fall-back plan.
Added in 9.4
They give the Primary a persistent record of the state of replicas
This means that the primary will keep ALL WAL records until slot is received from.
on primary
SELECT * FROM pg_create_physical_replication_slot('standby_replication_slot_alpha');
postgresql.conf
max_replication_slots = 5
on replica
recovery.conf
primary_slot_name = 'standby_replication_slot_alpha'
Consider another what if. . .
What if an outage occurs after a transaction is logged to the Primary, but before the replica has accepted it?
We need a guaranty that the transaction has been written to one (or more than one) additional host.
on primary
postgresql.conf
synchronous_commit = on
synchronous_standby_names = "1 (named_host1, named_host2)"
There is no formula to determine latency when syncronous replication is enabled. It is a function of your network conditions and the health and processing power of the available replicas
Basic monitoring is done on the Primary through pg_stat_replication
Additional checks for lag can be done by writing scripts, or using check_postgres.pl
From the introduction in the docs
This means that the subscriber can write to other tables while still receiving updates from the subscription
You can have any number of subscribers for a publication
TRUNCATE
is not replicatedDone through DDL on Publisher
CREATE PUBLICATION pub_one
-- FOR ALL TABLES
FOR TABLE accounts, users, order_*
WITH (publish = 'insert, update, delete')
;
Simply use ALTER PUBLICATION
ADD TABLE
, DROP TABLE
, OWNER TO
A subscription requires a replication slot, but this is usually created for you, unless you specify create_slot = false
CREATE SUBSCRIPTION sub_one
CONNECTION 'host=192.168.0.23 port=5432 user=somebody dbname=some_db'
PUBLICATION pub_one
-- WITH (enabled = false)
;
A conflict on the local node will HALT replication. It must be corrected manually
Trigger based logical: Slony, Bucardo, xDB
pgpool-ii streaming
File system
Handling a failover event
Cascading replicas
Taking advantage of replicas for read-only queries
Building a datamart or reporting IDW
This presentation was built in reveal.js