I am testing out Always On replication on SQL Server with a cool docker setup that includes primary and secondary https://github.com/rafaelrodrigues3092/docker-mssql-alwayson
When I created data on the primary there was never any lag on the secondary so I wanted to make sure my lag values were correct and wanted to introduce network latency between the primary and secondary to introduce lag time. For this there is a neat tool called “tc” that can introduced lag. For example on either or both of the docker containers , I can set up lag with
# add lag
tc qdisc add dev eth0 root netem delay 100ms
# delete lag
tc qdisc del dev eth0 root netem delay 100ms
#show lag
tc qdisc
Problem is, there was no “tc” on my docker containers and yum couldn’t find it (it is part of iproute2)
# yum install iproute2 Last metadata expiration check: 1:34:54 ago on Wed Apr 13 20:10:03 2022. No match for argument: iproute2 Error: Unable to find a match: iproute2
I figure there is something wrong with the way the yum repos are setup on the docker images and I poked around there for a while but couldn’t get it to work, so then I tried to install by hand which turned out more onerous than I thought, so here I’m just documenting what was needed
create directory tc_build cd tc_build curl -o iproute2.tz https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/snapshot/iproute2-5.17.0.tar.gz curl -o bison.gz http://ftp.gnu.org/gnu/bison/bison-3.8.tar.gz curl -o gettext.gz https://ftp.gnu.org/pub/gnu/gettext/gettext-0.21.tar.gz curl -o texinfo-6.8.tar.gz https://ftp.gnu.org/gnu/texinfo/texinfo-6.8.tar.gz wget https://github.com/westes/flex/releases/download/v2.6.4/flex-2.6.4.tar.gz yum install make yum install m4 yum install libtool yum install diffutils for i in *z; do echo $i ; tar xvf $i ; done mkdir trash mv *z trash cd textinfo* ./autogen.sh ./configure make make install cd ../gettext* make cd ../flex* make cd ../bison* make cd ../iproute2* make
The above isn’t exact but should get you mostly there. When it comes to “make” in each directory its some variation of
- autogen.sh
- configure
- make
- make install
Not every directory needed all 4 commands but unfortunately didn’t take detailed enough notes to say which needed which. The most onerous part was “gettext” which was needed just for “autopoint” but I didn’t find a way just to install “autopoint” and “gettext” is big and takes a while to compile and install. Once it was all done “tc” was there. Then I just tar’ed the “tc” commands in /usr/sbin in tc.tar.gz which I can copy to a docker container and voila it’s there.
Had to modify the docker-compose.yml to include
cap_add: - NET_ADMIN
then after starting the containers , on both the primary and secondary, I set network latency
tc qdisc add dev eth0 root netem delay 500ms
The docker primary comes with SALES database that is replicated to the secondary, do I go on the primary, use SALES and create a bunch of data:
use sales;
create table foo(id int, content varchar(2000)) ;
WITH mycte AS (
SELECT 1 DataValue
UNION all
SELECT DataValue + 1
FROM mycte
WHERE DataValue + 1 <= 100000
)
INSERT INTO foo(id,content)
SELECT DataValue,REPLICATE (NEWID() ,55) --
FROM mycte m
OPTION (MAXRECURSION 0)
;
There is a nice sql query for monitoring Always On here https://dba.stackexchange.com/questions/278324/log-send-queue-size-and-redo-queue-size-in-ag
Here’s a short version of the query
SELECT --AG info AGName = ag.name, ReplicaServer = ar.replica_server_name, DbName = adc.database_name, Failover_ready = ar.failover_mode_desc, -- State Info IsLocal = drs.is_local, IsPrimary = drs.is_primary_replica, SynchronizationState = drs.synchronization_state_desc, SynchronizationHealthState = drs.synchronization_health_desc, -- Queue Size & rates SendQueueSize = drs.log_send_queue_size, RedoQueueSize = drs.redo_queue_size, SendRateKb = drs.log_send_rate, RedoRateKb = drs.redo_rate, --Oh yeah, filestream, too FileStreamSendRate = drs.filestream_send_rate, drs.Secondary_lag_seconds FROM sys.dm_hadr_database_replica_states AS drs JOIN sys.availability_databases_cluster AS adc ON drs.group_id = adc.group_id AND drs.group_database_id = adc.group_database_id JOIN sys.availability_groups AS ag ON ag.group_id = drs.group_id JOIN sys.availability_replicas AS ar ON drs.group_id = ar.group_id AND drs.replica_id = ar.replica_id ORDER BY -- ag.name, ar.replica_server_name, adc.database_name; drs.log_send_queue_size + drs.redo_queue_size DESC;
primary
-[ RECORD 1 ]------------------------- AGName | AG1 ReplicaServer | db2 DbName | SALES Failover_ready | MANUAL IsLocal | 0 IsPrimary | 0 SynchronizationState | SYNCHRONIZED SynchronizationHealthState | HEALTHY SendQueueSize | 6904 RedoQueueSize | 9544 SendRateKb | 5113 RedoRateKb | 41343 FileStreamSendRate | 0 Secondary_lag_seconds | 0 -[ RECORD 2 ]------------------------- AGName | AG1 ReplicaServer | db1 DbName | SALES Failover_ready | MANUAL IsLocal | 1 IsPrimary | 1 SynchronizationState | SYNCHRONIZED SynchronizationHealthState | HEALTHY SendQueueSize | NULL RedoQueueSize | NULL SendRateKb | NULL RedoRateKb | NULL FileStreamSendRate | NULL Secondary_lag_seconds | NULL
secondary
-[ RECORD 1 ]------------------------- AGName | AG1 ReplicaServer | db2 DbName | SALES Failover_ready | MANUAL IsLocal | 1 IsPrimary | 0 SynchronizationState | SYNCHRONIZED SynchronizationHealthState | HEALTHY SendQueueSize | 60 RedoQueueSize | 29068 SendRateKb | 4421 RedoRateKb | 41462 FileStreamSendRate | 0 Secondary_lag_seconds | NULL
another thing you can do is read the logs for Always On info
EXEC xp_ReadErrorLog 0,1,"Always" -[ RECORD 1 ]------------------------- LogDate | 2022-04-13 18:47:56.440 ProcessInfo | spid27s Text | Always On: The availability replica manager is starting. This is an informational message only. No user action is required. -[ RECORD 2 ]------------------------- LogDate | 2022-04-13 18:47:56.590 ProcessInfo | spid27s Text | Always On: The availability replica manager is waiting for the instance of SQL Server to allow client connections. This is an informational message only. No user action is required. -[ RECORD 3 ]------------------------- LogDate | 2022-04-13 18:48:18.900 ProcessInfo | spid53 Text | Always On: The local replica of availability group 'AG1' is preparing to transition to the primary role. This is an informational message only. No user action is required. -[ RECORD 4 ...