做数据图表网站,wordpress外链mp4,世界工厂采购网站,网站设计制作电影现象从某个时间点开始#xff0c;从库的复制延迟持续增加#xff0c;且没有下降的趋势。数据库版本#xff1a;8.0.40#xff0c;事务隔离级别 RC#xff08;Read Committed#xff09;#xff0c;并行重放线程数#xff08;replica_parallel_workers#xff09;为 8。…现象从某个时间点开始从库的复制延迟持续增加且没有下降的趋势。数据库版本8.0.40事务隔离级别 RCRead Committed并行重放线程数replica_parallel_workers为 8。分析过程通过show slave status\G查看发现Relay_Master_Log_File和Exec_Master_Log_Pos都在变化只不过变化得比较慢。刚开始怀疑是主库写入量较大导致的后来通过mysql-binlog-time-extractor具体用法可参考分享一个 MySQL binlog 分析小工具分析发现主库的写入量在刚开始出现延迟时2025-09-01 09:30并不大反倒是写入量大的时间段2025-09-01 04:57:53 - 2025-09-01 05:02:42没有出现延迟。-------------------------------------------------------------------------------------------------------------------------------------| Log_name | File_size | Start_time | End_time | Duration | GTID |-------------------------------------------------------------------------------------------------------------------------------------| binary-log.005565 | 1302499830 (1.21 GB) | 2025-09-01 04:57:53 | 2025-09-01 04:58:22 | 00:00:29 | 1284696693-1284699126 || binary-log.005566 | 1105002721 (1.03 GB) | 2025-09-01 04:58:22 | 2025-09-01 04:58:23 | 00:00:01 | 1284699127-1284699312 || binary-log.005567 | 1273545902 (1.19 GB) | 2025-09-01 04:58:23 | 2025-09-01 05:02:33 | 00:04:10 | 1284699313-1284728539 || binary-log.005568 | 1287820910 (1.20 GB) | 2025-09-01 05:02:33 | 2025-09-01 05:02:42 | 00:00:09 | 1284728540-1284729282 |...| binary-log.005633 | 58514304 (55.80 MB) | 2025-09-01 09:12:53 | 2025-09-01 09:17:53 | 00:05:00 | 1286735216-1286786118 || binary-log.005634 | 58955596 (56.22 MB) | 2025-09-01 09:17:53 | 2025-09-01 09:22:53 | 00:05:00 | 1286786119-1286834568 || binary-log.005635 | 71508778 (68.20 MB) | 2025-09-01 09:22:53 | 2025-09-01 09:27:53 | 00:05:00 | 1286834569-1286880281 || binary-log.005636 | 107107179 (102.15 MB) | 2025-09-01 09:27:53 | 2025-09-01 09:32:53 | 00:05:00 | 1286880282-1286942223 || binary-log.005637 | 530205055 (505.64 MB) | 2025-09-01 09:32:53 | 2025-09-01 09:37:53 | 00:05:00 | 1286942224-1287246612 || binary-log.005638 | 546754562 (521.43 MB) | 2025-09-01 09:37:53 | 2025-09-01 09:42:53 | 00:05:00 | 1287246613-1287562930 || binary-log.005639 | 528677634 (504.19 MB) | 2025-09-01 09:42:53 | 2025-09-01 09:47:53 | 00:05:00 | 1287562931-1287868985 |-------------------------------------------------------------------------------------------------------------------------------------查看该实例的错误日志发现有大量的锁等待超时报错。需要注意的是这个实例的事务隔离级别是 RC。在该级别下MySQL 通常只会加记录锁。此外该实例启用了 WRITESET 并行复制MySQL 会根据事务修改的主键或唯一索引来判断是否可并行执行。换句话说如果两个事务在主键或唯一索引上存在冲突它们将无法并行重放。理论上在这种机制组合下从库在重放过程不应发生锁等待超时。随后使用binlog_summary.py具体用法可参考Binlog分析利器-binlog_summary.py对延迟开始时段的四个 binlog 文件 binary-log.005636 ~ binary-log.005639 进行了分析发现这些 binlog 的操作模式十分相似操作次数排名前两位的均为同一张表biz_schema.tbl_product_service_mapping01的 DELETE 与 INSERT 操作。# python3 binlog_summary.py -f binary-log.005636.txt -c opr --newTABLE_NAME DML_TYPE NUMSbiz_schema.tbl_product_service_mapping01 INSERT 71271biz_schema.tbl_product_service_mapping01 DELETE 67434...写了个简单的脚本测试了下发现对于相同的唯一索引值INSERT操作总是出现在对应的DELETE操作之后于是写了个脚本将 DELETE操作涉及的记录提取出来并插入到测试库中然后将相关 binlog 当作 relay log 进行重放。为了排除其它表的干扰在重放时设置了replicate-do-table biz_schema.tbl_product_service_mapping01只重放这一张表。下面是具体的重放步骤1. 初始化 relay logCHANGE MASTER TO MASTER_HOSTdummy;STOP SLAVE;RESET SLAVE ALL;执行上述命令后MySQL 会在当前数据目录下生成两个文件instance-20250903-0701-relay-bin.000001第一个 relay log 文件instance-20250903-0701-relay-bin.indexrelay log 索引文件其中instance-20250903-0701 是主机名。2. 替换掉 relay log用 binary-log.005636 替换掉 instance-20250903-0701-relay-bin.000001并修改该文件的属主。# cp binary-log.005636 /data/mysql/3306/data/instance-20250903-0701-relay-bin.000001# chown mysql.mysql /data/mysql/3306/data/instance-20250903-0701-relay-bin.0000013. 启动 SQL 线程进行重放CHANGE MASTER TO RELAY_LOG_FILEinstance-20250903-0701-relay-bin.000001, RELAY_LOG_POS1, MASTER_HOSTdummy;START SLAVE SQL_THREAD;结果发现能成功重放且重放过程中未出现任何报错。测试了三次重放时间分别为 362.74s、352.69s、361.75s平均耗时 359.06 秒。每次重放过程中错误日志中都出现了多次锁等待超时错误。2025-09-21T07:53:45.257279-00:00 260 [Warning] [MY-010584] [Repl] Slave SQL for channel : Worker 5 failed executing transaction 9206ff59-2d95-4a02-88cf-04d97adfdd65:1286917678 at master log , end_log_pos 63251784; Could not execute Write_rows event on table biz_schema.tbl_product_service_mapping01; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the events master log FIRST, end_log_pos 63251784, Error_code: MY-001205很显然锁等待超时是并行重放导致的。如果是单线程重放就能规避这个问题于是将replica_parallel_workers设置为 1重新执行相同的测试三次重放时间分别为 82.39s、83.40s、83.43s平均仅 83.07 秒。想不到单线程重放竟然比多线程快了四倍多。接下来重点分析下锁等待超时问题。为什么会出现锁等待以下是出现锁等待时sys.innodb_lock_waits的输出mysql select * from sys.innodb_lock_waits\G*************************** 1.row ***************************wait_started: 2025-10-1213:11:29wait_age: 08:00:33wait_age_secs: 28833locked_table: biz_schema.tbl_product_service_mapping01locked_table_schema: biz_schemalocked_table_name: tbl_product_service_mapping01locked_table_partition: NULLlocked_table_subpartition: NULLlocked_index: tbl_product_service_pklocked_type: RECORDwaiting_trx_id: 5221288waiting_trx_started: 2025-10-1213:11:22waiting_trx_age: 08:00:40waiting_trx_rows_locked: 35waiting_trx_rows_modified: 34waiting_pid: 10waiting_query: INSERT IGNORE INTO tbl_product ... (10512475, 1073743289) ,waiting_lock_id: 140256432120808:10011:67:240:263:140256317861168waiting_lock_mode: X,GAP,INSERT_INTENTIONblocking_trx_id: 5221291blocking_pid: 14blocking_query: INSERT IGNORE INTO tbl_product ... (10512476, 1073743289)blocking_lock_id: 140256432125848:9282:67:240:263:140256317891856blocking_lock_mode: S,GAPblocking_trx_started: 2025-10-1213:11:22blocking_trx_age: 08:00:40blocking_trx_rows_locked: 35blocking_trx_rows_modified: 34sql_kill_blocking_query: KILL QUERY 14sql_kill_blocking_connection: KILL 14*************************** 2.row ***************************wait_started: 2025-10-1213:11:29wait_age: 08:00:33wait_age_secs: 28833locked_table: biz_schema.tbl_product_service_mapping01locked_table_schema: biz_schemalocked_table_name: tbl_product_service_mapping01locked_table_partition: NULLlocked_table_subpartition: NULLlocked_index: tbl_product_service_pklocked_type: RECORDwaiting_trx_id: 5221291waiting_trx_started: 2025-10-1213:11:22waiting_trx_age: 08:00:40waiting_trx_rows_locked: 35waiting_trx_rows_modified: 34waiting_pid: 14waiting_query: INSERT IGNORE INTO tbl_product ... (10512476, 1073743289)waiting_lock_id: 140256432125848:9282:67:240:260:140256317892560waiting_lock_mode: X,GAP,INSERT_INTENTIONblocking_trx_id: 5221289blocking_pid: 12blocking_query: NULLblocking_lock_id: 140256432123832:9816:67:240:260:140256317879376blocking_lock_mode: S,GAPblocking_trx_started: 2025-10-1213:11:22blocking_trx_age: 08:00:40blocking_trx_rows_locked: 34blocking_trx_rows_modified: 33sql_kill_blocking_query: KILL QUERY 12sql_kill_blocking_connection: KILL 122rowsinset (0.01 sec)可以看出PID 10 的 INSERT 操作被 PID 14 持有的 S,GAP 锁阻塞。PID 14的 INSERT 操作又被 PID 12 持有的 S,GAP 锁阻塞。使用performance_schema.data_locks可以获取更详细的锁信息包括被锁定的数据行SELECTw.REQUESTING_ENGINE_TRANSACTION_ID AS waiting_trx_id,w.BLOCKING_ENGINE_TRANSACTION_ID AS blocking_trx_id,l1.LOCK_MODE AS waiting_lock_mode,l1.LOCK_DATA AS waiting_lock_data,l2.LOCK_MODE AS blocking_lock_mode,l2.LOCK_DATA AS blocking_lock_dataFROM performance_schema.data_lock_waits AS wJOIN performance_schema.data_locks AS l1ON w.REQUESTING_ENGINE_LOCK_ID l1.ENGINE_LOCK_IDJOIN performance_schema.data_locks AS l2ON w.BLOCKING_ENGINE_LOCK_ID l2.ENGINE_LOCK_ID;---------------------------------------------------------------------------------------------------------------------------------| waiting_trx_id | blocking_trx_id | waiting_lock_mode | waiting_lock_data | blocking_lock_mode | blocking_lock_data |---------------------------------------------------------------------------------------------------------------------------------| 5221288 | 5221291 | X,GAP,INSERT_INTENTION | 10512476, 1, 18158557178 | S,GAP | 10512476, 1, 18158557178 || 5221291 | 5221289 | X,GAP,INSERT_INTENTION | 10512477, 1, 18158557146 | S,GAP | 10512477, 1, 18158557146 |---------------------------------------------------------------------------------------------------------------------------------2 rows in set (0.00 sec)接下来从show processlist的输出中看看 PID 10、14、12 这三个线程的状态。mysql show processlist;----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| Id | User | Host | db | Command | Time | State | Info |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| 5 | event_scheduler | localhost | NULL | Daemon | 1813 | Waiting on empty queue | NULL || 8 | root | localhost | biz_schema | Query | 0 | init | show processlist || 9 | system user | | NULL | Query | 1588 | Waiting for dependent transaction to commit | NULL || 10 | system user | | biz_schema | Query | 3585949 | Applying batch of row changes (write) | INSERT IGNORE INTO tbl_product_service_mapping01(c1,c2 || 11 | system user | | NULL | Query | 3585949 | Waiting for preceding transaction to commit | NULL || 12 | system user | | NULL | Query | 3585949 | Waiting for preceding transaction to commit | NULL || 13 | system user | | NULL | Query | 3585949 | Waiting for preceding transaction to commit | NULL || 14 | system user | | biz_schema | Query | 3585949 | Applying batch of row changes (write) | INSERT IGNORE INTO tbl_product_service_mapping01(c1,c2 || 15 | system user | | NULL | Query | 3585949 | Waiting for preceding transaction to commit | NULL || 16 | system user | | NULL | Query | 3585949 | Waiting for an event from Coordinator | NULL || 17 | system user | | NULL | Query | 3585949 | Waiting for an event from Coordinator | NULL |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------11rowsinset, 1warning (0.00 sec)PID 12 的执行用户是system user说明它是并行重放的工作线程其状态为Waiting for preceding transaction to commit表示该线程正在等待它前面的事务提交完成。而 PID 10 和 PID 14 的状态均为Applying batch of row changes (write)。从字面上看似乎是在执行批量写入操作但实际上这两个线程正在等待锁。如果执行的是SHOW FULL PROCESSLISTInfo列的INSERT IGNORE操作中还可以看到具体要插入的唯一索引值。借助这些唯一索引值可以在 binlog 中精确定位对应的执行位置便于分析事务执行顺序和锁等待情况。不过对于 PID 12由于Info列为NULL无法直接看到具体的 DML 操作因此难以定位其执行内容。为了解决这个问题我在Slave_worker::slave_worker_exec_event函数中在调用ev-do_apply_event_worker(this)的前后分别添加了日志打印。这样就能清楚地看到每个工作线程正在执行的 event 的 binlog 位置点信息。int Slave_worker::slave_worker_exec_event(Log_event *ev) {...ulong thread_id thd-thread_id();ulong log_pos static_castulong(ev-common_header-log_pos);std::string msg Executing event: worker_thread_id std::to_string(thread_id) , master_log_pos std::to_string(log_pos);LogErr(INFORMATION_LEVEL, ER_CONDITIONAL_DEBUG, msg.c_str());ret ev-do_apply_event_worker(this);msg Done executing event: worker_thread_id std::to_string(thread_id) , master_log_pos std::to_string(log_pos);LogErr(INFORMATION_LEVEL, ER_CONDITIONAL_DEBUG, msg.c_str());return ret;}下面是锁等待发生时PID 10、12、14 正在执行的 binlog event 位置点信息# PID 10grep worker_thread_id10 /data/mysql/3306/data/mysqld.err | tail -12025-10-12T13:11:22.639120-00:00 10 [Note] [MY-013935] [Repl] Executing event: worker_thread_id10, master_log_pos63245428# PID 12grep worker_thread_id12 /data/mysql/3306/data/mysqld.err | tail -12025-10-12T13:11:22.725638-00:00 12 [Note] [MY-013935] [Repl] Executing event: worker_thread_id12, master_log_pos63248672# PID 14grep worker_thread_id14 /data/mysql/3306/data/mysqld.err | tail -12025-10-12T13:11:22.646870-00:00 14 [Note] [MY-013935] [Repl] Executing event: worker_thread_id14, master_log_pos63251784可以看到PID 10 对应的事务在 binlog 中的位置早于 PID 12。当参数replica_preserve_commit_order设置为 ON 时从库必须严格按照主库的提交顺序依次提交事务因此 PID 12 必须等待 PID 10 提交完成才能继续执行。结合锁依赖关系就形成了一个循环等待的局面PID 10 等待 PID 14 持有的 S,GAP 锁PID 14 等待 PID 12 持有的 S,GAP 锁PID 12 因提交顺序限制必须等待 PID 10 提交事务。最终这种环路导致三个线程相互阻塞直到锁等待超时MySQL 才会重新执行这些事务。模拟从库的重放操作根据获取到的 PID 10、12、14 对应的 event 位置点信息我们可以还原出锁等待发生时这三个线程正在执行的具体操作worker_thread_id10 master_log_pos63245428: insert c1 10512475 的所有记录如(10512475,1),(10512475,20)...worker_thread_id12 master_log_pos63248672: insert c1 10512477 的所有记录如(10512477,1),(10512477,20)...worker_thread_id14 master_log_pos63251784: insert c1 10512476 的所有记录如(10512476,1),(10512476,20)...这里的记录值对应的是表的联合唯一索引其中c1是联合索引的第一列。值得注意的是在这些INSERT操作之前binlog 中还存在针对相同c1值的DELETE操作delete c1 10512475 的所有记录如(10512475,1),(10512475,20)...delete c1 10512477 的所有记录如(10512477,1),(10512477,20)...delete c1 10512476 的所有记录如(10512476,1),(10512476,20)...也就是说业务实际上是通过 DELETE INSERT 的方式实现数据更新。为了进一步分析锁等待问题我打印了重放过程中每个INSERT操作的具体内容。2025-10-12T13:11:22.639267-00:00 10 [Note] [MY-013935] [Repl] Inserted row: (18158557112, 10512475, 1, ...)2025-10-12T13:11:22.640729-00:00 10 [Note] [MY-013935] [Repl] Inserted row: (18158557113, 10512475, 20, ...)2025-10-12T13:11:22.642004-00:00 10 [Note] [MY-013935] [Repl] Inserted row: (18158557114, 10512475, 26, ...)2025-10-12T13:11:22.643344-00:00 10 [Note] [MY-013935] [Repl] Inserted row: (18158557115, 10512475, 123, ...)2025-10-12T13:11:22.644262-00:00 12 [Note] [MY-013935] [Repl] Inserted row: (18158557146, 10512477, 1, ...)2025-10-12T13:11:22.644663-00:00 10 [Note] [MY-013935] [Repl] Inserted row: (18158557116, 10512475, 131, ...)2025-10-12T13:11:22.646250-00:00 12 [Note] [MY-013935] [Repl] Inserted row: (18158557147, 10512477, 20, ...)2025-10-12T13:11:22.647020-00:00 14 [Note] [MY-013935] [Repl] Inserted row: (18158557178, 10512476, 1, ...)2025-10-12T13:11:22.647192-00:00 10 [Note] [MY-013935] [Repl] Inserted row: (18158557117, 10512475, 133, ...)其中第一个值是自增主键后两个值是唯一索引列。从输出可以看到这三个事务的插入操作是交叉执行的。模拟从库重放过程下面通过一个实验来模拟从库的重放操作。首先在会话 1 中创建测试表并插入数据。session1 create table test.t1(id bigint auto_increment primary key,c1 int,c2 int,unique key(c1,c2));Query OK, 0 rows affected (0.06 sec)session1 insert into test.t1(c1,c2) values(10512475, 1),(10512475, 2),(10512476, 1),(10512476, 2),(10512477, 1),(10512477, 2);Query OK, 6 rows affected (0.04 sec)Records: 6 Duplicates: 0 Warnings: 0session1 select * from test.t1;--------------------| id | c1 | c2 |--------------------| 1 | 10512475 | 1 || 2 | 10512475 | 2 || 3 | 10512476 | 1 || 4 | 10512476 | 2 || 5 | 10512477 | 1 || 6 | 10512477 | 2 |--------------------6 rows in set (0.00 sec)其次在会话 2 中针对另外一张表执行FLUSH TABLES FOR EXPORT操作至于为什么要执行这个操作后续加锁分析部分会解释。session2 flush tables test.t2 for export;Query OK, 0 rows affected (0.01 sec)接着在会话 1 中删除表中数据。session1 delete from test.t1;Query OK, 6 rows affected (0.02 sec)接着分别创建三个新的会话执行如下操作session3 begin;Query OK, 0 rows affected (0.00 sec)session3 insert into test.t1(c1,c2,id) values(10512475,1,100);Query OK, 1 row affected (0.01 sec)session4 begin;Query OK, 0 rows affected (0.00 sec)session4 insert into test.t1(c1,c2,id) values(10512476,1,18158557178);Query OK, 1 row affected (0.00 sec)session5 begin;Query OK, 0 rows affected (0.00 sec)session5 insert into test.t1(c1,c2,id) values(10512477,1,18158557146);Query OK, 1 row affected (0.01 sec)继续在会话 3 和 会话 4 中插入数据。session3 set session innodb_lock_wait_timeout5000;Query OK, 0 rows affected (0.00 sec)session3 insert into test.t1(c1,c2) values(10512475, 2);-- 阻塞中...session4 set session innodb_lock_wait_timeout5000;Query OK, 0 rows affected (0.00 sec)session4 insert into test.t1(c1,c2) values(10512476, 2);-- 阻塞中...接着在会话 2 中执行UNLOCK TABLES操作释放表锁。session2 unlock tables;Query OK, 0 rows affected (0.00 sec)在会话 5 中查看锁等待信息。---------------------------------------------------------------------------------------------------------------------------------| waiting_trx_id | blocking_trx_id | waiting_lock_mode | waiting_lock_data | blocking_lock_mode | blocking_lock_data |---------------------------------------------------------------------------------------------------------------------------------| 23228 | 23229 | X,GAP,INSERT_INTENTION | 10512477, 1, 18158557146 | S,GAP | 10512477, 1, 18158557146 || 23225 | 23228 | X,GAP,INSERT_INTENTION | 10512476, 1, 18158557178 | S,GAP | 10512476, 1, 18158557178 |---------------------------------------------------------------------------------------------------------------------------------2 rows in set (0.01 sec)可以看到该结果与重放过程中出现锁等待时的输出完全一致。加锁分析接下来我们重点分析一下为什么在 RCRead Committed事务隔离级别下会产生 GAP 锁毕竟在大多数人的印象中RC 隔离级别下只会存在记录锁而不会出现间隙锁。事实上这与 INSERT 操作之前执行的 DELETE 操作 有直接关系。在前面的例子中我们在 binlog 中发现在执行 INSERT 操作之前存在针对相同记录的 DELETE 操作。在 MySQL 中DELETE 操作并不会立即物理删除数据而是将记录标记为“已删除”delete-marked等待后台的 purge 线程异步清理。这意味着在逻辑删除之后这些记录仍然可能暂时保留在索引页中。当随后执行 INSERT 操作时如果待插入的记录在唯一索引上与某条“已标记删除但尚未清除”的记录键值相同MySQL 会执行如下加锁行为对该索引项加上 S 锁同时对该索引项的间隙即该记录与下一条记录之间的范围加上 S,GAP 锁。下面我们通过一个简化的实验来验证这一点。实验验证在前面的重放示例中我们执行了FLUSH TABLES FOR EXPORT操作。执行这个操作的目的是为了暂停 purge 线程从而保留 delete-marked 记录便于重现这种锁行为。# 会话 1创建测试表并插入数据session1 create table test.t1(id bigint auto_increment primary key,c1 int,c2 int,unique key(c1,c2));Query OK, 0 rows affected (0.07 sec)session1 insert into test.t1(c1,c2) values(10512476, 1),(10512476, 2);Query OK, 2 rows affected (0.04 sec)Records: 2 Duplicates: 0 Warnings: 0session1 select * from test.t1;--------------------| id | c1 | c2 |--------------------| 1 | 10512476 | 1 || 2 | 10512476 | 2 |--------------------2 rows in set (0.00 sec)# 会话 2暂停 purge 线程session2 flush tables test.t2 for export;Query OK, 0 rows affected (0.03 sec)# 会话 1删除数据session1 delete from test.t1;Query OK, 2 rows affected (0.01 sec)# 会话 3开启事务并插入数据session3 begin;Query OK, 0 rows affected (0.00 sec)session3 insert into test.t1(c1,c2,id) values(10512476,1,18158557178);Query OK, 1 row affected (0.01 sec)# 查看锁信息session3 select object_schema, object_name, index_name, lock_type, lock_mode, lock_status, lock_data from performance_schema.data_locks;-----------------------------------------------------------------------------------------------------| object_schema | object_name | index_name | lock_type | lock_mode | lock_status | lock_data |-----------------------------------------------------------------------------------------------------| test | t1 | NULL | TABLE | IX | GRANTED | NULL || test | t1 | c1 | RECORD | S | GRANTED | 10512476, 1, 1 || test | t1 | c1 | RECORD | S,GAP | GRANTED | 10512476, 2, 2 || test | t1 | c1 | RECORD | S,GAP | GRANTED | 10512476, 1, 18158557178 |-----------------------------------------------------------------------------------------------------4 rows in set (0.00 sec)锁行为解释在插入 (10512476, 1, 18158557178) 这条记录时由于 (10512476, 1) 仍存在于索引中虽然被标记删除MySQL 会对该记录加上 S 锁同时对(10512476, 1) 的下一条记录 (10512476, 2) 加上 S,GAP 锁防止该间隙范围内被其他事务插入新记录此外插入的新记录 (10512476, 1, 18158557178) 还会继承下一条记录 (10512476, 2) 的 GAP 锁。其中前两种锁的加锁逻辑是在row_ins_scan_sec_index_for_duplicate()中实现的锁继承的逻辑是在lock_rec_add_to_queue()中实现的。当执行UNLOCK TABLES后purge 线程恢复运行会清理掉之前的 delete-marked 记录对应的锁也会被释放。但可以看到新插入记录自身的 GAP 锁仍然保留session2 unlock tables;Query OK, 0 rows affected (0.05 sec)session3 select object_schema, object_name, index_name, lock_type, lock_mode, lock_status, lock_data from performance_schema.data_locks;-----------------------------------------------------------------------------------------------------| object_schema | object_name | index_name | lock_type | lock_mode | lock_status | lock_data |-----------------------------------------------------------------------------------------------------| test | t1 | NULL | TABLE | IX | GRANTED | NULL || test | t1 | c1 | RECORD | S,GAP | GRANTED | 10512476, 1, 18158557178 |-----------------------------------------------------------------------------------------------------2 rows in set (0.01 sec)优化方案针对上面分析的锁等待案例优化主要可以从应用侧和数据库侧两方面入手。1. 应用侧优化应用层面的改进主要集中在索引设计与更新逻辑上将原本的唯一索引改为普通二级索引。将自增主键去掉直接用原来的唯一索引列作为主键。与普通唯一索引不同主键在插入时即使遇到已经删除的记录也不会额外加 S,GAP 锁。优化更新逻辑。尽量避免通过DELETE INSERT的方式更新数据可以考虑使用UPDATE或者其他业务逻辑调整以减少对间隙锁的触发。2. 数据库侧优化将replica_preserve_commit_order设置为 OFF允许从库在遇到事务等待环路时独立提交事务而无需等待其他事务完成。不过需要注意的是如果使用的 Group Replication会要求该参数必须为 ON。下表展示了不同方案下的从库重放性能对比方案 三次平均执行时间秒唯一索引 replica_parallel_workers 8 359.06唯一索引 replica_parallel_workers 1 83.07普通索引 replica_parallel_workers 8 33.50唯一索引 replica_parallel_workers 8 replica_preserve_commit_order OFF 21.11