InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links 异常处理_综合

##问题描述

研发兄弟反馈说，他们的数据库中有一些表不能正常查询。

开发同学在线cats 操作复制测试表，导致linux 服务器目录空间满，然后他们自己强制重启了mysql。。。。。。

mysql-lixora > select * from test_part;
ERROR 2013 (HY000): Lost connection to MySQL server during query

##故障日志：

2021-12-01T01:28:14.280281Z 8 [ERROR] [MY-013051] [InnoDB] In pages [page id: space=48, page number=161767] and [page id: space=48, page number=936] of index `idx_time_code` of table `test`.`em_data_indicator_copy1`
InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
2021-12-01T01:28:18.572400Z 8 [ERROR] [MY-013051] [InnoDB] In pages [page id: space=48, page number=160379] and [page id: space=48, page number=947] of index `idx_time_code` of table `test`.`em_data_indicator_copy1`
InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links
2021-12-01T01:28:18.572468Z 8 [ERROR] [MY-013051] [InnoDB] In pages [page id: space=48, page number=160379] and [page id: space=48, page number=947] of index `idx_time_code` of table `test`.`em_data_indicator_copy1`
InnoDB: records in wrong order on adjacent pages
InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 6; hex 323032313035; asc 202105;;
1: len 30; hex 64373134386566642d313364372d343264612d393632622d346363303165; asc d7148efd-13d7-42da-962b-4cc01e; (total 36 bytes);

InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 6; hex 323032313035; asc 202105;;
1: len 30; hex 64363038623761622d333631632d343164342d396331652d336233333061; asc d608b7ab-361c-41d4-9c1e-3b330a; (total 36 bytes);

2021-12-01T01:28:18.579572Z 8 [ERROR] [MY-011853] [InnoDB] Corruption of an index tree: table `test`.`em_data_indicator_copy1` index `idx_time_code`, father ptr page no 160379, child page no 947
PHYSICAL RECORD: n_fields 2; compact format; info bits 0
0: len 6; hex 323032313035; asc 202105;;
1: len 30; hex 64363038623761622d333631632d343164342d396331652d336233333061; asc d608b7ab-361c-41d4-9c1e-3b330a; (total 36 bytes);
PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 6; hex 323032313035; asc 202105;;
1: len 30; hex 64363038623761622d333631632d343164342d396331652d336233333061; asc d608b7ab-361c-41d4-9c1e-3b330a; (total 36 bytes);
2: len 4; hex 0002727b; asc r{;;
2021-12-01T01:28:18.579799Z 8 [ERROR] [MY-011854] [InnoDB] [FATAL] You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html for information about forcing recovery. Then dump + drop + reimport.
2021-12-01T01:28:18.582323Z 8 [ERROR] [MY-013183] [InnoDB] Assertion failure: ut0ut.cc:552 thread 140383893661440
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
01:28:18 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7fad84008410
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fadac1d1d90 thread_stack 0x46000
/usr/libexec/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x5610680bbdf1]
/usr/libexec/mysqld(handle_fatal_signal+0x32b) [0x561067138cdb]
/lib64/libpthread.so.0(+0x12b20) [0x7fadc3a0bb20]
/lib64/libc.so.6(gsignal+0x10f) [0x7fadc0e797ff]
/lib64/libc.so.6(abort+0x127) [0x7fadc0e63c35]
/usr/libexec/mysqld(+0xe675f6) [0x561066e9b5f6]
/usr/libexec/mysqld(ib::fatal::~fatal()+0xd8) [0x5610683a7978]
/usr/libexec/mysqld(+0x23a58cb) [0x5610683d98cb]
/usr/libexec/mysqld(+0x23a6dfe) [0x5610683dadfe]
/usr/libexec/mysqld(btr_validate_index(dict_index_t*, trx_t const*, bool)+0x3fd) [0x5610683db56d]
/usr/libexec/mysqld(ha_innobase::check(THD*, HA_CHECK_OPT*)+0x3d6) [0x56106819f8d6]
/usr/libexec/mysqld(handler::ha_check(THD*, HA_CHECK_OPT*)+0x15b) [0x56106724cfcb]
/usr/libexec/mysqld(+0x141623a) [0x56106744a23a]
/usr/libexec/mysqld(Sql_cmd_check_table::execute(THD*)+0x9d) [0x56106744b19d]
/usr/libexec/mysqld(mysql_execute_command(THD*, bool)+0x20bd) [0x5610670003dd]
/usr/libexec/mysqld(mysql_parse(THD*, Parser_state*)+0x384) [0x561067003cc4]
/usr/libexec/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x162b) [0x56106700577b]
/usr/libexec/mysqld(do_command(THD*)+0x1a4) [0x561067006d84]
/usr/libexec/mysqld(+0x10f5ff0) [0x561067129ff0]
/usr/libexec/mysqld(+0x25d6cb8) [0x56106860acb8]
/lib64/libpthread.so.0(+0x814a) [0x7fadc3a0114a]
/lib64/libc.so.6(clone+0x43) [0x7fadc0f3ef23]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fad84e3ac38): is an invalid pointer
Connection ID (thread ID): 8
Status: NOT_KILLED

#故障分析：

InnoDB: broken FIL_PAGE_NEXT or FIL_PAGE_PREV links

看到innodb 数据页双向链表出现问题(比如FILE_PAGE_NEXT FIL_PAGE_PREV）

根据报错显示：In pages [page id: space=48, page number=161767] and [page id: space=48, page number=936] of index `idx_time_code` of table `test`.`em_data_indicator_copy1，异常page 为辅助索引的page

以page 936，161767为例

current page:00000936 00027277 0000bd3c ---》160375，48444
current page:00161767 00027277 000003a8 ---》160375，936

发现page 936，161767的PAGE_PREV 都指向了page 160375，所以mysql 报了数据页双向链表出现问题;

我们看下page 160375 这个pege 中记录的PAGE_PREV，PAGE_NEXT是什么，那么我们是不是可以知道到底是那个page 的PAGE_PREV出了问题？

[root@lixora-home ~]#innodb-pageed -c page.cnf

PAGEED: Release 8.0.20.0.0 - LimitedProduction on Thu Dec 01 18:51:47 2021

Copyright (c) 2020, 2099, lixora. All rights reserved.

************* !!! For MySQL Internal Useonly !!! ***************

PPED>list

file1: em_data_indicator_copy1.ibd

PPED>file em_data_indicator_copy1.ibd

current file:em_data_indicator_copy1.ibd

PPED>page 160375

current page:00160375
PPED>p FIL_Header

............

current page:00160375--Offset:00008--PREV:160350--NEXT:161767

............

我们发现page 160375的下一个page 指针为161767，而page161767 的前一个block为160375，

即page 160375是正常的，而page 936为根本异常page ；

初步可以看出是mysql 对page 936 数据更新时出了问题，问题的根本原因我猜测是因为，cats 复制表时造成服务器空间满，这时在对page 936的部分写丢失了，其他的page 异常估计也是这个问题。（这个还需要待验证）

#处理方式:

方式1：

可以看到损坏的的一些page 都是在普通辅助索引上，把他们删除，重建即可；

show create table em_data_indicator_copy1;

drop index idx_time_code on table em_data_indicator_copy1;

.......

方式2：

使用pped 修改page N PAGE_PREV 值；

适用场景异常page 在主键上的，或者无主键表或者短时间内强制拉（损坏的表为大表）数据库；

略；