Which index is used when counting all rows in a table? Well, the MySQL documentation gives a straightforward answer to this. Quoting:
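InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index.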
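The documented rule above can be modelled as a small decision function. This is a hypothetical sketch, not the optimizer's actual code: the `Index` type, its `size_pages` field and `choose_count_index` are illustrative names, and `size_pages` is only a stand-in for whatever size or cost estimate the optimizer really relies on.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Index:
    # Hypothetical description of one InnoDB index, for illustration only.
    name: str
    is_clustered: bool
    size_pages: int  # stand-in for the optimizer's real size/cost estimate


def choose_count_index(indexes: Sequence[Index], hint: Optional[str] = None) -> str:
    """Model of the documented SELECT COUNT(*) index choice in InnoDB."""
    names = {ix.name for ix in indexes}
    if hint is not None:
        # An index or optimizer hint overrides the default choice.
        if hint not in names:
            raise ValueError(f"no such index: {hint}")
        return hint
    secondaries = [ix for ix in indexes if not ix.is_clustered]
    if secondaries:
        # Default: traverse the smallest available secondary index.
        return min(secondaries, key=lambda ix: ix.size_pages).name
    clustered = [ix for ix in indexes if ix.is_clustered]
    if clustered:
        # No secondary index: scan the clustered index.
        return clustered[0].name
    # No index defined at all: InnoDB clusters on its hidden GEN_CLUST_INDEX.
    return "GEN_CLUST_INDEX"


# Illustrative sizes only, not measurements.
sbtest1 = [Index("PRIMARY", True, 14000), Index("k_1", False, 800)]
print(choose_count_index(sbtest1))                  # k_1
print(choose_count_index(sbtest1, hint="PRIMARY"))  # PRIMARY
print(choose_count_index([]))                       # GEN_CLUST_INDEX
```

The `is_clustered` flag is kept explicit because, without a primary key, InnoDB may cluster on a suitable unique index instead, so "clustered" does not always mean "PRIMARY".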
Using a simple sysbench table as an example, let's first check the query plan:
mysql > explain select count(*) from sbtest1;
+----+-------------+---------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | sbtest1 | NULL | index | NULL | k_1 | 4 | NULL | 986400 | 100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
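As expected, the query optimizer chose the secondary index and plans a full scan of it (type: index). Running the query should increment the handler counters accordingly, and we would expect the slow log to report the same: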
mysql > flush status;
Query OK, 0 rows affected (0.00 sec)
mysql > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.17 sec)
mysql > show status like 'Handler_read%';
+-----------------------+---------+
| Variable_name | Value |
+-----------------------+---------+
| Handler_read_first | 1 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 1000000 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+---------+
7 rows in set (0.01 sec)
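To compare such Handler snapshots programmatically, for example in a small test harness, the mysql client's table output can be parsed. A minimal sketch: `parse_status_table` is a hypothetical helper, and a real script would more likely fetch these rows through a MySQL driver than parse printed text.

```python
from typing import Dict

# Output of SHOW STATUS LIKE 'Handler_read%' copied from the session above.
SAMPLE = """
+-----------------------+---------+
| Variable_name | Value |
+-----------------------+---------+
| Handler_read_first | 1 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 1000000 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+---------+
"""


def parse_status_table(text: str) -> Dict[str, int]:
    """Turn mysql-client table output of SHOW STATUS into {name: value}."""
    values: Dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Keep only data rows such as "| Handler_read_next | 1000000 |".
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue  # skip the header row
        values[cells[0]] = int(cells[1])
    return values


counters = parse_status_table(SAMPLE)
# Full index scan: one read of the first index entry, then reads in key order.
print(counters["Handler_read_first"], counters["Handler_read_next"])  # 1 1000000
```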
The first surprise: after running the query above, the slow log reports no rows examined at all (Rows_examined: 0):
# Time: 2025-02-07T10:14:28.548037Z
# User@Host: msandbox[msandbox] @ localhost [] Id: 9
# Schema: db1 Last_errno: 0 Killed: 0
# Query_time: 0.173279 Lock_time: 0.000006 Rows_sent: 1 Rows_examined: 0 Rows_affected: 0 Bytes_sent: 62
use db1;
SET timestamp=1736696128;
select count(*) from sbtest1;
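When auditing a larger slow log, the same observation can be checked mechanically. A minimal sketch, assuming the header layout shown above (the exact set of fields varies between MySQL and Percona Server versions and logging settings); `parse_query_time_line` is a hypothetical helper.

```python
import re
from typing import Dict

# Slow log entry copied from above.
ENTRY = """\
# Time: 2025-02-07T10:14:28.548037Z
# User@Host: msandbox[msandbox] @ localhost [] Id: 9
# Schema: db1 Last_errno: 0 Killed: 0
# Query_time: 0.173279 Lock_time: 0.000006 Rows_sent: 1 Rows_examined: 0 Rows_affected: 0 Bytes_sent: 62
use db1;
SET timestamp=1736696128;
select count(*) from sbtest1;
"""

# "Name: value" pairs, tolerating one or more spaces after the colon.
FIELD_RE = re.compile(r"(\w+):\s+(\S+)")


def parse_query_time_line(entry: str) -> Dict[str, str]:
    """Extract the fields of the '# Query_time:' header line of one entry."""
    fields: Dict[str, str] = {}
    for line in entry.splitlines():
        if line.startswith("# Query_time:"):
            fields.update(FIELD_RE.findall(line))
    return fields


stats = parse_query_time_line(ENTRY)
print(stats["Rows_sent"], stats["Rows_examined"])  # 1 0
```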
What if no secondary key is available? The second example table has only a primary key defined:
mysql > desc sbtest2;
+-------+-----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| k | int | NO | | 0 | |
| c | char(120) | NO | | | |
| pad | char(60) | NO | | | |
+-------+-----------+------+-----+---------+----------------+
4 rows in set (0.01 sec)
mysql > explain select count(*) from sbtest2;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| 1 | SIMPLE | sbtest2 | NULL | index | NULL | PRIMARY | 4 | NULL | 986400 | 100.00 | Using index |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Here comes the second surprise: the handler counters do not change at all!
mysql > flush status;
Query OK, 0 rows affected (0.00 sec)
mysql > select count(*) from sbtest2;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.04 sec)
mysql > show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 0 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
7 rows in set (0.00 sec)
Would the same query cost anything different on yet another table, this time one without any index defined?
mysql > desc sbtest3;
+-------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| id | int | NO | | 0 | |
| k | int | NO | | 0 | |
| c | char(120) | NO | | | |
| pad | char(60) | NO | | | |
+-------+-----------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql > explain select count(*) from sbtest3;
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------+
| 1 | SIMPLE | sbtest3 | NULL | ALL | NULL | NULL | NULL | NULL | 985734 | 100.00 | NULL |
+----+-------------+---------+------------+------+---------------+------+---------+------+--------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql > select count(*) from sbtest3;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.05 sec)
mysql > show status like 'Handler_read%';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 0 |
| Handler_read_last | 0 |
| Handler_read_next | 0 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
7 rows in set (0.00 sec)
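Apart from observing the execution time (comparable to that of the table with a primary key), it is hard to tell what happened: the handler status counters are not incremented, and the slow log provides no Rows_examined information either. Moreover, global InnoDB counters such as Innodb_rows_read, as well as Performance Schema table statistics, do not change either when the count is done using the clustered index (explicit or implicit)!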
mysql > select id from sbtest1 limit 1; select id from sbtest2 limit 1; select id from sbtest3 limit 1;
+--------+
| id |
+--------+
| 731065 |
+--------+
1 row in set (0.00 sec)
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.01 sec)
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
mysql > select table_schema,table_name,rows_fetched from sys.schema_table_statistics where table_schema="db1";
+--------------+------------+--------------+
| table_schema | table_name | rows_fetched |
+--------------+------------+--------------+
| db1 | sbtest1 | 1 |
| db1 | sbtest2 | 1 |
| db1 | sbtest3 | 1 |
+--------------+------------+--------------+
3 rows in set (0.01 sec)
mysql > select count(*) from sbtest1; select count(*) from sbtest2; select count(*) from sbtest3;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.21 sec)
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.03 sec)
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.03 sec)
mysql > select table_schema,table_name,rows_fetched from sys.schema_table_statistics where table_schema="db1";
+--------------+------------+--------------+
| table_schema | table_name | rows_fetched |
+--------------+------------+--------------+
| db1 | sbtest1 | 1000001 |
| db1 | sbtest2 | 1 |
| db1 | sbtest3 | 1 |
+--------------+------------+--------------+
3 rows in set (0.01 sec)
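The before/after comparison above boils down to a per-table difference of two snapshots. A minimal sketch using the rows_fetched values shown above; the helper names are hypothetical, and in practice both snapshots would come from querying sys.schema_table_statistics.

```python
from typing import Dict, List

# rows_fetched from sys.schema_table_statistics, before and after the three counts.
BEFORE = {"sbtest1": 1, "sbtest2": 1, "sbtest3": 1}
AFTER = {"sbtest1": 1000001, "sbtest2": 1, "sbtest3": 1}


def rows_fetched_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-table increase of rows_fetched between two snapshots."""
    tables = sorted(set(before) | set(after))
    return {t: after.get(t, 0) - before.get(t, 0) for t in tables}


def unreported(delta: Dict[str, int]) -> List[str]:
    """Tables whose counter did not move, although each one was fully counted."""
    return [t for t, d in delta.items() if d == 0]


delta = rows_fetched_delta(BEFORE, AFTER)
print(delta)              # {'sbtest1': 1000000, 'sbtest2': 0, 'sbtest3': 0}
print(unreported(delta))  # ['sbtest2', 'sbtest3']
```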
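When no secondary key is involved, the rows read are evidently not accounted for in the runs above! So how can we check which index was actually read during the count query, and how much data had to be fetched? I can think of only one way: on a freshly restarted, idle server, we can inspect the Buffer Pool contents, provided that innodb_buffer_pool_load_at_startup = OFF so the buffer pool starts empty: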
Only run this check when the server is idle: querying INFORMATION_SCHEMA.INNODB_BUFFER_PAGE can be expensive!
mysql > select TABLE_NAME,INDEX_NAME,count(*) from INFORMATION_SCHEMA.INNODB_BUFFER_PAGE where TABLE_NAME like "`db1`.%" GROUP BY TABLE_NAME,INDEX_NAME;
Empty set (0.19 sec)
mysql > select count(*) from sbtest1; select count(*) from sbtest2; select count(*) from sbtest3;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.22 sec)
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.20 sec)
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.87 sec)
mysql > select TABLE_NAME,INDEX_NAME,count(*) from INFORMATION_SCHEMA.INNODB_BUFFER_PAGE where TABLE_NAME like "`db1`.%" GROUP BY TABLE_NAME,INDEX_NAME;
+-----------------+-----------------+----------+
| TABLE_NAME | INDEX_NAME | count(*) |
+-----------------+-----------------+----------+
| `db1`.`sbtest1` | k_1 | 833 |
| `db1`.`sbtest2` | PRIMARY | 13713 |
| `db1`.`sbtest3` | GEN_CLUST_INDEX | 14302 |
+-----------------+-----------------+----------+
3 rows in set (0.26 sec)
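From the above results we can see how many index pages were loaded to perform each count, and the small secondary index is the clear winner because it does not contain the row data. In this case, however, that did not translate directly into a shorter execution time; I will explain the exact reason in a separate post once I have figured it out.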
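To put the page counts into perspective, they can be converted into volumes. A minimal sketch, assuming the default innodb_page_size of 16 KiB (the post does not state the page size) and treating the pages resident in the buffer pool as an approximation of each full index, since every count scanned its index completely; `summarize` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Assumption: default innodb_page_size of 16 KiB.
PAGE_SIZE = 16 * 1024
ROWS = 1_000_000  # result of each COUNT(*) above

# Pages found in the buffer pool after the three counts (from the query above).
PAGES = {
    "sbtest1.k_1": 833,
    "sbtest2.PRIMARY": 13713,
    "sbtest3.GEN_CLUST_INDEX": 14302,
}


def summarize(pages: Dict[str, int], page_size: int = PAGE_SIZE,
              rows: int = ROWS) -> Dict[str, Tuple[float, float]]:
    """Return {index: (MiB loaded, bytes loaded per counted row)}."""
    out = {}
    for index, n in pages.items():
        total = n * page_size
        out[index] = (round(total / 2**20, 1), round(total / rows, 1))
    return out


for index, (mib, per_row) in summarize(PAGES).items():
    print(f"{index:<24} {mib:>6} MiB {per_row:>6} B/row")
```

Under that assumption the k_1 scan pulled in roughly 13 MiB (about 13.6 bytes per row), against roughly 214 MiB and 223 MiB for the two clustered indexes. That is consistent with how InnoDB stores indexes: each k_1 entry holds only k plus the primary key value id, whereas clustered index records carry every column (including c and pad) plus hidden system columns, and GEN_CLUST_INDEX additionally stores a hidden 6-byte DB_ROW_ID, which plausibly explains why sbtest3 is slightly larger than sbtest2.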