深入解析MySQL：构建高性能索引（案例分析篇）

JAVA后端架构 03-20 71

前言：

而今大家对“mysqlselect效率”大体比较珍视，姐妹们都需要了解一些“mysqlselect效率”的相关文章。那么小编也在网络上汇集了一些对于“mysqlselect效率””的相关文章，希望我们能喜欢，小伙伴们快快来学习一下吧！

索引的十大原则

1、正确理解和计算索引字段的区分度，文中有计算规则，区分度高的索引，可以快速得定位数据，区分度太低，无法有效的利用索引，可能需要扫描大量数据页，和不使用索引没什么差别。

2、正确理解和计算前缀索引的字段长度，文中有判断规则，合适的长度要保证高的区分度和最恰当的索引存储容量，只有达到最佳状态，才是保证高效率的索引。

3、联合索引注意最左匹配原则：必须按照从左到右的顺序匹配，MySQL会一直向右匹配索引直到遇到范围查询(>、<、between、like)然后停止匹配。

如 depno=1 and empname>'' and job=1 ，如果建立(depno,empname,job)顺序的索引，job是用不到索引的。

4、应需而取策略，查询记录的时候，不要一上来就使用*，只取需要的数据，可能的话尽量只利用索引覆盖，可以减少回表操作，提升效率。

5、正确判断是否使用联合索引（上面联合索引的使用那一小节有说明判断规则），也可以进一步分析到索引下推（IPC），减少回表操作，提升效率。

6、避免索引失效的原则：禁止对索引字段使用函数、运算符操作，会使索引失效。这是实际上就是需要保证索引所对应字段的”干净度“。

7、避免非必要的类型转换，字符串字段使用数值进行比较的时候会导致索引无效。

8、模糊查询'%value%'会使索引无效，变为全表扫描，因为无法判断扫描的区间，但是'value%'是可以有效利用索引。

9、索引覆盖排序字段，这样可以减少排序步骤，提升查询效率

10、尽量的扩展索引，非必要不新建索引。比如表中已经有a的索引，现在要加(a,b)的索引，那么只需要修改原来的索引即可。

查询优化分析器 - explain

explain命令大家应该很熟悉，具体用法和字段含义可以参考官网explain-output，这里需要强调rows是核心指标，绝大部分rows小的语句执行一定很快，因为扫描的内容基数小。

所以优化语句基本上都是在优化降低rows值。

慢查询优化基本步骤

1.先运行查看实际耗时，判断是否真的很慢（注意设置SQL_NO_CACHE）。

2.高区分度优先策略：where条件单表查，锁定最小返回记录表的条件。

就是查询语句的where都应用到表中返回的记录数最小的表开始查起，单表每个字段分别查询，看哪个字段的区分度最高。区分度高的字段往前排。

3.explain查看执行计划，是否与1预期一致（从锁定记录较少的表开始查询）

4.order by limit 形式的sql语句让排序的表优先查

5.了解业务方的使用场景，根据使用场景适时调整。

6.加索引时参照建上面索引的十大原则

7.观察结果，不符合预期继续从第一步开始分析

查询案例分析

下面几个例子详细解释了如何分析和优化慢查询。

复杂查询条件的分析

一般来说我们编写SQL的方式是为了是实现功能，在实现功能的基础上保证MySQL的执行效率也是非常重要的，这要求我们对MySQL的执行计划和索引规则有非常清晰的理解，分析下面的案例：

1 mysql> select a.*,b.depname,b.memo from emp a left join 2 dep b on a.depno = b.depno where sal>100 and a.empname like 'ab%'  and a.depno=106 order by a.hiredate desc ;3 +---------+---------+---------+---------+-----+---------------------+------+------+-------+------------+----------+4 | id      | empno   | empname | job     | mgr | hiredate            | sal  | comn | depno | depname    | memo     |5 +---------+---------+---------+---------+-----+---------------------+------+------+-------+------------+----------+6 | 4976754 | 4976754 | ABijwE  | SALEMAN |   1 | 2021-01-23 16:46:24 | 2000 | 400  |   106 | kDpNWugzcQ | TYlrVEkm |7 ......8 +---------+---------+---------+---------+-----+---------------------+------+------+-------+------------+----------+9 744 rows in set  (4.958 sec)

总共就查询了744条数据，却耗费了4.958的时间，我们看一下目前表中现存的索引以及索引使用的情况分析

 1 mysql> show index from emp; 2 +-------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 | Table | Non_unique | Key_name      | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | 4 +-------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 5 | emp   |          0 | PRIMARY       |            1 | id          | A         |     4952492 | NULL     | NULL   |      | BTREE      |         |               | 6 | emp   |          1 | idx_emo_depno |            1 | depno       | A         |          18 | NULL     | NULL   |      | BTREE      |         |               | 7 +-------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 8 2 rows in set 9 10 mysql> explain select a.*,b.depname,b.memo from emp a left join 11 dep b on a.depno = b.depno where sal>100 and a.empname like 'ab%'  and a.depno=106 order by a.hiredate desc ;12 +----+-------------+-------+------+---------------+---------------+---------+-------+--------+-----------------------------+13 | id | select_type | table | type | possible_keys | key           | key_len | ref   | rows   | Extra                       |14 +----+-------------+-------+------+---------------+---------------+---------+-------+--------+-----------------------------+15 |  1 | SIMPLE      | a     | ref  | idx_emo_depno | idx_emo_depno | 3       | const | 974898 | Using where; Using filesort |16 |  1 | SIMPLE      | b     | ref  | idx_dep_depno | idx_dep_depno | 3       | const |      1 | NULL                        |17 +----+-------------+-------+------+---------------+---------------+---------+-------+--------+-----------------------------+18 2 rows in set

可以看出，目前在emp表上除了主键只存在一个索引 idx_emo_depno ，作用在部门编号字段上，该索引的目标是过滤出具体部门编号下的数据。

通过explain 分析器可以看到 where条件后面是走了 idx_emo_depno 索引，但是也比较了 97W的数据，说明该字段的区分度并不高，根据高区分度优先原则，我们对这个表的三个查询字段分别进行区分度计算。

1 mysql> select count(distinct empname)/count(*),count(distinct depno)/count(*),count(distinct sal)/count(*) from emp; 2 +----------------------------------+--------------------------------+------------------------------+3 | count(distinct empname)/count(*) | count(distinct depno)/count(*) | count(distinct sal)/count(*) |4 +----------------------------------+--------------------------------+------------------------------+5 | 0.1713                           | 0.0000                         | 0.0000                       |6 +----------------------------------+--------------------------------+------------------------------+7 1 row in set

这是计算结果，empname的区分度最高，所以合理上是可以建立一个包含这三个字段的联合索引，顺序如下：empname、depno、sal；

并且查询条件重新调整了顺序，符合最左匹配原则；另一方面根据应需而取的策略，把b.memo字段去掉了。

1 mysql> select a.*,b.depname from emp a left join 2 dep b on a.depno = b.depno where  a.empname like 'ab%'  and a.depno=106 and a.sal>100 order by a.hiredate desc ;3 +---------+---------+---------+---------+-----+---------------------+------+------+-------+------------+4 | id      | empno   | empname | job     | mgr | hiredate            | sal  | comn | depno | depname    |5 +---------+---------+---------+---------+-----+---------------------+------+------+-------+------------+6 | 4976754 | 4976754 | ABijwE  | SALEMAN |   1 | 2021-01-23 16:46:24 | 2000 | 400  |   106 | kDpNWugzcQ |7 ......8 +---------+---------+---------+---------+-----+---------------------+------+------+-------+------------+9 744 rows in set  (0.006 sec)

这边还有一个问题，那就是联合索引根据最左匹配原则：必须按照从左到右的顺序匹配，MySQL会一直向右匹配索引直到遇到范围查询(>、<、between、like)然后停止匹配。

所以语句中执行到a.empname 字段，因为使用了like，后面就不再走索引了。在这个场景中，独立的empname字段上的索引和这个联合索引效率是差不多的。

另外排序字段hiredate也可以考虑到覆盖到索引中，会相应的提高效率。

无效索引的分析

有一个需求，使用到了用户表 userinfo 和消费明细表 salinvest ，目的想把2020年每个用户在四个品类等级（A1、A2、A3、A4）上的消费额度进行统计，所以便下了如下的脚本：

 1 select (@rowNO := @rowNo+1) AS id,bdata.* from  2 ( 3 select distinct a.usercode,a.username, 4 @A1:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A1'  5 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A1, 6 @A2:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A2'  7 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A2, 8 @A3:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A3'  9 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A3,10 @A4:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A4' 11 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A4,12 ,(@A1+@A2+@A3+@A4) as allnum 13 from userinfo a 14 inner JOIN `salinvest` b on a.usercode = b.usercode  15 where b.logdate between '2020-01-01' and '2020-12-31'16 order by allnum desc17 ) as bdata,(SELECT @rowNO:=0) b;

这个查询看起来貌似没什么问题，虽然用到了复合查询、子查询，但是如果索引做的正确，也不会有什么问题。那我们来看看索引，有一个联合索引，符合我们最左匹配原则和高区分度优先原则：

 1 mysql> show index from salinvest; 2 +------------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 | Table      | Non_unique | Key_name               | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | 4 +------------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 5 | lnuminvest |          0 | PRIMARY                |            1 | autocode    | A         |           5 | NULL     | NULL   |      | BTREE      |         |               | 6 | lnuminvest |          1 | idx_salinvest_complex |            1 | usercode      | A         |           2 | NULL     | NULL   | YES  | BTREE      |         |               | 7 | lnuminvest |          1 | idx_salinvest_complex |            2 | gravalue    | A         |           2 | NULL     | NULL   | YES  | BTREE      |         |               | 8 | lnuminvest |          1 | idx_salinvest_complex |            3 | logdate     | A         |           2 | NULL     | NULL   | YES  | BTREE      |         |               | 9 +------------+------------+------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+10 4 rows in set

那我们来看看它的执行效率：

mysql> select (@rowNO := @rowNo+1) AS id,bdata.* from (select (@rowNO := @rowNo+1) AS id,bdata.* from (select distinct a.usercode,a.username,@A1:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A1' and c.logdate between '2020-01-01' and '2020-12-31'),0) as A1,@A2:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A2' and c.logdate between '2020-01-01' and '2020-12-31'),0) as A2,@A3:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A3' and c.logdate between '2020-01-01' and '2020-12-31'),0) as A3,@A4:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A4' and c.logdate between '2020-01-01' and '2020-12-31'),0) as A4,,(@A1+@A2+@A3+@A4) as allnum from userinfo a inner JOIN `salinvest` b on a.usercode = b.usercode  where b.logdate between '2020-01-01' and '2020-12-31'order by allnum desc) as bdata,(SELECT @rowNO:=0) b;+----+------------+---------+------+------+------+------+------+--------+| id | usercode     | username | A1     | A2   | A3   | A4   |allnum+----+------------+---------+------+------+------+------+------+--------+|  1 | 063105015    | brand    | 789.00 | 1074.50 | 998.00 | 850.00 |  ......+----+------------+---------+------+------+------+------+------+--------+6217 rows in set  (12.745 sec)

我这边省略了查询结果，实际上结果输出6000多条数据，在约50W的数据中进行统计与合并，输出6000多条数据，花费了将近13秒，这明显是不合理的。

我们来分析下是什么原因：

 1 mysql> explain select (@rowNO := @rowNo+1) AS id,bdata.* from  2 ( 3 select distinct a.usercode,a.username, 4 @A1:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A1'  5 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A1, 6 @A2:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A2'  7 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A2, 8 @A3:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A3'  9 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A3,10 @A4:=IFNULL((select sum(c.ltimenum) from `salinvest` c where c.usercode=a.usercode AND c.gravalue='A4' 11 and c.logdate between '2020-01-01' and '2020-12-31'),0) as A4,12 ,(@A1+@A2+@A3+@A4) as allnum 13 from userinfo a 14 inner JOIN `salinvest` b on a.usercode = b.usercode  15 where b.logdate between '2020-01-01' and '2020-12-31'16 order by allnum desc17 ) as bdata,(SELECT @rowNO:=0) b;18 +----+--------------------+------------+------------+--------+------------------------+------------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+19 | id | select_type        | table      | partitions | type   | possible_keys          | key                    | key_len | ref                   | rows | filtered | Extra                                                     |20 +----+--------------------+------------+------------+--------+------------------------+------------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+21 |  1 | PRIMARY            | <derived8> | NULL       | system | NULL                   | NULL                   | NULL    | NULL                  |    1 |      100 | NULL                                                      |22 |  1 | PRIMARY            | <derived2> | NULL       | ALL    | NULL                   | NULL                   | NULL    | NULL                  |    2 |      100 | NULL                                                      |23 |  8 | DERIVED            | NULL       | NULL       | NULL   | NULL                   | NULL                   | NULL    | NULL                  | NULL | NULL     | No tables used                                            |24 |  2 | DERIVED            | b          | NULL       | index  | idx_salinvest_complex | idx_salinvest_complex | 170     | NULL                  |    5 |       20 | Using where; Using index; Using temporary; Using filesort |25 |  7 | DEPENDENT SUBQUERY | c          | NULL       | ALL    | idx_salinvest_complex | NULL                   | NULL    | NULL                  |    5 |       20 | Using where                                               |26 |  6 | DEPENDENT SUBQUERY | c          | NULL       | ALL    | idx_salinvest_complex | NULL                   | NULL    | NULL                  |    5 |       20 | Using where                                               |27 |  5 | DEPENDENT SUBQUERY | c          | NULL       | ALL    | idx_salinvest_complex | NULL                   | NULL    | NULL                  |    5 |       20 | Using where                                               |28 |  4 | DEPENDENT SUBQUERY | c          | NULL       | ALL    | idx_salinvest_complex | NULL                   | NULL    | NULL                  |    5 |       20 | Using where                                               |29 +----+--------------------+------------+------------+--------+------------------------+------------------------+---------+-----------------------+------+----------+-----------------------------------------------------------+30 9 rows in set

看最后四条数据，看他的possible_key和实际的key，预估是走 idx_salinvest_complex 索引，实际是走了空索引，这个是为什么呢？看前面的select_type 字段，值是 DEPENDENT SUBQUERY，了然了。

官方对 DEPENDENT SUBQUERY 的说明：子查询中的第一个SELECT，取决于外面的查询。

什么意思呢？它意味着两步：

第一步，MySQL 根据 select distinct a.usercode,a.username 得到一个大结果集 t1，这就是我们上图提示的6000用户。

第二步，上面的大结果集 t1 中的每一条记录，等同于与子查询 SQL 组成新的查询语句： select sum(c.ltimenum) from `salinvest` c where c.usercode in (select distinct a.usercode from userinfo a) 。

也就是说，每个子查询要比较6000次，几十万的数据啊……即使这两步骤查询都用到了索引，但还是会很慢。

这种情况下，子查询的执行效率受制于外层查询的记录数，还不如拆成两个独立查询顺序执行呢。

这种慢查询的解决办法，网上有很多方案，最常用的办法是用联合查询代替子查询，可以自己去查一下。

总结

上面给出了两种典型的问题，一种是没有使用索引使用原则进行索引构建，一种是遇到坑导致索引无效。我们在实际的应用中遇到过很多问题。比如：

1、不多的数据结果集，但是涉及到超多个表join的低效脚本。

2、无效的联表查询，就是其中一张联表没有任何使用，但是数据基数极大。（这也可能是某个时候业务变更导致的sql脚本忘了调整）。

3、varchar类型字段等值比较没有写单引号，巨量基数笛卡尔积查询直接把从库搞死，在索引字段上做计算导致索引失效的。

所有的这些案例都只是一些经验积累，只有熟悉查询优化器、索引的内部原理，了解索引优化的策略，才能定位这些问题的原因并加以解决。

为帮助开发者们提升面试技能、有机会入职BATJ等大厂公司，特别制作了这个专辑——这一次整体放出。

大致内容包括了： Java 集合、JVM、多线程、并发编程、设计模式、Spring全家桶、Java、MyBatis、ZooKeeper、Dubbo、Elasticsearch、Memcached、MongoDB、Redis、MySQL、RabbitMQ、Kafka、Linux、Netty、Tomcat等大厂面试题等、等技术栈！

欢迎大家关注公众号【Java烂猪皮】，回复【666】，获取以上最新Java后端架构VIP学习资料以及视频学习教程，然后一起学习，一文在手，面试我有。

每一个专栏都是大家非常关心，和非常有价值的话题，如果我的文章对你有所帮助，还请帮忙点赞、好评、转发一下，你的支持会激励我输出更高质量的文章，非常感谢！

本文地址：http://www.longkongtuishu.com/cafd8BARsA1oHClQ.html

标签： #mysqlselect效率