A lot of customers have lagging slaves. It is probably one of the top issues at support, partly due to the infamous row-based replication without primary key issue:
Bug #53375 – RBR + no PK => High load on slave (table scan/cpu) => slave failure
If you use binlog_format = STATEMENT or MIXED, there are several ways of monitoring the SQL thread. The oldest is the log_slow_slave_statements variable. Since 5.6.11 it is a dynamic variable; before that you had to restart the slave mysqld to enable it.
Once it is on, you can trace what's going on in the SQL thread and analyze the slow query log. Of course, the SQL thread could be running a few long write queries or a lot of fast ones, so it is crucial to set long_query_time = 0 for, say, 60 seconds to also catch the fast writes. Fast does not mean that they cannot be optimized :-).
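A minimal sketch of such a capture window, assuming MySQL 5.6.11+ so that log_slow_slave_statements is dynamic; the SQL thread may only honour the new settings after being restarted, hence the STOP/START below:

-- hypothetical 60-second capture window, run on the slave
SET GLOBAL log_slow_slave_statements = ON;
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0;
-- restart the SQL thread so it picks up the new settings
STOP SLAVE SQL_THREAD;
START SLAVE SQL_THREAD;
SELECT SLEEP(60);
-- restore a sane threshold (10 is the default) and stop logging everything
SET GLOBAL long_query_time = 10;
SET GLOBAL log_slow_slave_statements = OFF;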
From MySQL 5.6.14, there is something even better. After this bug fix: Bug 16750433 – THE STATEMENT DIGEST DOES NOT SHOW THE SLAVE SQL THREAD STATEMENTS, the performance_schema statement digests show the SQL thread statements as well!
This means ps_helper / mysql_sys can be used to see what's going on in the SQL thread.
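If ps_helper / mysql_sys is not installed, roughly the same information can be pulled straight from the digest table; a minimal sketch (the picosecond-to-second conversion is done by hand here):

-- top statements by total latency, straight from performance_schema (5.6.14+)
SELECT DIGEST_TEXT,
       COUNT_STAR,
       ROUND(SUM_TIMER_WAIT / 1000000000000, 2) AS total_latency_s,
       SUM_ROWS_EXAMINED,
       ROUND(SUM_ROWS_EXAMINED / COUNT_STAR)    AS rows_examined_avg
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 5;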
Now a real example.
If you see this kind of thing in SHOW GLOBAL STATUS LIKE 'Handler%' or in SHOW ENGINE INNODB STATUS:
Handler_read_next: 1,573,551,696 (551,543/sec)
Number of rows inserted 172099, updated 685966, deleted 137734, read 1574302296
32.14 inserts/s, 208.76 updates/s, 28.84 deletes/s, 547765.56 reads/s
500k index record scans per second is a bit too much for a (single) thread. No wonder that
Seconds_Behind_Master: 76162
is large and increasing.
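For reference, a quick way to confirm that scan rate yourself is to sample the counter twice; this is just a sketch and the 10-second window is arbitrary:

-- sample Handler_read_next twice, 10 seconds apart
SHOW GLOBAL STATUS LIKE 'Handler_read_next';
SELECT SLEEP(10);
SHOW GLOBAL STATUS LIKE 'Handler_read_next';
-- (second value - first value) / 10 = index entries read per second

Back to the performance_schema views on this slave: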
mysql> select * from statement_analysis ;
+-------------------------------------------------------------------+-----------+-----------+------------+-----------+------------+---------------+-------------+-------------+--------------+-----------+---------------+---------------+-------------------+------------+-----------------+-------------+-------------------+----------------------------------+---------------------+---------------------+
| query | db | full_scan | exec_count | err_count | warn_count | total_latency | max_latency | avg_latency | lock_latency | rows_sent | rows_sent_avg | rows_examined | rows_examined_avg | tmp_tables | tmp_disk_tables | rows_sorted | sort_merge_passes | digest | first_seen | last_seen |
+-------------------------------------------------------------------+-----------+-----------+------------+-----------+------------+---------------+-------------+-------------+--------------+-----------+---------------+---------------+-------------------+------------+-----------------+-------------+-------------------+----------------------------------+---------------------+---------------------+
| UPDATE t ... c1 = ? AND c2 = ? | prod | | 13904 | 0 | 4896 | 00:06:06.71 | 4.75 s | 26.37 ms | 1.26 s | 0 | 0 | 1492198 | 107 | 0 | 0 | 0 | 0 | a36a0c617b2f696cb4889ee015541b65 | 2014-03-05 08:02:25 | 2014-03-05 08:13:30 |
| UPDATE t ... c1 = ? AND c2 = ? | prod | | 2024 | 0 | 0 | 00:01:34.49 | 2.57 s | 46.69 ms | 147.32 ms | 0 | 0 | 373785 | 185 | 0 | 0 | 0 | 0 | 8dd4ee22a216fe5fcc03b03a34f3a3fd | 2014-03-05 08:02:36 | 2014-03-05 08:13:30 |
| UPDATE t ... c1 = ? AND c2 = ? | prod | | 1662 | 0 | 0 | 00:01:31.54 | 3.31 s | 55.08 ms | 65.96 ms | 0 | 0 | 317512 | 191 | 0 | 0 | 0 | 0 | 27a13609455481e156ca9eb2e5c54d6d | 2014-03-05 08:02:53 | 2014-03-05 08:13:29 |
| UPDATE t ... ? , c3 = IF ( ... | prod | | 1987 | 0 | 1064 | 41.06 s | 1.11 s | 20.66 ms | 206.36 ms | 0 | 0 | 206869 | 104 | 0 | 0 | 0 | 0 | f812e322c0e3dfaf905fe1d624978564 | 2014-03-05 08:02:42 | 2014-03-05 08:13:29 |
| UPDATE t ... c1 = ? AND c2 = ? | prod | | 680 | 0 | 0 | 39.11 s | 1.91 s | 57.51 ms | 62.61 ms | 0 | 0 | 144791 | 213 | 0 | 0 | 0 | 0 | a9266204238e66806ca4c073650b72e5 | 2014-03-05 08:05:37 | 2014-03-05 08:13:22 |

mysql> select * from io_global_by_file_by_latency ;
+-------------------------------------------------------------------------------------+------------+---------------+------------+--------------+-------------+---------------+------------+--------------+
| file | count_star | total_latency | count_read | read_latency | count_write | write_latency | count_misc | misc_latency |
+-------------------------------------------------------------------------------------+------------+---------------+------------+--------------+-------------+---------------+------------+--------------+
| @@datadir/prod/t.ibd | 244505 | 00:05:58.94 | 244319 | 00:05:58.83 | 114 | 4.56 ms | 72 | 105.85 ms |
The slave had been up for 12 minutes, and nearly 10 of them were spent on these updates! 18,000 updates in 10 minutes, that's fast 🙂 So do we need to buy better hardware?
No. The average rows examined is between 100 and 200. Maybe we can optimize it?
mysql> explain update t set c3 = 1 where c1=6577849127655212045 and c2 = '2014-03-04'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: index_merge
possible_keys: idx_c1, idx_c2
          key: idx_c1, idx_c2
      key_len: 8,3
          ref: NULL
         rows: 3
        Extra: Using intersect(idx_c1,idx_c2); Using where
1 row in set (0.00 sec)
Indeed, adding a composite index on (c1, c2) fixed the issue and cut the average rows examined by a factor of 100. After one hour, the slave that had been lagging by one day had caught up.
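The fix itself is a one-liner. The index name below is made up, and on a big table you would of course plan for the duration of the ALTER (and decide whether to let it replicate or to run it on the slave first):

-- hypothetical composite index covering both columns of the WHERE clause
ALTER TABLE t ADD INDEX idx_c1_c2 (c1, c2);

-- the same EXPLAIN should now show a single-index lookup instead of index_merge
explain update t set c3 = 1 where c1=6577849127655212045 and c2 = '2014-03-04'\G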
This is the MySQL Enterprise Monitor (MEM) graph for rows_examined.
Thank you to Hauns Froehlingsdorf who worked with me on this issue.
Things to note here:
- the performance_schema rocks for statement-based replication. I truly hope that row-based replication will get the instrumentation it deserves. Maybe row events could appear as statements? That is how they are processed by the parser when you do point-in-time recovery, for example…
- index merge used for updates is much more expensive than it seems: it is certainly not 3 rows examined, and not even 100-200, but much more here. Index entries are locked for 2 index ranges, so there are more locks in memory, at least in REPEATABLE READ isolation (a possible workaround is sketched after this list). Here are the 2 bugs for completeness:
- Bug 14226987 – ROWS_EXAMINED IS NOT CORRECT FOR INDEX MERGE QUERIES
- Bug 14226171 – EXCESSIVE ROW LOCKING WITH UPDATE IN 5.5.25
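For completeness, if adding the index is not immediately possible, the intersect plan can also be switched off globally through optimizer_switch. This is only a workaround sketch, and the optimizer may then pick an even worse plan, so test it before relying on it:

-- possible stopgap: disable index merge intersection plans
SET GLOBAL optimizer_switch = 'index_merge_intersection=off';
-- the SQL thread only picks up the new global value after a restart
STOP SLAVE SQL_THREAD;
START SLAVE SQL_THREAD;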