{"id":142,"date":"2014-03-27T23:44:52","date_gmt":"2014-03-27T23:44:52","guid":{"rendered":"http:\/\/aadant.com\/blog\/?p=142"},"modified":"2014-03-28T10:28:37","modified_gmt":"2014-03-28T10:28:37","slug":"performance_schema-success-stories-replication-sql-thread-tuning","status":"publish","type":"post","link":"http:\/\/aadant.com\/blog\/2014\/03\/27\/performance_schema-success-stories-replication-sql-thread-tuning\/","title":{"rendered":"Performance_schema success stories :  replication SQL thread tuning"},"content":{"rendered":"<p>A lot of customers have lagging slaves. It could be one of the top issues at support, due to the infamous row based replication without primary key issue :<\/p>\n<table id=\"bugheader\">\n<tbody>\n<tr id=\"title\">\n<th><a href=\"http:\/\/bugs.mysql.com\/bug.php?id=53375\">Bug\u00a0#53375<\/a><\/th>\n<td colspan=\"3\">RBR + no PK =&gt; High load on slave (table scan\/cpu) =&gt; slave failure<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If you use binlog_format = statement or mixed , \u00a0there are several ways of monitoring the SQL thread. The most ancient is the <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/5.6\/en\/replication-options-slave.html#option_mysqld_log-slow-slave-statements\">log-slow-slave-statements<\/a>\u00a0 variable. From 5.6.11, it is a dynamic variable, before that you had to restart the slave mysqld to enable it.<\/p>\n<p>Once on, you can trace what&#8217;s going on in the SQL thread and analyze the slow query log. Of course, as the SQL thread could be running long write queries or a lot of fast queries, so it is crucial to set long_query_time = 0 for say 60 seconds to catch the fast writes. Fast does not mean that they can not be optimized :-).<\/p>\n<p>From MySQL 5.6.14, there is even better. After this bug fix :\u00a0<b>Bug 16750433 &#8211; THE STATEMENT DIGEST DOES NOT SHOW THE SLAVE SQL THREAD STATEMENTS, <\/b>the Performance_schema statement digests are showing the SQL thread statements as well !<\/p>\n<p>So that <a href=\"https:\/\/github.com\/MarkLeith\/dbahelper\">ps_helper<\/a> \/ <a href=\"https:\/\/github.com\/MarkLeith\/mysql-sys\">mysql_sys<\/a> can be used to see what&#8217;s going on in the SQL thread.<\/p>\n<p>Now an real example :<\/p>\n<p>If you see this kind of things : show global status like &#8216;Handler%&#8217; or in the show engine innodb status :<\/p>\n<pre class=\"lang:default decode:true\">Handler_read_next: 1,573,551,696 (551,543\/sec)\r\nNumber of rows inserted 172099, updated 685966, deleted 137734, read 1574302296\r\n32.14 inserts\/s, 208.76 updates\/s, 28.84 deletes\/s, 547765.56 reads\/s<\/pre>\n<p>500k index records scan per second is a bit too much for a (single) thread. No wonder that<\/p>\n<p>Seconds_Behind_Master: 76162<\/p>\n<p>is large and increasing.<\/p>\n<pre class=\"lang:default mark:5-7,9 decode:true\">mysql&gt; select * from statement_analysis                             ;\r\n+-------------------------------------------------------------------+-----------+-----------+------------+-----------+------------+---------------+-------------+-------------+--------------+-----------+---------------+---------------+-------------------+------------+-----------------+-------------+-------------------+----------------------------------+---------------------+---------------------+\r\n| query                                                             | db        | full_scan | exec_count | err_count | warn_count | total_latency | max_latency | avg_latency | lock_latency | rows_sent | rows_sent_avg | rows_examined | rows_examined_avg | tmp_tables | tmp_disk_tables | rows_sorted | sort_merge_passes | digest                           | first_seen          | last_seen           |\r\n+-------------------------------------------------------------------+-----------+-----------+------------+-----------+------------+---------------+-------------+-------------+--------------+-----------+---------------+---------------+-------------------+------------+-----------------+-------------+-------------------+----------------------------------+---------------------+---------------------+\r\n| UPDATE t ...  c1 = ? AND c2 = ?                                   |     prod  |           |      13904 |         0 |       4896 | 00:06:06.71   | 4.75 s      | 26.37 ms    | 1.26 s       |         0 |             0 |       1492198 |               107 |          0 |               0 |           0 |                 0 | a36a0c617b2f696cb4889ee015541b65 | 2014-03-05 08:02:25 | 2014-03-05 08:13:30 |\r\n| UPDATE t ...  c1 = ? AND c2 = ?                                   |     prod  |           |       2024 |         0 |          0 | 00:01:34.49   | 2.57 s      | 46.69 ms    | 147.32 ms    |         0 |             0 |        373785 |               185 |          0 |               0 |           0 |                 0 | 8dd4ee22a216fe5fcc03b03a34f3a3fd | 2014-03-05 08:02:36 | 2014-03-05 08:13:30 |\r\n| UPDATE t ...  c1 = ? AND c2 = ?                                   |     prod  |           |       1662 |         0 |          0 | 00:01:31.54   | 3.31 s      | 55.08 ms    | 65.96 ms     |         0 |             0 |        317512 |               191 |          0 |               0 |           0 |                 0 | 27a13609455481e156ca9eb2e5c54d6d | 2014-03-05 08:02:53 | 2014-03-05 08:13:29 |\r\n| UPDATE t ...  ? , c3  = IF ( ...                                  |     prod  |           |       1987 |         0 |       1064 | 41.06 s       | 1.11 s      | 20.66 ms    | 206.36 ms    |         0 |             0 |        206869 |               104 |          0 |               0 |           0 |                 0 | f812e322c0e3dfaf905fe1d624978564 | 2014-03-05 08:02:42 | 2014-03-05 08:13:29 |\r\n| UPDATE t ...  c1 = ? AND c2 = ?                                   |     prod  |           |        680 |         0 |          0 | 39.11 s       | 1.91 s      | 57.51 ms    | 62.61 ms     |         0 |             0 |        144791 |               213 |          0 |               0 |           0 |                 0 | a9266204238e66806ca4c073650b72e5 | 2014-03-05 08:05:37 | 2014-03-05 08:13:22 |\r\n\r\nmysql&gt; select * from io_global_by_file_by_latency                   ;\r\n+-------------------------------------------------------------------------------------+------------+---------------+------------+--------------+-------------+---------------+------------+--------------+\r\n| file                                                                                | count_star | total_latency | count_read | read_latency | count_write | write_latency | count_misc | misc_latency |\r\n+-------------------------------------------------------------------------------------+------------+---------------+------------+--------------+-------------+---------------+------------+--------------+\r\n| @@datadir\/prod\/t.ibd                                                                |     244505 | 00:05:58.94   |     244319 | 00:05:58.83  |         114 | 4.56 ms       |         72 | 105.85 ms    |<\/pre>\n<p>The slave was up for 12 min and nearly 10 min was spent on the updates ! 18000 updates in 10 minutes, that&#8217;s fast \ud83d\ude42 So do we need to buy better hardware ?<\/p>\n<p>No. Average rows examined is between 100 &#8211; 200. Maybe we can optimize it ?<\/p>\n<pre class=\"lang:mysql mark:6,12,8 decode:true\">mysql&gt; explain update t set c3 = 1 where c1=6577849127655212045 and c2 = '2014-03-04'\\G\r\n*************************** 1. row ***************************\r\n           id: 1\r\n  select_type: SIMPLE\r\n        table: t\r\n         type: index_merge\r\npossible_keys: idx_c1, idx_c2\r\n          key: idx_c1, idx_c2\r\n      key_len: 8,3\r\n          ref: NULL\r\n         rows: 3\r\n        Extra: Using intersect(idx_c1,idx_c2); Using where\r\n1 row in set (0.00 sec)<\/pre>\n<p>Indeed, adding an index on (c1,c2) fixed the issue. And cut by 100 the rows examined average. After one hour, the slave that was lagging 1 day had caught up.<\/p>\n<p><a href=\"http:\/\/aadant.com\/blog\/wp-content\/uploads\/2014\/03\/rows_examined.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-161\" alt=\"rows_examined\" src=\"http:\/\/aadant.com\/blog\/wp-content\/uploads\/2014\/03\/rows_examined-300x83.png\" width=\"600\" height=\"166\" srcset=\"http:\/\/aadant.com\/blog\/wp-content\/uploads\/2014\/03\/rows_examined-300x83.png 300w, http:\/\/aadant.com\/blog\/wp-content\/uploads\/2014\/03\/rows_examined-1024x283.png 1024w, http:\/\/aadant.com\/blog\/wp-content\/uploads\/2014\/03\/rows_examined.png 1600w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>This is the <a href=\"http:\/\/www.mysql.com\/products\/enterprise\/monitor.html\">MEM<\/a> graph for the rows_examined.<\/p>\n<p>Thank you to\u00a0Hauns\u00a0Froehlingsdorf who worked with me on this issue.<\/p>\n<p>Things to note here :<\/p>\n<ul>\n<li>the performance_schema rocks for statement replication. I truly hope that row based replication will get the instrumentation it deserves. Maybe row events could appear as statements ? That&#8217;s how they are processed by the parser when you do point-in-time recovery for example &#8230;<\/li>\n<li>index merge used for updates is much more expensive that it seems : it is not certainly not 3 rows_examined, not 100-200 much more here. \u00a0Index entries are locked for 2 index ranges, so more locks in memory in REPEATABLE READS isolation at least. Here are the 2 bugs for completeness :<\/li>\n<li><b>Bug 14226987 &#8211; ROWS_EXAMINED IS NOT CORRECT FOR INDEX MERGE QUERIES\u00a0<\/b><\/li>\n<li><b>Bug 14226171 &#8211; EXCESSIVE ROW LOCKING WITH UPDATE IN 5.5.25\u00a0<\/b><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A lot of customers have lagging slaves. It could be one of the top issues at support, due to the infamous row based replication without primary key issue : Bug\u00a0#53375 RBR + no PK =&gt; High load on slave (table &hellip; <a href=\"http:\/\/aadant.com\/blog\/2014\/03\/27\/performance_schema-success-stories-replication-sql-thread-tuning\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-142","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/posts\/142","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/comments?post=142"}],"version-history":[{"count":21,"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/posts\/142\/revisions"}],"predecessor-version":[{"id":164,"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/posts\/142\/revisions\/164"}],"wp:attachment":[{"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/media?parent=142"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/categories?post=142"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/aadant.com\/blog\/wp-json\/wp\/v2\/tags?post=142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}