Performance_schema success stories : replication SQL thread tuning

A lot of customers have lagging slaves. It could be one of the top issues at support, due to the infamous row based replication without primary key issue :

Bug #53375 RBR + no PK => High load on slave (table scan/cpu) => slave failure

If you use binlog_format = statement or mixed ,  there are several ways of monitoring the SQL thread. The most ancient is the log-slow-slave-statements  variable. From 5.6.11, it is a dynamic variable, before that you had to restart the slave mysqld to enable it.

Once on, you can trace what’s going on in the SQL thread and analyze the slow query log. Of course, as the SQL thread could be running long write queries or a lot of fast queries, so it is crucial to set long_query_time = 0 for say 60 seconds to catch the fast writes. Fast does not mean that they can not be optimized :-).

From MySQL 5.6.14, there is even better. After this bug fix : Bug 16750433 – THE STATEMENT DIGEST DOES NOT SHOW THE SLAVE SQL THREAD STATEMENTS, the Performance_schema statement digests are showing the SQL thread statements as well !

So that ps_helper / mysql_sys can be used to see what’s going on in the SQL thread.

Now an real example :

If you see this kind of things : show global status like ‘Handler%’ or in the show engine innodb status :

500k index records scan per second is a bit too much for a (single) thread. No wonder that

Seconds_Behind_Master: 76162

is large and increasing.

The slave was up for 12 min and nearly 10 min was spent on the updates ! 18000 updates in 10 minutes, that’s fast 🙂 So do we need to buy better hardware ?

No. Average rows examined is between 100 – 200. Maybe we can optimize it ?

Indeed, adding an index on (c1,c2) fixed the issue. And cut by 100 the rows examined average. After one hour, the slave that was lagging 1 day had caught up.

rows_examined

 

This is the MEM graph for the rows_examined.

Thank you to Hauns Froehlingsdorf who worked with me on this issue.

Things to note here :

  • the performance_schema rocks for statement replication. I truly hope that row based replication will get the instrumentation it deserves. Maybe row events could appear as statements ? That’s how they are processed by the parser when you do point-in-time recovery for example …
  • index merge used for updates is much more expensive that it seems : it is not certainly not 3 rows_examined, not 100-200 much more here.  Index entries are locked for 2 index ranges, so more locks in memory in REPEATABLE READS isolation at least. Here are the 2 bugs for completeness :
  • Bug 14226987 – ROWS_EXAMINED IS NOT CORRECT FOR INDEX MERGE QUERIES 
  • Bug 14226171 – EXCESSIVE ROW LOCKING WITH UPDATE IN 5.5.25 
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *