I discussed my previous post on O_DIRECT_NO_FSYNC with the InnoDB team and they corrected my understanding of a few parts of the code that contribute to the stalls I have been reporting. We also discussed a problem I have been ignoring. The InnoDB code that does an fsync for a tablespace (fil_flush) can make a thread sleep when there are concurrent attempts to do the fsync. The amount of time to sleep, 20 milliseconds, was probably chosen in 1995. It is too long for fast storage, including HW RAID cards with battery-backed write cache and flash-based devices.
I created bug 68588 for the call to sleep and today I have performance numbers to share from MySQL 5.6.10. The workload is the same as described in the previous post. Three binaries were tested:
- O_DIRECT,condvar – use innodb_flush_method=O_DIRECT and patch the binary to use condition variable waits rather than sleep when there are concurrent threads in fil_flush.
- O_DIRECT,sleep – use innodb_flush_method=O_DIRECT with the unmodified binary, which still sleeps when there are concurrent threads in fil_flush.
- O_DIRECT_NO_FSYNC – use innodb_flush_method=O_DIRECT_NO_FSYNC
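To make the difference between the first two binaries concrete, here is a minimal sketch of the two waiting strategies. It is not the InnoDB source and not the patch attached to bug 68588. FlushState, flush_with_sleep, flush_with_condvar and the do_fsync callback are hypothetical names invented for illustration, and the only detail taken from the real code is the 20 millisecond sleep.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for per-tablespace flush state (not InnoDB's struct).
struct FlushState {
    std::mutex mtx;
    std::condition_variable cv;
    bool flush_in_progress = false;
    unsigned long flushes_done = 0;  // number of completed fsync calls
};

// Sleep-based waiting: a thread that finds a flush in progress sleeps for a
// fixed 20 ms before checking again, even if the other fsync finishes sooner.
template <typename Fsync>
void flush_with_sleep(FlushState& st, Fsync do_fsync) {
    std::unique_lock<std::mutex> lock(st.mtx);
    while (st.flush_in_progress) {
        lock.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        lock.lock();
    }
    st.flush_in_progress = true;
    lock.unlock();
    do_fsync();  // assumed to wrap the real fsync call
    lock.lock();
    st.flush_in_progress = false;
    ++st.flushes_done;
}

// Condition-variable waiting: a waiter is woken as soon as the thread doing
// the fsync finishes, so the wait is bounded by the fsync time, not by 20 ms.
template <typename Fsync>
void flush_with_condvar(FlushState& st, Fsync do_fsync) {
    std::unique_lock<std::mutex> lock(st.mtx);
    st.cv.wait(lock, [&st] { return !st.flush_in_progress; });
    st.flush_in_progress = true;
    lock.unlock();
    do_fsync();  // assumed to wrap the real fsync call
    lock.lock();
    st.flush_in_progress = false;
    ++st.flushes_done;
    lock.unlock();
    st.cv.notify_all();  // wake any threads waiting to flush
}
```

With an fsync that takes well under a millisecond on a battery-backed write cache, the sleep variant can make a waiter pay many times the cost of the fsync it is waiting for, which is the behavior the condvar binary is meant to avoid.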
updates/second for update 1 row by PK via sysbench, by number of concurrent clients

binary                 8      16      32      64     128     256
O_DIRECT,condvar   18234   23513   22542   21967   21941   22135
O_DIRECT,sleep     18382   25290   10464    9868   10059   10917
O_DIRECT_NO_FSYNC  18237   26332   30318   29695   29633   29380
There is still a benefit from using O_DIRECT_NO_FSYNC but the difference is smaller once the sleep is removed. I didn’t see any obvious stalls when using PMP (Poor Man's Profiler) with the O_DIRECT,condvar binary. However, at test end the average rate for innodb_pages_written was about 19,000/second for O_DIRECT,condvar versus 24,000/second for O_DIRECT_NO_FSYNC. My explanation at this point is that the page cleaner thread is able to write pages faster when it doesn’t have to wait to call fsync and go through fil_flush().
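For readers who want the gaps as ratios, the small helper below recomputes them from the results reported above. The array and function names are made up for this example. At 32 clients it gives roughly 2.2x for O_DIRECT,condvar over O_DIRECT,sleep and roughly 1.3x for O_DIRECT_NO_FSYNC over O_DIRECT,condvar.

```cpp
#include <array>
#include <cstddef>

// Throughput (updates/second) copied from the results above, indexed by
// concurrency level: 8, 16, 32, 64, 128, 256 clients.
constexpr std::array<double, 6> kCondvar = {18234, 23513, 22542, 21967, 21941, 22135};
constexpr std::array<double, 6> kSleep   = {18382, 25290, 10464,  9868, 10059, 10917};
constexpr std::array<double, 6> kNoFsync = {18237, 26332, 30318, 29695, 29633, 29380};

// Ratio of throughput a to throughput b at the same concurrency index.
constexpr double throughput_ratio(const std::array<double, 6>& a,
                                  const std::array<double, 6>& b,
                                  std::size_t i) {
    return a[i] / b[i];
}
```

Calling throughput_ratio(kCondvar, kSleep, 2) gives the condvar versus sleep ratio at 32 clients, since index 2 is the third concurrency level.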