MySQL 5.6 has a new option, innodb_flush_method=O_DIRECT_NO_FSYNC, that helps with the problem from my previous post. The QPS below is much better when the new option is used. I used MySQL 5.6 with 8 buffer pool instances and the binlog disabled. The only difference between the two tests was the value of innodb_flush_method.

Updates/second for a sysbench test that updates 1 row by primary key:
    8      16      32      64     128     256   concurrent clients
18234   24359   10379    9795    9843   10283   O_DIRECT
17996   26853   30265   28923   29293   29477   O_DIRECT_NO_FSYNC
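
To make the gap easier to see, the ratio between the two rows can be computed per concurrency level. This is a minimal Python sketch that uses only the numbers from the table above; the variable and function names are mine and are not part of sysbench or MySQL.

```python
# Throughput (updates/second) copied from the table above.
clients = [8, 16, 32, 64, 128, 256]
o_direct = [18234, 24359, 10379, 9795, 9843, 10283]
o_direct_no_fsync = [17996, 26853, 30265, 28923, 29293, 29477]


def speedups(baseline, candidate):
    """Return candidate/baseline for each concurrency level."""
    return [c / b for b, c in zip(baseline, candidate)]


ratios = speedups(o_direct, o_direct_no_fsync)
for n, r in zip(clients, ratios):
    print(f"{n:>4} clients: {r:.2f}x")
```

The two settings are roughly equal at 8 and 16 clients. From 32 clients on, O_DIRECT_NO_FSYNC sustains close to 3x the update rate of O_DIRECT.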

This is the thread stack that shows the problem when there are stalls with O_DIRECT:

os_thread_sleep,fil_flush,fil_flush_file_spaces,buf_flush_sync_datafiles,buf_flush_single_page_from_LRU,buf_LRU_get_free_block,buf_page_init_for_read,buf_read_page_low,..

This is the my.cnf from the test. The server has fast storage that can do ~150k disk reads/second. I think innodb_io_capacity and innodb_lru_scan_depth were large enough. I don’t think the host was mis-tuned.

sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
innodb_log_file_size=1900M
innodb_max_dirty_pages_pct=80
innodb_file_format=barracuda
innodb_file_per_table
table-definition-cache=1000
table-open-cache=2000
max_connections=2000
key_buffer_size=200M
innodb_io_capacity=1000
innodb_flush_log_at_trx_commit=2
innodb_doublewrite=0
query_cache_size=0
query_cache_type=0
innodb_thread_concurrency=0
innodb_flush_method=O_DIRECT_NO_FSYNC
metadata_locks_hash_instances=256
innodb_checksum_algorithm=CRC32
innodb_thread_concurrency=32
innodb_buffer_pool_size=4g
innodb_io_capacity=8192
innodb_buffer_pool_instances=8
innodb_adaptive_hash_index=1
loose-table_open_cache_instances=1
innodb_lru_scan_depth=8192
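
Note that innodb_io_capacity and innodb_thread_concurrency each appear twice in the file above. As far as I know, when an option is repeated in an option file the last occurrence wins, so the effective values should be 8192 and 32. The sketch below is a rough way to spot such duplicates. It is a simplified parser written for illustration, not how mysqld reads the file: it ignores [group] headers, the loose- prefix and include directives, and the function name effective_options is hypothetical.

```python
from collections import defaultdict

# A subset of the my.cnf lines above, in the same order.
MY_CNF = """\
innodb_io_capacity=1000
innodb_thread_concurrency=0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_thread_concurrency=32
innodb_io_capacity=8192
innodb_lru_scan_depth=8192
"""


def effective_options(text):
    """Return (effective, duplicates) assuming the last occurrence wins."""
    seen = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition("=")
        # '-' and '_' are interchangeable in MySQL option names.
        seen[name.strip().replace("-", "_")].append(value.strip())
    effective = {name: values[-1] for name, values in seen.items()}
    duplicates = {name: values for name, values in seen.items() if len(values) > 1}
    return effective, duplicates


effective, duplicates = effective_options(MY_CNF)
for name, values in duplicates.items():
    print(f"{name}: set {len(values)} times {values}, effective value {effective[name]}")
```

Running SHOW GLOBAL VARIABLES on the server is the reliable way to confirm which values were actually in effect during the test.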