Introducing new type of benchmark

Traditionally the most benchmarks are focusing on throughput. We all get used to that, and in fact in our benchmarks, sysbench and tpcc-mysql, the final result is also represents the throughput (transactions per second in sysbench; NewOrder transactions Per Minute in tpcc-mysql). However, like Mark Callaghan mentioned in comments, response time is way more important metric to compare.

I want to pretend that we pioneered (not invented, but started to use widely) a benchmark methodology when we measure not the final throughput, but rather periodic probes (i.e. every 10 sec).
It allows us to draw “stability” graphs, like this one

where we can see not only a final result, but how the system behaves in dynamic.

What’s wrong with existing benchmarks?

Well, all benchmarks are lie, and focusing on throughput does not get any closer to reality.

Benchmarks, like sysbench or tpcc-mysql, start N threads and try to push the database as much as possible, bombarding the system with queries with no pause.

That rarely happens in real life. There are no systems that are pushed to 100% load all time.

So, how we can model it? There are different theories, and the one which describes user’s behavior, is Queueing theory. In short we can assume that users send requests with some arrival rate (which can be different in the different part of day/week/month/year though). And what is important for an end user is response time, that is how long the user has to wait on results. E.g. when you go to your Facebook profile or Wikipedia page, you expect to get response within second or two.

How we should change the benchmark to base on this model ?
There are my working ideas:

Benchmark starts N working threads, but they all are idle until asked to handle a request
Benchmarks generates transactions with a given rate, i.e. M transactions per second and puts into a queue. The interval between arrivals is not uniform, but rather distributed by Exponential distribution, with λ = M. That how it goes if to believe to the Poisson process.
For example, if our target is arrival rate 2 queries per second, then exponential distribution will give us following intervals (in sec) between events: 0.61, 0.43, 1.55, 0.18, 0.01, 0.76, 0.09, 1.26, …
Or if we represent graphically (point means even arrival):

As you see interval is far from being strict 0.5 sec, but 0.5 is the mean of this random generation function. On the graph you see 20 events arrived within 9 seconds.
Transactions from the queue are handled by one of free threads, or are waiting in the queue until one of threads are ready. The time waiting in the queue is added to a total response time.
As a result we measure 95% or 99% response times.

What does it give to us? It allows to see:

What is the response time we may expect having a given arrival rate
What is the optimal number of working threads (the one that provides best response times)

When it is useful?

At this moment I am looking to answer on questions like:
– When we add additional node to a cluster (e.g. XtraDB Cluster), how does it affect the response time ?
– When we put a load to two nodes instead of three nodes, will it help to improve the response time ?
– Do we need to increase number of working threads when we add nodes ?

Beside cluster testing, it will also help to see an affect of having a side on the server. For example, the famous problem with DROP TABLE performance. Does DROP TABLE, running in separate session, affect a response time of queries that handle user load ? The same for mysqldump, how does it affect short user queries ?

In fact I have a prototype based on sysbench. It is there lp:~vadim-tk/sysbench/inj-rate/. It works like a regular sysbench, but you need to specify the additional parameter tx-rate, which defines an expected arrival rate (in transactions per second).

There are some results from my early experiments. Assume we have an usual sysbench setup, and we target an arrival rate as 3000 transactions per second (regular sysbench transactions). We vary working threads from 1 to 128.

There are results for 16-128 threads (the result is 99% response time, taken every 10 sec. the less is better)

We can see that 16 threads give best 99% response time (15.74ms final), 32 threads: 16.75 ms, 64 threads: 25.14ms.
And with 128 threads we have pretty terrible unstable response times, with 1579.91ms final.
That means that 16-32 threads is probably best number of concurrently working threads (for this kind of workload and this arrival rate).

Ok, but what happens if we have not enough working threads? You can see it from following graph (1-8 threads):

The queue piles up, waiting time grows, and the final response time grows linearly up to ~30 sec, where benchmark stops, because the queue is full.

I am looking for your comments, do you find it useful?

Follow @VadimTk

PlanetMySQL Voting: Vote UP / Vote DOWN

Introducing new type of benchmark

Trending Articles

Police confirm man stabbed to death in Selsdon was Andrew David Else of Croydon

Angry father ordered to compensate daughter’s male friend

Moondru Mudichu 20-07-2016 – Polimer tv Serial

Download: Rich Bizzy -Panono Ukwenda (Cover)

Sniper: Ghost Warrior 3: Трейнер/Trainer (+17) [1.0 - 1.02] {FLiNG}

IN COURT: Full list of people sentenced at Northampton Magistrates’ Court

GERVASE JOHN

Gordian S01e01-73 [H264 - Ita Jap Ac3 - SoftSub Ita]

Ndebele names

Hyper-V replication "Enabling Replication Failed"

FLASHBACK WITH SIRASA FM AT GALGAMUWA 2022

Prison officer charged!

Jessica Carradero Lopez Arrested by Miami-Dade County Corrections on Dec 17,...

Anthony Wahome Biography, Family, Wife and Children

Who’s been sentenced at Northampton Magistrates’ Court

Reply: Betrayal at House on the Hill:: Rules:: Re: Haunt #6 - Spoilers Within

Jamani mm nauliza hivi second selection za form five zinatoka lini?

(NOTES & Audio) The 12 Sources for Islamic Shariah Parts 1 & 2

Madonna – Behind Me (feat. Guido Dos Santos) – Single [iTunes Plus M4A]

Laura Pausini - Platinum Collection (3Cd) (2009) .mp3 - 320 Kbps