Performance & Concurrency | Build Fuel Tune

Posted by Simar Paul Singh on 2019-02-26

Performance can be Built, Fueled or Tuned.

Built (Implementation and Techniques)

  • Binary Search O(log n) is more efficient than Linear Search O(n)
  • Caching can reduce disk I/O, significantly boosting performance.
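To make the first bullet concrete, here is a minimal sketch that counts how many elements each search has to inspect. The function names and the one-million-element list are illustrative assumptions, not part of the original article.

```python
def linear_search_steps(items, target):
    """Scan left to right; return (index, number of elements inspected)."""
    for steps, value in enumerate(items, start=1):
        if value == target:
            return steps - 1, steps
    return -1, len(items)


def binary_search_steps(items, target):
    """Search a sorted list by halving; return (index, number of probes)."""
    lo, hi, steps = 0, len(items) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


# Hypothetical data set: one million sorted integers, worst case for the linear scan.
data = list(range(1_000_000))
linear_index, linear_steps = linear_search_steps(data, 999_999)
binary_index, binary_steps = binary_search_steps(data, 999_999)
```

The linear scan inspects every element in this case, while the binary search needs at most about log2(n) probes, roughly 20 for a million items.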

Fueled (More Resources)

  • Simply get a machine with more CPU(s) and Memory if constrained.
  • Implement RAID to improve Disk I/O

Tuned (Settings and Configurations)

  • Tune Garbage Collection to optimize Java Processes
  • Tune Oracle parameters to get optimum database performance

Capacity and Load

Load is the expectation placed on the system

  • It is the rate of work that we put on the system.
  • It is a factor external to the system.
  • Load may vary with time and events.
  • It has no upper cap and can, in principle, increase without limit.

Capacity is the potential of the system

  • It is the maximum rate of work that the system can sustain efficiently, effectively and indefinitely.
  • It is a factor internal to the system.
  • The maximum capacity of a system is finite and stays fairly constant.
  • Throughput is often referred to as the system’s capacity for load.

Chemistry between Load & Capacity

  • LOAD = CAPACITY? Good: expectation matches potential. (Unrealistic)
  • LOAD > CAPACITY? Bad: expectation exceeds potential. (Reality)
  • LOAD < CAPACITY? Ugly: expectation falls short of potential. (Waste)

Performance and Capacity Measurement

Measures of a System’s Capacity

Response Time or Latency

  • Measures time spent executing a request (Round-trip time (RTT) for a Transaction)
  • Good for understanding user experience
  • Least scalable view: developers focus on how much time each individual transaction takes


Throughput

  • Measures the number of transactions executed over a period of time (output transactions per second, TPS)
  • A measure of the system’s capacity for load
  • Depending on the resource type, it could instead be expressed as a hit rate (for a cache)

Resource Utilization

  • Measures the use of a resource (Memory, disk space, CPU, network bandwidth)
  • Helpful for system sizing, and generally the easiest measurement to understand
  • Throughput and response time can conflict because resources are limited (locking, resource contention, container activity)

Handle mismatch between capacity and load (Throttling & Buffering Techniques)

Nothing stops us from loading a system beyond its capacity (max throughput).

Transactions per second can be misleading: real traffic may arrive in bursts

  • If we received 3600 transactions in an hour, we cannot assume they arrived evenly at 1 per second (60 per minute)
  • They may have arrived in a burst: all in the first 10 minutes and nothing in the last 50 minutes
  • So we cannot really say what the tps was. We can regulate bursts with throttling and buffering
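The gap between average and peak rate can be shown with a small sketch. The two arrival traces below are made up for illustration: both contain 3600 transactions in an hour, one spread evenly and one packed into the first 10 minutes.

```python
from collections import Counter


def average_tps(arrival_seconds, window_seconds):
    """Average rate over the whole window; hides any burstiness."""
    return len(arrival_seconds) / window_seconds


def peak_tps(arrival_seconds):
    """Highest number of arrivals observed within any single second."""
    per_second = Counter(int(t) for t in arrival_seconds)
    return max(per_second.values()) if per_second else 0


# Hypothetical trace 1: one transaction every second for an hour.
even_trace = list(range(3600))

# Hypothetical trace 2: the same 3600 transactions, all within the first
# 600 seconds (6 per second), then nothing for the remaining 50 minutes.
burst_trace = [second for second in range(600) for _ in range(6)]

even_avg, even_peak = average_tps(even_trace, 3600), peak_tps(even_trace)
burst_avg, burst_peak = average_tps(burst_trace, 3600), peak_tps(burst_trace)
```

Both traces report the same hourly average of 1 tps, yet the bursty one hits the system at 6 tps while it lasts.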

Throttling — (Implemented by producer to smoothen output)

  • Spreads bursts over time to smoothen output from a process
  • We may add throttles to control the output rate from threads to each external interface
  • A throttle of 10 tps ensures the maximum output is 10 tps regardless of load and capacity
  • Throttling is a scheme for producers (keep production in check at the rate the consumer can accept)
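A minimal sketch of a producer-side throttle, assuming a simple fixed-interval pacing scheme (one of several possible designs; a token bucket would be another). The `Throttle` class and its method names are hypothetical, and time is passed in as integer milliseconds so the example stays deterministic.

```python
class Throttle:
    """Paces output so that sends are spaced at least 1/rate seconds apart."""

    def __init__(self, rate_per_second):
        self.interval_ms = 1000 // rate_per_second
        self.next_free_ms = 0

    def schedule(self, now_ms):
        """Return the earliest time (ms) at which an item produced at now_ms may be sent."""
        send_at_ms = max(now_ms, self.next_free_ms)
        self.next_free_ms = send_at_ms + self.interval_ms
        return send_at_ms


# A burst of 30 items, all produced at t = 0, pushed through a 10 tps throttle.
throttle = Throttle(rate_per_second=10)
send_times_ms = [throttle.schedule(0) for _ in range(30)]
```

The burst is spread over three seconds: no one-second window starting at a send sees more than 10 of them. A real producer would sleep until the returned time before sending.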

Buffering — (Implemented by consumer to smoothen input)

  • Spreads burst over time to smoothen input from an external interface
  • We add buffering to control the input rate to threads from each external interface
  • If the application processes input at 10 tps, load above that is buffered and processed later
  • Buffering is a scheme for consumers (take whatever is produced, consume at our own pace)
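The consumer side can be sketched the same way. This is a simplified, single-threaded simulation in one-second ticks; the function name and the arrival pattern are assumptions chosen for illustration.

```python
from collections import deque


def consume_with_buffer(arrivals_per_second, capacity_tps):
    """Simulate a consumer that accepts everything but processes at most
    capacity_tps items per second. Returns (processed, backlog) per second."""
    buffer = deque()
    processed, backlog = [], []
    second = 0
    while second < len(arrivals_per_second) or buffer:
        incoming = arrivals_per_second[second] if second < len(arrivals_per_second) else 0
        buffer.extend([second] * incoming)  # take whatever is produced
        done = 0
        while buffer and done < capacity_tps:  # consume at our own pace
            buffer.popleft()
            done += 1
        processed.append(done)
        backlog.append(len(buffer))
        second += 1
    return processed, backlog


# Hypothetical input: a burst of 30 in the first second, then a trickle.
processed, backlog = consume_with_buffer([30, 0, 0, 5], capacity_tps=10)
```

The burst of 30 is absorbed by the buffer and drained at 10 per second, so the processing threads never see more than their capacity.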

Supply Chain Principle (Apply it to define an optimum Thread Pool Size)

A thread is treated here as an abstract unit of CPU resource.

The more throughput you want, the more resources you will consume.

You may apply this principle to define the optimum thread-pool size for a system/application.

  • To support a throughput of (t) transactions per second: t = 20 tps

  • Where each transaction takes (d) seconds to complete: d = 5 seconds

  • We need at least d * t threads (the minimum size of the thread pool): d * t = 100 threads


With 100 threads kept busy, 5 batches of 20 (100 transactions) are in flight at any moment, each transaction taking 5 seconds to complete (a batch of 20 arrives and another leaves every second).
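The sizing rule can be checked with a few lines of arithmetic. This sketch assumes a perfectly steady arrival rate and a whole number of seconds per transaction; the function names are illustrative.

```python
import math


def min_pool_size(throughput_tps, seconds_per_transaction):
    """Minimum threads needed to sustain the throughput: d * t, rounded up."""
    return math.ceil(throughput_tps * seconds_per_transaction)


def busy_threads_over_time(throughput_tps, seconds_per_transaction, horizon_seconds):
    """Threads in use at each second when a batch of throughput_tps
    transactions starts every second and each runs for the given duration."""
    busy = []
    for now in range(horizon_seconds):
        batches_in_flight = min(now + 1, seconds_per_transaction)
        busy.append(batches_in_flight * throughput_tps)
    return busy


pool_size = min_pool_size(throughput_tps=20, seconds_per_transaction=5)
ramp = busy_threads_over_time(20, 5, horizon_seconds=8)
```

The pool fills up over the first 5 seconds and then stays at 100 busy threads, which is why 100 is the minimum size for this workload.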

Quantify Resource Consumption

Utilization & Saturation

Resource Utilization

  • Utilization measures how busy a resource is.

Resource Saturation

  • Saturation is often a measure of work that has queued waiting for the resource
  • It can be measured in more than one way, including as an average over time

For some resources that do not queue, saturation may be synthesized from error counts. For example, page faults reveal memory saturation.

Load (the input rate of requests) is an independent, external variable: the users of a system can at any point be overloading or under-using it.

Resource consumption and throughput (the output rate of responses) are functions of load and depend on internal variables (threads for CPU, queues for memory).

How are Load, Resource Consumption and Throughput related?

  • As load increases, throughput increases until maximum resource utilization on the bottleneck device is reached. At this point the maximum possible throughput is reached and saturation occurs.
  • Then, queuing (waiting for saturated resources) starts to occur.
  • Queuing typically manifests itself by degradation in response times.
  • This phenomenon is described by Little’s Law:

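L = X * R

where L is the load (the number of requests in the system), X is the throughput and R is the response time.
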
  • As L increases, X increases (R also increases slightly, because there is always some level of contention at the component level).
  • At some point, X reaches Xmax, the maximum throughput of the system. Beyond this point, as L continues to increase, the response time R increases in proportion, and throughput may then start to decrease, both due to resource contention.
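Little’s Law can be turned into a deliberately simplified model of the behavior described above. The sketch assumes response time stays flat below saturation and that throughput stays capped at Xmax afterwards; it ignores the slight rise in R before saturation and the eventual drop in throughput. The function name and the numbers are illustrative.

```python
def throughput_and_response(load, base_response_seconds, max_throughput):
    """Idealized model built on L = X * R.

    Below saturation: R stays at its base value, so X = L / R.
    At saturation:    X is capped at max_throughput, so R = L / X grows with L.
    """
    throughput = min(load / base_response_seconds, max_throughput)
    response = load / throughput
    return throughput, response


# Hypothetical system: 0.5 s base response time, Xmax = 100 tps.
light = throughput_and_response(10, 0.5, 100)
at_limit = throughput_and_response(50, 0.5, 100)
overloaded = throughput_and_response(200, 0.5, 100)
```

With 10 requests in the system the throughput is 20 tps at 0.5 s each; at 200 requests throughput is stuck at 100 tps and response time has quadrupled to 2 s.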

Performance pattern of a Concurrent Process

How are Throughput and Resource Consumption related? (Example)

Throughput & Latency can have an inverse or direct relationship. Concurrent tasks (Threads) often contend for resources (locking & contention)

Single-Threaded — Higher Throughput = Lower Latency

  • Consistent throughput that does not increase with incoming load or resources
  • Processes serially; good for batch jobs
  • Response time varies linearly with request order.

Multi-Threaded — Higher Throughput = Higher Latency (Most of the time)

  • Throughput may increase linearly with load, then starts to drop after a threshold
  • Processes concurrently; good for interactive modules (web apps)
  • Near-consistent response time that varies with load rather than with request order.

Example from the figure (single-threaded vs multi-threaded, 10 CPUs each): with Threads = 1 and Latency = 0.1 seconds, Throughput = 1 / 0.1 = 10 tx/sec.

Vertical scaling with threads is non-linear, because context switching adds CPU latency.

Producer Consumer Principle

  • The Utilization Law: Ui = T * Di
  • Where Ui is the percentage utilization of device i, T is the application throughput, and Di is the service demand the application places on that device.
  • The maximum throughput of an application, Tmax, is limited by the largest service demand among all the devices in the application.
  • EXAMPLE — A load test reports 200 kb/sec average throughput:

CPUavg = 80% | Dcpu = 0.8 / 200 kb/sec = 0.004 sec/kb

Memoryavg = 30% | Dmemory = 0.3 / 200 kb/sec = 0.0015 sec/kb

Diskavg = 8% | Ddisk = 0.08 / 200 kb/sec = 0.0004 sec/kb

Network I/Oavg = 40% | Dnetwork I/O = 0.4 / 200 kb/sec = 0.002 sec/kb

  • In this case, Dmax corresponds to the CPU. So, the CPU is the bottleneck device.
  • We can use this to predict the maximum throughput of the application by setting the CPU utilization to 100% and dividing by Dcpu. In other words, for this example: Tmax = 1 / Dcpu = 250 kb/sec
  • In order to increase the capacity of this application, it would first be necessary to increase CPU capacity. Increasing memory, network capacity or disk capacity would have little or no effect on performance until after CPU capacity has been increased sufficiently.
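The worked example above can be reproduced in a few lines. The dictionary keys and function names are illustrative; the utilization figures and the 200 kb/sec throughput are the ones from the load test in the text.

```python
def service_demands(utilizations, throughput):
    """Utilization Law rearranged: Di = Ui / T for each device."""
    return {device: u / throughput for device, u in utilizations.items()}


def bottleneck_and_max_throughput(utilizations, throughput):
    """The device with the largest service demand is the bottleneck;
    Tmax is reached when that device is 100% utilized: Tmax = 1 / Dmax."""
    demands = service_demands(utilizations, throughput)
    bottleneck = max(demands, key=demands.get)
    return bottleneck, 1.0 / demands[bottleneck]


# Average utilizations from the load test at 200 kb/sec.
measured = {"cpu": 0.80, "memory": 0.30, "disk": 0.08, "network": 0.40}
demands = service_demands(measured, throughput=200.0)
bottleneck, t_max = bottleneck_and_max_throughput(measured, throughput=200.0)
```

This identifies the CPU as the bottleneck and predicts a maximum of about 250 kb/sec, matching the figures above.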

Work Pools (Queues) & Thread Pools Working Together

Work Pools are queues of work to be performed by a software application or component.

  • If all threads in the thread pool are busy, incoming work can be queued in the work pool
  • Threads from the thread pool can execute it later, once they are freed

Work Pools cover up congestion & smoothen bursts

  • A queue consisting of units of work to be performed
  • CONGESTION: relieved by allowing the current (client) threads to submit work and return
  • BURST: over-capacity transactions can be buffered in the work pool and executed later
  • Allows caching of units of work to reduce system-intensive calls (for example, we can perform a bulk fetch from a database instead of fetching one record at a time)
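As a small illustration of a work pool and a thread pool cooperating, here is a sketch using Python’s standard `concurrent.futures.ThreadPoolExecutor`, which keeps submitted work in an internal queue until one of its worker threads is free. The `handle` function and the pool size of 4 are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(request_id):
    """Stand-in for a unit of work; a real handler would do I/O or computation."""
    return request_id * 2


# 20 units of work, only 4 threads: the client thread submits everything
# and returns immediately, while the excess waits in the executor's queue.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle, request_id) for request_id in range(20)]
    results = [future.result() for future in futures]
```

The submitting thread never blocks on `submit`; it only waits when it asks for a result. Note that this executor’s queue is unbounded, which ties into the discussion of bounded and unbounded pools.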

Queuing Tasks may be risky

  • One task could lock up another that would be able to continue if the queued task were to run.
  • Queuing can smoothen an incoming traffic burst only if the burst is limited in time (depending on the rate and size of the traffic)
  • It fails if work arrives, on average, faster than it can be processed.
  • In general, work pools are held in memory, so it is important to understand the impact of restarting a system, as in-memory elements will be lost.
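The point about average arrival rate is easy to see in a toy simulation. This sketch assumes constant arrival and service rates per second; the function name and rates are illustrative.

```python
def backlog_after(arrival_tps, service_tps, seconds):
    """Queue length at the end of each second for constant rates."""
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog = max(0, backlog + arrival_tps - service_tps)
        history.append(backlog)
    return history


# Arrivals slightly above capacity: the queue grows without bound.
overloaded = backlog_after(arrival_tps=12, service_tps=10, seconds=5)

# Arrivals below capacity: the queue stays empty.
healthy = backlog_after(arrival_tps=8, service_tps=10, seconds=5)
```

Even a small sustained excess (12 in, 10 out) adds 2 items to the queue every second, so an in-memory work pool will eventually run out of space.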

Bounded & Unbounded Pools (Load Shedding)

If not bounded, pools can grow freely, but this can cause the system to exhaust resources.

  • Work Pool / Queue Unbounded — (May overload Memory / Heap & crash) — Each work object in the queue stays holding the space until consumed
  • Thread Pool Unbounded — (May overload CPU / Native Space and Crash) — Each thread asks to be scheduled on CPU and consumes native stack space

If the queue size is bounded, incoming execute requests block when it is full. We can apply different policies to handle this, for example:

  • Reject if there is no space (can have side effects)

  • Remove based on priority (for example, priority may be a function of time, as with timeouts)

Thread pools can have different policies when the work pool is full:

  • Block until space is available; callers may starve (VERY BAD, though sometimes needed)
  • Run in the current thread (very dangerous!)
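The policies above can be sketched against a bounded `queue.Queue` from Python’s standard library. The `submit` helper and its policy names are hypothetical, and the sketch is single-threaded: with concurrent producers, the discard-oldest branch would need extra care because another thread could fill the freed slot first.

```python
import queue


def submit(work_queue, task, policy):
    """Enqueue a zero-argument callable; if the queue is full, apply the policy.
    Returns a short string describing what happened."""
    try:
        work_queue.put_nowait(task)
        return "queued"
    except queue.Full:
        if policy == "reject":
            return "rejected"  # caller must deal with the lost work
        if policy == "discard_oldest":
            work_queue.get_nowait()  # drop the task that has waited longest
            work_queue.put_nowait(task)
            return "queued_after_discard"
        if policy == "caller_runs":
            task()  # the submitting thread does the work itself
            return "ran_in_caller"
        if policy == "block":
            work_queue.put(task)  # waits until a consumer frees a slot
            return "queued"
        raise ValueError(f"unknown policy: {policy}")


# A work pool with room for 2 tasks and no consumer draining it.
work_queue = queue.Queue(maxsize=2)
ran = []
outcomes = [submit(work_queue, lambda i=i: ran.append(i), "reject") for i in range(3)]
```

The third submission is rejected because the queue is full. Running in the caller keeps the work but stalls the submitting thread, and blocking only makes sense when some consumer is guaranteed to drain the queue.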

Work pool & thread pool sizes can often be traded off for each other

Large work pools and small thread pools

  • Minimizes CPU usage, OS resources, and context-switching overhead.
  • Can lead to artificially low throughput, especially if tasks frequently block (e.g. I/O-bound work)

Small work pools generally require larger thread pools

  • Keeps CPUs busier
  • May cause scheduling overhead (context switching) and may reduce throughput, especially if there are few CPUs.

This article is also available on my Medium publication. If you like the article, or have any comments and suggestions, please clap or leave comments on Medium.