Typical server workloads include lots of I/O operations as data is transferred across network connections and written to disk. Analyzing the thread profile of these workloads usually results in a chart that looks something like this (generally with one thread per CPU core):
The greater length of the red I/O segments relative to the green CPU segments is a not-to-scale illustration of the fact that typical disk or network I/O operations are thousands of times slower than typical CPU and RAM operations. In most server operating systems and environments, without asynchronous I/O (AIO), a thread that invokes an I/O operation yields its CPU resource (usually a single CPU core) for the duration of the I/O operation and resumes CPU processing once the operation has completed.
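To make the blocking behavior concrete, here is a minimal sketch in Python (purely illustrative; the hostname and fetch helper are hypothetical, and the engine discussed here is not written in Python). The calling thread sleeps inside each blocking call for the full duration of the I/O:

```python
import socket

# Synchronous (blocking) I/O: the calling thread is suspended inside
# connect/send/recv for the full round trip and can do no other work.
def fetch(host: str) -> int:
    with socket.create_connection((host, 80)) as conn:
        conn.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        total = 0
        while chunk := conn.recv(4096):  # blocks until data arrives
            total += len(chunk)
        return total

print(fetch("example.com"))  # the thread sits idle while bytes are in flight
```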
In a single-threaded environment, this means that CPU cores spend the majority of their time idle, since the typical duration of an I/O operation is so much longer than that of a CPU operation. Multi-threading addresses that problem by allowing a single CPU core to service multiple threads in interleaved fashion. The thread profile of a multi-threaded server workload often looks much like this:
Now each CPU core can pick up processing work from a new thread such as T2 once its current thread has invoked an I/O operation and yielded. Of course, the new T2 thread is likely to invoke I/O operations of its own, so to keep CPU core utilization high, each core must be able to pull from a large pool of threads with pending CPU work.
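A minimal sketch of that pattern in Python (again with hypothetical names; the thread pool stands in for the per-core interleaving described above):

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def fetch(host: str) -> int:
    # Same blocking I/O as before: the worker thread sleeps in recv().
    with socket.create_connection((host, 80)) as conn:
        conn.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        total = 0
        while chunk := conn.recv(4096):
            total += len(chunk)
        return total

hosts = ["example.com", "example.org", "example.net"]

# One worker thread per in-flight request: while one thread is blocked in
# recv(), the OS schedules another on the same core, keeping the core busy,
# at the cost of one OS thread per concurrent work item.
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    for host, size in zip(hosts, pool.map(fetch, hosts)):
        print(host, size)
```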
That is where the scaling problem arises: in today’s healthcare hardware environment, with the proliferation of multi-core server processors, hundreds or even thousands of threads may be required to maintain reasonable levels of CPU utilization. Unfortunately, maintaining large numbers of threads is expensive and diverts computational resources from the user’s workload to the operating system’s internal “book-keeping” tasks for active threads.
AIO has been around in various forms for a number of years in both Windows and Unix environments, but the competing AIO implementations on Unix platforms do not match the uniform, complete implementation that Windows provides. The Windows implementation of AIO is especially well suited to scaling up workloads with lots of I/O operations.
The key to solving the scaling problem is AIO’s ability to sever the 1:1 connection between a thread and a work item. With synchronous I/O, a thread that initiates an I/O operation is suspended for the duration of that operation and awakened upon its completion to resume processing. With AIO, a thread that invokes an I/O operation is immediately released as available for other work. Once the I/O operation completes, the operating system acquires any thread that is available for work and schedules it to resume processing with the results of the I/O operation.
As a specific example, in the illustration above, if arbitrarily numbered thread T1 is processing work item W5 and initiates I/O operation I7, T1 is immediately marked as available for work and may pick up any other available work item, such as W9. Once I/O operation I7 completes, the operating system tasks the first available thread, T3, with resuming work item W5 with the results of I/O operation I7. This approach essentially segments each work item into stages delineated by I/O operations, allowing each stage of a work item to be processed on a different thread.
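The same staging can be sketched with Python’s asyncio, used here only as a stand-in: asyncio multiplexes work items onto a single event-loop thread rather than a small thread pool, but it severs the thread/work-item connection in the same way, and on Windows its default ProactorEventLoop is in fact driven by I/O completion ports. The hostnames are hypothetical:

```python
import asyncio

# A work item written as a coroutine: each await marks the end of a stage.
# At every await the thread is released; when the I/O completes, the event
# loop resumes the next stage of the work item.
async def work_item(host: str) -> int:
    reader, writer = await asyncio.open_connection(host, 80)  # stage 1 ends
    writer.write(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()                                      # stage 2 ends
    body = await reader.read()                                # stage 3 ends
    writer.close()
    await writer.wait_closed()
    return len(body)

async def main() -> None:
    hosts = ["example.com", "example.org", "example.net"]
    # Many in-flight work items, one thread: their stages are interleaved.
    sizes = await asyncio.gather(*(work_item(h) for h in hosts))
    for host, size in zip(hosts, sizes):
        print(host, size)

asyncio.run(main())
```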
Multiply the illustration by the number of CPU cores in a typical server and one begins to see the net effect of that segmentation: a relatively small pool of threads is usually sufficient to handle the processing of a large number of work items while maintaining high CPU utilization. On I/O-heavy server workloads running on machines with lots of CPU cores, the scalability improvement can be dramatic.
At Corepoint Health, we began introducing AIO in Corepoint Integration Engine in the v5.0 release and have continued to expand the range of I/O operations that are performed asynchronously through the 6.0 release. As a result of those technical changes, we have seen a steady and substantial increase in the engine’s scalability: existing servers handle increased workloads, and the computational resources in larger servers are utilized more efficiently for message-processing workloads.
An important aside: because of its potential for increasing scalability, AIO has gotten lots of press in software engineering circles. Developers who are constantly looking for ways to improve their products may wish to test the performance of AIO on their particular workloads and in their particular environments. One pitfall to avoid, though, is the idea that AIO improves single-threaded performance.
In single-threaded workloads, AIO operations generally perform comparably to, or perhaps slightly slower than, their synchronous counterparts. As illustrated above, the scalability improvements with AIO come from reducing the number of threads required to process highly concurrent workloads, so make sure you do your performance testing in a concurrent environment if you want an accurate indicator of AIO’s scalability potential.
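A rough sketch of such a test in Python (hypothetical target and concurrency level; absolute timings will vary widely by workload and environment, and the real gain to look for is the reduction in thread count at equal throughput):

```python
import asyncio
import socket
import time
from concurrent.futures import ThreadPoolExecutor

HOST, N = "example.com", 20  # hypothetical target and concurrency level

def fetch_sync(host: str) -> int:
    with socket.create_connection((host, 80)) as conn:
        conn.sendall(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
        total = 0
        while chunk := conn.recv(4096):
            total += len(chunk)
        return total

async def fetch_async(host: str) -> int:
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()
    body = await reader.read()
    writer.close()
    await writer.wait_closed()
    return len(body)

# Synchronous model: N concurrent requests require N blocked threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=N) as pool:
    list(pool.map(fetch_sync, [HOST] * N))
print(f"sync, {N} threads: {time.perf_counter() - start:.2f}s")

# Asynchronous model: the same N requests share a single thread.
async def run_async() -> None:
    await asyncio.gather(*(fetch_async(HOST) for _ in range(N)))

start = time.perf_counter()
asyncio.run(run_async())
print(f"async, 1 thread:  {time.perf_counter() - start:.2f}s")
```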