A Bit is a Terrible Thing to Waste

[Today’s random Sourcerer profile: https://sourcerer.io/adnanrahic]

Get Off My Lawn!

That’s what you might expect from an assembly language guru when asked about the latest JavaScript library (this week’s release, not last week’s) or an awkwardly named Ruby gem.

They will inevitably make the argument that it all has to translate into assembly language at some point. And they’re right.

Is extreme optimization an art or an obsession? Or both? Is there a healthy balance? Those are the questions I’ll attempt to answer.

Fly Me To the Moon

You’ve probably heard that we flew men to the moon with computers less powerful than today’s solar calculators at the dollar store. The comparison is absolutely accurate, but perhaps a tad misleading. There was no need for a general-purpose computer like the ones that sit on our desks or in our pockets today. The computer on board the Apollo spacecraft was designed to perform the tasks required, not to run multiple operating systems, connect to the internet, and play MP4 videos.

Even adjusting for the quantum leaps in technological advancement, the scope of computing has broadened significantly. Rather than run direct instructions programmed for a specific task, computers are expected to run an operating system with virtual memory management and an execution platform for other programs, written in both compiled and interpreted languages.

To Compile or Not, Is That the Question?

Before we delve into that, we need to define terms. Yes, many developers will know the difference between compiled and interpreted languages, but since this article is for new and seasoned developers alike, I’d like to make sure we’re on the same page before we go further.

Compiled languages, like C and C++, take mid- to low-level source code and convert it to directly executable binary files. While C++ includes a runtime, this paper-thin layer is generally far less intrusive and memory-hungry than interpreted approaches. All other things being equal, compiled programs will generally provide the best performance on the same hardware.

Interpreted languages like Ruby, Python, BASIC, and Java run on virtual machines: pieces of software that provide a predictable and contained execution environment no matter the underlying operating system. These virtual environments are like mini-computers, responding to and filtering output from the program they are running. This layer provides OS abstraction, allows for high-level language design features, and provides exception handling capabilities.

It’s easy to resort to a compiled-versus-interpreted argument, but the quest for performant software design doesn’t stop there. A well-written interpreted program can easily outperform a poorly written assembly program. Developer skill and experience play an undeniably large role in the quality of an application, which directly relates to its performance.

But it goes deeper than that. When optimizing a program, developers will often profile it and try to improve the sections of code that take the longest to execute. Concentrating on the most time-consuming parts of the program makes sense, but it doesn’t constitute a complete picture of the optimization process. Eliminating performance bottlenecks is important, but there is so much more to optimization.

Time is Money, Friend

If we broaden our scope to the overall topic of optimization, we must consider the big picture. Resource consumption, including memory, CPU time, disk space, and network bandwidth, is a vital metric. If your program performs tasks quickly but consumes half of the system’s memory, can it truly be considered performant?

Google Chrome is an excellent case in point. Google engineers spent lots of time adding features under Chrome’s hood that increase security and perceived performance. Because of this, many considered it the “fast browser” and dumped Firefox and Internet Explorer.

As websites became more complex, memory usage steadily climbed. Sandboxing tabs, which provides increased security and reliability, and pre-rendering pages, which increases apparent performance by having a page ready when the user clicks it, cost tremendous amounts of RAM. On a well-equipped or lightly loaded system, this isn’t a problem. But when resources are low, these apparent-performance features create a big problem.

All By Myself

Programs are given a completely private virtual memory space on modern operating systems. This has tremendous security and reliability benefits, and I doubt you’d find many developers who think virtual memory is a bad idea. This memory alchemy built into the x86 architecture is incredibly helpful.

Back when programs shared a segmented memory space (DOS and before), any program could read and write memory owned by another process. The operating system would dole out memory addresses, but there was nothing stopping a program from poking its nose where it didn’t belong. This was a horrible setup for multitasking, no doubt.

But this tight arrangement, paired with the scarcity of memory in those days, forced programmers not only to consider apparent performance but to mind their memory usage. The most efficient programs were careful not to waste a single byte. One could argue that programs were less complex at that time. In terms of features, I’d agree. But the depth of architectural design and skill evident in code crafted in that era is certainly on par with that of today’s software engineers.

DOOM 64

Before you think I’m simply waxing nostalgic, consider the incredible graphical and musical displays in Commodore 64 demos that are still made to this day. Complex video effects easily years ahead of the platform are accomplished by adjusting video timing and performing on-the-fly palette swaps. Hobbyists have even ported DOOM to the Commodore, a title published for DOS a full 11 years after the introduction of the beloved C64.

DOOM needed so much RAM to run that it used a DOS extender, allowing programs to use more than 640 KB of RAM. Yet the C64 has only 64 KB, not all of which is available. Of course, graphics had to be modified and features pared down, but this incredible feat shows what is possible with extreme optimization.

The 80/20 Rule

The Pareto principle, better known as the 80/20 rule, states that eighty percent of accomplishments come from twenty percent of the efforts. The rest of the time, generally eighty percent, is spent on the remaining bits. Laying down the core features of a program is of paramount importance and thus constitutes the bulk of the apparent work. But often, the bulk of the real work is spent on optimization and bug fixing.

In the above-mentioned C64 demos and C64 DOOM, an argument could be made that such obsessive optimizations go far beyond the 80/20 rule. Developers likely spend the overwhelming majority of their time squeezing every last ounce of performance from this antiquated hardware. And for what purpose? There is no monetary gain in optimizing software for ancient systems.

Though I don’t know a C64 demo author personally, I would wager that the motivation is to create a piece of refined art by pushing the machines years ahead of their limits. The results of a guru meditation unfold in front of our eyes when we see or use a piece of well-polished, highly optimized software.

Practical Magic

Rather than explore extreme hardware-specific optimizations that can pigeonhole your system requirements, I’d like to discuss some broadly compatible yet performant strategies for optimizing your programs.

Worker Processes

If your application has to perform many long-running tasks (especially web applications), consider starting a worker process and executing these tasks in the background. You can use a simple database (MariaDB for multi-server applications, or even sqlite3 or text files for simpler applications) to keep track of assigned jobs, which process ID is serving them, and their status so you can periodically report back to the user.
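To make this concrete, here’s a minimal sketch in C (POSIX), where a plain text file stands in for the job database and the hypothetical long_running_task() represents the expensive work:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Placeholder for the expensive work a real worker would do. */
static void long_running_task(void) {
    sleep(5);
}

/* Append a job record; a stand-in for sqlite3 or MariaDB. */
static void record_status(pid_t pid, const char *status) {
    FILE *f = fopen("jobs.txt", "a");
    if (f) {
        fprintf(f, "job pid=%d status=%s\n", (int)pid, status);
        fclose(f);
    }
}

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the worker. Do the slow job, then record completion. */
        long_running_task();
        record_status(getpid(), "done");
        _exit(0);
    } else if (pid > 0) {
        /* Parent: note the job and stay responsive to the user. */
        record_status(pid, "running");
        printf("Worker %d started in the background.\n", (int)pid);
        waitpid(pid, NULL, 0); /* a real server would poll the record instead */
    }
    return 0;
}

In a real web application, the parent wouldn’t block on waitpid(); it would return to serving requests and periodically read the job record to report status back to the user.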

Specific Use of Assembly

For server processes and desktop applications, using assembly language (especially inline assembly in C programs) can offer a tremendous benefit while keeping the bulk of your application in an easier-to-manage language. Here’s a trivial example to get the ball rolling: it adds 3 to the argument “a” and returns the result.

#include <stdio.h>

/* Adds 3 to "a" with GCC inline assembly (AT&T syntax): EAX holds "a",
   EBX holds the constant 3, and addl sums them into EAX, which the
   "=a" constraint writes back to "a". */
int fast_function(int a) {
    __asm__ __volatile__(
        "addl %%ebx, %%eax"
        : "=a"(a)          /* output: a = EAX */
        : "a"(a), "b"(3)   /* inputs: EAX = a, EBX = 3 */
    );
    return a;
}

int main() {
    int a = 3;
    int b;

    b = fast_function(a);
    printf("The result is: %d\n", b);
    return 0;
}

NOTE: This example was written with GNU C, which uses AT&T assembler syntax. Your compiler may vary, so please check its documentation on inline assembly.

If using inline assembly isn’t possible in your chosen language, you can likely include an external library written in assembly. Barring that, you can create a separate binary executable to perform that highly specific, performance-sensitive task.

Let the User Decide

Not all computers, workflows, or scenarios will be best served by a specific optimization strategy. If practical, include a switch or configuration option that allows the user to choose the optimization strategy that works best for them.

For example, a user may be willing to use a lot more RAM for increased performance. Or, on a memory constrained system, preservation of resources may be the paramount concern.
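As a hypothetical sketch (the --low-memory flag and both strategies are illustrative, not taken from any particular program), a single command-line switch is often all it takes:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    /* Default to the speed-oriented strategy; let the user opt out. */
    int low_memory = 0;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--low-memory") == 0)
            low_memory = 1;
    }

    if (low_memory) {
        /* Conserve RAM: process data in small chunks. */
        printf("Low-memory mode: streaming in small chunks.\n");
    } else {
        /* Trade RAM for speed: cache and pre-compute aggressively. */
        printf("Default mode: caching aggressively for speed.\n");
    }
    return 0;
}

The same idea scales up to a configuration file entry or an environment variable; the point is that the user, not the developer, makes the final trade-off.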

Wall Time Will Tell

To the larger philosophical question in software development, we are left to ask: is this obsessive optimization worth it? Are performance-obsessed developers merely scavenging for scraps of time that are no longer cost-effective to save? Or is the art of pushing a machine, and its developer, beyond their stated limits an unquantifiable stroke of creativity?

This undoubtedly must be answered on a case-by-case, and ultimately developer-by-developer, basis. Regardless, the practical benefits of at least some optimization insanity are undeniable.


Follow me and other software engineers on Sourcerer Blog