Mature Optimization - Part 1

Optimization has become a tricky subject in the world of software development. The term "premature optimization" is thrown around a lot. Sometimes you hear stories about developers who spend all their time worrying about performance problems they don't actually have. Other times you hear about sluggish, resource-hungry systems that have to be heavily reworked or completely scrapped because performance wasn't considered at all. So how do you keep your projects from such fates? Over the next few posts I'll discuss how to go about optimization in a sane, safe, and effective way.

A delicate topic

{{ start vocabulary rant }}

Frankly, I don't find the term "premature optimization" very helpful. Undertones of male inadequacy aside, it's a truism. Of course you don't want to optimize prematurely; "too soon" is always the wrong time to be doing something. Furthermore, it doesn't tell you anything about the right time to optimize. It turns out there's a lot that goes into answering that question--much more than a pithy quip could embody. It comes down to having a good reason to take on the burdens that come with optimization. It's less about getting the timing right than it is about justifying the changes. I think the term "unjustified optimization" captures the spirit a little better. But I digress...

{{ end vocabulary rant }}

Optimize at your own risk

So why not just optimize the hell out of everything from the start? Who doesn't want software that runs faster, uses fewer resources, and scales better? All that sounds pretty good. Well, the problem is that optimizations usually come at a price. They tend to make your code more complex, buggier, and harder to modify successfully.

A good example of this would be adding a cache. Caches can have a huge impact on performance, but they can also have a huge impact on the complexity and number of bugs you have to deal with. Because they essentially involve creating a secondary data store, if the cache is not carefully kept in sync with the primary data store, major bugs happen and everyone gets sad. Adding a cache without making sure you really need one is very don't-do-that.
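To make the sync problem concrete, here's a minimal sketch in Python. Everything in it is made up for illustration: `primary_store` stands in for the real database, and `CachedStore` is a hypothetical wrapper, not any particular library's API. The thing to notice is that every write path has to remember to invalidate the cache. Forget it once and you're serving stale data.

```python
class CachedStore:
    """Hypothetical read-through cache in front of a primary data store."""

    def __init__(self, primary_store):
        self.primary = primary_store  # stand-in for the real database
        self.cache = {}               # the secondary data store

    def get(self, key):
        # Serve from the cache when possible; otherwise read through.
        if key not in self.cache:
            self.cache[key] = self.primary[key]
        return self.cache[key]

    def update(self, key, value):
        # Correct write path: change the primary store, then invalidate.
        self.primary[key] = value
        self.cache.pop(key, None)

    def update_forgetting_cache(self, key, value):
        # Buggy write path: the primary store changes, the cache does not.
        self.primary[key] = value


primary_store = {"user:1": "Alice"}
store = CachedStore(primary_store)

store.get("user:1")                      # warms the cache
store.update("user:1", "Alicia")
fresh = store.get("user:1")              # re-read after invalidation

store.update_forgetting_cache("user:1", "Ally")
stale = store.get("user:1")              # cache still holds the old value
```

After the buggy write, `stale` no longer matches what's in `primary_store`. That's the whole class of bug in a dozen lines, and a real system has far more write paths to keep track of.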

Another common optimization-related issue you see today is over-allocation of cloud computing resources. This one is more about over-compensating for perceived performance issues. It's the software industry's version of buying a giant, loud truck with three-foot wheels. Amazon's EC2 service offers computers with a very wide range of abilities. You can spin up anything from tiny little machines with about as much computational power as your phone to machines at near-supercomputer levels. I once spun up a box with 32 cores and 128 GB of RAM. It was epic. Of course, the beefier boxes come with beefier prices. Amazon sure doesn't make it simple to figure out exactly how much your resources are going to cost you, but you can be sure that an m4.10xlarge is going to be expensive. So it becomes pretty important to understand exactly how resource intensive your software is going to be. Sure, it will work on a screaming box, but do you really need all that power?

Don't optimize at your own risk

Most developers are well aware of the trade-offs involved with optimizing code and, because of this, there tends to be some hesitation about doing it. Some people seem so concerned with committing the premature optimization sin that they are afraid to even consider optimizations of any kind. Too often this ends up meaning that you wait until your system is in production and buckling under the weight of its users before you decide to optimize something. Unfortunately, at this point you're under a lot of pressure to get a fix out before users start jumping ship. Since not all optimizations can be done quickly or easily, this usually forces developers to cut corners, avoid addressing the real issue, and release something that is full of bugs or doesn't go far enough to fix the problem.

I once worked on a system with a SQL Server back end. A lot of the data access code was embodied in stored procedures (barf) that made heavy use of SQL Server's XML features to turn row data into XML documents (double barf). Things worked fine until the user base started ramping up and we had more people working on the system at the same time. Certain actions that queried a lot of data, like searches, tended to spike the CPU to 100%. We were having trouble understanding why database access was pegging the CPU until we discovered that SQL Server's XML operations tend to require a lot of CPU power to support all the parsing. Unfortunately for us, the users discovered this issue before we did and they were not happy about it. The system was under so much pressure at peak times that some users were having trouble logging in! Oh boy. Several very pressure-filled days later we had reworked and retested the data access logic in a few key areas of the system so they no longer relied on the XML parsing. This dramatically improved performance, but we had already taken a major black eye from the incident. User confidence was shaken. Surely there must be a better way.

There is a better way

There's a process behind keeping track of your system's performance. The idea is to give yourself enough information to make good decisions about potential performance issues coming down the line and how best to address them.

1. Goals - Set some performance goals early in the life of the project and continue to revisit them and adjust them as needed. Goals are expressed in terms of metrics like total number of users, maximum concurrent users, the resource-intensive activities you need to support, the amount of data being stored, and how data will be accessed. This is the foundation of your system's performance. Without a reasonable set of goals in place, you're just making guesses at how your system should perform.

2. Testing - You'll need to test your system to see if it can meet the current set of performance goals in place. This type of testing can take many forms, but common ones are load testing, stress testing, raw speed testing, and scalability testing. You'll probably spend a lot of time on this step. In fact, if you're not careful you can spend way too much time here. You have to make careful choices about what to test, how you carry out those tests, and how you record the results. This must be an efficient process so you can gather all the data you need as quickly as possible.

3. Monitoring - It's critical that you monitor usage patterns and make sure you know how your users actually interact with your software. This is key because it can help you predict when your users are going to start pushing the boundaries of your system. There are a lot of great tools out there that can help you monitor what's happening with your software and the machines on which it's running. What you have available to you will depend partially on the technology stack that you are using.

4. Optimizations - Finally, we get to actual optimizations! Notice how many steps are required to get here. It fits together like this:

When testing and monitoring have determined that performance is or will soon be out of line with your goals, it's time to optimize.

Of course, the optimization that you will be engaging in will depend totally on the particulars of the circumstances. However, the basic formula is still the same. If you've set things up correctly, you have a set of tests that show which goals your system is not currently meeting. Now you put an optimization in place, re-run your tests, and see where that leaves you. Rinse and repeat until your system meets its current set of performance goals.
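Here's a rough sketch of that loop in Python, just to show the shape of it. The goal names and numbers in `GOALS` are placeholders I made up, and `fake_search` and `fake_login` are stand-ins for whatever would really exercise your system. A real setup would likely lean on a proper load testing tool rather than a hand-rolled timer.

```python
import time

# Hypothetical goals; the numbers are placeholders, not recommendations.
GOALS = {
    "search_seconds": 0.5,  # a search should finish within this time
    "login_seconds": 0.2,   # a login should finish within this time
}


def measure_seconds(action, repeats=5):
    """Return the slowest wall-clock time over several runs of `action`."""
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        worst = max(worst, time.perf_counter() - start)
    return worst


def unmet_goals(measurements, goals):
    """Return {goal name: (measured, limit)} for every goal that is missed."""
    return {
        name: (measurements[name], limit)
        for name, limit in goals.items()
        if measurements[name] > limit
    }


# Stand-in actions; in a real project these would exercise the system.
def fake_search():
    time.sleep(0.01)


def fake_login():
    time.sleep(0.01)


measurements = {
    "search_seconds": measure_seconds(fake_search),
    "login_seconds": measure_seconds(fake_login),
}
failing = unmet_goals(measurements, GOALS)
print(failing or "all goals met")
```

If `failing` comes back non-empty, that's your justification to optimize. Make a change, run the same measurements again, and stop when the dictionary comes back empty.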

The idea is to have a clear picture, backed by real data, of what your system is capable of. That will help you see problems coming and give you time to address them before things get out of hand. Notice that I said "backed by real data". It's very important that you collect enough data about your system to make informed decisions. Guessing will not do. Hard decisions require hard facts.
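As a small illustration of seeing a problem coming, here's a sketch that takes a series of monitored values and estimates how long you have before one of your goals is exceeded. The function name, the sample numbers, and the straight-line growth assumption are all mine and purely illustrative. Real usage rarely grows this neatly, so treat the answer as a rough early warning rather than a forecast.

```python
import math


def periods_until_limit(samples, limit):
    """Estimate how many more periods remain until `limit` is reached,
    assuming the average growth between samples continues linearly.

    Returns 0 if the latest sample is already at or over the limit,
    and None if the samples show no growth.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate growth")
    current = samples[-1]
    if current >= limit:
        return 0
    growth = (samples[-1] - samples[0]) / (len(samples) - 1)
    if growth <= 0:
        return None
    return math.ceil((limit - current) / growth)


# Hypothetical monthly peaks of concurrent users, against a goal of 500.
monthly_peaks = [100, 150, 200, 250]
months_left = periods_until_limit(monthly_peaks, limit=500)
```

With those made-up numbers, usage grows by 50 users a month and there are 250 to go, so `months_left` comes out to 5. That's five months to plan an optimization calmly instead of a few days to panic through one.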

Next time

So that's basically the process. It's not magic, but it is a fair amount of work and something that must be kept up throughout the life of the system. You may not ever end up putting any optimizations in place, but at least you'll know that you don't need to, rather than wondering and hoping that everything works out.

In my next post I'll dig a bit deeper into each of these activities (goals, testing, monitoring, optimizing) and give some pointers on how to go about putting them into practice.

#optimization #testing #maintainability #performance
