Wednesday, November 28, 2007

The Single Point of Failure

To many engineers, and even to non-technical people, the concept of a single point of failure seems simple and easy to comprehend. However, when it comes to software design and architecture, this concept often seems to be pushed down the list of priorities.

The following is just an observation. By no means do I want to start a war of words, platforms and concepts between the Windows, Mac and Linux camps. This reflection is meant to be an intellectual exercise and a peaceful, useful discussion. I hope that it will lead to smart conclusions and to more stable and reliable software products in the future.

So, when it comes to a single point of failure, these examples of software design come to mind: the Windows Registry vs. /etc configuration files on Linux/Unix systems, and a single Outlook PST file vs. the mbox and maildir formats for desktop email applications.

Windows Registry vs. /etc/ files

Windows
Windows and the great majority of applications written for this OS keep their configuration settings in a single large, database-like store named the Windows Registry. While there may be plenty of engineering reasons why the authors of this OS decided to keep ''all eggs in one basket'', avoiding a single point of failure was definitely not one of them. If a Windows machine is not powered off properly, or if the registry gets corrupted, what happens to the system? The obvious answer: there is a very good chance that the system will not even boot. I am sure there are a lot of frustrated users and sysadmins who went through this stage of 'registry recovery' and can attest to this statement. However, my point is not to argue that Windows is a bad OS and that everyone should move to Linux or Mac. The question is rather whether we keep building mission-critical software with such a single point of failure, or whether we address it wherever we can. After all, an operating system, regardless of whether it is installed on a NASA spaceship or on your laptop, is a mission-critical piece of software: everything else depends on it, and without it nothing else matters. The only difference between these two extremes is the cost of failure: a loss of lives, reputation, jobs and millions of dollars, or just your most important documents and the photos of your loved ones.

To Microsoft's credit, the registry and overall Windows reliability have become much better in recent years, but registry issues continue to happen in Vista. Besides, if your server runs the same Windows and keeps all its critical configuration parameters in the same single file, how can you truly rely on such a server for your mission-critical applications?

So, what is an alternative approach here?

Unix/Linux/Mac
It's easy and it's been with us since the IT world began. Instead of keeping everything in a single database or a single file, Unix, Linux and Mac OS X systems, and the software written for these platforms, tend to keep their configuration settings in a set of individual files. For example, the Apache web server on Linux systems keeps its configuration settings in one or a few files (/etc/httpd/conf/httpd.conf or /etc/... apache.conf), while the network configuration parameters for the system are kept in a set of different files (/etc/sysconfig/network, /etc/sysconfig/network-scripts/ifcfg-eth0 and others). On Mac OS X, these settings are stored in .plist files (property list configuration files), such as com.apple.mail.plist or com.apple.addressbook.plist. Moreover, some software for these platforms (including the Apache web server) allows the sysadmin to decide not only where to keep these configuration files, but also whether to keep everything in a single file or to split it into a small set of files (e.g. ssl.conf and the conf.d/ configuration folder).

Now, what happens if this Apache configuration file gets corrupted or accidentally deleted? Apache won't start, while the rest of your system services and software will keep running as if nothing had happened. And of course, your system will still boot up and will be partially working.
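
The isolation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the INI-style syntax, the file names (network.conf, web.conf) and the helper load_service_configs are hypothetical, and this is not how Apache or any real service parses its configuration. The point is simply that when each service has its own file, a corrupted file takes down only that one service.

```python
import configparser
import tempfile
from pathlib import Path


def load_service_configs(config_dir):
    """Parse every *.conf file on its own (hypothetical helper).

    Returns two dicts: services whose file parsed, and services whose
    file was unreadable. A bad file never prevents the others from loading.
    """
    loaded, failed = {}, {}
    for path in sorted(Path(config_dir).glob("*.conf")):
        parser = configparser.ConfigParser()
        try:
            parser.read_string(path.read_text(encoding="utf-8"))
        except (configparser.Error, UnicodeDecodeError) as exc:
            # Only this one service is affected by its broken file.
            failed[path.stem] = str(exc)
            continue
        loaded[path.stem] = {name: dict(parser[name]) for name in parser.sections()}
    return loaded, failed


def demo():
    """Write one valid and one corrupted config file, then load both."""
    with tempfile.TemporaryDirectory() as tmp:
        config_dir = Path(tmp)
        (config_dir / "network.conf").write_text(
            "[eth0]\naddress = 192.0.2.10\n", encoding="utf-8"
        )
        # Simulated corruption: no section header, so parsing fails.
        (config_dir / "web.conf").write_text(
            "this is not a valid config line\n", encoding="utf-8"
        )
        return load_service_configs(config_dir)


if __name__ == "__main__":
    ok, broken = demo()
    print("loaded:", sorted(ok))
    print("failed:", sorted(broken))
```

With a single shared store, the same corruption would sit in the one file that every service has to read.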

It is understandable that there is no such thing as an ideal tool for all problems and all possible application and usage contexts. It's also clear that multiple configuration files have their own disadvantages. Even with multiple files, a system still has many other single points of failure to worry about, such as the power supply, the motherboard, hard disks, the boot loader, the kernel, file system partition tables, file allocation tables (FAT) or file tree structures, such as the i-node structures on Linux and Unix file systems. Besides, backing up and restoring a lot of configuration files in different folders can be more time-consuming than backing up and restoring a single configuration file that keeps all the settings for everything.

Reliability aside, Windows machines also become significantly slower over time. One of the causes of this performance deterioration is the ever-increasing size of the Windows registry. When Windows updates are installed or when new software is added, the registry keeps growing and growing, accumulating more and more configuration settings. Removing unneeded and unused software has almost no impact on its size, since most of the configuration settings are left behind. The fact that the Windows registry also keeps a record of all (or most) DLL libraries ("DLL hell") does not help at all. The larger the registry becomes, the more memory (RAM), hard disk I/O and CPU power is used when Windows or other applications access it, making the whole system perform a lot slower. The system boot-up process also suffers, taking more time than before.

Outlook PST vs mbox vs maildir

Outlook
Now, this comparison is even more interesting. It deals not only with the reliability of your email application, but also with its performance. It seems that Microsoft really sticks with its 'single file' approach for many of its software designs and applications. But why?

Maybe one of the reasons the single file approach worked for Microsoft, and for the many other software engineers and vendors who chose this design concept, is that the amount of information stored used to be insignificant, and it was easy to transfer and back up a single mailbox file and to program around it. Ten years ago the average mailbox was smaller than 200 MB for most business users, and therefore the single file concept worked 'ok'.

Today, however, email has become a more important and more heavily used means of communication than before. The number of email users and the average message size have grown significantly, while the daily volume of email messages and the average attachment size have skyrocketed. Ten years ago the predominant email message format was plain text; today it is either rich text or HTML, which adds extra 'weight' to every message because of all the HTML and rich text tags. Ten years ago, digital photography and graphics-heavy documents were not that widespread. Today, sending 5MB pictures has become standard. And this is just the start. If you peek into the future, you will see that soon the average photo will be 20MB or maybe even 100MB (just watch this demo where Steve Jobs plays with a 4GB Library of Congress photo).

Regardless of the reason why Microsoft went ahead with a single database-like PST file for its killer email application, Outlook, this design will no longer work unless some dramatic changes are made to its architecture.

Just as for an OS, keeping 'mission critical data' in a single file is a very risky and erroneous design decision for an email application. Besides lowering reliability and performance, it also lacks the scalability of its alternatives.

Just as with the Windows registry, when your Outlook mailbox gets older, and usually significantly larger, Outlook is more likely to crash. It takes forever to start (because it does some sort of re-indexing) and it occasionally hangs when you run searches or move messages between folders. What aggravates this problem even further is that Outlook develops a monstrous appetite for your CPU, hard disk and RAM, making it nearly impossible to run another resource-intensive application on the same machine.

Usually, the majority of users start to experience problems with Outlook when its PST file grows to over 2-3GB. As a temporary solution you can archive your mailbox. This decreases the amount of information stored in your main PST file, since the archived content is moved to a different 'archive' PST file. Eventually, though, you will run into the same problem again.

mbox and maildir
The main alternatives to the single file approach that Outlook uses are the mbox and maildir formats. The mbox format (which Thunderbird uses) stores the contents of all messages from a single folder, such as Inbox, in a single mbox file, and stores the index for that folder in a different file (Inbox.msf). This architecture improves reliability, performance and scalability for modern-size mailboxes. Instead of having a single file, an email application such as Thunderbird operates on a set of smaller files. If one file gets corrupted or deleted, the rest of the mailbox does not suffer. Moreover, if an index file (e.g. Inbox.msf) is deleted, Thunderbird will rebuild it the next time it starts. This format performs better and can handle a significantly larger mailbox, because the message content is distributed across per-folder files that are each smaller than a single PST file.
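
A minimal sketch of the one-file-per-folder idea, using the mbox class from Python's standard mailbox module. This is not Thunderbird's code, and it does not model the .msf index files. The folder names, the example addresses and the helper names make_message and demo_mbox_folders are made up for illustration. It writes two folders as two separate mbox files, deletes one of them to simulate a loss, and shows that the other folder still reads fine.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(subject):
    """Build a small test message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = subject
    msg.set_content("Hello")
    return msg


def demo_mbox_folders():
    """Store each folder in its own mbox file, lose one, read the other."""
    with tempfile.TemporaryDirectory() as root:
        folders = {"Inbox": ["first", "second"], "Sent": ["third"]}
        for folder, subjects in folders.items():
            box = mailbox.mbox(os.path.join(root, folder))
            try:
                for subject in subjects:
                    box.add(make_message(subject))
                box.flush()
            finally:
                box.close()

        # Simulate the loss of one folder file.
        os.remove(os.path.join(root, "Sent"))

        # The Inbox file is independent of the lost one and still opens.
        inbox = mailbox.mbox(os.path.join(root, "Inbox"), create=False)
        try:
            return sorted(message["Subject"] for message in inbox)
        finally:
            inbox.close()


if __name__ == "__main__":
    print(demo_mbox_folders())
```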

Unlike the Outlook PST file and mbox folder files, the maildir format stores each email message in a separate file. This design concept brings performance, reliability and scalability to a new and higher level than the single file or mbox approaches*. Additionally, since email messages are stored as individual files, other software, such as a search engine (or a search engine crawler, to be exact), can work with these messages without requiring a special filter or plugin built for an application-specific format.
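
The same kind of sketch for maildir, again using Python's standard mailbox module and assuming nothing about any particular mail client. The helper names (make_message, find_message_files, demo_maildir) and the example addresses are hypothetical. It shows two things from the paragraph above: a plain text search can read the message files directly, without going through the mail library, and losing one message file leaves the other messages untouched.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage
from pathlib import Path


def make_message(subject):
    """Build a small test message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = subject
    msg.set_content("Hello")
    return msg


def find_message_files(maildir_path, needle):
    """Plain text search over the message files, like a simple crawler."""
    hits = []
    for subdir in ("new", "cur"):
        for path in sorted(Path(maildir_path, subdir).iterdir()):
            if needle in path.read_text(encoding="utf-8", errors="replace"):
                hits.append(path)
    return hits


def demo_maildir():
    """Create a maildir, search it as plain files, then lose one message."""
    with tempfile.TemporaryDirectory() as tmp:
        maildir_path = os.path.join(tmp, "Maildir")
        box = mailbox.Maildir(maildir_path, create=True)
        for subject in ("one", "two", "three"):
            box.add(make_message(subject))
        box.close()

        # Each message is an ordinary file under new/ (or cur/).
        file_count = len(list(Path(maildir_path, "new").iterdir()))

        # Another tool can locate a message without any mail-specific plugin.
        hits = find_message_files(maildir_path, "Subject: two")

        # Simulate the loss of that single message file.
        for path in hits:
            path.unlink()

        reopened = mailbox.Maildir(maildir_path, create=False)
        remaining = sorted(message["Subject"] for message in reopened)
        reopened.close()
        return file_count, len(hits), remaining


if __name__ == "__main__":
    print(demo_maildir())
```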

Outlook, Thunderbird, Apple Mail
This whole post, or reflection, is not based only on a purely theoretical analysis of the single point of failure design concept and how it performs in comparison to its alternatives. The author of this post has used Outlook (97, 2000, 2003), Thunderbird (versions 1.5.x and 2.x on Windows, Linux and Mac) and Apple Mail (2.0 and 3.0) extensively in business settings for 9 years.

The following comparison of email programs depends not only on the design concept used for mailbox storage but also on application specifics. Still, the performance aspect depends heavily on the mail storage architecture.

My real-life mailbox migrated from Outlook to Thunderbird (TB) and later to Apple Mail. When it was last used in Outlook, it was 3GB in size. I successfully used TB while the mailbox grew to 4GB. Apple Mail has seen it grow to 5GB. Apple Mail beats Outlook and even TB in overall performance (especially in search). TB had a few crashes (mostly on Windows). Apple Mail had no crashes until version 3.0 came out. That version did lead to a couple of crashes, but mainly due to the Growl and WideScreen plugins. The shortest start-up time award also goes to Apple Mail.

If it were not for Outlook's strong business focus, its successful and tight integration of contacts, calendar, notes and mail, and several other features and usability touches, TB or Apple Mail would have been a good, complete replacement a long time ago. TB stands the best chance of becoming a 1-for-1 alternative to Outlook thanks to its amazing cross-platform capability and some new features. In order for TB to succeed in the business world, it needs to expand its mail-only focus and provide tight integration with contacts, notes, calendar functionality and MS Exchange.

------------------------------
* Of course, this statement requires more extensive benchmark tests. It is also possible that on some old filesystems (such as FAT16) or outdated PCs, desktop email applications that use maildir format may not have the same performance gain as on modern and more powerful systems.

4 comments:

Anonymous said...

Great article with lots of useful information, easy to read and understand. A huge "Thank you" to its author.

Brendan Scott said...

As I understand it, one of the design goals for the Registry (in Win95ff) was to inhibit the installation of software other than through an installer (thereby shifting control of s/ware installation to Microsoft). The registry system therefore permits the creation of hidden keys so that installers can check whether the software has already been installed (eg time limited sample s/ware). Unless you have a single file (=point of failure) this "benefit" of the registry would be defeated.

Alex said...

Brendan,

Thank you for your input. The more ideas we gather on why Microsoft went with this single file approach not only in registry, but also in Outlook and other software, the better we understand what design goals the vendor was trying to accomplish and compare it against stability/reliability cost/benefit.

Alex.

Max said...

A very interesting comparison of Maildir vs other formats is given here (using objective benchmarks)

http://www.courier-mta.org/mbox-vs-maildir/