Are All Database Engines Just Different Flavors of the Same Thing?
Everywhere you look nowadays, you find a database: mailing lists, customer profiles, wedding planning, toll-free call routing, inventory tracking… Some are small and simple. Some are huge and complicated. Each database runs in a database management system (DBMS), sometimes called a database engine.
People who have worked with computers long enough remember some of the earliest desktop engines. Most of the early desktop engines were shoved aside long ago by Microsoft Access®, although several are still available. At various times I’ve worked with desktop database engines such as Paradox®, dBase® and Alpha Five®.
For larger databases, you are probably aware of MySQL®, IBM DB2®, Oracle®, and Microsoft SQL Server®.
Many database engines are also available for environments with special priorities such as very high speed or non-relational organization of the data.
If you thought databases are just for looking something up, think again. The systems I work with are used to make money instead of count it, and there is always some kind of database at the foundation. Clever database systems can dispatch technicians with efficient schedules and routes, tell shop floor people which items from the manufacturing line are top grade and which should go in a discounted lower-grade crate, and decide when to fire up extra boilers to compensate for a cold front that is beginning to blow in.
In my consulting practice, I often meet people who have no idea how many different engines are available, let alone how differently the engines operate “under the hood” and what their strengths or weaknesses are. For many years, I have been helping them sort through this so they can choose a database engine intelligently before they build a database.
After this short primer, you may still need help to choose the right engine for your purposes, but at least you will be able to ask smart questions and get the answers you need.
Low-Priced Desktop Databases
Your dictionary might suggest a spreadsheet is a type of database. In the computing world, a spreadsheet does not even come close. Only one person at a time can open and make changes to a spreadsheet.
If the data needs to be used by a group of people and it is in a spreadsheet, eventually somebody will get frustrated about not being able to make changes when somebody else is making changes. They will make a private copy of the file, put their changes into it, and replace the group spreadsheet with their version whenever they have a chance. That wipes out any changes made by another person while the frustrated person was working in a private version. Valuable information is lost this way.
If more than one person needs to work with data, it should not be in a spreadsheet. It should be in a database.
Even the cheapest, least capable database engine will allow everyone to look at the data at the same time. When someone makes a change, only that small portion of the data (the equivalent of a row on a spreadsheet) is locked during the change to make sure another person does not make different changes to it at the same time. As soon as the change is finished, that “row” is unlocked and made available for other changes by anyone in the group. People who are only looking at the data, or who are changing other “rows” of data, are undisturbed.
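To make that concrete, here is a minimal sketch of a change that touches only one "row," using Python's built-in sqlite3 module. The table and data are invented for illustration, and SQLite itself locks more coarsely than a multi-user client-server engine, but the pattern of wrapping a single-row change in a transaction is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO customers VALUES (1, '555-0100'), (2, '555-0199')")

# The engine only needs to protect the row being changed; everyone else
# can keep reading, and other rows stay available for other updates.
with conn:  # opens a transaction, commits on success, rolls back on error
    conn.execute("UPDATE customers SET phone = '555-0123' WHERE id = 1")

row = conn.execute("SELECT phone FROM customers WHERE id = 1").fetchone()
print(row[0])  # 555-0123
```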
If you want it to look like a spreadsheet, you can connect a spreadsheet to a database through special interfaces, letting everyone collaborate on the same data while keeping the familiar, simple look and feel of the spreadsheet.
A database can hold more than data. It can also hold scripts (programs) to carry out actions, including operations that affect some of its data or tell hardware to do something such as send alert messages or close a physical valve. Scripts can be set up to run automatically when specific conditions occur. Using the data this way is generally more efficient than using an external software program for the same actions. For one of my clients, a complex system built this way handled their factory with ease where external software written by a programmer could not keep up with the shop floor.
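One common form of in-database script is a trigger. The sketch below, again using Python's sqlite3 module with invented table names and thresholds, shows a trigger that reacts on its own when a condition occurs, with no external program watching:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE boiler_readings (reading_id INTEGER PRIMARY KEY, temp_c REAL)")
conn.execute("CREATE TABLE alerts (alert_id INTEGER PRIMARY KEY, message TEXT)")

# A trigger is a small script stored inside the database itself.
# Whenever a reading above 90 degrees arrives, the database acts
# automatically -- no external software has to notice the condition.
conn.execute("""
    CREATE TRIGGER high_temp_alert
    AFTER INSERT ON boiler_readings
    WHEN NEW.temp_c > 90
    BEGIN
        INSERT INTO alerts (message)
        VALUES ('High temperature: ' || NEW.temp_c);
    END
""")

conn.execute("INSERT INTO boiler_readings (temp_c) VALUES (85.0)")  # no alert
conn.execute("INSERT INTO boiler_readings (temp_c) VALUES (95.5)")  # fires the trigger

alerts = conn.execute("SELECT message FROM alerts").fetchall()
print(alerts)
```

In a real production system, the trigger body might notify hardware or queue an operator alert instead of writing to a table.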
You Get What You Pay For
The adage “you get what you pay for” is just as true in computing as it is anywhere else. Desktop database engines can satisfy modest goals. You might be able to start out with one of these.
As your database grows larger or your needs become more sophisticated, expect to need to migrate to a more substantial engine. As an example, let’s say you run a small business with 5 employees. You have a few hundred customers. Their orders, your order fulfillment, and administrative procedures are simple. A database in Microsoft Access can handle that with no problem.
Now let’s say your business grows. By the time you hire employee number 50, you will probably either be migrating to another database engine or already be using another engine that can handle the heavier load better.
Most desktop databases in the low end of the price range are designed to work well with a few simultaneous users. As the number of users and the amount of data grow, these databases struggle. You can buy yourself a little time by putting your database on a more powerful computer, but ultimately a serious database needs a serious engine. Running a major database in a desktop engine is just as silly as entering a Formula One race with an ordinary street car.
MySQL as a Next Step
Popular relational DBMS behind many websites
MySQL has become a popular next step up the ladder among relational databases. It is often available free with website hosting services. It runs on several flavors of UNIX as well as on Windows® and OpenVMS®. Many free or cheap web applications (and some more costly ones) run on it. It can handle more data and more users with better, more tunable security than typical desktop engines.
You can do a lot with this, handling a substantial amount of data and many more simultaneous users than a desktop database can handle. However, the adage about what you pay for will strike again when you reach the next threshold.
Exactly which threshold you reach is not the same for every database. You may need better security or disaster redundancy because your data is sensitive. You may need faster performance. You may need to run on an operating system that is not UNIX or Windows® or OpenVMS®. You may be dissatisfied with the tools available to help you work with your database. Any of these is a good reason not to choose MySQL.
As an example of the kinds of dilemmas MySQL can involve, one of my clients wanted to use it for a new production system. The latest version can manage data in a way that protects against corruption if entry of some new data fails in mid-transaction, just like the heftiest database engines do. Unfortunately, we found that we had to choose between performance speed and using that protective feature.
To get the protective feature and still go fast enough, we could use a different database engine. For most people, the best known alternatives are DB2®, SQL Server® and Oracle®. They are designed to be able to cope with large, sophisticated, enterprise class databases and loads.
Microsoft SQL Server®
Microsoft’s relational DBMS for heftier needs
SQL Server® is often mistakenly called SQL. SQL stands for Structured Query Language; it is a language for interacting with databases, not a database engine itself. Each SQL-compatible database engine has its own dialect of the SQL language.
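To see what a dialect difference looks like, consider the simple task of fetching the first five rows of a table. The statement below runs as-is in SQLite (shown here through Python's sqlite3 module) and in MySQL; the comments show equivalents in other dialects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [(f"item-{n}",) for n in range(10)])

# SQLite and MySQL use LIMIT:
rows = conn.execute("SELECT name FROM products LIMIT 5").fetchall()

# The same request in two other dialects (not runnable here):
#   T-SQL (SQL Server):  SELECT TOP 5 name FROM products;
#   older Oracle SQL:    SELECT name FROM products WHERE ROWNUM <= 5;
print(len(rows))  # 5
```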
SQL Server® is a database engine, and its dialect of SQL is called Transact-SQL® or T-SQL. The database administration toolset for SQL Server® is heavily graphical (point, click, fill in the blanks), as distinct from some other engines that are managed primarily through scripts. If you hate writing little programs to do everything with the database, SQL Server® is appealing.
As a Microsoft® product, SQL Server® is only at home on Windows®. Each version is released with multiple editions ranging from small (oriented toward the desktop and developer) to enterprise (very large and complex with heavy usage). As a general rule, the bigger the edition, the greater the capabilities and the higher the cost.
This is often the next step up from Microsoft Access because it runs on the same operating system, and because it is easy to use Access as just an interface between the user and the new database. By using Access to present the user with screens for SQL Server® data, a business can migrate upward without needing to retrain database users. The users still get the same look and feel they had in the small Access database.
This is the right solution for some databases, but not for others. Some of my clients run large databases on operating systems other than Windows or UNIX. Some want to move their database to another operating system in the future without needing to change to a new database engine at the same time. Some find that T-SQL does not offer all the scripting capabilities necessary for applications that need to run on their databases. Each of those is a viable reason to choose another engine such as Oracle.
Note: If you use SQL Server “in the cloud” you’ll want to pay attention to which cloud provider is hosting the database. In Microsoft’s Windows Azure cloud, SQL Azure is a special variation of SQL Server®. It is designed to work especially well in Azure cloud services, but its capabilities are slightly different from those of other SQL Server® flavors, which is important for database developers to know.
Oracle®
Cross-platform relational database for hefty needs
Oracle® and SQL Server® constantly compete to outdo each other in performance benchmarks. Either can be fast, manage large amounts of complex data, and handle substantial usage loads.
But Oracle runs on a wide variety of operating systems. If you want the flexibility to move to another operating system, this is obviously desirable.
Oracle® is especially entwined with the OpenVMS® operating system to deliver as much performance as possible on that platform. If you cannot tolerate frequent reboots for security patches, or turn pale at the mere thought of a system crash, OpenVMS® is appealing: it commonly runs for years between reboots. It can also make multiple computers up to 500 miles apart behave as though they are a single computer, offering great protection against disasters. A setup like that is certainly not cheap, but it is easy to see why some organizations demand it.
Work that requires a separate compiled software program in SQL Server® can sometimes be done with mere scripting in Oracle®. Oracle's procedural dialect of SQL is called PL/SQL and has capabilities T-SQL lacks.
IBM DB2®
IBM’s relational DBMS for hefty needs
Some organizations prefer IBM® mainframes instead of Windows® or UNIX servers. For these, DB2 is a frequent choice. Unlike SQL Server® and Oracle®, it makes no attempt to squeeze itself into small computers. It is meant for heavy duty use with large amounts of data and high availability—extremely serious databases. Large companies favor it for corporate databases.
The more commonly used an engine is, the easier it is to find people who have experience working with it. Because DB2® is so focused on very large databases, fewer people know how to work with it. Part of the price of using DB2® is needing to look harder for the right people to help you set it up and use it. For some organizations, that is almost as important a factor as the technical merits of a database engine.
When you have demanding requirements, you may need an engine that is not the most common. If so, experts who work with it are harder to find and may cost more because of their comparative scarcity.
Not All Databases Live on Disks
The engines mentioned so far keep data primarily in disk files. The more sophisticated of these engines may allow you to configure them so the most frequently used data is in computer RAM (random access memory), where performance is much faster, but the data in memory is sent to disk as soon as possible so it will not be lost if the system crashes.
That trick speeds up database performance, but not enough for some purposes. Telephony, spacecraft telemetry, manufacturing automation, and many other applications need to keep up with equipment instead of with comparatively slow human beings. The next step up in performance is a database that lives entirely in memory, which is much faster than disk storage.
Memory resident databases, if well designed and built, can be blazingly fast. Of course, speed comes with a trade-off. Memory resident databases disappear if the system crashes. As a fallback, this type of database saves a copy of itself to disk occasionally, like a snapshot of its condition at an instant in time. When restarted, the database uses the copy on disk to repopulate itself.
If it is gracefully shut down and restarted, the disk copy is up to date and nothing is lost. But if the system crashes, everything that occurs between the last pre-crash snapshot and the restart is lost.
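The snapshot-and-restore cycle can be sketched with Python's sqlite3 module, which supports a true in-memory database and a backup operation. The file name and schedule are invented; a real memory resident engine snapshots automatically:

```python
import os
import sqlite3
import tempfile

snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")

# A memory resident database: fast, but gone if the process dies.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE readings (value REAL)")
live.execute("INSERT INTO readings VALUES (42.0)")

# Take a "snapshot" -- copy the in-memory database to a disk file.
snapshot = sqlite3.connect(snapshot_path)
live.backup(snapshot)
snapshot.close()

live.execute("INSERT INTO readings VALUES (43.0)")  # arrives AFTER the snapshot
live.close()  # simulate the crash: the in-memory copy vanishes

# On restart, repopulate from the snapshot. The post-snapshot row is lost.
restored = sqlite3.connect(snapshot_path)
count = restored.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # 1 -- only the data captured in the snapshot survived
```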
Notice that I said these databases are very fast when well built. I have seen highly proprietary memory resident databases in telecommunications with mind-boggling speed. I have seen another that is commercially available, runs on multiple operating systems, and is so reliable that it is used by nuclear power stations. But I have also seen one that is commercially available, sold for automation purposes, and unable to meet the performance of leading disk resident database engines. The fact that a database is memory resident does not guarantee it will run at the speed of light.
Not All Data is Relational and Sorted into Lists
Up to now, all the engines mentioned have been relational. They organize data into tables, which you can think of as being similar to spreadsheets. Each table has fields similar to the columns in a spreadsheet, and records similar to the rows in a spreadsheet.
Tables can connect with each other through keys. If you have a table of customer contact information and another table of orders placed by customers, the customer identification code that appears in both tables will relate each customer’s contact information with all the orders that customer has placed. The customer identification code is a key.
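Here is that exact customer-and-orders arrangement as a runnable sketch, using Python's sqlite3 module with invented names and data. The shared customer_id column is the key that ties the two tables together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                                     customer_id INTEGER, item TEXT)""")
conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp'), (2, 'Bluth Co')")
conn.execute("""INSERT INTO orders (customer_id, item)
                VALUES (1, 'widgets'), (1, 'gears'), (2, 'bananas')""")

# Joining on the key pulls each customer's contact row together
# with every order that customer has placed.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE c.name = 'Acme Corp'
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [('Acme Corp', 'widgets'), ('Acme Corp', 'gears')]
```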
Not all data can be readily organized that way. Sometimes you need a spaghetti bowl of interconnections. I see this most often in memory resident databases with a profound need for speed. A relational database may have to trace several relationships to pull together data that such a maze of direct interconnections lets the engine grab in one step.
Superfast databases also use a variety of methods to locate desired pieces of data more rapidly. Sorting in alphabetical, numerical or date order is natural for people, but looking up data in that type of list is not the most efficient way for a computer to do it. Some databases use hash tables, balanced search trees or exotic mathematical formulas for extremely fast lookups.
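The difference between scanning a list and a hash lookup can be shown in a few lines of Python (the customer codes are invented). A Python dict is a hash table: the key is converted to a position, and the computer jumps straight there instead of examining entry after entry:

```python
# Two ways to find one customer among many.
customers = [(f"C{n:05d}", f"Customer {n}") for n in range(100_000)]

# List scan: the computer may examine thousands of entries.
def scan_lookup(code):
    for c, name in customers:
        if c == code:
            return name
    return None

# Hash lookup: one jump to the right place, regardless of size.
by_code = dict(customers)

print(scan_lookup("C09999"))   # Customer 9999 (after ~10,000 comparisons)
print(by_code["C09999"])       # Customer 9999 (one hashed jump)
```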
However, the rise of web applications has accelerated other needs for less rigidity in the structure of data that goes into databases. Non-relational databases are useful where data structure is in flux. Although you might expect this to be needed mostly for exotic purposes, I know of systems at a major airport and in a large property management operation where non-relational databases are in use.
Some Databases Cast Data in Stone
A variety of other features can make one database stand out from others for a specific purpose. Most people think of information in databases as easy to find and update. But for an example of a special purpose where the data must not be updated, you need not look any farther than your medicine cabinet.
Pharmaceutical manufacturers are required to keep vast amounts of information on file about every batch of medicine they make. If drug regulators notice an alarming pattern and want to check whether something went wrong in a batch, the manufacturer needs to be able to produce that data immediately—and that data must be pristine. It must never have been changed since the batch of medicine was made and the data was captured.
For this reason, some database engines can be set so that they will not allow any information in the database to be changed. New data can be inserted, but from that moment on, the data is cast in stone.
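A rough sketch of the idea, using triggers in Python's sqlite3 module (a dedicated engine would enforce this at a lower level, and the table is invented for illustration): inserts succeed, but every update or delete is refused by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_records (batch_id INTEGER PRIMARY KEY, result TEXT)")

# Once a record is written, refuse every attempt to change or delete it.
conn.execute("""
    CREATE TRIGGER no_updates BEFORE UPDATE ON batch_records
    BEGIN SELECT RAISE(ABORT, 'batch records are read-only'); END
""")
conn.execute("""
    CREATE TRIGGER no_deletes BEFORE DELETE ON batch_records
    BEGIN SELECT RAISE(ABORT, 'batch records are read-only'); END
""")

conn.execute("INSERT INTO batch_records (result) VALUES ('passed QA')")  # allowed
try:
    conn.execute("UPDATE batch_records SET result = 'failed QA'")
    tampered = True
except sqlite3.DatabaseError:
    tampered = False
print(tampered)  # False -- the engine refused the change
```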
A few database engines can keep an audit trail of the source of everything done to their data. Most engines require that type of tracking to be done by writing special software that users must go through to touch the data, but it is possible to get an engine that has the capability built in. This is not necessarily practical in a database that has to keep up with machinery, but in a hospital, it may be crucial to know which person made each change or addition in a patient’s records.
Options are Abundant
The engines mentioned by name in this article are not the only ones available in their categories, let alone in the entire world of databases. Before you build a database, it is worth your while to think about what you need. You will put a lot of effort and expense into creating your database, so it should be in an engine that suits what it has to do.
As an example of good versus bad choices, consider two large companies in regulated industries that need to keep specific data for years and be ready to show it to government auditors on a moment’s notice. Both chose a memory resident engine because their systems need to collect data from machinery. As mentioned earlier, if the system crashes, all data between the last “snapshot” and the crash is permanently lost.
One of these companies copies the regulated data into a disk resident engine right away. The other does not, so if the system crashes while a batch of medication is being made, the batch has to be discarded because the data about it is not complete.
If you are not sure which engine to use, ask a professional. By now, you know just saying you want a new database is not enough. Use this article to help you explain what you need. Be as specific as you can when you describe what you need to do. When you tell an expert that you want “a database to run on my favorite operating system, with minimal data loss in the event of a crash, moderate speed, high security and 200 simultaneous users,” that is a great start.
Books About Databases
Do you want to design and build databases yourself? Take care of databases after they are built? Here’s where you can start learning how to do it!
Simple enough for beginners, but informative enough for more advanced readers. This is a teaching text offering:
- An easy way to understand databases
- A quiz at the end of each chapter
- A final exam at the end of the book
- No unnecessary technical jargon
- A time-saving approach
Brief, understandable explanations of how to design a database right. Steer clear of bear traps that catch many database designers.
Techies recognize the Wrox Programmer to Programmer series for its consistently high quality. If you want to do serious work with anything regarding computers, look for Wrox books about what you’re using.
SQL comes in many dialects. It is difficult for any book about it to be engaging. This one is more visually oriented than most and gets top marks from readers and instructors. The MySQL database engine (widely available for free at website hosts) is used for examples, so readers can easily try out what is explained in the book.
The Sams series is not as consistent as the Wrox series for quality, and it is written for less-expert readers. This volume is highly rated. If you are looking for an introduction to SQL scripting, this takes a more standard approach than Head First SQL.