Cory Maass



J2EE Journal: Article

Database Overload


The primary purpose of a database is to store and retrieve data. Obvious? Sure, but every once in a while it's nice to return to the basics.

For a long time we've known that good database design is essential to success. It's much easier to fix a poorly written application built on a good data model than a well-written application based on a bad data model. Massive effort has been put forth to ensure we can properly model and normalize our data. Programmers, analysts, and architects have gotten very good at solving these problems; maybe a little too good. In some cases the data-centric approach to analysis has become a database-centric solution mindset where the database server is the answer to every problem. As a result, companies are throwing the kitchen sink into the database.

Why is this database-centric approach a poor idea? It's said that those who don't know history are doomed to repeat it. So, part of the answer lies in where we've been.

A Condensed History
There are three logical parts to every application: user interface (UI), data, and business logic. The UI is traditionally the least taxing on hardware resources. Business logic processing and data management are usually much more demanding. How these partitions are allocated has evolved over the years.

With mainframes we had what amounted to a one-tier model. The UI, data, and business logic were all managed by the mainframe (see Figure 1). They could handle massive amounts of data and processing, but what happened when we ran out of storage or horsepower? We either added another mainframe or we limped along without. Each option was costly in its own way.

[Figure 1]

The advent of relatively cheap high-speed networking and powerful (remember the 386?) PCs opened new possibilities. Client/server is a two-tier model where the UI and business logic have shifted to the client; data remains on a centralized server. It addressed the core problem of having data and business logic together on the same tier. Figure 2 illustrates the change. Processing was distributed across multiple cheap clients, not centralized on expensive mainframes. Instead of paying $250,000+ for a mainframe, a company could spend $25,000 on a database server and $3,000 for each client. When you did the math, the per-user costs weren't always better. The real benefits, however, were increased flexibility and scalability. Clients could be added or upgraded as needed, rather than all at once. Of course, this model had its own problems: network traffic was now a development consideration and there were more physical machines to manage.
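The per-user arithmetic is easy to check. A quick sketch in Java, using the article's dollar figures and an assumed population of 100 users (the seat count is an assumption for illustration):

```java
// Per-user cost comparison using the article's figures.
// The seat count of 100 users is an assumption, not from the article.
public class CostComparison {
    static double perUserCost(double fixedCost, double costPerClient, int users) {
        return (fixedCost + costPerClient * users) / users;
    }

    public static void main(String[] args) {
        int users = 100;
        // One $250,000 mainframe serving everyone:
        System.out.println(perUserCost(250_000, 0, users));     // prints 2500.0
        // A $25,000 database server plus a $3,000 PC per seat:
        System.out.println(perUserCost(25_000, 3_000, users));  // prints 3250.0
    }
}
```

At this seat count the per-user cost is actually higher for client/server, which is exactly the author's point: the win was flexibility and incremental upgrades, not raw price.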

[Figure 2]

Fortunately people were already asking questions. Why does the business logic have to share space at all? If we have three partitions, why can't we have three tiers? That's the concept behind n-tier applications: a database server to manage the data, an application server to process the business logic, and a client to interface with the user (see Figure 3). Now companies could really target capacity upgrades. Need more storage? Add more disk space to the database server. Need more processing capability? Upgrade or cluster the app servers. Each tier has a dedicated purpose. However, the physical and software architectures are more complex and require greater upfront planning for success.

[Figure 3]

While n-tier gets a lot of attention these days and mainframes quietly continue to fill vital roles, client/server is still the dominant architecture in corporate America.

The Current State
The specific problem I've seen is the substantial relocation of business logic to the database server. While some processing remains on the client, the bulk of the business logic ends up in stored procedures and triggers. The application has an infrastructure that is physically client/server, but a partition distribution where the data and business logic are once again combined. This approach combines the drawbacks of both the client/server and mainframe models.

What's the big deal? Why does it matter if you put everything in the database? There are several reasons:

  • Scalability and performance: By far the biggest casualty. The database server becomes primarily an application server, something it's not really intended to be. We're forced to upgrade as our processing needs grow, not just as our storage needs grow. While database vendors constantly add features in an effort to compete, a database is first and foremost meant to store and retrieve data.
  • Portability: Database languages such as PL/SQL and Transact-SQL are proprietary to their DBMS. You can't change your back end without major recoding. One company I've worked with produces a commercial PowerBuilder product. Because a significant portion of their code is in Sybase stored procedures, they have been unable to accommodate customers wanting to leverage existing Oracle or SQL Server installations.
  • Global impact: Poorly written code affects performance for everyone rather than just the client that invokes it.
  • Lack of language features: None of the database languages (PL/SQL, Transact-SQL, etc.) is as robust as Perl, C#, Java, or PowerScript. The lack of inheritance, arrays, and collections, along with limited flow-control and looping constructs, means developers spend more time coding around these inherent limitations.
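To make the portability point concrete, here is a hedged sketch in Java of a business rule kept in application code rather than in a Transact-SQL or PL/SQL procedure. The rule and its thresholds are invented for illustration; the point is that nothing in it is tied to a particular DBMS, so swapping Sybase for Oracle or SQL Server means changing a connection URL, not recoding the logic.

```java
// Illustrative business rule kept in the application layer.
// The discount thresholds are assumed for the example.
public class PricingRules {
    // Volume discount: 10% off orders of 100+ units, 5% off 50+.
    static double discountedTotal(double unitPrice, int quantity) {
        double total = unitPrice * quantity;
        if (quantity >= 100) return total * 0.90;
        if (quantity >= 50)  return total * 0.95;
        return total;
    }

    public static void main(String[] args) {
        // The rule never touches the DBMS, so it is portable by construction.
        System.out.println(discountedTotal(10.0, 120)); // prints 1080.0
    }
}
```

Had this rule lived in a Sybase stored procedure, as in the PowerBuilder vendor example above, every new back end would mean porting it by hand.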

The first step to fixing a problem is recognizing it. How can you tell if your company has fallen into this mindset? I've listed a few real-world examples that indicate a problem may exist:

  • Company practice does not allow DataWindows direct access to tables. They can be used only to access front-end stored procedures.
  • EJBs or other distributed components exist for the sole purpose of calling stored procedures and returning the results. The companies that do this go through the effort and expense of implementing J2EE application servers but still use the database server for processing.
  • Triggers that are in excess of 100 lines or are calling large procedures. I've seen some over 5,000 lines long calling multiple procedures just as long. Imagine doing an update on that table!
  • Stored procedures that do not manipulate or retrieve data from the database. For example, if the only purpose of a procedure is to parse a string for a client, it doesn't belong in the database.
  • Your database server spends more time processing business logic than managing data.
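The string-parsing warning sign above is easy to picture. A hedged sketch of that kind of routine living in application code, where it belongs (the comma-delimited format is assumed for the example):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Pure string manipulation: no data is stored or retrieved, so there is
// no reason for the database server to spend cycles on it.
public class NameParser {
    static List<String> parseList(String raw) {
        return Arrays.stream(raw.split(","))
                     .map(String::trim)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseList("widgets, gadgets ,gizmos"));
        // prints [widgets, gadgets, gizmos]
    }
}
```

Written as a stored procedure, this same routine would consume server cycles and be locked into one vendor's SQL dialect for no benefit at all.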

In a common scenario, the company realizes there is a problem with database performance. The amount of data is moderate but the server's resource utilization is constantly high. Simple retrieves and updates take much longer than they should. An investigation is launched to find the cause, often with outside experts. Something amiss invariably turns up: a missing or fragmented index, procedures with poor transaction management, or incorrect configuration settings. The experts submit their recommendations, along with a hefty bill. We pay the bill, make the changes, and kick ourselves for our mistakes. Performance improves, but not as much as we had hoped.

If the data model is solid, the server configuration is correct, the stored procedures are clean, and the server is relatively powerful, how can we still have poor performance? How did the problem get missed? The analysis was done in the wrong place. It focused on the application construction, not the application architecture. It's like a building inspector making sure the walls are plumb and the wiring is grounded but failing to notice the house was built on quicksand.

Now we know what the problem is and how to recognize it. But how did we get here? Companies come to this database-centric approach in a number of seemingly rational ways. The overall architects may have a mostly database background, and it never occurs to them to look beyond the database for a solution. The desire to put processing as close to the source as possible is another reason; logically, nothing is closer than in the database. Other times the power of the server is seductive. It's a beefy box; it can handle one more new process, right? Sure, the first couple of times. The need to centralize processing is another common cause. If I have a PowerBuilder application and a JSP Web site that use the same data, it's very tempting to put common processing in the database. That way I don't have to maintain both a PowerBuilder and a Java version of my code. The most frequent reason, though, is that it's easy. It takes almost nothing to drop a procedure onto a database server. The first few times we do any of these things there is seemingly no impact. It works, so we're lulled into thinking it's the right thing to do all the time. Unfortunately, we often don't realize there is a problem until the worst possible moment. And like many, I believe Murphy was an optimist.

What to Do About It
The simple answer is to get the processing out of the database. How you do it depends on your environment.

In a client/server environment your only option is to move processing to the client. Architects, particularly database architects, tend to overlook that the client is a computer too. In today's world the clients are sometimes more powerful than the servers they access. Put them to work! Even if the client PCs can't match the server one-to-one, there are more of them. Fifty weak clients can each process a single result set faster than a lone server can process fifty sets.

In a PowerBuilder environment you want to make sure you employ good application partitioning practices. Partitioning is too large a topic to cover fully here, but it basically means using nonvisual objects (NVOs) to encapsulate your business logic. Resist the urge to put the code in windows or to make your NVOs dependent upon visual objects; doing so severely limits reusability. Use encapsulation to make your objects as independent as possible. Protect their properties by making them private and wrapping them in accessor and mutator (getter and setter) functions. Group related processes in common objects: if you find you have a series of procedures to manage an order, create an order NVO. You will find that a well-partitioned application is highly reusable. It's also an important step to take if you want to move to an n-tier architecture in the future.
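PowerScript can't be run here, so the sketch below shows the same partitioning idea in Java terms: an order object that keeps its properties private behind accessor and mutator functions, with no dependence on any visual layer. All names are illustrative, not from the article.

```java
// Illustrative analogue of an "order NVO": encapsulated state, no UI coupling.
public class Order {
    private String customer;   // kept private; exposed only through accessors
    private double total;

    public String getCustomer()            { return customer; }
    public void   setCustomer(String name) { this.customer = name; }

    public double getTotal() { return total; }

    public void addLineItem(double amount) {
        // The business rule lives here, not in a window or a trigger.
        if (amount < 0) throw new IllegalArgumentException("negative amount");
        total += amount;
    }
}
```

Because nothing here references a window or a DataWindow, the same object can back a desktop client today and be promoted to a middle-tier component later.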

In an n-tier environment, take advantage of your application server. These servers are designed for massive processing with multiple parallel clients. Object instance and database connection pooling greatly increase scalability. If you are just now moving to n-tier, the implementation is more complex than you are used to. It also requires a bit of a mind shift because you are doing component development instead of object development. Much is the same, but now you're concerned with object life cycles, instance pooling, and component state. Michael Barlotta has written several excellent books on EAServer (Jaguar) development. While those books focus on PowerBuilder 7, the upcoming (released by the time you read this) PowerBuilder 9: Internet and Distributed Application Development promises to update this topic for PowerBuilder 9. In my opinion, one of the most exciting new features of version 9 is its support for third-party J2EE app servers such as IBM WebSphere and BEA WebLogic.
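Pooling is what makes that scalability possible: expensive resources such as connections and component instances are created once and recycled across requests instead of being rebuilt every time. A toy sketch in Java of the idea (real application servers manage this for you, typically through a DataSource; this pool is illustrative only):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;

// Toy resource pool: checked-in items are reused before any new one is built.
public class SimplePool<T> {
    private final Queue<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;   // counts the expensive creations that actually happened

    public SimplePool(Supplier<T> factory) { this.factory = factory; }

    public synchronized T checkOut() {
        T item = idle.poll();
        if (item == null) {       // nothing idle: pay the creation cost once
            created++;
            item = factory.get();
        }
        return item;
    }

    public synchronized void checkIn(T item) { idle.add(item); }

    public synchronized int creations() { return created; }
}
```

With fifty sequential requests against a pool like this, the creation cost is paid once rather than fifty times, which is the whole argument for letting the app server, not the database, host the processing tier.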

Wrap Up
Obviously, one size does not fit all. Sometimes a trigger or stored procedure really is the best solution. It's their overuse that must be guarded against. The consequences of loading up a database server aren't always readily apparent. Look beyond the database to the overall application architecture. Processing can be moved to the client or shunted to its own space in the middle tier. Either choice will give your database server some breathing room and expand its useful life. We should all be so lucky!
