Can We Finally Find the Database Holy Grail? | Part 2

A truly distributed DBMS with ACID transactional guarantees could address the key pain points in modern database management

In Part One of this three-part series I talked about distributed transactional databases being the Holy Grail of database systems. Among other things, the promise of such systems is to provide on-demand capacity, continuous availability and geographically distributed operation. But the historical approaches to building distributed transactional databases have involved unacceptable trade-offs, and as a result general-purpose database systems predicated on a single-server architecture have dominated the industry for decades.

In our quest for the Holy Grail of databases we acknowledge that there are folks who have given up on the highly desirable characteristic of transactional consistency in favor of distributed operation. That is a trade-off that may be attractive if you can't find a way to scale out transactions, but it is a drastic choice that moves a lot of complexity and cost up the application stack. If there is a way to scale out transactional databases then that is clearly a much better outcome. Our Holy Grail discussion is specifically about distributed transactional databases because, if such a thing can be built without a substantial downside, no one would want any other distributed data store.

Is there a way to build one of these things? I am excited by the potential of a new design that we call a Durable Distributed Cache, but I'd like to first lay out the three categories of designs that have been proposed historically.

Shared-Disk Databases
The idea of Shared-Disk designs is fairly self-explanatory: several machines in a cluster all load and store pages of data from the same shared disk subsystem. A good example of a Shared-Disk system is Oracle Real Application Clusters. The database system tracks updated pages and writes them back to disk, tracks "stale" pages and invalidates them, and implements some kind of network lock manager that coordinates concurrency conflicts on a distributed basis. As with any of these designs, you need to be very aware of the nature of your workload.
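
To make that coordination concrete, here is a minimal sketch in Python of the kind of page-level lock manager such a system needs. It is purely illustrative: the class and method names are invented (this is not Oracle RAC's actual protocol), and a real implementation would add request queuing, deadlock detection and cache-invalidation messaging.

```python
from collections import defaultdict

class PageLockManager:
    """Toy network lock manager for a shared-disk cluster.

    Each page can be held in SHARED mode by many nodes (readers) or in
    EXCLUSIVE mode by exactly one node (a writer).
    """

    SHARED, EXCLUSIVE = "S", "X"

    def __init__(self):
        # page_id -> {node_id: mode}
        self._holders = defaultdict(dict)

    def acquire(self, node_id, page_id, mode):
        holders = self._holders[page_id]
        others = {n: m for n, m in holders.items() if n != node_id}
        if mode == self.SHARED:
            # A read lock is compatible only with other read locks.
            if any(m == self.EXCLUSIVE for m in others.values()):
                return False  # caller must wait or retry
        else:
            # A write lock conflicts with any other holder.
            if others:
                return False
        holders[node_id] = mode
        return True

    def release(self, node_id, page_id):
        self._holders[page_id].pop(node_id, None)


# Usage: node B cannot write page 42 until node A's shared hold is released.
lm = PageLockManager()
assert lm.acquire("node-A", page_id=42, mode=PageLockManager.SHARED)
assert not lm.acquire("node-B", page_id=42, mode=PageLockManager.EXCLUSIVE)
lm.release("node-A", 42)
assert lm.acquire("node-B", page_id=42, mode=PageLockManager.EXCLUSIVE)
```

Every one of those acquire calls is a network round trip in a real cluster, which is where the latency and thrashing problems discussed below come from.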

Workloads that require minimal coordination can scale fairly well on Shared-Disk systems. In the extreme case, consider an all-read workload. This is likely to be I/O-bound before it is limited by inter-machine coordination delays. The I/O bound can itself become a problem, because adding more machines does not add I/O capacity to the shared disk subsystem. If your database performance is limited by disk I/O rather than by computational bandwidth, then there is no point in adding more machines.
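
A back-of-the-envelope calculation, using entirely made-up numbers, illustrates why adding nodes stops helping once the shared disk subsystem becomes the bottleneck:

```python
# Hypothetical figures, purely for illustration.
DISK_SUBSYSTEM_IOPS = 100_000   # total I/O operations/sec the shared storage can serve
IOS_PER_TRANSACTION = 10        # average page reads/writes per transaction
CPU_TPS_PER_NODE = 4_000        # transactions/sec a single node can process

def cluster_throughput(nodes: int) -> int:
    """Throughput is capped by whichever resource runs out first."""
    cpu_bound = nodes * CPU_TPS_PER_NODE
    io_bound = DISK_SUBSYSTEM_IOPS // IOS_PER_TRANSACTION
    return min(cpu_bound, io_bound)

for n in (1, 2, 3, 4, 8):
    print(n, "node(s) ->", cluster_throughput(n), "tps")
# From three nodes onward the shared disk caps throughput at 10,000 tps,
# so adding further machines buys nothing.
```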

But coordination is often required between machines, and in these cases Shared-Disk systems are often limited by lock-based coordination protocols. The core database management system (DBMS) design is typically built around the idea of efficiently managing tightly coupled memory and disk pages. Consequently, adding synchronous distributed locking to guarantee page ownership introduces both latency and thrashing problems that can easily bring these systems to their knees. Careful allocation of transactions to machines to maximize affinity can ameliorate these issues to some extent, but this increases the fragility of the system and adds substantially to long-term DBA expenses.

Shared-Disk systems are at their best for workloads that are highly read-dominated or in which the data can effectively be partitioned by machine.

Shared-Nothing Databases
Both of those observations are also arguments in favor of Shared-Nothing designs. Shared-Nothing is really a cute way of saying "partitioning." In other words, you essentially run multiple databases, carefully arranging which part of the dataset resides in each partition. To take a trivial example of a database containing all US citizens, you might put everyone with last names starting with A-L in one partition and everyone else in the other partition. Or you could put all male citizens in one and all female citizens in the other. For each incoming transaction you detect which data it needs to access and then send it to the relevant partition.
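
A minimal sketch of that routing step, assuming the hypothetical A-L / M-Z split above, might look like this:

```python
def partition_for(last_name: str) -> str:
    """Route a record to a partition based on the citizen's last name.

    This mirrors the A-L / M-Z split above; real systems typically use
    hash or range partitioning on a chosen shard key.
    """
    return "partition-1" if last_name[:1].upper() <= "L" else "partition-2"

def execute(transaction: dict) -> str:
    # Single-partition transactions are easy: send the whole transaction
    # to one node. Transactions touching names on both sides of the split
    # need cross-partition coordination (e.g., a two-phase commit).
    targets = {partition_for(name) for name in transaction["last_names"]}
    if len(targets) == 1:
        return f"run locally on {targets.pop()}"
    return f"distributed transaction across {sorted(targets)}"

print(execute({"last_names": ["Adams"]}))            # run locally on partition-1
print(execute({"last_names": ["Adams", "Morris"]}))  # distributed transaction ...
```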

You might ask: if you are essentially just running multiple databases with the data split between them, why is this a DBMS issue at all? Surely the application can just do that itself, using any traditional client/server database system? The answer is that of course you can do that, and it is called "Sharding." Big Internet applications, like Facebook, do this, and in some cases they run many thousands of shards. Shared-Nothing distributed databases try to do this transparently for you, in particular striving to help you with the really hard problems: how to partition the data optimally, how to perform transactions efficiently when they touch data in multiple partitions, how to deliver transactional semantics across partitions, how to rebalance the partitions as the database grows, and so on.

Once again, partitioned stores can be very effective for certain kinds of workloads, but you have to arrange your queries and data layout very carefully to suit the partitioning scheme. For some cases there is no good way to partition the data. In all cases there is a limit to how many partitions you can profitably create, which means there is a limit to the degree to which the system can scale out. If you are running a workload that has any appreciable proportion of cross-partition transactions, you will most likely need specialized low-latency interconnects between your servers. And the partitioning model is not very dynamic, so it does not address the desire for on-demand capacity management.

One last comment on partitioned stores: They are not all disk-based. In-memory systems typically also use partitioning in order to exploit multiple database servers in parallel.

Synchronous Commit Databases
There is a third traditional way of building distributed databases. You could call it the obvious way, the brute-force way or, more technically, Synchronous Commit (Google calls it "Synchronous Replication").

In a transactional DBMS an application makes changes to the data before committing or rolling back the transaction. If the transaction is successfully committed, then the state of the database has been (atomically) updated with the full set of committed changes. If the commit fails, then the canonical state of the database is unchanged. And if the application performs a rollback, then the uncommitted changes are discarded. The Synchronous Commit approach has been adopted in various forms over the years, not least in the form of Two-Phase Commit (2PC) protocols.
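
For reference, here is a bare-bones sketch of the classic Two-Phase Commit flow. The Participant interface is hypothetical, and a real protocol must also write durable log records and cope with timeouts and crashes at every step:

```python
class Participant:
    """Hypothetical participant API; a real one writes a durable log
    record before voting yes so it can honor that vote after a crash."""
    def prepare(self, txn_id) -> bool:  # returns True to vote "commit"
        return True
    def commit(self, txn_id) -> None:
        pass
    def abort(self, txn_id) -> None:
        pass

def two_phase_commit(txn_id, participants) -> str:
    # Phase 1: ask every participant to prepare and vote.
    votes = [p.prepare(txn_id) for p in participants]
    if all(votes):
        # Phase 2a: everyone voted yes, so tell everyone to commit.
        for p in participants:
            p.commit(txn_id)
        return "committed"
    # Phase 2b: any "no" vote (or, in practice, a timeout) aborts everywhere.
    for p in participants:
        p.abort(txn_id)
    return "aborted"
```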

The idea of Synchronous Commit is simply that when an application asks the DBMS to commit a transaction, the DBMS performs the commit in multiple locations before returning success. The best modern example of Synchronous Commit is Google F1. F1 is the database system that Google uses to support Google AdWords, and which will support many more Google applications in due course. Self-evidently it works for Google, and for a very demanding application. Google specifically notes that it could not address the AdWords challenge using NoSQL technologies or MySQL approaches.

Although Google has made it work, the disadvantage of Synchronous Commit is stated in their own whitepaper, the Google F1 paper: "Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design." F1 also relies on state-of-the-art Google networking to reduce inter-node latency, and on tightly synchronized, atomic-clock-backed time to support a common notion of true time. The obvious challenge with Synchronous Commit is latency (in other words, delays). There are some applications for which throughput is more important than transaction latency, but for most modern applications the reverse is true.
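
A rough, illustrative calculation (the round-trip times are invented) shows why geo-distributed synchronous commit hurts per-transaction latency even when throughput is fine:

```python
# Hypothetical network round-trip times to three replicas of a transaction.
round_trip_ms = {"same-rack": 0.5, "same-region": 2.0, "cross-continent": 70.0}

# A synchronous commit cannot acknowledge the client until the slowest
# replica has confirmed, so commit latency is bounded below by the worst RTT.
commit_latency_ms = max(round_trip_ms.values())
print(commit_latency_ms)  # 70.0 ms per commit, regardless of aggregate throughput
```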

A less obvious challenge with Synchronous Commit is managing error and failure conditions. Designing distributed synchronous commit protocols is relatively straightforward if you can assume 100% reliability. It is failure detection and recovery procedures that are the hard part of the problem, and the designer is generally left with a trade-off between maintaining transactional guarantees across a large number of recovery scenarios, avoiding significant performance degradation, and limiting implementation complexity. As a consequence of the latency issues and failure-management challenges, the Synchronous Commit approach has had limited success historically.

Summary
It is clear that a truly distributed DBMS with ACID transactional guarantees could address the key pain points in modern database management, notably on-demand capacity, continuous availability and geo-distributed operation. The three traditional designs unfortunately deliver much less than those promised advantages and involve costs, complexities and/or functional limitations that limit their usefulness and general applicability.

In my next post I aim to introduce a new design approach, one that we call Durable Distributed Cache. By stepping back and rethinking database design from the ground up, Jim Starkey has come up with an innovative solution that makes very different trade-offs. The net effect is a system that scales out/in dynamically on commodity machines and virtual machines, has no single point of failure, and delivers full ACID transactional semantics.

More Stories By Barry Morris

Barry Morris is CEO & Co-Founder of NuoDB, Inc. An accomplished software CEO with over 25 years of industry experience in the USA and Europe, running private and public companies ranging in scale from early startup phase to 1,000+ employees, he loves to build companies around industry-changing paradigm-shifts in technology. Morris was previously CEO of StreamBase and Iona Technologies.
