Welcome!

@CloudExpo Authors: Liz McMillan, Jason Bloomberg, Pat Romanski, Ed Featherston, Elizabeth White

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Agile Computing, @BigDataExpo, SDN Journal

@CloudExpo: Article

Can We Finally Find the Database Holy Grail? | Part 2

A truly distributed DBMS with ACID transactional guarantees could address the key pain points in modern database management

In Part One of this three part series I talked about distributed transactional databases being the Holy Grail of database systems. Among other things the promise of such systems is to provide on-demand capacity, continuous availability and geographically distributed operation. But the historical approaches to building distributed transactional databases have involved unacceptable trade-offs, and as a result general purpose databases systems predicated on a single-server architecture have dominated the industry for decades.

In our quest for the Holy Grail of databases we acknowledge that there are folks who have given up on the highly desirable characteristic of transactional consistency in favor of a distributed operation. That is a trade-off that may be attractive if you can't find a way to scale-out transactions, but it is a drastic choice that moves a lot of complexity and cost up the application stack. If there is a way to scale out transactional databases then that is clearly a much better outcome. Our Holy Grail discussion is specifically about distributed transactional databases because if such a thing can be built without a substantial downside then no one would want any other distributed data store.

Is there a way to build one of these things? I am excited by the potential of a new design that we call a Durable Distributed Cache, but I'd like to first lay out the three categories of designs that have been proposed historically.

Shared-Disk Databases
The idea of Shared-Disk designs is fairly self-explanatory. What you do is to have several machines in a cluster all loading and storing pages of data from the same shared disk subsystem. A good example of a Shared-Disk system is Oracle Real Application Clusters. The database system tracks updated pages, writing them to disk, tracks "stale" pages, invalidating them, and implements some kind of a network lock manager that coordinates concurrency conflicts on a distributed basis. Per our example above you need to be very aware of the nature of your workload.

Workloads that require minimal coordination can scale fairly well on Shared-Disk systems. In the extreme case consider an all-read workload. This is likely to be I/O-bound before it is limited by inter-machine coordination delays. The I/O-bound issue can become a problem, because you don't increase I/O when you add more machines. If your database performance is limited by disk I/O and not by computational bandwidth then there is no point adding more machines.

But coordination is often required between machines, and in these cases Shared-Disk systems are often limited by lock-based coordination protocols. The core database management system (DBMS) design is typically built around the idea of efficiently managing tightly coupled memory and disk pages. Consequently, the introduction of synchronous distributed locking to guarantee page ownership introduces both latency problems and thrashing problems that can easily bring these systems to their knees. Careful allocation of transactions to machines to maximize affinity can ameliorate these issues to some extent, but this increases the fragility of the systems and adds substantially to long-term DBA expenses.

Shared-Disk systems are at their best for workloads that are highly read dominated or in which the data can effectively be partitioned by machine.

Shared-Nothing Databases
These are both arguments in favor of Shared-Nothing designs. Shared-Nothing is really a cute way of saying "partitioning." In other words you essentially run multiple databases, carefully arranging that part of the dataset that resides in each partition. To take a trivial example of a database containing all US citizens, you might put everyone with last names starting with A-L in one partition and everyone else in the other partition. Or you could put all male citizens in one and all female citizens in the other. For each incoming transaction you detect which data it needs to access and then send it to the relevant partition.

You might ask why, if you are essentially just running multiple databases with the data split between them, then why it is a DBMS issue at all? Surely the application can just do that itself, using any traditional client/server database system? The answer is that of course you can do that, and it is called "Sharding." Big Internet applications, like Facebook, do this, and in some cases they run many thousands of shards. Shared-Nothing distributed databases try to do this transparently for you, in particular striving to help you with the really hard problems of how to optimally partition the data, how to perform transactions efficiently when they touch data in multiple partitions, how to deliver transactional semantics across partitions, how to rebalance the partitions as the database grows, and so on.

Once again, partitioned stores can be very effective for certain kinds of workloads but you have to be very careful to arrange your queries and data layout to make sure it is optimized. For some cases there is no good way to partition the data. In all cases there is a limit to how many partitions you can profitably create, which means there is a limit to the degree that the system can scale out. If you are running a workload that has any appreciable proportion of cross-partition transactions, you will most likely need specialized low-latency interconnects between your servers. And the partitioning model is not very dynamic, so it does not address the desire for on-demand capacity management.

One last comment on partitioned stores: They are not all disk-based. In-memory systems typically also use partitioning in order to exploit multiple database servers in parallel.

Synchronous Commit Databases
There is a third traditional way of building distributed databases. You could call it the obvious way, the brute force way or, more technically, Synchronous Commit (Google calls it "Synchronous Replication").

In a transactional DBMS an application makes changes to the data before committing or rolling back the transaction. If the transaction is successfully committed then the state of the database has been (atomically) updated with the full set of committed changes. If the commit fails then the canonical state of the database is unchanged. And if the application performs a rollback then the uncommitted changes are discarded. The Synchronous Commit approach has been adopted in various forms over the years, not least in the form of Two Phase Commit (2PC) protocols.

The idea of Synchronous Commit is simply that when an application makes a request for the DBMS to commit a transaction, the DBMS performs the Commit in multiple locations before returning success. The best modern example of Synchronous Commit is Google F1. F1 is the database system that Google uses to support Google AdWords, and which will support many more Google applications in due course. Self-evidently it works for Google, and for a very demanding application. Google specifically notes that they could not address the AdWords challenge using NoSQL technologies or MySQL approaches.

Although Google have made it work, the disadvantage of Synchronous Commit is stated here in their own whitepaper: The Google F1 White Paper: Synchronous replication implies higher commit latency, but we mitigate that latency by using a hierarchical schema model with structured data types and through smart application design". Also F1 relies on state-of-the-art Google networking to reduce inter-node latency, and the existence of atomic clocks on every participating machine in order to support a common notion of true time. The obvious challenge with Synchronous Commit is latency (in other words delays). There are some applications for which throughput is more important than transaction latency but for most modern applications the reverse is true.

A less obvious challenge relating to Synchronous Commit is managing error and failure conditions. Designing distributed synchronous commit protocols is relatively straightforward if you can assume 100% reliability. It is recovery detection and recovery procedures that are the hard part of the problem, and the designer is generally left with a trade-off between maintaining transactional guarantees for a large number of recovery scenarios, avoiding significant performance degradation, and limiting implementation complexity. In consequence of the latency issues and failure management challenges the Synchronous Commit approach has had limited success historically.

Summary
It is clear that a truly distributed DBMS with ACID transactional guarantees could address the key pain points in modern database management, notably on-demand capacity, continuous availability and geo-distributed operation. The three traditional designs unfortunately deliver much less than those promised advantages and involve costs, complexities and/or functional limitations that limit their usefulness and general applicability.

In my next post I aim to introduce a new design approach, one that we call Durable Distributed Cache. By stepping back and rethinking database design from the ground up Jim Starkey has come up with an innovative solution that makes very different trade-offs. The net effect is a system that scales-out/in dynamically on commodity machines and virtual machines, has no single point of failure, and delivers full ACID transactional semantics.

More Stories By Barry Morris

Barry Morris is CEO & Co-Founder of NuoDB, Inc. An accomplished software CEO with over 25 years of industry experience in the USA and Europe, running private and public companies ranging in scale from early startup phase to 1,000+ employees, he loves to build companies around industry-changing paradigm-shifts in technology. Morris was previously CEO of StreamBase and Iona Technologies.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
SYS-CON Events announced today that Cedexis will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Cedexis is the leader in data-driven enterprise global traffic management. Whether optimizing traffic through datacenters, clouds, CDNs, or any combination, Cedexis solutions drive quality and cost-effectiveness.
SYS-CON Events announced today that Mobile Create USA will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Mobile Create USA Inc. is an MVNO-based business model that uses portable communication devices and cellular-based infrastructure in the development, sales, operation and mobile communications systems incorporating GPS capabi...
SYS-CON Events announced today that Enroute Lab will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enroute Lab is an industrial design, research and development company of unmanned robotic vehicle system. For more information, please visit http://elab.co.jp/.
SYS-CON Events announced today that Suzuki Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Suzuki Inc. is a semiconductor-related business, including sales of consuming parts, parts repair, and maintenance for semiconductor manufacturing machines, etc. It is also a health care business providing experimental research for...
IBM helps FinTechs and financial services companies build and monetize cognitive-enabled financial services apps quickly and at scale. Hosted on IBM Bluemix, IBM’s platform builds in customer insights, regulatory compliance analytics and security to help reduce development time and testing. In his session at 21st Cloud Expo, Lennart Frantzell, a Developer Advocate with IBM, will discuss how these tools simplify the time-consuming tasks of selection, mapping and data integration, allowing devel...
Cloud-based disaster recovery is critical to any production environment and is a high priority for many enterprise organizations today. Nearly 40% of organizations have had to execute their BCDR plan due to a service disruption in the past two years. Zerto on IBM Cloud offer VMware and Microsoft customers simple, automated recovery of on-premise VMware and Microsoft workloads to IBM Cloud data centers.
Why Federal cloud? What is in Federal Clouds and integrations? This session will identify the process and the FedRAMP initiative. But is it sufficient? What is the remedy for keeping abreast of cutting-edge technology? In his session at 21st Cloud Expo, Rasananda Behera will examine the proposed solutions: Private or public or hybrid cloud Responsible governing bodies How can we accomplish?
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of...
Many companies start their journey to the cloud in the DevOps environment, where software engineers want self-service access to the custom tools and frameworks they need. Machine learning technology can help IT departments keep up with these demands. In his session at 21st Cloud Expo, Ajay Gulati, Co-Founder, CTO and Board Member at ZeroStack, will discuss the use of machine learning for automating provisioning of DevOps resources, taking the burden off IT teams.
SYS-CON Events announced today that Nihon Micron will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Nihon Micron Co., Ltd. strives for technological innovation to establish high-density, high-precision processing technology for providing printed circuit board and metal mount RFID tags used for communication devices. For more inf...
SYS-CON Events announced today that mruby Forum will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. mruby is the lightweight implementation of the Ruby language. We introduce mruby and the mruby IoT framework that enhances development productivity. For more information, visit http://forum.mruby.org/.
In his session at @ThingsExpo, Greg Gorman is the Director, IoT Developer Ecosystem, Watson IoT, will provide a short tutorial on Node-RED, a Node.js-based programming tool for wiring together hardware devices, APIs and online services in new and interesting ways. It provides a browser-based editor that makes it easy to wire together flows using a wide range of nodes in the palette that can be deployed to its runtime in a single-click. There is a large library of contributed nodes that help so...
SYS-CON Events announced today that Ryobi Systems will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ryobi Systems Co., Ltd., as an information service company, specialized in business support for local governments and medical industry. We are challenging to achive the precision farming with AI. For more information, visit http:...
SYS-CON Events announced today that SIGMA Corporation will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. uLaser flow inspection device from the Japanese top share to Global Standard! Then, make the best use of data to flip to next page. For more information, visit http://www.sigma-k.co.jp/en/.
SYS-CON Events announced today that Daiya Industry will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Daiya Industry specializes in orthotic support systems and assistive devices with pneumatic artificial muscles in order to contribute to an extended healthy life expectancy. For more information, please visit https://www.daiyak...
Today traditional IT approaches leverage well-architected compute/networking domains to control what applications can access what data, and how. DevOps includes rapid application development/deployment leveraging concepts like containerization, third-party sourced applications and databases. Such applications need access to production data for its test and iteration cycles. Data Security? That sounds like a roadblock to DevOps vs. protecting the crown jewels to those in IT.
SYS-CON Events announced today that B2Cloud will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. B2Cloud specializes in IoT devices for preventive and predictive maintenance in any kind of equipment retrieving data like Energy consumption, working time, temperature, humidity, pressure, etc.
Automation is enabling enterprises to design, deploy, and manage more complex, hybrid cloud environments. Yet the people who manage these environments must be trained in and understanding these environments better than ever before. A new era of analytics and cognitive computing is adding intelligence, but also more complexity, to these cloud environments. How smart is your cloud? How smart should it be? In this power panel at 20th Cloud Expo, moderated by Conference Chair Roger Strukhoff, paneli...
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp em...
Most of the time there is a lot of work involved to move to the cloud, and most of that isn't really related to AWS or Azure or Google Cloud. Before we talk about public cloud vendors and DevOps tools, there are usually several technical and non-technical challenges that are connected to it and that every company needs to solve to move to the cloud. In his session at 21st Cloud Expo, Stefano Bellasio, CEO and founder of Cloud Academy Inc., will discuss what the tools, disciplines, and cultural...