Findings on Database Management

Technical decisions around data persistence are hard, which is why we surveyed 583 IT professionals on everything from current DBMS and ORM usage to modern database engines' data structures and access patterns to storing data on a mobile device.

The demographics of this survey are as follows:

  • 69% of these respondents use Java as their primary programming language at work.
  • 68% develop primarily web applications.
  • 66% have been IT professionals for over 10 years.
  • 45% work at companies whose headquarters are located in Europe, 27% in the USA.
  • 44% work at companies with more than 500 employees, 19% at companies with more than 10,000 employees.

Give the key findings below a read and let us know what you think.

Oracle, MySQL, and SQL Server Remain Head and Shoulders Above the Rest; Oracle and MySQL Remain Neck and Neck
The two most widely used DBMSes (Oracle and MySQL) are used in production by 51% and 49% of respondents, respectively - significantly ahead of the third-ranked DBMS (SQL Server, at 34%). The top three, and the tight race between the top two, have not changed in years, either among our survey respondents or on the DBMS ranking aggregator db-engines.com. The nearest NoSQL challenger, MongoDB (at 20%), remains a distant fourth in production environments.

NoSQL - Especially Document-Oriented - DBMS Adoption Is Significantly Greater in Non-Production Environments
Non-production environments are friendlier to less mature and less thoroughly supported database management systems, and are also more likely to be affected by the desire to optimize for structural fit and ease of access. In production, where data stores are often managed by specialist non-developers, factors other than developer experience and the optimal match between data processing and storage and retrieval algorithms weigh more heavily in DBMS selection. NoSQL and generally less mature and/or less supported offerings should therefore be more popular in non-production environments. DBMSes that implement simpler storage models well suited to lightweight prototyping - especially, therefore, document-oriented DBMSes - should gain an extra boost in development environments.

Accordingly, the gap between production and non-production usage is greatest for the two most mature commercial DBMS offerings: Oracle (51% in production vs. 37% in non-production) and SQL Server (34% in production vs. 25% in non-production). Meanwhile, the gap between the most popular NoSQL offering (MongoDB, at 20% adoption in production) and the least popular of the top three (SQL Server, at 34%) shrinks to within the survey's margin of error in non-production environments, where MongoDB enjoys 25.4% adoption vs. SQL Server's 24.6%.
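To make the margin-of-error comparison concrete, here is a rough back-of-the-envelope sketch. It assumes a simple random sample, a 95% confidence level (z = 1.96), and that all 583 respondents answered the question; none of these assumptions is stated in the survey itself, so the survey's own margin may have been computed differently.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

RESPONDENTS = 583  # sample size reported in the article

# p = 0.5 maximizes p * (1 - p), so this is the worst-case margin.
worst_case = margin_of_error(RESPONDENTS)

# MongoDB (25.4%) vs. SQL Server (24.6%) in non-production environments.
gap = abs(0.254 - 0.246)

print(f"worst-case margin: {worst_case:.1%}")
print(f"observed gap:      {gap:.1%}")
print("gap within margin:", gap < worst_case)
```

Under these assumptions the worst-case margin comes out at roughly four percentage points, so a gap of less than one point between MongoDB and SQL Server cannot be distinguished from sampling noise.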

MongoDB's (static-schema-free) document orientation, familiar JSON-like document format (ordered lists supporting a variety of types), and widespread connector availability make it easy to set up without heavyweight data modeling and relatively straightforward to use for many less data-intensive applications without cramping application architecture or code. Indeed, many non-relational stores are easier to spin up quickly than a full-power RDBMS. Some benefits of the relational model (especially integrity enforcement) are less relevant in non-production environments, where updates don't always need to propagate across all entities.

Note: of the top three DBMSes, only MySQL enjoys greater adoption in non-production than in production environments. MySQL is especially likely to be many developers' default non-production RDBMS, presumably because it is popular, open source, mature, familiar, and supported by a strong community. (For the importance of familiarity in developers' preference for a particular data persistence technology, see the upcoming section on matching storage model to data structure.)

Applications Are Almost as Likely to Use Two Storage Models as One
A developer may use different storage models in different applications for reasons that have nothing to do with the work each application does; variety by developer says more about the human than about the technology. But variety of storage models within a single application indicates "polyglot" persistence - that is, the use of several storage models to persist data where technical and business needs overlap. Among our respondents, nearly as many typically use two storage models in their applications (38%) as use one (40%). This result supports reading "NoSQL" as "Not Only SQL," because the most popular storage model (given DBMS and query language usage data) remains relational. Based on DBMS adoption data, the second most popular storage model by user count is probably document-oriented; but because other storage models (especially column-oriented, graph, and key-value) are particularly well suited to analytical processing of many data rows, further research is required to discover storage model usage by data volume. In any case, the near-parity between one and two storage models per application indicates increasing interest in matching the persistence mechanism to the structure of the data to be persisted.

Matching Storage Model to Data Structure: Modeling Graph Data
Graph structures do not fit the relational model comfortably. In a relational database, most (Shannon) information is stored in the columns and rows of each table; the schema is a technical construct designed to enforce data integrity, make the data model more legible, and make the querying model more efficient, not to encode more information. In a graph, however, most information is stored in the structure of the nodes and the edges; additional information about nodes and edges is treated as metadata. Yet many real-world entities are most naturally represented as graphs: social, travel, and trade networks; packet routes; control flows; etc. Storing graph structures in tabular storage is inelegant and inefficient even at first, static-only glance, and the problem gets worse in a dynamic setting. Because a graph's computational complexity may diverge wildly from its combinatorial complexity, reducing a graph to a relational schema (e.g., two-column mapping tables that relate a row in one table to a row in another - that is, modeling nodes as columns and edges as rows in a new table) may work far better for some algorithms than for others, in ways that are not immediately obvious from the graph itself.

Nevertheless, three factors encourage developers and DBAs to store data that is naturally modeled as a graph in a relational DBMS: first, the maturity of relational DBMSes; second, the simplicity and familiarity of SQL (which 90% of respondents use regularly); and third, the availability and maturity of powerful object-relational mappers (ORMs) that make relational data easily accessible (often with automatic and highly effective optimizations) from application code.

Accordingly, only a small minority (20%) of respondents persist data that is naturally modeled as a graph in a specialized graph DBMS. Further, more respondents store naturally-graph data in a relational database without explicit modeling of edges as rows (39%) than with node-node mapping tables (31%). We expect this distribution to change as graph DBMSes and query languages grow more familiar, as the tooling ecosystem around these DBMSes approaches the maturity of ORMs, as the inefficiencies introduced by storage-structure mismatch grow more expensive with increasing graph data volume, and as use cases (and corresponding storage and retrieval algorithms) grow more varied.

Two possibly linked correlations are also worth noting. First, the largest chunk of respondents who store graphs in a relational database without explicit modeling of edges use Oracle (25%) - probably the most mature and most thoroughly optimized RDBMS. Second, the largest chunk of respondents who store graphs in a relational database with node-node mapping tables use MySQL (24%), which is also the only RDBMS that gains popularity in non-production vs. production environments. This difference may be a function of the greater likelihood that MySQL will be used for experimental purposes - where graph problems, insofar as they are conceptually farther from actuarial use (for which relational databases are a more natural fit), are more likely to appear.
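The node-node mapping table approach discussed in this section can be sketched as follows. This is an illustrative example only: it uses SQLite through Python's standard sqlite3 module, the table names (node, edge) and the sample data are hypothetical, and the recursive query is just one way to traverse edges stored as rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Mapping table: each row is one directed edge between two nodes.
    CREATE TABLE edge (
        src INTEGER NOT NULL REFERENCES node(id),
        dst INTEGER NOT NULL REFERENCES node(id),
        PRIMARY KEY (src, dst)
    );
""")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(1, "Lisbon"), (2, "Madrid"), (3, "Paris"), (4, "Berlin"), (5, "Oslo")],
)
# The edges contain a cycle: Madrid -> Paris -> Berlin -> Madrid.
conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (4, 2)])

def reachable_from(start_id: int) -> list[str]:
    """Names of all nodes reachable from start_id by following edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION  -- UNION (not UNION ALL) discards duplicates, so cycles terminate
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.id
        )
        SELECT node.name FROM node JOIN reach ON node.id = reach.id
        ORDER BY node.name
        """,
        (start_id,),
    ).fetchall()
    return [name for (name,) in rows]

print(reachable_from(1))
```

Even in this small case the traversal has to be phrased as a repeated self-join over the edge table, which hints at why some graph algorithms map onto a relational schema far more comfortably than others.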

Matching Processing Approach to Storage Model: Use and Enjoyment of ORMs
Most developers use SQL (90%), but relational algebra does not naturally capture object orientation. Objects do not fall into Venn diagrams; but objects and relational tables do share enough structure that, for many simple (few-join) access patterns, the so-called object-relational impedance mismatch does not cause catastrophic performance or integrity loss. Accordingly, object-relational mappers (ORMs) are not only widely used, but also preferred by a majority of developers. In response to our question, "What persistence-related technology do you most enjoy working with?" 58% of respondents answered that they most enjoy working with ORMs. Of these, 70% specifically enjoyed working with Hibernate - probably a function of both Hibernate's maturity and our respondents' heavy focus on Java. Although the tail of most-enjoyed data persistence technologies was quite long (26 distinct technologies), Spring Data emerged as the most popular comprehensive data access framework by far (16%).
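As a rough illustration of the translation work an ORM automates, the sketch below maps one small object with a nested list onto two tables by hand. It is not how Hibernate or Spring Data work internally; the Order class, the table names, and the save/load helpers are all hypothetical, and the example uses only Python's standard library.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Order:
    id: int
    customer: str
    items: list[str] = field(default_factory=list)  # nested collection on the object side

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_item (
        order_id INTEGER NOT NULL REFERENCES orders(id),
        position INTEGER NOT NULL,
        sku      TEXT NOT NULL,
        PRIMARY KEY (order_id, position)
    );
""")

def save(order: Order) -> None:
    """Flatten one object (and its nested list) into rows in two tables."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order.id, order.customer))
        conn.executemany(
            "INSERT INTO order_item VALUES (?, ?, ?)",
            [(order.id, pos, sku) for pos, sku in enumerate(order.items)],
        )

def load(order_id: int) -> Optional[Order]:
    """Reassemble the object from its rows, or return None if it does not exist."""
    row = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return None
    items = [sku for (sku,) in conn.execute(
        "SELECT sku FROM order_item WHERE order_id = ? ORDER BY position", (order_id,)
    )]
    return Order(id=row[0], customer=row[1], items=items)

save(Order(1, "acme", ["widget", "gear"]))
print(load(1))
```

For a shallow object like this one the mapping is mechanical, which is consistent with the observation that the mismatch is tolerable for simple, few-join access patterns; deeper object graphs multiply the tables and queries involved.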

Reasons Developers Enjoy Working with a Data Persistence Technology
Just under two-thirds of all respondents who named the persistence-related technologies they enjoy working with also specified why they enjoyed working with those technologies. Grounded-theoretic "bucketing" analysis yielded seven (somewhat overlapping) reasons to enjoy a persistence technology (listed in order of popularity): ease of use, simplicity, adherence to standards, familiarity, performance, high level of control, and scalability. The most popular reason by far was ease of use (34%), followed by simplicity in distant second (21%). The top four reasons relate more directly to developer experience than to outcomes (such as performance and scalability), as the wording of the question ("enjoy") indicated. Additional research is required to determine how developer experience relates to persistence-related technology selection, especially because many less-familiar (NoSQL) technologies are optimized for scalability and general performance for certain use cases.

Handling Scale: Data Is Partitioned as Frequently as It Is Not, But This Is Often Successfully Made Invisible to Developers
Modern storage engines, across all storage models, are highly optimized for current hardware, access patterns, and network performance. Theoretically massive inefficiencies of the relational storage model sometimes dominate the advantages offered by a higher degree of maturity among RDBMSes, although newer engines store data in structures that are less narrowly tuned to read-heavy loads using slow (spinning) physical media than (for example) B+ trees. But as Big Data strategies aggressively drive data storage and processing needs, data scale becomes increasingly difficult to manage.

To keep performance and availability high, data is often partitioned along physical and logical lines. Among our survey respondents, 38% partition data in some way (vertical, horizontal, or functional) vs. 40% who do not - a difference within the survey's margin of error (5%). Two research follow-ups would prove interesting: first, what specific data volumes (or velocities), application requirements, and infrastructure constraints drive what kinds of partitioning; and second, which storage models are more likely to require partitioning (although application constraints presumably affect both choice of storage model and partition size/need). It would appear, however, that distributed data techniques designed to manage CAP trade-offs are often effective: 22% of respondents - most of whom are developers and not DBAs - were not even aware of whether their databases were partitioned. This is a sign that, for nearly a quarter of developers, physical splitting of data had no visible impact on their development work.
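How horizontal partitioning can stay invisible to application code can be sketched with a toy example. The ShardedStore class below is hypothetical and keeps each partition in an in-memory dictionary; a real system would route to separate database instances and would also have to handle rebalancing, replication, and failures, none of which is shown here.

```python
import hashlib

class ShardedStore:
    """Toy horizontally partitioned key-value store; each shard is a plain dict."""

    def __init__(self, shard_count: int) -> None:
        self._shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key: str) -> dict:
        # Use a stable hash: Python's built-in hash() is salted per process for strings.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._shard_for(key).get(key, default)

    def shard_sizes(self) -> list[int]:
        return [len(shard) for shard in self._shards]

store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

# Callers never name a shard; routing happens inside the store.
print(store.get("user:42"))
print(store.shard_sizes())
```

Because callers only ever see put and get, a developer using such a store would have no reason to know how many partitions sit behind it, which matches the survey's finding that many developers were unaware of whether their data was partitioned.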

For more information on Database and Data Persistence Tools and Techniques, please visit: https://dzone.com/guides/data-persistence-2

By John Esposito

John Esposito is Editor-in-Chief at DZone, having recently finished a doctoral program in Classics from the University of North Carolina. In a previous life he was a VBA and Force.com developer, DBA, and network administrator. John enjoys playing piano and looking at diagrams, and raises two cats with his wife, Sarah.
