Welcome!

@CloudExpo Authors: Liz McMillan, Elizabeth White, Pat Romanski, Akhil Sahai, Rishi Bhargava

Related Topics: @CloudExpo

@CloudExpo: Blog Feed Post

An Introduction to the Data Cloud

Running a Data Cloud in production presents a new set of challenges for DevOps

As data has grown exponentially at many sites, companies have been forced to horizontally scale their data.  Some have turned to sharding of databases like Postrres or MySQL , while others have switched to newer NoSQL data systems.  There have been many debates in the last few years about SQL vs. NoSQL data management systems and which is better.  What many have failed to grasp, though, is how similar these systems are and how complex they both are to run in production in high scale.

Both of these systems represent what I call a Data Cloud. This Data Cloud is logical data set spread across many nodes.  While developers have heated debates about which system is better and how to design code around it, those in DevOps usually struggle with very similar issues because the two systems are mostly the same.  Both systems

  • Run across many nodes with large amounts of data flowing between them and from/to the application
  • Strain both the hardware of all nodes, and the network connecting them
  • Maintain duplicate data across nodes for fault tolerance, and must have failover ability
  • Must be tuned on a per node and cluster-wide bases
  • Must allow for growth by adding additional nodes.

Running this Data Cloud in production presents a new set of challenges for DevOps, many of which are not well understood or addressed.  One of the main challenges is the management and monitoring of these systems, for which few (if any) tools or products exist at this time.

When systems were smaller and you ran a single Database in production, you probably had all the necessary systems in place.  With a plethora of products for Management, monitoring, visualizing data, and backups, it was not hard to be successful and meet your SLAs.

But now all this is much more complex once you move into the world of the Data Cloud.  Now you have a large number of nodes, all representing the same system and still needing to meet the same SLAs as the old simple DB from before.  Let us look at the challenges for running a production Data Cloud successfully.

Capacity Planning

Do you know how many nodes you need?  How many nodes do you put in each replica set?  How much latency and throughput do you need in your network for the nodes to communicate fast enough?  What is the ideal hardware to use for each node to balance performance with costs?

Monitoring

How do you monitor dozens, hundreds or even thousands of nodes all at once?   How do you get a unified view of your data cloud, and then drill down to the problem nodes?   Are there even any off-the-shelf monitoring tools that can help?  Your old monitoring tool won’t be very useful anymore unless you are willing to look at every node one by one to see what is going on there.

Alerting

How do you set up a common set of alerts across all nodes?  And how do you keep your alert thresholds in sync as you add nodes and remove them?   More importantly, even assuming you have alerting in place,  once staff receives critical alerts, how will they know where to find the troubled node in the massive cloud, or whether it’s a node level  issue or more global in nature?  This must be done quickly during critical outages.

Data Visualization

How does your staff view the data when it is distributed?  In case of data inaccuracy, how can they quickly identify the faulty nodes and fix up the data?

Performance Tuning

As performance degrades, how do you troubleshoot and identify the bottlenecks?  How do you find which nodes by be the cause of the problem?  How do you improve performance across all the nodes.

Data Cloud Management

How do you back up all the data while consistently tracking which nodes were backed up successfully and when?  How do you make schema changes across all the nodes in one consistent step without breaking your app? And how do you make configuration changes on various nodes or across all nodes?  And how do you track the configurations of each node and keep them consistent across your system?

By now you should see that there is a lot to think about before endeavoring to launch a production Data Cloud.  Too many companies focus all their energies on deciding which DB or NoSQL system to use and developing their apps for it.  But that might turn out to be the lesser of your challenges once you struggle to put the system into production.  Be sure you can answer all the questions I have listed above before your launch.

Boris.

Read the original blog entry...

More Stories By AppDynamics Blog

In high-production environments where release cycles are measured in hours or minutes — not days or weeks — there's little room for mistakes and no room for confusion. Everyone has to understand what's happening, in real time, and have the means to do whatever is necessary to keep applications up and running optimally.

DevOps is a high-stakes world, but done well, it delivers the agility and performance to significantly impact business competitiveness.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
There will be new vendors providing applications, middleware, and connected devices to support the thriving IoT ecosystem. This essentially means that electronic device manufacturers will also be in the software business. Many will be new to building embedded software or robust software. This creates an increased importance on software quality, particularly within the Industrial Internet of Things where business-critical applications are becoming dependent on products controlled by software. Qua...
Continuous testing helps bridge the gap between developing quickly and maintaining high quality products. But to implement continuous testing, CTOs must take a strategic approach to building a testing infrastructure and toolset that empowers their team to move fast. Download our guide to laying the groundwork for a scalable continuous testing strategy.
As companies gain momentum, the need to maintain high quality products can outstrip their development team’s bandwidth for QA. Building out a large QA team (whether in-house or outsourced) can slow down development and significantly increases costs. This eBook takes QA profiles from 5 companies who successfully scaled up production without building a large QA team and includes: What to consider when choosing CI/CD tools How culture and communication can make or break implementation
SYS-CON Events has announced today that Roger Strukhoff has been named conference chair of Cloud Expo and @ThingsExpo 2016 Silicon Valley. The 19th Cloud Expo and 6th @ThingsExpo will take place on November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. "The Internet of Things brings trillions of dollars of opportunity to developers and enterprise IT, no matter how you measure it," stated Roger Strukhoff. "More importantly, it leverages the power of devices and the Interne...
"We formed Formation several years ago to really address the need for bring complete modernization and software-defined storage to the more classic private cloud marketplace," stated Mark Lewis, Chairman and CEO of Formation Data Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Machine Learning helps make complex systems more efficient. By applying advanced Machine Learning techniques such as Cognitive Fingerprinting, wind project operators can utilize these tools to learn from collected data, detect regular patterns, and optimize their own operations. In his session at 18th Cloud Expo, Stuart Gillen, Director of Business Development at SparkCognition, discussed how research has demonstrated the value of Machine Learning in delivering next generation analytics to imp...
Most organizations prioritize data security only after their data has already been compromised. Proactive prevention is important, but how can you accomplish that on a small budget? Learn how the cloud, combined with a defense and in-depth approach, creates efficiencies by transferring and assigning risk. Security requires a multi-defense approach, and an in-house team may only be able to cherry pick from the essential components. In his session at 19th Cloud Expo, Vlad Friedman, CEO/Founder o...
Organizations planning enterprise data center consolidation and modernization projects are faced with a challenging, costly reality. Requirements to deploy modern, cloud-native applications simultaneously with traditional client/server applications are almost impossible to achieve with hardware-centric enterprise infrastructure. Compute and network infrastructure are fast moving down a software-defined path, but storage has been a laggard. Until now.
"We host and fully manage cloud data services, whether we store, the data, move the data, or run analytics on the data," stated Kamal Shannak, Senior Development Manager, Cloud Data Services, IBM, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
In addition to all the benefits, IoT is also bringing new kind of customer experience challenges - cars that unlock themselves, thermostats turning houses into saunas and baby video monitors broadcasting over the internet. This list can only increase because while IoT services should be intuitive and simple to use, the delivery ecosystem is a myriad of potential problems as IoT explodes complexity. So finding a performance issue is like finding the proverbial needle in the haystack.
With over 720 million Internet users and 40–50% CAGR, the Chinese Cloud Computing market has been booming. When talking about cloud computing, what are the Chinese users of cloud thinking about? What is the most powerful force that can push them to make the buying decision? How to tap into them? In his session at 18th Cloud Expo, Yu Hao, CEO and co-founder of SpeedyCloud, answered these questions and discussed the results of SpeedyCloud’s survey.
With the proliferation of both SQL and NoSQL databases, organizations can now target specific fit-for-purpose database tools for their different application needs regarding scalability, ease of use, ACID support, etc. Platform as a Service offerings make this even easier now, enabling developers to roll out their own database infrastructure in minutes with minimal management overhead. However, this same amount of flexibility also comes with the challenges of picking the right tool, on the right ...
The Internet of Things will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, demonstrated how to move beyond today's coding paradigm and shared the must-have mindsets for removing complexity from the develo...
SYS-CON Events announced today that MangoApps will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. MangoApps provides modern company intranets and team collaboration software, allowing workers to stay connected and productive from anywhere in the world and from any device.
Redis is not only the fastest database, but it is the most popular among the new wave of databases running in containers. Redis speeds up just about every data interaction between your users or operational systems. In his session at 19th Cloud Expo, Dave Nielsen, Developer Advocate, Redis Labs, will share the functions and data structures used to solve everyday use cases that are driving Redis' popularity.
“We're a global managed hosting provider. Our core customer set is a U.S.-based customer that is looking to go global,” explained Adam Rogers, Managing Director at ANEXIA, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Isomorphic Software will exhibit at DevOps Summit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Isomorphic Software provides the SmartClient HTML5/AJAX platform, the most advanced technology for building rich, cutting-edge enterprise web applications for desktop and mobile. SmartClient combines the productivity and performance of traditional desktop software with the simp...
"When you think about the data center today, there's constant evolution, The evolution of the data center and the needs of the consumer of technology change, and they change constantly," stated Matt Kalmenson, VP of Sales, Service and Cloud Providers at Veeam Software, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
SYS-CON Events announced today that LeaseWeb USA, a cloud Infrastructure-as-a-Service (IaaS) provider, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. LeaseWeb is one of the world's largest hosting brands. The company helps customers define, develop and deploy IT infrastructure tailored to their exact business needs, by combining various kinds cloud solutions.
Early adopters of IoT viewed it mainly as a different term for machine-to-machine connectivity or M2M. This is understandable since a prerequisite for any IoT solution is the ability to collect and aggregate device data, which is most often presented in a dashboard. The problem is that viewing data in a dashboard requires a human to interpret the results and take manual action, which doesn’t scale to the needs of IoT.