By Stackify Blog
November 23, 2014 11:45 PM EST
Four Ways Cloud Has Influenced Application Troubleshooting
The rise of cloud computing has ushered in an era of unprecedented productivity for developers over the past several years. For those who have embraced this new world order, gone are the days of long lead times for hardware procurement and installation, architecture defined by slow-moving hardware upgrades, hardware-constrained scalability and flexibility, and a world where only sys admins have access to the infrastructure. But, as the barriers between development and delivery disappear, new challenges have emerged that can disrupt the lives of developers and slow down delivery of new products and features, giving back some of the efficiency gains that the Software-Defined Data Center (SDDC) created.
Whether you're new to the cloud or you've been around since before cloud was cool, you are likely to see four common challenges emerge that can make troubleshooting your applications in the cloud more difficult. Let's take a closer look at these common pain points first to help build awareness around the challenges, and then I'll offer some suggestions for how to prevent these hurdles from tripping you and your team up when it comes time to unravel an application troubleshooting mystery.
If you're adopting the cloud with limited support from an operations team, or perhaps you're one of the growing numbers of DevOps or even no-ops teams who find themselves bridging both the operations and development worlds, you will find that the responsibility of operating and supporting both your app and your infrastructure, at some level at least, will introduce a new dynamic that you may not have contemplated.
True, the ability to roll your own architecture without the burden of dealing with physical devices is liberating and far more efficient. But, as developer tools, deployment tools, and cloud operations tools become inextricably linked to one another, the old boundaries between who is dev and who is ops become blurred or even get removed altogether. This means the dev team is suddenly an integral part of operations, whether by design or by default, adding yet another responsibility for developers whose chief mandate is often to go faster. The more time you spend in the operations realm, especially in troubleshooting your app or the cloud resources it depends on, the less time you are able to devote to adding new value through code.
Lack of Transparency and Burden of Proof
While it's true that having the full benefits of the cloud available at the press of a button is awesome, you wouldn't be faulted for having a bit of nostalgia about the "good old days" of being able to have a conversation with a real live person down the hall about real physical hardware that's either healthy or isn't (along with the ability to actually lay hands on it). An old familiar refrain when something went wrong with an app in production was for the burden of proof to rest initially with the ops team - prove the hardware is working, the network is healthy, and the SAN hasn't lost disks before making the dev team dig in. Honestly, everyone was just hoping against hope that it was something "easy" in the infrastructure, because when it was the app, that's when things got hard. Well, that script has been reversed with the cloud: now the burden of proof is on the dev team, because what's really hard is finding a problem that originates with someone else's complex, abstracted, virtualized data center.
App returning a 500 error or performing poorly? If you're using something delivered as-a-Service, such as database, queues, cache and the like, you won't really have any visibility into health other than the cloud provider's status page and whatever you can directly observe. It's either working correctly and is speedy, or it isn't; if it isn't, life gets a lot murkier. Likewise, servers can be monitored, but you can't really tell why your virtual resource's performance has trailed off if you are the victim of something environmental that's out of your control.
No matter how good the support team is at your favorite cloud provider, it's rare that they will be as responsive to your requests for more information on an issue as your own in-house ops team could be, and they won't be as well versed on your architecture. To varying degrees, you're at the mercy of the cloud provider for consistent, reliable services, and it's also up to them to offer timely insight when issues arise with the services you depend on. Your mileage may vary, of course, as to whether your cloud provider offers this level of communication and transparency. But, if they don't, then the burden of proof rests squarely with you to show that the issue isn't in your app. Quite a reversal of fortunes, isn't it?
Compounding the challenge of sorting through infrastructure issues vs. code issues is the simple fact that applications are becoming far more complex and, in many cases, portions of the overall architecture may be transient in nature. Combine complexity with impermanence and you have a recipe for some real Sherlock Holmes-caliber mysteries at times.
The incredible thing about an SDDC is that you can create nearly any kind of architecture required to support your application stack's needs, all relatively easily - if you can dream it, you can build it. Want to cobble together .NET, Java, PHP, Node.js, Ruby, Database-as-a-Service for SQL and NoSQL, Message-Queues-as-a-Service, and Search-as-a-Service? From a cloud deployment perspective, it's been made devilishly easy to deploy and get started. But with that ultra-polyglot approach and a heavy reliance on software-defined services comes a new set of challenges:
- First, you have a variety of services that are black boxes to you. Each of these services comes with its own set of tricks for gaining insight into performance and availability, but each one may be different in how you monitor and troubleshoot.
- Learning how to support a variety of different technologies creates drag on your delivery velocity. It's hard enough learning the performance and reliability tricks for a few technologies; trying it for a wide variety can draw focus from the real goals of building new value through software and making the business more successful.
- Not every monitoring tool can support every technology stack, and the wider you cast the technology net, the harder it can become to support your full stack from a single monitoring tool.
- If you're using dynamic (transient) resources, such as scale-on-demand servers, you are quite likely to lose critical data that you need when troubleshooting a problem if you haven't given thought to how you preserve critical insights that disappear with the server when it's de-provisioned.
More Frequent Change
Finally, we come to the double-edged sword that brought this all about in the first place: going faster! The increased agility that the cloud brings, especially when coupled with dev tools that are integrated into the delivery cycle (think PaaS environments), has a way of shortening delivery cycle times and increasing the number of releases crammed into a given week, month, and year. This is especially true in organizations that have also adopted agile development practices. Code can flow to production smoothly with greater frequency, and architecture changes can be made far more swiftly and easily. Unfortunately, with more frequent code releases and architecture changes come more frequent opportunities to break something.
A big part of the movement toward Agile and Lean is also the notion of always moving forward - rather than rolling back a release in the event of an issue, detect problems early and patch them quickly. Enabling this mandate, however, requires two things that are often missing if you are coming from a slower-moving environment or from a more traditional hosting model:
- Developer visibility into a baseline of behavior telemetry to know what "good" looks like historically
- Instant feedback on the health of the application post-release relative to that healthy baseline
Without this, it's hard to know if you've made gains or losses with your release - your users are often your only real barometer.
So... How Do I Code More and Support Less?
There's no denying that the cloud has impacted the life of many developers, mostly in a very positive way. Of course, with new technologies and capabilities always comes a new set of challenges to overcome. In the case of cloud-hosted applications, this includes challenges to effectively and efficiently support those applications in their new environments so that the gains in productivity aren't given back in support of the application.
What can development teams do to adapt to and overcome these challenges?
There are three basic steps that every development team should take to make supporting cloud-based applications easier.
1. Establish Access, Process, and Protocol: The first order of business for helping developers support their cloud-based apps more effectively is giving them safe access to the information and resources they need. Unfortunately, all too often in cloud environments this is an all-or-nothing proposition - full login rights to servers and even potentially full rights to the management portal, or no access at all. Make sure to establish the correct access methods for your developers so that they have the visibility and access they need, without handing over so much control that it increases the likelihood of accidents.
2. Design Supportability Into the Application: Once your application is in production, there are several common questions that you will need to be able to answer at a moment's notice about your application: Is it (and everything it depends on) running? Are users satisfied with the performance? Is anything silently failing and frustrating users without setting off alarms? If something failed, who was impacted, and what caused the issue?
There are also some things that simply cannot be measured and monitored from outside the application, but which speak directly to the health and well being of your application. To enable you to quickly answer the inevitable questions, consider incorporating the following:
- If it moves, measure it. Report application metrics and KPIs from within your code in order to see events and data that would otherwise be locked away from you. Some events and metrics only you, the developer, have the power to expose. Knowing how your app behaves at a core level can provide levels of insight that prove invaluable when searching for troubleshooting clues. If you can configure monitoring and alerts for those metrics, even better. We've elaborated on this subject in the article "Errors & Logs: putting the data to work."
- Log often, and log meaningfully. If you only report errors, you will lack the critical insights necessary to help point to the root cause of the error. By logging at, say, info or debug instead of just warn or error, you will have the breadcrumb trail you need to find it. It's impossible to get the state of the system after the fact - you need to have logged it at the time of the event.
- Centralize your insights. Remembering that life in the cloud can be both quite distributed, and quite transient, it's always good to bring everything - logs, errors, custom metrics, and other telemetry - into a central location for normalization, correlation, and continuity. You may need the data, and what it tells you, well beyond the ephemeral life span of your cloud resource.
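The three practices above can be sketched together in a few lines of Python. This is only an illustration, assuming the standard library alone: `JsonFormatter`, `Metrics`, `build_logger`, and `process_order` are hypothetical names invented for the example, not part of any Stackify product or API. The idea is that every log line is emitted as one JSON object tagged with the host name, so a central collector could still attribute it after a transient server is gone, while counters and timings are recorded from inside the code path itself.

```python
import json
import logging
import socket
import time
from collections import defaultdict
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line so a central collector can parse it."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "host": socket.gethostname(),  # identifies the (possibly transient) server
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge structured context passed via extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


class Metrics:
    """Tiny in-process registry for counters and timings (hypothetical helper)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timings = defaultdict(list)

    def incr(self, name, amount=1):
        self.counters[name] += amount

    @contextmanager
    def timer(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the timed block raises.
            self.timings[name].append(time.perf_counter() - start)


def build_logger(name="orders", stream=None):
    """Create a logger that writes JSON lines (to stderr when stream is None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # keep info/debug breadcrumbs, not just errors
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False
    return logger


metrics = Metrics()
log = build_logger()


def process_order(order_id, items):
    """Hypothetical unit of work instrumented with logs and metrics."""
    with metrics.timer("process_order.seconds"):
        context = {"order_id": order_id, "item_count": len(items)}
        log.info("order received", extra={"context": context})
        if not items:
            metrics.incr("process_order.rejected")
            log.warning("order rejected: no items", extra={"context": context})
            return False
        log.debug("order validated", extra={"context": context})
        metrics.incr("process_order.accepted")
        return True
```

In a real deployment the stream handler would typically be swapped for whatever ships lines to your central store, and the in-memory `Metrics` registry for your monitoring client; both choices here are placeholders to keep the sketch self-contained.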
3. Identify Health Baselines Early: Key information like message queue length, average request time, app pool resource utilization, custom metrics values, log and error rates, and more can all be charted for your application these days - monitoring and charting isn't just the domain of ops tools any longer. Understand what your app looks like both when healthy and unhealthy, preferably starting as early as your pre-production environments, so that you can see how your application morphs from release to release as well as with different loads and as your architecture evolves. By baselining as far back as dev and QA, you can often catch problems well before they impact customers and send you and your team scrambling.
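As a rough sketch of what comparing a release against a healthy baseline might look like, the Python below summarizes historical samples of one metric and flags a post-release window whose average drifts too far from it. The function names, the sample values, and the three-standard-deviation threshold are all assumptions made for illustration; the article does not prescribe a particular statistic, and real monitoring tools generally use more robust methods.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical samples of a metric as (mean, standard deviation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return mean(samples), stdev(samples)


def deviates_from_baseline(baseline, recent, threshold=3.0):
    """Return True when the recent average is more than `threshold`
    standard deviations away from the baseline mean."""
    base_mean, base_std = baseline
    recent_mean = mean(recent)
    if base_std == 0:
        # A perfectly flat history: any change at all counts as a deviation.
        return recent_mean != base_mean
    return abs(recent_mean - base_mean) / base_std > threshold


# Hypothetical average request times (ms) collected in QA before a release.
history_ms = [120, 118, 125, 122, 119, 121, 124, 120]
baseline = build_baseline(history_ms)

# Hypothetical samples taken shortly after two different releases.
healthy_release = [121, 123, 119]
slow_release = [180, 175, 190]

print(deviates_from_baseline(baseline, healthy_release))  # False
print(deviates_from_baseline(baseline, slow_release))     # True
```

The same check can run against any charted value mentioned above (queue length, error rate, custom metrics), as long as the baseline was captured under comparable load.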
There's no denying the cloud brings incredible capabilities to the lives of developers: speed, agility, flexibility, scalability, and more. As with any new, disruptive technology, new challenges are also par for the course. By applying some basic strategies for application management, monitoring and troubleshooting, you can have all of the advantages of the cloud without giving back the gains during those critical support engagements, and have happier team members and end users as well.
At Stackify, we believe we offer a solution to the issues presented in this article. Learn more at www.stackify.com.