|By Mario Meir-Huber||
|May 20, 2012 02:00 PM EDT||
Infrastructure as a Service and Platform as a Service offer us easy scaling of services. However, scaling is not as easy as it seems to be in the Cloud. If your software architecture isn't done right, your services and applications might not scale as expected, even if you add new instances. As for most distributed systems, there are a couple of guidelines you should consider. I have summed up the ones I use most often for designing distributed systems.
Design for Failure
As Moore stated, everything that can fail, will fail. So it is very clear that a distributed system will fail at a certain time, even though cloud computing providers tell us that it is very unlikely. We had some outages  in the last year of some of the major platforms, and there might be even more of them. Therefore, your application should be able to deal with an outage of your cloud provider. This can be done with different techniques such as distributing an application in more than one availability zone (which should be done anyway). Netflix has a very interesting approach to steadily test their software for errors - they have employed an army of "Chaos Monkeys" . Of course, they are not real monkeys. It is software that randomly takes down different Instances. Netflix produces errors on purpose to see how their system reacts and if it is still performing well. The question is not if there will be another outage; the question is when the next outage will be.
Design for at Least Three Running Systems
For on-premise systems, we always used to do an "N+1" Design. This still applies in the cloud. There should always be one more system available than actually necessary. In the cloud, this can easily be achieved by running your instances in different geographical locations and availability zones. In case one region fails, the other region will take over. Some platforms offer intelligent routing and can easily forward traffic to another zone if one zone is down. However, there is this "rule of three," that basically says you should have three systems available: one for me, one for the customer and one if there is a failure. This will minimize the risk of an outage significantly for you.
Design for Monitoring
We all need to know what is going on in our datacenters and on our systems. Therefore, monitoring is an important aspect for every application you build. If you want to design intelligent monitoring, I/O performance or other metrics are not the only important things. It would be best if your system could "predict" your future load - this could either be done by statistical data you have from your applications‘ history or from your applications‘ domain. If your application is on sports betting, you might have high load on during major sports events. If it is for social games, your load might be higher during the day or when the weather is bad outside. However, your system should be monitored all the time and it should tell you in case a major failure might come up.
Design for Rollback
Large systems are typically owned by different teams in your company. This means that a lot of people work on your systems and rollout happens often. Even though there should be a lot of testing involved, it will still happen that new features will affect other services of your application. To prevent from that, our application should allow an easy rollback mechanism.
Design No State
State kills. If you store states on your systems, this will make load balancing much more complicated for you. State should be eliminated wherever and whenever possible. There are several techniques to reduce or remove state in your application. Modern devices such as tablets or smartphones have sufficient performance to store state information on the client. Every service call should be independent and it shouldn‘t be necessary to have a session state on the server. All session state should be transferred to the client, as described by Roy Fielding . Architectural styles such as ROA support this idea and help you make your services stateless. I will dig into ReST and ROA in one of my upcoming articles since this is really great for distributed systems.
Design to Disable Services
It should be easy to disable services that are not performing well or influencing your system in a way that is poisoning the entire application. Therefore, it will be important to isolate each service from each other, since it should not affect the entire system's functionality. Imagine the comment function of Amazon is not working - this might be essential to make up your mind about buying a book, but it wouldn't prevent you from buying the book.
Design Different Roles
However, with distributed systems we have a lot of servers involved - and it‘s necessary not to scale a front-end or back-end server, but to scale individual services. If there is exactly one front-end system that hosts all roles and a specific service experiences high load, why would it be necessary to scale up all services, even those services that have minor load? You might improve your systems if you have them split up in different roles. As already described by Bertram Meyer  with Command Query Separation, your application should also be split in different roles. This is basically a key thing for SOA applications; however, I still see that most services are not separated. There should be more separation of concerns based on the services. Implement some kind of application role separation for your application and services to improve scaling.
There might be additional principles for distributed systems. I see this article as a rather "living" one and will extend it over the time. I would be interested is your feedback on this. What are your thoughts on distributed systems? Email me at [email protected], use the comment section here or get in touch with me via Twitter at @mario_mh
SaaS companies can greatly expand revenue potential by pushing beyond their own borders. The challenge is how to do this without degrading service quality. In his session at 18th Cloud Expo, Adam Rogers, Managing Director at Anexia, discussed how IaaS providers with a global presence and both virtual and dedicated infrastructure can help companies expand their service footprint with low “go-to-market” costs.
Jun. 25, 2016 12:00 AM EDT Reads: 630
The initial debate is over: Any enterprise with a serious commitment to IT is migrating to the cloud. But things are not so simple. There is a complex mix of on-premises, colocated, and public-cloud deployments. In this power panel at 18th Cloud Expo, moderated by Conference Chair Roger Strukhoff, Randy De Meno, Chief Technologist - Windows Products and Microsoft Partnerships at Commvault; Dave Landa, Chief Operating Officer at kintone; William Morrish, General Manager Product Sales at Interou...
Jun. 24, 2016 10:00 PM EDT Reads: 625
You are moving to the Cloud. The question is not if, it’s when. Now that your competitors are in the cloud and lapping you, your “when” better hurry up and get here. But saying and doing are two different things. In his session at @DevOpsSummit at 18th Cloud Expo, Robert Reeves, CTO of Datical, explained how DevOps can be your onramp to the cloud. By adopting simple, platform independent DevOps strategies, you can accelerate your move to the cloud. Spoiler Alert: He also makes sure you don’t...
Jun. 24, 2016 08:00 PM EDT Reads: 789
What does it look like when you have access to cloud infrastructure and platform under the same roof? Let’s talk about the different layers of Technology as a Service: who cares, what runs where, and how does it all fit together. In his session at 18th Cloud Expo, Phil Jackson, Lead Technology Evangelist at SoftLayer, an IBM company, spoke about the picture being painted by IBM Cloud and how the tools being crafted can help fill the gaps in your IT infrastructure.
Jun. 24, 2016 04:30 PM EDT Reads: 450
Machine Learning helps make complex systems more efficient. By applying advanced Machine Learning techniques such as Cognitive Fingerprinting, wind project operators can utilize these tools to learn from collected data, detect regular patterns, and optimize their own operations. In his session at 18th Cloud Expo, Stuart Gillen, Director of Business Development at SparkCognition, discussed how research has demonstrated the value of Machine Learning in delivering next generation analytics to imp...
Jun. 24, 2016 02:15 PM EDT Reads: 341
More and more companies are looking to microservices as an architectural pattern for breaking apart applications into more manageable pieces so that agile teams can deliver new features quicker and more effectively. What this pattern has done more than anything to date is spark organizational transformations, setting the foundation for future application development. In practice, however, there are a number of considerations to make that go beyond simply “build, ship, and run,” which changes ho...
Jun. 24, 2016 01:30 PM EDT Reads: 750
SYS-CON Events announced today that ReadyTalk, a leading provider of online conferencing and webinar services, has been named Vendor Presentation Sponsor at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. ReadyTalk delivers audio and web conferencing services that inspire collaboration and enable the Future of Work for today’s increasingly digital and mobile workforce. By combining intuitive, innovative tec...
Jun. 24, 2016 01:00 PM EDT Reads: 1,291
Amazon has gradually rolled out parts of its IoT offerings, but these are just the tip of the iceberg. In addition to optimizing their backend AWS offerings, Amazon is laying the ground work to be a major force in IoT - especially in the connected home and office. In his session at @ThingsExpo, Chris Kocher, founder and managing director of Grey Heron, explained how Amazon is extending its reach to become a major force in IoT by building on its dominant cloud IoT platform, its Dash Button strat...
Jun. 24, 2016 12:00 PM EDT Reads: 1,533
Connected devices and the industrial internet are growing exponentially every year with Cisco expecting 50 billion devices to be in operation by 2020. In this period of growth, location-based insights are becoming invaluable to many businesses as they adopt new connected technologies. Knowing when and where these devices connect from is critical for a number of scenarios in supply chain management, disaster management, emergency response, M2M, location marketing and more. In his session at @Th...
Jun. 24, 2016 12:00 PM EDT Reads: 709
Your business relies on your applications and your employees to stay in business. Whether you develop apps or manage business critical apps that help fuel your business, what happens when users experience sluggish performance? You and all technical teams across the organization – application, network, operations, among others, as well as, those outside the organization, like ISPs and third-party providers – are called in to solve the problem.
Jun. 24, 2016 12:00 PM EDT Reads: 455
Digital Initiatives create new ways of conducting business, which drive the need for increasingly advanced security and regulatory compliance challenges with exponentially more damaging consequences. In the BMC and Forbes Insights Survey in 2016, 97% of executives said they expect a rise in data breach attempts in the next 12 months. Sixty percent said operations and security teams have only a general understanding of each other’s requirements, resulting in a “SecOps gap” leaving organizations u...
Jun. 24, 2016 11:45 AM EDT Reads: 890
The cloud market growth today is largely in public clouds. While there is a lot of spend in IT departments in virtualization, these aren’t yet translating into a true “cloud” experience within the enterprise. What is stopping the growth of the “private cloud” market? In his general session at 18th Cloud Expo, Nara Rajagopalan, CEO of Accelerite, explored the challenges in deploying, managing, and getting adoption for a private cloud within an enterprise. What are the key differences between wh...
Jun. 24, 2016 11:15 AM EDT Reads: 576
It is one thing to build single industrial IoT applications, but what will it take to build the Smart Cities and truly society changing applications of the future? The technology won’t be the problem, it will be the number of parties that need to work together and be aligned in their motivation to succeed. In his Day 2 Keynote at @ThingsExpo, Henrik Kenani Dahlgren, Portfolio Marketing Manager at Ericsson, discussed how to plan to cooperate, partner, and form lasting all-star teams to change t...
Jun. 24, 2016 11:00 AM EDT Reads: 947
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, provided an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life sett...
Jun. 24, 2016 10:30 AM EDT Reads: 866
It's easy to assume that your app will run on a fast and reliable network. The reality for your app's users, though, is often a slow, unreliable network with spotty coverage. What happens when the network doesn't work, or when the device is in airplane mode? You get unhappy, frustrated users. An offline-first app is an app that works, without error, when there is no network connection. In his session at 18th Cloud Expo, Bradley Holt, a Developer Advocate with IBM Cloud Data Services, discussed...
Jun. 24, 2016 09:45 AM EDT Reads: 673
19th Cloud Expo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterpri...
Jun. 24, 2016 09:45 AM EDT Reads: 1,152
There are several IoTs: the Industrial Internet, Consumer Wearables, Wearables and Healthcare, Supply Chains, and the movement toward Smart Grids, Cities, Regions, and Nations. There are competing communications standards every step of the way, a bewildering array of sensors and devices, and an entire world of competing data analytics platforms. To some this appears to be chaos. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, Bradley Holt, Developer Advocate a...
Jun. 24, 2016 09:30 AM EDT Reads: 574
SYS-CON Events announced today that Bsquare has been named “Silver Sponsor” of SYS-CON's @ThingsExpo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. For more than two decades, Bsquare has helped its customers extract business value from a broad array of physical assets by making them intelligent, connecting them, and using the data they generate to optimize business processes.
Jun. 24, 2016 09:30 AM EDT Reads: 1,098
The pace of innovation, vendor lock-in, production sustainability, cost-effectiveness, and managing risk… In his session at 18th Cloud Expo, Dan Choquette, Founder of RackN, discussed how CIOs are challenged finding the balance of finding the right tools, technology and operational model that serves the business the best. He also discussed how clouds, open source software and infrastructure solutions have benefits but also drawbacks and how workload and operational portability between vendors ...
Jun. 24, 2016 09:15 AM EDT Reads: 623
There is little doubt that Big Data solutions will have an increasing role in the Enterprise IT mainstream over time. Big Data at Cloud Expo - to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA - has announced its Call for Papers is open. Cloud computing is being adopted in one form or another by 94% of enterprises today. Tens of billions of new devices are being connected to The Internet of Things. And Big Data is driving this bus. An exponential increase is...
Jun. 24, 2016 08:45 AM EDT Reads: 1,200