|By Mario Meir-Huber||
|May 20, 2012 02:00 PM EDT||
Infrastructure as a Service and Platform as a Service offer us easy scaling of services. However, scaling is not as easy as it seems to be in the Cloud. If your software architecture isn't done right, your services and applications might not scale as expected, even if you add new instances. As for most distributed systems, there are a couple of guidelines you should consider. I have summed up the ones I use most often for designing distributed systems.
Design for Failure
As Moore stated, everything that can fail, will fail. So it is very clear that a distributed system will fail at a certain time, even though cloud computing providers tell us that it is very unlikely. We had some outages  in the last year of some of the major platforms, and there might be even more of them. Therefore, your application should be able to deal with an outage of your cloud provider. This can be done with different techniques such as distributing an application in more than one availability zone (which should be done anyway). Netflix has a very interesting approach to steadily test their software for errors - they have employed an army of "Chaos Monkeys" . Of course, they are not real monkeys. It is software that randomly takes down different Instances. Netflix produces errors on purpose to see how their system reacts and if it is still performing well. The question is not if there will be another outage; the question is when the next outage will be.
Design for at Least Three Running Systems
For on-premise systems, we always used to do an "N+1" Design. This still applies in the cloud. There should always be one more system available than actually necessary. In the cloud, this can easily be achieved by running your instances in different geographical locations and availability zones. In case one region fails, the other region will take over. Some platforms offer intelligent routing and can easily forward traffic to another zone if one zone is down. However, there is this "rule of three," that basically says you should have three systems available: one for me, one for the customer and one if there is a failure. This will minimize the risk of an outage significantly for you.
Design for Monitoring
We all need to know what is going on in our datacenters and on our systems. Therefore, monitoring is an important aspect for every application you build. If you want to design intelligent monitoring, I/O performance or other metrics are not the only important things. It would be best if your system could "predict" your future load - this could either be done by statistical data you have from your applications‘ history or from your applications‘ domain. If your application is on sports betting, you might have high load on during major sports events. If it is for social games, your load might be higher during the day or when the weather is bad outside. However, your system should be monitored all the time and it should tell you in case a major failure might come up.
Design for Rollback
Large systems are typically owned by different teams in your company. This means that a lot of people work on your systems and rollout happens often. Even though there should be a lot of testing involved, it will still happen that new features will affect other services of your application. To prevent from that, our application should allow an easy rollback mechanism.
Design No State
State kills. If you store states on your systems, this will make load balancing much more complicated for you. State should be eliminated wherever and whenever possible. There are several techniques to reduce or remove state in your application. Modern devices such as tablets or smartphones have sufficient performance to store state information on the client. Every service call should be independent and it shouldn‘t be necessary to have a session state on the server. All session state should be transferred to the client, as described by Roy Fielding . Architectural styles such as ROA support this idea and help you make your services stateless. I will dig into ReST and ROA in one of my upcoming articles since this is really great for distributed systems.
Design to Disable Services
It should be easy to disable services that are not performing well or influencing your system in a way that is poisoning the entire application. Therefore, it will be important to isolate each service from each other, since it should not affect the entire system's functionality. Imagine the comment function of Amazon is not working - this might be essential to make up your mind about buying a book, but it wouldn't prevent you from buying the book.
Design Different Roles
However, with distributed systems we have a lot of servers involved - and it‘s necessary not to scale a front-end or back-end server, but to scale individual services. If there is exactly one front-end system that hosts all roles and a specific service experiences high load, why would it be necessary to scale up all services, even those services that have minor load? You might improve your systems if you have them split up in different roles. As already described by Bertram Meyer  with Command Query Separation, your application should also be split in different roles. This is basically a key thing for SOA applications; however, I still see that most services are not separated. There should be more separation of concerns based on the services. Implement some kind of application role separation for your application and services to improve scaling.
There might be additional principles for distributed systems. I see this article as a rather "living" one and will extend it over the time. I would be interested is your feedback on this. What are your thoughts on distributed systems? Email me at [email protected], use the comment section here or get in touch with me via Twitter at @mario_mh
There is growing need for data-driven applications and the need for digital platforms to build these apps. In his session at 19th Cloud Expo, Muddu Sudhakar, VP and GM of Security & IoT at Splunk, will cover different PaaS solutions and Big Data platforms that are available to build applications. In addition, AI and machine learning are creating new requirements that developers need in the building of next-gen apps. The next-generation digital platforms have some of the past platform needs a...
Aug. 27, 2016 04:00 PM EDT Reads: 536
Announcing @TelecomReseller Named “Media Sponsor” of @CloudExpo Silicon Valley | #IoT #Cloud #BigData
SYS-CON Events announced today Telecom Reseller has been named “Media Sponsor” of SYS-CON's 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Aug. 27, 2016 03:15 PM EDT Reads: 720
DevOps at Cloud Expo – being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Am...
Aug. 27, 2016 02:45 PM EDT Reads: 3,439
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - comp...
Aug. 27, 2016 12:30 PM EDT Reads: 3,610
StarNet Adds Secure Remote Linux and Unix Desktops to X-Win32 | @CloudExpo @XWin32 #Cloud #Linux #Security
StarNet Communications Corp has announced the addition of three Secure Remote Desktop modules to its flagship X-Win32 PC X server. The new modules enable X-Win32 to safely tunnel the remote desktops from Linux and Unix servers to the user’s PC over encrypted SSH. Traditionally, users of PC X servers deploy the XDMCP protocol to display remote desktop environments such as the Gnome and KDE desktops on Linux servers and the CDE environment on Solaris Unix machines. XDMCP is used primarily on comp...
Aug. 27, 2016 12:00 PM EDT Reads: 636
Traditional on-premises data centers have long been the domain of modern data platforms like Apache Hadoop, meaning companies who build their business on public cloud were challenged to run Big Data processing and analytics at scale. But recent advancements in Hadoop performance, security, and most importantly cloud-native integrations, are giving organizations the ability to truly gain value from all their data. In his session at 19th Cloud Expo, David Tishgart, Director of Product Marketing ...
Aug. 27, 2016 12:00 PM EDT Reads: 486
SYS-CON Events announced today that eCube Systems, a leading provider of middleware modernization, integration, and management solutions, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. eCube Systems offers a family of middleware evolution products and services that maximize return on technology investment by leveraging existing technical equity to meet evolving business needs. ...
Aug. 27, 2016 12:00 PM EDT Reads: 629
The 19th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportuni...
Aug. 27, 2016 11:00 AM EDT Reads: 3,966
DevOps at Cloud Expo, taking place Nov 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long dev...
Aug. 27, 2016 11:00 AM EDT Reads: 2,352
Aspose.Total for .NET is the most complete package of all file format APIs for .NET as offered by Aspose. It empowers developers to create, edit, render, print and convert between a wide range of popular document formats within any .NET, C#, ASP.NET and VB.NET applications. Aspose compiles all .NET APIs on a daily basis to ensure that it contains the most up to date versions of each of Aspose .NET APIs. If a new .NET API or a new version of existing APIs is released during the subscription peri...
Aug. 27, 2016 10:30 AM EDT Reads: 1,961
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
Aug. 27, 2016 10:15 AM EDT Reads: 1,894
SYS-CON Events announced today that StarNet Communications will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. StarNet Communications’ FastX is the industry first cloud-based remote X Windows emulator. Using standard Web browsers (FireFox, Chrome, Safari, etc.) users from around the world gain highly secure access to applications and data hosted on Linux-based servers in a central data center. ...
Aug. 27, 2016 08:45 AM EDT Reads: 769
SYS-CON Events announced today that Hitrons Solutions will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Hitrons Solutions Inc. is distributor in the North American market for unique products and services of small and medium-size businesses, including cloud services and solutions, SEO marketing platforms, and mobile applications.
Aug. 27, 2016 08:00 AM EDT Reads: 614
Pulzze Systems was happy to participate in such a premier event and thankful to be receiving the winning investment and global network support from G-Startup Worldwide. It is an exciting time for Pulzze to showcase the effectiveness of innovative technologies and enable them to make the world smarter and better. The reputable contest is held to identify promising startups around the globe that are assured to change the world through their innovative products and disruptive technologies. There w...
Aug. 27, 2016 07:45 AM EDT Reads: 676
[session] Architecting for the Cloud By @RagsS | @CloudExpo @IBMBluemix #Cloud #Docker #Microservices
As the world moves toward more DevOps and Microservices, application deployment to the cloud ought to become a lot simpler. The Microservices architecture, which is the basis of many new age distributed systems such as OpenStack, NetFlix and so on, is at the heart of Cloud Foundry - a complete developer-oriented Platform as a Service (PaaS) that is IaaS agnostic and supports vCloud, OpenStack and AWS. Serverless computing is revolutionizing computing. In his session at 19th Cloud Expo, Raghav...
Aug. 27, 2016 07:45 AM EDT Reads: 796
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
Aug. 27, 2016 03:15 AM EDT Reads: 1,787
Personalization has long been the holy grail of marketing. Simply stated, communicate the most relevant offer to the right person and you will increase sales. To achieve this, you must understand the individual. Consequently, digital marketers developed many ways to gather and leverage customer information to deliver targeted experiences. In his session at @ThingsExpo, Lou Casal, Founder and Principal Consultant at Practicala, discussed how the Internet of Things (IoT) has accelerated our abil...
Aug. 27, 2016 02:30 AM EDT Reads: 2,008
With so much going on in this space you could be forgiven for thinking you were always working with yesterday’s technologies. So much change, so quickly. What do you do if you have to build a solution from the ground up that is expected to live in the field for at least 5-10 years? This is the challenge we faced when we looked to refresh our existing 10-year-old custom hardware stack to measure the fullness of trash cans and compactors.
Aug. 27, 2016 01:45 AM EDT Reads: 1,742
Extreme Computing is the ability to leverage highly performant infrastructure and software to accelerate Big Data, machine learning, HPC, and Enterprise applications. High IOPS Storage, low-latency networks, in-memory databases, GPUs and other parallel accelerators are being used to achieve faster results and help businesses make better decisions. In his session at 18th Cloud Expo, Michael O'Neill, Strategic Business Development at NVIDIA, focused on some of the unique ways extreme computing is...
Aug. 27, 2016 01:30 AM EDT Reads: 2,116
The emerging Internet of Everything creates tremendous new opportunities for customer engagement and business model innovation. However, enterprises must overcome a number of critical challenges to bring these new solutions to market. In his session at @ThingsExpo, Michael Martin, CTO/CIO at nfrastructure, outlined these key challenges and recommended approaches for overcoming them to achieve speed and agility in the design, development and implementation of Internet of Everything solutions wi...
Aug. 27, 2016 01:15 AM EDT Reads: 2,022