How to Create a High Performing API: A New Perspective for 2016
by Bob Reselman

Performance is the elusive butterfly of API development. Everybody is intrigued with its beauty, yet few know how to capture it.

In the old days, the approach of many shops to ensuring a performant API was to create some code and then pass it over the wall to QA for load testing. Later, some integration testing took place. As long as the API worked and met some marginal performance benchmarks, things were good.

This worked well when a public, HTTP-based API, consumed by a wide variety of distributed devices, was more the exception than the rule. Today, however, APIs are a big deal and they are everywhere, so much so that companies are posting very big infographics prominently on the front page of the New York Times to create even more awareness of the technology among the general public.

This is good news.

The rapid growth and increasing popularity of API use are causing a lot of companies to look inward and take new views on API performance. Code, load test, and publish won't do any longer. Companies are doing more. They are looking beyond the HTTP entry points.

Today the whole technical stack upon which an API sits is grist for the performance mill.

Look to the data
One of the most interesting discoveries I've made when talking to people who publish large-scale APIs is how critical underlying data structures and data architecture are to the overall picture. Diamond DevOps is a company that does a lot of work on both sides of the API fence, consuming APIs and publishing them. I talked to one of the key technical people, Diego Woitasen (@DiegoWoitasen), co-founder and tech lead, about what he looks for when considering API performance. He came back with two words: database indexes.

Diego's take is that less experienced database developers will often throw indexes on a database to speed up reads without giving consideration to the impact on writes. To quote Diego:

We took an app from a client that we were to refactor, but in the meantime we needed to keep the old app running. We discovered that there were 10 to 15 tables and more than 100 indexes. Indexes affect write performance and in this case the app was used to collect data mostly. Using so many indexes was a really bad choice. You can add indexes for apps that have more read operations than write operations.
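To get a feel for the write cost Diego describes, here is a minimal sketch using Python's built-in sqlite3 module. It is not the client's system: the table names (`bare`, `heavy`), the ten-column layout and the row count are all assumptions made for illustration. It loads the same rows into a table with no indexes and into one with ten, and prints how long each load takes. Actual timings depend on the machine and the database engine, so treat the numbers as indicative only.

```python
import sqlite3
import time


def build_table(conn, table, n_indexes):
    """Create a 10-column table plus `n_indexes` single-column indexes."""
    cols = [f"c{i}" for i in range(10)]
    conn.execute(f"CREATE TABLE {table} ({', '.join(c + ' INTEGER' for c in cols)})")
    for i in range(n_indexes):
        conn.execute(f"CREATE INDEX idx_{table}_{i} ON {table} ({cols[i]})")


def timed_insert(conn, table, rows):
    """Insert `rows` rows in one transaction and return elapsed seconds."""
    data = [tuple((r * 31 + c * 17) % 1009 for c in range(10)) for r in range(rows)]
    start = time.perf_counter()
    with conn:  # commits once at the end
        conn.executemany(f"INSERT INTO {table} VALUES (?,?,?,?,?,?,?,?,?,?)", data)
    return time.perf_counter() - start


def count_indexes(conn, table):
    """Return how many indexes SQLite has recorded for `table`."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    ).fetchone()
    return row[0]


def count_rows(conn, table):
    """Return the number of rows currently stored in `table`."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
build_table(conn, "bare", 0)    # write-friendly: nothing to maintain per insert
build_table(conn, "heavy", 10)  # every insert must also update ten indexes
t_bare = timed_insert(conn, "bare", 5000)
t_heavy = timed_insert(conn, "heavy", 5000)
print(f"no indexes: {t_bare:.3f}s, ten indexes: {t_heavy:.3f}s")
```

Running a query like the one in `count_indexes` against a write-heavy table is a cheap first check before adding yet another index.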

Separating read functionality from write functionality at the database level can be a critical design decision when it comes to API performance.

Using denormalization to separate read functionality from write functionality proved to be a big win in terms of API performance for Dmytro Seredenko (@dseredenko), Senior Director of North American Business at EPAM Systems. According to Dmytro:

We had a requirement to expose aggregated data on visitors through the API, sliced in multiple dimensions. The underlying system was a reporting component (RDBMS) that was fed by the data from a Map-Reduce job. ... it worked pretty slowly....

So we had to denormalize aggregated data stored in the Reporting RDBMS so the data could be queried quickly without complex joins. It (denormalizing) did increase the performance significantly. Since our API was read-only, we horizontally scaled RDBMS through adding read-only nodes.
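The idea Dmytro describes can be sketched with sqlite3. This is an illustration under assumptions, not EPAM's schema: the `pages` and `visits` tables, the `visits_by_section_country` report table and the sample rows are all hypothetical. A batch step flattens the join and the aggregation into one wide table, so the API's read path becomes a single-table lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, section TEXT);
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, page_id INTEGER, country TEXT);
    INSERT INTO pages VALUES (1, 'news'), (2, 'sports');
    INSERT INTO visits (page_id, country) VALUES
        (1, 'US'), (1, 'US'), (1, 'DE'), (2, 'US'), (2, 'DE'), (2, 'DE');
""")

# What the API would have to run on every request against the normalized schema.
NORMALIZED_QUERY = """
    SELECT p.section, v.country, COUNT(*)
    FROM visits v JOIN pages p ON p.page_id = v.page_id
    GROUP BY p.section, v.country
    ORDER BY p.section, v.country
"""


def rebuild_report(conn):
    """Batch step: flatten the join and aggregation into one wide table."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS visits_by_section_country")
        conn.execute(
            "CREATE TABLE visits_by_section_country AS "
            "SELECT p.section AS section, v.country AS country, COUNT(*) AS visit_count "
            "FROM visits v JOIN pages p ON p.page_id = v.page_id "
            "GROUP BY p.section, v.country"
        )


def read_report(conn):
    """API read path: a single-table scan, no joins or aggregation."""
    return conn.execute(
        "SELECT section, country, visit_count "
        "FROM visits_by_section_country ORDER BY section, country"
    ).fetchall()


rebuild_report(conn)
print(read_report(conn))
```

The trade-off is freshness: the report table is only as current as the last `rebuild_report` run, which suits a read-only API fed by a periodic job.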

You can have lightning fast web servers in play up at the endpoints, but if you're not getting the data you need, when you need it, your performance will suffer. Data architecture really does matter. However, data design is not the only consideration. Workflow process comes into play often.

It's the use case
A common scenario in API usage is what I call "a lot of state definition in, a lot of data back."

In this type of situation, you have an API that requires you to submit a lot of information about the use case at hand. The API will do a boatload of processing on that information and return a lot of data back. I've experienced cases in the casting industry in which an agent will have to submit hundreds of actors for a given role and the API will have to process all of that information. Once processed, a lot of information about that submission is returned. The submission data is large, the processing is laborious, and the data return can be big too.

How to address this issue? To quote Dmytro Seredenko again, "It's important to keep the dialog."

Dmytro and others propose that in certain cases, it's useful to segment processing via a number of API endpoints and to provide callback information when certain background processes complete.

Those of us who have posted video for processing on the Internet are familiar with the pattern. You submit your video and then, once processing is complete, the site sends you an email indicating your video is ready for viewing. Granted, email notification is a pretty primitive way to transmit state information via callback. But it is consistent with the conversation pattern.

Typically, as a site improves processing speed, email callback gets eliminated. But getting an email is a far sight better than having a user sit in front of a screen watching a spinning dial for tens of minutes on end.

Understanding the services your API is to deliver and figuring out how to design an architecture that segments processing into a series of dialog-like API calls will improve the overall performance of the API experience.
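A dialog of this kind can be sketched with Python's standard concurrent.futures module. Everything named here is hypothetical: `submit` stands in for the first endpoint, `process_submission` for the slow background work, and `notify` for whatever callback channel (email, webhook) the publisher chooses. The point is only that the submitting call returns right away and the consumer hears back when the work completes.

```python
from concurrent.futures import ThreadPoolExecutor

notifications = []  # stand-in for emails or webhook calls sent to the consumer


def process_submission(actor_names):
    """Hypothetical slow background step: here it only normalizes names."""
    return sorted(name.strip().title() for name in actor_names)


def submit(executor, submission_id, actor_names, on_complete):
    """Endpoint 1: accept the payload, queue the work, return immediately."""
    future = executor.submit(process_submission, actor_names)
    # The callback fires when the background work finishes.
    future.add_done_callback(lambda f: on_complete(submission_id, f.result()))
    return {"submission_id": submission_id, "status": "processing"}


def notify(submission_id, result):
    """Callback: tell the consumer that results are ready to fetch."""
    notifications.append({"submission_id": submission_id, "count": len(result)})


with ThreadPoolExecutor(max_workers=2) as pool:
    receipt = submit(pool, "sub-1", [" ann lee", "bo chan ", "cy diaz"], notify)
# Leaving the `with` block waits for the queued work, and its callback, to finish.
print(receipt, notifications)
```

In a real service the consumer would then call a second endpoint to fetch the processed results, which keeps each individual call small and fast.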

Still, what do you do in situations where you keep finding yourself submitting a lot of information to an API in order to get work done? This is where the notion of state caching can come into play.

Be essential
Online shopping sites are essentially one big state machine. You have a lot of data in play - customers, inventory, shipments, payments, etc. - all in various states of flux. There are also algorithms reacting to any and all state changes. Online shopping can be an API performance nightmare: API call upon API call is needed to select items to buy, make payment, and then arrange shipment.

The online retailer Nordstromrack.com | HauteLook is confronted with this state problem all the time. The way the company has dealt with the problem is to create a core design sensibility which all developers are to follow. Raj Murali (@rex_thuh_king), Senior Manager of ERP Engineering at Nordstromrack.com | HauteLook, states this principle simply:

"The fastest API is one that has to do NOTHING."

Raj and his team have devised a way in which a significant load of API work is done by background processes that store information in a distributed cache. In many cases, the work the API does is nothing more than checking the cache to determine the state of the given process. Also, their code takes full advantage of the HTTP response code standard. When a process is started via an API call, a 202 Accepted response code is returned. Later on when an API call needs to know if a process is complete, a 200 OK response code is delivered.
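Here is a minimal sketch of that pattern, not Raj's actual code. The handler names (`start_job`, `get_job`), the `background_worker` function and the plain dictionary standing in for a distributed cache are all assumptions. One further assumption: the status check keeps answering 202 Accepted while the job is still pending, which the description above does not spell out.

```python
from http import HTTPStatus

cache = {}  # stand-in for a distributed cache such as a shared key-value store


def start_job(job_id):
    """POST handler: record the job as pending and return 202 Accepted."""
    cache[job_id] = {"state": "pending", "result": None}
    return HTTPStatus.ACCEPTED, {"job_id": job_id}


def background_worker(job_id, payload):
    """Runs outside the request path and writes its result to the cache."""
    cache[job_id] = {"state": "complete", "result": sum(payload)}


def get_job(job_id):
    """GET handler: does nothing except read the cache."""
    entry = cache.get(job_id)
    if entry is None:
        return HTTPStatus.NOT_FOUND, {}
    if entry["state"] != "complete":
        # Assumption: keep answering 202 until the work is done.
        return HTTPStatus.ACCEPTED, {"job_id": job_id}
    return HTTPStatus.OK, {"job_id": job_id, "result": entry["result"]}


status_started, _ = start_job("job-42")
status_pending, _ = get_job("job-42")
background_worker("job-42", [1, 2, 3])
status_done, body_done = get_job("job-42")
print(int(status_started), int(status_pending), int(status_done), body_done)
```

Note that `get_job` performs a single cache lookup and nothing else, which is about as close to "doing NOTHING" as an endpoint can get.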

Creating an API endpoint that has essentially one piece of fast, finite work goes a long way to improving API performance. Yes, there is a lot of management to be done on the backend. However, making your API endpoint essential allows you the flexibility to seek performance gains down in the stack. The more work your API has to do, and the more state it has to hold on the web server, the more brittle it becomes. A brittle API may be fast today and slow a week from now.

Putting it all together
As I mentioned at the beginning of this article, there is a whole lot more to creating high performance APIs than coding and load testing. Comprehensive design and analysis all the way through the stack, from database, to workflow process design, all the way up to the HTTP access point, is critical. It's a different way of thinking, a different perspective. There are three fundamental takeaways to remember as we move forward.

First, give a lot of attention to how your API is writing and reading data. Be relentless in squeezing every bit of unnecessary work out of your data infrastructure. As we read above, be very careful about how you use indexes. Separate read databases from write databases and synchronize data accordingly. Denormalize whenever possible. Making each of these things more efficient can add up to significant improvements in performance.

The second is to understand the use of your API as an aggregate of endpoints. Can you define relationships among your API endpoints that have a common semantic meaning? If so, can you make it so that your API endpoints can participate effectively and efficiently in a structured, self-enforcing conversation? Sometimes a lot of back-and-forth transmission between a publisher and a consumer can be more effective than one big, data-heavy interaction with a lot of processing burden.

The third is to have your API get as close to doing nothing as possible. If your application accesses a lot of global state information that is slow moving, can you make it so your API avoids the costly CPU utilization that comes with in-process calculation? Can you use background processes? Can you use a distributed cache to hold slow-moving data that is global to all endpoints? Can you just make a simple call to another endpoint to get the information? Again, you want your API calls to be fast, without having to bear the burden of a lot of real-time processing.

In closing
Consumers want information and services that are accurate and they want them fast. Thus, just to be in the game, your API needs a very high level of performance.

Moving beyond the old-school paradigm of code, load test, publish will open new doors in which performance is seen as an important feature of your API and not some after-the-fact consideration. Take a new perspective on API performance. Move beyond the endpoint perspective to one in which your entire system is really the API.

You'll be happy you did. Your customers will be even happier.
