|By SmartBear Blog||
|May 3, 2016 03:30 PM EDT||
How to Create a High Performing API: A New Perspective for 2016
by Bob Reselman
Performance is the elusive butterfly of API development. Everybody is intrigued with its beauty, yet few know how to capture it.
In the old days, the approach of many shops to ensure a performant API was to create some code and then pass it over to the wall to QA to do load testing. Later some integration testing took place. As long as the API worked and it was met some marginal performance benchmarks, things were good.
This worked well when a public, HTTP based API, consumed by a wide variety of distributed devices was more the exception than the rule. However, today APIs are a big deal and they are everywhere, so much so that companies are posting very big infographics prominently in the front page of the New York Times to create even more awareness about the technology to the general public.
This is good news.
The rapid growth and increasing popularity of API use is causing a lot of companies to look inward, to take new views on API performance. Code, load test, and publish won't do any longer. Companies are doing more. They are looking beyond the HTTP entry points.
Today the whole technical stack upon which an API sits is grist for the performance mill.
Look to the data
One of the most interesting discoveries I've made when talking to people that publish large scale APIs is how critical underlying data structures and data architecture is to the overall picture. Diamond DevOps is a company that does a lot of work on both sides of the API fence, consuming APIs and publishing them. I talked to one of the key technical people, Diego Woitasen (@DiegoWoitasen) co-found and tech-lead, about what he looks for when considering API performance. He came back with a two words, database indexes.
Diego's take is that many times less experienced database developers will throw indexes on a database intended to speed reads without giving consideration to the impact on writes. To quote Diego:
We took an app from a client that we were to refactor, but in the meantime we needed to keep the old app running. We discovered that there were 10 to 15 tables and more than 100 indexes. Indexes affect write performance and in this case the app was used to collect data mostly. Using so many indexes was a really bad choice. You can add indexes for apps that have more read operations than write operations.
Separating read functionality from write functionality at the database level can be a critical design decision when it comes to API performance.
Using denormalization in order to separate read from write functionality proved to be a big win in terms of API performance for Dmytro Seredenko (@dseredenko) Senior Director of North American Business at EPAM Systems. According to Dmytro:
We had a requirement to expose aggregated data on visitors through the API, sliced in multiple dimensions. The underlying system was a reporting component (RDBMS) that was fed by the data from a Map-Reduce job. ... it worked pretty slowly....
So we had to denormalize aggregated data stored in the Reporting RDBMS so the data could be queried quickly without complex joins. It (denormalizing) did increase the performance significantly. Since our API was read-only, we horizontally scaled RDMS through adding read-only nodes.
You can have lightning fast web servers in play up at the endpoints, but if you're not getting the data you need, when you need it, your performance will suffer. Data architecture really does matter. However, data design is not the only consideration. Workflow process comes into play often.
It's the use case
A common scenario in API usage is what I call, "a lot of state definition in, a lot of data back."
In this type of situation, you have an API that requires you to submit a lot of information about the use case at hand. The API will do a boatload of processing on that information and return a lot of data back. I've experienced cases in the casting industry in which an agent will have to submit hundreds of actors for a given role and the API will have to process all of that information. Once processed, a lot of information about that submission is returned. The submission data is large, the processing is laborious, and the data return can be big too.
How to address this issue? To quote Dmytro Seredenko again, "It's important to keep the dialog."
Dmytro and others propose that in certain cases, it's useful to segment processing via a number of API endpoints and to provide callback information when certain background processes complete.
Those of us that have posted video for processing on the Internet are familiar with the pattern. You submit your video and then, once the upload is complete, the site will send you an email indicating your video is ready for viewing. Granted email notification is a pretty primitive way to transmit state information via callback. But, it is consistent with the conversation pattern.
Typically as a site improves processing speed, email callback gets eliminated. But, getting an email is a far sight better than having a user sitting in front of screen watching a spinning dial for tens of minutes on end.
Understanding the services your API is to deliver and figuring out how to design an architecture that segments processing into a series of dialog-like API calls will improve the overall performance of the API experience.
Still, what do you do in situations where you keep finding yourself submitting a lot of information to an API in order to get work done? This is where the notion of state caching can come into play.
Online shopping sites are essentially one big state machine. You have a lot of data in play - customers, inventory, shipments, payments, etc - all in various states of flux. Also there are algorithms reacting to any and all state change. Online shopping can be an API performance nightmare, API all upon API call needed to select items to buy, make payment and then shipment.
The online retailer Nordstromrack.com | HauteLook is confronted with this state problem all the time. The way the company has dealt with the problem is to create a core design sensibility which all developers are to follow. Raj Murali (@rex_thuh_king ) Senior Manager of ERP Engineering at Nordstromrack.com | HauteLook, states this principal simply:
"The fastest API is one that has to do NOTHING."
Raj and his team have devised a way in which a significant load of API work is done by background processes that store information in a distributed cache. In many cases, the work the API does is nothing more than checking the cache to determine the state of the given process. Also, their code takes full advantage of the HTTP response code standard. When a process is started via an API call, a 202 Accepted response code is returned. Later on when an API call needs to know if a process is complete, a 200 OK response code is delivered.
Creating an API endpoint that has essentially one piece of fast, finite work goes a long way to improving API performance. Yes, there is a lot of management to be done on the backend. However, making your API endpoint essential allows you the flexibility to seek performance gains down in the stack. The more work your API has to do, and the more state it has to hold on the web server, will make it more brittle. A brittle API may be fast today and slow a week from now.
Putting it all together
As I mentioned at the beginning of this article, there is a whole lot more to creating high performance APIs than coding and load testing. Comprehensive design and analysis all the way through the stack, from database, to workflow process design, all the way up to HTTP access point, is critical. It's a different way of thinking, a different perspective. There are the three fundamental takeaways to remember as we move forward.
First, give a lot of attention to how your API is writing and reading data. Be relentless in squeezing every bit of unnecessary work out of your data infrastructure. As we read above, be very careful about how you use indexes. Separate read databases from write database and synchronize data accordingly. Denormalize whenever possible. Make each of these things more efficient can add up to enough improvements in performance.
The second is to understand the use of your API as an aggregate of endpoints. Can you define relationships among your API endpoints that have a common semantic meaning? If so, can you make it so that your API endpoints can participate effectively and efficiently in a structured, self-enforcing conversation? Sometimes a lot of back and forth transmission between a publisher and a consumer can be more effective than one big, data heavy interaction with a lot of processing burden.
The third is have your API get as close to doing nothing as is possible. If your application accesses a lot of global state information that is slow moving, can you make it so your API avoids the costly CPU utilization that comes with in-process calculation? Can you use background processes? Can you use a distributed cache to hold slow moving data that is global to all endpoints? Can you just make a simple call to another endpoint to get the information? Again, you want your API calls to be fast, without having to bear the burden of a lot of real time processing.
Consumers want information and services that are accurate and they want them fast. Thus, just to be in the game your API needs to a level of performance that is very high.
Moving beyond the old school paradigm of code, load test, publish will open new doors in which performance is seen as an important feature of your API and not some after the fact consideration. Take a new perspective on API performance. Move beyond the endpoint perspective to one in which your entire system is really the API.
You'll be happy you did. Your customers will be even happier.
DevOps and microservices are permeating software engineering teams broadly, whether these teams are in pure software shops but happen to run a business, such Uber and Airbnb, or in companies that rely heavily on software to run more traditional business, such as financial firms or high-end manufacturers. Microservices and DevOps have created software development and therefore business speed and agility benefits, but they have also created problems; specifically, they have created software securi...
Feb. 22, 2017 11:00 AM EST Reads: 3,236
All clouds are not equal. To succeed in a DevOps context, organizations should plan to develop/deploy apps across a choice of on-premise and public clouds simultaneously depending on the business needs. This is where the concept of the Lean Cloud comes in - resting on the idea that you often need to relocate your app modules over their life cycles for both innovation and operational efficiency in the cloud. In his session at @DevOpsSummit at19th Cloud Expo, Valentin (Val) Bercovici, CTO of Soli...
Feb. 22, 2017 10:45 AM EST Reads: 340
Almost two-thirds of companies either have or soon will have IoT as the backbone of their business. Though, IoT is far more complex than most firms expected with a majority of IoT projects having failed. How can you not get trapped in the pitfalls? In his session at @ThingsExpo, Tony Shan, Chief IoTologist at Wipro, will introduce a holistic method of IoTification, which is the process of IoTifying the existing technology portfolios and business models to adopt and leverage IoT. He will delve in...
Feb. 22, 2017 10:00 AM EST Reads: 1,499
Cloud Expo, Inc. has announced today that Aruna Ravichandran, vice president of DevOps Product and Solutions Marketing at CA Technologies, has been named co-conference chair of DevOps at Cloud Expo 2017. The @DevOpsSummit at Cloud Expo New York will take place on June 6-8, 2017, at the Javits Center in New York City, New York, and @DevOpsSummit at Cloud Expo Silicon Valley will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Feb. 22, 2017 09:45 AM EST Reads: 1,324
Bert Loomis was a visionary. This general session will highlight how Bert Loomis and people like him inspire us to build great things with small inventions. In their general session at 19th Cloud Expo, Harold Hannon, Architect at IBM Bluemix, and Michael O'Neill, Strategic Business Development at Nvidia, discussed the accelerating pace of AI development and how IBM Cloud and NVIDIA are partnering to bring AI capabilities to "every day," on-demand. They also reviewed two "free infrastructure" pr...
Feb. 22, 2017 09:45 AM EST Reads: 797
In his session at @ThingsExpo, Steve Wilkes, CTO and founder of Striim, will delve into four enterprise-scale, business-critical case studies where streaming analytics serves as the key to enabling real-time data integration and right-time insights in hybrid cloud, IoT, and fog computing environments. As part of this discussion, he will also present a demo based on its partnership with Fujitsu, highlighting their technologies in a healthcare IoT use-case. The demo showcases the tracking of pati...
Feb. 22, 2017 09:29 AM EST Reads: 145
Tricky charts and visually deceptive graphs often make a case for the impact IT performance has on business. The debate isn't around the obvious; of course, IT performance metrics like website load time influence business metrics such as conversions and revenue. Rather, this presentation will explore various data analysis concepts to understand how, and how not to, assert such correlations. In his session at 20th Cloud Expo, Leo Vasiliou, Director of Web Performance Engineering at Catchpoint Sys...
Feb. 22, 2017 09:15 AM EST Reads: 1,379
Stratoscale, the software company developing the next generation data center operating system, exhibited at SYS-CON's 18th International Cloud Expo®, which took place at the Javits Center in New York City, NY, in June 2016.Stratoscale is revolutionizing the data center with a zero-to-cloud-in-minutes solution. With Stratoscale’s hardware-agnostic, Software Defined Data Center (SDDC) solution to store everything, run anything and scale everywhere, IT is empowered to take control of their data ce...
Feb. 22, 2017 08:30 AM EST Reads: 849
The buzz continues for cloud, data analytics and the Internet of Things (IoT) and their collective impact across all industries. But a new conversation is emerging - how do companies use industry disruption and technology enablers to lead in markets undergoing change, uncertainty and ambiguity? Organizations of all sizes need to evolve and transform, often under massive pressure, as industry lines blur and merge and traditional business models are assaulted and turned upside down. In this new da...
Feb. 22, 2017 08:30 AM EST Reads: 833
It is one thing to build single industrial IoT applications, but what will it take to build the Smart Cities and truly society changing applications of the future? The technology won’t be the problem, it will be the number of parties that need to work together and be aligned in their motivation to succeed. In his Day 2 Keynote at @ThingsExpo, Henrik Kenani Dahlgren, Portfolio Marketing Manager at Ericsson, discussed how to plan to cooperate, partner, and form lasting all-star teams to change the...
Feb. 22, 2017 08:15 AM EST Reads: 4,655
For organizations that have amassed large sums of software complexity, taking a microservices approach is the first step toward DevOps and continuous improvement / development. Integrating system-level analysis with microservices makes it easier to change and add functionality to applications at any time without the increase of risk. Before you start big transformation projects or a cloud migration, make sure these changes won’t take down your entire organization.
Feb. 22, 2017 08:00 AM EST Reads: 671
What are the new priorities for the connected business? First: businesses need to think differently about the types of connections they will need to make – these span well beyond the traditional app to app into more modern forms of integration including SaaS integrations, mobile integrations, APIs, device integration and Big Data integration. It’s important these are unified together vs. doing them all piecemeal. Second, these types of connections need to be simple to design, adapt and configure...
Feb. 22, 2017 07:00 AM EST Reads: 2,936
To manage complex web services with lots of calls to the cloud, many businesses have invested in Application Performance Management (APM) and Network Performance Management (NPM) tools. Together APM and NPM tools are essential aids in improving a business's infrastructure required to support an effective web experience... but they are missing a critical component - Internet visibility.
Feb. 22, 2017 06:45 AM EST Reads: 1,639
Microservices are a very exciting architectural approach that many organizations are looking to as a way to accelerate innovation. Microservices promise to allow teams to move away from monolithic "ball of mud" systems, but the reality is that, in the vast majority of organizations, different projects and technologies will continue to be developed at different speeds. How to handle the dependencies between these disparate systems with different iteration cycles? Consider the "canoncial problem" ...
Feb. 22, 2017 06:00 AM EST Reads: 5,264
Both SaaS vendors and SaaS buyers are going “all-in” to hyperscale IaaS platforms such as AWS, which is disrupting the SaaS value proposition. Why should the enterprise SaaS consumer pay for the SaaS service if their data is resident in adjacent AWS S3 buckets? If both SaaS sellers and buyers are using the same cloud tools, automation and pay-per-transaction model offered by IaaS platforms, then why not host the “shrink-wrapped” software in the customers’ cloud? Further, serverless computing, cl...
Feb. 22, 2017 06:00 AM EST Reads: 1,851
“We're a global managed hosting provider. Our core customer set is a U.S.-based customer that is looking to go global,” explained Adam Rogers, Managing Director at ANEXIA, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Feb. 22, 2017 05:45 AM EST Reads: 1,701
The speed of software changes in growing and large scale rapid-paced DevOps environments presents a challenge for continuous testing. Many organizations struggle to get this right. Practices that work for small scale continuous testing may not be sufficient as the requirements grow. In his session at DevOps Summit, Marc Hornbeek, Sr. Solutions Architect of DevOps continuous test solutions at Spirent Communications, explained the best practices of continuous testing at high scale, which is rele...
Feb. 22, 2017 03:15 AM EST Reads: 5,535
Hardware virtualization and cloud computing allowed us to increase resource utilization and increase our flexibility to respond to business demand. Docker Containers are the next quantum leap - Are they?! Databases always represented an additional set of challenges unique to running workloads requiring a maximum of I/O, network, CPU resources combined with data locality.
Feb. 22, 2017 02:45 AM EST Reads: 2,053
"Matrix is an ambitious open standard and implementation that's set up to break down the fragmentation problems that exist in IP messaging and VoIP communication," explained John Woolf, Technical Evangelist at Matrix, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Feb. 22, 2017 02:00 AM EST Reads: 13,012
"A lot of times people will come to us and have a very diverse set of requirements or very customized need and we'll help them to implement it in a fashion that you can't just buy off of the shelf," explained Nick Rose, CTO of Enzu, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
Feb. 22, 2017 01:45 AM EST Reads: 6,082