By SmartBear Blog
May 3, 2016 03:30 PM EDT
How to Create a High Performing API: A New Perspective for 2016
by Bob Reselman
Performance is the elusive butterfly of API development. Everybody is intrigued by its beauty, yet few know how to capture it.
In the old days, many shops tried to ensure a performant API by writing some code and then passing it over the wall to QA for load testing. Later, some integration testing took place. As long as the API worked and met some marginal performance benchmarks, things were good.
This worked well when a public, HTTP-based API consumed by a wide variety of distributed devices was more the exception than the rule. Today, however, APIs are a big deal and they are everywhere, so much so that companies are placing very large infographics prominently on the front page of the New York Times to raise awareness of the technology among the general public.
This is good news.
The rapid growth and increasing popularity of API use are causing a lot of companies to look inward and take a new view of API performance. Code, load test, and publish won't do any longer. Companies are doing more. They are looking beyond the HTTP entry points.
Today the whole technical stack upon which an API sits is grist for the performance mill.
Look to the data
One of the most interesting discoveries I've made when talking to people who publish large-scale APIs is how critical underlying data structures and data architecture are to the overall picture. Diamond DevOps is a company that does a lot of work on both sides of the API fence, consuming APIs and publishing them. I talked to one of its key technical people, Diego Woitasen (@DiegoWoitasen), co-founder and tech lead, about what he looks for when considering API performance. He came back with two words: database indexes.
Diego's take is that less experienced database developers will often throw indexes on a database to speed up reads without considering the impact on writes. To quote Diego:
We took an app from a client that we were to refactor, but in the meantime we needed to keep the old app running. We discovered that there were 10 to 15 tables and more than 100 indexes. Indexes affect write performance and in this case the app was used to collect data mostly. Using so many indexes was a really bad choice. You can add indexes for apps that have more read operations than write operations.
Separating read functionality from write functionality at the database level can be a critical design decision when it comes to API performance.
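Diego's point about write cost can be seen in a small experiment. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical `readings` table (the schema, column names, and row counts are illustrative and do not come from the client app he describes). It loads the same rows into a table with no secondary indexes and into one with an index on every column, then compares how many database pages each load had to allocate. Every extra index is another structure that each insert must also update.

```python
import sqlite3

# Hypothetical schema for a data-collection app; these names are illustrative only.
COLUMNS = ["sensor", "region", "value", "taken_at"]


def pages_after_load(index_count: int, rows: int = 2000) -> int:
    """Insert `rows` rows into a table carrying `index_count` secondary
    indexes and return how many database pages ended up allocated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        "id INTEGER PRIMARY KEY, sensor TEXT, region TEXT, value REAL, taken_at TEXT)"
    )
    # Each index is an extra B-tree that every INSERT has to maintain.
    for column in COLUMNS[:index_count]:
        conn.execute(f"CREATE INDEX idx_{column} ON readings ({column})")
    conn.executemany(
        "INSERT INTO readings (sensor, region, value, taken_at) VALUES (?, ?, ?, ?)",
        (
            (f"sensor-{i % 50}", f"region-{i % 5}", i * 0.5, f"2016-05-03T00:00:{i % 60:02d}")
            for i in range(rows)
        ),
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


pages_without_indexes = pages_after_load(index_count=0)
pages_with_indexes = pages_after_load(index_count=len(COLUMNS))
print(pages_without_indexes, pages_with_indexes)
```

Page counts are only a rough proxy for write work, and the exact numbers depend on the SQLite build. The direction of the difference is what matters for a write-heavy app.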
Using denormalization to separate read from write functionality proved to be a big win in terms of API performance for Dmytro Seredenko (@dseredenko), Senior Director of North American Business at EPAM Systems. According to Dmytro:
We had a requirement to expose aggregated data on visitors through the API, sliced in multiple dimensions. The underlying system was a reporting component (RDBMS) that was fed by the data from a Map-Reduce job. ... it worked pretty slowly....
So we had to denormalize aggregated data stored in the Reporting RDBMS so the data could be queried quickly without complex joins. It (denormalizing) did increase the performance significantly. Since our API was read-only, we horizontally scaled RDBMS through adding read-only nodes.
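To make the idea concrete, here is a minimal sketch of that kind of denormalization, assuming Python's built-in sqlite3 module. The `visitors`, `visits`, and `visit_summary` tables are hypothetical and are not Dmytro's actual schema. The normalized query needs a join and a GROUP BY on every read, while the denormalized table is filled once (for example by a batch job) and then answers the same question with a plain lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized source tables (hypothetical schema).
    CREATE TABLE visitors (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        visitor_id INTEGER REFERENCES visitors(id),
        device TEXT
    );
    INSERT INTO visitors VALUES (1, 'US'), (2, 'US'), (3, 'CA');
    INSERT INTO visits (visitor_id, device) VALUES
        (1, 'mobile'), (1, 'desktop'), (2, 'mobile'), (3, 'mobile'), (3, 'mobile');

    -- Denormalized, pre-aggregated table that the read-only API would query.
    CREATE TABLE visit_summary (
        country TEXT,
        device TEXT,
        visit_count INTEGER,
        PRIMARY KEY (country, device)
    );
    INSERT INTO visit_summary
        SELECT v.country, s.device, COUNT(*)
        FROM visits s JOIN visitors v ON v.id = s.visitor_id
        GROUP BY v.country, s.device;
    """
)

# Read path 1: join and aggregate at request time.
joined = conn.execute(
    "SELECT v.country, s.device, COUNT(*) "
    "FROM visits s JOIN visitors v ON v.id = s.visitor_id "
    "GROUP BY v.country, s.device ORDER BY v.country, s.device"
).fetchall()

# Read path 2: no join, no aggregation, just a lookup.
flat = conn.execute(
    "SELECT country, device, visit_count FROM visit_summary ORDER BY country, device"
).fetchall()

print(joined == flat)  # both read paths return the same rows
```

The trade-off is that the summary table has to be refreshed whenever the source data changes, which is acceptable when the API is read-only and the data arrives in batches.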
You can have lightning fast web servers in play up at the endpoints, but if you're not getting the data you need, when you need it, your performance will suffer. Data architecture really does matter. However, data design is not the only consideration. Workflow process comes into play often.
It's the use case
A common scenario in API usage is what I call, "a lot of state definition in, a lot of data back."
In this type of situation, you have an API that requires you to submit a lot of information about the use case at hand. The API will do a boatload of processing on that information and return a lot of data back. I've experienced cases in the casting industry in which an agent will have to submit hundreds of actors for a given role and the API will have to process all of that information. Once processed, a lot of information about that submission is returned. The submission data is large, the processing is laborious, and the data return can be big too.
How to address this issue? To quote Dmytro Seredenko again, "It's important to keep the dialog."
Dmytro and others propose that in certain cases, it's useful to segment processing via a number of API endpoints and to provide callback information when certain background processes complete.
Those of us who have posted video for processing on the Internet are familiar with the pattern. You submit your video and then, once the upload is complete, the site will send you an email indicating your video is ready for viewing. Granted, email notification is a pretty primitive way to transmit state information via callback. But it is consistent with the conversation pattern.
Typically, as a site improves processing speed, email callback gets eliminated. But getting an email is a far sight better than having a user sit in front of a screen watching a spinning dial for tens of minutes on end.
Understanding the services your API is to deliver and figuring out how to design an architecture that segments processing into a series of dialog-like API calls will improve the overall performance of the API experience.
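As an illustration of the dialog idea, the sketch below splits one large submission into several small calls and reports completion through a callback. It is a simplified, in-process model written in Python. The `SubmissionDialog` class, its method names, and the shape of the returned dictionaries are all hypothetical, and in a real API each method would sit behind its own endpoint with the callback delivered as a webhook, email, or similar.

```python
from typing import Callable, Dict, List


class SubmissionDialog:
    """Hypothetical server-side state for one multi-call submission."""

    def __init__(self, on_complete: Callable[[Dict], None]) -> None:
        self.actors: List[str] = []
        self.on_complete = on_complete  # stand-in for a webhook or email callback
        self.closed = False

    def add_batch(self, batch: List[str]) -> Dict:
        # Each call does a small, bounded amount of work and returns at once.
        self.actors.extend(batch)
        return {"received": len(batch), "total": len(self.actors)}

    def finish(self) -> Dict:
        # The consumer signals that the submission is complete; the heavy
        # processing is deferred instead of blocking this call.
        self.closed = True
        return {"status": "processing", "total": len(self.actors)}

    def process_in_background(self) -> None:
        # Stand-in for a background worker; it notifies the consumer when done.
        if self.closed:
            self.on_complete({"status": "complete", "processed": len(self.actors)})


notifications: List[Dict] = []
dialog = SubmissionDialog(on_complete=notifications.append)
first = dialog.add_batch(["actor-1", "actor-2"])
second = dialog.add_batch(["actor-3"])
accepted = dialog.finish()
dialog.process_in_background()
print(notifications)
```

No single call carries the whole payload or waits for the whole result, which is the point of keeping the dialog going.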
Still, what do you do in situations where you keep finding yourself submitting a lot of information to an API in order to get work done? This is where the notion of state caching can come into play.
Online shopping sites are essentially one big state machine. You have a lot of data in play (customers, inventory, shipments, payments, etc.), all in various states of flux. There are also algorithms reacting to any and all state changes. Online shopping can be an API performance nightmare, with API call upon API call needed to select items to buy, make payment, and then arrange shipment.
The online retailer Nordstromrack.com | HauteLook is confronted with this state problem all the time. The company has dealt with the problem by creating a core design sensibility that all developers are to follow. Raj Murali (@rex_thuh_king), Senior Manager of ERP Engineering at Nordstromrack.com | HauteLook, states this principle simply:
"The fastest API is one that has to do NOTHING."
Raj and his team have devised a way in which a significant load of API work is done by background processes that store information in a distributed cache. In many cases, the work the API does is nothing more than checking the cache to determine the state of the given process. Also, their code takes full advantage of the HTTP response code standard. When a process is started via an API call, a 202 Accepted response code is returned. Later on when an API call needs to know if a process is complete, a 200 OK response code is delivered.
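The pattern Raj describes might look roughly like the following. This is a minimal Python sketch under stated assumptions, not the Nordstromrack.com | HauteLook implementation: a plain dictionary stands in for the distributed cache, `start_job`, `get_job`, and `background_worker` are hypothetical names, and returning 202 again for a job that is still running is my own assumption, since the article only says that a completed process yields 200 OK.

```python
from http import HTTPStatus
from typing import Dict, List, Tuple

cache: Dict[str, Dict] = {}  # stand-in for a distributed cache
pending: List[str] = []      # stand-in for a work queue


def start_job(job_id: str) -> Tuple[int, Dict]:
    """API call that starts a process: record it, queue it, return 202."""
    cache[job_id] = {"state": "pending"}
    pending.append(job_id)
    return HTTPStatus.ACCEPTED, {"job": job_id}


def get_job(job_id: str) -> Tuple[int, Dict]:
    """API call that checks a process: it only reads the cache."""
    entry = cache.get(job_id)
    if entry is None:
        return HTTPStatus.NOT_FOUND, {}
    if entry["state"] != "done":
        # Assumption: an unfinished job is reported as 202 again.
        return HTTPStatus.ACCEPTED, entry
    return HTTPStatus.OK, entry


def background_worker() -> None:
    """Stand-in for the background processes that do the real work."""
    while pending:
        job_id = pending.pop(0)
        cache[job_id] = {"state": "done", "result": f"processed {job_id}"}


started = start_job("order-42")
before = get_job("order-42")
background_worker()
after = get_job("order-42")
print(started[0], before[0], after[0])
```

The endpoint functions never do the expensive work themselves. They write or read one cache entry and return, which keeps their latency flat no matter how long the background processing takes.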
Creating an API endpoint that has essentially one piece of fast, finite work goes a long way toward improving API performance. Yes, there is a lot of management to be done on the backend. However, paring your API endpoint down to its essentials gives you the flexibility to seek performance gains further down the stack. The more work your API has to do, and the more state it has to hold on the web server, the more brittle it becomes. A brittle API may be fast today and slow a week from now.
Putting it all together
As I mentioned at the beginning of this article, there is a whole lot more to creating high performance APIs than coding and load testing. Comprehensive design and analysis all the way through the stack, from the database, to workflow process design, all the way up to the HTTP access point, is critical. It's a different way of thinking, a different perspective. There are three fundamental takeaways to remember as we move forward.
First, give a lot of attention to how your API is writing and reading data. Be relentless in squeezing every bit of unnecessary work out of your data infrastructure. As we read above, be very careful about how you use indexes. Separate read databases from write databases and synchronize data accordingly. Denormalize whenever possible. Making each of these things more efficient can add up to a substantial improvement in performance.
The second is to understand the use of your API as an aggregate of endpoints. Can you define relationships among your API endpoints that have a common semantic meaning? If so, can you make it so that your API endpoints can participate effectively and efficiently in a structured, self-enforcing conversation? Sometimes a lot of back and forth transmission between a publisher and a consumer can be more effective than one big, data heavy interaction with a lot of processing burden.
The third is to have your API get as close to doing nothing as possible. If your application accesses a lot of global state information that is slow moving, can you make it so your API avoids the costly CPU utilization that comes with in-process calculation? Can you use background processes? Can you use a distributed cache to hold slow moving data that is global to all endpoints? Can you just make a simple call to another endpoint to get the information? Again, you want your API calls to be fast, without having to bear the burden of a lot of real time processing.
Consumers want information and services that are accurate and they want them fast. Thus, just to be in the game, your API needs to deliver a very high level of performance.
Moving beyond the old school paradigm of code, load test, publish will open new doors in which performance is seen as an important feature of your API and not some after the fact consideration. Take a new perspective on API performance. Move beyond the endpoint perspective to one in which your entire system is really the API.
You'll be happy you did. Your customers will be even happier.