By Radu Gheorghe
February 27, 2017 12:30 PM EST
When it comes to centralizing logs to Elasticsearch, the first log shipper that comes to mind is Logstash. People hear about it even if it's not clear what it does:
- Bob: I'm looking to aggregate logs
- Alice: you mean... like... Logstash?
When you get into it, you realize centralizing logs often implies a bunch of things, and Logstash isn't the only log shipper that fits the bill:
- fetching data from a source: a file, a UNIX socket, TCP, UDP...
- processing it: appending a timestamp, parsing unstructured data, adding Geo information based on IP
- shipping it to a destination. In this case, Elasticsearch. And because Elasticsearch can be down or struggling, or the network can be down, the shipper would ideally be able to buffer and retry
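The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the shippers discussed here is implemented: `process`, `BufferingShipper` and the `send` callable are hypothetical names, and the buffer lives in memory only, so it does not survive a restart.

```python
from collections import deque
from datetime import datetime, timezone


def process(raw_line):
    """Turn a raw line into a structured event with a timestamp."""
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "message": raw_line.rstrip("\n"),
    }


class BufferingShipper:
    """Keeps events in memory until the destination accepts them."""

    def __init__(self, send, max_buffered=1000):
        # `send` is a hypothetical callable that raises ConnectionError
        # when the destination (e.g. Elasticsearch) is unreachable.
        self.send = send
        # Bounded buffer: when it is full, the oldest events are dropped.
        self.buffer = deque(maxlen=max_buffered)

    def ship(self, event):
        self.buffer.append(event)
        return self.flush()

    def flush(self):
        # Send in order; stop at the first failure and keep the rest buffered.
        while self.buffer:
            try:
                self.send(self.buffer[0])
            except ConnectionError:
                return False
            self.buffer.popleft()
        return True
```

Calling `flush()` again later (for example on a timer) is the retry: events that could not be delivered stay at the front of the buffer and go out first once the destination is back.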
In this post, we'll describe Logstash and its alternatives - 5 "alternative" log shippers (Filebeat, Fluentd, rsyslog, syslog-ng and Logagent), so you know which fits which use-case.
Logstash is not the oldest shipper on this list (that would be syslog-ng, ironically the only one with "new" in its name), but it's certainly the best known. That's because it has lots of plugins: inputs, codecs, filters and outputs. Basically, you can take pretty much any kind of data, enrich it as you wish, then push it to lots of destinations.
Logstash's main strong point is flexibility, due to the number of plugins. Also, its clear documentation and straightforward configuration format mean it's used in a variety of use-cases. This leads to a virtuous cycle: you can find online recipes for doing pretty much anything. Here are a few examples from us: 5 minute intro, reindexing data in Elasticsearch, parsing Elasticsearch logs, rewriting Elasticsearch slowlogs so you can replay them with JMeter.
Logstash's Achilles' heel has always been performance and resource consumption (the default heap size is 1GB). Though performance has improved a lot over the years, it's still a lot slower than the alternatives. We've done some benchmarks comparing Logstash to rsyslog and to Filebeat and Elasticsearch's Ingest node. This can be a problem for high traffic deployments, where the Logstash servers would need to be comparable in size with the Elasticsearch ones.
Another problem is that Logstash doesn't currently buffer. A typical workaround is to use Redis or Kafka as a central buffer.
Because of the flexibility and abundance of recipes, Logstash is a great tool for prototyping, especially for more complex parsing. If you have big servers, you might as well install Logstash on each. You won't need buffering if you're tailing files, because the file itself can act as a buffer (i.e. Logstash remembers where it left off).
If you have small servers, installing Logstash on each is a no-go, so you'll need a lightweight log shipper on them that can push data to Elasticsearch through one (or more) central Logstash servers.
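To illustrate what "remembers where it left off" means, here is a minimal Python sketch of tailing with a persisted byte offset. It is an assumption-laden simplification, not the actual mechanism of Logstash or any other shipper: `read_new_lines` and the offset file are hypothetical, and rotation is detected only by the file getting shorter.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return complete lines appended to log_path since the last call.

    The byte offset is persisted in offset_path (a hypothetical registry
    file), so a restarted process resumes where the previous one stopped.
    """
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)

    # If the file is now shorter than the saved offset, assume it was
    # truncated or rotated and start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # partial line: wait until the writer finishes it
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\n"))
            offset += len(raw)

    with open(offset_path, "w") as f:
        f.write(str(offset))
    return lines
```

In this sketch the offset is saved as soon as the lines are read. A shipper that must not lose data would save it only after the destination has acknowledged those lines, so that the file keeps acting as the buffer while the destination is down.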
As your logging project moves forward, you may or may not need to change your log shipper because of performance/cost. When choosing whether Logstash performs well enough, it's important to have a good estimation of throughput needs - which would predict how much you'd spend on Logstash hardware.
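A rough way to turn a throughput estimate into a hardware estimate is sketched below. All numbers and names are placeholders: the per-node throughput has to come from your own benchmark with your own configuration, and the headroom factor is an assumption, not a recommendation from this article.

```python
import math


def nodes_needed(peak_events_per_sec, events_per_sec_per_node, headroom=0.25):
    """Number of shipper nodes needed to handle the peak rate.

    headroom=0.25 means each node is planned to run at no more than 75%
    of the throughput measured in a benchmark.
    """
    usable = events_per_sec_per_node * (1 - headroom)
    return math.ceil(peak_events_per_sec / usable)


def daily_volume_gb(avg_events_per_sec, avg_event_bytes):
    """Raw log volume per day, in GB (10^9 bytes)."""
    return avg_events_per_sec * avg_event_bytes * 86400 / 1e9
```

For example, a hypothetical peak of 20,000 events/s against a measured 5,000 events/s per node gives `nodes_needed(20000, 5000)`, which is 6 nodes with the default headroom.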
As part of the Beats "family", Filebeat is a lightweight log shipper that came to life precisely to address the weakness of Logstash: Filebeat was made to be that lightweight log shipper that pushes to Logstash.
With version 5.x, Elasticsearch has some parsing capabilities (like Logstash's filters) called Ingest. This means you can push directly from Filebeat to Elasticsearch, and have Elasticsearch do both parsing and storing. You shouldn't need a buffer when tailing files because, just like Logstash, Filebeat remembers where it left off.
If you need buffering (e.g. because you don't want to fill up the file system on logging servers), you can use Redis/Kafka, because Filebeat can talk to them.
Filebeat is just a tiny binary with no dependencies. It uses very few resources and, though it's young, I find it quite reliable, mainly because it's simple and there are few things that can go wrong. That said, you have lots of knobs regarding what it can do. For example, how aggressively it should search for new files to tail, and when to close file handles for files that haven't changed in a while.
Filebeat's scope is very limited, so you'll have to solve the rest of the problem somewhere else. For example, if you use Logstash down the pipeline, you have about the same performance issue. Because of this, Filebeat's scope is growing. Initially it could only send logs to Logstash and Elasticsearch, but now it can send to Kafka and Redis, and in 5.x it also gains filtering capabilities.
Filebeat is great for solving a specific problem: you log to files, and you want to either:
- ship directly to Elasticsearch. This works if you want to just "grep" them or if you log in JSON (Filebeat can parse JSON). Or, if you want to use Elasticsearch's Ingest for parsing and enriching (assuming the performance and functionality of Ingest fits your needs)
- put them in Kafka/Redis, so another shipper (e.g. Logstash, or a custom Kafka consumer) can do the enriching and shipping. This assumes that the chosen shipper fits your functionality and performance needs
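For the first option, this is roughly what shipping JSON log lines straight to Elasticsearch involves: each line becomes a document in a request to the `_bulk` endpoint, whose body is newline-delimited JSON (an action line followed by the document). The sketch below only builds that body. It is an illustration, not Filebeat's code; `to_bulk_body` and the index name are hypothetical, and Elasticsearch 5.x additionally expected a `_type`, which is left out here.

```python
import json


def to_bulk_body(log_lines, index_name):
    """Build a newline-delimited JSON body for Elasticsearch's _bulk API."""
    out = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            doc = None
        if not isinstance(doc, dict):
            # Not a JSON object: keep the raw text so it can still be searched.
            doc = {"message": line}
        out.append(json.dumps({"index": {"_index": index_name}}))
        out.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(out) + "\n" if out else ""
```

Lines that are already JSON keep their fields, which is why logging in JSON makes the "ship directly" option practical. Everything else ends up as a single `message` field that you can only search as text.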
Logagent is our log shipper. It was born out of the need to make it easy for someone who hasn't used a log shipper before to send logs to Logsene (our logging SaaS, which exposes the Elasticsearch API). And because Logsene exposes the Elasticsearch API, Logagent can just as easily be used to push data to Elasticsearch.
Logagent's main advantage is ease of use: if Logstash is easy (actually, you still need a bit of learning if you've never used it, that's natural), this one really gets you started in a minute. It tails everything in /var/log out of the box and parses various logging formats out of the box (Elasticsearch, Solr, MongoDB, Apache HTTPD...). It can mask sensitive data like PII, date of birth, credit card numbers, etc. It will also do GeoIP enriching based on IPs (e.g., for access logs) and update the GeoIP database automatically. It's also light and fast, so you'll be able to put it on most logging boxes (unless you have very small ones, like appliances). The new 2.x version added support for pluggable inputs and outputs in the form of 3rd party node.js modules. Very importantly, Logagent has local buffering so, unlike Logstash, it will not lose your logs when the destination is not available.
Logagent is still young, although it is developing and maturing quickly. It has some interesting functionality (e.g. it accepts Heroku or CloudFoundry logs), but it is not yet as flexible as Logstash.
Logagent is a good choice of a shipper that can do everything (tail, parse, buffer - yes, it can buffer on disk - and ship) that you can install on each logging server. Especially if you want to get started quickly. Logagent is embedded in Sematext Docker Agent to parse and ship Docker container logs. Sematext Docker Agent works with Docker Swarm, Docker Datacenter, Docker Cloud, as well as Amazon EC2, Google Container Engine, Kubernetes, Mesos, RancherOS, and CoreOS, so for Docker log shipping, this is the tool to use.
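As an illustration of what masking sensitive data in a shipper looks like, here is a minimal Python sketch that hides all but the last four digits of anything resembling a card number. It is not Logagent's implementation or configuration: `mask_cards` and the pattern are assumptions, and the pattern is deliberately naive (any run of 13 to 16 digits, optionally separated by spaces or hyphens, is treated as a card number).

```python
import re

# 13 to 16 digits, each optionally followed by a single space or hyphen.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_cards(message):
    """Replace card-like digit runs with asterisks, keeping the last 4 digits."""
    def _mask(match):
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]

    return CARD_RE.sub(_mask, message)
```

Doing this in the shipper, before the event leaves the machine, means the unmasked value never reaches the central log store. The trade-off of such a simple pattern is false positives: other long numeric IDs get masked too.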
The default syslog daemon on most Linux distros, rsyslog can do so much more than just picking logs from the syslog socket and writing to /var/log/messages. It can tail files, parse them, buffer (on disk and in memory) and ship to a number of destinations, including Elasticsearch. You can find a howto for processing Apache and system logs here.
rsyslog is the fastest shipper that we tested so far. If you use it as a simple router/shipper, any decent machine will be limited by network bandwidth, but it really shines when you want to parse multiple rules. Its grammar-based parsing module (mmnormalize) works at constant speed no matter the number of rules (we tested this claim). This means that with 20-30 rules, like you have when parsing Cisco logs, it can outperform the regex-based parsers like grok by a factor of 100 (it can be more or less, depending on the grok implementation and liblognorm version).
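The reason a grammar-based parser can stay fast as rules are added is that the rules are merged into one tree, and a message is matched by walking that tree once, instead of trying one regular expression after another. The toy Python sketch below shows the idea only. It is not liblognorm's algorithm or rulebase syntax: the `%field%` notation, `Node`, `add_rule` and `parse` are made up for this example, tokens are split on whitespace, and there is no backtracking.

```python
class Node:
    def __init__(self):
        self.literals = {}  # literal token -> child Node
        self.field = None   # (field_name, child Node) for a %field% token
        self.rule = None    # rule name if a rule ends at this node


def add_rule(root, name, pattern):
    """Merge a rule such as 'Failed password for %user% from %ip%' into the tree."""
    node = root
    for token in pattern.split():
        if len(token) > 2 and token.startswith("%") and token.endswith("%"):
            # Simplification: the first rule to define a field at this
            # position decides its name.
            if node.field is None:
                node.field = (token[1:-1], Node())
            node = node.field[1]
        else:
            node = node.literals.setdefault(token, Node())
    node.rule = name


def parse(root, message):
    """Walk the tree once; return (rule_name, fields) or None if nothing matches."""
    node, fields = root, {}
    for token in message.split():
        if token in node.literals:
            node = node.literals[token]
        elif node.field is not None:
            fields[node.field[0]] = token
            node = node.field[1]
        else:
            return None
    return (node.rule, fields) if node.rule else None
```

With two hypothetical rules, `add_rule(root, "ssh_ok", "Accepted password for %user% from %ip%")` and `add_rule(root, "ssh_fail", "Failed password for %user% from %ip%")`, parsing a message costs one dictionary lookup per token, whether the tree holds 2 rules or 200. A list of regular expressions would, in the worst case, be tried one by one.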
It's also one of the lightest parsers you can find, depending on the configured memory buffers.
rsyslog requires more work to get the configuration right (you can find some sample configuration snippets here on our blog) and this is made more difficult by two things:
- documentation is hard to navigate, especially for somebody new to the terminology
- versions up to 5.x had a different configuration format (expanded from the syslogd config format, which it still supports). Newer versions can still work with the old format, but most newer features (like the Elasticsearch output) only work with the new configuration format. Then again, there are older plugins (for example, the Postgres output) which only support the old format
Though rsyslog tends to be reliable once you get to a stable configuration (and it's rich enough that there are usually multiple ways of getting the same result), you're likely to find some interesting bugs along the way. Not all features are tested as part of the testbench.
rsyslog fits well in scenarios where you need something very light yet capable (an appliance, a small VM, collecting syslog from within a Docker container). If you need to do processing in another shipper (e.g. Logstash), you can forward JSON over TCP, for example, or connect them via a Kafka/Redis buffer.
rsyslog also works well when you need that ultimate performance. Especially if you have multiple parsing rules. Then it makes sense to invest time in getting that configuration working.
You can think of syslog-ng as an alternative to rsyslog (though historically it was actually the other way around). It's also a modular syslog daemon, that can do much more than just syslog. It recently received disk buffers and an Elasticsearch HTTP output. Equipped with a grammar-based parser (PatternDB), it has all you probably need to be a good log shipper to Elasticsearch.
Like rsyslog, it's a light log shipper and it also performs well. It used to be a lot slower than rsyslog, and I haven't benchmarked the two recently, but 570K logs/s two years ago isn't bad at all. Unlike rsyslog, it features a clear, consistent configuration format and has nice documentation.
The main reason why distros switched to rsyslog was syslog-ng Premium Edition, which used to be much more feature-rich than the Open Source Edition, which was somewhat restricted back then. We're concentrating on the Open Source Edition here, since all these log shippers are open source. Things have changed in the meantime: for example, disk buffers, which used to be a PE feature, have landed in OSE. Still, some features, like the reliable delivery protocol (with application-level acknowledgements), have not made it to OSE yet.
Similarly to rsyslog, you'd probably want to deploy syslog-ng on boxes where resources are tight, yet you do want to perform potentially complex processing. As with rsyslog, there's a Kafka output that allows you to use Kafka as a central queue and potentially do more processing in Logstash or a custom consumer.
The difference is, syslog-ng has an easier, more polished feel than rsyslog, but likely not that ultimate performance: for example, only outputs are buffered, so processing is done before buffering - meaning that a processing spike would put pressure up the logging stream.
Fluentd was built on the idea of logging in JSON wherever possible (which is a practice we totally agree with) so that log shippers down the line don't have to guess which substring is which field of which type. As a result, there are libraries for virtually every language, meaning you can easily plug in your custom applications to your logging pipeline.
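To show what logging in JSON looks like at the application end, here is a minimal Python sketch using only the standard `logging` module. It does not use a Fluentd client library, and `JsonFormatter` and the `fields` convention for extra data are assumptions made for this example.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        event = {
            "@timestamp": datetime.fromtimestamp(
                record.created, timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields are passed as extra={"fields": {...}} by the caller.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)
```

An application would attach it to a handler with `handler.setFormatter(JsonFormatter())` and log with something like `log.info("order placed", extra={"fields": {"order_id": 42}})`. The shipper then receives `order_id` as a number in its own field instead of having to extract it from free text.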
Like most Logstash plugins, Fluentd plugins are in Ruby and very easy to write. So there are lots of them, pretty much any source and destination has a plugin (with varying degrees of maturity, of course). This, coupled with the "fluent libraries" means you can easily hook almost anything to anything using Fluentd.
Because in most cases you'll get structured data through Fluentd, it's not made to have the flexibility of other shippers on this list (Filebeat excluded). You can still parse unstructured data via regular expressions and filter it using tags, for example, but you don't get features such as local variables or full-blown conditionals. Also, while performance is fine for most use-cases, it's not at the top of this list: buffers exist only for outputs (like in syslog-ng), and the single-threaded core and the Ruby GIL for plugins mean ultimate performance on big boxes is limited. Resource consumption, however, is acceptable for most use-cases. For small/embedded devices, you might want to look at Fluent Bit, which is to Fluentd what Filebeat is to Logstash.
Fluentd is a good fit when you have diverse or exotic sources and destinations for your logs, because of the number of plugins. Also, if most of the sources are custom applications, you may find it easier to work with fluent libraries than coupling a logging library with a log shipper. Especially if your applications are written in multiple languages - meaning you'd use multiple logging libraries, which may behave differently.
First of all, the conclusion is that you're awesome for reading all the way to this point. If you did that, you get the nuances of an "it depends on your use-case" kind of answer. All these shippers have their pros and cons, and ultimately it's down to your specifications (and in practice, also to your personal preferences) to choose the one that works best for you. If you need help deciding, integrating, or really any help with logging, don't be afraid to reach out - we offer Logging Consulting. Similarly, if you are looking for a place to ship your logs and avoid the costs/headaches associated with running the full ELK/Elastic Stack on your own servers, check out Logsene - it exposes the Elasticsearch API, so you can use it with all the shippers we covered here.