The project we’re talking about in this article may not be as popular or as well-known as others in our Top CNCF Projects to Look Out For in 2023 list, but chances are that you’re already using it if you have a cloud-native architecture, with microservices communicating amongst themselves and with the client. Meet Envoy, the proxy designed for the cloud.
What is Envoy?
Envoy is an L7 proxy and communication bus designed for large, modern service-oriented architectures. It was created by Lyft in response to their need to manage and supervise their own network while migrating from a monolithic paradigm to a service-based one. In the process, they faced several challenges:
They were using several languages and frameworks, so communication between services required a lot of work. Implementation of distributed systems best practices was partial at best, since the complexity was just too much to handle.
Whenever there was a service failure, or the system was experiencing unusual latency, debugging was a nightmare.
This scenario made the dev team distrust the service architecture, so while they did develop their features as services, they would end up reimplementing them in the monolith.
They started the project to solve these conflicts with a clear idea in mind:
“The network should be transparent to applications. When network and application problems do occur it should be easy to determine the source of the problem.” (Envoy documentation)
This is the foundation on which Envoy is built. If (when) something fails in your network, your system needs to tell you what is going on, so that you can dedicate your time not to finding the problem but to doing something about it. The goal is to abstract as much as possible and, where abstraction isn’t possible, to help people pinpoint the source of the problem so they can fix it quickly.
The project is agnostic: it is written in C++, but it works with services in any language; and while it is commonly used with Kubernetes, it doesn’t have to be (it can work with any service-oriented architecture).
Service mesh
Before going any further, let’s do a quick review of what a service mesh is and how it can help us.
Service mesh diagram (copyright @mattklein123)
In this system, the different clusters of applications co-locate a sidecar proxy next to every service instance, running on localhost. When a service wants to talk to another one (say, service B wants to talk with service C), instead of communicating directly with it, it communicates with its own sidecar proxy, which is a localhost communication. The proxy then uses its network capabilities (service discovery, load balancing, rate limiting, circuit breaking…) to send that request over to the sidecar proxy on service C, and that proxy in turn delivers the request to service C.
This architecture abstracts the network from the services, which means that the developer doesn’t need to be aware of the networking topology or of the tools that are needed to manage the service requests. For the dev team, the service only works with localhost messaging, and the service mesh deals with the rest of the complexity.
The ideal sidecar proxy, then, is self-contained, able to run alongside an application written in any language, and driven by a bootstrap configuration we only have to create once, no matter how many different frameworks we use. We would then run the proxy, let it connect to the rest of the proxies, and start using it.
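To make the sidecar idea concrete, here is a minimal sketch of an Envoy static configuration: a listener that accepts traffic and routes everything to a local service. The ports (10000 for the proxy, 8080 for the app) and the names `ingress`, `local_route`, and `local_service` are illustrative assumptions, not values prescribed by Envoy.

```yaml
# Minimal sidecar sketch: listen on :10000, forward all HTTP to the
# co-located app on 127.0.0.1:8080. Ports and names are assumptions.
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: app
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: local_service }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: local_service
    type: STATIC
    load_assignment:
      cluster_name: local_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 8080 }
```

The application only ever sees localhost traffic; everything else (retries, load balancing, observability) happens in the proxy layer described above.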
Napptive enables developer self-service. We encourage you to try our playground and experience accelerated cloud-native development. It’s completely free: all you need to do is [sign up](playground.napptive.dev/campaign_source=blo..) and get started!
What does Envoy provide?
As you may have guessed by now, that is exactly how Envoy works. Envoy implements this idea with an out-of-process architecture (it is a self-contained process that runs alongside each application server, not a part of the app service itself), with several features that put the project in the spotlight for cloud development.
Observability
One of Envoy’s most valuable aspects is the importance it assigns to observability. Its core belief highlights the need for a good observability system, and it delivers one of the most robust ones around. Prometheus support is on Envoy’s functionality list, so there’s no need to install an extra plugin to use it. Having all traffic transit through Envoy provides a single place to:
produce consistent statistics for every hop
create and propagate a stable request ID and tracing context
log consistently
trace requests across services
Each layer of abstraction in your architecture makes it harder to find where a failure is occurring. This observability system lets the organization provide developers with a powerful default dashboard, with panels full of consistent information, reducing the cognitive load on developers.
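The statistics above are exposed through Envoy’s admin interface, which is enabled with a small addition to the bootstrap configuration. A minimal sketch, assuming port 9901 (a common convention, not a requirement):

```yaml
# Expose the Envoy admin interface on localhost:9901 (port is an assumption).
# Prometheus can then scrape metrics from /stats/prometheus on this port.
admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
```

With this in place, `http://127.0.0.1:9901/stats/prometheus` serves every proxy’s metrics in Prometheus format, which is what makes the single consistent dashboard possible.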
Cloud-native origins
Another reason for Envoy’s success is that it’s been built from the ground up for a cloud-native configuration. Other servers have to be configured manually before connecting them to your network, and if your architecture is a bit complex (different frameworks, different languages, different servers for the edge, the services, and the middle proxies), this setup is all but trivial. With Envoy, you can have a single bootstrap configuration for all your Envoy servers, no matter the language of the application they sit next to; once they are up, they ask the management server for their configuration. Envoy servers also don’t need to be restarted to load new configuration, so they keep routing packets even while being reconfigured.
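That “ask the management server” step is Envoy’s dynamic (xDS) configuration. A hedged sketch of a bootstrap that fetches listeners and clusters from a management server; the hostname `xds.example.internal`, port 18000, and cluster name `xds_cluster` are assumptions for illustration:

```yaml
# Fetch listener and cluster config over ADS (gRPC) from a management
# server. Hostname, port, and cluster name are illustrative assumptions.
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc: { cluster_name: xds_cluster }
  lds_config:
    ads: {}
    resource_api_version: V3
  cds_config:
    ads: {}
    resource_api_version: V3
static_resources:
  clusters:
  - name: xds_cluster
    type: STRICT_DNS
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: xds.example.internal, port_value: 18000 }
```

The same bootstrap can be shipped to every Envoy in the fleet; each proxy then receives its own routes and clusters at runtime, and updates apply without a restart.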
Service, middle, and edge proxy
Envoy’s versatility is evident in the roles it can take in a network. Historically, a cloud network used NGINX at the edge and HAProxy in the middle. These are two very different and very complex pieces of software, and the operational burden of understanding them both and making them communicate fluently can be too much to handle.
Envoy can work as a service proxy, as we have seen, but also as a middle and even an edge proxy. This simplifies the network enormously: running the same software at the edge and in the middle provides the consistency in observability we were looking for, and network maintenance no longer requires managing two different technologies that have to talk to each other.
It is also a very extensible system (there is a growing community around it, and developers can find there not only powerful extensions they can use, but also the support they need to create their own), which is perfect for tailoring it to your network’s needs.
If you want to know more about what Envoy can do for you, you can find a full rundown of its features in the project documentation, where the Envoy team explains them in depth.
Where can I see Envoy in action?
Envoy on its own is rarely seen in the wild. There are some case studies on the CNCF site, but it’s not a common occurrence.
What is a common occurrence is an Envoy+Kubernetes system. You can find examples in technologies such as Istio, Contour, and Emissary-ingress.
Most of them are also CNCF projects, and all of them use Envoy under the hood, creating a configuration layer on top of it. This is why this project may have been a stranger to you even though you have been using it all this time.
Envoy is a fantastic, versatile, open-source option to consider when developing your own service network. And it does an excellent job, even when it goes unnoticed.
In case you have not yet tried Napptive, we encourage you to sign up for free and discover how we are helping propel the development of cloud-native apps.