Building an eBPF-Based Profiler

Recently, the team has been working hard to release an eBPF-based profiler for a number of different languages. Continuous profiling is an important practice because it can help engineering teams spot performance bottlenecks and troubleshoot issues faster.

In building our eBPF-based profiler, we learned a number of new techniques that might be interesting to people who want to implement something similar or who simply want to get started writing eBPF-based programs.

In this post, we’ll recap our methodology and process for building our profiler. The techniques used to profile differ based on the language you’re targeting; this post has a section on techniques for compiled languages and a section on techniques for interpreted languages.

Techniques for compiled languages

The process for compiled languages is relatively straightforward and relies on well-known Linux subsystems. This section walks through the steps we used to support them.

Using the Linux perf_event subsystem, we can open a software event with the cpu_clock config (`PERF_COUNT_SW_CPU_CLOCK`) and attach our BPF function to it.
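
As a minimal sketch, opening such an event from user space in C might look like the following. It assumes a Linux host with the kernel UAPI headers installed. The function names `cpu_clock_attr` and `open_cpu_clock_event`, and the choice to sample by frequency rather than by period, are ours for illustration and are not taken from the profiler itself.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Describe a CPU-clock software event that samples freq_hz times per second. */
struct perf_event_attr cpu_clock_attr(unsigned long long freq_hz)
{
    struct perf_event_attr attr;

    assert(freq_hz > 0);
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CPU_CLOCK;
    attr.freq = 1;              /* treat sample_freq as Hz, not as a period */
    attr.sample_freq = freq_hz;
    return attr;
}

/*
 * Open the event for all processes on one CPU (pid = -1).
 * Returns a file descriptor, or -1 with errno set. glibc has no wrapper
 * for perf_event_open, so the raw syscall is used.
 */
int open_cpu_clock_event(int cpu, unsigned long long freq_hz)
{
    struct perf_event_attr attr = cpu_clock_attr(freq_hz);

    return (int)syscall(SYS_perf_event_open, &attr, (pid_t)-1, cpu,
                        -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);
}
```

A system-wide profiler would typically call `open_cpu_clock_event` once per online CPU and attach the BPF program to each returned descriptor, for example with libbpf's `bpf_program__attach_perf_event`. Opening system-wide events usually needs elevated privileges.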

After attaching our function to the perf event, we can grab the stack and generate a unique stack ID for counting via the `bpf_get_stackid` helper.

The helper saves the stack trace itself to a stack trace BPF map (`BPF_MAP_TYPE_STACK_TRACE`), keyed by that ID.

We then use the stack ID to increment a counter for that stack.
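
The bookkeeping in these steps can be modelled in plain C. This is a user-space model, not BPF code: `get_stackid` loosely mirrors how we understand `bpf_get_stackid` to pick a bucket (hash the trace, mask the hash into a bucket index, and refuse a bucket that already holds a different stack). The table sizes, the FNV-1a hash, and all names are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH 8   /* hypothetical; the kernel's PERF_MAX_STACK_DEPTH is 127 */
#define N_BUCKETS 64  /* power of two, so a mask can pick the bucket */

struct stack_table {
    uint64_t traces[N_BUCKETS][MAX_DEPTH]; /* stands in for the stack trace map */
    uint32_t hashes[N_BUCKETS];
    uint8_t used[N_BUCKETS];
    uint64_t counts[N_BUCKETS];            /* stands in for the counts map */
};

/* FNV-1a over the raw bytes of the trace; a stand-in for the kernel's hash. */
static uint32_t hash_trace(const uint64_t *trace)
{
    const unsigned char *p = (const unsigned char *)trace;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < MAX_DEPTH * sizeof(uint64_t); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Return the ID for this stack, storing it if new; -1 if the bucket is taken. */
int get_stackid(struct stack_table *t, const uint64_t *ips, int depth)
{
    uint64_t trace[MAX_DEPTH] = {0};

    assert(t != NULL && ips != NULL);
    if (depth < 0)
        depth = 0;
    if (depth > MAX_DEPTH)
        depth = MAX_DEPTH;                 /* truncate deep stacks */
    memcpy(trace, ips, (size_t)depth * sizeof(uint64_t));

    uint32_t h = hash_trace(trace);
    int id = (int)(h & (N_BUCKETS - 1));

    if (t->used[id]) {
        if (t->hashes[id] == h &&
            memcmp(t->traces[id], trace, sizeof(trace)) == 0)
            return id;                     /* same stack seen before */
        return -1;                         /* bucket holds a different stack */
    }
    t->used[id] = 1;
    t->hashes[id] = h;
    memcpy(t->traces[id], trace, sizeof(trace));
    return id;
}

/* Handle one sample: look up the stack ID, then bump that stack's counter. */
int record_sample(struct stack_table *t, const uint64_t *ips, int depth)
{
    int id = get_stackid(t, ips, depth);

    if (id >= 0)
        t->counts[id]++;
    return id;
}
```

In a real BPF program the ID would come from `bpf_get_stackid(ctx, &map, flags)`, and the counter would usually live in a separate hash map keyed by that ID, often together with the process ID.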

Lastly, in user space we use a technique called stack walking to resolve the addresses in each stack to symbols, and then convert the result to folded format (one line per stack, with semicolon-separated frames followed by a count) for further analysis.
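
Symbol resolution itself usually comes down to a range lookup. The sketch below assumes the symbols of a binary have already been loaded into an array sorted by start address, and that the sampled address has already been translated into the same address space as that table. The `struct symbol` layout and the name `resolve_addr` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct symbol {
    uint64_t start;   /* first address covered by the symbol */
    uint64_t size;    /* length of the symbol in bytes */
    const char *name;
};

/*
 * Find the symbol that contains addr in a table sorted by start address.
 * Returns NULL when the address falls before, between, or after all symbols.
 */
const struct symbol *resolve_addr(const struct symbol *syms, size_t n,
                                  uint64_t addr)
{
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */

    assert(syms != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (syms[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the index of the first symbol that starts after addr. */
    if (lo == 0)
        return NULL;

    const struct symbol *s = &syms[lo - 1];
    return (addr - s->start < s->size) ? s : NULL;
}
```

The binary search keeps each lookup logarithmic in the number of symbols, which matters because every frame of every sampled stack goes through it.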

Stack walking examples

Stack walking takes two nested loops: an outer loop through all of the stacks, and an inner loop through all of the addresses within a given trace.
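
The two loops can be sketched in C as follows. The sketch assumes the traces have already been copied out of the BPF maps into an array, that each trace is stored leaf frame first and zero-padded (which is how we understand the kernel's stack trace map to return them), and that a resolver callback turns an address into a name. `struct sampled_stack`, `fold_stack`, `print_folded`, and `demo_resolve` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 8 /* hypothetical trace depth for the example */

struct sampled_stack {
    uint64_t ips[MAX_DEPTH]; /* leaf frame first, unused slots are zero */
    uint64_t count;
};

typedef const char *(*resolve_fn)(uint64_t addr);

/* Append src to buf at *used; returns 0 on success, -1 if it does not fit. */
static int append(char *buf, size_t len, size_t *used, const char *src)
{
    size_t n = strlen(src);

    if (n >= len - *used) /* keep one byte for the terminator */
        return -1;
    memcpy(buf + *used, src, n + 1);
    *used += n;
    return 0;
}

/* Inner loop: write "root;...;leaf count" into buf. Returns length or -1. */
int fold_stack(const struct sampled_stack *s, resolve_fn resolve,
               char *buf, size_t len)
{
    char num[24];
    size_t used = 0;
    int depth = 0;

    assert(s != NULL && resolve != NULL && buf != NULL);
    if (len == 0)
        return -1;
    buf[0] = '\0';
    while (depth < MAX_DEPTH && s->ips[depth] != 0)
        depth++;          /* the first zero marks the end of the trace */
    if (depth == 0)
        return -1;

    /* Folded format lists the root first, so walk the trace backwards. */
    for (int i = depth - 1; i >= 0; i--) {
        const char *name = resolve(s->ips[i]);

        if (append(buf, len, &used, name ? name : "[unknown]") < 0)
            return -1;
        if (i > 0 && append(buf, len, &used, ";") < 0)
            return -1;
    }
    snprintf(num, sizeof(num), " %llu", (unsigned long long)s->count);
    if (append(buf, len, &used, num) < 0)
        return -1;
    return (int)used;
}

/* Outer loop: visit every stack and print its folded line. */
int print_folded(const struct sampled_stack *stacks, size_t n,
                 resolve_fn resolve, FILE *out)
{
    char line[512];
    int printed = 0;

    for (size_t i = 0; i < n; i++) {
        if (fold_stack(&stacks[i], resolve, line, sizeof(line)) < 0)
            continue;     /* skip empty or oversized stacks */
        fprintf(out, "%s\n", line);
        printed++;
    }
    return printed;
}

/* Hypothetical resolver for the example; a real one consults symbol tables. */
const char *demo_resolve(uint64_t addr)
{
    switch (addr) {
    case 0x1000: return "main";
    case 0x2000: return "handle_request";
    case 0x3000: return "parse_json";
    default:     return NULL;
    }
}
```

Frames that cannot be resolved are kept as `[unknown]` rather than dropped, so the depth of each stack stays intact in the folded output.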

Techniques for interpreted languages

Things are more interesting with interpreted languages, where symbol resolution isn’t as easy. Existing profiling solutions for interpreted or JIT-compiled languages usually require one of two things: either the language runtime generates a perf map file that correlates symbol addresses with their human-readable names, or the profiler reads process memory directly and maps addresses to language-specific structs whose layout differs between versions.

Using eBPF, we can take another approach by using language-specific USDT (user statically defined tracing) probes. USDT probes are a way of deriving specific insights from the application you’re instrumenting, often (though not always) with low overhead. The steps below show how you can leverage USDT probes for a specific language to build out a complete stack.

The first step is to add the probe to the language runtime you’d like to instrument. In this walkthrough, we use libbpf to add the probes and Ruby as the target language.

After hooking the method entry, we need to build the stack frame and add it to our maps.

Then we push the stack to user space for further analysis.

When the method returns, we pop it off the stack.
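
The entry and return handling amounts to maintaining a shadow stack per thread. The plain C model below shows that logic outside of BPF. It assumes the probes hand us a class name and a method name, as Ruby's `method__entry` and `method__return` USDT probes do when the interpreter is built with DTrace support. In a BPF program the `struct shadow_stack` would live in a map keyed by thread ID, and the snapshot would be sent to user space through something like a perf or ring buffer. All names and sizes here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAMES 4  /* hypothetical; tiny so the overflow path is easy to see */
#define NAME_LEN 32

struct frame {
    char klass[NAME_LEN];
    char method[NAME_LEN];
};

struct shadow_stack {
    struct frame frames[MAX_FRAMES];
    uint32_t depth;   /* real call depth, may exceed MAX_FRAMES */
};

/* Bounded copy that always NUL-terminates and truncates long names. */
static void copy_name(char *dst, const char *src)
{
    strncpy(dst, src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';
}

/* Method entry: store the frame if there is room, and always count the call. */
void on_method_entry(struct shadow_stack *s, const char *klass,
                     const char *method)
{
    assert(s != NULL && klass != NULL && method != NULL);
    if (s->depth < MAX_FRAMES) {
        copy_name(s->frames[s->depth].klass, klass);
        copy_name(s->frames[s->depth].method, method);
    }
    s->depth++;
}

/* Method return: pop one call. Returns without a matching entry are ignored. */
void on_method_return(struct shadow_stack *s)
{
    assert(s != NULL);
    if (s->depth > 0)
        s->depth--;
}

/* Number of frames actually stored, i.e. how many to copy into a snapshot. */
uint32_t stored_frames(const struct shadow_stack *s)
{
    assert(s != NULL);
    return s->depth < MAX_FRAMES ? s->depth : MAX_FRAMES;
}
```

Tracking the real depth separately from the stored frames keeps pushes and pops balanced when a call chain is deeper than the buffer. Ignoring unmatched returns covers the case where the profiler attaches while methods are already on the stack.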

Once we have the information in user space, we iterate over the stacks to generate counts and convert them into folded format for further analysis.
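
One simple way to generate the counts, sketched below, is to turn each received stack into a single string such as `Object#main;Parser#parse`, sort the strings, and count runs of identical neighbours. The `Class#method` frame format and the name `emit_folded_counts` are assumptions for the example, not necessarily what our profiler does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Sort the stacks in place, then write one "stack count" line per distinct
 * stack. Returns the number of distinct stacks written.
 */
size_t emit_folded_counts(const char **stacks, size_t n, FILE *out)
{
    size_t distinct = 0;

    assert(stacks != NULL && out != NULL);
    qsort(stacks, n, sizeof(stacks[0]), cmp_str);

    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;

        /* After sorting, identical stacks sit next to each other. */
        while (j < n && strcmp(stacks[i], stacks[j]) == 0)
            j++;
        fprintf(out, "%s %zu\n", stacks[i], j - i);
        distinct++;
        i = j;
    }
    return distinct;
}
```

Sorting first avoids the need for a hash table, at the cost of reordering the input array.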

Limitations

The performance overhead of the function entry and exit probes is, as expected, relatively high, because they fire on every method call rather than at a fixed sampling rate. Without further modifications, the approach above can significantly slow down your application.
