eBPF Doable and Undoable
eBPF is a powerful tool for Linux kernel tracing and observability. Compared with kernel module, eBPF is more flexible and easier to develop and deploy. However, eBPF lacks many key features that kernel module has, and cannot be used as the replacement for kernel module framework.
Doable
One of the most important features of eBPF is tracing, including tracepoints and kprobe for kernel functions, and userspace tracepoints and uprobes for user-space functions. The measurement recorded in the eBPF program can be exposed to userspace driver programs using channels like BPF maps.
kernel-space tracepoints
Linux provides a set of tracepoints to trace kernel functions, e.g., sys_enter_execve
and sys_exit_execve
.
SEC("tracepoint/syscalls/sys_enter_execve")
int tracepoint_sys_enter_execve(struct sys_enter_execve_args *ctx) {
char comm[TASK_COMM_LEN] = {0};
bpf_get_current_comm(comm, sizeof(comm));
bpf_printk("tracepoint/sys_enter_execve: comm=%s (%s)", comm,
ctx->filename);
return 0;
}
The full list of tracepoints can be found by inspecting /sys/kernel/debug/tracing/events
.
Kernel-space probes
Tracepoints requires being exported from the kernel source code. To observe arbitrary kernel functions, eBPF provides kprobes. The kprobe can be attached to any kernel functions and will be triggered when the function is called.
SEC("kprobe/__x64_sys_execve")
int kprobe_sys_execve(struct pt_regs *ctx) {
char comm[TASK_COMM_LEN] = {0};
bpf_get_current_comm(comm, sizeof(comm));
bpf_printk("kprobe/sys_execve: comm=%s (%s)", comm);
return 0;
}
Compared with predefined tracepoints, kprobes passes the kernel function
arguments in a struct pt_regs *
struct and requires extra work to extract
the arguments using bpf_probe_read
as it requires accessing kernel memory.
const char *filename = NULL;
const char *const *argv = NULL;
const char *const *envp = NULL;
bpf_probe_read(&filename, sizeof(filename) /* read the pointer */,
&PT_REGS_PARM1(__ctx));
bpf_probe_read(&argv, sizeof(argv) /* read the pointer */,
&PT_REGS_PARM2(__ctx));
bpf_probe_read(&envp, sizeof(envp) /* read the pointer */,
&PT_REGS_PARM3(__ctx));
In the example above, pointers in argv
and envp
still in the kernel
memory and requires bpf_probe_read
to access them.
for (int i = 0; i < 128; i++) {
const char *arg = NULL;
bpf_probe_read(&arg, sizeof(arg) /* read the pointer */, &argv[i]);
if (!arg) {
break;
}
bpf_printk("kprobe/sys_execve: argv[%d] = %s", i, arg);
}
Linux 4.17 added a configuration entry CONFIG_ARCH_HAS_SYSCALL_WRAPPER
and
defaults to y
for x86_64, adding an extra indirection to syscall function
arguments,
struct pt_regs *__ctx = (struct pt_regs *) PT_REGS_PARM1(ctx);
See also the effort to automatically generate the function argument extraction code in the BCC project.
Userspace tracepoints
Userspace tracepoints are similar to kernel-space tracepoints, but they are
defined in userspace and can be triggered by userspace programs. The userspace
tracepoints can be listed using the tplist.py
tools in BCC:
$ ~/bcc/tools/tplist.py -l /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libc.so.6 libc:setjmp
/lib/x86_64-linux-gnu/libc.so.6 libc:longjmp
/lib/x86_64-linux-gnu/libc.so.6 libc:longjmp_target
...
The userspace tracepoints can be hooked by eBPF programs like kernel-space tracepoints.
SEC("usdt/libc.so.6:libc:mutex_acquired")
int usdt_libc_mutex_acquired(struct pt_regs *ctx) {
bpf_printk("usdt/libc:mutex_acquired: process=%d",
bpf_get_current_pid_tgid());
return 0;
}
SEC("usdt/libc.so.6:libc:mutex_release")
int usdt_libc_mutex_released(struct pt_regs *ctx) {
bpf_printk("usdt/libc:mutex_release: process=%d",
bpf_get_current_pid_tgid());
return 0;
}
The program above will react to events that mutex_acquired
and mutex_release
been called.
Userspace probes
Like kprobe, userspace probes can be attached to any userspace functions and will be triggered when the function is called.
SEC("uprobe//proc/self/exe:random_gen")
int uprobe_random_gen(struct pt_regs *ctx) {
int argument = (int) PT_REGS_PARM1(ctx);
bpf_printk("uprobe/random_gen: argument=%d", argument);
return 0;
}
SEC("uretprobe//proc/self/exe:random_gen")
int uretprobe_random_gen(struct pt_regs *ctx) {
bpf_printk("uretprobe/random_gen: output=%d", PT_REGS_RC(ctx));
return 0;
}
Note that target functions cannot be inlined and is a mangled name in the
uprobe/uretprobe
entry, e.g.,
extern "C" {
int __attribute__((noinline)) random_gen(int argument) {
std::srand(argument);
return std::rand();
}
}
Undoable
eBPF provides extensible, convenient, and powerful tools to trace kernel and
userspace programs. However, as eBPF is executed in a virtual machine, the
kernel space functions that eBPF program can access is fairly limited. The
make the kernel function accessible to eBPF programs, the function needs to
be encoded in the eBPF VM or exported using kfuncs
.
Exported helper functions
The Linux kernel encoded supported kernel function calls in the VM and exposed
in include/uapi/linux/bpf.h
. For eBPF developers, there’s a
bpf_helper_defs.h
header that exposed those help function prototypes in C,
generated by the bpf_doc.py
script in Linux Kernel.
BFP Kernel Functions (kfuncs)
Besides predefined functions, there’s a new mechanism called
“BFP Kernel Function (kfuncs)” that allows exposed any kernel functions to eBPF
programs. The kfuncs
is exposed by the macro BTF_ID_FLAGS
, and can be
first declared as __ksym
before being used in eBPF programs.
int bpf_kfunc_call_test2(struct sock *sk, __u32 a, __u32 b) __ksym;
SEC("classifier")
int kprobe_drop_cache_impl(struct __sk_buff *skb) {
struct bpf_sock *sk = skb->sk;
// ....
return bpf_kfunc_call_test2((struct sock *)sk, 1, 2);
}
At the time of writing, the exported function set is still very limited. It is impossible to access arbitrary kernel functions and arbitrary kernel data structures from eBPF programs so for complex tasks beyond simple tracing, eBPF is not a replacement for kernel module for extending the kernel’s capability safely and efficiently.
Thinking about the future
Algothough there are still many limitations when programming eBPF programs in Linux, eBPF must be the trending of exposing Linux Kernel to programmers who want to observe, hook, and even hack into the kernel space. At the KubeCon China, when talking with people from the Cilium project, we both agree that more and more kernel functionalities will be made available in eBPF in the future.