io_uring_re..._bpf_filter(3) liburing Manual io_uring_re..._bpf_filter(3)
NAME
io_uring_register_bpf_filter, io_uring_register_bpf_filter_task -
register classic BPF filters for io_uring operations
SYNOPSIS
#include <liburing.h>
#include <liburing/io_uring/bpf_filter.h>
#include <linux/filter.h>
int io_uring_register_bpf_filter(struct io_uring *ring,
                                 struct io_uring_bpf *bpf);
int io_uring_register_bpf_filter_task(struct io_uring_bpf *bpf);
DESCRIPTION
These functions register classic BPF (cBPF) filters to restrict
io_uring operations. Filters can be used to implement security
policies by allowing or denying specific operations based on their
parameters.
io_uring_register_bpf_filter(3) registers a filter on a specific
ring. The filter only applies to operations submitted through
that ring.
io_uring_register_bpf_filter_task(3) registers a filter on the
calling task. The filter applies to all io_uring rings created by
the task after the filter is registered, and is inherited by child
processes created via fork(2). Rings that were created before the
filter was registered are not affected. Task-level filters cannot
be removed and child processes cannot loosen restrictions set by
their parent.
The bpf argument is a pointer to a struct io_uring_bpf with
cmd_type set to IO_URING_BPF_CMD_FILTER. The embedded struct
io_uring_bpf_filter describes the filter to register:
struct io_uring_bpf_filter {
    __u32 opcode;       /* io_uring opcode to filter */
    __u32 flags;        /* IO_URING_BPF_FILTER_* */
    __u32 filter_len;   /* number of BPF instructions */
    __u8  pdu_size;     /* expected pdu size for opcode */
    __u8  resv[3];
    __u64 filter_ptr;   /* pointer to BPF filter */
    __u64 resv2[5];
};
opcode specifies which io_uring operation the filter applies to
(e.g., IORING_OP_SOCKET, IORING_OP_NOP, IORING_OP_READ).
filter_ptr points to an array of filter_len BPF instructions
(struct sock_filter). The filter is executed for each matching
operation and must return non-zero to allow the operation or zero
to deny it (resulting in -EACCES being returned to the
application).
pdu_size specifies the expected size in bytes of the operation-
specific payload data for the given opcode (e.g., the socket or
open structs inside struct io_uring_bpf_ctx). For opcodes that
have no extra payload, this should be zero. For IORING_OP_SOCKET
this would be 12 (three 4-byte members), and for IORING_OP_OPENAT
and IORING_OP_OPENAT2 this would be 24 (three 8-byte members).
If the application's pdu_size matches the kernel's expected size
for the opcode, registration succeeds. If the sizes differ, the
behavior depends on whether IO_URING_BPF_FILTER_SZ_STRICT is set
in flags:
• If IO_URING_BPF_FILTER_SZ_STRICT is set, registration
fails with -EMSGSIZE if the sizes differ.
• If IO_URING_BPF_FILTER_SZ_STRICT is not set, registration
is allowed if the application's pdu_size is smaller than
the kernel's. This permits older applications that were
compiled against a smaller payload to still load filters,
as the kernel can safely evaluate the filter on the
subset of data the application expects.
• Regardless of IO_URING_BPF_FILTER_SZ_STRICT, registration
always fails with -EMSGSIZE if the application's pdu_size
is larger than the kernel's, since the kernel cannot
provide data that it does not support.
On an -EMSGSIZE failure, the kernel writes back the kernel's
expected pdu_size into the struct io_uring_bpf_filter passed by
the application. This allows the application to discover the
kernel's expected payload size and adjust or retry accordingly.
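The size rules above reduce to a small decision function. The following is a userspace model only, written from the description on this page and not taken from the kernel; the name model_check_pdu_size is hypothetical. It also mimics the write-back of the kernel's size on failure.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the documented pdu_size check (not kernel code).
 * On -EMSGSIZE the kernel's size is stored through app_size, mirroring
 * the write-back into struct io_uring_bpf_filter described above.
 */
static int model_check_pdu_size(uint8_t *app_size, uint8_t kernel_size,
                                bool strict)
{
    if (*app_size == kernel_size)
        return 0;                   /* exact match always succeeds */
    if (!strict && *app_size < kernel_size)
        return 0;                   /* smaller, older layout is tolerated */
    *app_size = kernel_size;        /* report the expected size */
    return -EMSGSIZE;
}
```

With strict set to false, a caller passing 8 against a kernel size of 12 is accepted and its value is left untouched; any larger value, or any mismatch under strict, yields -EMSGSIZE and the caller's value is replaced by 12.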
flags can be zero or a bitwise OR of the following:
IO_URING_BPF_FILTER_DENY_REST
When set, any opcode that does not have a filter registered
will be denied. This allows creating an allowlist of
permitted operations.
IO_URING_BPF_FILTER_SZ_STRICT
When set, registration of a filter will fail with -EMSGSIZE
if the application's pdu_size does not exactly match the
kernel's expected payload size for the opcode. Without
this flag, the kernel permits filters where the
application's pdu_size is smaller than or equal to the
kernel's.
Filter Context
The BPF filter receives a context structure that can be inspected
using BPF_LD instructions with absolute addressing. The context
layout is:
struct io_uring_bpf_ctx {
    __u64 user_data;    /* offset 0: user_data from SQE */
    __u8  opcode;       /* offset 8: io_uring opcode */
    __u8  sqe_flags;    /* offset 9: SQE flags */
    __u8  pdu_size;     /* offset 10: aux data size for filter */
    __u8  pad[5];       /* offset 11-15: padding */
    union {
        struct {
            __u32 family;   /* offset 16: socket family */
            __u32 type;     /* offset 20: socket type */
            __u32 protocol; /* offset 24: socket protocol */
        } socket;
        struct {
            __u64 flags;    /* offset 16: open flags */
            __u64 mode;     /* offset 24: file mode */
            __u64 resolve;  /* offset 32: resolve flags */
        } open;
    };
};
The pdu_size field indicates the size in bytes of the operation-
specific data passed in the union. A filter can check this value
to verify it is receiving the expected payload. This is useful for
forward compatibility: if a future kernel adds new members to an
operation's context, the filter can inspect pdu_size to determine
whether those fields are present.
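Because filters address the context by byte offset, it can help to derive the offsets from a struct rather than hard-coding them. The sketch below uses a local mirror of the layout shown above (the name ctx_mirror and the CTX_OFF_* constants are hypothetical and exist only for this check); with the real header installed, the same offsetof expressions would presumably be applied to struct io_uring_bpf_ctx instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the documented context layout; for illustration only. */
struct ctx_mirror {
    uint64_t user_data;
    uint8_t  opcode;
    uint8_t  sqe_flags;
    uint8_t  pdu_size;
    uint8_t  pad[5];
    union {
        struct { uint32_t family, type, protocol; } socket;
        struct { uint64_t flags, mode, resolve; } open;
    };
};

/* Offsets usable as the k operand of BPF_LD | BPF_W | BPF_ABS. */
enum {
    CTX_OFF_SOCKET_FAMILY   = offsetof(struct ctx_mirror, socket.family),
    CTX_OFF_SOCKET_TYPE     = offsetof(struct ctx_mirror, socket.type),
    CTX_OFF_SOCKET_PROTOCOL = offsetof(struct ctx_mirror, socket.protocol),
    CTX_OFF_OPEN_FLAGS      = offsetof(struct ctx_mirror, open.flags),
    CTX_OFF_OPEN_MODE       = offsetof(struct ctx_mirror, open.mode),
    CTX_OFF_OPEN_RESOLVE    = offsetof(struct ctx_mirror, open.resolve),
};

/* Compile-time checks against the offsets quoted in the layout comments. */
static_assert(offsetof(struct ctx_mirror, opcode) == 8, "opcode offset");
static_assert(offsetof(struct ctx_mirror, pdu_size) == 10, "pdu_size offset");
static_assert(CTX_OFF_SOCKET_FAMILY == 16, "socket.family offset");
static_assert(CTX_OFF_SOCKET_PROTOCOL == 24, "socket.protocol offset");
static_assert(CTX_OFF_OPEN_RESOLVE == 32, "open.resolve offset");
static_assert(sizeof(((struct ctx_mirror *)0)->socket) == 12, "socket pdu");
static_assert(sizeof(((struct ctx_mirror *)0)->open) == 24, "open pdu");
```

If a future layout change moved a field, the static assertions would fail at build time instead of the filter silently reading the wrong bytes.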
For IORING_OP_SOCKET operations, the socket family, type, and
protocol fields are populated and can be used to filter based on
socket parameters. pdu_size is set to 12 (three 4-byte members).
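As an illustration of filtering on socket parameters, the sketch below builds a filter that allows AF_INET and AF_INET6 and denies every other family, then dry-runs it in userspace. The toy evaluator model_run and the helper model_socket_verdict are hypothetical and understand only the three instruction forms used here; the sketch also assumes that word loads see the context in native byte order, as they do for seccomp(2) filters, which this page does not state explicitly.

```c
#include <linux/filter.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define CTX_OFF_SOCKET_FAMILY 16   /* offset from the documented layout */
#define CTX_MODEL_SIZE        40   /* 16-byte header + 24-byte union */

/* Allow AF_INET and AF_INET6, deny every other family. */
static struct sock_filter inet_any_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, CTX_OFF_SOCKET_FAMILY),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_INET, 1, 0),  /* match: skip to allow */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_INET6, 0, 1), /* no match: skip to deny */
    BPF_STMT(BPF_RET | BPF_K, 1),                        /* allow */
    BPF_STMT(BPF_RET | BPF_K, 0),                        /* deny */
};

/* Toy evaluator for dry runs only; anything it does not know is a deny. */
static uint32_t model_run(const struct sock_filter *prog, size_t len,
                          const uint8_t *ctx, size_t ctx_len)
{
    uint32_t acc = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *insn = &prog[pc];

        switch (insn->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            if (ctx_len < 4 || insn->k > ctx_len - 4)
                return 0;           /* out-of-bounds load */
            memcpy(&acc, ctx + insn->k, sizeof(acc));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (acc == insn->k) ? insn->jt : insn->jf;
            break;
        case BPF_RET | BPF_K:
            return insn->k;
        default:
            return 0;               /* unsupported instruction */
        }
    }
    return 0;                       /* fell off the end of the program */
}

/* Build a zeroed context with only the family set and run the filter. */
static uint32_t model_socket_verdict(uint32_t family)
{
    uint8_t ctx[CTX_MODEL_SIZE] = { 0 };

    memcpy(ctx + CTX_OFF_SOCKET_FAMILY, &family, sizeof(family));
    return model_run(inet_any_filter,
                     sizeof(inet_any_filter) / sizeof(inet_any_filter[0]),
                     ctx, sizeof(ctx));
}
```

Calling model_socket_verdict(AF_INET) or model_socket_verdict(AF_INET6) yields a non-zero verdict, while AF_UNIX yields zero. Registering inet_any_filter would follow the same pattern as the AF_INET example on this page, with filter_len set to 5 and pdu_size set to 12.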
For IORING_OP_OPENAT and IORING_OP_OPENAT2 operations, the open
flags, mode, and resolve fields are populated. The flags field
contains the open flags (e.g., O_RDONLY, O_CREAT). The resolve
field is only meaningful for IORING_OP_OPENAT2 and contains
resolve flags (e.g., RESOLVE_IN_ROOT). pdu_size is set to 24
(three 8-byte members).
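The open fields are 64 bits wide, while classic BPF loads at most 32 bits at a time, so a filter has to pick the half of the field it wants. The sketch below denies opens that carry O_CREAT by testing the low 32 bits of flags. The names no_create_filter and CTX_OFF_OPEN_FLAGS_LO are hypothetical, and the byte-order handling is an assumption borrowed from how 64-bit arguments are addressed in seccomp(2) filters; this page does not spell it out.

```c
#include <fcntl.h>
#include <linux/filter.h>

/* Low 32 bits of the 64-bit open flags (field starts at offset 16). */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define CTX_OFF_OPEN_FLAGS_LO 16
#else
#define CTX_OFF_OPEN_FLAGS_LO 20
#endif

/* Deny any open whose flags include O_CREAT, allow the rest. */
static struct sock_filter no_create_filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, CTX_OFF_OPEN_FLAGS_LO),
    BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, O_CREAT, 1, 0), /* bit set: skip to deny */
    BPF_STMT(BPF_RET | BPF_K, 1),                        /* allow */
    BPF_STMT(BPF_RET | BPF_K, 0),                        /* deny */
};

#define NO_CREATE_FILTER_LEN \
    (sizeof(no_create_filter) / sizeof(no_create_filter[0]))
```

Such a filter would be registered once per opcode of interest (IORING_OP_OPENAT and IORING_OP_OPENAT2), each time with pdu_size set to 24. It only inspects O_CREAT; other flags that can create files, such as O_TMPFILE, would need their own test.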
Filter Stacking
Multiple filters can be registered for the same opcode. When
multiple filters exist, they are evaluated in order and all must
return non-zero for the operation to be allowed. For task-level
filters, the child's filters are evaluated before the parent's
filters.
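The combined effect of stacking and IO_URING_BPF_FILTER_DENY_REST can be modelled in a few lines. This is a userspace sketch of the semantics described on this page, not kernel code; model_filter_fn, model_opcode_allowed, model_allow and model_deny are hypothetical names, and each function pointer stands in for one registered filter program.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for one registered filter: non-zero allows, zero denies. */
typedef int (*model_filter_fn)(const void *ctx);

/*
 * filters[] holds the filters registered for one opcode in evaluation
 * order; for task-level filters the child's entries come before the
 * parent's. deny_rest models IO_URING_BPF_FILTER_DENY_REST.
 */
static bool model_opcode_allowed(const model_filter_fn *filters, size_t n,
                                 bool deny_rest, const void *ctx)
{
    if (n == 0)
        return !deny_rest;      /* no filter registered for this opcode */
    for (size_t i = 0; i < n; i++)
        if (filters[i](ctx) == 0)
            return false;       /* a single zero verdict denies */
    return true;                /* every filter returned non-zero */
}

static int model_allow(const void *ctx) { (void)ctx; return 1; }
static int model_deny(const void *ctx)  { (void)ctx; return 0; }
```

In this model, a stack of { model_allow, model_deny } denies the operation regardless of order, which is why a child adding filters can only tighten what its parent permits.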
RETURN VALUE
On success, these functions return 0. On failure, they return a
negative error code.
ERRORS
-EINVAL
Invalid filter, opcode, or flags specified.
-EMSGSIZE
The application's pdu_size does not match the kernel's
expected payload size for the opcode. This occurs when
IO_URING_BPF_FILTER_SZ_STRICT is set and the sizes differ,
or when the application's pdu_size is larger than the
kernel's regardless of flags.
-ENOMEM
Insufficient memory to register the filter.
-EFAULT
The filter pointer is invalid.
-EACCES
The caller does not have the CAP_SYS_ADMIN capability and
the no_new_privs attribute is not set on the calling task.
See prctl(2) with PR_SET_NO_NEW_PRIVS.
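One way to act on these codes is to sort them by what the caller can do next. The helper below is a sketch of such a policy; the enum reg_action and the function classify_register_result are hypothetical names and are not part of liburing.

```c
#include <errno.h>

/* What a caller might do after a registration attempt (illustrative). */
enum reg_action {
    REG_OK,                     /* filter registered */
    REG_RETRY_WITH_KERNEL_SIZE, /* pdu_size was rewritten; adjust and retry */
    REG_NEED_PRIVS,             /* set no_new_privs or obtain CAP_SYS_ADMIN */
    REG_FATAL,                  /* bad arguments, bad pointer, or no memory */
};

static enum reg_action classify_register_result(int ret)
{
    switch (ret) {
    case 0:
        return REG_OK;
    case -EMSGSIZE:
        return REG_RETRY_WITH_KERNEL_SIZE;
    case -EACCES:
        return REG_NEED_PRIVS;
    case -EINVAL:
    case -ENOMEM:
    case -EFAULT:
    default:
        return REG_FATAL;
    }
}
```

A caller would pass the return value of io_uring_register_bpf_filter(3) or io_uring_register_bpf_filter_task(3) directly to this helper.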
EXAMPLES
Deny all NOP operations
#include <sys/prctl.h>
#include <linux/filter.h>
#include <liburing.h>
#include <liburing/io_uring/bpf_filter.h>
struct sock_filter deny_filter[] = {
    BPF_STMT(BPF_RET | BPF_K, 0), /* return 0 (deny) */
};
struct io_uring_bpf bpf = {
    .cmd_type = IO_URING_BPF_CMD_FILTER,
    .filter = {
        .opcode = IORING_OP_NOP,
        .filter_len = 1,
        .filter_ptr = (unsigned long) deny_filter,
    },
};
/* Must set no_new_privs before registering task filters */
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
/* Register on a specific ring */
io_uring_register_bpf_filter(&ring, &bpf);
/* Or register on the task */
io_uring_register_bpf_filter_task(&bpf);
Allow only AF_INET sockets
#include <sys/prctl.h>
#include <linux/filter.h>
#include <sys/socket.h>
#include <liburing.h>
#include <liburing/io_uring/bpf_filter.h>
#define CTX_OFF_SOCKET_FAMILY 16
struct sock_filter inet_only_filter[] = {
    /* Load socket family from context */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, CTX_OFF_SOCKET_FAMILY),
    /* If family == AF_INET, fall through to allow; otherwise skip to deny */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_INET, 0, 1),
    /* Allow: return 1 */
    BPF_STMT(BPF_RET | BPF_K, 1),
    /* Deny: return 0 */
    BPF_STMT(BPF_RET | BPF_K, 0),
};
struct io_uring_bpf bpf = {
    .cmd_type = IO_URING_BPF_CMD_FILTER,
    .filter = {
        .opcode = IORING_OP_SOCKET,
        .filter_len = 4,
        .filter_ptr = (unsigned long) inet_only_filter,
        .pdu_size = 12, /* 3x __u32: family, type, protocol */
    },
};
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
io_uring_register_bpf_filter_task(&bpf);
Allow only NOP, deny everything else
struct sock_filter allow_filter[] = {
    BPF_STMT(BPF_RET | BPF_K, 1), /* return 1 (allow) */
};
struct io_uring_bpf bpf = {
    .cmd_type = IO_URING_BPF_CMD_FILTER,
    .filter = {
        .opcode = IORING_OP_NOP,
        .flags = IO_URING_BPF_FILTER_DENY_REST,
        .filter_len = 1,
        .filter_ptr = (unsigned long) allow_filter,
    },
};
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
io_uring_register_bpf_filter_task(&bpf);
Discover kernel pdu_size for an opcode
This example demonstrates how to use the -EMSGSIZE write-back to
discover the kernel's expected payload size.
struct sock_filter allow[] = {
    BPF_STMT(BPF_RET | BPF_K, 1),
};
struct io_uring_bpf bpf = {
    .cmd_type = IO_URING_BPF_CMD_FILTER,
    .filter = {
        .opcode = IORING_OP_SOCKET,
        .flags = IO_URING_BPF_FILTER_SZ_STRICT,
        .filter_len = 1,
        .filter_ptr = (unsigned long) allow,
        .pdu_size = 0, /* intentionally wrong */
    },
};
int ret;
ret = io_uring_register_bpf_filter(&ring, &bpf);
if (ret == -EMSGSIZE) {
    /* kernel wrote back expected size */
    printf("kernel pdu_size for SOCKET: %u\n",
           bpf.filter.pdu_size);
    /* retry with correct size */
    ret = io_uring_register_bpf_filter(&ring, &bpf);
}
NOTES
Privilege Requirements
Similar to seccomp(2), registering BPF filters requires either the
CAP_SYS_ADMIN capability or the no_new_privs attribute to be set
on the calling task. This prevents an unprivileged process from
installing a filter and then executing a setuid binary, which
would run with elevated privileges but under the attacker-
controlled filter.
To set the no_new_privs attribute, call:
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
Once set, no_new_privs cannot be unset and is inherited by child
processes across fork(2) and preserved across execve(2).
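Since the attribute is one-way, a program may prefer to check it before setting it and to report failures rather than ignore them. A minimal sketch follows; the helper name ensure_no_new_privs is hypothetical, while PR_GET_NO_NEW_PRIVS and PR_SET_NO_NEW_PRIVS are the standard prctl(2) operations.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Returns 0 once no_new_privs is set, or -errno if prctl(2) fails. */
static int ensure_no_new_privs(void)
{
    int cur = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);

    if (cur < 0)
        return -errno;          /* kernel too old or call blocked */
    if (cur == 1)
        return 0;               /* already set, e.g. inherited */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
        return -errno;
    return 0;
}
```

Calling this before either registration function avoids the -EACCES failure for callers that lack CAP_SYS_ADMIN.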
Inheritance
Task-level filters registered with
io_uring_register_bpf_filter_task(3) are inherited by child
processes. This allows a parent process to establish security
restrictions that apply to all descendants. Children can add
additional restrictions but cannot remove or weaken filters set by
their ancestors.
Ring-level filters registered with io_uring_register_bpf_filter(3)
only apply to that specific ring and are not inherited.
SEE ALSO
io_uring_register(2), io_uring_setup(2), bpf(2), seccomp(2)
COLOPHON
This page is part of the liburing (A library for io_uring)
project. Information about the project can be found at
⟨https://github.com/axboe/liburing⟩. If you have a bug report for
this manual page, send it to io-uring@vger.kernel.org. This page
was obtained from the project's upstream Git repository
⟨https://github.com/axboe/liburing⟩ on 2026-05-24. (At that time,
the date of the most recent commit that was found in the
repository was 2026-05-18.) If you discover any rendering
problems in this HTML version of the page, or you believe there is
a better or more up-to-date source for the page, or you have
corrections or improvements to the information in this COLOPHON
(which is not part of the original manual page), send a mail to
man-pages@man7.org
liburing-2.14 January 18, 2026 io_uring_re..._bpf_filter(3)