As
The Linux Programming Interface
went to press in August 2010, it was up to date with the then-current
versions of the Linux kernel (2.6.35) and glibc (2.12).
Because the developers of both the Linux kernel and glibc
are committed to maintaining
ABI
compatibility,
virtually all of the details provided in TLPI should
remain accurate in the future.
However, new features are added to the kernel and glibc
with each release.
As each new release of the Linux kernel and glibc occurs,
this page will attempt to note new interface features that are
relevant to the subject area of the book.
A new
prctl() operation,
PR_SET_MM,
intended for use by the checkpoint/restart facility,
allows text, data, and heap sizes to be set
to the values in effect at checkpoint time
when a process is restored.
The caller must have the
CAP_SYS_ADMIN
capability.
This operation is only supported if the kernel is configured with the
CONFIG_CHECKPOINT_RESTORE
option.
Two changes related to the /proc
file system, as reported on
LWN.net:
A new
/proc/PID/map_files
directory contains symbolic links
describing the file mappings of the process identified by PID.
New mount options for the
/proc
file system can be used to control the visibility of the
/proc/PID
directories.
Linux 3.2 (4 Jan 2012)
Changes include the following:
The "Cross-Memory Attach" facility, which provides a mechanism
for fast interprocess communication.
Some information can be found on LWN.net
here,
here
(describes an early version of the API),
and
here.
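As far as I know, the facility is exposed through the process_vm_readv() and process_vm_writev() system calls, which the text above does not name. The following is a minimal sketch of a read, assuming a glibc that provides the wrapper; peek_process() and peek_self_matches() are hypothetical helper names, and reading from one's own process is used only to keep the example self-contained.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copy len bytes from address remote_addr in process pid into buf.
   Returns the number of bytes copied, or -1 on error (errno is set). */
ssize_t peek_process(pid_t pid, const void *remote_addr, void *buf, size_t len)
{
    struct iovec local = { .iov_base = buf, .iov_len = len };
    struct iovec remote = { .iov_base = (void *) remote_addr, .iov_len = len };

    /* One local and one remote buffer; the final (flags) argument must be 0 */
    return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}

/* Demonstration: read one of our own variables through the interface.
   Returns 1 if the copy matches, 0 if it does not, and -1 if the call
   failed (for example, EPERM in a restrictive sandbox). */
int peek_self_matches(void)
{
    const char src[] = "cross-memory";
    char dst[sizeof(src)];
    ssize_t n = peek_process(getpid(), src, dst, sizeof(src));

    if (n == -1)
        return -1;
    return n == (ssize_t) sizeof(src) && memcmp(src, dst, sizeof(src)) == 0;
}
```

Reading from another process is subject to the same kind of permission check as ptrace() attachment, so a real caller should expect EPERM.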
Files under
/proc/sys
are now pollable, meaning
that applications can use
poll(),
select(),
and
epoll
to check for changes to
sysctl
parameters.
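A minimal sketch of waiting for such a change with poll() follows. The helper name wait_for_sysctl_change() is hypothetical, and the choice of POLLERR | POLLPRI as the events that signal a change is my understanding of the kernel implementation rather than something stated above; which files actually generate notifications depends on the kernel.

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds for the sysctl file at path to change.
   Returns 1 if a change was signaled, 0 on timeout, -1 on error. */
int wait_for_sysctl_change(const char *path, int timeout_ms)
{
    char buf[256];
    struct pollfd pfd;
    int ready;

    pfd.fd = open(path, O_RDONLY);
    if (pfd.fd == -1)
        return -1;

    /* Read the current value; we then wait for a later modification */
    if (read(pfd.fd, buf, sizeof(buf)) == -1) {
        close(pfd.fd);
        return -1;
    }

    pfd.events = POLLERR | POLLPRI;     /* Assumed change-notification events */
    pfd.revents = 0;
    ready = poll(&pfd, 1, timeout_ms);
    close(pfd.fd);

    if (ready == -1)
        return -1;
    return ready > 0;
}
```

A caller might, for example, pass "/proc/sys/kernel/hostname" (an assumed example of a pollable file) and re-read the file whenever the function returns 1.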
Linux 3.1 (24 Oct 2011)
Changes include the following:
Three new operations are added for the
ptrace()
system call:
PTRACE_SEIZE,
PTRACE_INTERRUPT,
and
PTRACE_LISTEN.
Some further information can be found
here.
Two new whence values for the
lseek()
system call,
SEEK_HOLE
and
SEEK_DATA,
provide the ability to search for holes and data regions in sparsely allocated files.
Some further information can be found in
this LWN.net article
and the
lseek(2) man page.
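The following sketch walks the data regions of a file by alternating the two operations. The function name list_data_extents() is hypothetical; the sketch assumes a kernel and file system that support SEEK_DATA and SEEK_HOLE (a file system without real support may simply report the whole file as data).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Print each data extent of the file open on fd.
   Returns the number of extents found, or -1 on error. */
int list_data_extents(int fd)
{
    off_t end, pos = 0;
    int count = 0;

    end = lseek(fd, 0, SEEK_END);
    if (end == -1)
        return -1;

    while (pos < end) {
        off_t data, hole;

        data = lseek(fd, pos, SEEK_DATA);   /* Next byte of data at or after pos */
        if (data == -1) {
            if (errno == ENXIO)             /* Only a hole remains before EOF */
                break;
            return -1;
        }

        hole = lseek(fd, data, SEEK_HOLE);  /* End of this data extent */
        if (hole == -1)
            return -1;

        printf("data: [%lld, %lld)\n", (long long) data, (long long) hole);
        count++;
        pos = hole;
    }

    return count;
}
```

The end of the file counts as an implicit hole, so the SEEK_HOLE call always finds an offset beyond the start of the extent and the loop terminates.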
A new
setns()
system call allows its caller to join the namespace
specified by its two arguments--a namespace type
(one of a subset of the
CLONE_*
constants given to
clone(2))
and a file descriptor referring to one of the files in a
/proc/PID/ns
directory.
Some further information can be found
here,
and in the
setns(2)
man page contributed by Eric Biederman.
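As a sketch of how the two arguments fit together, the following hypothetical helper, join_net_namespace_of(), moves the caller into the network namespace of another process. It assumes that /proc/PID/ns/net exists for the target and that the caller has sufficient privilege; without it, setns() is expected to fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Move the caller into the network namespace of process pid.
   Returns 0 on success, -1 on error (errno is set). */
int join_net_namespace_of(pid_t pid)
{
    char path[64];
    int fd, rc, saved;

    snprintf(path, sizeof(path), "/proc/%ld/ns/net", (long) pid);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* CLONE_NEWNET asks the kernel to verify that fd refers to a
       network namespace */
    rc = setns(fd, CLONE_NEWNET);

    saved = errno;              /* Preserve errno across close() */
    close(fd);
    errno = saved;
    return rc;
}
```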
A new
sendmmsg()
system call allows multiple messages to be sent in a single call
(the analog of the
recvmmsg(2)
system call added in Linux 2.6.33).
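A minimal sketch of sending two datagrams with one call follows. The helper names send_pair() and send_pair_demo() are hypothetical, and a UNIX domain datagram socket pair is used only so that the example is self-contained.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send two strings as two separate datagrams using a single system call.
   Returns the number of messages sent, or -1 on error. */
int send_pair(int sock, const char *first, const char *second)
{
    struct iovec iov[2];
    struct mmsghdr msgs[2];

    memset(msgs, 0, sizeof(msgs));

    iov[0].iov_base = (void *) first;
    iov[0].iov_len = strlen(first);
    msgs[0].msg_hdr.msg_iov = &iov[0];
    msgs[0].msg_hdr.msg_iovlen = 1;

    iov[1].iov_base = (void *) second;
    iov[1].iov_len = strlen(second);
    msgs[1].msg_hdr.msg_iov = &iov[1];
    msgs[1].msg_hdr.msg_iovlen = 1;

    return sendmmsg(sock, msgs, 2, 0);
}

/* Demonstration: returns 1 if both datagrams arrive as separate
   messages of the expected lengths, 0 otherwise. */
int send_pair_demo(void)
{
    int sv[2], ok;
    char buf[16];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return 0;

    ok = send_pair(sv[0], "one", "three") == 2
            && recv(sv[1], buf, sizeof(buf), 0) == 3
            && recv(sv[1], buf, sizeof(buf), 0) == 5;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```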
The
timerfd_settime()
system call adds a
TFD_TIMER_CANCEL_ON_SET
flag.
If this flag is set for a
CLOCK_REALTIME
absolute
(TFD_TIMER_ABSTIME)
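timer, then the timer is marked as cancelable if the clock is reset:
after a discontinuous change to the clock, a read(2) from the
file descriptor fails with the error ECANCELED.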
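The following sketch arms such a timer. The helper name create_cancelable_timer() is hypothetical. According to the timerfd_create(2) man page, a read() from the returned descriptor fails with ECANCELED once the realtime clock has been changed discontinuously; the sketch does not exercise that path, because resetting the clock requires privilege.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a CLOCK_REALTIME timer that expires secs seconds from now and
   is canceled if the clock is reset. Returns a file descriptor, or -1. */
int create_cancelable_timer(time_t secs)
{
    struct timespec now;
    struct itimerspec its;
    int fd, saved;

    fd = timerfd_create(CLOCK_REALTIME, 0);
    if (fd == -1)
        return -1;

    if (clock_gettime(CLOCK_REALTIME, &now) == -1)
        goto fail;

    its.it_interval.tv_sec = 0;         /* One-shot timer */
    its.it_interval.tv_nsec = 0;
    its.it_value.tv_sec = now.tv_sec + secs;
    its.it_value.tv_nsec = now.tv_nsec;

    /* The cancel flag is meaningful only together with TFD_TIMER_ABSTIME */
    if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
                        &its, NULL) == -1)
        goto fail;

    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```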
Two new POSIX clocks:
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM.
According to the commit message,
these clocks behave identically to
CLOCK_REALTIME
and
CLOCK_BOOTTIME,
but the
_ALARM
suffixed clocks will wake the system if it is suspended.
Some further details can be found
here.
A new
CAP_WAKE_ALARM
capability
governs the use of the
CLOCK_BOOTTIME_ALARM
and
CLOCK_REALTIME_ALARM
clocks.
The
/proc/sys/kernel/core_pattern
file adds a new specifier, %E.
This specifier is replaced by the pathname of the executable,
with slashes replaced by exclamation marks
(so that the basename of the resulting core dump filename
does not contain slashes).
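To make the substitution concrete, the following user-space sketch mimics the transformation described above. It is an illustration only, not the kernel's code; the function name expand_percent_E() is hypothetical.

```c
#include <stddef.h>

/* Copy exe_path into out (capacity size), replacing each '/' with '!'.
   The result is truncated if necessary and always null-terminated.
   Returns the length of the string stored in out. */
size_t expand_percent_E(const char *exe_path, char *out, size_t size)
{
    size_t i;

    if (size == 0)
        return 0;

    for (i = 0; i + 1 < size && exe_path[i] != '\0'; i++)
        out[i] = (exe_path[i] == '/') ? '!' : exe_path[i];

    out[i] = '\0';
    return i;
}
```

For the executable /usr/bin/sleep, the resulting string is !usr!bin!sleep.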
Linux 2.6.39 (19 May 2011)
Changes include the following:
New
name_to_handle_at() and
open_by_handle_at()
system calls.
These system calls provide functionality that is useful for
file-system servers that run in user space.
Some details can be found
here and
here.
A new
clock_adjtime()
system call, analogous to
adjtimex(2),
permits adjustments to POSIX clocks.
A new
syncfs()
system call, which is similar to
sync(2),
but flushes only the file system containing the file
referred to by its file-descriptor argument.
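A minimal sketch, with the hypothetical helper name flush_fs_containing(): any descriptor referring to a file or directory on the target file system will do, so the helper simply opens the given pathname read-only.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush the file system that contains path.
   Returns 0 on success, -1 on error (errno is set). */
int flush_fs_containing(const char *path)
{
    int fd, rc, saved;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    rc = syncfs(fd);

    saved = errno;              /* Preserve errno across close() */
    close(fd);
    errno = saved;
    return rc;
}
```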
A new
O_PATH
flag is added for
open(2).
Some details can be found
here.
O_PATH
descriptors can be obtained for symbolic links,
and can be passed via
SCM_RIGHTS
datagrams.
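The following sketch obtains an O_PATH descriptor for a symbolic link itself and then reads the link through that descriptor. It relies on readlinkat() accepting an empty pathname for such a descriptor, as documented in the readlink(2) man page; the helper names read_link_via_fd() and read_link_demo(), and the /tmp location, are assumptions made for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read the target of the symbolic link linkpath via an O_PATH descriptor.
   Returns the number of bytes placed in buf, or -1 on error. */
ssize_t read_link_via_fd(const char *linkpath, char *buf, size_t size)
{
    ssize_t n;
    int fd, saved;

    /* O_NOFOLLOW makes the descriptor refer to the link, not its target */
    fd = open(linkpath, O_PATH | O_NOFOLLOW);
    if (fd == -1)
        return -1;

    n = readlinkat(fd, "", buf, size);  /* Empty pathname: operate on fd */

    saved = errno;
    close(fd);
    errno = saved;
    return n;
}

/* Demonstration: returns 1 if the link target is read back correctly,
   0 if not, -1 if the temporary link could not be created. */
int read_link_demo(void)
{
    char dir[] = "/tmp/opathXXXXXX";
    char link[64], buf[64];
    ssize_t n;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(link, sizeof(link), "%s/link", dir);
    if (symlink("target", link) == -1) {
        rmdir(dir);
        return -1;
    }

    n = read_link_via_fd(link, buf, sizeof(buf));

    unlink(link);
    rmdir(dir);
    return n == 6 && memcmp(buf, "target", 6) == 0;
}
```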
A new POSIX clock,
CLOCK_BOOTTIME,
is identical to
CLOCK_MONOTONIC,
but includes time that the system has been suspended.
This clock is intended for applications that want a
monotonically increasing clock and also want to be aware of
time the system has been suspended.
Some background can be found
here.
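Because the two clocks differ only in their treatment of suspend time, subtracting one from the other gives an estimate of how long the system has been suspended since boot. A minimal sketch, with the hypothetical helper name suspended_seconds():

```c
#define _GNU_SOURCE
#include <time.h>

/* Estimate the number of seconds that the system has spent suspended
   since boot. Returns -1.0 if either clock cannot be read. */
double suspended_seconds(void)
{
    struct timespec mono, boot;

    if (clock_gettime(CLOCK_MONOTONIC, &mono) == -1
            || clock_gettime(CLOCK_BOOTTIME, &boot) == -1)
        return -1.0;

    return (double) (boot.tv_sec - mono.tv_sec)
            + (boot.tv_nsec - mono.tv_nsec) / 1e9;
}
```

The two clocks are read one after the other, so the result includes a small error equal to the time between the two calls.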
A new
AT_EMPTY_PATH
flag allows empty relative pathnames for
linkat(2),
fchownat(2),
fstatat(2),
and
name_to_handle_at(),
in which case the calls operate on
their directory file descriptor argument.
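A minimal sketch using fstatat(): with an empty pathname and AT_EMPTY_PATH, the call reports on the file referred to by the descriptor itself. The helper name fd_is_directory() is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/stat.h>

/* Return 1 if fd refers to a directory, 0 if it refers to some other
   type of file, or -1 on error. */
int fd_is_directory(int fd)
{
    struct stat st;

    /* Empty pathname + AT_EMPTY_PATH: operate on fd itself */
    if (fstatat(fd, "", &st, AT_EMPTY_PATH) == -1)
        return -1;

    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```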
A thread operating under the
SCHED_IDLE
policy
is now allowed to upgrade itself to the
SCHED_BATCH
or
SCHED_OTHER
policy if its nice value falls within the range permitted by its
RLIMIT_NICE
resource limit.
Linux 2.6.38 (15 Mar 2011)
Changes include the following:
A new
AT_NO_AUTOMOUNT
flag for
fstatat(2),
which can be used to suppress automounting of the terminal
component of the pathname argument.
A new
CAP_SYSLOG
capability,
used (instead of
CAP_SYS_ADMIN)
to govern privileged
syslog(2)
operations.
A new
FALLOC_FL_PUNCH_HOLE
operation for
fallocate(2).
This operation creates a hole (see page 83 of TLPI) in the file
in the byte range indicated by the
offset
and
len
arguments.
(The file data in the specified range is lost.)
File system support is required for the
FALLOC_FL_PUNCH_HOLE
operation.
Among the file systems that support
FALLOC_FL_PUNCH_HOLE
are XFS and
(since Linux 3.0) ext4.
Btrfs is capable of supporting the operation,
and support is likely to be added in the future.
As currently implemented,
FALLOC_FL_PUNCH_HOLE
must be specified with
FALLOC_FL_KEEP_SIZE,
which means that the size of a file can't change,
even if a hole is punched at the end of the file.
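The following sketch punches a hole at the start of a temporary file and checks that the file size is unchanged and that the punched range reads back as zeros. The helper names punch_hole() and punch_hole_demo() are hypothetical, and the file system holding the temporary file may not support the operation, in which case the call is expected to fail with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Deallocate len bytes starting at offset, leaving the file size unchanged */
int punch_hole(int fd, off_t offset, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     offset, len);
}

/* Demonstration: returns 1 if a hole was punched and verified, 0 if the
   file system does not support the operation, -1 on any other failure. */
int punch_hole_demo(void)
{
    char block[8192];
    struct stat st;
    int fd, result = -1;
    FILE *f = tmpfile();

    if (f == NULL)
        return -1;
    fd = fileno(f);

    memset(block, 'x', sizeof(block));
    if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        goto out;

    if (punch_hole(fd, 0, 4096) == -1) {
        result = (errno == EOPNOTSUPP) ? 0 : -1;
        goto out;
    }

    /* Size must be unchanged, and the punched range must read as zeros */
    if (fstat(fd, &st) == 0 && st.st_size == (off_t) sizeof(block)
            && pread(fd, block, 1, 0) == 1 && block[0] == '\0')
        result = 1;

out:
    fclose(f);
    return result;
}
```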
New
MADV_HUGEPAGE
and
MADV_NOHUGEPAGE
flags for
madvise(2).
These flags enable and disable an attribute on the memory region
that indicates that it is important that the region be backed by
huge pages, when this is possible.
Further information on this feature can be found
in the kernel source file
Documentation/vm/transhuge.txt
as well as
here
and
here.
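A minimal sketch of requesting huge-page backing for an anonymous mapping follows. The helper name map_with_hugepage_hint() is hypothetical. Because the flag is only advice, and madvise() may fail (for example, on a kernel built without transparent huge page support), the sketch deliberately ignores the result of the call.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Create a private anonymous mapping of len bytes and ask the kernel
   to back it with huge pages where possible.
   Returns the address of the mapping, or NULL if mmap() fails. */
void *map_with_hugepage_hint(size_t len)
{
    void *p;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Advice only: the mapping remains usable even if this call fails */
    (void) madvise(p, len, MADV_HUGEPAGE);

    return p;
}
```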
Linux 2.6.37 (5 Jan 2011)
fanotify_init() and fanotify_mark() system calls
The
fanotify_init()
and
fanotify_mark()
system calls
are designed for use in virus-scanning tools,
but may also serve other more general uses.
These two system calls
provide functionality that is in some ways similar to
inotify(7).
Note however that the
fanotify
interface is not a superset of
inotify.
(The existence of two APIs with heavily overlapping functionality,
rather than a new API that is a superset of the earlier API,
is unfortunate.)
These two system calls were added in Linux 2.6.36,
but disabled while concerns about the API were resolved.
In Linux 2.6.37, the system calls have been enabled.
The
prlimit()
system call is an enhancement of
setrlimit()
and
getrlimit().
It allows the caller to both set and retrieve its own resource limits
(including retrieving the old limit at the same time as a new limit is set),
and (with suitable permissions) perform the same task for other processes.
This system call does not suffer from
this kernel bug,
which affects
getrlimit()/setrlimit().
(See pages 759 and 760 of the book.)
Indeed, it could eventually be used to provide
glibc wrappers for
setrlimit()
and
getrlimit()
that work around the kernel bug.
I've added documentation of this system call to the
getrlimit(2)
man page,
starting with man-pages-3.31.
inotify IN_EXCL_UNLINK flag
The
inotify IN_EXCL_UNLINK
flag prevents children of a watched directory
from generating events for that directory after they have been
unlinked from it.
I've added documentation of this flag to the
inotify(7)
man page,
starting with man-pages-3.31.
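A minimal sketch of adding a directory watch with this flag follows. The helper name watch_dir_excl_unlink() and the choice of IN_MODIFY | IN_CLOSE_WRITE as the events of interest are assumptions made for the example.

```c
#include <errno.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Watch dir for modifications to its children, excluding children that
   have already been unlinked from it.
   Returns an inotify file descriptor, or -1 on error. */
int watch_dir_excl_unlink(const char *dir)
{
    int fd, saved;

    fd = inotify_init1(IN_CLOEXEC);
    if (fd == -1)
        return -1;

    if (inotify_add_watch(fd, dir,
                IN_MODIFY | IN_CLOSE_WRITE | IN_EXCL_UNLINK) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    return fd;
}
```

Without the flag, a process that keeps writing to an open file after the file has been unlinked would continue to generate events for the watched directory.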
glibc API changes
glibc 2.16 (not yet released)
The glibc header files now handle the
_ISOC11_SOURCE
feature test macro,
as a mechanism for exposing declarations conforming to the
C11
standard.
glibc 2.15 (tagged 23 Dec 2011)
A new
scandirat()
function, which is to
scandir()
as
openat(2)
is to
open().
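A minimal sketch, with the hypothetical helper name count_entries_at(): the directory name is interpreted relative to the directory file descriptor, and AT_FDCWD can be passed to get the behavior of scandir().

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>

/* Count the entries (including "." and "..") in the directory name,
   interpreted relative to dirfd. Returns the count, or -1 on error. */
int count_entries_at(int dirfd, const char *name)
{
    struct dirent **list;
    int n, i;

    n = scandirat(dirfd, name, &list, NULL, alphasort);
    if (n == -1)
        return -1;

    /* The entry list and each entry are allocated by scandirat() */
    for (i = 0; i < n; i++)
        free(list[i]);
    free(list);

    return n;
}
```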
glibc 2.14 (tagged 31 May 2011)
No API changes (other than simple
wrappers for recently added Linux system calls).
glibc 2.13 (tagged 17 Jan 2011)
No API changes (other than simple
wrappers for recently added Linux system calls).