regex(3) — Linux manual page


regex(3)                Library Functions Manual                regex(3)

NAME

       regcomp, regexec, regerror, regfree - POSIX regex functions

LIBRARY

       Standard C library (libc, -lc)

SYNOPSIS

       #include <regex.h>

       int regcomp(regex_t *restrict preg, const char *restrict regex,
                   int cflags);
       int regexec(const regex_t *restrict preg, const char *restrict string,
                   size_t nmatch, regmatch_t pmatch[_Nullable restrict .nmatch],
                   int eflags);

       size_t regerror(int errcode, const regex_t *_Nullable restrict preg,
                   char errbuf[_Nullable restrict .errbuf_size],
                   size_t errbuf_size);
       void regfree(regex_t *preg);

       typedef struct {
           size_t    re_nsub;
       } regex_t;

       typedef struct {
           regoff_t  rm_so;
           regoff_t  rm_eo;
       } regmatch_t;

       typedef /* ... */  regoff_t;

DESCRIPTION

       regcomp() is used to compile a regular expression into a form
       that is suitable for subsequent regexec() searches.

       On success, the pattern buffer at *preg is initialized.  regex is
       a null-terminated string.  The locale must be the same when
       running regexec().

       After regcomp() succeeds, preg->re_nsub holds the number of
       subexpressions in regex.  Thus, a value of preg->re_nsub + 1
       passed as nmatch to regexec() is sufficient to capture all
       matches.
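
       The relationship between re_nsub and nmatch can be sketched as
       follows.  The helper name slots_needed() is hypothetical and is
       not part of the regex API; it assumes the pattern is written in
       Extended Regular Expression syntax.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile pattern as an ERE and return the number of
 * regmatch_t slots needed to capture the entire match plus every
 * subexpression.  Returns 0 if the pattern does not compile. */
static size_t slots_needed(const char *pattern)
{
    regex_t preg;
    size_t  n;

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return 0;

    n = preg.re_nsub + 1;   /* slot 0 holds the entire match */
    regfree(&preg);
    return n;
}
```

       For instance, the pattern "(a)(b(c))" contains three
       subexpressions, so four slots are enough.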

       cflags is the bitwise OR of zero or more of the following:

       REG_EXTENDED
              Use POSIX Extended Regular Expression syntax when
              interpreting regex.  If not set, POSIX Basic Regular
              Expression syntax is used.

       REG_ICASE
              Do not differentiate case.  Subsequent regexec() searches
              using this pattern buffer will be case insensitive.

       REG_NOSUB
              Report only overall success.  regexec() will use only
              pmatch for REG_STARTEND, ignoring nmatch.

       REG_NEWLINE
              Match-any-character operators don't match a newline.

              A nonmatching list ([^...]) not containing a newline does
              not match a newline.

              Match-beginning-of-line operator (^) matches the empty
              string immediately after a newline, regardless of whether
              eflags, the execution flags of regexec(), contains
              REG_NOTBOL.

              Match-end-of-line operator ($) matches the empty string
              immediately before a newline, regardless of whether eflags
              contains REG_NOTEOL.
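
       The effect of the compilation flags can be observed with a small
       wrapper such as the one below.  This is only a sketch: the
       helper name matches() is hypothetical, and it always adds
       REG_NOSUB because it reports nothing but success or failure.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if string matches pattern compiled with
 * cflags, 0 if it does not, and -1 if the pattern fails to compile. */
static int matches(const char *pattern, int cflags, const char *string)
{
    regex_t preg;
    int     rc;

    if (regcomp(&preg, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);
    return rc == 0;
}
```

       With this helper, matches("a+", REG_EXTENDED, "aaa") succeeds,
       while matches("a+", 0, "aaa") does not, because "+" is an
       ordinary character in Basic Regular Expression syntax.
       Similarly, matches("^b$", REG_NEWLINE, "a\nb\nc") succeeds only
       because REG_NEWLINE lets ^ and $ match around the embedded
       newlines.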

       regexec() is used to match a null-terminated string against the
       compiled pattern buffer in *preg, which must have been
       initialized with regcomp().  eflags is the bitwise OR of zero or
       more of the following flags:

       REG_NOTBOL
              The match-beginning-of-line operator always fails to match
              (but see the compilation flag REG_NEWLINE above).  This
              flag may be used when different portions of a string are
              passed to regexec() and the beginning of the string should
              not be interpreted as the beginning of the line.

       REG_NOTEOL
              The match-end-of-line operator always fails to match (but
              see the compilation flag REG_NEWLINE above).

       REG_STARTEND
              Match [string + pmatch[0].rm_so, string + pmatch[0].rm_eo)
              instead of [string, string + strlen(string)).  This allows
              matching embedded NUL bytes and avoids a strlen(3) on
              known-length strings.  If any matches are returned
              (REG_NOSUB wasn't passed to regcomp(), the match
              succeeded, and nmatch > 0), they overwrite pmatch as
              usual, and the match offsets remain relative to string
              (not string + pmatch[0].rm_so).  This flag is a BSD
              extension, not present in POSIX.
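
       The execution flags can be sketched as follows.  Both helper
       names are hypothetical.  search_range() assumes a C library
       that provides the REG_STARTEND extension (glibc and the BSDs
       do) and will not compile elsewhere.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: search only string[start, end) for the ERE pattern.
 * On a match, stores the offsets (relative to string) in *out and returns
 * 0; otherwise returns the nonzero value from regcomp() or regexec().
 * Assumes the C library defines REG_STARTEND. */
static int search_range(const char *pattern, const char *string,
                        regoff_t start, regoff_t end, regmatch_t *out)
{
    regex_t preg;
    int     rc;

    rc = regcomp(&preg, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    out->rm_so = start;     /* on input: bounds of the region to search */
    out->rm_eo = end;
    rc = regexec(&preg, string, 1, out, REG_STARTEND);
    regfree(&preg);
    return rc;
}

/* Hypothetical helper: returns 1 if pattern matches string when the start
 * of string is not treated as the beginning of a line, 0 if it does not,
 * and -1 if the pattern fails to compile. */
static int matches_mid_line(const char *pattern, const char *string)
{
    regex_t preg;
    int     rc;

    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&preg, string, 0, NULL, REG_NOTBOL);
    regfree(&preg);
    return rc == 0;
}
```

       Searching "aaa bbb aaa" for "a+" within the byte range [4, 11)
       reports the match at offsets 8 and 11, measured from the start
       of the full string rather than from offset 4.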

   Match offsets
       Unless REG_NOSUB was passed to regcomp(), it is possible to
       obtain the locations of matches within string: regexec() fills
       nmatch elements of pmatch with results: pmatch[0] corresponds to
       the entire match, pmatch[1] to the first subexpression, etc.  If
       there were more matches than nmatch, they are discarded; if
       fewer, unused elements of pmatch are filled with -1s.

       Each valid match (one whose offsets are not -1) corresponds to
       the range [string + rm_so, string + rm_eo).
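
       Turning match offsets into substrings might look like the
       sketch below.  The helper name copy_group() and the fixed limit
       of ten slots are assumptions made for illustration; real code
       would normally size pmatch from re_nsub.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: match the ERE pattern against string and copy
 * subexpression number group (0 = entire match) into buf.  Returns the
 * number of bytes copied, or -1 if there is no match, the group did not
 * take part in the match, or buf is too small. */
static int copy_group(const char *pattern, const char *string,
                      size_t group, char *buf, size_t bufsize)
{
    enum { MAX_GROUPS = 10 };       /* arbitrary limit for this sketch */
    regex_t    preg;
    regmatch_t pmatch[MAX_GROUPS];
    int        len = -1;

    if (group >= MAX_GROUPS || regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&preg, string, MAX_GROUPS, pmatch, 0) == 0
        && pmatch[group].rm_so != -1) {
        /* The match occupies [string + rm_so, string + rm_eo). */
        regoff_t n = pmatch[group].rm_eo - pmatch[group].rm_so;

        if ((size_t) n < bufsize) {
            memcpy(buf, string + pmatch[group].rm_so, (size_t) n);
            buf[n] = '\0';
            len = (int) n;
        }
    }

    regfree(&preg);
    return len;
}
```

       Matching "([0-9]+)-([0-9]+)" against "range 10-25 ok" yields
       "10-25" for group 0, "10" for group 1, and "25" for group 2;
       group 3 does not exist, so its slot is filled with -1 and the
       helper returns -1.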

       regoff_t is a signed integer type capable of storing the largest
       value that can be stored in either a ptrdiff_t type or a ssize_t
       type.

   Error reporting
       regerror() is used to turn the error codes that can be returned
       by both regcomp() and regexec() into error message strings.

       If preg isn't a null pointer, errcode must be the latest error
       returned from an operation on preg.

       If errbuf_size isn't 0, up to errbuf_size bytes are copied to
       errbuf; the error string is always null-terminated, and truncated
       to fit.
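
       Because regerror() returns the size it needs, a message buffer
       can be sized with two calls, as in this sketch.  The helper
       name error_message() is hypothetical.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed, null-terminated message for
 * errcode, or NULL if allocation fails.  The caller frees the result. */
static char *error_message(int errcode, const regex_t *preg)
{
    /* First call: errbuf_size is 0, so nothing is copied and only the
     * required size (including the terminating null byte) is returned. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char  *msg  = malloc(need);

    /* Second call: the buffer is large enough, so nothing is truncated. */
    if (msg != NULL)
        regerror(errcode, preg, msg, need);
    return msg;
}
```

       A caller would typically pass the nonzero value returned by
       regcomp() or regexec() together with the same preg, print the
       message, and then free() it.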

       regfree() deinitializes the pattern buffer at *preg, freeing any
       associated memory; *preg must have been initialized via
       regcomp().
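
       The usual lifecycle is to compile once, search any number of
       times, and free once.  A minimal sketch, with the hypothetical
       helper name count_matching():

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the n strings match the ERE
 * pattern.  Returns -1 if the pattern does not compile. */
static int count_matching(const char *pattern,
                          const char *const strings[], size_t n)
{
    regex_t preg;
    int     count = 0;

    /* On failure *preg is not initialized, so regfree() is not called. */
    if (regcomp(&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* The same compiled pattern buffer serves every search. */
    for (size_t i = 0; i < n; i++)
        if (regexec(&preg, strings[i], 0, NULL, 0) == 0)
            count++;

    regfree(&preg);     /* exactly once, after the last regexec() */
    return count;
}
```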

RETURN VALUE

       regcomp() returns zero for a successful compilation or an error
       code for failure.

       regexec() returns zero for a successful match or REG_NOMATCH for
       failure.

       regerror() returns the size of the buffer required to hold the
       string.

ERRORS

       The following errors can be returned by regcomp():

       REG_BADBR
              Invalid use of back reference operator.

       REG_BADPAT
              Invalid use of pattern operators such as group or list.

       REG_BADRPT
              Invalid use of repetition operators such as using '*' as
              the first character.

       REG_EBRACE
              Un-matched brace interval operators.

       REG_EBRACK
              Un-matched bracket list operators.

       REG_ECOLLATE
              Invalid collating element.

       REG_ECTYPE
              Unknown character class name.

       REG_EEND
              Nonspecific error.  This is not defined by POSIX.

       REG_EESCAPE
              Trailing backslash.

       REG_EPAREN
              Un-matched parenthesis group operators.

       REG_ERANGE
              Invalid use of the range operator; for example, the ending
              point of the range occurs prior to the starting point.

       REG_ESIZE
              Compiled regular expression requires a pattern buffer
              larger than 64 kB.  This is not defined by POSIX.

       REG_ESPACE
              The regex routines ran out of memory.

       REG_ESUBREG
              Invalid back reference to a subexpression.
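
       When debugging, it can help to see which error code a pattern
       produced.  The sketch below is one way to do that; both helper
       names are hypothetical, and only the error codes that POSIX
       defines are named, so the non-POSIX codes fall through to
       "other".  Which code a given invalid pattern produces may vary
       between implementations.

```c
#include <regex.h>

/* Hypothetical helper: map a regcomp() or regexec() return value to the
 * name of the corresponding POSIX constant. */
static const char *regex_code_name(int code)
{
    switch (code) {
    case 0:            return "0";
    case REG_NOMATCH:  return "REG_NOMATCH";
    case REG_BADBR:    return "REG_BADBR";
    case REG_BADPAT:   return "REG_BADPAT";
    case REG_BADRPT:   return "REG_BADRPT";
    case REG_EBRACE:   return "REG_EBRACE";
    case REG_EBRACK:   return "REG_EBRACK";
    case REG_ECOLLATE: return "REG_ECOLLATE";
    case REG_ECTYPE:   return "REG_ECTYPE";
    case REG_EESCAPE:  return "REG_EESCAPE";
    case REG_EPAREN:   return "REG_EPAREN";
    case REG_ERANGE:   return "REG_ERANGE";
    case REG_ESPACE:   return "REG_ESPACE";
    case REG_ESUBREG:  return "REG_ESUBREG";
    default:           return "other";   /* includes non-POSIX codes */
    }
}

/* Hypothetical helper: compile pattern as an ERE only to learn the
 * result; returns whatever regcomp() returned. */
static int compile_status(const char *pattern)
{
    regex_t preg;
    int     rc = regcomp(&preg, pattern, REG_EXTENDED);

    if (rc == 0)
        regfree(&preg);     /* only a successful compile needs freeing */
    return rc;
}
```

       A call such as regex_code_name(compile_status("[a")) then names
       the error reported for an unterminated bracket expression.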

ATTRIBUTES

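       For an explanation of the terms used in this section, see
       attributes(7).
       ┌──────────────────────┬───────────────┬────────────────┐
       │ Interface            │ Attribute     │ Value          │
       ├──────────────────────┼───────────────┼────────────────┤
       │ regcomp(), regexec() │ Thread safety │ MT-Safe locale │
       ├──────────────────────┼───────────────┼────────────────┤
       │ regerror()           │ Thread safety │ MT-Safe env    │
       ├──────────────────────┼───────────────┼────────────────┤
       │ regfree()            │ Thread safety │ MT-Safe        │
       └──────────────────────┴───────────────┴────────────────┘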

STANDARDS


HISTORY


       Prior to POSIX.1-2008, regoff_t was required to be capable of
       storing the largest value that can be stored in either an off_t
       type or a ssize_t type.

CAVEATS

       re_nsub is only required to be initialized if REG_NOSUB wasn't
       specified, but all known implementations initialize it
       regardless.

       Both regex_t and regmatch_t may (and do) have more members, in
       any order.  Always reference them by name.
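
       In practice, referencing members by name means using
       designated initializers or plain assignments rather than
       positional initializers.  A small sketch, with the hypothetical
       helper name make_range():

```c
#include <regex.h>

/* Hypothetical helper: build a regmatch_t by member name.  A positional
 * initializer such as { start, end } would silently depend on the order
 * and number of members, which this page says may differ. */
static regmatch_t make_range(regoff_t start, regoff_t end)
{
    regmatch_t m = { .rm_so = start, .rm_eo = end };

    return m;
}
```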

EXAMPLES

       #include <stdint.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <regex.h>

       #define ARRAY_SIZE(arr) (sizeof((arr)) / sizeof((arr)[0]))

       static const char *const str =
               "1) John Driverhacker;\n2) John Doe;\n3) John Foo;\n";
       static const char *const re = "John.*o";

       int main(void)
       {
           const char  *s = str;   /* start of the unsearched remainder */
           regex_t     regex;
           regmatch_t  pmatch[1];
           regoff_t    off, len;

           if (regcomp(&regex, re, REG_NEWLINE))
               exit(EXIT_FAILURE);

           printf("String = \"%s\"\n", str);

           for (unsigned int i = 0; ; i++) {
               /* Stop at the first search that does not match. */
               if (regexec(&regex, s, ARRAY_SIZE(pmatch), pmatch, 0))
                   break;

               /* Offsets are relative to s; convert to offsets in str. */
               off = pmatch[0].rm_so + (s - str);
               len = pmatch[0].rm_eo - pmatch[0].rm_so;
               printf("#%u:\n", i);
               printf("offset = %jd; length = %jd\n", (intmax_t) off,
                       (intmax_t) len);
               printf("substring = \"%.*s\"\n", (int) len,
                       s + pmatch[0].rm_so);

               /* Resume searching just past this match. */
               s += pmatch[0].rm_eo;
           }

           regfree(&regex);
           exit(EXIT_SUCCESS);
       }


SEE ALSO

       grep(1), regex(7)

       The glibc manual section, Regular Expressions

COLOPHON

       This page is part of the man-pages (Linux kernel and C library
       user-space interface documentation) project.  This page was
       obtained from the tarball man-pages-6.9.1.tar.gz, fetched on
       2024-06-26.

Linux man-pages 6.9.1          2024-06-15                       regex(3)
