NAME
pcreposix - POSIX API for Perl-compatible regular expressions.

SYNOPSIS
#include <pcreposix.h>

int regcomp(regex_t *preg, const char *pattern,
     int cflags);

int regexec(regex_t *preg, const char *string,
     size_t nmatch, regmatch_t pmatch[], int eflags);

size_t regerror(int errcode, const regex_t *preg,
     char *errbuf, size_t errbuf_size);

void regfree(regex_t *preg);

DESCRIPTION
This set of functions provides a POSIX-style API to the PCRE
regular expression package. See the pcre documentation for a
description of the native API, which contains additional
functionality.

The functions described here are just wrapper functions that
ultimately call the native API. Their prototypes are defined
in the pcreposix.h header file, and on Unix systems the
library itself is called libpcreposix.a, so it can be accessed
by adding -lpcreposix to the command for linking an application
that uses them. Because the POSIX functions call the native
ones, it is also necessary to add -lpcre.

I have implemented only those option bits that can be
reasonably mapped to PCRE native options. In addition, the
options REG_EXTENDED and REG_NOSUB are defined with the
value zero. They have no effect, but since programs that are
written to the POSIX interface often use them, this makes it
easier to slot in PCRE as a replacement library. Other POSIX
options are not even defined.

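The drop-in usage described above can be sketched as follows.
This is a minimal illustration, not part of PCRE: the helper
name matches() and the USE_PCREPOSIX macro are assumptions made
for the example. Without that macro the sketch falls back to
the system <regex.h>, purely so that it still builds where PCRE
is not installed.

```c
/* Build against PCRE's wrapper (command shown as an assumption for a
 * typical Unix toolchain):
 *     cc -DUSE_PCREPOSIX demo.c -lpcreposix -lpcre
 */
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>   /* stand-in: the system's POSIX regex library */
#endif
#include <stddef.h>

/* Return 1 if subject matches pattern, 0 if not, -1 on any error. */
int matches(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED and REG_NOSUB are zero (no effect) under pcreposix,
     * but keeping them lets the same source work with other libraries. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

The pattern used with such a helper should stay within syntax
that Perl and POSIX extended expressions share if the same
source is meant to run against both libraries.
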
When PCRE is called via these functions, it is only the API
that is POSIX-like in style. The syntax and semantics of the
regular expressions themselves are still those of Perl,
subject to the setting of various PCRE options, as described
below.

The header for these functions is supplied as pcreposix.h to
avoid any potential clash with other POSIX libraries. It
can, of course, be renamed or aliased as regex.h, which is
the "correct" name. It provides two structure types, regex_t
for compiled internal forms, and regmatch_t for returning
captured substrings. It also defines some constants whose
names start with "REG_"; these are used for setting options
and identifying error codes.

COMPILING A PATTERN
The function regcomp() is called to compile a pattern into
an internal form. The pattern is a C string terminated by a
binary zero, and is passed in the argument pattern. The preg
argument is a pointer to a regex_t structure which is used
as a base for storing information about the compiled
expression.

The argument cflags is either zero, or contains one or more
of the bits defined by the following macros:

  REG_ICASE

The PCRE_CASELESS option is set when the expression is
passed for compilation to the native function.

  REG_NEWLINE

The PCRE_MULTILINE option is set when the expression is
passed for compilation to the native function.

In the absence of these flags, no options are passed to the
native function. This means that the regex is compiled with
PCRE default semantics. In particular, the way it handles
newline characters in the subject string is the Perl way,
not the POSIX way. Note that setting PCRE_MULTILINE has only
some of the effects specified for REG_NEWLINE. It does not
affect the way newlines are matched by . (they are not
matched) or by a negative class such as [^a] (they are
matched).

The yield of regcomp() is zero on success, and non-zero
otherwise. The preg structure is filled in on success, and one
member of the structure is publicized: re_nsub contains the
number of capturing subpatterns in the regular expression.
Various error codes are defined in the header file.

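The compile-time flags and the re_nsub member can be exercised
with a sketch like the one below. The helper names
matches_with_flags() and count_groups(), and the USE_PCREPOSIX
macro, are assumptions made for illustration; without the macro
the code falls back to the system <regex.h> so that it builds
where PCRE is not installed. The patterns are chosen from
syntax that behaves the same way in both libraries.

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>   /* stand-in: the system's POSIX regex library */
#endif
#include <stddef.h>

/* Compile pattern with the given extra flags and test subject.
 * Returns 1 on a match, 0 on no match, -1 on any error. */
int matches_with_flags(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | cflags) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}

/* Return the number of capturing subpatterns reported in re_nsub,
 * or -1 if the pattern does not compile. */
int count_groups(const char *pattern)
{
    regex_t re;
    int n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    n = (int)re.re_nsub;
    regfree(&re);
    return n;
}
```

For instance, matches_with_flags("^b", "a\nb", REG_NEWLINE)
lets ^ match after the embedded newline, whereas passing zero
for the flags anchors ^ at the start of the string only.
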
MATCHING A PATTERN
The function regexec() is called to match a pre-compiled
pattern preg against a given string, which is terminated by
a zero byte, subject to the options in eflags. These can be:

  REG_NOTBOL

The PCRE_NOTBOL option is set when calling the underlying
PCRE matching function.

  REG_NOTEOL

The PCRE_NOTEOL option is set when calling the underlying
PCRE matching function.

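The two execution flags can be illustrated with a small
sketch. The helper name matches_exec() and the USE_PCREPOSIX
macro are assumptions made for the example; without the macro
the code uses the system <regex.h> as a stand-in. REG_NOTBOL
and REG_NOTEOL are typically useful when the string passed to
regexec() is a fragment of a longer text, so that its first or
last character is not a real line boundary.

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>   /* stand-in: the system's POSIX regex library */
#endif
#include <stddef.h>

/* Match subject against pattern with the given execution flags.
 * Returns 1 on a match, 0 on no match, -1 on any error. */
int matches_exec(const char *pattern, const char *subject, int eflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, eflags);  /* flags apply per call */
    regfree(&re);

    if (rc == 0)
        return 1;
    return (rc == REG_NOMATCH) ? 0 : -1;
}
```

With this helper, matches_exec("^b", "b", REG_NOTBOL) reports
no match because the start of the string is no longer treated
as the beginning of a line.
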
The portion of the string that was matched, and also any
captured substrings, are returned via the pmatch argument,
which points to an array of nmatch structures of type
regmatch_t, containing the members rm_so and rm_eo. These
contain the offset to the first character of each substring
and the offset to the first character after the end of each
substring, respectively. The 0th element of the array
relates to the entire portion of string that was matched;
subsequent elements relate to the capturing subpatterns of
the regular expression. Unused entries in the array have
both structure members set to -1.

A successful match yields a zero return; various error codes
are defined in the header file, of which REG_NOMATCH is the
"expected" failure code.

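Reading the offsets back from pmatch can be sketched as below.
The names span_t, group_span() and MAX_GROUPS, the -2 sentinel,
and the USE_PCREPOSIX macro are all assumptions made for
illustration; without the macro the code uses the system
<regex.h> as a stand-in.

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>   /* stand-in: the system's POSIX regex library */
#endif
#include <stddef.h>

#define MAX_GROUPS 8   /* size of the pmatch array used in this sketch */

typedef struct {
    long start;   /* offset of the first character of the substring */
    long end;     /* offset of the first character after it */
} span_t;

/* Return the offsets of capture group `group` (0 = the whole match).
 * Both fields are -1 if the group is unused, and -2 (a convention
 * invented for this sketch) if there is no match or an error. */
span_t group_span(const char *pattern, const char *subject, size_t group)
{
    span_t result = { -2, -2 };
    regex_t re;
    regmatch_t pm[MAX_GROUPS];

    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return result;

    if (regexec(&re, subject, MAX_GROUPS, pm, 0) == 0) {
        result.start = (long)pm[group].rm_so;
        result.end   = (long)pm[group].rm_eo;
    }
    regfree(&re);
    return result;
}
```

The length of a captured substring is end minus start, so the
text can be copied out of the subject with memcpy() or printed
with a "%.*s" format.
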
ERROR MESSAGES
The regerror() function maps a non-zero error code from
either regcomp() or regexec() to a printable message. If preg
is not NULL, the error should have arisen from the use of that
structure. A message terminated by a binary zero is placed
in errbuf. The length of the message, including the zero, is
limited to errbuf_size. The yield of the function is the
size of buffer needed to hold the whole message.

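A sketch of turning a compilation failure into text follows.
The helper name compile_error_message(), the 128-byte buffer,
and the USE_PCREPOSIX macro are assumptions made for the
example; without the macro the code uses the system <regex.h>
as a stand-in. The exact wording of the message depends on the
library in use.

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>   /* stand-in: the system's POSIX regex library */
#endif
#include <stddef.h>
#include <string.h>

/* Return NULL if pattern compiles, otherwise a pointer to a static
 * buffer holding the error message (not thread-safe; sketch only). */
const char *compile_error_message(const char *pattern)
{
    static char msg[128];
    regex_t re;
    size_t needed;
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* regerror() returns the buffer size needed for the full message. */
    needed = regerror(rc, &re, msg, sizeof msg);
    if (needed > sizeof msg)
        memcpy(msg + sizeof msg - 4, "...", 4);   /* mark truncation */

    return msg;
}
```

A caller that needs the complete message could instead
allocate a buffer of the size regerror() returned and call it
a second time.
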
STORAGE
Compiling a regular expression causes memory to be allocated
and associated with the preg structure. The function
regfree() frees all such memory, after which preg may no longer
be used as a compiled expression.

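Because the memory belongs to the compiled form, a common
arrangement is to compile once, match many times, and free
once. The sketch below assumes a hypothetical helper named
count_matching() and the USE_PCREPOSIX macro; without the macro
the code uses the system <regex.h> as a stand-in.

```c
#ifdef USE_PCREPOSIX
#include <pcreposix.h>
#else
#include <regex.h>   /* stand-in: the system's POSIX regex library */
#endif
#include <stddef.h>

/* Count how many of the n subjects match pattern. The pattern is
 * compiled once, reused for every subject, and freed exactly once.
 * Returns -1 if the pattern does not compile. */
int count_matching(const char *pattern,
                   const char *const subjects[], size_t n)
{
    regex_t re;
    size_t i;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;   /* nothing to free: compilation failed */

    for (i = 0; i < n; i++) {
        if (regexec(&re, subjects[i], 0, NULL, 0) == 0)
            count++;
    }

    regfree(&re);    /* re must not be passed to regexec() after this */
    return count;
}
```
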
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
University Computing Service,
New Museums Site,
Cambridge CB2 3QG, England.
Phone: +44 1223 334714

Copyright (c) 1997-2000 University of Cambridge.