150 lines
5.9 KiB
Groff
150 lines
5.9 KiB
Groff
.TH PCRE 3
|
|
.SH NAME
|
|
pcreposix - POSIX API for Perl-compatible regular expressions.
|
|
.SH SYNOPSIS
|
|
.B #include <pcreposix.h>
|
|
.PP
|
|
.SM
|
|
.br
|
|
.B int regcomp(regex_t *\fIpreg\fR, const char *\fIpattern\fR,
|
|
.ti +5n
|
|
.B int \fIcflags\fR);
|
|
.PP
|
|
.br
|
|
.B int regexec(regex_t *\fIpreg\fR, const char *\fIstring\fR,
|
|
.ti +5n
|
|
.B size_t \fInmatch\fR, regmatch_t \fIpmatch\fR[], int \fIeflags\fR);
|
|
.PP
|
|
.br
|
|
.B size_t regerror(int \fIerrcode\fR, const regex_t *\fIpreg\fR,
|
|
.ti +5n
|
|
.B char *\fIerrbuf\fR, size_t \fIerrbuf_size\fR);
|
|
.PP
|
|
.br
|
|
.B void regfree(regex_t *\fIpreg\fR);
|
|
|
|
|
|
.SH DESCRIPTION
|
|
This set of functions provides a POSIX-style API to the PCRE regular expression
|
|
package. See the \fBpcre\fR documentation for a description of the native API,
|
|
which contains additional functionality.
|
|
|
|
The functions described here are just wrapper functions that ultimately call
|
|
the native API. Their prototypes are defined in the \fBpcreposix.h\fR header
|
|
file, and on Unix systems the library itself is called \fBpcreposix.a\fR, so
|
|
can be accessed by adding \fB-lpcreposix\fR to the command for linking an
|
|
application which uses them. Because the POSIX functions call the native ones,
|
|
it is also necessary to add \fR-lpcre\fR.
|
|
|
|
I have implemented only those option bits that can be reasonably mapped to PCRE
|
|
native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
|
|
with the value zero. They have no effect, but since programs that are written
|
|
to the POSIX interface often use them, this makes it easier to slot in PCRE as
|
|
a replacement library. Other POSIX options are not even defined.
|
|
|
|
When PCRE is called via these functions, it is only the API that is POSIX-like
|
|
in style. The syntax and semantics of the regular expressions themselves are
|
|
still those of Perl, subject to the setting of various PCRE options, as
|
|
described below.
|
|
|
|
The header for these functions is supplied as \fBpcreposix.h\fR to avoid any
|
|
potential clash with other POSIX libraries. It can, of course, be renamed or
|
|
aliased as \fBregex.h\fR, which is the "correct" name. It provides two
|
|
structure types, \fIregex_t\fR for compiled internal forms, and
|
|
\fIregmatch_t\fR for returning captured substrings. It also defines some
|
|
constants whose names start with "REG_"; these are used for setting options and
|
|
identifying error codes.
|
|
|
|
|
|
.SH COMPILING A PATTERN
|
|
|
|
The function \fBregcomp()\fR is called to compile a pattern into an
|
|
internal form. The pattern is a C string terminated by a binary zero, and
|
|
is passed in the argument \fIpattern\fR. The \fIpreg\fR argument is a pointer
|
|
to a regex_t structure which is used as a base for storing information about
|
|
the compiled expression.
|
|
|
|
The argument \fIcflags\fR is either zero, or contains one or more of the bits
|
|
defined by the following macros:
|
|
|
|
REG_ICASE
|
|
|
|
The PCRE_CASELESS option is set when the expression is passed for compilation
|
|
to the native function.
|
|
|
|
REG_NEWLINE
|
|
|
|
The PCRE_MULTILINE option is set when the expression is passed for compilation
|
|
to the native function.
|
|
|
|
In the absence of these flags, no options are passed to the native function.
|
|
This means the the regex is compiled with PCRE default semantics. In
|
|
particular, the way it handles newline characters in the subject string is the
|
|
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
|
|
\fIsome\fR of the effects specified for REG_NEWLINE. It does not affect the way
|
|
newlines are matched by . (they aren't) or a negative class such as [^a] (they
|
|
are).
|
|
|
|
The yield of \fBregcomp()\fR is zero on success, and non-zero otherwise. The
|
|
\fIpreg\fR structure is filled in on success, and one member of the structure
|
|
is publicized: \fIre_nsub\fR contains the number of capturing subpatterns in
|
|
the regular expression. Various error codes are defined in the header file.
|
|
|
|
|
|
.SH MATCHING A PATTERN
|
|
The function \fBregexec()\fR is called to match a pre-compiled pattern
|
|
\fIpreg\fR against a given \fIstring\fR, which is terminated by a zero byte,
|
|
subject to the options in \fIeflags\fR. These can be:
|
|
|
|
REG_NOTBOL
|
|
|
|
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
|
function.
|
|
|
|
REG_NOTEOL
|
|
|
|
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
|
function.
|
|
|
|
The portion of the string that was matched, and also any captured substrings,
|
|
are returned via the \fIpmatch\fR argument, which points to an array of
|
|
\fInmatch\fR structures of type \fIregmatch_t\fR, containing the members
|
|
\fIrm_so\fR and \fIrm_eo\fR. These contain the offset to the first character of
|
|
each substring and the offset to the first character after the end of each
|
|
substring, respectively. The 0th element of the vector relates to the entire
|
|
portion of \fIstring\fR that was matched; subsequent elements relate to the
|
|
capturing subpatterns of the regular expression. Unused entries in the array
|
|
have both structure members set to -1.
|
|
|
|
A successful match yields a zero return; various error codes are defined in the
|
|
header file, of which REG_NOMATCH is the "expected" failure code.
|
|
|
|
|
|
.SH ERROR MESSAGES
|
|
The \fBregerror()\fR function maps a non-zero errorcode from either
|
|
\fBregcomp\fR or \fBregexec\fR to a printable message. If \fIpreg\fR is not
|
|
NULL, the error should have arisen from the use of that structure. A message
|
|
terminated by a binary zero is placed in \fIerrbuf\fR. The length of the
|
|
message, including the zero, is limited to \fIerrbuf_size\fR. The yield of the
|
|
function is the size of buffer needed to hold the whole message.
|
|
|
|
|
|
.SH STORAGE
|
|
Compiling a regular expression causes memory to be allocated and associated
|
|
with the \fIpreg\fR structure. The function \fBregfree()\fR frees all such
|
|
memory, after which \fIpreg\fR may no longer be used as a compiled expression.
|
|
|
|
|
|
.SH AUTHOR
|
|
Philip Hazel <ph10@cam.ac.uk>
|
|
.br
|
|
University Computing Service,
|
|
.br
|
|
New Museums Site,
|
|
.br
|
|
Cambridge CB2 3QG, England.
|
|
.br
|
|
Phone: +44 1223 334714
|
|
|
|
Copyright (c) 1997-2000 University of Cambridge.
|