README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available from

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz

Please read the NEWS file if you are upgrading from a previous release.

PCRE has its own native API, but a set of "wrapper" functions that are based on
the POSIX API is also supplied in the library libpcreposix. Note that this
just provides a POSIX calling interface to PCRE: the regular expressions
themselves still follow Perl syntax and semantics. The header file
for the POSIX-style functions is called pcreposix.h. The official POSIX name is
regex.h, but I didn't want to risk possible problems with existing files of
that name by distributing it that way. To use it with an existing program that
uses the POSIX API, you will have to rename it or point to it with a link.
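
As a rough illustration of what a "POSIX calling interface" means in practice,
here is a minimal C sketch. It is not part of the PCRE distribution: the helper
first_digits() and the USE_PCREPOSIX macro are invented for this example. Built
without the macro, it uses the system's own regex.h; built with -DUSE_PCREPOSIX
and linked with libpcreposix and libpcre, the same calls should go through PCRE
instead. The pattern is chosen so that it means the same thing in Perl syntax
and in POSIX basic syntax.

```c
/* Sketch only: first_digits() and USE_PCREPOSIX are hypothetical names. */
#include <stddef.h>
#include <sys/types.h>

#ifdef USE_PCREPOSIX
#include <pcreposix.h>   /* PCRE's POSIX-style wrapper; link with
                            -lpcreposix -lpcre */
#else
#include <regex.h>       /* the system's own POSIX regex functions */
#endif

/* Find the first run of decimal digits in subject. On success, store the
   start and end offsets of the match and return 0; otherwise return -1. */
int first_digits(const char *subject, int *start, int *end)
{
    regex_t re;
    regmatch_t match[1];
    int rc;

    /* No compile flags are needed for this pattern in either library. */
    if (regcomp(&re, "[0-9][0-9]*", 0) != 0)
        return -1;

    rc = regexec(&re, subject, 1, match, 0);
    if (rc == 0) {
        *start = (int)match[0].rm_so;
        *end = (int)match[0].rm_eo;
    }
    regfree(&re);
    return rc == 0 ? 0 : -1;
}
```

Calling first_digits("abc 2001 x", &s, &e) should return 0 with s set to 4 and
e set to 8. Only the #include line differs between the two builds, which is the
point of shipping the wrapper under its own header name.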


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

where there is also a README file giving brief descriptions of what they are.
Several of them provide support for compiling PCRE on various flavours of
Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.


Building PCRE on a Unix system
------------------------------

To build PCRE on a Unix system, first run the "configure" command from the PCRE
distribution directory, with your current directory set to the directory where
you want the files to be created. This command is a standard GNU "autoconf"
configuration script, for which generic instructions are supplied in INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient, but the
usual methods of changing standard defaults are available. For example,

  CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

  cd /build/pcre/pcre-xxx
  /source/pcre/pcre-xxx/configure

If you want to make use of the experimental, incomplete support for UTF-8
character strings in PCRE, you must add --enable-utf8 to the "configure"
command. Without it, the code for handling UTF-8 is not included in the
library. (Even when included, it still has to be enabled by an option at run
time.)

The "configure" script builds five files:

. libtool is a script that builds shared and/or static libraries.
. Makefile is built by copying Makefile.in and making substitutions.
. config.h is built by copying config.in and making substitutions.
. pcre-config is built by copying pcre-config.in and making substitutions.
. RunTest is a script for running tests.

Once "configure" has run, you can run "make". It builds two libraries called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. You can use "make install" to copy these, the public header files
pcre.h and pcreposix.h, and the man pages to appropriate live directories on
your system, in the normal way.

Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
example,

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.

There is one esoteric feature that is controlled by "configure". It concerns
the character value used for "newline", and is something that you probably do
not want to change on a Unix system. The default is to use whatever value your
compiler gives to '\n'. By using --enable-newline-is-cr or
--enable-newline-is-lf you can force the value to be CR (13) or LF (10) if you
really want to.
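
If you are unsure what value your compiler gives to '\n', a few lines of C will
tell you. The sketch below is only an illustration and is not part of PCRE; the
function names newline_kind() and default_newline_kind() are invented here.

```c
/* Sketch only: these helpers are hypothetical and not part of PCRE. */

/* Name a character code the way the configure options do:
   "CR" for 13, "LF" for 10, "other" for anything else. */
const char *newline_kind(int c)
{
    if (c == 13) return "CR";
    if (c == 10) return "LF";
    return "other";
}

/* Report which of those the compiler uses for '\n', that is, the value
   PCRE takes as its default when neither configure option is given. */
const char *default_newline_kind(void)
{
    return newline_kind('\n');
}
```

On an ASCII-based Unix compiler, default_newline_kind() can be expected to
return "LF", which is why the options are rarely needed there.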


Shared libraries on Unix systems
--------------------------------

The default distribution builds PCRE as two shared libraries and two static
libraries, as long as the operating system supports shared libraries. Shared
library support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.

To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example

  ./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Building on non-Unix systems
----------------------------

For a non-Unix system, read the comments in the file NON-UNIX-USE. PCRE has
been compiled on Windows systems and on Macintoshes, but I don't know the
details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.


Testing PCRE
------------

To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. (This can also be run by "make runtest", "make check", or
"make test".) For other systems, see the instructions in NON-UNIX-USE.

The script runs the pcretest test program (which is documented in the doc
directory) on each of the testinput files (in the testdata directory) in turn,
and compares the output with the contents of the corresponding testoutput file.
A file called testtry is used to hold the output from pcretest. To run pcretest
on just one of the test files, give its number as an argument to RunTest, for
example:

  RunTest 3

The first and third test files can also be fed directly into the perltest
script to check that Perl gives the same results. The third file requires the
additional features of release 5.005, which is why it is kept separate from the
main test input, which needs only Perl 5.004. In the long run, when 5.005 (or
higher) is widespread, these two test files may get amalgamated.

The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flag to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see "Character tables" below). In some
cases, this may cause failures in the second set of tests. For example, in a
locale where the isprint() function yields TRUE for characters in the range
128-255, the use of [:ascii:] inside a character class defines a different set
of characters, and this shows up in this test as a difference in the compiled
code, which is being listed for checking. Where the expected output (the
testoutput file) contains [\x00-\x7f], the actual output will contain
[\x00-\xff], and similarly in some other cases. This is not a bug in PCRE.

The fourth set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr" (French) locale. Before running
the test, the script checks for the presence of this locale by running the
"locale" command. If that command fails, or if it doesn't include "fr" in the
list of available locales, the fourth test cannot be run, and a comment is
output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.
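
An independent way to find out whether the C library will accept a locale name
is to ask setlocale() directly. The following is a minimal sketch, not code
from the PCRE distribution; locale_usable() is a name invented for this
example, and whether pcretest performs its check in the same way is an
assumption.

```c
/* Sketch only: locale_usable() is a hypothetical helper, not part of PCRE. */
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if setlocale() accepts the named locale, 0 otherwise. The
   locale that was in force beforehand is restored before returning. */
int locale_usable(const char *name)
{
    const char *current = setlocale(LC_ALL, NULL);  /* query, do not change */
    char *saved;
    int ok;

    if (current == NULL)
        return 0;

    /* Copy the name: a later setlocale() call may overwrite the string. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    ok = setlocale(LC_ALL, name) != NULL;
    setlocale(LC_ALL, saved);
    free(saved);
    return ok;
}
```

If locale_usable("fr") returns 0 on your system, the fourth test can be
expected to report the error shown above.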

The fifth test checks the experimental, incomplete UTF-8 support. It is not run
automatically unless PCRE is built with UTF-8 support. This file can be fed
directly to the perltest8 script, which requires Perl 5.6 or higher. The sixth
file tests internal UTF-8 features of PCRE that are not relevant to Perl.


Character tables
----------------

PCRE uses four tables for manipulating and identifying characters. The final
argument of the pcre_compile() function is a pointer to a block of memory
containing the concatenated tables. A call to pcre_maketables() can be used to
generate a set of tables in the current locale. If the final argument for
pcre_compile() is passed as NULL, a set of default tables that is built into
the binary is used.

The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.
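
To make the layout more concrete, here is a small C sketch that builds tables
of the kind just described from the standard character handling functions. It
is an illustration only and is not the dftables.c source: all the names are
invented, the order of bits within each byte of a bit map is an assumption, and
"word" character is taken to mean alphanumeric or '_'.

```c
/* Sketch only: hypothetical names, not the real PCRE tables. */
#include <ctype.h>
#include <string.h>

static unsigned char lower_table[256];  /* lower-casing table */
static unsigned char flip_table[256];   /* case-flipping table */
static unsigned char digit_map[32];     /* 256 bits, one per character */
static unsigned char word_map[32];
static unsigned char space_map[32];

/* Set the bit for character c in a 32-byte bit map. The bit order within
   each byte (lowest bit first) is an assumption of this sketch. */
static void set_bit(unsigned char *map, int c)
{
    map[c / 8] |= (unsigned char)(1 << (c % 8));
}

/* Return 1 if the bit for character c is set in the map, else 0. */
int in_map(const unsigned char *map, int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Fill all five tables using the current locale's ctype functions. */
void build_tables(void)
{
    int i;

    memset(digit_map, 0, sizeof digit_map);
    memset(word_map, 0, sizeof word_map);
    memset(space_map, 0, sizeof space_map);

    for (i = 0; i < 256; i++) {
        lower_table[i] = (unsigned char)tolower(i);
        flip_table[i] = (unsigned char)(islower(i) ? toupper(i) : tolower(i));
        if (isdigit(i)) set_bit(digit_map, i);
        if (isalnum(i) || i == '_') set_bit(word_map, i);
        if (isspace(i)) set_bit(space_map, i);
    }
}
```

After build_tables() has run in the C locale, lower_table['A'] is 'a',
flip_table['a'] is 'A', and in_map(word_map, '_') is 1. A character class can
then be assembled by combining such 32-byte maps bit by bit.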

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that have the 128 bit set, as that
will cause PCRE to malfunction.
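
The bit values above can be combined for a single character. As a hedged
illustration (again, not the dftables.c source), the following C sketch fills a
256-byte table using those values. The enum names and the function name are
invented, and the list of metacharacters is an assumption made for the example.

```c
/* Sketch only: hypothetical names; the metacharacter list is assumed. */
#include <ctype.h>
#include <string.h>

enum {
    CT_SPACE  = 1,    /* white space character */
    CT_LETTER = 2,    /* letter */
    CT_DIGIT  = 4,    /* decimal digit */
    CT_XDIGIT = 8,    /* hexadecimal digit */
    CT_WORD   = 16,   /* alphanumeric or '_' */
    CT_META   = 128   /* regular expression metacharacter or binary zero */
};

static unsigned char char_types[256];

/* Fill char_types[] using the current locale's ctype functions. */
void build_char_types(void)
{
    static const char meta[] = "*+?{^.$|()[";  /* assumed for illustration */
    int i;

    for (i = 0; i < 256; i++) {
        unsigned char bits = 0;

        if (isspace(i))  bits |= CT_SPACE;
        if (isalpha(i))  bits |= CT_LETTER;
        if (isdigit(i))  bits |= CT_DIGIT;
        if (isxdigit(i)) bits |= CT_XDIGIT;
        if (isalnum(i) || i == '_') bits |= CT_WORD;
        if (i == 0 || strchr(meta, i) != NULL) bits |= CT_META;

        char_types[i] = bits;
    }
}
```

In the C locale this gives 'a' the value 26 (letter, hexadecimal digit and
"word" character), '7' the value 28, and '*' the value 128.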


Manifest
--------

The distribution should contain the following files:

(A) The actual source files of the PCRE library functions and their
    headers:

  dftables.c            auxiliary program for building chartables.c
  get.c                 )
  maketables.c          )
  study.c               ) source of
  pcre.c                )   the functions
  pcreposix.c           )
  pcre.in               "source" for the header for the external API; pcre.h
                          is built from this by "configure"
  pcreposix.h           header for the external POSIX wrapper API
  internal.h            header for internal use
  config.in             template for config.h, which is built by configure

(B) Auxiliary files:

  AUTHORS               information about the author of PCRE
  ChangeLog             log of changes to the code
  INSTALL               generic installation instructions
  LICENCE               conditions for the use of PCRE
  COPYING               the same, using GNU's standard name
  Makefile.in           template for Unix Makefile, which is built by configure
  NEWS                  important changes in this release
  NON-UNIX-USE          notes on building PCRE on non-Unix systems
  README                this file
  RunTest.in            template for a Unix shell script for running tests
  config.guess          ) files used by libtool,
  config.sub            )   used only when building a shared library
  configure             a configuring shell script (built by autoconf)
  configure.in          the autoconf input used to build configure
  doc/Tech.Notes        notes on the encoding
  doc/pcre.3            man page source for the PCRE functions
  doc/pcre.html         HTML version
  doc/pcre.txt          plain text version
  doc/pcreposix.3       man page source for the POSIX wrapper API
  doc/pcreposix.html    HTML version
  doc/pcreposix.txt     plain text version
  doc/pcretest.txt      documentation of test program
  doc/perltest.txt      documentation of Perl test program
  doc/pcregrep.1        man page source for the pcregrep utility
  doc/pcregrep.html     HTML version
  doc/pcregrep.txt      plain text version
  install-sh            a shell script for installing files
  ltmain.sh             file used to build a libtool script
  pcretest.c            comprehensive test program
  pcredemo.c            simple demonstration of coding calls to PCRE
  perltest              Perl test program
  perltest8             Perl test program for UTF-8 tests
  pcregrep.c            source of a grep utility that uses PCRE
  pcre-config.in        source of script which retains PCRE information
  testdata/testinput1   test data, compatible with Perl 5.004 and 5.005
  testdata/testinput2   test data for error messages and non-Perl things
  testdata/testinput3   test data, compatible with Perl 5.005
  testdata/testinput4   test data for locale-specific tests
  testdata/testinput5   test data for UTF-8 tests compatible with Perl 5.6
  testdata/testinput6   test data for other UTF-8 tests
  testdata/testoutput1  test results corresponding to testinput1
  testdata/testoutput2  test results corresponding to testinput2
  testdata/testoutput3  test results corresponding to testinput3
  testdata/testoutput4  test results corresponding to testinput4
  testdata/testoutput5  test results corresponding to testinput5
  testdata/testoutput6  test results corresponding to testinput6

(C) Auxiliary files for Win32 DLL

  dll.mk
  pcre.def

Philip Hazel <ph10@cam.ac.uk>
August 2001