source: trunk/third/ispell/ispell.4 @ 10334

Revision 10334, 29.0 KB checked in by ghudson, 27 years ago (diff)
This commit was generated by cvs2svn to compensate for changes in r10333, which included commits to RCS files with non-trunk default branches.
.\"
.\" $Id: ispell.4,v 1.1.1.1 1997-09-03 21:08:11 ghudson Exp $
.\"
.\" Copyright 1992, 1993, Geoff Kuenning, Granada Hills, CA
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All modifications to the source code must be clearly marked as
.\"    such.  Binary redistributions based on modified source code
.\"    must be clearly marked as modified versions in the documentation
.\"    and/or other materials provided with the distribution.
.\" 4. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgment:
.\"      This product includes software developed by Geoff Kuenning and
.\"      other unpaid contributors.
.\" 5. The name of Geoff Kuenning may not be used to endorse or promote
.\"    products derived from this software without specific prior
.\"    written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $Log: not supported by cvs2svn $
.\" Revision 1.30  1995/08/05  23:19:39  geoff
.\" Fix a place where a line was eaten because it was seen as an nroff
.\" command.
.\"
.\" Revision 1.29  1995/01/08  23:23:45  geoff
.\" Fix a tiny typo.
.\"
.\" Revision 1.28  1994/11/02  06:56:07  geoff
.\" Remove the anyword feature, which I've decided is a bad idea.
.\"
.\" Revision 1.27  1994/10/26  05:12:31  geoff
.\" Document the new compound-word options for German and Scandinavian
.\" languages, and the always-OK flag for French.
.\"
.\" Revision 1.26  1994/05/25  04:29:19  geoff
.\" Document the new restriction that boundary characters must appear
.\" singly.
.\"
.\" Revision 1.25  1994/01/25  07:11:42  geoff
.\" Get rid of all old RCS log lines in preparation for the 3.1 release.
.\"
.\"
.TH ISPELL 4 local
.SH NAME
ispell \- format of ispell dictionaries and affix files
.SH DESCRIPTION
.PP
.IR Ispell (1)
requires two files to define the language that it is spell-checking.
The first file is a dictionary containing words for the language,
and the second is an "affix" file that defines the meaning of special
flags in the dictionary.
The two files are combined by
.I buildhash
(see
.IR ispell "(1))"
and written to a hash file which is not described here.
.PP
A raw
.I ispell
dictionary (either the main dictionary or your own personal
dictionary) contains a list of words, one per line.
Each word may optionally be followed by a slash ("/") and one or more
flags, which modify the root word as explained below.
Depending on the options with which
.I ispell
was built, case may or
may not be significant in either the root word or the flags, independently.
Specifically, if the compile-time option CAPITALIZATION is defined, case
is significant in the root word;
if not, case is ignored in the root word.
If the compile-time option MASKBITS is set to a value of 32, case is ignored
in the flags;
otherwise case is significant in the flags.
Contact your system administrator or
.I ispell
maintainer for more information (or use the
.B \-vv
flag to find out).
The dictionary should be sorted with the
.B \-f
flag of
.IR sort (1)
before the hash file is built;
this is done automatically by
.IR munchlist (1),
which is the normal way of building dictionaries.
.PP
If the dictionary contains words that have string characters (see the
affix-file documentation below), they must be written in the format
given by the
.B defstringtype
statement in the affix file.
This will be the case for most non-English languages.
Be careful to use this format, rather than that of your favorite
formatter, when adding words to a dictionary.  (If you add words to
your personal dictionary during an
.I ispell
session, they will automatically be converted to the correct format.
This feature can be used to convert an entire dictionary if necessary,
as follows.)
.PP
.RS
.nf
        echo qqqqq > dummy.dict
        buildhash dummy.dict \fIaffix-file\fP dummy.hash
        awk '{print "*"}END{print "#"}' \fIold-dict-file\fP \e
        | ispell -a -T \fIold-dict-string-type\fP \e
          -d ./dummy.hash -p ./\fInew-dict-file\fP \e
          > /dev/null
        rm dummy.*
.fi
.RE
.PP
The case of the root word controls the case of words accepted by
.IR ispell ,
as follows:
.IP (1)
If the root word appears only in lower case (e.g.,
.IR bob "),"
it will be accepted in lower case, capitalized, or all capitals.
.IP (2)
If the root word appears capitalized (e.g.,
.IR Robert "),"
it will not
be accepted in
all-lower case, but will be accepted capitalized or all in capitals.
.IP (3)
If the root word appears all in capitals (e.g.,
.IR UNIX "),"
it will only be accepted all in capitals.
.IP (4)
If the root word appears with a "funny" capitalization (e.g.,
.IR ITCorp "),"
a word will be accepted only if it follows that capitalization, or if
it appears all in capitals.
.IP (5)
More than one capitalization of a root word may appear in the dictionary.
Flags from different capitalizations are combined by OR-ing them together.
.PP
Redundant capitalizations (e.g.,
.I bob
and
.IR Bob ")"
will be combined
by
.I buildhash
and by
.I ispell
(for personal dictionaries),
and can be removed from a raw dictionary by
.IR munchlist .
.PP
For example, the dictionary:
.PP
.RS
.nf
bob
Robert
UNIX
ITcorp
ITCorp
.fi
.RE
.PP
will accept
.IR bob ,
.IR Bob ,
.IR BOB ,
.IR Robert ,
.IR ROBERT ,
.IR UNIX ,
.IR ITcorp ,
.IR ITCorp ,
and
.IR ITCORP ,
and will reject all others.
Some of the unacceptable forms are
.IR bOb ,
.IR robert ,
.IR Unix ,
and
.IR ItCorp .
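.PP
The capitalization rules above can be sketched in a few lines of Python.
This is only an illustration of the rules as stated here, not the code
ispell uses; the names case_class and accepts are hypothetical.
.PP
.RS
.nf
```python
def case_class(word):
    """Classify a root as 'lower', 'upper', 'capitalized', or 'funny'."""
    if word.islower():
        return "lower"
    if word.isupper():
        return "upper"
    if word[0].isupper() and word[1:].islower():
        return "capitalized"
    return "funny"


def accepts(roots, word):
    """Return True if some root in the list licenses this spelling."""
    for root in roots:
        if root.lower() != word.lower():
            continue
        if word.isupper():
            return True  # all capitals is accepted for every class of root
        kind = case_class(root)
        if kind == "lower" and word in (root, root.capitalize()):
            return True
        if kind in ("capitalized", "funny") and word == root:
            return True
    return False


roots = ["bob", "Robert", "UNIX", "ITcorp", "ITCorp"]
good = ["bob", "Bob", "BOB", "Robert", "ROBERT", "UNIX",
        "ITcorp", "ITCorp", "ITCORP"]
bad = ["bOb", "robert", "Unix", "ItCorp"]
accepted = [w for w in good if accepts(roots, w)]
rejected = [w for w in bad if not accepts(roots, w)]
```
.fi
.RE
.PP
With the example dictionary, every word in the accepted list above passes
and every word in the rejected list fails.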
.PP
As mentioned above, root words in any dictionary may be extended by flags.
Each flag is a single alphabetic character, which represents a prefix or
suffix
that may be added to the root to form a new word.
For example, in an English dictionary
the
.B D
flag can be added to
.I bathe
to make
.IR bathed .
Since flags are represented as a single bit in the hashed dictionary, this
results in significant space savings.
The
.I munchlist
script will reduce an existing raw dictionary by adding flags when possible.
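.PP
To make the "one bit per flag" idea concrete, here is a minimal Python
sketch that splits a raw dictionary line at the flag marker and packs
the flag letters into an integer.
The bit layout (bit 0 for A, bit 1 for B, and so on) and the names
flag_bits and parse_entry are assumptions made for illustration; the
real hash-file layout is not described in this manual.
.PP
.RS
.nf
```python
def flag_bits(flags):
    """Pack uppercase flag letters into an integer, one bit per letter."""
    mask = 0
    for letter in flags:
        mask |= 1 << (ord(letter) - ord("A"))
    return mask


def parse_entry(line, marker="/"):
    """Split 'root/FLAGS' into the root and a bit mask of its flags."""
    root, _, flags = line.strip().partition(marker)
    return root, flag_bits(flags)


root, mask = parse_entry("bathe/D")
combined = flag_bits("D") | flag_bits("G")  # OR-ing two flag sets together
```
.fi
.RE
.PP
OR-ing two masks, as in the last line, mirrors the way flags from
different capitalizations of the same root are combined.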
.PP
When a word is extended with an affix, the affix will be accepted only
if it appears in the same case
as the initial (prefix) or final (suffix) letter of the word.
Thus, for example, the entry
.I UNIX/M
in the main dictionary
.RB "(" M
means
add an apostrophe and an "s" to make a possessive) would accept
.I "UNIX'S"
but would reject
.IR "UNIX's" .
If
.I "UNIX's"
is legal, it must appear as a separate dictionary entry,
and it will not be combined by
.IR munchlist .
(In general, you don't need to worry about these things;
.I munchlist
guarantees that its output dictionary will accept the same set of
words as its input, so all you have to do is add words to the dictionary
and occasionally run munchlist to reduce its size.)
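.PP
The affix-case rule for suffixes can be pictured with a short Python
sketch.
The function name add_suffix is hypothetical, and the sketch covers only
the rule stated above: the suffix takes the case of the final letter of
the root.
.PP
.RS
.nf
```python
def add_suffix(root, suffix):
    """Attach a suffix in the same case as the final letter of the root."""
    if root[-1].isupper():
        return root + suffix.upper()
    return root + suffix.lower()


possessive_upper = add_suffix("UNIX", "'s")
possessive_lower = add_suffix("Robert", "'s")
```
.fi
.RE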
.PP
As mentioned, the affix definition file describes the affixes associated
with particular flags.
It also describes the character set used by the language.
.PP
Although the affix-definition
grammar is designed for a line-oriented layout, it is actually
a free-format yacc grammar and can be laid out weirdly if you want.
Comments are started by a pound (sharp) sign (#),
and continue to the end of the line.
Backslashes are supported in the usual fashion (\fB\e\fInnn\fR, plus
specials
.BR \en ,
.BR \er ,
.BR \et ,
.BR \ev ,
.BR \ef ,
.BR \eb ,
and the new hex format \fB\ex\fInn\fR).
Any character
with special meaning to the parser can be changed to an uninterpreted
token by backslashing it;
for example, you can declare a flag named 'asterisk' or 'colon' with
.I "flag \e*:"
or
.IR "flag \e::" .
.PP
The grammar will be presented in a top-down fashion, with discussion
of each element.
An affix-definition file must contain exactly one table:
.PP
.RS
.nf
\fItable\fR     :       [\fIheaders\fR] [\fIprefixes\fR] [\fIsuffixes\fR]
.fi
.RE
.PP
At least one of
.I prefixes
and
.I suffixes
is required.
They can appear in either order.
.PP
.RS
.nf
\fIheaders\fR   :       [ \fIoptions\fR ] \fIchar-sets\fR
.fi
.RE
.PP
The headers describe options global to this dictionary and language.
These include the character sets to be used and the formatter, and
the defaults for certain
.I ispell
flags.
.PP
.RS
.nf
\fIoptions\fR : { \fIfmtr-stmt\fR | \fIopt-stmt\fR | \fIflag-stmt\fR | \fInum-stmt\fR }
.fi
.RE
.PP
The options statements define the defaults for certain ispell flags
and for the character sets used by the formatters.
.PP
.RS
.nf
\fIfmtr-stmt\fR :       { \fInroff-stmt\fR | \fItex-stmt\fR }
.fi
.RE
.PP
A
.I fmtr-stmt
describes characters that have special meaning to a formatter.
Normally, this statement is not necessary, but some languages may have
preempted the usual defaults for use as language-specific characters.
In this case, these statements may be used to redefine the special characters
expected by the formatter.
.PP
.RS
.nf
\fInroff-stmt\fR        :       { \fBnroffchars\fR | \fBtroffchars\fR } \fIstring\fR
.fi
.RE
.PP
The
.B nroffchars
statement allows redefinition of certain
.I nroff
control characters.
The string given must be exactly five characters long, and must list
substitutions for the left and right parentheses ("()"), the period ("."),
the backslash ("\e"), and the asterisk ("*").
(The right parenthesis is not currently used, but is included for
completeness.)
For example, the statement:
.PP
.RS
.nf
\fBnroffchars\fR {}.\e\e*
.fi
.RE
.PP
would replace the left and right parentheses with left and right curly
braces for purposes of parsing
.IR nroff / troff
strings, with no effect on the others (admittedly a contrived example).
Note that the backslash is escaped with a backslash.
.PP
.RS
.nf
\fItex-stmt\fR  :       { \fBTeXchars\fR | \fBtexchars\fR } \fIstring\fR
.fi
.RE
.PP
The
.B TeXchars
statement allows redefinition of certain TeX/LaTeX control characters.
The string given must be exactly thirteen characters long, and must list
substitutions for the left and right parentheses ("()"), the left
and right square brackets ("[]"), the left and right curly braces ("{}"),
the left and right angle brackets ("<>"),
the backslash ("\e"), the dollar sign ("$"), the asterisk ("*"),
the period or dot ("."), and the percent sign ("%").
For example, the statement:
.PP
.RS
.nf
\fBtexchars\fR ()\e[\|]<\e><\e>\e\e$*.%
.fi
.RE
.PP
would replace the functions of the left and right curly braces with the
left and right angle brackets for purposes of parsing TeX/LaTeX constructs,
while retaining their functions for the
.I tib
bibliographic preprocessor.
Note that the backslash, the left square bracket, and the right angle bracket
must be escaped with a backslash.
.PP
.RS
.nf
\fIopt-stmt\fR  :       { \fIcmpnd-stmt\fR | \fIaff-stmt\fR }
.sp
\fIcmpnd-stmt\fR        :       \fBcompoundwords\fR \fIcompound-opt\fR
.sp
\fIaff-stmt\fR          :       \fBallaffixes\fR \fIon-or-off\fR
.sp
\fIon-or-off\fR :       { \fBon\fR | \fBoff\fR }
.sp
\fIcompound-opt\fR :    { \fIon-or-off\fR | \fBcontrolled\fR \fIcharacter\fR }
.fi
.RE
.PP
An
.I opt-stmt
controls certain ispell defaults that are best made language-specific.
The
.B allaffixes
statement controls the default for the
.B \-P
and
.B \-m
options to
.IR ispell .
If
.B allaffixes
is turned
.B off
(the default),
.I ispell
will default to the behavior of the
.B \-P
flag:
root/affix suggestions will only be made if there are no "near misses".
If
.B allaffixes
is turned
.BR on ,
.I ispell
will default to the behavior of the
.B \-m
flag:
root/affix suggestions will always be made.
The
.B compoundwords
statement controls the default for the
.B \-B
and
.B \-C
options to
.IR ispell .
If
.B compoundwords
is turned
.B off
(the default),
.I ispell
will default to the behavior of the
.B \-B
flag:
run-together words will be reported as errors.
If
.B compoundwords
is turned
.BR on ,
.I ispell
will default to the behavior of the
.B \-C
flag:
run-together words will be considered as compounds if both are in
the dictionary.
This is useful for languages such as German and Norwegian, which
form large numbers of compound words.
Finally, if
.B compoundwords
is set to
.BR controlled ,
only words marked with the flag indicated by
.I character
(which should not be otherwise used)
will be allowed to participate in compound formation.
Because this option requires the flags to be specified in the dictionary,
it is not available from the command line.
.PP
.RS
.nf
\fIflag-stmt\fR :       \fBflagmarker\fR \fIcharacter\fR
.fi
.RE
.PP
The
.B flagmarker
statement describes the character which is used to separate affix
flags from the root word in a raw dictionary file.
This must be a
character which is not found in any word (including in string characters;
see below).
The default is "/" because this character is not normally
used to represent special characters in any language.
.PP
.RS
.nf
\fInum-stmt\fR  :       \fBcompoundmin\fR \fIdigit\fR
.fi
.RE
.PP
The
.B compoundmin
statement controls the minimum length of the two components of a compound
word.
This only has an effect if
.B compoundwords
is turned
.B on
or if the
.B \-C
flag is given to
.IR ispell .
In that case, only words at least as long as the given minimum will be
accepted as components of a compound.
The default is 3 characters.
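.PP
As a rough illustration of how
.B compoundwords
and
.B compoundmin
interact, the following Python sketch tests whether a run-together word
splits into two dictionary words that each meet the minimum length.
The function name is_compound and the tiny word list are hypothetical,
and the sketch ignores affixes and the
.B controlled
option.
.PP
.RS
.nf
```python
def is_compound(word, dictionary, compound_min=3):
    """True if word splits into two dictionary words of sufficient length."""
    for i in range(compound_min, len(word) - compound_min + 1):
        if word[:i] in dictionary and word[i:] in dictionary:
            return True
    return False


words = {"book", "shelf", "a", "back"}
long_parts = is_compound("bookshelf", words)
short_part = is_compound("aback", words)              # "a" is too short
relaxed = is_compound("aback", words, compound_min=1)
```
.fi
.RE
.PP
With the default minimum of 3, "bookshelf" qualifies but "aback" does
not, because its first component is a single letter.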
.PP
.RS
.nf
\fIchar-sets\fR :       \fInorm-sets\fR [ \fIalt-sets\fR ]
.fi
.RE
.PP
The character-set section describes the characters that can be part of
a word, and defines their collating order.
There must always be a definition of "normal" character sets;  in
addition, there may be one or more partial definitions of "alternate"
sets which are used with various text formatters.
.PP
.RS
.nf
\fInorm-sets\fR :       [ \fIdeftype\fR ] \fIcharset-group\fR
.fi
.RE
.PP
A "normal" character set may optionally begin with a
definition of the file suffixes that make use of this set.
Following this are one or more character-set declarations.
.PP
.RS
.nf
\fIdeftype\fR : \fBdefstringtype\fR \fIname\fR \fIdeformatter\fR \fIsuffix\fR*
.fi
.RE
.PP
The
.B defstringtype
declaration gives a list of file suffixes which should make use of the
default string characters defined as part of the base character set;
it is only necessary if string characters are being defined.
The
.I name
parameter
is a string giving the unique name associated with these suffixes;
often it is a formatter name.
If the formatter is a member of the troff family, "nroff" should be
used for the name associated with the most popular macro package;
members of the TeX family should use "tex".
Other names may be chosen freely, but they should be kept simple,
as they are used in
.IR ispell 's
.B \-T
switch to specify a formatter type.
The
.I deformatter
parameter
specifies the deformatting style to use when processing files with the
given suffixes.
Currently, this must be either
.B tex
or
.BR nroff .
The
.I suffix
parameters are a whitespace-separated list of strings which, if
present at the end of a filename, indicate that the associated set of
string characters should be used by default for this file.  For
example, the suffix list for the troff family typically includes
suffixes such as ".ms", ".me", ".mm", etc.
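.PP
The suffix matching described above amounts to a lookup on the end of
the filename.
The Python sketch below shows the idea; the function name
string_type_for, the table layout, and the ".tex" suffix are assumptions
made for the example rather than anything taken from a real affix file.
.PP
.RS
.nf
```python
def string_type_for(filename, types, default=None):
    """Return the name of the first string type whose suffix list matches."""
    for name, suffixes in types.items():
        if any(filename.endswith(suffix) for suffix in suffixes):
            return name
    return default


types = {
    "nroff": [".ms", ".me", ".mm"],
    "tex": [".tex"],  # hypothetical suffix list for the TeX family
}
report_type = string_type_for("report.ms", types)
paper_type = string_type_for("paper.tex", types)
plain_type = string_type_for("notes.txt", types)
```
.fi
.RE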
.PP
.RS
.nf
\fIcharset-group\fR :   { \fIchar-stmt\fR | \fIstring-stmt\fR | \fIdup-stmt\fR}*
.fi
.RE
.PP
A
.I char-stmt
describes single characters;
a
.I string-stmt
describes characters that must appear together as a string, and which
usually represent a single character in the target language.
Either may
also describe conversion between upper and lower case.
A
.I dup-stmt
is used to describe alternate forms of string characters, so that a
single dictionary may be used with several formatting
programs that use different conventions for representing non-ASCII
characters.
.PP
.RS
.nf
\fIchar-stmt\fR :       \fBwordchars\fR \fIcharacter-range\fR
                |       \fBwordchars\fR \fIlowercase-range\fR \fIuppercase-range\fR
                |       \fBboundarychars\fR \fIcharacter-range\fR
                |       \fBboundarychars\fR \fIlowercase-range\fR \fIuppercase-range\fR
\fIstring-stmt\fR       :       \fBstringchar\fR \fIstring\fR
                |       \fBstringchar\fR \fIlowercase-string\fR \fIuppercase-string\fR
.fi
.RE
.PP
Characters described with the
.B boundarychars
statement are considered
part of a word only if they appear singly,
embedded between characters declared with the
.B wordchars
or
.B stringchar
statements.
For example, if the hyphen is a boundary character (useful in French),
the string "foo-bar" would be a single word, but "-foo" would be the
same as "foo", and "foo--bar" would be two words separated by non-word
characters.
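.PP
The boundary-character rule can be approximated with a regular
expression: a word is a run of word characters, optionally continued by
a single boundary character followed by more word characters.
The Python sketch below is an illustration only; split_words is a
hypothetical helper, and it treats plain ASCII letters as the
.B wordchars
set.
.PP
.RS
.nf
```python
import re
import string


def split_words(text, wordchars=string.ascii_letters, boundarychars="-"):
    """Split text into words, keeping single embedded boundary characters."""
    word = "[" + re.escape(wordchars) + "]+"
    if boundarychars:
        boundary = "[" + re.escape(boundarychars) + "]"
        pattern = word + "(?:" + boundary + word + ")*"
    else:
        pattern = word
    return re.findall(pattern, text)


single = split_words("foo-bar")
leading = split_words("-foo")
doubled = split_words("foo--bar")
```
.fi
.RE
.PP
The three calls reproduce the three cases in the paragraph above: one
word, the bare word "foo", and two separate words.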
.PP
If two ranges or strings are given in a
.I char-stmt
or
.IR string-stmt ,
the first describes
characters that are interpreted as lowercase and the second describes
uppercase.
In the case of a
.B stringchar
statement, the two strings must be of the same length.
Also, in a
.B stringchar
statement, the actual strings may contain
both uppercase and lowercase characters themselves without difficulty;
for instance, the statement
.PP
.RS
.nf
stringchar      "\e\e*(sS"      "\e\e*(Ss"
.fi
.RE
.PP
is legal and will not interfere with (or be interfered with by) other
declarations of "s" and "S" as lower and upper case, respectively.
.PP
A final note on string characters:
some languages collate certain special characters as if they were strings.
For example, the German "a-umlaut"
is traditionally sorted as if it were "ae".
Ispell is not capable of this;
each character must be treated as an individual entity.
So in certain cases,
ispell will sort a list of words into a different order than the standard
"dictionary" order for the target language.
.PP
.RS
.nf
\fIalt-sets\fR  :       \fIalttype\fR [ \fIalt-stmt\fR* ]
.fi
.RE
.PP
Because different formatters use different notations to represent
non-ASCII characters,
.I ispell
must be aware of the representations used by these formatters.
These are declared as alternate sets of string characters.
.PP
.RS
.nf
\fIalttype\fR   :       \fBaltstringtype\fR \fIname\fR \fIsuffix\fR*
.fi
.RE
.PP
The
.B altstringtype
statement introduces each set by declaring the associated formatter
name and filename suffix list.
This name and list are interpreted exactly as in the
.B defstringtype
statement above.
Following this header are one or more \fIalt-stmt\fRs which declare
the alternate string characters used by this formatter.
.PP
.RS
.nf
\fIalt-stmt\fR          :       \fBaltstringchar\fR \fIalt-string\fR \fIstd-string\fR
.fi
.RE
.PP
The
.B altstringchar
statement describes alternate representations for string
characters.
For example, the \-mm macro package of
.I troff
represents the German "a-umlaut" as
.IR a\e*: ,
while
.I TeX
uses the sequence \fI\e"a\fR.
If the
.I troff
versions are declared as the standard versions using
.BR stringchar ,
the
.I TeX
versions may be declared as alternates by using the statement
.PP
.RS
.nf
altstringchar   \e\e\e"a        a\e\e*\:
.fi
.RE
.PP
When the
.B altstringchar
statement is used to specify alternate forms,
all forms for a particular formatter must be declared together as a group.
Also, each formatter or macro package
must provide a complete set of characters, both
upper- and lower-case, and the character sequences used for each formatter
must be completely distinct.
Character sequences which describe upper- and lower-case versions of
the same printable character must also be the same length.
It may be necessary to define some new macros for a given formatter to
satisfy these restrictions.
(The current version of
.I buildhash
does not enforce these restrictions, but failure to obey them may
result in errors being introduced into files that are processed with
.IR ispell .)
.PP
An important minor point is that
.I ispell
assumes that all characters declared as
.B wordchars
or
.B boundarychars
will occupy exactly
one position on the terminal screen.
.PP
A single character-set statement can declare either a single character
or a contiguous range of characters.
A range is given as in egrep and the shell:
[a-z] means lowercase alphabetics;
[^a-z] means all but lowercase, etc.
All character-set statements are combined (unioned) to produce
the final list of characters that may be part of a word.
The collating order of the characters is defined by the order of their
declaration;
if a range is used, the characters are considered to have been declared
in ASCII order.
Characters that have case are collated next to each other, with the
uppercase character first.
.PP
The
character-declaration statements have a rather strange behavior caused by their
need to match each lowercase character with its uppercase equivalent.
In any given
.B wordchars
or
.B boundarychars
statement, the characters in each range are
first sorted into ASCII collating sequence, then matched one-for-one
with the other range.
(The two ranges must have the same number of characters.)
Thus, for example, the two statements:
.PP
.RS
.nf
\fBwordchars\fP [aeiou] [AEIOU]
\fBwordchars\fP [aeiou] [UOIEA]
.fi
.RE
.PP
would produce exactly the same effect.
To get the vowels to match
up "wrong", you would have to use separate statements:
.PP
.RS
.nf
\fBwordchars\fP a U
\fBwordchars\fP e O
\fBwordchars\fP i I
\fBwordchars\fP o E
\fBwordchars\fP u A
.fi
.RE
.PP
which would cause uppercase 'e' to be 'O', and lowercase 'O' to be 'e'.
This should normally be a problem only with languages which have been
forced to use a strange ASCII collating sequence.
If your uppercase and lowercase letters both collate in the same order,
you shouldn't have to worry about this "feature".
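.PP
The sort-then-match behavior is easy to reproduce.
The Python sketch below, using a hypothetical helper named pair_cases,
sorts both ranges and pairs them off, which shows why the order of
characters inside a range makes no difference.
.PP
.RS
.nf
```python
def pair_cases(lower, upper):
    """Sort both ranges, then match them one-for-one (lowercase to uppercase)."""
    if len(lower) != len(upper):
        raise ValueError("the two ranges must have the same number of characters")
    return dict(zip(sorted(lower), sorted(upper)))


straight = pair_cases("aeiou", "AEIOU")
scrambled = pair_cases("aeiou", "UOIEA")

# Separate single-character statements are the only way to pair "wrong".
wrong = {}
for low, up in [("a", "U"), ("e", "O"), ("i", "I"), ("o", "E"), ("u", "A")]:
    wrong.update(pair_cases(low, up))
```
.fi
.RE
.PP
Here straight and scrambled are identical mappings, while wrong maps
"e" to "O" as in the five-statement example.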
.PP
The prefixes and suffixes sections have exactly the same syntax, except
for the introductory keyword.
.PP
.RS
.nf
\fIprefixes\fR  :       \fBprefixes\fI flagdef\fR*
\fIsuffixes\fR  :       \fBsuffixes\fI flagdef\fR*
\fIflagdef\fR   :       \fBflag\fR [\fB*\fR|\fB~\fR] \fIchar\fB : \fIrepl\fR*
.fi
.RE
.PP
A prefix or suffix table consists of an introductory keyword and a list
of flag definitions.
Flags can be defined more than once, in which case
the definitions are combined.
Each flag controls one or more
.IR repl s
(replacements)
which are conditionally applied to the beginnings or endings of various
words.
.PP
Flags are named by a single character
.IR char .
Depending on a configuration option,
this character can be either any uppercase letter (the default
configuration) or any 7-bit ASCII character.
Most languages should be
able to get along with just 26 flags.
.PP
A flag character may be prefixed with one or more option characters.
(If you wish to use one of the option characters as a flag character,
simply enclose it in double quotes.)
.PP
The asterisk (\fB*\fP) option
means that this
flag participates in
.I cross-product
formation.
This only matters if the
file contains both prefix and suffix tables.
If so, all prefixes and
suffixes marked with an asterisk will be applied in all cross-combinations
to the root word.
For example, consider the root
.I fix
with prefixes
.I pre
and
.IR in ,
and suffixes
.I es
and
.IR ed .
If all flags controlling these prefixes and suffixes are marked with an
asterisk, then the single root
.I fix
would also generate
.IR prefix ,
.IR prefixes ,
.IR prefixed ,
.IR infix ,
.IR infixes ,
.IR infixed ,
.IR fix ,
.IR fixes ,
and
.IR fixed .
Cross-product formation can produce a large number of words quickly, some
of which may be illegal, so watch out.
If cross-products produce illegal
words,
.I munchlist
will not produce those flag combinations, and the flag will not be useful.
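.PP
Cross-product formation is simply every optional prefix combined with
every optional suffix.
The following Python sketch enumerates the combinations for the
.I fix
example; the function name cross_product is hypothetical, and the
affixes are attached as plain strings with no conditions.
.PP
.RS
.nf
```python
from itertools import product


def cross_product(root, prefixes, suffixes):
    """List the root with every combination of optional prefix and suffix."""
    words = []
    for prefix, suffix in product([""] + prefixes, [""] + suffixes):
        words.append(prefix + root + suffix)
    return words


words = cross_product("fix", ["pre", "in"], ["es", "ed"])
```
.fi
.RE
.PP
Two prefixes and two suffixes already yield nine words from one root,
which is why the number of generated words grows so quickly.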
.PP
The \fB~\fR option specifies that the associated flag is only active
when a compound word is being formed.
This is useful in a language like German, where the form of a word
sometimes changes inside a compound.
.PP
.RS
.nf
\fIrepl\fR      :       \fIcondition\fR* \fB>\fR [ \fB- \fIstrip-string \fB,\fR ] \fIappend-string\fR
.fi
.RE
.PP
A
.I repl
is a conditional rule for modifying a root word.
Up to 8
.I conditions
may be specified.
If the
.I conditions
are satisfied, the
rules on the right-hand side of the
.I repl
are applied, as follows:
.IP (1)
If a strip-string is given, it is first stripped from
the beginning or ending (as appropriate) of the root word.
.IP (2)
Then the append-string is added at that point.
.PP
For example, the
.I condition
.B .
means "any word", and the
.I condition
.B Y
means "any word ending in Y".
The following (suffix) replacements:
.PP
.RS
.nf
\&.     >       MENT
Y       >       -Y,IES
.fi
.RE
.PP
would change
.I induce
to
.I inducement
and
.I fly
to
.IR flies .
(If they were controlled by the same flag, they would also change
.I fly
to
.IR flyment ,
which might not be what was wanted.
.I Munchlist
can be used to protect against this sort of problem;
see the command sequence given below.)
.PP
No matter how much you might wish it, the strings on the right must be
strings of specific characters, not ranges.
The reasons are rooted deeply in the way
.I ispell
works, and it would be difficult or impossible to provide
for more flexibility.
For example, you might wish to write:
.PP
.RS
.nf
[EY]    >       -[EY],IES
.fi
.RE
.PP
This will not work.
Instead, you must use two separate rules:
.PP
.RS
.nf
E       >       -E,IES
Y       >       -Y,IES
.fi
.RE
.PP
The application of
.IR repl s
can be restricted to certain words with
.IR conditions :
.PP
.RS
.nf
\fIcondition\fR :       { \fB.\fR | \fIcharacter\fR | \fIrange\fR }
.fi
.RE
.PP
A
.I condition
is a restriction on the characters that adjoin, and/or are
replaced by, the right-hand side of the
.IR repl .
Up to 8
.I conditions
may be given, which should be enough context for anyone.
The right-hand side will be applied only if the
.I conditions
in the
.I repl
are satisfied.
The
.I conditions
also implicitly define a length;
roots shorter than the number of
.I conditions
will not pass the test.
(As a special case, a
.I condition
of a single dot "." defines a length of zero,
so that the rule applies to all words indiscriminately.)
This length is independent of the separate test that insists that
all flags produce an output word length of at least four.
.PP
.I
Conditions
that are single characters should be separated by white space.
For example, to specify words ending in "ED", write:
.PP
.RS
.nf
E D     >       -ED,ING         # As in covered > covering
.fi
.RE
.PP
If you write:
.PP
.RS
.nf
ED      >       -ED,ING
.fi
.RE
.PP
the effect will be the same as:
.PP
.RS
.nf
[ED]    >       -ED,ING
.fi
.RE
.PP
As a final minor but important point, it is sometimes useful to rebuild
a dictionary file using an incompatible suffix file.
For example,
suppose you expanded the "R" flag to generate "er" and "ers" (thus
making the Z flag somewhat obsolete).
To build a new dictionary
.I newdict
that, using
.IR newaffixes ,
will accept exactly the same list of
words as the old list
.I olddict
did using
.IR oldaffixes ,
the
.B \-c
switch of
.I munchlist
is useful, as in the following example:
.PP
.RS
.nf
$ munchlist -c oldaffixes -l newaffixes olddict > newdict
.fi
.RE
.PP
If you use this procedure, your new dictionary will always accept the
same list the original did, even if you badly screwed up the affix
file.
This is because
.I munchlist
compares the words generated by a flag with the original word list, and
refuses to use any flags that generate illegal words.
(But don't forget that the
.I munchlist
step takes a long time and eats up temporary file space.)
.SH EXAMPLES
.PP
As an example of conditional suffixes, here is the specification of the
.B S
flag from the English affix file:
.PP
.RS
.nf
flag *S:
    [^AEIOU]Y   >       -Y,IES  # As in imply > implies
    [AEIOU]Y    >       S               # As in convey > conveys
    [SXZH]      >       ES              # As in fix > fixes
    [^SXZHY]    >       S               # As in bat > bats
.fi
.RE
.PP
The first line applies to words ending in Y, but not in vowel-Y.
The second takes care of the vowel-Y words.
The third then handles those words that end in a sibilant
or near-sibilant, and the last picks up everything else.
.PP
Note that the
.I conditions
are written very carefully so that they apply
to disjoint sets of words.
In particular, note that the fourth line
excludes words ending in Y as well as the obvious SXZH.
Otherwise, it would convert "imply" into "implys".
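.PP
To see that the four
.B S
rules really are disjoint, they can be run through a small Python model
of suffix replacement.
This is a sketch under stated assumptions, not ispell's implementation:
the names matches, apply_suffix, expand, and S_FLAG are hypothetical,
each rule's conditions are written out as an explicit list (one entry
per character position) to sidestep the affix-file lexer, and
hyphenated ranges such as [A-Z] are not expanded.
.PP
.RS
.nf
```python
def matches(condition, ch):
    """Test one character against '.', a literal, or a [..] / [^..] set."""
    if condition == ".":
        return True
    if condition.startswith("["):
        members = condition[1:-1]
        if members.startswith("^"):
            return ch not in members[1:]
        return ch in members
    return ch == condition


def apply_suffix(word, conditions, strip, append):
    """Return the derived word, or None if the rule does not apply."""
    if conditions == ["."]:
        conditions = []  # a lone dot defines a length of zero
    if len(word) < len(conditions):
        return None
    tail = word[len(word) - len(conditions):]
    if not all(matches(c, ch) for c, ch in zip(conditions, tail)):
        return None
    if strip:
        if not word.endswith(strip):
            return None
        word = word[:len(word) - len(strip)]
    return word + append


# (conditions, strip-string, append-string) for each line of "flag *S:"
S_FLAG = [
    (["[^AEIOU]", "Y"], "Y", "IES"),
    (["[AEIOU]", "Y"], "", "S"),
    (["[SXZH]"], "", "ES"),
    (["[^SXZHY]"], "", "S"),
]


def expand(word, rules):
    """Collect every word that the rules derive from the given root."""
    results = []
    for conditions, strip, append in rules:
        derived = apply_suffix(word, conditions, strip, append)
        if derived is not None:
            results.append(derived)
    return results


plurals = {root: expand(root, S_FLAG)
           for root in ["IMPLY", "CONVEY", "FIX", "BAT"]}
```
.fi
.RE
.PP
Each root yields exactly one result, which confirms that no two rules
apply to the same word.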
.PP
Although the English affix file does not do so, you can also have a flag
generate more than one variation on a root word.
For example, we could extend the English "R" flag as follows:
.PP
.RS
.nf
flag *R:
   E                    >       R               # As in skate > skater
   E                    >       RS              # As in skate > skaters
   [^AEIOU]Y    >       -Y,IER  # As in multiply > multiplier
   [^AEIOU]Y    >       -Y,IERS # As in multiply > multipliers
   [AEIOU]Y     >       ER              # As in convey > conveyer
   [AEIOU]Y     >       ERS             # As in convey > conveyers
   [^EY]                >       ER              # As in build > builder
   [^EY]                >       ERS             # As in build > builders
.fi
.RE
.PP
This flag would generate both "skater" and "skaters" from "skate".
This capability can be very useful in languages that make use of noun, verb,
and adjective endings.
For instance, one could define a single flag
that generated all of the German "weak" verb endings.
.SH "SEE ALSO"
ispell(1)