.\"
.\" $Id: ispell.4,v 1.1.1.1 1997-09-03 21:08:11 ghudson Exp $
.\"
.\" Copyright 1992, 1993, Geoff Kuenning, Granada Hills, CA
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. All modifications to the source code must be clearly marked as
.\" such. Binary redistributions based on modified source code
.\" must be clearly marked as modified versions in the documentation
.\" and/or other materials provided with the distribution.
.\" 4. All advertising materials mentioning features or use of this software
.\" must display the following acknowledgment:
.\" This product includes software developed by Geoff Kuenning and
.\" other unpaid contributors.
.\" 5. The name of Geoff Kuenning may not be used to endorse or promote
.\" products derived from this software without specific prior
.\" written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $Log: not supported by cvs2svn $
.\" Revision 1.30 1995/08/05 23:19:39 geoff
.\" Fix a place where a line was eaten because it was seen as an nroff
.\" command.
.\"
.\" Revision 1.29 1995/01/08 23:23:45 geoff
.\" Fix a tiny typo.
.\"
.\" Revision 1.28 1994/11/02 06:56:07 geoff
.\" Remove the anyword feature, which I've decided is a bad idea.
.\"
.\" Revision 1.27 1994/10/26 05:12:31 geoff
.\" Document the new compound-word options for German and Scandinavian
.\" languages, and the always-OK flag for French.
.\"
.\" Revision 1.26 1994/05/25 04:29:19 geoff
.\" Document the new restriction that boundary characters must appear
.\" singly.
.\"
.\" Revision 1.25 1994/01/25 07:11:42 geoff
.\" Get rid of all old RCS log lines in preparation for the 3.1 release.
.\"
.\"
.TH ISPELL 4 local
.SH NAME
ispell \- format of ispell dictionaries and affix files
.SH DESCRIPTION
.PP
.IR Ispell (1)
requires two files to define the language that it is spell-checking.
The first file is a dictionary containing words for the language,
and the second is an "affix" file that defines the meaning of special
flags in the dictionary.
The two files are combined by
.I buildhash
(see
.IR ispell "(1))"
and written to a hash file which is not described here.
.PP
A raw
.I ispell
dictionary (either the main dictionary or your own personal
dictionary) contains a list of words, one per line.
Each word may optionally be followed by a slash ("/") and one or more
flags, which modify the root word as explained below.
Depending on the options with which
.I ispell
was built, case may or
may not be significant in either the root word or the flags, independently.
Specifically, if the compile-time option CAPITALIZATION is defined, case
is significant in the root word;
if not, case is ignored in the root word.
If the compile-time option MASKBITS is set to a value of 32, case is ignored
in the flags;
otherwise case is significant in the flags.
Contact your system administrator or
.I ispell
maintainer for more information (or use the
.B \-vv
flag to find out).
The dictionary should be sorted with the
.B \-f
flag of
.IR sort (1)
before the hash file is built;
this is done automatically by
.IR munchlist (1),
which is the normal way of building dictionaries.
.PP
If the dictionary contains words that have string characters (see the
affix-file documentation below), they must be written in the format
given by the
.B defstringtype
statement in the affix file.
This will be the case for most non-English languages.
Be careful to use this format, rather than that of your favorite
formatter, when adding words to a dictionary.
If you add words to your personal dictionary during an
.I ispell
session, they will automatically be converted to the correct format.
This feature can be used to convert an entire dictionary if necessary:
.PP
.RS
.nf
echo qqqqq > dummy.dict
buildhash dummy.dict \fIaffix-file\fP dummy.hash
awk '{print "*" $0} END {print "#"}' \fIold-dict-file\fP \e
  | ispell \-a \-T \fIold-dict-string-type\fP \e
    \-d ./dummy.hash \-p ./\fInew-dict-file\fP \e
    > /dev/null
rm dummy.*
.fi
.RE
.PP
Each word is sent to
.I ispell
as a
.B *
command, which adds it to the personal dictionary, and the final
.B #
command saves that dictionary.
.PP
The case of the root word controls the case of words accepted by
.IR ispell ,
as follows:
.IP (1)
If the root word appears only in lower case (e.g.,
.IR bob "),"
it will be accepted in lower case, capitalized, or all capitals.
.IP (2)
If the root word appears capitalized (e.g.,
.IR Robert "),"
it will not
be accepted in
all-lower case, but will be accepted capitalized or all in capitals.
.IP (3)
If the root word appears all in capitals (e.g.,
.IR UNIX "),"
it will only be accepted all in capitals.
.IP (4)
If the root word appears with a "funny" capitalization (e.g.,
.IR ITCorp "),"
a word will be accepted only if it follows that capitalization, or if
it appears all in capitals.
.IP (5)
More than one capitalization of a root word may appear in the dictionary.
Flags from different capitalizations are combined by OR-ing them together.
.PP
Redundant capitalizations (e.g.,
.I bob
and
.IR Bob ")"
will be combined
by
.I buildhash
and by
.I ispell
(for personal dictionaries),
and can be removed from a raw dictionary by
.IR munchlist .
.PP
For example, the dictionary:
.PP
.RS
.nf
bob
Robert
UNIX
ITcorp
ITCorp
.fi
.RE
.PP
will accept
.IR bob ,
.IR Bob ,
.IR BOB ,
.IR Robert ,
.IR ROBERT ,
.IR UNIX ,
.IR ITcorp ,
.IR ITCorp ,
and
.IR ITCORP ,
and will reject all others.
Some of the unacceptable forms are
.IR bOb ,
.IR robert ,
.IR Unix ,
and
.IR ItCorp .
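.PP
The case rules above can be modelled with a short Python sketch.
It is a simplified illustration rather than
.IR ispell 's
actual code: it assumes ASCII letters and a build with CAPITALIZATION
defined, ignores affix flags, and uses hypothetical function names.
.PP
.RS
.nf
def case_pattern(word):
    """Classify a root word by rules (1) to (4) above."""
    if word.islower():
        return "lower"          # e.g. bob
    if word.isupper():
        return "upper"          # e.g. UNIX
    if word[0].isupper() and word[1:].islower():
        return "capitalized"    # e.g. Robert
    return "funny"              # e.g. ITCorp


def root_accepts(root, word):
    """True if one dictionary entry accepts this spelling of the word."""
    if word.lower() != root.lower():
        return False
    if word == word.upper():
        return True             # all capitals are accepted for every root
    if case_pattern(root) == "lower":
        return word in (root, root.capitalize())
    return word == root         # otherwise only the exact stored form


def dictionary_accepts(roots, word):
    """Rule (5): all stored capitalizations of a root are combined."""
    return any(root_accepts(root, word) for root in roots)


roots = ["bob", "Robert", "UNIX", "ITcorp", "ITCorp"]
print(dictionary_accepts(roots, "ITCORP"))   # True
print(dictionary_accepts(roots, "Unix"))     # False
.fi
.RE
.PP
Rule (5) corresponds to
.IR dictionary_accepts :
a word is accepted if any stored capitalization of its root accepts it.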
.PP
As mentioned above, root words in any dictionary may be extended by flags.
Each flag is a single character (normally a letter) that represents a prefix or
suffix
that may be added to the root to form a new word.
For example, in an English dictionary
the
.B D
flag can be added to
.I bathe
to make
.IR bathed .
Since each flag is represented by a single bit in the hashed dictionary, this
results in significant space savings.
The
.I munchlist
script will reduce an existing raw dictionary by adding flags when possible.
.PP
When a word is extended with an affix, the affix will be accepted only
if it appears in the same case
as the initial (prefix) or final (suffix) letter of the word.
Thus, for example, the entry
.I UNIX/M
in the main dictionary
.RB "(" M
means
add an apostrophe and an "s" to make a possessive) would accept
.I "UNIX'S"
but would reject
.IR "UNIX's" .
If
.I "UNIX's"
is legal, it must appear as a separate dictionary entry,
and it will not be combined by
.IR munchlist .
(In general, you don't need to worry about these things;
.I munchlist
guarantees that its output dictionary will accept the same set of
words as its input, so all you have to do is add words to the dictionary
and occasionally run
.I munchlist
to reduce its size.)
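.PP
The following Python sketch shows how a flagged entry such as
.I bathe/D
or
.I UNIX/M
stands for several words, and how the case of a suffix follows the
final letter of the root.
The two-entry suffix table is a hypothetical stand-in for a real affix
file (its rules for
.B D
and
.B M
are assumptions chosen to reproduce the examples above), and the
function names are likewise illustrative.
.PP
.RS
.nf
# Hypothetical, simplified suffix table: flag -> function returning
# the suffix to add.  Real affix files express such rules with the
# flag definitions described later in this page.
SUFFIXES = {
    "D": lambda root: "d" if root.lower().endswith("e") else "ed",
    "M": lambda root: "'s",
}


def match_case(root, suffix):
    """Give the suffix the case of the root's final letter."""
    return suffix.upper() if root[-1].isupper() else suffix.lower()


def expand(entry, flagmarker="/"):
    """Expand a raw dictionary entry such as bathe/D into its words."""
    root, _, flags = entry.partition(flagmarker)
    words = [root]
    for flag in flags:
        rule = SUFFIXES.get(flag)
        if rule is not None:        # flags unknown to this sketch are skipped
            words.append(root + match_case(root, rule(root)))
    return words


print(expand("bathe/D"))   # ['bathe', 'bathed']
print(expand("UNIX/M"))    # ['UNIX', "UNIX'S"]
.fi
.RE
.PP
Because the suffix is uppercased after the final X of
.IR UNIX ,
the sketch can only produce
.IR UNIX'S ,
which is why
.I UNIX's
needs an entry of its own.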
.PP
As mentioned, the affix definition file describes the affixes associated
with particular flags.
It also describes the character set used by the language.
.PP
Although the affix-definition
grammar is designed for a line-oriented layout, it is actually
a free-format yacc grammar and can be laid out weirdly if you want.
Comments are started by a pound (sharp) sign (#),
and continue to the end of the line.
Backslashes are supported in the usual fashion (\fB\e\fInnn\fR, plus
specials
.BR \en ,
.BR \er ,
.BR \et ,
.BR \ev ,
.BR \ef ,
.BR \eb ,
and the new hex format \fB\ex\fInn\fR).
Any character
with special meaning to the parser can be changed to an uninterpreted
token by backslashing it;
for example, you can declare a flag named 'asterisk' or 'colon' with
.I "flag \e*:"
or
.IR "flag \e::" .
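.PP
As an illustration of these escape rules, here is a Python sketch of how
a single token might be decoded.
It assumes C-style conventions (up to three octal digits after a
backslash, up to two hex digits after
.BR \ex )
and treats any other backslashed character as itself; it is not the
actual lexer, and the names are hypothetical.
The backslash is built with
.I chr(92)
so that the listing itself contains no escapes.
.PP
.RS
.nf
BACKSLASH = chr(92)  # built with chr() so the listing itself has no escapes
SPECIALS = {"n": chr(10), "r": chr(13), "t": chr(9),
            "v": chr(11), "f": chr(12), "b": chr(8)}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def unescape(token):
    """Decode the backslash sequences in one affix-file token."""
    out = []
    i = 0
    while i < len(token):
        c = token[i]
        if c != BACKSLASH or i + 1 == len(token):
            out.append(c)              # ordinary (or trailing) character
            i += 1
            continue
        nxt = token[i + 1]
        if nxt in OCTAL:               # up to three octal digits
            j = i + 1
            while j < len(token) and j < i + 4 and token[j] in OCTAL:
                j += 1
            out.append(chr(int(token[i + 1:j], 8)))
            i = j
        elif nxt == "x" and i + 2 < len(token) and token[i + 2] in HEX:
            j = i + 2                  # up to two hex digits
            while j < len(token) and j < i + 4 and token[j] in HEX:
                j += 1
            out.append(chr(int(token[i + 2:j], 16)))
            i = j
        else:                          # special letter, or any char as itself
            out.append(SPECIALS.get(nxt, nxt))
            i += 2
    return "".join(out)


print(unescape(BACKSLASH + "*"))           # *
print(ord(unescape(BACKSLASH + "x41")))    # 65
.fi
.RE
.PP
In this sketch the token written as
.B \e*
in
.I "flag \e*:"
decodes to a plain asterisk, so it no longer has any special meaning.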
.PP
The grammar will be presented in a top-down fashion, with discussion
of each element.
An affix-definition file must contain exactly one table:
.PP
.RS
.nf
\fItable\fR : [\fIheaders\fR] [\fIprefixes\fR] [\fIsuffixes\fR]
.fi
.RE
.PP
At least one of
.I prefixes
and
.I suffixes
is required.
They can appear in either order.
.PP
.RS
.nf
\fIheaders\fR : [ \fIoptions\fR ] \fIchar-sets\fR
.fi
.RE
.PP
The headers describe options global to this dictionary and language.
These include the character sets to be used and the formatter, and
the defaults for certain
.I ispell
flags.
.PP
.RS
.nf
\fIoptions\fR : { \fIfmtr-stmt\fR | \fIopt-stmt\fR | \fIflag-stmt\fR | \fInum-stmt\fR }
.fi
.RE
.PP
The options statements define the defaults for certain
.I ispell
flags and for the character sets used by the formatters.
.PP
.RS
.nf
\fIfmtr-stmt\fR : { \fInroff-stmt\fR | \fItex-stmt\fR }
.fi
.RE
.PP
A
.I fmtr-stmt
describes characters that have special meaning to a formatter.
Normally, this statement is not necessary, but some languages may have
preempted the usual defaults for use as language-specific characters.
In this case, these statements may be used to redefine the special characters
expected by the formatter.
.PP
.RS
.nf
\fInroff-stmt\fR : { \fBnroffchars\fR | \fBtroffchars\fR } \fIstring\fR
.fi
.RE
.PP
The
.B nroffchars
statement allows redefinition of certain
.I nroff
control characters.
The string given must be exactly five characters long, and must list
substitutions for the left and right parentheses ("()"), the period ("."),
the backslash ("\e"), and the asterisk ("*").
(The right parenthesis is not currently used, but is included for
completeness.)
For example, the statement:
.PP
.RS
.nf
\fBnroffchars\fR {}.\e\e*
.fi
.RE
.PP
would replace the left and right parentheses with left and right curly
braces for purposes of parsing
.IR nroff / troff
strings, with no effect on the others (admittedly a contrived example).
Note that the backslash is escaped with a backslash.
.PP
.RS
.nf
\fItex-stmt\fR : { \fBTeXchars\fR | \fBtexchars\fR } \fIstring\fR
.fi
.RE
.PP
The
.B TeXchars
statement allows redefinition of certain TeX/LaTeX control characters.
The string given must be exactly thirteen characters long, and must list
substitutions for the left and right parentheses ("()"), the left
and right square brackets ("[]"), the left and right curly braces ("{}"),
the left and right angle brackets ("<>"),
the backslash ("\e"), the dollar sign ("$"), the asterisk ("*"),
the period or dot ("."), and the percent sign ("%").
For example, the statement:
.PP
.RS
.nf
\fBtexchars\fR ()\e[\|]<\e><\e>\e\e$*.%
.fi
.RE
.PP
would replace the functions of the left and right curly braces with the
left and right angle brackets for purposes of parsing TeX/LaTeX constructs,
while the angle brackets also retain their usual functions for the
.I tib
bibliographic preprocessor.
Note that the backslash, the left square bracket, and the right angle bracket
must be escaped with a backslash.
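.PP
Both strings are purely positional: the
.IR n th
character replaces the
.IR n th
default in the order listed above.
The Python sketch below makes that mapping explicit for the two example
statements; the role names are hypothetical labels, and the strings are
given after their backslash escapes have been removed.
.PP
.RS
.nf
# Hypothetical role names, in the order the two strings list them.
NROFF_ROLES = ("lparen", "rparen", "period", "backslash", "asterisk")
TEX_ROLES = ("lparen", "rparen", "lbracket", "rbracket", "lbrace",
             "rbrace", "langle", "rangle", "backslash", "dollar",
             "asterisk", "period", "percent")


def formatter_chars(roles, chars):
    """Map each role to its replacement character (escapes already removed)."""
    if len(chars) != len(roles):
        raise ValueError("expected %d characters, got %d"
                         % (len(roles), len(chars)))
    return dict(zip(roles, chars))


BS = chr(92)  # one backslash, built with chr() so the listing has no escapes
nroff = formatter_chars(NROFF_ROLES, "{}." + BS + "*")
tex = formatter_chars(TEX_ROLES, "()[]<><>" + BS + "$*.%")
print(nroff["lparen"], nroff["rparen"])   # { }
print(tex["lbrace"], tex["rbrace"])       # < >
.fi
.RE
.PP
Rejecting strings of the wrong length mirrors the requirement that the
strings be exactly five and thirteen characters long.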
.PP
.RS
.nf
\fIopt-stmt\fR : { \fIcmpnd-stmt\fR | \fIaff-stmt\fR }
.sp
\fIcmpnd-stmt\fR : \fBcompoundwords\fR \fIcompound-opt\fR
.sp
\fIaff-stmt\fR : \fBallaffixes\fR \fIon-or-off\fR
.sp
\fIon-or-off\fR : { \fBon\fR | \fBoff\fR }
.sp
\fIcompound-opt\fR : { \fIon-or-off\fR | \fBcontrolled\fR \fIcharacter\fR }
.fi
.RE
.PP
An
.I opt-stmt
controls certain
.I ispell
defaults that are best made language-specific.
The
.B allaffixes
statement controls the default for the
.B \-P
and
.B \-m
options to
.IR ispell .
If
.B allaffixes
is turned
.B off
(the default),
.I ispell
will default to the behavior of the
.B \-P
flag:
root/affix suggestions will only be made if there are no "near misses".
If
.B allaffixes
is turned
.BR on ,
.I ispell
will default to the behavior of the
.B \-m
flag:
root/affix suggestions will always be made.
.PP
The
.B compoundwords
statement controls the default for the
.B \-B
and
.B \-C
options to
.IR ispell .
If
.B compoundwords
is turned
.B off
(the default),
.I ispell
will default to the behavior of the
.B \-B
flag:
run-together words will be reported as errors.
If
.B compoundwords
is turned
.BR on ,
.I ispell
will default to the behavior of the
.B \-C
flag:
run-together words will be considered as compounds if both parts are in
the dictionary.
This is useful for languages such as German and Norwegian, which
form large numbers of compound words.
Finally, if
.B compoundwords
is set to
.BR controlled ,
only words marked with the flag indicated by
.I character
(which should not be otherwise used)
will be allowed to participate in compound formation.
Because this option requires the flags to be specified in the dictionary,
it is not available from the command line.
.PP
.RS
.nf
\fIflag-stmt\fR : \fBflagmarker\fR \fIcharacter\fR
.fi
.RE
.PP
The
.B flagmarker
statement describes the character which is used to separate affix
flags from the root word in a raw dictionary file.
This must be a
character which is not found in any word (including in string characters;
see below).
The default is "/" because this character is not normally
used to represent special characters in any language.
.PP
.RS
.nf
\fInum-stmt\fR : \fBcompoundmin\fR \fIdigit\fR
.fi
.RE
.PP
The
.B compoundmin
statement controls the minimum length of the two components of a compound
word.
This only has an effect if
.B compoundwords
is turned
.B on
or if the
.B \-C
flag is given to
.IR ispell .
In that case, only words at least as long as the given minimum will be
accepted as components of a compound.
The default is 3 characters.
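.PP
A simplified Python model of compound checking follows.
It assumes a compound has exactly two parts, each of which must be a
root in the dictionary at least
.B compoundmin
characters long; with
.B controlled
compounds, both parts must also carry the control flag.
Affixes and case are ignored, and the function name and the control flag
.B Z
are hypothetical.
.PP
.RS
.nf
def is_compound(word, dictionary, compoundmin=3, control_flag=None):
    """True if word splits into two dictionary roots.

    dictionary maps each root to its set of flags.  If control_flag is
    given (the "controlled" setting), both parts must carry that flag.
    """
    for cut in range(compoundmin, len(word) - compoundmin + 1):
        left, right = word[:cut], word[cut:]
        if left not in dictionary or right not in dictionary:
            continue
        if control_flag is None or (control_flag in dictionary[left]
                                    and control_flag in dictionary[right]):
            return True
    return False


words = {"foot": {"Z"}, "ball": set()}
print(is_compound("football", words))                    # True
print(is_compound("football", words, compoundmin=5))     # False
print(is_compound("football", words, control_flag="Z"))  # False
.fi
.RE
.PP
The last call fails because only
.I foot
carries the control flag, so
.I ball
may not take part in a compound.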
.PP
.RS
.nf
\fIchar-sets\fR : \fInorm-sets\fR [ \fIalt-sets\fR ]
.fi
.RE
.PP
The character-set section describes the characters that can be part of
a word, and defines their collating order.
There must always be a definition of "normal" character sets; in
addition, there may be one or more partial definitions of "alternate"
sets which are used with various text formatters.
.PP
.RS
.nf
\fInorm-sets\fR : [ \fIdeftype\fR ] \fIcharset-group\fR
.fi
.RE
.PP
A "normal" character set may optionally begin with a
definition of the file suffixes that make use of this set.
Following this are one or more character-set declarations.
.PP
.RS
.nf
\fIdeftype\fR : \fBdefstringtype\fR \fIname\fR \fIdeformatter\fR \fIsuffix\fR*
.fi
.RE
.PP
The
.B defstringtype
declaration gives a list of file suffixes which should make use of the
default string characters defined as part of the base character set;
it is only necessary if string characters are being defined.
The
.I name
parameter
is a string giving the unique name associated with these suffixes;
often it is a formatter name.
If the formatter is a member of the troff family, "nroff" should be
used for the name associated with the most popular macro package;
members of the TeX family should use "tex".
Other names may be chosen freely, but they should be kept simple,
as they are used in
.IR ispell 's
.B \-T
switch to specify a formatter type.
The
.I deformatter
parameter
specifies the deformatting style to use when processing files with the
given suffixes.
Currently, this must be either
.B tex
or
.BR nroff .
The
.I suffix
parameters are a whitespace-separated list of strings which, if
present at the end of a filename, indicate that the associated set of
string characters should be used by default for this file.
For example, the suffix list for the troff family typically includes
suffixes such as ".ms", ".me", ".mm", etc.
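.PP
The suffix-matching rule can be sketched in Python as follows.
The table of string types is hypothetical (only the troff suffixes come
from the text above), and the
.I override
argument is an assumed stand-in for the
.B \-T
switch; when no suffix matches, the sketch returns
.I None
and leaves the choice of a fallback to the caller.
.PP
.RS
.nf
# Hypothetical declarations, as if read from defstringtype and
# altstringtype statements: type name -> filename suffixes.
STRING_TYPES = {
    "nroff": [".ms", ".me", ".mm"],
    "tex": [".tex"],
}


def string_type_for(filename, types, override=None):
    """Choose the string-character type used for a file.

    An explicit override (playing the role of the -T switch) wins;
    otherwise the first type listing a suffix that ends the filename
    is used.  None means that no suffix matched.
    """
    if override is not None:
        if override not in types:
            raise ValueError("unknown string type: " + override)
        return override
    for name, suffixes in types.items():
        if any(filename.endswith(suffix) for suffix in suffixes):
            return name
    return None


print(string_type_for("chapter1.ms", STRING_TYPES))          # nroff
print(string_type_for("chapter1.ms", STRING_TYPES, "tex"))   # tex
.fi
.RE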
.PP
.RS
.nf
\fIcharset-group\fR : { \fIchar-stmt\fR | \fIstring-stmt\fR | \fIdup-stmt\fR }*
.fi
.RE
.PP
A
.I char-stmt
describes single characters;
a
.I string-stmt
describes characters that must appear together as a string, and which
usually represent a single character in the target language.
Either may
also describe conversion between upper and lower case.
A
.I dup-stmt
is used to describe alternate forms of string characters, so that a
single dictionary may be used with several formatting
programs that use different conventions for representing non-ASCII
characters.
.PP
.RS
.nf
\fIchar-stmt\fR : \fBwordchars\fR \fIcharacter-range\fR
        | \fBwordchars\fR \fIlowercase-range\fR \fIuppercase-range\fR
        | \fBboundarychars\fR \fIcharacter-range\fR
        | \fBboundarychars\fR \fIlowercase-range\fR \fIuppercase-range\fR
.sp
\fIstring-stmt\fR : \fBstringchar\fR \fIstring\fR
        | \fBstringchar\fR \fIlowercase-string\fR \fIuppercase-string\fR
.fi
.RE
.PP
Characters described with the
.B boundarychars
statement are considered
part of a word only if they appear singly,
embedded between characters declared with the
.B wordchars
or
.B stringchar
statements.
For example, if the hyphen is a boundary character (useful in French),
the string "foo-bar" would be a single word, but "-foo" would be the
same as "foo", and "foo--bar" would be two words separated by non-word
characters.
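.PP
The boundary-character rule can be expressed as a small tokenizer.
This Python sketch is a simplification rather than
.IR ispell 's
own code: it ignores string characters, takes the declared word and
boundary characters as plain sets of single characters, and its function
name is hypothetical.
.PP
.RS
.nf
import string


def split_words(text, wordchars, boundarychars):
    """Split text into words.

    A boundary character is kept only when it appears singly between
    two word characters.  String characters are ignored in this sketch.
    """
    wordchars = set(wordchars)
    boundarychars = set(boundarychars)
    words = []
    current = ""
    for i, c in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if c in wordchars:
            current += c
        elif c in boundarychars and current and nxt in wordchars:
            current += c        # a single boundary char inside a word
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(split_words("foo-bar -foo foo--bar", string.ascii_letters, "-"))
# ['foo-bar', 'foo', 'foo', 'bar']
.fi
.RE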
.PP
If two ranges or strings are given in a
.I char-stmt
or
.IR string-stmt ,
the first describes
characters that are interpreted as lowercase and the second describes
uppercase.
In the case of a
.B stringchar
statement, the two strings must be of the same length.
Also, in a
.B stringchar
statement, the actual strings may contain
both uppercase and lowercase characters themselves without difficulty;
for instance, the statement
.PP
.RS
.nf
stringchar "\e\e*(sS" "\e\e*(Ss"
.fi
.RE
.PP
is legal and will not interfere with (or be interfered with by) other
declarations of "s" and "S" as lower and upper case, respectively.
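.PP
One way to see why such a declaration causes no conflict is to split
words by matching declared string characters before single characters,
as in the Python sketch below.
This longest-match approach is an assumption made for illustration; the
functions are hypothetical and handle only the lowercase-to-uppercase
direction.
.PP
.RS
.nf
BS = chr(92)  # one backslash, built with chr() so the listing has no escapes
STRING_UPPER = {BS + "*(sS": BS + "*(Ss"}  # lower -> upper, from the example


def tokenize(word, stringchars):
    """Split a word into characters, trying declared string chars first."""
    longest_first = sorted(stringchars, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for s in longest_first:
            if word.startswith(s, i):
                tokens.append(s)
                i += len(s)
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def to_upper(word, string_upper):
    """Uppercase a word, converting string characters as whole units."""
    return "".join(string_upper.get(t, t.upper())
                   for t in tokenize(word, string_upper))


word = "gro" + BS + "*(sS" + "e"
print(to_upper(word, STRING_UPPER) == "GRO" + BS + "*(Ss" + "E")   # True
.fi
.RE
.PP
Uppercasing the same word one character at a time would instead turn
\e*(sS into \e*(SS, which is not the declared uppercase form.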
.PP
A final note on string characters:
some languages collate certain special characters as if they were strings.
For example, the German "a-umlaut"
is traditionally sorted as if it were "ae".
.I Ispell
is not capable of this;
each character must be treated as an individual entity.
So in certain cases,
.I ispell
will sort a list of words into a different order than the standard
"dictionary" order for the target language.
.PP
.RS
.nf
\fIalt-sets\fR : \fIalttype\fR [ \fIalt-stmt\fR* ]
.fi
.RE
.PP
Because different formatters use different notations to represent
non-ASCII characters,
.I ispell
must be aware of the representations used by these formatters.
These are declared as alternate sets of string characters.
.PP
.RS
.nf
\fIalttype\fR : \fBaltstringtype\fR \fIname\fR \fIsuffix\fR*
.fi
.RE
.PP
The
.B altstringtype
statement introduces each set by declaring the associated formatter
name and filename suffix list.
This name and list are interpreted exactly as in the
.B defstringtype
statement above.
Following this header are one or more \fIalt-stmt\fRs which declare
the alternate string characters used by this formatter.
.PP
.RS
.nf
\fIalt-stmt\fR : \fBaltstringchar\fR \fIalt-string\fR \fIstd-string\fR
.fi
.RE
.PP
The
.B altstringchar
statement describes alternate representations for string
characters.
For example, the \-mm macro package of
.I troff
represents the German "a-umlaut" as
.IR a\e*: ,
while
.I TeX
uses the sequence \fI\e"a\fR.
If the
.I troff
versions are declared as the standard versions using
.BR stringchar ,
the
.I TeX
versions may be declared as alternates by using the statement
.PP
.RS
.nf
altstringchar \e\e\e"a a\e\e*\e:
.fi
.RE
.PP
When the
.B altstringchar
statement is used to specify alternate forms,
all forms for a particular formatter must be declared together as a group.
Also, each formatter or macro package
must provide a complete set of characters, both
upper- and lower-case, and the character sequences used for each formatter
must be completely distinct.
Character sequences which describe upper- and lower-case versions of
the same printable character must also be the same length.
It may be necessary to define some new macros for a given formatter to
satisfy these restrictions.
(The current version of
.I buildhash
does not enforce these restrictions, but failure to obey them may
result in errors being introduced into files that are processed with
.IR ispell .)
.PP
An important minor point is that
.I ispell
assumes that all characters declared as
.B wordchars
or
.B boundarychars
will occupy exactly
one position on the terminal screen.
.PP
A single character-set statement can declare either a single character
or a contiguous range of characters.
A range is given as in
.I egrep
and the shell:
[a-z] means lowercase alphabetics;
[^a-z] means all but lowercase, etc.
All character-set statements are combined (unioned) to produce
the final list of characters that may be part of a word.
The collating order of the characters is defined by the order of their
declaration;
if a range is used, the characters are considered to have been declared
in ASCII order.
Characters that have case are collated next to each other, with the
uppercase character first.
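.PP
The collating rule can be modelled by giving each character a rank in
declaration order, with the uppercase member of each pair ranked just
before its lowercase partner.
This Python sketch assumes each lowercase character has already been
paired with its uppercase equivalent; the function names are
hypothetical.
.PP
.RS
.nf
def build_collation(pairs):
    """Assign collating ranks in declaration order.

    pairs lists (lowercase, uppercase) characters in the order they were
    declared; use None as the uppercase of a caseless character.  Within
    a pair the uppercase character sorts first.
    """
    rank = {}
    for lower, upper in pairs:
        if upper is not None:
            rank[upper] = len(rank)
        rank[lower] = len(rank)
    return rank


def sort_words(words, rank):
    """Sort words character by character using the declared ranks."""
    return sorted(words, key=lambda w: [rank[c] for c in w])


rank = build_collation([("a", "A"), ("b", "B")])
print(sort_words(["b", "B", "ab", "a", "A"], rank))
# ['A', 'a', 'ab', 'B', 'b']
.fi
.RE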
.PP
The
character-declaration statements have a rather strange behavior caused by the
need to match each lowercase character with its uppercase equivalent.
In any given
.B wordchars
or
.B boundarychars
statement, the characters in each range are
first sorted into ASCII collating sequence, then matched one-for-one
with the other range.
(The two ranges must have the same number of characters.)
Thus, for example, the two statements:
.PP
.RS
.nf
\fBwordchars\fP [aeiou] [AEIOU]
\fBwordchars\fP [aeiou] [UOIEA]
.fi
.RE
.PP
would produce exactly the same effect.
To get the vowels to match
up "wrong", you would have to use separate statements:
.PP
.RS
.nf
\fBwordchars\fP a U
\fBwordchars\fP e O
\fBwordchars\fP i I
\fBwordchars\fP o E
\fBwordchars\fP u A
.fi
.RE
.PP
which would cause uppercase 'e' to be 'O', and lowercase 'O' to be 'e'.
This should normally be a problem only with languages that have been
forced to use a strange ASCII collating sequence.
If your uppercase and lowercase letters both collate in the same order,
you shouldn't have to worry about this "feature".
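.PP
The sorting-then-pairing rule, and the resulting collating order, can be sketched in Python.
This is an illustration under stated assumptions, not ispell's parser:
the helper names \fIexpand_set\fR, \fIpair_cases\fR and \fIcollating_order\fR are hypothetical,
negated sets such as [^a-z] are not handled,
and each statement is assumed to supply one lowercase set and one uppercase set.
.PP
.RS
.nf
```python
# Sketch only (not ispell's source): how one "wordchars" statement with a
# lowercase set and an uppercase set is paired up and collated.

def expand_set(spec):
    """Expand "[a-e]", "[aeiou]" or a lone character into a sorted list."""
    body = spec[1:-1] if spec.startswith("[") and spec.endswith("]") else spec
    chars = set()
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as a-e: include every character between the ends.
            chars.update(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    # Each set is sorted into ASCII order before pairing.
    return sorted(chars)


def pair_cases(lower_spec, upper_spec):
    """Match the two sorted sets one-for-one."""
    lows, ups = expand_set(lower_spec), expand_set(upper_spec)
    if len(lows) != len(ups):
        raise ValueError("both sets must contain the same number of characters")
    return dict(zip(lows, ups))


def collating_order(pairs):
    """Cased characters collate together, uppercase first."""
    order = []
    for low, up in pairs.items():
        order += [up, low]
    return "".join(order)


print(pair_cases("[aeiou]", "[AEIOU]") == pair_cases("[aeiou]", "[UOIEA]"))  # True
print(collating_order(pair_cases("[a-c]", "[A-C]")))                        # AaBbCc
```
.fi
.RE
.PP
Because each set is sorted before pairing, a mapping that does not follow
ASCII order, such as a to U, can only be expressed with one statement per pair.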
.PP
The prefixes and suffixes sections have exactly the same syntax, except
for the introductory keyword.
.PP
.RS
.nf
\fIprefixes\fR : \fBprefixes\fI flagdef\fR*
\fIsuffixes\fR : \fBsuffixes\fI flagdef\fR*
\fIflagdef\fR : \fBflag\fR [\fB*\fR|\fB~\fR] \fIchar\fB : \fIrepl\fR*
.fi
.RE
.PP
A prefix or suffix table consists of an introductory keyword and a list
of flag definitions.
Flags can be defined more than once, in which case
the definitions are combined.
Each flag controls one or more
.IR repl s
(replacements)
that are conditionally applied to the beginnings or endings of various
words.
.PP
Flags are named by a single character
.IR char .
Depending on a configuration option,
this character can be either any uppercase letter (the default
configuration) or any 7-bit ASCII character.
Most languages should be
able to get along with just 26 flags.
.PP
A flag character may be prefixed with one or more option characters.
(If you wish to use one of the option characters as a flag character,
simply enclose it in double quotes.)
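.PP
As an illustration of the
.I flagdef
header syntax, the following Python sketch splits a header such as
flag *S: into its option characters and its flag character.
It is a hypothetical helper (\fIparse_flag_header\fR), not ispell's parser;
it assumes the header is on a line by itself, that the only option characters are * and ~,
and that a quoted flag is written as exactly one character between double quotes.
.PP
.RS
.nf
```python
# Hypothetical sketch of reading a flag header; not ispell's own parser.
OPTION_CHARS = {"*", "~"}


def parse_flag_header(line):
    """Split a header such as 'flag *S:' into (options, flag)."""
    text = line.strip()
    if not (text.startswith("flag") and text.endswith(":")):
        raise ValueError("not a flag header: " + line)
    body = "".join(text[len("flag"):-1].split())  # drop keyword, colon, blanks
    options = set()
    while body and body[0] in OPTION_CHARS:
        options.add(body[0])  # leading * or ~ characters are options
        body = body[1:]
    if len(body) == 3 and body[0] == body[2] == '"':
        body = body[1]  # a quoted option character used as the flag name
    if len(body) != 1:
        raise ValueError("a flag name must be a single character")
    return options, body


print(parse_flag_header("flag *S:"))     # ({'*'}, 'S')
print(parse_flag_header('flag *"~":'))   # ({'*'}, '~')
```
.fi
.RE
.PP
The quoting check runs only after the leading option characters have been
consumed, so a quoted * or ~ is always read as the flag name.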
.PP
The asterisk (\fB*\fP) option
means that this
flag participates in
.I cross-product
formation.
This only matters if the
file contains both prefix and suffix tables.
If so, all prefixes and
suffixes marked with an asterisk will be applied in all cross-combinations
to the root word.
For example, consider the root
.I fix
with prefixes
.I pre
and
.IR in ,
and suffixes
.I es
and
.IR ed .
If all flags controlling these prefixes and suffixes are marked with an
asterisk, then the single root
.I fix
would generate
.IR prefix ,
.IR prefixes ,
.IR prefixed ,
.IR infix ,
.IR infixes ,
.IR infixed ,
.IR fixes ,
and
.IR fixed ,
in addition to
.I fix
itself.
Of these,
.IR prefixes ,
.IR prefixed ,
.IR infixes ,
and
.I infixed
exist only because of cross-product formation.
Cross-product formation can produce a large number of words quickly, some
of which may be illegal, so watch out.
If cross-products produce illegal
words,
.I munchlist
will not produce those flag combinations, and the flag will not be useful.
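.PP
The cross-product rule can be sketched in Python.
This is a deliberate simplification, not ispell's algorithm:
affixes are plain strings that are simply concatenated to the root, with no
conditions or strip-strings, and \fIexpand\fR is a hypothetical helper.
.PP
.RS
.nf
```python
# Illustrative sketch: the words a root yields when every prefix and
# suffix flag is marked with "*" (cross-product formation).
from itertools import product


def expand(root, prefixes, suffixes, cross=True):
    words = {root}
    words.update(p + root for p in prefixes)   # prefix alone
    words.update(root + s for s in suffixes)   # suffix alone
    if cross:
        # Every prefix combined with every suffix.
        words.update(p + root + s for p, s in product(prefixes, suffixes))
    return sorted(words)


print(expand("fix", ["pre", "in"], ["es", "ed"]))
# ['fix', 'fixed', 'fixes', 'infix', 'infixed', 'infixes',
#  'prefix', 'prefixed', 'prefixes']
```
.fi
.RE
.PP
Calling it with cross=False leaves only fix, fixed, fixes, infix, and prefix,
which shows how quickly the cross-combinations add up.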
.PP
The \fB~\fR option specifies that the associated flag is only active
when a compound word is being formed.
This is useful in a language like German, where the form of a word
sometimes changes inside a compound.
.PP
.RS
.nf
\fIrepl\fR : \fIcondition\fR* \fB>\fR [ \fB- \fIstrip-string \fB,\fR ] \fIappend-string\fR
.fi
.RE
.PP
A
.I repl
is a conditional rule for modifying a root word.
Up to 8
.I conditions
may be specified.
If the
.I conditions
are satisfied, the
rules on the right-hand side of the
.I repl
are applied, as follows:
.IP (1)
If a strip-string is given, it is first stripped from
the beginning or ending (as appropriate) of the root word.
.IP (2)
Then the append-string is added at that point.
.PP
For example, the
.I condition
.B .
means "any word", and the
.I condition
.B Y
means "any word ending in Y".
The following (suffix) replacements:
.PP
.RS
.nf
\&. > MENT
Y > -Y,IES
.fi
.RE
.PP
would change
.I induce
to
.I inducement
and
.I fly
to
.IR flies .
(If they were controlled by the same flag, they would also change
.I fly
to
.IR flyment ,
which might not be what was wanted.
.I Munchlist
can be used to protect against this sort of problem;
see the command sequence given below.)
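.PP
The check-strip-append procedure for a suffix
.I repl
can be sketched in Python.
This is an illustration, not ispell's code:
\fIRepl\fR and \fIapply_suffix\fR are hypothetical names,
each condition is assumed to be already expanded into a set of allowed uppercase letters,
an empty condition list stands for the lone dot,
and conditions are matched against the end of the root.
.PP
.RS
.nf
```python
# Sketch of applying one suffix repl: check conditions, strip, then append.
# Not ispell's code; Repl and apply_suffix are hypothetical names.
from typing import NamedTuple


class Repl(NamedTuple):
    conditions: list  # one set of allowed letters per position; [] means "."
    strip: str        # "" when no strip-string is given
    append: str


def apply_suffix(root, repl):
    """Return the derived word, or None if the repl does not apply."""
    word = root.upper()
    n = len(repl.conditions)
    if len(word) < n:
        return None
    tail = word[len(word) - n:]
    if not all(ch in allowed for ch, allowed in zip(tail, repl.conditions)):
        return None
    if repl.strip:
        if not word.endswith(repl.strip):
            return None
        word = word[: len(word) - len(repl.strip)]  # step (1): strip
    return (word + repl.append).lower()             # step (2): append


MENT = Repl([], "", "MENT")       # .  > MENT
IES = Repl([{"Y"}], "Y", "IES")   # Y  > -Y,IES

for root in ("induce", "fly"):
    print(root, [apply_suffix(root, r) for r in (MENT, IES)])
```
.fi
.RE
.PP
Run as written, it derives inducement from induce, and both flyment and flies
from fly: exactly the unwanted result described above when both rules share one flag.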
.PP
No matter how much you might wish it, the strings on the right must be
strings of specific characters, not ranges.
The reasons are rooted deeply in the way
.I ispell
works, and it would be difficult or impossible to provide
for more flexibility.
For example, you might wish to write:
.PP
.RS
.nf
[EY] > -[EY],IES
.fi
.RE
.PP
This will not work.
Instead, you must use two separate rules:
.PP
.RS
.nf
E > -E,IES
Y > -Y,IES
.fi
.RE
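.PP
When many characters need the same treatment, the separate rules can be
generated rather than typed by hand.
The following Python sketch is a hypothetical convenience helper
(\fIexpand_rule\fR), not part of ispell;
it only writes out the text of one rule per character, in the
.I repl
syntax given above.
.PP
.RS
.nf
```python
# Hypothetical helper: write one explicit rule per character, since the
# right-hand side of a repl cannot contain a range.
def expand_rule(chars, append):
    return [f"{c} > -{c},{append}" for c in chars]


for rule in expand_rule("EY", "IES"):
    print(rule)
# E > -E,IES
# Y > -Y,IES
```
.fi
.RE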
.PP
The application of
.IR repl s
can be restricted to certain words with
.IR conditions :
.PP
.RS
.nf
\fIcondition\fR : { \fB.\fR | \fIcharacter\fR | \fIrange\fR }
.fi
.RE
.PP
A
.I condition
is a restriction on the characters that adjoin, and/or are
replaced by, the right-hand side of the
.IR repl .
Up to 8
.I conditions
may be given, which should be enough context for anyone.
The right-hand side will be applied only if the
.I conditions
in the
.I repl
are satisfied.
The
.I conditions
also implicitly define a length;
roots shorter than the number of
.I conditions
will not pass the test.
(As a special case, a
.I condition
of a single dot "." defines a length of zero,
so that the rule applies to all words indiscriminately.)
This length is independent of the separate test that insists that
all flags produce an output word length of at least four.
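.PP
The length rule, including the special case for a lone dot, can be sketched in Python.
This is an illustration, not ispell's implementation:
\fIconditions_match\fR is a hypothetical helper,
conditions are assumed to be already parsed (a dot, or a set of allowed uppercase letters),
and matching is done against the end of the root, as for a suffix.
The separate minimum output length of four is not modeled.
.PP
.RS
.nf
```python
# Sketch of the length rules for conditions; not ispell's own code.
# Each condition is "." (any character) or a set of allowed letters.
DOT = "."


def conditions_match(root, conditions):
    """True if the end of root satisfies every condition."""
    if conditions == [DOT]:
        return True  # a lone dot defines a length of zero
    if len(root) < len(conditions):
        return False  # roots shorter than the conditions fail
    tail = root.upper()[len(root) - len(conditions):]
    return all(cond == DOT or ch in cond
               for ch, cond in zip(tail, conditions))


VOWEL_Y = [set("AEIOU"), {"Y"}]
print(conditions_match("a", [DOT]))       # True
print(conditions_match("y", VOWEL_Y))     # False: too short
print(conditions_match("day", VOWEL_Y))   # True
```
.fi
.RE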
.PP
.I Conditions
that are single characters should be separated by white space.
For example, to specify words ending in "ED", write:
.PP
.RS
.nf
E D > -ED,ING # As in covered > covering
.fi
.RE
.PP
If you write:
.PP
.RS
.nf
ED > -ED,ING
.fi
.RE
.PP
the two letters are read as a single condition that matches either E or D,
so the effect will be the same as:
.PP
.RS
.nf
[ED] > -ED,ING
.fi
.RE
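.PP
The white-space rule can be illustrated with a small Python tokenizer.
It is a sketch inferred only from the behavior described on this page, not ispell's parser:
\fIparse_conditions\fR is hypothetical,
a run of plain characters up to white space or a bracket is assumed to form one condition,
negated brackets are expanded against the uppercase letters A-Z only,
and ranges inside brackets are not handled.
.PP
.RS
.nf
```python
# Sketch of reading the conditions to the left of ">" (not ispell's parser).
import string


def parse_conditions(text):
    conds = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "[":
            end = text.index("]", i)
            body = text[i + 1:end]
            if body.startswith("^"):
                # Negation: every uppercase letter not listed.
                conds.append(set(string.ascii_uppercase) - set(body[1:]))
            else:
                conds.append(set(body))
            i = end + 1
        else:
            # A run of plain characters is one condition matching any of
            # them, so "ED" behaves like "[ED]".
            j = i
            while j < len(text) and not text[j].isspace() and text[j] != "[":
                j += 1
            run = text[i:j]
            conds.append("." if run == "." else set(run))
            i = j
    return conds


print(len(parse_conditions("E D")), len(parse_conditions("ED")))  # 2 1
```
.fi
.RE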
.PP
As a final minor but important point, it is sometimes useful to rebuild
a dictionary file using an incompatible suffix file.
For example,
suppose you expanded the "R" flag to generate "er" and "ers" (thus
making the Z flag somewhat obsolete).
To build a new dictionary
.I newdict
that, using
.IR newaffixes ,
will accept exactly the same list of
words as the old list
.I olddict
did using
.IR oldaffixes ,
the
.B \-c
switch of
.I munchlist
is useful, as in the following example:
.PP
.RS
.nf
$ munchlist -c oldaffixes -l newaffixes olddict > newdict
.fi
.RE
.PP
If you use this procedure, your new dictionary will always accept the
same list of words the original did, even if you badly screwed up the affix
file.
This is because
.I munchlist
compares the words generated by a flag with the original word list, and
refuses to use any flags that generate illegal words.
(But don't forget that the
.I munchlist
step takes a long time and eats up temporary file space.)
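.PP
The safety property described above rests on a simple check:
a flag may be attached to a root only if every word it generates from that root
is already in the original word list.
The Python sketch below illustrates that check in isolation.
It is not
.IR munchlist 's
implementation; \fIsafe_flags\fR and the two-entry flag table are hypothetical,
and the plural rule is deliberately simplified.
.PP
.RS
.nf
```python
# Illustrative check, not munchlist itself: keep a flag for a root only
# if every word the flag generates is in the original word list.
def safe_flags(root, flags, word_list):
    """flags maps a flag name to a function returning the words it makes."""
    words = set(word_list)
    return sorted(name for name, generate in flags.items()
                  if set(generate(root)) <= words)


flags = {
    "M": lambda w: [w + "ment"],                                  # like . > MENT
    "S": lambda w: [w[:-1] + "ies"] if w.endswith("y") else [w + "s"],  # simplified
}
print(safe_flags("fly", flags, ["fly", "flies"]))              # ['S']
print(safe_flags("induce", flags, ["induce", "inducement"]))   # ['M']
```
.fi
.RE
.PP
Because flyment is not in the word list, the M flag is never attached to fly,
which is how the check guards against a badly written affix file.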
.SH EXAMPLES
.PP
As an example of conditional suffixes, here is the specification of the
.B S
flag from the English affix file:
.PP
.RS
.nf
flag *S:
[^AEIOU]Y > -Y,IES # As in imply > implies
[AEIOU]Y > S # As in convey > conveys
[SXZH] > ES # As in fix > fixes
[^SXZHY] > S # As in bat > bats
.fi
.RE
.PP
The first line applies to words ending in Y, but not in vowel-Y.
The second takes care of the vowel-Y words.
The third then handles those words that end in a sibilant
or near-sibilant, and the last picks up everything else.
.PP
Note that the
.I conditions
are written very carefully so that they apply
to disjoint sets of words.
In particular, note that the fourth line
excludes words ending in Y as well as the obvious SXZH.
Otherwise, it would convert "imply" into "implys".
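.PP
To check that the four
.B S
rules really do cover disjoint sets of words, they can be transcribed into
Python and applied to the example words.
This is a hand transcription for illustration only:
the \fIRULES\fR table and the \fIplural\fR function are hypothetical,
and conditions are matched against the uppercased end of the root.
.PP
.RS
.nf
```python
# Hand transcription of flag *S for illustration; not ispell's code.
import string

VOWELS = set("AEIOU")
ALL = set(string.ascii_uppercase)

# (conditions, strip-string, append-string)
RULES = [
    ([ALL - VOWELS, {"Y"}], "Y", "IES"),    # imply > implies
    ([VOWELS, {"Y"}], "", "S"),             # convey > conveys
    ([set("SXZH")], "", "ES"),              # fix > fixes
    ([ALL - set("SXZHY")], "", "S"),        # bat > bats
]


def matches(word, conds):
    if len(word) < len(conds):
        return False
    tail = word[len(word) - len(conds):]
    return all(c in s for c, s in zip(tail, conds))


def plural(root):
    word = root.upper()
    hits = [r for r in RULES if matches(word, r[0])]
    if len(hits) != 1:
        raise ValueError("rules overlap, or none applies")
    conds, strip, append = hits[0]
    if strip:
        word = word[: len(word) - len(strip)]
    return (word + append).lower()


for w in ("imply", "convey", "fix", "bat"):
    print(w, ">", plural(w))
```
.fi
.RE
.PP
If the fourth rule's set also allowed Y, imply would match both the first and
the fourth rules, and \fIplural\fR would raise an error instead of producing implys.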
.PP
Although the English affix file does not do so, you can also have a flag
generate more than one variation on a root word.
For example, we could extend the English "R" flag as follows:
.PP
.RS
.nf
flag *R:
E > R # As in skate > skater
E > RS # As in skate > skaters
[^AEIOU]Y > -Y,IER # As in multiply > multiplier
[^AEIOU]Y > -Y,IERS # As in multiply > multipliers
[AEIOU]Y > ER # As in convey > conveyer
[AEIOU]Y > ERS # As in convey > conveyers
[^EY] > ER # As in build > builder
[^EY] > ERS # As in build > builders
.fi
.RE
.PP
This flag would generate both "skater" and "skaters" from "skate".
This capability can be very useful in languages that make use of noun, verb,
and adjective endings.
For instance, one could define a single flag
that generated all of the German "weak" verb endings.
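.PP
A flag with several
.IR repl s
for the same condition simply yields one word per matching
.IR repl .
The Python sketch below transcribes the extended
.B R
flag above to show this.
The table layout and the \fIderive\fR function are hypothetical,
and conditions are matched against the uppercased end of the root.
.PP
.RS
.nf
```python
# Transcription of the extended flag *R for illustration; not ispell's code.
import string

VOWELS = set("AEIOU")
ALL = set(string.ascii_uppercase)

# (conditions, strip-string, append-string)
R_FLAG = [
    ([{"E"}], "", "R"), ([{"E"}], "", "RS"),
    ([ALL - VOWELS, {"Y"}], "Y", "IER"), ([ALL - VOWELS, {"Y"}], "Y", "IERS"),
    ([VOWELS, {"Y"}], "", "ER"), ([VOWELS, {"Y"}], "", "ERS"),
    ([ALL - {"E", "Y"}], "", "ER"), ([ALL - {"E", "Y"}], "", "ERS"),
]


def derive(root, rules):
    """Apply every repl whose conditions match; one output word per match."""
    word = root.upper()
    out = []
    for conds, strip, append in rules:
        if len(word) < len(conds):
            continue
        tail = word[len(word) - len(conds):]
        if all(c in s for c, s in zip(tail, conds)):
            stem = word[: len(word) - len(strip)] if strip else word
            out.append((stem + append).lower())
    return out


for w in ("skate", "multiply", "convey", "build"):
    print(w, derive(w, R_FLAG))
```
.fi
.RE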
.SH "SEE ALSO"
ispell(1)