source: trunk/third/ispell/ispell.1X @ 10334

Revision 10334, 39.4 KB checked in by ghudson, 27 years ago
This commit was generated by cvs2svn to compensate for changes in r10333, which included commits to RCS files with non-trunk default branches.
1.\"
2.\" $Id: ispell.1X,v 1.1.1.1 1997-09-03 21:08:11 ghudson Exp $
3.\"
4.\" Copyright 1992, 1993, Geoff Kuenning, Granada Hills, CA
5.\" All rights reserved.
6.\"
7.\" Redistribution and use in source and binary forms, with or without
8.\" modification, are permitted provided that the following conditions
9.\" are met:
10.\"
11.\" 1. Redistributions of source code must retain the above copyright
12.\"    notice, this list of conditions and the following disclaimer.
13.\" 2. Redistributions in binary form must reproduce the above copyright
14.\"    notice, this list of conditions and the following disclaimer in the
15.\"    documentation and/or other materials provided with the distribution.
16.\" 3. All modifications to the source code must be clearly marked as
17.\"    such.  Binary redistributions based on modified source code
18.\"    must be clearly marked as modified versions in the documentation
19.\"    and/or other materials provided with the distribution.
20.\" 4. All advertising materials mentioning features or use of this software
21.\"    must display the following acknowledgment:
22.\"      This product includes software developed by Geoff Kuenning and
23.\"      other unpaid contributors.
24.\" 5. The name of Geoff Kuenning may not be used to endorse or promote
25.\"    products derived from this software without specific prior
26.\"    written permission.
27.\"
28.\" THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
29.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
30.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
31.\" ARE DISCLAIMED.  IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
32.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
33.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
34.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
35.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
36.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
37.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
38.\" SUCH DAMAGE.
39.\"
40.\" $Log: not supported by cvs2svn $
41.\" Revision 1.80  1995/01/08  23:23:31  geoff
42.\" Document the new personal-dictionary behavior (dictionary named after
43.\" the hash file is preferred).
44.\"
45.\" Revision 1.79  1994/10/25  05:46:02  geoff
46.\" Document the new DICTIONARY variable, and improve the documentation of
47.\" the -d flag.
48.\"
49.\" Revision 1.78  1994/09/16  05:06:58  geoff
50.\" Make it clear that the + command doesn't change the string-character
51.\" type.
52.\"
53.\" Revision 1.77  1994/04/27  01:50:35  geoff
54.\" Remove the bug about the tex parser getting confused by \endxxx.
55.\"
56.\" Revision 1.76  1994/03/21  01:54:08  geoff
57.\" Document the '&' command in -a mode.
58.\"
59.\" Revision 1.75  1994/03/15  06:24:26  geoff
60.\" Document the changes to the +/-/~ commands and the -T switch.
61.\"
62.\" Revision 1.74  1994/01/25  07:11:39  geoff
63.\" Get rid of all old RCS log lines in preparation for the 3.1 release.
64.\"
65.\"
66.TH ISPELL 1 local
67.SH NAME
68ispell, buildhash, munchlist, findaffix, tryaffix, icombine, ijoin \- Interactive
69spelling checking
70.SH SYNOPSIS
71.B ispell
72.RI [ common-flags ]
73.RB [ \-M | \-N ]
74.RB [ \-L \fIcontext\fP ]
75.RB [ \-V ]
76files
77.br
78.B ispell
79.RI [ common-flags ]
80.B \-l
81.br
82.B ispell
83.RI [ common-flags ]
84.RB [ \-f
85.IR file ]
86.RB [ \-s ]
87.RB { \-a | \-A }
88.br
89.B ispell
90.RB [ \-d
91.IR file ]
92.RB [ \-w
93.IR chars ]
94.B \-c
95.br
96.B ispell
97.RB [ \-d
98.IR file ]
99.RB [ \-w
100.IR chars ]
101.BR \-e [ e ]
102.br
103.B ispell
104.RB [ \-d
105.IR file ]
106.B \-D
107.br
108.B ispell
109.BR \-v [ v ]
110.IP \fIcommon-flags\fP:
111.RB [ \-t ]
112.RB [ \-n ]
113.RB [ \-b ]
114.RB [ \-x ]
115.RB [ \-B ]
116.RB [ \-C ]
117.RB [ \-P ]
118.RB [ \-m ]
119.RB [ \-S ]
120.RB [ \-d
121.IR file ]
122.RB [ \-p
123.IR file ]
124.RB [ \-w
125.IR chars ]
126.RB [ \-W
127.IR n ]
128.RB [ \-T
129.IR type ]
130.PP
131.B buildhash
132.RB [ \-s ]
133.I
134dict-file affix-file hash-file
135.br
136.B buildhash
137.B \-s
138.I
139count affix-file
140.if n .TP 10
141.if t .PP
142.B munchlist
143.RB [ \-l
144.IR aff-file ]
145.RB [ \-c
146.IR conv-file ]
147.RB [ \-T
148.IR suffix ]
149.if n .br
150.RB [ \-s
151.IR hash-file ]
152.RB [ \-D ]
153.RB [ \-v ]
154.RB [ \-w
155.IR chars ]
156.RI [ files ]
157.if n .TP 10
158.if t .PP
159.B findaffix
160.RB [ \-p | \-s ]
161.RB [ \-f ]
162.RB [ \-c ]
163.RB [ \-m
164.IR min ]
165.RB [ \-M
166.IR max ]
167.RB [ \-e
168.IR elim ]
169.if n .br
170.RB [ \-t
171.IR tabchar ]
172.RB [ \-l
173.IR low ]
174.RI [ files ]
175.PP
176.B tryaffix
177.RB [ \-p | \-s ]
178.RB [ \-c ]
179.I expanded-file
180.IR affix [ +addition ]
181...
182.PP
183.B icombine
184.RB [ \-T
185.IR type ]
186.RI [ aff-file ]
187.PP
188.B ijoin
189.RB [ \-s | \-u ]
190.I join-options
191.I file1
192.I file2
193.SH DESCRIPTION
194.PP
195.I Ispell
196is fashioned after the
197.I spell
198program from ITS (called
199.I ispell
200on Twenex systems.)  The most common usage is "ispell filename".  In this
201case,
202.I ispell
203will display each word which does not appear in the dictionary at the
204top of the screen and allow you to change it.  If there are "near
205misses" in the dictionary (words which differ by only a single letter, a
206missing or extra letter, a pair of transposed letters, or a missing
207space or hyphen), then they are
208also displayed on following lines.
209As well as "near misses", ispell may display other guesses
210at ways to make the word from a known root, with each guess preceded
211by question marks.
212Finally, the line containing the
213word and the previous line
214are printed at the bottom of the screen.  If your terminal can
215display in reverse video, the word itself is highlighted.  You have the
216option of replacing the word completely, or choosing one of the
217suggested words.  Commands are single characters as follows
218(case is ignored):
219.PP
220.RS
221.IP R
222Replace the misspelled word completely.
223.IP Space
224Accept the word this time only.
225.IP A
226Accept the word for the rest of this
227.I ispell
228session.
229.IP I
230Accept the word, capitalized as it is in the
231file, and update private dictionary.
232.IP U
233Accept the word, and add an uncapitalized (actually, all lower-case)
234version to the private dictionary.
235.IP 0-\fIn\fR
236Replace with one of the suggested words.
237.IP L
238Look up words in system dictionary (controlled by the WORDS
239compilation option).
240.IP X
241Write the rest of this file, ignoring misspellings, and start next file.
242.IP Q
243Exit immediately and leave the file unchanged.
244.IP !
245Shell escape.
246.IP ^L
247Redraw screen.
248.IP ^Z
249Suspend ispell.
250.IP ?
251Give help screen.
252.RE
253.PP
254If the
255.B \-M
256switch is specified,
257a one-line mini-menu at the bottom of the screen will
258summarize these options.
259Conversely, the
260.B \-N
261switch may be used to suppress the mini-menu.
262(The mini-menu is displayed by default if
263.I ispell
264was compiled with the MINIMENU option,
265but these two switches will always override the default).
266.PP
267If the
268.B \-L
269flag is given, the specified number is used as the number of
270lines of context to be shown at the bottom of the screen.
271(The default is to calculate the amount of context as a certain percentage
272of the screen size.)
273The amount of context is subject to a system-imposed limit.
274.PP
275If the
276.B \-V
277flag is given, characters that are not in the 7-bit ANSI printable
278character set will always be displayed in the style of "cat -v", even if
279.I ispell
280thinks that these characters are legal ISO Latin-1 on your system.
281This is useful when working with older terminals.
282Without this switch,
283.I ispell
284will display 8-bit characters "as is" if they have been defined as
285string characters for the chosen file type.
286.PP
287"Normal" mode, as well as the
288.BR \-l ,
289.BR \-a ,
290and
291.B \-A
292options (see below) also
293accepts the following "common" flags on the command line:
294.RS
295.IP \fB\-t\fR
296The input file is in TeX or LaTeX format.
297.IP \fB\-n\fR
298The input file is in nroff/troff format.
299.IP \fB\-b\fR
300Create a backup file by appending ".bak"
301to the name of the input file.
302.IP \fB\-x\fR
303Don't create a backup file.
304.IP \fB\-B\fR
305Report run-together words with missing blanks as spelling errors.
306.IP \fB\-C\fR
307Consider run-together words as legal compounds.
308.IP \fB\-P\fR
309Don't generate extra root/affix combinations.
310.IP \fB\-m\fR
311Make possible root/affix combinations that
312aren't in the dictionary.
313.IP \fB\-S\fR
314Sort the list of guesses by probable correctness.
315.IP "\fB\-d\fR file"
316Specify an alternate dictionary file.
317For example, use
318.B "\-d deutsch"
319to choose a German dictionary in a German installation.
320.IP "\fB\-p\fR file"
321Specify an alternate personal dictionary.
322.IP "\fB\-w\fR chars"
323Specify additional characters that can be part of a word.
324.IP "\fB\-W\fR n"
325Specify length of words that are always legal.
326.IP "\fB\-T\fR type"
327Assume a given formatter type for all files.
328.RE
329.PP
330The
331.B \-n
332and
333.B \-t
334options select whether
335.I ispell
336runs in nroff/troff
337.RB ( \-n )
338or TeX/LaTeX
339.RB ( \-t )
340input mode.
341(The default is controlled by the DEFTEXFLAG installation option.)
342TeX/LaTeX mode is also automatically selected if an input file has
343the extension ".tex", unless overridden by the
344.B \-n
345switch.
346In TeX/LaTeX mode, whenever a backslash ("\e") is found,
347.I ispell
348will skip to the next whitespace or TeX/LaTeX delimiter.  Certain commands
349contain arguments which should not be checked, such as labels and reference
350keys as are found in the \ecite command, since they contain arbitrary,
351non-word arguments.  Spell checking is also suppressed when in math mode.
352Thus, for example, given
353.PP
354.RS
355\echapter {This is a Ckapter}
356\ecite{SCH86}
357.RE
358.PP
359.I ispell
360will find "Ckapter" but not "SCH".
361The
362.B \-t
363option does not recognize the TeX comment character "%", so comments are
364also spell-checked.
365It also assumes
366correct LaTeX syntax.  Arguments to infrequently used commands and some
367optional arguments are sometimes checked unnecessarily.
368The bibliography will not be checked if
369.I ispell
370was compiled with
371.B IGNOREBIB
372defined.  Otherwise, the bibliography will be checked but the reference
373key will not.
374.PP
375References for the
376.IR tib (1)
377bibliography system, that is, text between a ``[.'' or ``<.'' and
378``.]'' or ``.>'' will always be ignored in TeX/LaTeX mode.
379.PP
380The
381.B \-b
382and
383.B \-x
384options control whether
385.I ispell
386leaves a backup (.bak) file for each input file.
387The .bak file contains
388the pre-corrected text.  If there are file opening / writing errors,
389the .bak file may be left for recovery purposes even with the
390.B \-x
391option.
392The default for this option is controlled by the DEFNOBACKUPFLAG
393installation option.
394.PP
395The
396.B \-B
397and
398.B \-C
399options control how
400.I ispell
401handles run-together words, such as "notthe" for "not the".
402If
403.B \-B
404is specified, such words will be considered as errors, and
405.I ispell
406will list variations with an inserted blank or hyphen as possible
407replacements.
408If
409.B \-C
410is specified, run-together words will be considered to be
411legal compounds, so long as both components are in the dictionary, and
412each component is at least as long as a language-dependent minimum (3 characters, by default).
413This is useful for languages such as German and Norwegian, where
414many compound words are formed by concatenation.
415(Note that compounds formed from three or more root words will still
416be considered errors).
417The default for this option is language-dependent;
418in a multi-lingual installation the default may vary depending on
419which dictionary you choose.
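.PP
The compound rule described above for
.B \-C
can be sketched as follows.
This is only an illustration of the stated rule, not ispell's actual code:
the function name, the plain set used as a dictionary, and the lower-casing
are assumptions, and affix handling is ignored entirely.

```python
def is_legal_compound(word, dictionary, min_len=3):
    """Return True if word splits into exactly two dictionary words,
    each at least min_len characters long (the rule described for -C)."""
    lowered = word.lower()
    # Try every split point that leaves both halves long enough.
    for i in range(min_len, len(lowered) - min_len + 1):
        head, tail = lowered[:i], lowered[i:]
        if head in dictionary and tail in dictionary:
            return True
    return False


# Hypothetical toy dictionary, for illustration only.
words = {"not", "the", "an"}
print(is_legal_compound("notthe", words))     # two components, both long enough
print(is_legal_compound("notthenot", words))  # three components: still an error
print(is_legal_compound("anthe", words))      # "an" is shorter than the minimum
```

Only two-way splits are tried, which matches the note that compounds formed
from three or more root words are still considered errors.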
420.PP
421The
422.B \-P
423and
424.B \-m
425options control when
426.I ispell
427automatically generates suggested root/affix combinations for possible
428addition to your personal dictionary.
429(These are the entries in the "guess" list which are preceded by question
430marks.)
431If
432.B \-P
433is specified, such guesses are displayed only if
434.I ispell
435cannot generate any possibilities that match the current dictionary.
436If
437.B \-m
438is specified, such guesses are always displayed.
439This can be useful if the dictionary has a limited word list, or a word
440list with few suffixes.
441However, you should be careful when using this option, as it can
442generate guesses that produce illegal words.
443The default for this option is controlled by the dictionary file used.
444.PP
445The
446.B \-S
447option suppresses
448.IR ispell "'s"
449normal behavior of sorting the list of possible replacement words.
450Some people may prefer this, since it somewhat enhances the probability
451that the correct word will be low-numbered.
452.PP
453The
454.B \-d
455option is used to specify an alternate hashed dictionary file,
456other than the default.
457If the filename does not contain a "/",
458the library directory for the default dictionary file is prefixed;
459thus, to use a dictionary in the local directory "-d ./xxx.hash" must
460be used.
461This is useful to allow dictionaries for alternate languages.
462Unlike previous versions of
463.IR ispell ,
464a dictionary of
465.IR /dev/null
466is illegal, because the dictionary contains the affix table.
467If you need an effectively empty dictionary, create a one-entry list
468with an unlikely string (e.g., "qqqqq").
469.PP
470The
471.B \-p
472option is used to specify an alternate personal dictionary file.
473If the file name does not begin with "/", $HOME is prefixed.  Also, the
474shell variable WORDLIST may be set, which renames the personal dictionary
475in the same manner.  The command line overrides any WORDLIST setting.
476If neither the
477.B \-p
478switch nor the WORDLIST environment variable is given,
479.I ispell
480will search for a personal dictionary in both the current directory
481and $HOME, creating one in $HOME if none is found.
482The preferred name is constructed by appending ".ispell_" to the base name
483of the hash file.
484For example, if you use the English dictionary, your personal
485dictionary would be named ".ispell_english".
486However, if the file ".ispell_words" exists, it will be used as the
487personal dictionary regardless of the language hash file chosen.
488This feature is included primarily for backwards compatibility.
489.PP
490If the
491.B \-p
492option is
493.I not
494specified,
495.I ispell
496will look for personal dictionaries in both the current directory and
497the home directory.
498If dictionaries exist in both places, they will be merged.
499If any words are added to the personal dictionary, they will be
500written to the current directory if a dictionary already existed in
501that place;
502otherwise they will be written to the dictionary in the home directory.
503.PP
504The
505.B \-w
506option may be used to specify characters other than alphabetics
507which may also appear in words.  For instance,
508.B \-w
509"&" will allow "AT&T"
510to be picked up.  Underscores are useful in many technical documents.
511There is an admittedly crude provision in this option for 8-bit international
512characters.
513Non-printing characters may be specified in the usual way by inserting a
514backslash followed by the octal character code;
515e.g., "\e014" for a form feed.
516Alternatively, if "n" appears in the character string, the (up to)
517three characters
518following are a DECIMAL code 0 - 255, for the character.
519For example, to include bells and form feeds in your words (an admittedly
520silly thing to do, but aren't most pedagogical examples?):
521.PP
522.RS
523n007n012
524.RE
525.PP
526Numeric digits other than the three following "n" are simply numeric
527characters.  Use of "n" does not conflict with anything because actual
528alphabetics have no meaning - alphabetics are already accepted.
529.I Ispell
530will typically be used with input from a file, meaning that preserving
531parity for possible 8 bit characters from the input text is OK.  If you
532specify the -l option, and actually type text from the terminal, this may
533create problems if your stty settings preserve parity.
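.PP
The escape conventions described above for the
.B \-w
argument can be illustrated with a small decoder.
This is a sketch under assumptions, not ispell's own parser:
the function name is hypothetical, and the treatment of an "n" or backslash
with no digits after it (kept as an ordinary character) and of codes above
255 (rejected) is a guess, since the text does not say.

```python
def parse_wchars(spec):
    """Decode a -w style character list into a set of characters.

    A backslash is followed by up to three octal digits; an "n" is
    followed by up to three decimal digits. Anything else is literal.
    """
    chars = set()
    i = 0
    while i < len(spec):
        c = spec[i]
        if c in ("\\", "n"):
            digits = "01234567" if c == "\\" else "0123456789"
            base = 8 if c == "\\" else 10
            j = i + 1
            # Consume at most three digits after the introducer.
            while j < len(spec) and j - i <= 3 and spec[j] in digits:
                j += 1
            if j > i + 1:
                code = int(spec[i + 1:j], base)
                if code > 255:
                    raise ValueError("character code out of range: %d" % code)
                chars.add(chr(code))
                i = j
                continue
        # Assumption: an introducer with no digits is kept as itself.
        chars.add(c)
        i += 1
    return chars


# The example from the text: bell (decimal 7) and form feed (decimal 12).
print(sorted(ord(ch) for ch in parse_wchars("n007n012")))
```

Digits beyond the three that follow an "n" are returned as ordinary numeric
characters, as the text describes.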
534.PP
535The
536.B \-W
537option may be used to change the length of words that
538.I ispell
539always accepts as legal.
540Normally,
541.I ispell
542will accept all 1-character words as legal, which is equivalent to
543specifying "\fB\-W 1\fR."
544(The default for this switch is actually controlled by the MINWORD
545installation option, so it may vary at your installation.)
546If you want all words to be checked against the dictionary, regardless
547of length, you might want to specify "\fB\-W 0\fR."
548On the other hand, if your document specifies a lot of three-letter acronyms,
549you would specify "\fB\-W 3\fR" to accept all words of three letters or
550less.
551Regardless of the setting of this option,
552.I ispell
553will only generate words that are in the dictionary as suggested replacements
554for words;
555this prevents the list from becoming too long.
556Obviously, this option can be very dangerous, since short misspellings may
557be missed.
558If you use this option a lot, you should probably make a last pass without it
559before you publish your document, to protect yourself against errors.
560.PP
561The
562.B \-T
563option is used to specify a default formatter type for use in
564generating string characters.
565This switch overrides the default type determined from
566the file name.
567The
568.I type
569argument may be either one of the unique names defined in the language
570affix file (e.g.,
571.BR nroff )
572or a file suffix including the dot (e.g.,
573.BR .tex ).
574If no
575.B \-T
576option appears and no type can be determined from the file name, the default
577string character type declared in the
578language affix file will be used.
579.PP
580The
581.B \-l
582or "list" option to
583.I ispell
584is used to produce a list of misspelled words from the standard input.
585.PP
586The
587.B \-a
588option
589is intended to be used from other programs through a pipe.  In this
590mode,
591.I ispell
592prints a one-line version identification message, and then begins
593reading lines of input.  For each input line,
594a single line is written to the standard output for each word
595checked for spelling on the line.  If the word
596was found in the main dictionary, or your personal dictionary, then the
597line contains only a '*'.  If the word was found through affix removal,
598then the line contains a '+', a space, and the root word.
599If the word was found through compound formation (concatenation of two
600words, controlled by the
601.B \-C
602option), then the line contains only a '\-'.
603.PP
604If the word
605is not in the dictionary, but there are near misses, then the line
606contains an '&', a space, the misspelled word, a space, the number of
607near misses, a space,
608the number of
609characters between the beginning of the line and the
610beginning of the misspelled word, a colon, another space,
611and a list of the near
612misses separated by
613commas and spaces.
614Following the near misses (and identified only by the count of near
615misses), if the word could be formed by adding
616(illegal) affixes to a known root,
617is a list of suggested derivations, again separated by commas and spaces.
618If there are no near misses at all, the line format is the same, except
619that the '&' is replaced by '?' (and the near-miss count is always zero).
620The suggested derivations following the near misses are in the form:
621.PP
622.RS
623[prefix+] root [-prefix] [-suffix] [+suffix]
624.RE
625.PP
626(e.g., "re+fry-y+ies" to get "refries")
627where each optional
628.I prefix
629and
630.I suffix
631is a string.
632Also, each near miss or guess is capitalized the same as the input
633word unless such capitalization is illegal;
634in the latter case each near miss is capitalized correctly
635according to the dictionary.
636.PP
637Finally, if the word does not appear in the dictionary, and
638there are neither near misses nor guesses, then the line contains a '#', a space,
639the misspelled word, a space,
640and the character offset from the beginning of the line.
641Each sentence of text input is terminated
642with an additional blank line, indicating that
643.I ispell
644has completed processing the input line.
645.PP
646These output lines can be summarized as follows:
647.PP
648.RS
649.IP OK:
650*
651.IP Root:
652+ <root>
653.IP Compound:
654\-
655.IP Miss:
656& <original> <count> <offset>: <miss>, <miss>, ..., <guess>, ...
657.IP Guess:
658? <original> 0 <offset>: <guess>, <guess>, ...
659.IP None:
660# <original> <offset>
661.RE
662.PP
663For example, a dummy dictionary containing the words "fray", "Frey",
664"fry", and "refried" might produce the following response to the
665command "echo 'frqy refries' | ispell -a -m -d ./test.hash":
666.RS
667.nf
668(#) International Ispell Version 3.0.05 (beta), 08/10/91
669& frqy 3 0: fray, Frey, fry
670& refries 1 5: refried, re+fry-y+ies
671.fi
672.RE
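.PP
A program driving
.B \-a
mode might decode the result lines summarized above roughly as follows.
This is a minimal sketch based only on the formats listed in this manual;
the function name and the dictionary keys are hypothetical, and the caller
is assumed to have already discarded the one-line version message.

```python
def parse_result(line):
    """Decode one -a mode result line into a dict; None for a blank line."""
    line = line.rstrip("\n")
    if not line:
        return None  # blank line: end of results for one input line
    if line == "*":
        return {"kind": "ok"}
    if line == "-":
        return {"kind": "compound"}
    if line.startswith("+ "):
        return {"kind": "root", "root": line[2:]}
    if line.startswith("# "):
        _, original, offset = line.split(" ")
        return {"kind": "none", "original": original, "offset": int(offset)}
    if line[:2] in ("& ", "? "):
        head, _, tail = line.partition(": ")
        _, original, count, offset = head.split(" ")
        items = [item.strip() for item in tail.split(",")]
        count = int(count)
        return {
            "kind": "miss" if line[0] == "&" else "guess",
            "original": original,
            "offset": int(offset),
            # The count is the only thing separating misses from guesses.
            "misses": items[:count],
            "guesses": items[count:],
        }
    raise ValueError("unrecognized result line: %r" % line)


print(parse_result("& refries 1 5: refried, re+fry-y+ies"))
```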
673.PP
674This mode
675is also suitable for interactive use when you want to figure out the
676spelling of a single word.
677.PP
678The
679.B \-A
680option works just like
681.BR \-a ,
682except that if a line begins with the string "&Include_File&", the rest
683of the line is taken as the name of a file to read for further words.
684Input returns to the original file when the include file is exhausted.
685Inclusion may be nested up to five deep.
686The key string may be changed with the environment variable
687.B INCLUDE_STRING
688(the ampersands, if any, must be included).
689.PP
690When in the
691.B \-a
692mode,
693.I ispell
694will also accept lines of single words prefixed with any
695of '*', '&', '@', '+', '-', '~', '#', '!', '%', or '^'.
696A line starting with '*' tells
697.I ispell
698to insert the word into the user's dictionary (similar to the I command).
699A line starting with '&' tells
700.I ispell
701to insert an all-lowercase version of the word into the user's
702dictionary (similar to the U command).
703A line starting with '@' causes
704.I ispell
705to accept this word in the future (similar to the A command).
706A line starting with '+', followed immediately by
707.B tex
708or
709.B nroff
710will cause
711.I ispell
712to parse future input according to the syntax of that formatter.
713A line consisting solely of a '+' will place
714.I ispell
715in TeX/LaTeX mode (similar to the
716.B \-t
717option) and '-' returns
718.I ispell
719to nroff/troff mode (but these commands are obsolete).
720However, string character type is
721.I not
722changed;
723the '~' command must be used to do this.
724A line starting with '~' causes
725.I ispell
726to set internal parameters (in particular, the default string
727character type) based on the filename given in the rest of the line.
728(A file suffix is sufficient, but the period must be included.
729Instead of a file name or suffix, a unique name, as listed in the language
730affix file, may be specified.)
731However, the formatter parsing is
732.I not
733changed;  the '+' command must be used to change the formatter.
734A line prefixed with '#' will cause the
735personal dictionary to be saved.
736A line prefixed with '!' will turn on
737.I terse
738mode (see below), and a line prefixed with '%' will return
739.I ispell
740to normal (non-terse) mode.
741Any input following the prefix
742characters '+', '-', '#', '!', or '%' is ignored, as is any input
743following the filename on a '~' line.
744To allow spell-checking of lines beginning with these characters, a
745line starting with '^' has that character removed before it is passed
746to the spell-checking code.
747It is recommended that programmatic interfaces prefix every data line
748with an uparrow to protect themselves against future changes in
749.IR ispell .
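.PP
The recommendation to prefix every data line with an uparrow might be
followed along these lines.
This is a sketch only: the function name and the choice to switch on terse
mode first are assumptions, and how the text is actually written to the
.I ispell
process is left to the caller.

```python
def build_pipe_input(lines, terse=True):
    """Build text to send to "ispell -a", protecting each data line.

    Every data line gets a leading '^' so that a line which happens to
    begin with a command character is spell-checked, not interpreted.
    """
    out = []
    if terse:
        out.append("!")  # '!' turns on terse mode
    out.extend("^" + line for line in lines)
    return "\n".join(out) + "\n"


print(build_pipe_input(["*starred text", "plain text"]), end="")
```

Without the uparrow, the first line in this example would be read as a
request to add a word to the personal dictionary.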
750.PP
751To summarize these:
752.PP
753.RS
754.IP *
755Add to personal dictionary
756.IP @
757Accept word, but leave out of dictionary
758.IP #
759Save current personal dictionary
760.IP ~
761Set parameters based on filename
762.IP +
763Enter TeX mode
764.IP -
765Exit TeX mode
766.IP !
767Enter terse mode
768.IP %
769Exit terse mode
770.IP ^
771Spell-check rest of line
772.fi
773.RE
774.PP
775In
776.I terse
777mode,
778.I ispell
779will not print lines beginning with '*', '+', or '\-', all of which
780indicate correct words.
781This significantly improves running speed when the driving program is
782going to ignore correct words anyway.
783.PP
784The
785.B \-s
786option is only valid in conjunction with the
787.B \-a
788or
789.B \-A
790options, and only on BSD-derived systems.
791If specified,
792.I ispell
793will stop itself with a
794.B SIGTSTP
795signal after each line of input.
796It will not read more input until it receives a
797.B SIGCONT
798signal.
799This may be useful for handshaking with certain text editors.
800.PP
801The
802.B \-f
803option is only valid in conjunction with the
804.B \-a
805or
806.B \-A
807options.
808If
809.B \-f
810is specified,
811.I ispell
812will write its results to the given file, rather than to standard output.
813.PP
814The
815.B \-v
816option causes
817.I ispell
818to print its current version identification on the standard output
819and exit.
820If the switch is doubled,
821.I ispell
822will also print the options that it was compiled with.
823.PP
824The
825.BR \-c ,
826.BR \-e [ 1-4 ],
827and
828.B \-D
829options of
830.I ispell
831are primarily intended for use by the
832.I munchlist
833shell script.
834The
835.B \-c
836switch causes a list of words to be read from the standard input.
837For each word, a list of possible root words and affixes will be
838written to the standard output.
839Some of the root words will be illegal and must be filtered from the
840output by other means;
841the
842.I munchlist
843script does this.
844As an example, the command:
845.PP
846.RS
847echo BOTHER | ispell -c
848.RE
849.PP
850produces:
851.PP
852.RS
853.nf
854BOTHER BOTHE/R BOTH/R
855.fi
856.RE
857.PP
858The
859.B \-e
860switch is the reverse of
861.BR \-c ;
862it expands affix flags to produce a list of words.
863For example, the command:
864.PP
865.RS
866echo BOTH/R | ispell -e
867.RE
868.PP
869produces:
870.PP
871.RS
872.nf
873BOTH BOTHER
874.fi
875.RE
876.PP
877An optional expansion level can also be specified.  A level of 1
878.RB ( \-e1 )
879is the same as
880.B \-e
881alone.
882A level of 2 causes the original root/affix combination to be
883prepended to the line:
884.PP
885.RS
886.nf
887BOTH/R BOTH BOTHER
888.fi
889.RE
890.PP
891A level of 3 causes multiple lines to be output, one for each
892generated word, with the original root/affix combination followed by
893the word it creates:
894.PP
895.RS
896.nf
897BOTH/R BOTH
898BOTH/R BOTHER
899.fi
900.RE
901.PP
902A level of 4 causes a floating-point number to be appended to each of
903the level-3 lines, giving the ratio of the total length of all generated
904words (including the root) to the length of the root:
905.PP
906.RS
907.nf
908BOTH/R BOTH 2.500000
909BOTH/R BOTHER 2.500000
910.fi
911.RE
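.PP
The figure in the level-4 example is consistent with dividing the total
length of the generated words by the length of the root.
The sketch below checks that arithmetic; it is an inference from this one
example, not a statement about how
.I ispell
computes the value, and the function name is hypothetical.

```python
def expansion_ratio(root, words):
    """Total length of all generated words divided by the root length."""
    return sum(len(w) for w in words) / len(root)


# BOTH/R expands to BOTH (4 letters) and BOTHER (6 letters): 10 / 4.
print("%.6f" % expansion_ratio("BOTH", ["BOTH", "BOTHER"]))  # 2.500000
```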
912.PP
913Finally, the
914.B \-D
915flag causes the affix tables from the dictionary file
916to be dumped to standard output.
917.PP
918Unless your system administrator has suppressed the feature to save space,
919.I ispell
920is aware of the correct capitalizations of words in the dictionary and
921in your personal dictionary.
922As well as recognizing words that must be capitalized (e.g., George) and
923words that must be all-capitals (e.g., NASA), it can also handle words
924with "unusual" capitalization (e.g., "ITCorp" or "TeX").
925If a word is capitalized incorrectly, the list of possibilities will
926include all acceptable capitalizations.
927(More than one capitalization may be acceptable;
928for example, my dictionary lists both "ITCorp" and "ITcorp".)
929.PP
930Normally, this feature will not cause you surprises, but there is one
931circumstance you need to be aware of.
932If you use "I" to
933add a word to your dictionary that is at the beginning of a sentence
934(e.g., the first word of this paragraph if "normally" were not in the
935dictionary), it will be marked as "capitalization required".
936A subsequent usage of this word without capitalization (e.g., the quoted word
937in the previous sentence) will be considered a misspelling by
938.IR ispell ,
939and it will suggest the capitalized version.
940You must then compare the actual spellings by eye, and then type "I"
941to add the uncapitalized variant to your personal dictionary.
942You can avoid this problem by using "U" to add the original word,
943rather than "I".
944.PP
945The rules for capitalization are as follows:
946.IP (1)
947Any word may appear in all capitals, as in headings.
948.IP (2)
949Any word that is in the dictionary in all-lowercase form may appear
950either in lowercase or capitalized (as at the beginning of a sentence).
951.IP (3)
952Any word that has "funny" capitalization (i.e., it contains both cases
953and there is an uppercase character besides the first) must appear
954exactly as in the dictionary, except as permitted by rule (1).
955If the word is acceptable in all-lowercase, it must appear thus in a
956dictionary entry.
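.PP
The three capitalization rules above can be sketched as a single check.
This is an illustration of the rules as stated, not ispell's implementation:
the function name and the list-of-entries dictionary are assumptions, and
entries such as "George" are handled by requiring an exact match unless
rule (1) applies.

```python
def capitalization_ok(word, entries):
    """Check a word's capitalization against dictionary entries."""
    matches = [e for e in entries if e.lower() == word.lower()]
    if not matches:
        return False  # not in the dictionary at all
    if word.isupper():
        return True  # rule (1): all capitals is always acceptable
    for entry in matches:
        if entry == word:
            return True  # exact match, which rule (3) requires
        if entry.islower() and word == entry.capitalize():
            return True  # rule (2): lowercase entry, capitalized use
    return False


# Hypothetical entries, using the examples from the text.
entries = ["normally", "George", "NASA", "TeX"]
for candidate in ["Normally", "george", "Tex", "TEX"]:
    print(candidate, capitalization_ok(candidate, entries))
```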
957.SS buildhash
958.PP
959The
960.I buildhash
961program builds hashed dictionary files for later use by
962.IR ispell .
963The raw word list (with affix flags) is given in
964.IR dict-file ,
965and the affix flags are defined by
966.IR affix-file .
967The hashed output is written to
968.IR hash-file .
969The formats of the two input files are described in
970.IR ispell (4).
971The
972.B \-s
973(silent) option suppresses the usual status messages that are written
974to the standard error device.
975.SS munchlist
976.PP
977The
978.I munchlist
979shell script is used to reduce the size of dictionary files,
980primarily personal dictionary files.
981It is also capable of combining dictionaries from various sources.
982The given
983.I files
984are read (standard input if no arguments are given),
985reduced to a minimal set of roots and affixes that will match the
986same list of words, and written to standard output.
987.PP
988Input for munchlist consists of raw words (e.g., from your personal
989dictionary files) or root and affix combinations (probably generated
990in earlier munchlist runs).  Each word or root/affix combination must
991be on a separate line.
.PP
The
.B \-D
(debug) option leaves temporary files around under standard names instead
of deleting them, so that the script can be debugged.
Warning:
this option can eat up an enormous amount of temporary file space.
.PP
The
.B \-v
(verbose) option causes progress messages to be reported to stderr so
you won't get nervous that
.I munchlist
has hung.
.PP
If the
.B \-s
(strip) option is specified, words that are in the specified
.I hash-file
are removed from the word list.
This can be useful with personal dictionaries.
.PP
The
.B \-l
option can be used to specify an alternate
.I affix-file
for munching dictionaries in languages other than English.
.PP
The
.B \-c
option can be used to convert dictionaries that were built with an
older affix file, without risk of accidentally introducing unintended
affix combinations into the dictionary.
.PP
The
.B \-T
option allows dictionaries to be converted to a canonical
string-character format.
The suffix specified is looked up in the affix file
.RB ( \-l
switch)
to determine the string-character format used for the input file;
the output always uses the canonical string-character format.
For example, a dictionary collected from TeX source files might be
converted to canonical format by specifying
.BR "\-T tex" .
.PP
The
.B \-w
option is passed on to
.IR ispell .
.SS findaffix
.PP
The
.I findaffix
shell script is an aid to writers of new language descriptions in choosing
affixes.
The given dictionary
.I files
(standard input if none are given) are examined for possible prefixes
.RB ( \-p
switch) or suffixes
.RB ( \-s
switch, the default).
Each commonly-occurring affix is presented along with
a count of the number of times it appears
and an estimate of the number of bytes that would be saved in a dictionary
hash file if it were added to the language table.
Only affixes that generate legal roots (found in the original input)
are listed.
.PP
If the
.B \-c
option is not given, the output lines are in the
following format:
.IP
strip/add/count/bytes
.PP
where
.I strip
is the string that should be stripped from a root
word before adding the affix,
.I add
is the affix to be added,
.I count
is a count of the number of times that this
.IR strip / add
combination appears, and
.I bytes
is an estimate of the number of bytes that
might be saved in the raw dictionary file if this combination is
added to the affix file.
The field separator in the output is the character specified by the
.B \-t
switch; the default is a slash ("/").
.PP
If the
.B \-c
("clean output") option is given, the appearance of
the output is made visually cleaner (but harder to post-process)
by changing it to:
.IP
-strip+add<tab>count<tab>bytes
.PP
where
.IR strip ,
.IR add ,
.IR count ,
and
.I bytes
are as before, and
.I "<tab>"
represents the ASCII tab character.
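.PP
For post-processing, the two line layouts can be read back with a few
lines of Python.
This is a minimal sketch, not part of the ispell distribution: the
function names are hypothetical, the sample values are invented, and it
assumes that an empty
.I strip
string is written as an empty field.
.PP
.RS
.nf
```python
from typing import NamedTuple


class Affix(NamedTuple):
    strip: str
    add: str
    times: int  # the "count" field
    saved: int  # the "bytes" field


def parse_default(line, sep="/"):
    """Parse one strip/add/count/bytes line (output without -c)."""
    strip, add, times, saved = line.rstrip().split(sep)
    return Affix(strip, add, int(times), int(saved))


def parse_clean(line):
    """Parse one -strip+add<tab>count<tab>bytes line (output with -c)."""
    pair, times, saved = line.rstrip().split(chr(9))  # chr(9) is the tab
    strip, add = pair[1:].split("+", 1)  # drop the leading "-"
    return Affix(strip, add, int(times), int(saved))


# Invented sample values, for illustration only.
print(parse_default("e/ing/120/480"))
print(parse_clean("-e+ing" + chr(9) + "120" + chr(9) + "480"))
```
.fi
.RE
.PP
If a different separator was chosen for the default layout, pass the
same character as the
.I sep
argument of the sketch.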
.PP
The method used to generate possible affixes will also generate
longer affixes which have common headers or trailers.
For example,
the two words "moth" and "mother" will generate not only the obvious
substitution "+er" but also "-h+her" and "-th+ther" (and possibly
even longer ones, depending on the value of
.IR min ).
To prevent
cluttering the output with such affixes, any affix pair that shares
a common header (or, for prefixes, trailer) string longer than
.I elim
characters (default 1) will be suppressed.
You may want to set
.I elim
to a value greater than 1 if your language has string
characters;
usually the need for this parameter will become obvious
when you examine the output of your
.I findaffix
run.
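.PP
The "moth" and "mother" case can be reproduced with a short Python
sketch.
It is an illustration under stated assumptions, not the algorithm the
script uses: the function and parameter names are hypothetical, only
suffixes are handled, and "longer than
.I elim
characters" is read literally, so a shared header of exactly
.I elim
characters is kept.
.PP
.RS
.nf
```python
import os.path


def suffix_candidates(short, long, min_stem=3, elim=1):
    """List (strip, add) pairs that turn short into long.

    min_stem stands for the minimum stem length and elim for the limit
    on the shared header; both names are assumptions of this sketch.
    """
    pairs = []
    for stem_len in range(len(short), min_stem - 1, -1):
        stem = short[:stem_len]
        if not long.startswith(stem):
            continue  # a shorter stem may still match
        strip, add = short[stem_len:], long[stem_len:]
        if strip == add:
            continue  # nothing would change
        # commonprefix compares character by character.
        header = len(os.path.commonprefix([strip, add]))
        if header <= elim:
            pairs.append((strip, add))
    return pairs


print(suffix_candidates("moth", "mother"))
print(suffix_candidates("moth", "mother", min_stem=2, elim=2))
```
.fi
.RE
.PP
With the defaults used here, the first call yields the "+er" and
"-h+her" pairs; "-th+ther" appears only in the second call, where both
limits are relaxed.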
.PP
Normally, the affixes are sorted according to the estimate of bytes saved.
The
.B \-f
switch may be used to cause the affixes to be sorted by frequency of
appearance.
.PP
To save output file space,
affixes which occur fewer than 10 times are eliminated;
this limit may be changed with the
.B \-l
switch.
The
.B \-M
switch specifies a maximum affix length (default 8).
Affixes longer than this will not be reported.
(This saves on temporary disk space and makes the script run faster.)
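.PP
The effect of these limits and of the sort order can be sketched in
Python as follows.
This is only an approximation of the behavior described above: the
function name, the tuple layout, and the sample rows are invented, the
length limit is assumed to apply to the
.I add
string, and the largest values are assumed to be listed first.
.PP
.RS
.nf
```python
def report(rows, low=10, max_len=8, by_frequency=False):
    """Filter and order (strip, add, count, saved) tuples.

    low and max_len stand in for the -l and -M limits; by_frequency
    stands in for -f. The tuple layout is an assumption of this sketch.
    """
    kept = [r for r in rows if r[2] >= low and len(r[1]) <= max_len]
    key_index = 2 if by_frequency else 3
    return sorted(kept, key=lambda r: r[key_index], reverse=True)


# Invented sample rows, for illustration only.
rows = [
    ("", "s", 900, 1800),
    ("e", "ing", 120, 2400),
    ("", "ness", 9, 45),          # dropped: fewer than 10 occurrences
    ("", "ification", 15, 150),   # dropped: longer than 8 characters
]
print(report(rows))
print(report(rows, by_frequency=True))
```
.fi
.RE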
.PP
Affixes which generate stems shorter than 3 characters are suppressed.
(A stem is the word after the
.I strip
string has been removed, and before the
.I add
string has been added.)
This reduces both the running time and the size of the output file.
This limit may be changed with the
.B \-m
switch.
The minimum stem length should only be set to 1 if you have a
.I lot
of free time and disk space (in the range of many days and hundreds of
megabytes).
.PP
The
.I findaffix
script requires a non-blank field-separator character for internal
use.
Normally, this character is a slash ("/"), but if the slash
appears as a character in the input word list, a different character
can be specified with the
.B \-t
switch.
.PP
Ispell dictionaries should be expanded before being fed to
.IR findaffix ;
in addition, characters that are not in the English alphabet (if any) should
be translated to lowercase.
.SS tryaffix
.PP
The
.I tryaffix
shell script is used to estimate the effectiveness of a proposed
prefix
.RB ( \-p
switch) or suffix
.RB ( \-s
switch, the default) with a given
.IR expanded-file .
Only one affix can be tried with each execution of
.IR tryaffix ,
although multiple arguments can be used to describe varying forms of the
same affix flag (e.g., the
.B D
flag for English can add either
.I D
or
.I ED
depending on whether a trailing E is already present).
Each word in the expanded dictionary that ends (or begins) with the chosen
suffix (or prefix) has that suffix (prefix) removed;
the dictionary is then searched for root words that match the stripped word.
Normally, all matching roots are written to standard output, but if the
.B \-c
(count) flag is given, only a statistical summary of the results is written.
The statistics given are a count of words the affix potentially applies to
and an estimate of the number of dictionary bytes that a flag using the
affix would save.
The estimate will be high if the flag generates words
that are currently generated by other affix flags
(e.g., in English,
.I bathers
can be generated by either
.I bath/X
or
.IR bather/S ).
.PP
The dictionary file,
.IR expanded-file ,
must already be expanded (using the
.B \-e
switch of
.IR ispell )
and sorted, and results are usually best if uppercase
has been folded to lowercase with
.IR tr (1).
.PP
The
.I affix
arguments are things to be stripped from the dictionary
file to produce trial roots:
for English,
.I con
(prefix) and
.I ing
(suffix) are examples.
The
.I addition
parts of the argument are letters that would have
been stripped off the root before adding the affix.
For example, in English the affix
.I ing
normally strips
.I e
for words ending in that letter (e.g.,
.I like
becomes
.IR liking )
so we might run:
.PP
.RS
.nf
tryaffix ing ing+e
.fi
.RE
.PP
to cover both cases.
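.PP
The matching step for suffixes can be pictured with the following
Python sketch.
It is a guess at the behavior described above rather than a port of the
script: the function name and the word list are invented, and only the
affix and affix+addition argument forms are modelled.
.PP
.RS
.nf
```python
def try_suffix(words, *specs):
    """Return {word: root} for words whose stripped form is a known root.

    Each spec is "affix" or "affix+addition", as in the tryaffix
    arguments; the matching rules are an assumption of this sketch.
    """
    known = set(words)
    found = {}
    for spec in specs:
        affix, _, addition = spec.partition("+")
        for word in sorted(known):
            if word.endswith(affix) and len(word) > len(affix):
                # Remove the affix, then restore the stripped letters.
                root = word[: len(word) - len(affix)] + addition
                if root in known:
                    found.setdefault(word, root)
    return found


words = ["like", "liking", "walk", "walking", "king", "sing"]
print(try_suffix(words, "ing", "ing+e"))
```
.fi
.RE
.PP
Here "walking" matches the plain
.I ing
form and "liking" matches
.IR ing+e ,
while "king" and "sing" produce no root because neither "k" nor "s"
(with or without the added "e") is in the list.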
.PP
All of the shell scripts contain documentation as commentary at the
beginning;
sometimes these comments contain useful information beyond the scope
of this manual page.
.PP
It is possible to install
.I ispell
so that it supports only ASCII-range text, if desired.
.SS icombine
The
.I icombine
program is a helper for
.IR munchlist .
It reads a list of words in dictionary format (roots plus flags) from
the standard input, and produces a reduced list on standard output
which combines common roots found on adjacent entries.
Identical roots which have differing flags will have their flags
combined, and roots which have differing capitalizations will be
combined in a way which only preserves important capitalization
information.
The optional
.I aff-file
specifies a language file which defines the character sets used and
the meanings of the various flags.
The
.B \-T
switch can be used to select among alternative string character types
by giving a dummy suffix that can be found in an
.B altstringtype
statement.
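.PP
The flag-combining half of this job can be sketched in Python.
The sketch is an illustration only: the function name is hypothetical,
it assumes single-character flags following one slash, and it makes no
attempt at the capitalization handling described above.
.PP
.RS
.nf
```python
from itertools import groupby


def combine(entries):
    """Merge flags of adjacent entries that share the same root."""
    split = (e.partition("/")[::2] for e in entries)  # (root, flags)
    merged = []
    for root, group in groupby(split, key=lambda pair: pair[0]):
        flags = sorted(set("".join(f for _, f in group)))
        merged.append(root + ("/" + "".join(flags) if flags else ""))
    return merged


print(combine(["bath", "bath/S", "walk/DG", "walk/GS"]))
```
.fi
.RE
.PP
Because only adjacent entries are compared, the input to the sketch
must already be sorted, as is the case for the real program's input.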
.SS ijoin
The
.I ijoin
program is a re-implementation of
.IR join (1)
which handles long lines and 8-bit characters correctly.
The
.B \-s
switch specifies that the
.IR sort (1)
program used to prepare the input to
.I ijoin
uses signed comparisons on 8-bit characters;
the
.B \-u
switch specifies that
.IR sort (1)
uses unsigned comparisons.
All other options and behaviors of
.IR join (1)
are duplicated as exactly as possible based on the manual page, except
that
.I ijoin
will not handle newline as a field separator.
See the
.IR join (1)
manual page for more information.
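.PP
The difference between the two comparison modes only shows up for
bytes above 127.
The following Python sketch, which is not part of ispell and uses
invented sample keys, orders the same keys both ways:
.PP
.RS
.nf
```python
def signed(byte):
    """Reinterpret an unsigned 8-bit value as a signed char."""
    return byte - 256 if byte > 127 else byte


# 0xE9 is e-acute in ISO Latin-1; the keys are invented samples.
keys = [b"zebra", bytes([0xE9]) + b"cole", b"apple"]

unsigned_order = sorted(keys)  # bytes compare as values 0..255
signed_order = sorted(keys, key=lambda k: [signed(b) for b in k])
print(unsigned_order)
print(signed_order)
```
.fi
.RE
.PP
With unsigned comparisons the key starting with byte 0xE9 sorts last;
with signed comparisons it sorts first, since 0xE9 becomes a negative
value.
A join that assumes the wrong order would see its input as unsorted,
which is presumably why the switch must match the behavior of the local
.IR sort (1).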
.SH ENVIRONMENT
.IP DICTIONARY
Default dictionary to use, if no
.B \-d
flag is given.
.IP WORDLIST
Personal dictionary file name.
.IP INCLUDE_STRING
Code for file inclusion under the
.B \-A
option.
.IP TMPDIR
Directory used for some of munchlist's temporary files.
.SH FILES
.IP !!LIBDIR!!/!!DEFHASH!!
Hashed dictionary (may be found in some other local directory,
depending on the system).
.IP !!LIBDIR!!/!!DEFLANG!!
Affix-definition file for
.IR munchlist .
.IP "/usr/dict/web2 or /usr/dict/words"
For the Lookup function (depending on the WORDS compilation option).
.IP $HOME/.ispell_\fIhashfile\fP
User's private dictionary.
.IP .ispell_\fIhashfile\fP
Directory-specific private dictionary.
.SH SEE ALSO
.IR spell (1),
.IR egrep (1),
.IR look (1),
.IR join (1),
.IR sort (1),
.IR sq (1L),
.IR tib (1L),
.IR ispell (4L),
.IR english (4L)
.SH BUGS
It takes several to many seconds for
.I ispell
to read in the hash table, depending on size.
.sp
When all options are enabled,
.I ispell
may take several seconds to generate all the guesses at corrections for
a misspelled word;
on slower machines this time is long enough to be annoying.
.sp
The hash table is stored as a quarter-megabyte (or larger) array, so a PDP-11
or 286 version does not seem likely.
.sp
.I Ispell
should understand more
.I troff
syntax, and deal more intelligently with contractions.
.sp
Although small personal dictionaries are sorted before they are written out,
the order of capitalizations of the same word is somewhat random.
.sp
When the
.B \-x
flag is specified,
.I ispell
will unlink any existing .bak file.
.sp
There are too many flags, and many of them have non-mnemonic names.
.sp
.I Munchlist
does not deal very gracefully with dictionaries which contain
"non-word" characters.
Such characters ought to be deleted from the dictionary with a warning
message.
.sp
.I Findaffix
and
.I munchlist
require tremendous amounts of temporary file space for
large dictionaries.
They do respect the TMPDIR environment variable, so this space can be
redirected.
However, a lot of the temporary space needed is for sorting, so TMPDIR
is only a partial help on systems with an uncooperative
.IR sort (1).
("Cooperative" is defined as accepting the undocumented \-T switch.)
At its peak usage,
.I munchlist
needs temporary space of 10 to 40 times the size of the original
dictionary.
(The larger ratio is for dictionaries that already have heavy affix
use, such as the one distributed with
.IR ispell .)
.I Munchlist
is also very slow;
munching a normal-sized dictionary (15K roots, 45K expanded words) takes
around an hour on a small workstation.
(Most of this time is spent in
.IR sort (1),
and
.I munchlist
can run much faster on machines that have a more modern
.I sort
that makes better use of the memory available to it.)
.I Findaffix
is even worse;
the smallest English dictionary cannot be processed with this script in
a mere 50Kb of free space, and even after specifying switches to
reduce the temporary space required, the script will run for over 24 hours
on a small workstation.
.SH AUTHOR
Pace Willisson (pace@mit-vax), 1983, based on the PDP-10 assembly version.
That version was written by
R. E. Gorin in 1971,
and later revised by W. E. Matson (1974) and W. B. Ackerman (1978).
.P
Collected, revised, and enhanced for the Usenet by Walt Buehring, 1987.
.P
Table-driven multi-lingual version by Geoff Kuenning, 1987-88.
.P
Large dictionaries provided by Bob Devine (vianet!devine).
.P
A complete list of contributors is too large to list here, but is
distributed with the ispell sources in the file "Contributors".
.SH VERSION
The version of ispell described by this manual page is
International Ispell Version 3.1.00, 10/08/93.