source: trunk/third/evolution/libibex/TODO @ 16770

Revision 16770, 1.9 KB checked in by ghudson, 23 years ago
This commit was generated by cvs2svn to compensate for changes in r16769, which included commits to RCS files with non-trunk default branches.
Stability
---------
* ibex_open should never crash, and should never return NULL without
errno being set. It should also check for errors when reading.


Performance
-----------
* Profiling, keep thinking about data structures, etc.

* Check memory usage

* See if writing the "inverse image" of long ref streams helps
compression without hurting performance now: if a word appears in
more than half of the files, write out the list of files it _doesn't_
appear in. (I tried this before, and it wasn't working well, but the
file format and data structures have changed a lot.)
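
The inverse-image idea can be sketched in a few lines. This is a minimal
Python illustration, not the ibex file format: encode_refs, decode_refs,
and the (inverted, ids) pair are hypothetical names, and files are
assumed to be numbered 0..num_files-1.

```python
def encode_refs(refs, num_files):
    """Return (inverted, ids) for one word's reference list.

    If the word appears in more than half of the files, store the
    files it does NOT appear in and set the inverted flag.
    """
    refs = set(refs)
    if len(refs) * 2 > num_files:
        return True, sorted(set(range(num_files)) - refs)
    return False, sorted(refs)


def decode_refs(inverted, ids, num_files):
    """Recover the original sorted reference list."""
    if inverted:
        return sorted(set(range(num_files)) - set(ids))
    return sorted(ids)
```

The stored list is never longer than half the file count, at the price
of one flag per word and a complement computation on read.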

* We could save a noticeable chunk of time if normalize_word computed
the hash of the word and we could somehow pass that into
g_hash_table_insert.
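
A rough Python sketch of what that would look like. g_hash_table_insert
itself takes only a key and a value, so the table below is a stand-in
with a hypothetical insert_hashed entry point; normalize_and_hash and
the h*31+c hash are likewise assumptions made for illustration.

```python
def normalize_and_hash(raw):
    """Lower-case ASCII letters and hash the word in the same pass."""
    h = 0
    chars = []
    for ch in raw:
        if "A" <= ch <= "Z":
            ch = chr(ord(ch) + 32)
        chars.append(ch)
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF  # keep the hash in 32 bits
    return "".join(chars), h


class WordTable:
    """Chained hash table whose callers supply the hash themselves."""

    def __init__(self, nbuckets=64):
        self.buckets = [[] for _ in range(nbuckets)]

    def insert_hashed(self, word, h, value):
        bucket = self.buckets[h % len(self.buckets)]
        for entry in bucket:
            if entry[0] == h and entry[1] == word:
                entry[2] = value  # word already present: replace value
                return
        bucket.append([h, word, value])

    def lookup_hashed(self, word, h):
        for eh, ew, ev in self.buckets[h % len(self.buckets)]:
            if eh == h and ew == word:
                return ev
        return None
```

The word is walked once instead of twice; whether that is measurable in
practice would have to come out of profiling.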

* Make a copy of the buffer to be indexed (or provide an interface for
the caller to say ibex can munge the provided data) and then use that
rather than constantly copying things?
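
A sketch of the copy-once approach in Python. The function name
index_words and the may_munge flag are hypothetical; the word rule
(ASCII letters, digits, and any byte >= 0x80) is a simplification chosen
for the example.

```python
def index_words(data, may_munge=False):
    """Return zero-copy views of the normalized words in data.

    The buffer is copied at most once; if the caller allows munging
    and passed a bytearray, it is normalized in place with no copy.
    """
    if may_munge and isinstance(data, bytearray):
        buf = data
    else:
        buf = bytearray(data)  # the single copy

    # Lower-case ASCII letters in place.
    for i, b in enumerate(buf):
        if 0x41 <= b <= 0x5A:
            buf[i] = b + 0x20

    view = memoryview(buf)
    words = []
    start = None
    for i, b in enumerate(buf):
        is_word = (0x61 <= b <= 0x7A) or (0x30 <= b <= 0x39) or b >= 0x80
        if is_word:
            if start is None:
                start = i
        elif start is not None:
            words.append(view[start:i])  # slice of the view, no copy
            start = None
    if start is not None:
        words.append(view[start:len(buf)])
    return words
```

Each returned word is a slice into the one working buffer, so no
per-word copies are made until a word actually has to be stored.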


Functionality
-------------
* ibex file locking

* specify file mode in ibex_open

* ibex_find* need to normalize the search words... should this be done
by the caller or by ibex_find?

* There needs to be some way to do a secondary search after getting
results back from ibex_find* (e.g., for "foo near bar"). This either has
to be done by ibex, or requires us to export the normalize interface.
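
To illustrate why the normalize interface matters here, this is a
caller-side "near" check in Python, run on the text of a file that
ibex_find* has already returned. The names near, normalize, and max_gap
are hypothetical, and normalize is just lower-casing, standing in for
whatever ibex really does.

```python
import re


def normalize(word):
    # Stand-in for the real normalization; assumed to be lower-casing.
    return word.lower()


def near(text, a, b, max_gap=5):
    """True if a and b occur within max_gap words of each other."""
    tokens = [normalize(t) for t in re.findall(r"\w+", text)]
    a, b = normalize(a), normalize(b)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)
```

The check only agrees with the index if the caller tokenizes and
normalizes exactly as ibex does, which is the argument for either
exporting that interface or doing the secondary search inside ibex.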

* Does there need to be an ibex_find_any, or is that easy enough for the
caller to do?
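
For comparison, the caller-side version is short. In this Python sketch
find is any callable that returns the set of files for one normalized
word; find_any here is a hypothetical helper, and the default normalize
(lower-casing) is an assumption.

```python
def find_any(find, words, normalize=str.lower):
    """Union of the per-word results: files containing any of the words."""
    results = set()
    for word in words:
        results.update(find(normalize(word)))
    return results
```

For example, with a toy in-memory index:

    index = {"foo": {"a.txt"}, "bar": {"a.txt", "b.txt"}}
    find_any(lambda w: index.get(w, set()), ["Foo", "bar"])

The cost is one lookup per word plus the set union, so the main reason
to provide it in ibex would be convenience rather than speed.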

* utf8_trans needs to cover at least two more code pages. This is
tricky because it's not clear whether some of the letters there should
be translated to ASCII or left as UTF-8. This requires some
investigation.

* ibex_index_* need to ignore HTML tags.
  NAME = [A-Za-z][A-Za-z0-9.-]*
  </?{NAME}(\s+{NAME}(\s*=\s*({NAME}|"[^"]*"|'[^']*'))?)*\s*>
  <!(--([^-]|-[^-])*--\s*)*>

  ugh. ok, simplifying, we get:
  <[^!]([^"'>]*("[^"]*"|'[^']*'))*[^"'>]*> or
  <!(--([^-]|-[^-])*--\s*)*>

  which is still not simple. sigh.
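
A small Python sketch of the simplified approach, to show how the two
patterns would be applied. The patterns are adapted from the ones above
(balanced parentheses, trailing unquoted text allowed before the closing
">", and the comment body repeated), and strip_markup is a hypothetical
helper; none of this has been tested against real mail.

```python
import re

# A tag: "<", any first character except "!", then unquoted text and
# quoted attribute values, up to the closing ">".
TAG = re.compile(r"""<[^!]([^"'>]*("[^"]*"|'[^']*'))*[^"'>]*>""")

# An SGML-style declaration made of zero or more "-- ... --" comments.
COMMENT = re.compile(r"<!(--([^-]|-[^-])*--\s*)*>")


def strip_markup(text):
    """Replace comments, then tags, with a space so words stay apart."""
    text = COMMENT.sub(" ", text)
    return TAG.sub(" ", text)
```

Comments are removed first so that a tag inside a comment is not seen
separately. Plain text such as "1 < 2 and 3 > 2" would still be eaten
as a tag, so the indexer would need to know the input really is HTML.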

* ibex_index_* need to recognize and ignore "non-text". Particularly
BinHex and uuencoding.
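
As a starting point, uuencoded blocks are easy to spot because they sit
between a "begin <mode> <name>" line and an "end" line. The Python
sketch below drops such blocks from a list of lines; strip_uuencoded is
a hypothetical helper, and BinHex is not handled (it would need its own
check, for instance on the "(This file must be converted with BinHex
4.0)" banner).

```python
import re

# "begin", an octal mode of three or four digits, then a file name.
UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S")


def strip_uuencoded(lines):
    """Return lines with everything from 'begin ...' to 'end' removed."""
    out = []
    in_block = False
    for line in lines:
        if in_block:
            if line.strip() == "end":
                in_block = False
            continue
        if UU_BEGIN.match(line):
            in_block = True
            continue
        out.append(line)
    return out
```

A "begin" line with no matching "end" swallows the rest of the input, so
a more careful version would also validate the encoded lines in between.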