Field Indexes
=============

Field indexes index orderable values.  Note that they don't check for
orderability. That is, all of the values added to the index must be
orderable together. It is up to applications to provide only mutually
orderable values.

    >>> from zope.index.field import FieldIndex

    >>> index = FieldIndex()
    >>> index.index_doc(0, 6)
    >>> index.index_doc(1, 26)
    >>> index.index_doc(2, 94)
    >>> index.index_doc(3, 68)
    >>> index.index_doc(4, 30)
    >>> index.index_doc(5, 68)
    >>> index.index_doc(6, 82)
    >>> index.index_doc(7, 30)
    >>> index.index_doc(8, 43)
    >>> index.index_doc(9, 15)
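
As a rough illustration of what "mutually orderable" means, an application
could check a batch of values before handing them to an index. The helper
``mutually_orderable`` below is hypothetical and not part of zope.index; it
only relies on Python 3 raising ``TypeError`` when unlike types are compared
during sorting, so treat it as a sketch rather than a complete check.

```python
def mutually_orderable(values):
    """Return True if the values can be sorted together.

    Hypothetical helper for illustration; not part of zope.index.
    """
    try:
        sorted(values)
    except TypeError:
        # Python 3 refuses to compare e.g. int and str.
        return False
    return True


all_ints = mutually_orderable([6, 26, 94, 68])
mixed = mutually_orderable([6, "hi", 94])
print(all_ints)   # True
print(mixed)      # False on Python 3
```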

Field indexes are searched with apply.  The argument is a tuple
with a minimum and maximum value:

    >>> index.apply((30, 70))
    IFSet([3, 4, 5, 7, 8])
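
To make the inclusive range semantics concrete, here is a minimal
standard-library sketch. It does not use zope.index and says nothing about
how the real index is implemented; ``range_search`` and ``values_by_doc`` are
names invented for this illustration.

```python
from bisect import bisect_left, bisect_right


def range_search(values_by_doc, minimum, maximum):
    """Return sorted doc ids whose value lies in [minimum, maximum]."""
    # Sort (value, doc id) pairs so that a range is a contiguous slice.
    pairs = sorted((value, doc_id) for doc_id, value in values_by_doc.items())
    keys = [value for value, _ in pairs]
    lo = bisect_left(keys, minimum)    # first position with value >= minimum
    hi = bisect_right(keys, maximum)   # first position with value > maximum
    return sorted(doc_id for _, doc_id in pairs[lo:hi])


# Same doc id -> value pairs as indexed above.
values_by_doc = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30,
                 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}
result = range_search(values_by_doc, 30, 70)
print(result)   # [3, 4, 5, 7, 8]
```

Both end points are included, which is why documents 4 and 7 (value 30)
appear in the result.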

A common mistake is to pass a single value.  If anything other than a
two-tuple is passed, a type error is raised:

    >>> index.apply('hi')
    Traceback (most recent call last):
    ...
    TypeError: ('two-length tuple expected', 'hi')

Open-ended ranges can be specified by providing None as an end point:

    >>> index.apply((30, None))
    IFSet([2, 3, 4, 5, 6, 7, 8])

    >>> index.apply((None, 70))
    IFSet([0, 1, 3, 4, 5, 7, 8, 9])

    >>> index.apply((None, None))
    IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

To do an exact value search, supply equal minimum and maximum values:

    >>> index.apply((30, 30))
    IFSet([4, 7])

    >>> index.apply((70, 70))
    IFSet([])
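
The query forms above (bounded, open-ended, exact) can be summarised by a
single predicate. The following is only a sketch of the semantics in plain
Python; ``in_range`` and ``matching_docs`` are hypothetical names and are not
provided by zope.index.

```python
def in_range(value, query):
    """Return True if value satisfies a (minimum, maximum) query.

    None as an end point means "no bound on that side".
    """
    minimum, maximum = query
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def matching_docs(values_by_doc, query):
    """Return the sorted doc ids whose value satisfies the query."""
    return sorted(d for d, v in values_by_doc.items() if in_range(v, query))


# Same doc id -> value pairs as indexed above.
values_by_doc = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30,
                 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}
print(matching_docs(values_by_doc, (30, None)))   # [2, 3, 4, 5, 6, 7, 8]
print(matching_docs(values_by_doc, (None, 70)))   # [0, 1, 3, 4, 5, 7, 8, 9]
print(matching_docs(values_by_doc, (30, 30)))     # [4, 7]
print(matching_docs(values_by_doc, (70, 70)))     # []
```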

Field indexes support basic statistics:

    >>> index.documentCount()
    10
    >>> index.wordCount()
    8
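
One way to read these two numbers: the document count is the number of
indexed documents, and the word count is the number of distinct values. The
sketch below reproduces the figures with a plain dict; the function names are
made up for this illustration and only mirror the methods shown above.

```python
def document_count(values_by_doc):
    """Number of indexed documents."""
    return len(values_by_doc)


def word_count(values_by_doc):
    """Number of distinct indexed values."""
    return len(set(values_by_doc.values()))


# Same doc id -> value pairs as indexed above; 30 and 68 each occur twice.
values_by_doc = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30,
                 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}
print(document_count(values_by_doc))   # 10
print(word_count(values_by_doc))       # 8
```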

Documents can be reindexed:

    >>> index.apply((15, 15))
    IFSet([9])
    >>> index.index_doc(9, 14)

    >>> index.apply((15, 15))
    IFSet([])
    >>> index.apply((14, 14))
    IFSet([9])

Documents can be unindexed:

    >>> index.unindex_doc(7)
    >>> index.documentCount()
    9
    >>> index.wordCount()
    8
    >>> index.unindex_doc(8)
    >>> index.documentCount()
    8
    >>> index.wordCount()
    7

    >>> index.apply((30, 70))
    IFSet([3, 4, 5])

Unindexing a document id that isn't present is ignored:

    >>> index.unindex_doc(8)
    >>> index.unindex_doc(80)
    >>> index.documentCount()
    8
    >>> index.wordCount()
    7

We can also clear the index entirely:

    >>> index.clear()
    >>> index.documentCount()
    0
    >>> index.wordCount()
    0

    >>> index.apply((30, 70))
    IFSet([])

Sorting
-------

Field indexes also implement the IIndexSort interface, which
provides a method for sorting document ids by their indexed
values.

    >>> index.index_doc(1, 9)
    >>> index.index_doc(2, 8)
    >>> index.index_doc(3, 7)
    >>> index.index_doc(4, 6)
    >>> index.index_doc(5, 5)
    >>> index.index_doc(6, 4)
    >>> index.index_doc(7, 3)
    >>> index.index_doc(8, 2)
    >>> index.index_doc(9, 1)

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
    [9, 7, 5, 4, 3, 2, 1]

We can also specify the ``reverse`` argument to reverse results:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
    [1, 2, 3, 4, 5, 7, 9]

And as per IIndexSort, we can limit results by specifying the ``limit``
argument:

    >>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
    [9, 7, 5]

If we pass an id that is not indexed by this index, it won't be included
in the result.

    >>> list(index.sort([2, 10]))
    [2]

    >>> index.clear()
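
The sorting behaviour shown in this section can be modelled in a few lines
of plain Python. This is a sketch of the observable semantics only, not of
how zope.index implements ``sort``; ``sort_ids`` is a hypothetical helper.

```python
from itertools import islice


def sort_ids(values_by_doc, doc_ids, reverse=False, limit=None):
    """Order doc ids by their indexed value, skipping unindexed ids."""
    known = (d for d in doc_ids if d in values_by_doc)
    ordered = sorted(known, key=values_by_doc.__getitem__, reverse=reverse)
    # islice with a stop of None yields everything.
    return list(islice(ordered, limit))


# Doc id -> value pairs as indexed in this section: 1 -> 9, 2 -> 8, ..., 9 -> 1.
values_by_doc = {doc_id: 10 - doc_id for doc_id in range(1, 10)}
ids = [4, 2, 9, 7, 3, 1, 5]
print(sort_ids(values_by_doc, ids))                 # [9, 7, 5, 4, 3, 2, 1]
print(sort_ids(values_by_doc, ids, reverse=True))   # [1, 2, 3, 4, 5, 7, 9]
print(sort_ids(values_by_doc, ids, limit=3))        # [9, 7, 5]
print(sort_ids(values_by_doc, [2, 10]))             # [2]
```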

Bugfix testing
--------------

It happened at least once that a value dropped out of the forward index
while the index still contained the document, and unindexing then broke.

    >>> index.index_doc(0, 6)
    >>> index.index_doc(1, 26)
    >>> index.index_doc(2, 94)
    >>> index.index_doc(3, 68)
    >>> index.index_doc(4, 30)
    >>> index.index_doc(5, 68)
    >>> index.index_doc(6, 82)
    >>> index.index_doc(7, 30)
    >>> index.index_doc(8, 43)
    >>> index.index_doc(9, 15)

    >>> index.apply((None, None))
    IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here is the damage:

    >>> del index._fwd_index[68]

Unindex should succeed:

    >>> index.unindex_doc(5)
    >>> index.unindex_doc(3)

    >>> index.apply((None, None))
    IFSet([0, 1, 2, 4, 6, 7, 8, 9])
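
The kind of defensive unindexing tested here can be sketched with two plain
dicts. ``ToyFieldIndex`` and its ``_fwd`` and ``_rev`` attributes are
assumptions made for this illustration; they only loosely mirror the
forward index mentioned above and are not the zope.index implementation.

```python
class ToyFieldIndex:
    """Simplified stand-in for a field index (illustration only)."""

    def __init__(self):
        self._fwd = {}   # value -> set of doc ids (hypothetical forward index)
        self._rev = {}   # doc id -> value (hypothetical reverse index)

    def index_doc(self, doc_id, value):
        self.unindex_doc(doc_id)
        self._fwd.setdefault(value, set()).add(doc_id)
        self._rev[doc_id] = value

    def unindex_doc(self, doc_id):
        if doc_id not in self._rev:
            return                     # unknown ids are ignored
        value = self._rev.pop(doc_id)
        docs = self._fwd.get(value)    # may be missing if the index is damaged
        if docs is None:
            return
        docs.discard(doc_id)
        if not docs:
            del self._fwd[value]

    def doc_ids(self):
        return sorted(self._rev)


toy = ToyFieldIndex()
for doc_id, value in enumerate([6, 26, 94, 68, 30, 68, 82, 30, 43, 15]):
    toy.index_doc(doc_id, value)

del toy._fwd[68]       # simulate the damage
toy.unindex_doc(5)     # must not raise
toy.unindex_doc(3)
print(toy.doc_ids())   # [0, 1, 2, 4, 6, 7, 8, 9]
```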

Optimizations
-------------

There is an optimization which makes sure that nothing is changed in the
internal data structures if the value of the document was not changed.

To test this optimization we patch the index instance so that any call to
unindex_doc raises a KeyError.

    >>> def unindex_doc(doc_id):
    ...     raise KeyError
    >>> index.unindex_doc = unindex_doc

Now we get a KeyError if we try to change the value.

    >>> index.index_doc(9, 14)
    Traceback (most recent call last):
    ...
    KeyError

Leaving the value unchanged doesn't call unindex_doc.

    >>> index.index_doc(9, 15)
    >>> index.apply((15, 15))
    IFSet([9])
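
A minimal sketch of such a short-circuit, assuming a simple doc id to value
mapping: the class, its ``_rev`` attribute and the ``unindex_calls`` counter
are invented for this example and are not part of zope.index.

```python
class ToyFieldIndex:
    """Simplified stand-in that counts unindex calls (illustration only)."""

    def __init__(self):
        self._rev = {}           # doc id -> value (hypothetical)
        self.unindex_calls = 0   # instrumentation for this example

    def unindex_doc(self, doc_id):
        self.unindex_calls += 1
        self._rev.pop(doc_id, None)

    def index_doc(self, doc_id, value):
        if doc_id in self._rev:
            if self._rev[doc_id] == value:
                return           # value unchanged: touch nothing
            self.unindex_doc(doc_id)
        self._rev[doc_id] = value


toy = ToyFieldIndex()
toy.index_doc(9, 15)
toy.index_doc(9, 15)             # same value, so no unindex
calls_when_unchanged = toy.unindex_calls
toy.index_doc(9, 14)             # changed value, so one unindex
calls_when_changed = toy.unindex_calls
print(calls_when_unchanged)      # 0
print(calls_when_changed)        # 1
```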