Field Indexes
=============

Field indexes index orderable values. Note that they don't check for
orderability: all of the values added to an index must be orderable
together, and it is up to applications to provide only mutually
orderable values.

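To make "mutually orderable" concrete, here is a small plain-Python sketch
that does not touch the index at all. The ``mutually_orderable`` helper is a
hypothetical name used only for illustration; it assumes Python 3, where
integers and strings cannot be compared with each other.

```python
# Plain-Python illustration of mutual orderability; no index involved.
values_ok = [6, 26, 94, 68]
values_mixed = [6, "twenty-six", 94]

def mutually_orderable(values):
    """Return True if the values can all be sorted together."""
    try:
        sorted(values)
    except TypeError:
        # Python 3 refuses to compare, for example, int with str.
        return False
    return True

print(mutually_orderable(values_ok))     # True
print(mutually_orderable(values_mixed))  # False
```

An application could run a check like this over its own data before indexing,
since the index will not do so.
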
>>> from zope.index.field import FieldIndex

>>> index = FieldIndex()
>>> index.index_doc(0, 6)
>>> index.index_doc(1, 26)
>>> index.index_doc(2, 94)
>>> index.index_doc(3, 68)
>>> index.index_doc(4, 30)
>>> index.index_doc(5, 68)
>>> index.index_doc(6, 82)
>>> index.index_doc(7, 30)
>>> index.index_doc(8, 43)
>>> index.index_doc(9, 15)

Field indexes are searched with ``apply``. The argument is a tuple
with a minimum and maximum value; both bounds are inclusive:

>>> index.apply((30, 70))
IFSet([3, 4, 5, 7, 8])

A common mistake is to pass a single value. If anything other than a
two-tuple is passed, a ``TypeError`` is raised:

>>> index.apply('hi')
Traceback (most recent call last):
...
TypeError: ('two-length tuple expected', 'hi')


Open-ended ranges can be specified by providing ``None`` as an end point:

>>> index.apply((30, None))
IFSet([2, 3, 4, 5, 6, 7, 8])

>>> index.apply((None, 70))
IFSet([0, 1, 3, 4, 5, 7, 8, 9])

>>> index.apply((None, None))
IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

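The range semantics shown above can be modelled in a few lines of plain
Python. This is only an illustrative sketch: the ``values`` mapping and the
``in_range`` and ``search`` helpers are hypothetical names, and the sketch
says nothing about how ``FieldIndex`` is implemented internally.

```python
# Toy model of the documents indexed above: docid -> value.
values = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30,
          5: 68, 6: 82, 7: 30, 8: 43, 9: 15}

def in_range(value, minimum, maximum):
    """Inclusive range test; None leaves that end of the range open."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True

def search(query):
    """Return the sorted docids whose value lies within (minimum, maximum)."""
    if not isinstance(query, tuple) or len(query) != 2:
        raise TypeError("two-length tuple expected", query)
    minimum, maximum = query
    return sorted(d for d, v in values.items() if in_range(v, minimum, maximum))

print(search((30, 70)))    # [3, 4, 5, 7, 8]
print(search((30, None)))  # [2, 3, 4, 5, 6, 7, 8]
print(search((None, 70)))  # [0, 1, 3, 4, 5, 7, 8, 9]
```
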
To do an exact value search, supply equal minimum and maximum values:

>>> index.apply((30, 30))
IFSet([4, 7])

>>> index.apply((70, 70))
IFSet([])

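Because ``apply`` rejects a bare value, applications that mostly do exact
lookups may find a small wrapper convenient. The ``exact`` helper below is a
hypothetical convenience, not part of zope.index; it only builds the
two-tuple that ``apply`` expects.

```python
def exact(value):
    """Build the (minimum, maximum) query tuple for an exact-value search."""
    return (value, value)

# Intended use, assuming `index` is the FieldIndex built above:
#     index.apply(exact(30))
print(exact(30))  # (30, 30)
```
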
Field indexes support basic statistics. Here ten documents are indexed,
and they hold eight distinct values between them:

>>> index.documentCount()
10
>>> index.wordCount()
8

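The two counts above are consistent with ``documentCount`` counting indexed
documents and ``wordCount`` counting distinct values. A quick plain-Python
check over the same data (the ``values`` mapping is a stand-in for the index,
not its real storage):

```python
# Toy model: docid -> value, matching the documents indexed above.
values = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30,
          5: 68, 6: 82, 7: 30, 8: 43, 9: 15}

document_count = len(values)            # one entry per indexed document
word_count = len(set(values.values()))  # 68 and 30 each occur twice

print(document_count)  # 10
print(word_count)      # 8
```
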
Documents can be reindexed by indexing the same document id with a new value:

>>> index.apply((15, 15))
IFSet([9])
>>> index.index_doc(9, 14)

>>> index.apply((15, 15))
IFSet([])
>>> index.apply((14, 14))
IFSet([9])

Documents can be unindexed. The word count only drops when the last
document holding a value is removed:

>>> index.unindex_doc(7)
>>> index.documentCount()
9
>>> index.wordCount()
8
>>> index.unindex_doc(8)
>>> index.documentCount()
8
>>> index.wordCount()
7

>>> index.apply((30, 70))
IFSet([3, 4, 5])

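One way to picture this bookkeeping is a pair of mappings: a forward mapping
from value to document ids and a reverse mapping from document id to value.
The ``ToyFieldIndex`` class below is a hypothetical pure-Python model written
for this explanation; it is a sketch under that assumption and not the
zope.index implementation.

```python
from collections import defaultdict

class ToyFieldIndex:
    """Illustrative model only; not the zope.index implementation."""

    def __init__(self):
        self.forward = defaultdict(set)  # value -> set of docids
        self.reverse = {}                # docid -> value

    def index_doc(self, docid, value):
        self.unindex_doc(docid)          # drop any previous value first
        self.forward[value].add(docid)
        self.reverse[docid] = value

    def unindex_doc(self, docid):
        if docid not in self.reverse:
            return                       # unknown docids are ignored
        value = self.reverse.pop(docid)
        docids = self.forward[value]
        docids.discard(docid)
        if not docids:
            del self.forward[value]      # last document with this value

    def document_count(self):
        return len(self.reverse)

    def word_count(self):
        return len(self.forward)

toy = ToyFieldIndex()
for docid, value in enumerate([6, 26, 94, 68, 30, 68, 82, 30, 43, 15]):
    toy.index_doc(docid, value)
toy.index_doc(9, 14)   # reindex document 9
toy.unindex_doc(7)     # 30 is still held by document 4
print(toy.document_count(), toy.word_count())  # 9 8
toy.unindex_doc(8)     # 43 had no other document
print(toy.document_count(), toy.word_count())  # 8 7
```
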
Unindexing a document id that isn't present is ignored:

>>> index.unindex_doc(8)
>>> index.unindex_doc(80)
>>> index.documentCount()
8
>>> index.wordCount()
7

We can also clear the index entirely:

>>> index.clear()
>>> index.documentCount()
0
>>> index.wordCount()
0

>>> index.apply((30, 70))
IFSet([])

Sorting
-------

Field indexes also implement the ``IIndexSort`` interface, which
provides a method for sorting document ids by their indexed
values.

>>> index.index_doc(1, 9)
>>> index.index_doc(2, 8)
>>> index.index_doc(3, 7)
>>> index.index_doc(4, 6)
>>> index.index_doc(5, 5)
>>> index.index_doc(6, 4)
>>> index.index_doc(7, 3)
>>> index.index_doc(8, 2)
>>> index.index_doc(9, 1)

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
[9, 7, 5, 4, 3, 2, 1]

We can also specify the ``reverse`` argument to reverse results:

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
[1, 2, 3, 4, 5, 7, 9]

And as per ``IIndexSort``, we can limit results by specifying the ``limit``
argument:

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
[9, 7, 5]

If we pass an id that is not indexed by this index, it won't be included
in the result.

>>> list(index.sort([2, 10]))
[2]

>>> index.clear()

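The sorting behaviour in this section can be reproduced with a plain-Python
model. The ``sort_docids`` function and the ``values`` mapping are
hypothetical names for this sketch; the real ``sort`` method may use a
different strategy internally, so treat this only as a description of the
observable results.

```python
# Toy model of the values indexed in this section: docid -> value.
values = {docid: 10 - docid for docid in range(1, 10)}

def sort_docids(docids, reverse=False, limit=None):
    """Order docids by indexed value, skipping ids that are not indexed."""
    known = [d for d in docids if d in values]
    ordered = sorted(known, key=values.__getitem__, reverse=reverse)
    return ordered if limit is None else ordered[:limit]

print(sort_docids([4, 2, 9, 7, 3, 1, 5]))                # [9, 7, 5, 4, 3, 2, 1]
print(sort_docids([4, 2, 9, 7, 3, 1, 5], reverse=True))  # [1, 2, 3, 4, 5, 7, 9]
print(sort_docids([4, 2, 9, 7, 3, 1, 5], limit=3))       # [9, 7, 5]
print(sort_docids([2, 10]))                              # [2]
```
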
Bugfix testing
--------------

It happened at least once that a value dropped out of the forward index
while the index still contained the document, and unindexing then broke.

>>> index.index_doc(0, 6)
>>> index.index_doc(1, 26)
>>> index.index_doc(2, 94)
>>> index.index_doc(3, 68)
>>> index.index_doc(4, 30)
>>> index.index_doc(5, 68)
>>> index.index_doc(6, 82)
>>> index.index_doc(7, 30)
>>> index.index_doc(8, 43)
>>> index.index_doc(9, 15)

>>> index.apply((None, None))
IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here is the damage:

>>> del index._fwd_index[68]

Unindex should succeed:

>>> index.unindex_doc(5)
>>> index.unindex_doc(3)

>>> index.apply((None, None))
IFSet([0, 1, 2, 4, 6, 7, 8, 9])

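A defensive unindex of this kind might look like the following sketch. It
works on two plain dictionaries, and the ``unindex`` function, ``forward``
and ``reverse`` are hypothetical names; the point is only that a missing
forward entry is tolerated instead of raising.

```python
def unindex(forward, reverse, docid):
    """Remove docid, tolerating a forward index that lost its value."""
    try:
        value = reverse.pop(docid)
    except KeyError:
        return                      # docid was never indexed
    docids = forward.get(value)     # may be missing if the index is damaged
    if docids is None:
        return
    docids.discard(docid)
    if not docids:
        del forward[value]

forward = {68: {3, 5}, 30: {4}}
reverse = {3: 68, 5: 68, 4: 30}
del forward[68]                     # simulate the damage
unindex(forward, reverse, 5)
unindex(forward, reverse, 3)
print(sorted(reverse))              # [4]
```
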
Optimizations
-------------

There is an optimization which makes sure that nothing is changed in the
internal data structures if the value of the document has not changed.

To test this optimization, we patch the index instance to make sure
``unindex_doc`` is not called.

>>> def unindex_doc(doc_id):
...     raise KeyError
>>> index.unindex_doc = unindex_doc

Now we get a ``KeyError`` if we try to change the value.

>>> index.index_doc(9, 14)
Traceback (most recent call last):
...
KeyError

Leaving the value unchanged doesn't call ``unindex_doc``.

>>> index.index_doc(9, 15)
>>> index.apply((15, 15))
IFSet([9])

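The idea behind this optimization can be sketched with a toy class that
counts how often its own ``unindex_doc`` runs. ``ToyIndex`` and its
``unindex_calls`` counter are hypothetical and exist only for this
illustration; they are not part of zope.index.

```python
class ToyIndex:
    """Illustrative model only; not the zope.index implementation."""

    def __init__(self):
        self.reverse = {}        # docid -> value
        self.unindex_calls = 0   # counts how often unindex_doc ran

    def unindex_doc(self, docid):
        self.unindex_calls += 1
        self.reverse.pop(docid, None)

    def index_doc(self, docid, value):
        if docid in self.reverse:
            if self.reverse[docid] == value:
                return           # unchanged value: touch nothing
            self.unindex_doc(docid)
        self.reverse[docid] = value

toy = ToyIndex()
toy.index_doc(9, 15)
toy.index_doc(9, 15)      # same value again: no unindex
print(toy.unindex_calls)  # 0
toy.index_doc(9, 14)      # changed value: old entry is removed first
print(toy.unindex_calls)  # 1
```
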