Field Indexes
=============

Field indexes index orderable values. They do not check for orderability
themselves: all of the values added to an index must be orderable with
one another, and it is up to the application to provide only mutually
orderable values.

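As a rough illustration of what "mutually orderable" means, the sketch
below uses plain Python comparisons rather than the index itself. The
helper name ``mutually_orderable`` is hypothetical and not part of
zope.index; it only checks whether sorting the values succeeds, which in
Python 3 fails with ``TypeError`` for mixed types such as integers and
strings.

```python
def mutually_orderable(values):
    """Return True if the values can be sorted together.

    Hypothetical helper for illustration; not part of zope.index.
    """
    try:
        sorted(values)
    except TypeError:
        # Python 3 refuses to order unrelated types such as int and str.
        return False
    return True


print(mutually_orderable([6, 26, 94]))       # True
print(mutually_orderable([6, "six", 94]))    # False
```

An application could run a check of this kind on its own data before
indexing, since the index will not do it.
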
>>> from zope.index.field import FieldIndex

>>> index = FieldIndex()
>>> index.index_doc(0, 6)
>>> index.index_doc(1, 26)
>>> index.index_doc(2, 94)
>>> index.index_doc(3, 68)
>>> index.index_doc(4, 30)
>>> index.index_doc(5, 68)
>>> index.index_doc(6, 82)
>>> index.index_doc(7, 30)
>>> index.index_doc(8, 43)
>>> index.index_doc(9, 15)

Field indexes are searched with ``apply``. The argument is a tuple with
a minimum and a maximum value; both end points are included in the range:

>>> index.apply((30, 70))
IFSet([3, 4, 5, 7, 8])

A common mistake is to pass a single value. If anything other than a
two-tuple is passed, a ``TypeError`` is raised:

>>> index.apply('hi')
Traceback (most recent call last):
...
TypeError: ('two-length tuple expected', 'hi')

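The check described above can be pictured with a small stand-alone
function. This is only a sketch under assumptions: ``check_query`` is a
hypothetical name, and the condition is inferred from the error message
shown in this document rather than copied from the zope.index source.

```python
def check_query(query):
    """Reject anything that is not a (minimum, maximum) two-tuple.

    Hypothetical stand-in for the argument check; not zope.index code.
    """
    if not isinstance(query, tuple) or len(query) != 2:
        raise TypeError("two-length tuple expected", query)
    return query


print(check_query((30, 70)))    # (30, 70)

try:
    check_query(30)             # a single value is not a valid query
except TypeError as exc:
    print(exc.args)             # ('two-length tuple expected', 30)
```
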
Open-ended ranges can be specified by providing ``None`` as an end point:

>>> index.apply((30, None))
IFSet([2, 3, 4, 5, 6, 7, 8])

>>> index.apply((None, 70))
IFSet([0, 1, 3, 4, 5, 7, 8, 9])

>>> index.apply((None, None))
IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

To do an exact value search, supply equal minimum and maximum values:

>>> index.apply((30, 30))
IFSet([4, 7])

>>> index.apply((70, 70))
IFSet([])

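The range semantics shown so far (inclusive end points, ``None`` for an
open end, equal end points for an exact match) can be summarized in a
few lines of plain Python. This is a minimal sketch, not the zope.index
implementation: ``apply_range`` is a hypothetical helper that scans a
plain dict and returns a sorted list instead of an ``IFSet``.

```python
def apply_range(values, query):
    """Return sorted doc ids whose value lies within (minimum, maximum).

    ``values`` maps doc id -> indexed value. Both end points are
    inclusive, and None leaves that end of the range open.
    Hypothetical helper; not the zope.index implementation.
    """
    minimum, maximum = query
    return sorted(
        doc_id
        for doc_id, value in values.items()
        if (minimum is None or value >= minimum)
        and (maximum is None or value <= maximum)
    )


# Same doc id -> value pairs as indexed at the top of this document.
values = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30, 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}

print(apply_range(values, (30, 70)))      # [3, 4, 5, 7, 8]
print(apply_range(values, (30, None)))    # [2, 3, 4, 5, 6, 7, 8]
print(apply_range(values, (None, 70)))    # [0, 1, 3, 4, 5, 7, 8, 9]
print(apply_range(values, (30, 30)))      # [4, 7]
```
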
Field indexes support basic statistics:

>>> index.documentCount()
10
>>> index.wordCount()
8

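Judging from the numbers in this document, ``documentCount()`` appears
to count the indexed document ids and ``wordCount()`` the distinct
indexed values. That reading is an inference from the examples here, not
a statement of the API contract, and the sketch below only reproduces
the arithmetic on a plain dict.

```python
# Same doc id -> value pairs as indexed at the top of this document.
values = {0: 6, 1: 26, 2: 94, 3: 68, 4: 30, 5: 68, 6: 82, 7: 30, 8: 43, 9: 15}

document_count = len(values)             # one entry per document id
word_count = len(set(values.values()))   # 68 and 30 each occur twice

print(document_count)    # 10
print(word_count)        # 8
```
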
Documents can be reindexed:

>>> index.apply((15, 15))
IFSet([9])
>>> index.index_doc(9, 14)

>>> index.apply((15, 15))
IFSet([])
>>> index.apply((14, 14))
IFSet([9])

Documents can be unindexed:

>>> index.unindex_doc(7)
>>> index.documentCount()
9
>>> index.wordCount()
8
>>> index.unindex_doc(8)
>>> index.documentCount()
8
>>> index.wordCount()
7

>>> index.apply((30, 70))
IFSet([3, 4, 5])

Unindexing a document id that isn't present is ignored:

>>> index.unindex_doc(8)
>>> index.unindex_doc(80)
>>> index.documentCount()
8
>>> index.wordCount()
7

We can also clear the index entirely:

>>> index.clear()
>>> index.documentCount()
0
>>> index.wordCount()
0

>>> index.apply((30, 70))
IFSet([])

Sorting
-------

Field indexes also implement the ``IIndexSort`` interface, which
provides a method for sorting document ids by their indexed values.

>>> index.index_doc(1, 9)
>>> index.index_doc(2, 8)
>>> index.index_doc(3, 7)
>>> index.index_doc(4, 6)
>>> index.index_doc(5, 5)
>>> index.index_doc(6, 4)
>>> index.index_doc(7, 3)
>>> index.index_doc(8, 2)
>>> index.index_doc(9, 1)

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5]))
[9, 7, 5, 4, 3, 2, 1]

We can also specify the ``reverse`` argument to reverse results:

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], reverse=True))
[1, 2, 3, 4, 5, 7, 9]

And as per IIndexSort, we can limit results by specifying the ``limit``
argument:

>>> list(index.sort([4, 2, 9, 7, 3, 1, 5], limit=3))
[9, 7, 5]

If we pass an id that is not indexed by this index, it won't be included
in the result.

>>> list(index.sort([2, 10]))
[2]

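The sorting behaviour shown above can be sketched with the standard
library alone. The function ``sort_ids`` is hypothetical and works on a
plain dict; it is meant to mirror the observable results in this
section, not to describe how zope.index implements ``sort``.

```python
from itertools import islice


def sort_ids(values, doc_ids, reverse=False, limit=None):
    """Return an iterator over doc ids ordered by their indexed value.

    Ids missing from ``values`` are skipped. Hypothetical helper that
    mirrors the behaviour shown above; not the zope.index code.
    """
    known = (doc_id for doc_id in doc_ids if doc_id in values)
    ordered = sorted(known, key=values.__getitem__, reverse=reverse)
    # islice with a stop of None yields every item.
    return islice(ordered, limit)


# Doc ids 1..9 indexed with the values 9..1, as in the doctest above.
values = {doc_id: 10 - doc_id for doc_id in range(1, 10)}
ids = [4, 2, 9, 7, 3, 1, 5]

print(list(sort_ids(values, ids)))                  # [9, 7, 5, 4, 3, 2, 1]
print(list(sort_ids(values, ids, reverse=True)))    # [1, 2, 3, 4, 5, 7, 9]
print(list(sort_ids(values, ids, limit=3)))         # [9, 7, 5]
print(list(sort_ids(values, [2, 10])))              # [2]
```
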
>>> index.clear()

Bugfix testing:
---------------

It has happened at least once that a value dropped out of the forward
index while the index still contained the document, which caused
unindexing to break.

>>> index.index_doc(0, 6)
>>> index.index_doc(1, 26)
>>> index.index_doc(2, 94)
>>> index.index_doc(3, 68)
>>> index.index_doc(4, 30)
>>> index.index_doc(5, 68)
>>> index.index_doc(6, 82)
>>> index.index_doc(7, 30)
>>> index.index_doc(8, 43)
>>> index.index_doc(9, 15)

>>> index.apply((None, None))
IFSet([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here is the damage:

>>> del index._fwd_index[68]

Unindexing should still succeed:

>>> index.unindex_doc(5)
>>> index.unindex_doc(3)

>>> index.apply((None, None))
IFSet([0, 1, 2, 4, 6, 7, 8, 9])

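One way to picture an unindex operation that survives this kind of
damage is a pair of plain mappings, one from value to document ids and
one from document id to value. The names ``forward``, ``reverse`` and
``unindex`` below are assumptions made for this sketch; it does not
reproduce the actual zope.index data structures or fix.

```python
def unindex(forward, reverse, doc_id):
    """Remove doc_id from both mappings, tolerating a damaged forward map.

    ``forward`` maps value -> set of doc ids; ``reverse`` maps
    doc id -> value. Hypothetical sketch, not the zope.index code.
    """
    if doc_id not in reverse:
        return  # unknown doc ids are ignored
    value = reverse.pop(doc_id)
    doc_ids = forward.get(value)
    if doc_ids is None:
        return  # forward entry already gone: nothing left to clean up
    doc_ids.discard(doc_id)
    if not doc_ids:
        del forward[value]  # drop values that no longer have documents


forward = {30: {4, 7}, 68: {3, 5}}
reverse = {3: 68, 4: 30, 5: 68, 7: 30}

del forward[68]  # simulate the damage

unindex(forward, reverse, 5)
unindex(forward, reverse, 3)
print(sorted(reverse))    # [4, 7]
print(sorted(forward))    # [30]
```
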
Optimizations
-------------

An optimization makes sure that the internal data structures are left
untouched when a document is reindexed with an unchanged value.

To test this optimization, we patch the index instance so that any call
to ``unindex_doc`` raises an error:

>>> def unindex_doc(doc_id):
...     raise KeyError
>>> index.unindex_doc = unindex_doc

Now we get a ``KeyError`` if we try to change the value:

>>> index.index_doc(9, 14)
Traceback (most recent call last):
...
KeyError

Leaving the value unchanged doesn't call ``unindex_doc``:

>>> index.index_doc(9, 15)
>>> index.apply((15, 15))
IFSet([9])

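The shortcut tested above can be sketched with a toy class. Everything
here is hypothetical: ``TinyFieldIndex`` and its ``unindex_calls``
counter exist only for this illustration, and the real ``FieldIndex`` is
structured differently.

```python
class TinyFieldIndex:
    """Toy index used only to illustrate the unchanged-value shortcut.

    Hypothetical class; not the zope.index implementation.
    """

    def __init__(self):
        self.values = {}         # doc id -> value
        self.unindex_calls = 0   # counts how often unindex_doc ran

    def unindex_doc(self, doc_id):
        self.unindex_calls += 1
        self.values.pop(doc_id, None)

    def index_doc(self, doc_id, value):
        if doc_id in self.values:
            if self.values[doc_id] == value:
                return  # value unchanged: leave the data untouched
            self.unindex_doc(doc_id)
        self.values[doc_id] = value


toy = TinyFieldIndex()
toy.index_doc(9, 15)
toy.index_doc(9, 15)        # same value: no unindexing
print(toy.unindex_calls)    # 0
toy.index_doc(9, 14)        # new value: the old entry is removed first
print(toy.unindex_calls)    # 1
print(toy.values)           # {9: 14}
```
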