Changeset 7063 for main/waeup.sirp/trunk/src/waeup/sirp/imagestorage.py
Timestamp: 9 Nov 2011, 15:42:45
Location: main/waeup.sirp/trunk
Files: 2 edited
main/waeup.sirp/trunk
- Property svn:mergeinfo changed
/main/waeup.sirp/branches/ulif-extimgstore (added) merged: 7001-7002,7010-7011,7016,7031-7041,7043-7044,7046-7055
main/waeup.sirp/trunk/src/waeup/sirp/imagestorage.py
r6980 r7063
 ##
 """A storage for image files.
+
+A few words about storing files with ``waeup.sirp``. The need for this
+feature arose initially from the need to store passport files for
+applicants and students. These files are dynamic (can be changed
+anytime), mean a lot of traffic and cost a lot of memory/disk space.
+
+**Design Basics**
+
+While one *can* store images and similar 'large binary objects' aka
+blobs in the ZODB, this approach quickly becomes cumbersome and
+difficult to understand. The worst approach here would be to store
+images as regular byte-stream objects. ZODB supports this but
+obviously access is slow (data must be looked up in the one
+``Data.fs`` file, each file has to be sent to the ZEO server and back,
+etc.).
+
+A bit better is the approach to store images in the ZODB but as
+Blobs. ZODB supports storing blobs in separate files in order to
+accelerate lookup/retrieval of these files. The files, however, have
+to be sent to the ZEO server (and back on lookups) which means a
+bottleneck and will easily result in an increased number of
+``ConflictErrors`` even on simple reads.
+
+The advantage of both ZODB-geared approaches is, of course, complete
+database consistency. ZODB will guarantee that your files are
+available under some object name and can be handled as any other
+Python object.
+
+Another approach is to leave the ZODB behind and to store images and
+other files in the filesystem directly. This is faster (no ZEO contacts,
+etc.), reduces probability of `ConflictErrors`, keeps the ZODB
+smaller, and enables direct access (over filesystem) to the
+files. Furthermore steps might be better understandable for
+third-party developers. We opted for this last option.
+
+**External File Store**
+
+Our implementation for storing-files-API is defined in
+:class:`ExtFileStore`. An instance of this file storage (which is also
+able to store non-image files) is available at runtime as a global
+utility implementing :class:`waeup.sirp.interfaces.IExtFileStore`.
+
+The main task of this central component is to maintain a filesystem
+root path for all files to be stored. It also provides methods to
+store/get files under certain file ids which identify certain files
+locally.
+
+So, to store a file away, you can do something like this:
+
+  >>> from StringIO import StringIO
+  >>> from zope.component import getUtility
+  >>> from waeup.sirp.interfaces import IExtFileStore
+  >>> store = getUtility(IExtFileStore)
+  >>> store.createFile('myfile.txt', StringIO('some file content'))
+
+All you need is a filename and the file-like object containing the
+real file data.
+
+This will store the file somewhere (you shouldn't make too many
+assumptions about the real filesystem path here).
+
+Later, we can get the file back like this:
+
+  >>> store.getFile('myfile.txt')
+  <open file ...>
+
+What we get back is a file or file-like object already opened for
+reading:
+
+  >>> store.getFile('myfile.txt').read()
+  'some file content'
+
+**Handlers: Special Places for Special Files**
+
+The file store supports special handling for certain files. For
+example we want applicant images to be stored in a different directory
+than student images, etc. Because the file store cannot know all
+details about this special treatment of certain files, it looks up
+helpers (handlers) to provide the information it needs for really
+storing the files at the correct location.
+
+That a file stored in filestore needs special handling can be
+indicated by special filenames. These filenames start with a marker like
+this::
+
+  __<MARKER-STRING>__real-filename.jpg
+
+Please note the double underscores before and after the marker
+string. They indicate that all in between is a marker.
+
+If you store a file in file store with such a filename (we call this a
+`file_id` to distinguish it from real world filenames), the file store
+will look up a handler for ``<MARKER-STRING>`` and pass it the file to
+store. The handler then will return the internal path to store the
+file and possibly do additional things as well like validating the
+file or similar.
+
+Examples for such a file store handler can be found in the
+:mod:`waeup.sirp.applicants.applicant` module. Please see also the
+:class:`DefaultFileStoreHandler` class below for more details.
+
+The file store looks up handlers by utility lookups: it looks for a
+named utility providing
+:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
+marker string (without leading/trailing underscores) in lower
+case. For example if the file id would be
+
+  ``__IMG_USER__manfred.jpg``
+
+then the looked up utility should be registered under name
+
+  ``img_user``
+
+and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
+such utility can be found, a default handler is used instead
+(see :class:`DefaultFileStoreHandler`).
+
+**Context Adapters: Knowing Your Family**
+
+Often the internal filename or file id of a file depends on a
+context. For example when we store passport photographs of applicants,
+then each image belongs to a certain applicant instance. It is not
+difficult to maintain such a connection manually: Say every applicant
+had an id, then we could put this id into the filename as well and
+would build the filename to store/get the connected file by using that
+filename. You then would create filenames of a format like this::
+
+  __<MARKER-STRING>__applicant0001.jpg
+
+where ``applicant0001`` would tell exactly which applicant you can see
+on the photograph. You notice that the internal file id might have
+nothing to do with once uploaded filenames. The id above could have
+been uploaded with filename ``manfred.jpg`` but with the new file id
+we are able to find the file again later.
+
+Unfortunately it might soon get boring or cumbersome to retype this
+building of filenames for a certain type of context, especially if
+your filenames take more of the context into account than only a
+simple id.
+
+Therefore you can define filename building for a context as an adapter
+that then could be looked up by other components simply by doing
+something like:
+
+  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
+  >>> file_id = IFileStoreNameChooser(my_context_obj)
+
+If you later want to change the way file ids are created from a
+certain context, you only have to change the adapter implementation
+accordingly.
+
+Note that this is only a convenience component. You don't have to
+define context adapters but it makes things easier for others if you
+do, as you don't have to remember the exact file id creation method
+all the time and can change things quickly and in only one location if
+you need to do so.
+
+Please see the :class:`FileStoreNameChooser` default implementation
+below for details.
+
 """
 import grok
-import hashlib
 import os
-import transaction
-import warnings
-from StringIO import StringIO
-from ZODB.blob import Blob
-from persistent import Persistent
+import tempfile
+from hurry.file import HurryFile
 from hurry.file.interfaces import IFileRetrieval
-from waeup.sirp.image import WAeUPImageFile
-from waeup.sirp.utils.helpers import cmp_files
-
-def md5digest(fd):
-    """Get an MD5 hexdigest for the file stored in `fd`.
-
-    `fd`
-        a file object open for reading.
-
+from zope.component import queryUtility
+from zope.interface import Interface
+from waeup.sirp.interfaces import (
+    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)
+
+class FileStoreNameChooser(grok.Adapter):
+    """Default file store name chooser.
+
+    File store name choosers pick a file id, a string, for a certain
+    context object. They are normally registered as adapters for a
+    certain content type and know how to build the file id for this
+    special type of context.
+
+    Provides the :class:`waeup.sirp.interfaces.IFileStoreNameChooser`
+    interface.
+
+    This default file name chooser accepts almost every name as long
+    as it is a string or unicode object.
     """
-    return hashlib.md5(fd.read()).hexdigest()
-
-class Basket(grok.Container):
-    """A basket holds a set of image files with same hash.
+    grok.context(Interface)
+    grok.implements(IFileStoreNameChooser)
+
+    def checkName(self, name):
+        """Check whether an object name is valid.
+
+        Raises a user error if the name is not valid.
+
+        For the default file store name chooser any name is valid.
+        """
+        if isinstance(name, basestring):
+            return True
+        return False
+
+    def chooseName(self, name):
+        """Choose a unique valid name for the object.
+
+        The given name and object may be taken into account when
+        choosing the name.
+
+        chooseName is expected to always choose a valid name (that
+        would pass the checkName test) and never raise an error.
+
+        For this default name chooser we return the given name if it
+        is valid or ``unknown_file`` else.
+        """
+        if self.checkName(name):
+            return name
+        return u'unknown_file'
+
+class ExtFileStore(object):
+    """External file store.
+
+    External file stores are meant to store files 'externally' of the
+    ZODB, i.e. in filesystem.
+
+    Most important attribute of the external file store is the `root`
+    path which gives the path to the location where files will be
+    stored within.
+
+    By default `root` is a ``'media/'`` directory in the root of the
+    datacenter root of a site.
+
+    The `root` attribute is 'read-only' because you normally don't
+    want to change this path -- it is dynamic. That means, if you call
+    the file store from 'within' a site, the root path will be located
+    inside this site (a :class:`waeup.sirp.University` instance). If
+    you call it from 'outside' a site some temporary dir (always the
+    same during lifetime of the file store instance) will be used. The
+    term 'temporary' tells what you can expect from this path
+    persistence-wise.
+
+    If you insist, you can pass a root path on initialization to the
+    constructor but when calling from within a site afterwards, the
+    site will override your setting for security measures. This way
+    you can safely use one file store for different sites in a Zope
+    instance simultaneously and files from one site won't show up in
+    another.
+
+    An ExtFileStore instance is available as a global utility
+    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.
+
+    To add and retrieve files from the storage, use the appropriate
+    methods below.
     """
 
-    def _del(self):
-        """Remove temporary files associated with local blobs.
-
-        A basket holds files as Blob objects. Unfortunately, if a
-        basket was not committed (put into ZODB), those blobs linger
-        around as real files in some temporary directory and won't be
-        removed.
-
-        This is a helper function to remove all those uncommitted
-        blobs that has to be called explicitly, for instance in tests.
-        """
-        key_list = list(self.keys())
-        for key in key_list:
-            item = self[key]
-            if getattr(item, '_p_oid', None):
-                # Don't mess around with blobs in ZODB
-                continue
-            fd = item.open('r')
-            name = getattr(fd, 'name', None)
-            fd.close()
-            if name is not None and os.path.exists(name):
-                os.unlink(name)
-            del self[key]
+    grok.implements(IExtFileStore)
+
+    _root = None
+
+    @property
+    def root(self):
+        """Root dir of this storage.
+
+        The root dir is a readonly value determined dynamically. It
+        holds media files for sites or other components.
+
+        If a site is available we return a ``media/`` dir in the
+        datacenter storage dir.
+
+        Otherwise we create a temporary dir which will be remembered
+        on next call.
+
+        If a site exists and has a datacenter, it has always
+        precedence over temporary dirs, also after a temporary
+        directory was created.
+
+        Please note that retrieving `root` is expensive. You might
+        want to store a copy once retrieved in order to minimize the
+        number of calls to `root`.
+
+        """
+        site = grok.getSite()
+        if site is not None:
+            root = os.path.join(site['datacenter'].storage, 'media')
+            return root
+        if self._root is None:
+            self._root = tempfile.mkdtemp()
+        return self._root
+
+    def __init__(self, root=None):
+        self._root = root
         return
 
-    def getInternalId(self, fd):
-        """Get the basket-internal id for the file stored in `fd`.
-
-        `fd` must be a file open for reading. If an (byte-wise) equal
-        file can be found in the basket, its internal id (basket id)
-        is returned, ``None`` otherwise.
-        """
-        fd.seek(0)
-        for key, val in self.items():
-            fd_stored = val.open('r')
-            file_len = os.stat(fd_stored.name)[6]
-            if file_len == 0:
-                # Nasty workaround. Blobs seem to suffer from being emptied
-                # accidentally.
-                site = grok.getSite()
-                if site is not None:
-                    site.logger.warn(
-                        'Empty Blob detected: %s' % fd_stored.name)
-                warnings.warn("EMPTY BLOB DETECTED: %s" % fd_stored.name)
-                fd_stored.close()
-                val.open('w').write(fd.read())
-                return key
-            fd_stored.seek(0)
-            if cmp_files(fd, fd_stored):
-                fd_stored.close()
-                return key
-            fd_stored.close()
-        return None
-
-    @property
-    def curr_id(self):
-        """The current basket id.
-
-        An integer number which is not yet in use. If there are
-        already `maxint` entries in the basket, a :exc:`ValueError` is
-        raised. The latter is _highly_ unlikely. It would mean to have
-        more than 2**32 hash collisions, i.e. so many files with the
-        same MD5 sum.
-        """
-        num = 1
-        while True:
-            if str(num) not in self.keys():
-                return str(num)
-            num += 1
-            if num <= 0:
-                name = getattr(self, '__name__', None)
-                raise ValueError('Basket full: %s' % name)
-
-    def storeFile(self, fd, filename):
-        """Store the file in `fd` into the basket.
-
-        The file will be stored in a Blob.
-        """
-        fd.seek(0)
-        internal_id = self.getInternalId(fd) # Moves file pointer!
-        if internal_id is None:
-            internal_id = self.curr_id
-            fd.seek(0)
-            self[internal_id] = Blob()
-            transaction.commit() # Urgently needed to make the Blob
-                                 # persistent. Took me ages to find
-                                 # out that solution, which makes some
-                                 # design flaw in ZODB Blobs likely.
-            self[internal_id].open('w').write(fd.read())
-            fd.seek(0)
-            self._p_changed = True
-        return internal_id
-
-    def retrieveFile(self, basket_id):
-        """Retrieve a file open for reading with basket id `basket_id`.
-
-        If there is no such id, ``None`` is returned. It is the
-        callers responsibility to close the open file.
-        """
-        if basket_id in self.keys():
-            return self[basket_id].open('r')
-        return None
-
-class ImageStorage(grok.Container):
-    """A container for image files.
+    def getFile(self, file_id):
+        """Get a file stored under file ID `file_id`.
+
+        Returns a file already opened for reading.
+
+        If the file cannot be found ``None`` is returned.
+
+        This method takes into account registered handlers for any
+        marker put into the file_id.
+
+        .. seealso:: :class:`DefaultFileStoreHandler`
+        """
+        marker, filename, base, ext = self.extractMarker(file_id)
+        handler = queryUtility(IFileStoreHandler, name=marker,
+                               default=DefaultFileStoreHandler())
+        path = handler.pathFromFileID(self, self.root, file_id)
+        if not os.path.exists(path):
+            return None
+        fd = open(path, 'rb')
+        return fd
+
+    def getFileByContext(self, context):
+        """Get a file for given context.
+
+        Returns a file already opened for reading.
+
+        If the file cannot be found ``None`` is returned.
+
+        This method takes into account registered handlers and file
+        name choosers for context types.
+
+        This is a convenience method that internally calls
+        :meth:`getFile`.
+
+        .. seealso:: :class:`FileStoreNameChooser`,
+                     :class:`DefaultFileStoreHandler`.
+        """
+        file_id = IFileStoreNameChooser(context).chooseName()
+        return self.getFile(file_id)
+
+    def createFile(self, filename, f):
+        """Store a file.
+        """
+        file_id = filename
+        root = self.root # Calls to self.root are expensive
+        marker, filename, base, ext = self.extractMarker(file_id)
+        handler = queryUtility(IFileStoreHandler, name=marker,
+                               default=DefaultFileStoreHandler())
+        f, path, file_obj = handler.createFile(
+            self, root, file_id, filename, f)
+        dirname = os.path.dirname(path)
+        if not os.path.exists(dirname):
+            os.makedirs(dirname, 0755)
+        open(path, 'wb').write(f.read())
+        return file_obj
+
+    def extractMarker(self, file_id):
+        """split filename into marker, filename, basename, and extension.
+
+        A marker is a leading part of a string of form
+        ``__MARKERNAME__`` followed by the real filename. This way we
+        can put markers into a filename to request special processing.
+
+        Returns a quadruple
+
+          ``(marker, filename, basename, extension)``
+
+        where ``marker`` is the marker in lowercase, filename is the
+        complete trailing real filename, ``basename`` is the basename
+        of the filename and ``extension`` the filename extension of
+        the trailing filename. See examples below.
+
+        Example:
+
+          >>> extractMarker('__MaRkEr__sample.jpg')
+          ('marker', 'sample.jpg', 'sample', '.jpg')
+
+        If no marker is contained, we assume the whole string to be a
+        real filename:
+
+          >>> extractMarker('no-marker.txt')
+          ('', 'no-marker.txt', 'no-marker', '.txt')
+
+        Filenames without extension give an empty extension string:
+
+          >>> extractMarker('no-marker')
+          ('', 'no-marker', 'no-marker', '')
+
+        """
+        if not isinstance(file_id, basestring) or not file_id:
+            return ('', '', '', '')
+        parts = file_id.split('__', 2)
+        marker = ''
+        if len(parts) == 3 and parts[0] == '':
+            marker = parts[1].lower()
+            file_id = parts[2]
+        basename, ext = os.path.splitext(file_id)
+        return (marker, file_id, basename, ext)
+
+grok.global_utility(ExtFileStore, provides=IExtFileStore)
+
+class DefaultStorage(ExtFileStore):
+    """Default storage for files.
+
+    Registered globally as utility for
+    :class:`hurry.file.interfaces.IFileRetrieval`.
     """
-    def _del(self):
-        for basket in self.values():
-            try:
-                basket._del()
-            except:
-                pass
-
-    def storeFile(self, fd, filename):
-        fd.seek(0)
-        digest = md5digest(fd)
-        fd.seek(0)
-        if not digest in self.keys():
-            self[digest] = Basket()
-        basket_id = self[digest].storeFile(fd, filename)
-        full_id = "%s-%s" % (digest, basket_id)
-        return full_id
-
-    def retrieveFile(self, file_id):
-        if not '-' in file_id:
-            return None
-        full_id, basket_id = file_id.split('-', 1)
-        if not full_id in self.keys():
-            return None
-        return self[full_id].retrieveFile(basket_id)
-
-class ImageStorageFileRetrieval(Persistent):
-    grok.implements(IFileRetrieval)
-
-    def getImageStorage(self):
-        site = grok.getSite()
-        if site is None:
-            return None
-        return site.get('images', None)
-
-    def isImageStorageEnabled(self):
-        site = grok.getSite()
-        if site is None:
-            return False
-        if site.get('images', None) is None:
-            return False
-        return True
-
-    def getFile(self, data):
-        # ImageStorage is disabled, so give fall-back behaviour for
-        # testing without ImageStorage
-        if not self.isImageStorageEnabled():
-            return StringIO(data)
-        storage = self.getImageStorage()
-        if storage is None:
-            raise ValueError('Cannot find an image storage')
-        result = storage.retrieveFile(data)
-        if result is None:
-            return StringIO(data)
-        return storage.retrieveFile(data)
-
-    def createFile(self, filename, f):
-        if not self.isImageStorageEnabled():
-            return WAeUPImageFile(filename, f.read())
-        storage = self.getImageStorage()
-        if storage is None:
-            raise ValueError('Cannot find an image storage')
-        file_id = storage.storeFile(f, filename)
-        return WAeUPImageFile(filename, file_id)
+    grok.provides(IFileRetrieval)
+
+grok.global_utility(DefaultStorage, provides=IFileRetrieval)
+
+class DefaultFileStoreHandler(grok.GlobalUtility):
+    """A default handler for external file store.
+
+    This handler is the fallback called by external file stores when
+    there is no or an unknown marker in the file id.
+
+    Registered globally as utility for
+    :class:`waeup.sirp.interfaces.IFileStoreHandler`.
+    """
+    grok.implements(IFileStoreHandler)
+
+    def pathFromFileID(self, store, root, file_id):
+        """Return the root path of external file store appended by file id.
+        """
+        return os.path.join(root, file_id)
+
+    def createFile(self, store, root, filename, file_id, f):
+        """Infos about what to store exactly and where.
+
+        When a file should be handled by an external file storage, it
+        looks up any handlers (like this one), passes runtime infos
+        like the storage object, root path, filename, file_id, and the
+        raw file object itself.
+
+        The handler can then change the file, raise exceptions or
+        whatever and return the result.
+
+        This handler returns the input file as-is, a path returned by
+        :meth:`pathFromFileID` and an instance of
+        :class:`hurry.file.HurryFile` for further operations.
+
+        Please note: although a handler has enough infos to store the
+        file itself, it should leave that task to the calling file
+        store.
+        """
+        path = self.pathFromFileID(store, root, file_id)
+        return f, path, HurryFile(filename, file_id)
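
The marker-based handler lookup that this changeset introduces can be tried outside Zope with a small sketch. This is a Python 3 illustration under stated assumptions, not the changeset's code: `extract_marker` follows the logic of `ExtFileStore.extractMarker` (with `str` in place of `basestring`), while the `HANDLERS` dict, `UserImageHandler`, and `path_for` are hypothetical stand-ins for the `queryUtility(IFileStoreHandler, name=marker, default=...)` lookup.

```python
import os


def extract_marker(file_id):
    """Split a file id into (marker, filename, basename, extension)."""
    if not isinstance(file_id, str) or not file_id:
        return ('', '', '', '')
    # '__MARKER__name.jpg' splits into ['', 'MARKER', 'name.jpg'].
    parts = file_id.split('__', 2)
    marker = ''
    if len(parts) == 3 and parts[0] == '':
        marker = parts[1].lower()
        file_id = parts[2]
    basename, ext = os.path.splitext(file_id)
    return (marker, file_id, basename, ext)


class DefaultHandler:
    """Fallback: the path is the root joined with the unchanged file id."""

    def path_from_file_id(self, root, file_id):
        return os.path.join(root, file_id)


class UserImageHandler:
    """Hypothetical handler placing marked files in a 'users' subdirectory."""

    def path_from_file_id(self, root, file_id):
        filename = extract_marker(file_id)[1]
        return os.path.join(root, 'users', filename)


# Assumption: a plain dict replaces the named-utility registry here.
HANDLERS = {'img_user': UserImageHandler()}


def path_for(root, file_id):
    """Pick the handler registered for the marker, else the default one."""
    marker = extract_marker(file_id)[0]
    handler = HANDLERS.get(marker, DefaultHandler())
    return handler.path_from_file_id(root, file_id)
```

With this sketch, `path_for(root, '__IMG_USER__manfred.jpg')` is routed through the hypothetical `img_user` handler, while a file id without a marker (or with an unregistered one) falls back to the default handler, mirroring the fallback to `DefaultFileStoreHandler` in the diff.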