source: main/waeup.sirp/branches/ulif-extimgstore/src/waeup/sirp/imagestorage.py @ 7011

Last change on this file since 7011 was 7010, checked in by uli, 13 years ago
  • Give an overview over file handling with the external file store.
  • Minor fixes.
##
## imagestorage.py
## Login : <uli@pu.smp.net>
## Started on  Mon Jul  4 16:02:14 2011 Uli Fouquet
## $Id$
##
## Copyright (C) 2011 Uli Fouquet
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##
"""A storage for image files.

A few words about storing files with ``waeup.sirp``. The need for this
feature arose initially from the need to store passport files for
applicants and students. These files are dynamic (can be changed
anytime), cause a lot of traffic and consume a lot of memory/disk space.

**Design Basics**

While one *can* store images and similar 'large binary objects' aka
blobs in the ZODB, this approach quickly becomes cumbersome and
difficult to understand. The worst approach here would be to store
images as regular byte-stream objects. ZODB supports this but
obviously access is slow (data must be looked up in the one
``Data.fs`` file, each file has to be sent to the ZEO server and back,
etc.).

A slightly better approach is to store images in the ZODB but as
Blobs. ZODB supports storing blobs in separate files in order to
accelerate lookup/retrieval of these files. The files, however, have
to be sent to the ZEO server (and back on lookups) which means a
bottleneck and will easily result in an increased number of
``ConflictErrors`` even on simple reads.

The advantage of both ZODB-geared approaches is, of course, complete
database consistency. ZODB will guarantee that your files are
available under some object name and can be handled as any other
Python object.

Another approach is to leave the ZODB behind and to store images and
other files directly in the filesystem. This is faster (no ZEO
contacts, etc.), reduces the probability of ``ConflictErrors``, keeps
the ZODB smaller, and enables direct access (over the filesystem) to
the files. Furthermore, the steps involved might be easier to
understand for third-party developers. We opted for this last option.

**External File Store**

Our API for storing files is implemented in
:class:`ExtFileStore`. An instance of this file storage (which is also
able to store non-image files) is available at runtime as a global
utility implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

The main task of this central component is to maintain a filesystem
root path for all files to be stored. It also provides methods to
store/get files under certain file ids which identify certain files
locally.

So, to store a file away, you can do something like this:

  >>> from StringIO import StringIO
  >>> from zope.component import getUtility
  >>> from waeup.sirp.interfaces import IExtFileStore
  >>> store = getUtility(IExtFileStore)
  >>> store.createFile('myfile.txt', StringIO('some file content'))

All you need is a filename and the file-like object containing the
real file data.

This will store the file somewhere (you shouldn't make too many
assumptions about the real filesystem path here).

Later, we can get the file back like this:

  >>> store.getFile('myfile.txt')
  <open file ...>

What we get back is a file or file-like object already opened for
reading:

  >>> store.getFile('myfile.txt').read()
  'some file content'
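
To illustrate what such a round trip amounts to on the filesystem
level, here is a minimal sketch. It is not the real
:class:`ExtFileStore`: the class ``SketchFileStore`` and its
attributes are made up for illustration, no handlers are involved, and
``io.BytesIO`` replaces ``StringIO`` so that the snippet runs
standalone:

```python
import io
import os
import tempfile


class SketchFileStore(object):
    # Hypothetical stand-in for ExtFileStore; not the real class.

    def __init__(self, root=None):
        # Fall back to a temporary directory if no root is given.
        self.root = root or tempfile.mkdtemp()

    def createFile(self, file_id, f):
        # Copy the contents of the file-like object below the root.
        path = os.path.join(self.root, file_id)
        dirname = os.path.dirname(path)
        if not os.path.exists(dirname):
            os.makedirs(dirname)
        with open(path, 'wb') as target:
            target.write(f.read())
        return path

    def getFile(self, file_id):
        # Return a file opened for reading or None if nothing is stored.
        path = os.path.join(self.root, file_id)
        if not os.path.exists(path):
            return None
        return open(path, 'rb')


store = SketchFileStore()
store.createFile('myfile.txt', io.BytesIO(b'some file content'))
fd = store.getFile('myfile.txt')
content = fd.read()
fd.close()
missing = store.getFile('does-not-exist.txt')
```

As with the real store, the caller is responsible for closing the file
returned by ``getFile``.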

**Handlers: Special Places for Special Files**

The file store supports special handling for certain files. For
example we want applicant images to be stored in a different directory
than student images, etc. Because the file store cannot know all
details about this special treatment of certain files, it looks up
helpers (handlers) to provide the information it needs for really
storing the files at the correct location.

That a file stored in the file store needs special handling can be
indicated by special filenames. These filenames start with a marker like
this::

  __<MARKER-STRING>__real-filename.jpg

Please note the double underscores before and after the marker
string. They indicate that everything in between is a marker.

If you store a file in the file store with such a filename (we call
this a `file_id` to distinguish it from real world filenames), the file
store will look up a handler for ``<MARKER-STRING>`` and pass it the
file to store. The handler will then return the internal path to store
the file and possibly do additional things as well like validating the
file or similar.

Examples for such a file store handler can be found in the
:mod:`waeup.sirp.applicants.applicant` module. Please see also the
:class:`DefaultFileStoreHandler` class below for more details.
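
What a handler has to provide can be sketched as follows. This is an
illustration only: ``SketchImageHandler``, its ``subdir`` and its
extension check are assumptions, and instead of a ``HurryFile`` the
sketch simply returns the filename as third value. Only the two method
signatures are taken from :class:`DefaultFileStoreHandler`:

```python
import os


class SketchImageHandler(object):
    # Hypothetical handler; it only mimics the shape of a real one.

    subdir = 'applicant_images'      # assumed target directory
    allowed_exts = ('.jpg', '.png')  # assumed validation rule

    def pathFromFileID(self, store, root, file_id):
        # Drop the leading __MARKER__ part and store the rest in a subdir.
        filename = file_id.split('__', 2)[-1]
        return os.path.join(root, self.subdir, filename)

    def createFile(self, store, root, filename, file_id, f):
        # Validate first, then tell the store where to put the data.
        ext = os.path.splitext(filename)[1].lower()
        if ext not in self.allowed_exts:
            raise ValueError('Unsupported file type: %s' % ext)
        return f, self.pathFromFileID(store, root, file_id), filename


handler = SketchImageHandler()
path = handler.pathFromFileID(None, 'root', '__IMG_APP__photo.jpg')
```

The store itself stays ignorant of directories and validation rules;
it only writes the data to whatever path the handler returns.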

The file store looks up handlers by utility lookups: it looks for a
named utility providing
:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
marker string (without leading/trailing underscores) in lower
case. For example if the file id is

  ``__IMG_USER__manfred.jpg``

then the looked up utility should be registered under the name

  ``img_user``

and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
such utility can be found, a default handler is used instead
(see :class:`DefaultFileStoreHandler`).
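
The naming rule can be tried out without a component registry. In the
following sketch a plain dict stands in for the utility lookup (an
assumption made for illustration; the real store uses
``queryUtility``) and the handler values are placeholder strings:

```python
def marker_name(file_id):
    # Return the lower-cased marker of a file id or '' if there is none.
    parts = file_id.split('__', 2)
    if len(parts) == 3 and parts[0] == '':
        return parts[1].lower()
    return ''


# A dict replaces the utility registry here; names and values are made up.
handlers = {'img_user': 'user image handler'}
default_handler = 'default handler'


def lookup_handler(file_id):
    # Unknown or missing markers fall back to the default handler.
    return handlers.get(marker_name(file_id), default_handler)


name = marker_name('__IMG_USER__manfred.jpg')
found = lookup_handler('__IMG_USER__manfred.jpg')
fallback = lookup_handler('manfred.jpg')
```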

**Context Adapters: Knowing Your Family**

Often the internal filename or file id of a file depends on a
context. For example when we store passport photographs of applicants,
then each image belongs to a certain applicant instance. It is not
difficult to maintain such a connection manually: say every applicant
had an id, then we could put this id into the filename as well and
would build the filename to store/get the connected file by using that
id. You then would create filenames of a format like this::

  __<MARKER-STRING>__applicant0001.jpg

where ``applicant0001`` would tell exactly which applicant you can see
on the photograph. You notice that the internal file id might have
nothing to do with the originally uploaded filename. The file above
could have been uploaded with filename ``manfred.jpg`` but with the new
file id we are able to find the file again later.

Unfortunately it might soon get boring or cumbersome to retype this
building of filenames for a certain type of context, especially if
your filenames take more of the context into account than only a
simple id.

Therefore you can define filename building for a context as an adapter
that then could be looked up by other components simply by doing
something like:

  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
  >>> chooser = IFileStoreNameChooser(my_context_obj)
  >>> file_id = chooser.chooseName('manfred.jpg')

If you later want to change the way file ids are created from a
certain context, you only have to change the adapter implementation
accordingly.

Note that this is only a convenience component. You don't have to
define context adapters but it makes things easier for others if you
do, as you don't have to remember the exact file id creation method
all the time and can change things quickly and in only one location if
you need to do so.

Please see the :class:`FileStoreNameChooser` default implementation
below for details.
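
A context-specific name chooser might look roughly like the following
sketch. It is written as plain Python without ``grok`` so that it runs
standalone; ``SketchApplicant``, ``SketchNameChooser`` and the marker
``img_applicant`` are invented for this example:

```python
import os


class SketchApplicant(object):
    # Hypothetical context object carrying an id.
    def __init__(self, applicant_id):
        self.applicant_id = applicant_id


class SketchNameChooser(object):
    # Plain-Python stand-in for an IFileStoreNameChooser adapter.

    marker = 'img_applicant'  # assumed marker string

    def __init__(self, context):
        self.context = context

    def chooseName(self, name):
        # Keep only the extension of the uploaded filename; everything
        # else is derived from the context.
        ext = os.path.splitext(name)[1].lower()
        return '__%s__%s%s' % (
            self.marker.upper(), self.context.applicant_id, ext)


applicant = SketchApplicant('applicant0001')
file_id = SketchNameChooser(applicant).chooseName('manfred.jpg')
```

Callers only pass the uploaded filename; how the id is put together
stays hidden inside the chooser.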

"""
import grok
import hashlib
import os
import tempfile
import transaction
import warnings
from StringIO import StringIO
from ZODB.blob import Blob
from persistent import Persistent
from hurry.file import HurryFile
from hurry.file.interfaces import IFileRetrieval
from zope.component import queryUtility
from zope.interface import Interface
from waeup.sirp.image import WAeUPImageFile
from waeup.sirp.interfaces import (
    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)
from waeup.sirp.utils.helpers import cmp_files

def md5digest(fd):
    """Get an MD5 hexdigest for the file stored in `fd`.

    `fd`
      a file object open for reading.

    """
    return hashlib.md5(fd.read()).hexdigest()

class FileStoreNameChooser(grok.Adapter):
    """Default file store name chooser.

    Adapts any context and always chooses the same file id.
    """
    grok.context(Interface)
    grok.implements(IFileStoreNameChooser)

    def checkName(self, name):
        """Check whether an object name is valid.

        Raises a user error if the name is not valid.
        """
        pass

    def chooseName(self, name):
        """Choose a unique valid name for the object.

        The given name and object may be taken into account when
        choosing the name.

        chooseName is expected to always choose a valid name (that
        would pass the checkName test) and never raise an error.
        """
        return u'unknown_file'

class Basket(grok.Container):
    """A basket holds a set of image files with the same hash.
    """

    def _del(self):
        """Remove temporary files associated with local blobs.

        A basket holds files as Blob objects. Unfortunately, if a
        basket was not committed (put into ZODB), those blobs linger
        around as real files in some temporary directory and won't be
        removed.

        This is a helper function to remove all those uncommitted
        blobs that has to be called explicitly, for instance in tests.
        """
        key_list = self.keys()
        for key in key_list:
            item = self[key]
            if getattr(item, '_p_oid', None):
                # Don't mess around with blobs in ZODB
                continue
            fd = item.open('r')
            name = getattr(fd, 'name', None)
            fd.close()
            if name is not None and os.path.exists(name):
                os.unlink(name)
            del self[key]
        return

    def getInternalId(self, fd):
        """Get the basket-internal id for the file stored in `fd`.

        `fd` must be a file open for reading. If a (byte-wise) equal
        file can be found in the basket, its internal id (basket id)
        is returned, ``None`` otherwise.
        """
        fd.seek(0)
        for key, val in self.items():
            fd_stored = val.open('r')
            file_len = os.stat(fd_stored.name)[6]
            if file_len == 0:
                # Nasty workaround. Blobs seem to suffer from being emptied
                # accidentally.
                site = grok.getSite()
                if site is not None:
                    site.logger.warn(
                        'Empty Blob detected: %s' % fd_stored.name)
                warnings.warn("EMPTY BLOB DETECTED: %s" % fd_stored.name)
                fd_stored.close()
                val.open('w').write(fd.read())
                return key
            fd_stored.seek(0)
            if cmp_files(fd, fd_stored):
                fd_stored.close()
                return key
            fd_stored.close()
        return None

    @property
    def curr_id(self):
        """The current basket id.

        An integer number which is not yet in use. If there are
        already `maxint` entries in the basket, a :exc:`ValueError` is
        raised. The latter is _highly_ unlikely. It would mean to have
        more than 2**32 hash collisions, i.e. so many files with the
        same MD5 sum.
        """
        num = 1
        while True:
            if str(num) not in self.keys():
                return str(num)
            num += 1
            if num <= 0:
                name = getattr(self, '__name__', None)
                raise ValueError('Basket full: %s' % name)

    def storeFile(self, fd, filename):
        """Store the file in `fd` into the basket.

        The file will be stored in a Blob.
        """
        fd.seek(0)
        internal_id = self.getInternalId(fd) # Moves file pointer!
        if internal_id is None:
            internal_id = self.curr_id
            fd.seek(0)
            self[internal_id] = Blob()
            transaction.commit() # Urgently needed to make the Blob
                                 # persistent. Took me ages to find
                                 # out that solution, which makes some
                                 # design flaw in ZODB Blobs likely.
            self[internal_id].open('w').write(fd.read())
            fd.seek(0)
            self._p_changed = True
        return internal_id

    def retrieveFile(self, basket_id):
        """Retrieve a file open for reading with basket id `basket_id`.

        If there is no such id, ``None`` is returned. It is the
        caller's responsibility to close the open file.
        """
        if basket_id in self.keys():
            return self[basket_id].open('r')
        return None

class ImageStorage(grok.Container):
    """A container for image files.

    .. deprecated:: 0.2

       Use :class:`waeup.sirp.ExtFileStore` instead.
    """
    def _del(self):
        for basket in self.values():
            try:
                basket._del()
            except:
                pass

    def storeFile(self, fd, filename):
        fd.seek(0)
        digest = md5digest(fd)
        fd.seek(0)
        if digest not in self.keys():
            self[digest] = Basket()
        basket_id = self[digest].storeFile(fd, filename)
        full_id = "%s-%s" % (digest, basket_id)
        return full_id

    def retrieveFile(self, file_id):
        if '-' not in file_id:
            return None
        full_id, basket_id = file_id.split('-', 1)
        if full_id not in self.keys():
            return None
        return self[full_id].retrieveFile(basket_id)

class ImageStorageFileRetrieval(Persistent):
    grok.implements(IFileRetrieval)

    def getImageStorage(self):
        site = grok.getSite()
        if site is None:
            return None
        return site.get('images', None)

    def isImageStorageEnabled(self):
        site = grok.getSite()
        if site is None:
            return False
        if site.get('images', None) is None:
            return False
        return True

    def getFile(self, data):
        # ImageStorage is disabled, so give fall-back behaviour for
        # testing without ImageStorage
        if not self.isImageStorageEnabled():
            return StringIO(data)
        storage = self.getImageStorage()
        if storage is None:
            raise ValueError('Cannot find an image storage')
        result = storage.retrieveFile(data)
        if result is None:
            return StringIO(data)
        # Return the file already opened instead of opening it twice.
        return result

    def createFile(self, filename, f):
        if not self.isImageStorageEnabled():
            return WAeUPImageFile(filename, f.read())
        storage = self.getImageStorage()
        if storage is None:
            raise ValueError('Cannot find an image storage')
        file_id = storage.storeFile(f, filename)
        return WAeUPImageFile(filename, file_id)


class ExtFileStore(object):
    """External file store.

    External file stores are meant to store files outside the ZODB,
    i.e. in the filesystem.

    The most important attribute of the external file store is the
    `root` path, which gives the location where files will be stored.

    By default `root` is a ``'media/'`` directory in the datacenter
    storage directory of a site.

    The `root` attribute is 'read-only' because you normally don't
    want to change this path -- it is dynamic. That means, if you call
    the file store from 'within' a site, the root path will be located
    inside this site (a :class:`waeup.sirp.University` instance). If
    you call it from 'outside' a site some temporary dir (always the
    same during lifetime of the file store instance) will be used. The
    term 'temporary' tells what you can expect from this path
    persistence-wise.

    If you insist, you can pass a root path on initialization to the
    constructor but when calling from within a site afterwards, the
    site will override your setting for security reasons. This way
    you can safely use one file store for different sites in a Zope
    instance simultaneously and files from one site won't show up in
    another.

    An ExtFileStore instance is available as a global utility
    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

    To add and retrieve files from the storage, use the appropriate
    methods below.
    """

    grok.implements(IExtFileStore)

    _root = None

    @property
    def root(self):
        """Root dir of this storage.

        The root dir is a readonly value determined dynamically. It
        holds media files for sites or other components.

        If a site is available we return a ``media/`` dir in the
        datacenter storage dir.

        Otherwise we create a temporary dir which will be remembered
        on next call.

        If a site exists and has a datacenter, it always takes
        precedence over temporary dirs, even after a temporary
        directory was created.

        Please note that retrieving `root` is expensive. You might
        want to store a copy once retrieved in order to minimize the
        number of calls to `root`.

        """
        site = grok.getSite()
        if site is not None:
            root = os.path.join(site['datacenter'].storage, 'media')
            return root
        if self._root is None:
            self._root = tempfile.mkdtemp()
        return self._root

    def __init__(self, root=None):
        self._root = root
        return

    def getFile(self, file_id):
        """Get a file stored under file ID `file_id`.

        If the file cannot be found ``None`` is returned. Otherwise a
        file open for reading is returned, which the caller has to
        close.
        """
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        path = handler.pathFromFileID(self, self.root, file_id)
        if not os.path.exists(path):
            return None
        fd = open(path, 'rb')
        return fd

    def createFile(self, filename, f):
        """Store a file.

        `filename` is the file id to store the file under (optionally
        starting with a ``__MARKER__`` prefix), `f` a file-like object
        open for reading.
        """
        file_id = filename
        root = self.root # Calls to self.root are expensive
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        f, path, file_obj = handler.createFile(
            self, root, file_id, filename, f)
        dirname = os.path.dirname(path)
        if not os.path.exists(dirname):
            os.makedirs(dirname, 0755)
        open(path, 'wb').write(f.read())
        return file_obj

    def extractMarker(self, file_id):
        """Split filename into marker, filename, basename, and extension.

        A marker is a leading part of a string of form
        ``__MARKERNAME__`` followed by the real filename. This way we
        can put markers into a filename to request special processing.

        Returns a quadruple

          ``(marker, filename, basename, extension)``

        where ``marker`` is the marker in lowercase, ``filename`` is
        the complete trailing real filename, ``basename`` is the
        basename of the filename and ``extension`` the filename
        extension of the trailing filename. See examples below.

        Example:

           >>> extractMarker('__MaRkEr__sample.jpg')
           ('marker', 'sample.jpg', 'sample', '.jpg')

        If no marker is contained, we assume the whole string to be a
        real filename:

           >>> extractMarker('no-marker.txt')
           ('', 'no-marker.txt', 'no-marker', '.txt')

        Filenames without extension give an empty extension string:

           >>> extractMarker('no-marker')
           ('', 'no-marker', 'no-marker', '')

        """
        if not isinstance(file_id, basestring) or not file_id:
            return ('', '', '', '')
        parts = file_id.split('__', 2)
        marker = ''
        if len(parts) == 3 and parts[0] == '':
            marker = parts[1].lower()
            file_id = parts[2]
        basename, ext = os.path.splitext(file_id)
        return (marker, file_id, basename, ext)

grok.global_utility(ExtFileStore, provides=IExtFileStore)

class DefaultStorage(ExtFileStore):
    grok.provides(IFileRetrieval)

grok.global_utility(DefaultStorage, provides=IFileRetrieval)

class DefaultFileStoreHandler(grok.GlobalUtility):
    """Default handler for files stored in an external file store.

    Used by the file store if no handler is registered for the marker
    of a file id.
    """
    grok.implements(IFileStoreHandler)

    def pathFromFileID(self, store, root, file_id):
        return os.path.join(root, file_id)

    def createFile(self, store, root, filename, file_id, f):
        path = self.pathFromFileID(store, root, file_id)
        return f, path, HurryFile(filename, file_id)