Timestamp:
9 Nov 2011, 15:42:45 (13 years ago)
Author:
uli
Message:

Merge changes from branch ulif-extimgstore back into trunk.
Besides external image storage, waeupdocs should also work again.

Location:
main/waeup.sirp/trunk
Files:
2 edited

  • main/waeup.sirp/trunk

  • main/waeup.sirp/trunk/src/waeup/sirp/imagestorage.py

    r6980 r7063  
    2121##
    2222"""A storage for image files.
     23
     24A few words about storing files with ``waeup.sirp``. The need for this
     25feature arose initially from the need to store passport files for
     26applicants and students. These files are dynamic (can be changed
     27anytime), mean a lot of traffic and cost a lot of memory/disk space.
     28
     29**Design Basics**
     30
     31While one *can* store images and similar 'large binary objects' aka
     32blobs in the ZODB, this approach quickly becomes cumbersome and
     33difficult to understand. The worst approach here would be to store
     34images as regular byte-stream objects. ZODB supports this but
     35obviously access is slow (data must be looked up in the one
     36``Data.fs`` file, each file has to be sent to the ZEO server and back,
     37etc.).
     38
     39Slightly better is the approach to store images in the ZODB but as
     40Blobs. ZODB supports storing blobs in separate files in order to
     41accelerate lookup/retrieval of these files. The files, however, have
     42to be sent to the ZEO server (and back on lookups) which means a
     43bottleneck and will easily result in an increased number of
     44``ConflictErrors`` even on simple reads.
     45
     46The advantage of both ZODB-geared approaches is, of course, complete
     47database consistency. ZODB will guarantee that your files are
     48available under some object name and can be handled as any other
     49Python object.
     50
     51Another approach is to leave the ZODB behind and to store images and
     52other files in filesystem directly. This is faster (no ZEO contacts,
     53etc.), reduces probability of `ConflictErrors`, keeps the ZODB
     54smaller, and enables direct access (over filesystem) to the
     55files. Furthermore, the steps might be easier to understand for
     56third-party developers. We opted for this last option.
     57
     58**External File Store**
     59
     60Our implementation of the file storage API is defined in
     61:class:`ExtFileStore`. An instance of this file storage (which is also
     62able to store non-image files) is available at runtime as a global
     63utility implementing :class:`waeup.sirp.interfaces.IExtFileStore`.
     64
     65The main task of this central component is to maintain a filesystem
     66root path for all files to be stored. It also provides methods to
     67store/get files under certain file ids which identify certain files
     68locally.
     69
     70So, to store a file away, you can do something like this:
     71
     72  >>> from StringIO import StringIO
     73  >>> from zope.component import getUtility
     74  >>> from waeup.sirp.interfaces import IExtFileStore
     75  >>> store = getUtility(IExtFileStore)
     76  >>> store.createFile('myfile.txt', StringIO('some file content'))
     77
     78All you need is a filename and the file-like object containing the
     79real file data.
     80
     81This will store the file somewhere (you shouldn't make too many
     82assumptions about the real filesystem path here).
     83
     84Later, we can get the file back like this:
     85
     86  >>> store.getFile('myfile.txt')
     87  <open file ...>
     88
     89What we get back is a file or file-like object already opened for
     90reading:
     91
     92  >>> store.getFile('myfile.txt').read()
     93  'some file content'
     94
     95**Handlers: Special Places for Special Files**
     96
     97The file store supports special handling for certain files. For
     98example we want applicant images to be stored in a different directory
     99than student images, etc. Because the file store cannot know all
     100details about this special treatment of certain files, it looks up
     101helpers (handlers) to provide the information it needs for really
     102storing the files at the correct location.
     103
     104That a file stored in filestore needs special handling can be
     105indicated by special filenames. These filenames start with a marker like
     106this::
     107
     108  __<MARKER-STRING>__real-filename.jpg
     109
     110Please note the double underscores before and after the marker
     111string. They indicate that all in between is a marker.
     112
     113If you store a file in file store with such a filename (we call this a
     114`file_id` to distinguish it from real world filenames), the file store
     115will look up a handler for ``<MARKER-STRING>`` and pass it the file to
     116store. The handler then will return the internal path to store the
     117file and possibly do additional things as well like validating the
     118file or similar.
     119
     120Examples for such a file store handler can be found in the
     121:mod:`waeup.sirp.applicants.applicant` module. Please see also the
     122:class:`DefaultFileStoreHandler` class below for more details.
     123
     124The file store looks up handlers by utility lookups: it looks for a
     125named utility providing
     126:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
     127marker string (without leading/trailing underscores) in lower
     128case. For example if the file id would be
     129
     130  ``__IMG_USER__manfred.jpg``
     131
     132then the looked up utility should be registered under name
     133
     134  ``img_user``
     135
     136and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
     137such utility can be found, a default handler is used instead
     138(see :class:`DefaultFileStoreHandler`).
     139
     140**Context Adapters: Knowing Your Family**
     141
     142Often the internal filename or file id of a file depends on a
     143context. For example when we store passport photographs of applicants,
     144then each image belongs to a certain applicant instance. It is not
     145difficult to maintain such a connection manually: Say every applicant
     146had an id, then we could put this id into the filename as well and
     147would build the filename to store/get the connected file by using that
     148filename. You then would create filenames of a format like this::
     149
     150  __<MARKER-STRING>__applicant0001.jpg
     151
     152where ``applicant0001`` would tell exactly which applicant you can see
     153on the photograph. You notice that the internal file id might have
     154nothing to do with once uploaded filenames. The id above could have
     155been uploaded with filename ``manfred.jpg`` but with the new file id
     156we are able to find the file again later.
     157
     158Unfortunately it might soon get boring or cumbersome to retype this
     159building of filenames for a certain type of context, especially if
     160your filenames take more of the context into account than only a
     161simple id.
     162
     163Therefore you can define filename building for a context as an adapter
     164that then could be looked up by other components simply by doing
     165something like:
     166
     167  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
     168  >>> file_id = IFileStoreNameChooser(my_context_obj)
     169
     170If you later want to change the way file ids are created from a
     171certain context, you only have to change the adapter implementation
     172accordingly.
     173
     174Note that this is only a convenience component. You don't have to
     175define context adapters but it makes things easier for others if you
     176do, as you don't have to remember the exact file id creation method
     177all the time and can change things quickly and in only one location if
     178you need to do so.
     179
     180Please see the :class:`FileStoreNameChooser` default implementation
     181below for details.
     182
    23183"""
    24184import grok
    25 import hashlib
    26185import os
    27 import transaction
    28 import warnings
    29 from StringIO import StringIO
    30 from ZODB.blob import Blob
    31 from persistent import Persistent
     186import tempfile
     187from hurry.file import HurryFile
    32188from hurry.file.interfaces import IFileRetrieval
    33 from waeup.sirp.image import WAeUPImageFile
    34 from waeup.sirp.utils.helpers import cmp_files
    35 
    36 def md5digest(fd):
    37     """Get an MD5 hexdigest for the file stored in `fd`.
    38 
    39     `fd`
    40       a file object open for reading.
    41 
     189from zope.component import queryUtility
     190from zope.interface import Interface
     191from waeup.sirp.interfaces import (
     192    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)
     193
     194class FileStoreNameChooser(grok.Adapter):
     195    """Default file store name chooser.
     196
     197    File store name choosers pick a file id, a string, for a certain
     198    context object. They are normally registered as adapters for a
     199    certain content type and know how to build the file id for this
     200    special type of context.
     201
     202    Provides the :class:`waeup.sirp.interfaces.IFileStoreNameChooser`
     203    interface.
     204
     205    This default file name chooser accepts almost every name as long
     206    as it is a string or unicode object.
    42207    """
    43     return hashlib.md5(fd.read()).hexdigest()
    44 
    45 class Basket(grok.Container):
    46     """A basket holds a set of image files with same hash.
     208    grok.context(Interface)
     209    grok.implements(IFileStoreNameChooser)
     210
     211    def checkName(self, name):
     212        """Check whether an object name is valid.
     213
     214        Raises a user error if the name is not valid.
     215
     216        For the default file store name chooser any name is valid.
     217        """
     218        if isinstance(name, basestring):
     219            return True
     220        return False
     221
     222    def chooseName(self, name):
     223        """Choose a unique valid name for the object.
     224
     225        The given name and object may be taken into account when
     226        choosing the name.
     227
     228        chooseName is expected to always choose a valid name (that
     229        would pass the checkName test) and never raise an error.
     230
     231        For this default name chooser we return the given name if it
     232        is valid or ``unknown_file`` else.
     233        """
     234        if self.checkName(name):
     235            return name
     236        return u'unknown_file'
     237
     238class ExtFileStore(object):
     239    """External file store.
     240
     241    External file stores are meant to store files 'externally' of the
     242    ZODB, i.e. in filesystem.
     243
     244    Most important attribute of the external file store is the `root`
     245    path which gives the path to the location where files will be
     246    stored within.
     247
     248    By default `root` is a ``'media/'`` directory in the root of the
     249    datacenter root of a site.
     250
     251    The `root` attribute is 'read-only' because you normally don't
     252    want to change this path -- it is dynamic. That means, if you call
     253    the file store from 'within' a site, the root path will be located
     254    inside this site (a :class:`waeup.sirp.University` instance). If
     255    you call it from 'outside' a site some temporary dir (always the
     256    same during lifetime of the file store instance) will be used. The
     257    term 'temporary' tells what you can expect from this path
     258    persistence-wise.
     259
     260    If you insist, you can pass a root path on initialization to the
     261    constructor but when calling from within a site afterwards, the
     262    site will override your setting for security measures. This way
     263    you can safely use one file store for different sites in a Zope
     264    instance simultaneously and files from one site won't show up in
     265    another.
     266
     267    An ExtFileStore instance is available as a global utility
     268    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.
     269
     270    To add and retrieve files from the storage, use the appropriate
     271    methods below.
    47272    """
    48273
    49     def _del(self):
    50         """Remove temporary files associated with local blobs.
    51 
    52         A basket holds files as Blob objects. Unfortunately, if a
    53         basket was not committed (put into ZODB), those blobs linger
    54         around as real files in some temporary directory and won't be
    55         removed.
    56 
    57         This is a helper function to remove all those uncommitted
    58         blobs that has to be called explicitly, for instance in tests.
    59         """
    60         key_list = list(self.keys())
    61         for key in key_list:
    62             item = self[key]
    63             if getattr(item, '_p_oid', None):
    64                 # Don't mess around with blobs in ZODB
    65                 continue
    66             fd = item.open('r')
    67             name = getattr(fd, 'name', None)
    68             fd.close()
    69             if name is not None and os.path.exists(name):
    70                 os.unlink(name)
    71             del self[key]
     274    grok.implements(IExtFileStore)
     275
     276    _root = None
     277
     278    @property
     279    def root(self):
     280        """Root dir of this storage.
     281
     282        The root dir is a readonly value determined dynamically. It
     283        holds media files for sites or other components.
     284
     285        If a site is available we return a ``media/`` dir in the
     286        datacenter storage dir.
     287
     288        Otherwise we create a temporary dir which will be remembered
     289        on next call.
     290
     291        If a site exists and has a datacenter, it has always
     292        precedence over temporary dirs, also after a temporary
     293        directory was created.
     294
     295        Please note that retrieving `root` is expensive. You might
     296        want to store a copy once retrieved in order to minimize the
     297        number of calls to `root`.
     298
     299        """
     300        site = grok.getSite()
     301        if site is not None:
     302            root = os.path.join(site['datacenter'].storage, 'media')
     303            return root
     304        if self._root is None:
     305            self._root = tempfile.mkdtemp()
     306        return self._root
     307
     308    def __init__(self, root=None):
     309        self._root = root
    72310        return
    73311
    74     def getInternalId(self, fd):
    75         """Get the basket-internal id for the file stored in `fd`.
    76 
    77         `fd` must be a file open for reading. If an (byte-wise) equal
    78         file can be found in the basket, its internal id (basket id)
    79         is returned, ``None`` otherwise.
    80         """
    81         fd.seek(0)
    82         for key, val in self.items():
    83             fd_stored = val.open('r')
    84             file_len = os.stat(fd_stored.name)[6]
    85             if file_len == 0:
    86                 # Nasty workaround. Blobs seem to suffer from being emptied
    87                 # accidentally.
    88                 site = grok.getSite()
    89                 if site is not None:
    90                     site.logger.warn(
    91                         'Empty Blob detected: %s' % fd_stored.name)
    92                 warnings.warn("EMPTY BLOB DETECTED: %s" % fd_stored.name)
    93                 fd_stored.close()
    94                 val.open('w').write(fd.read())
    95                 return key
    96             fd_stored.seek(0)
    97             if cmp_files(fd, fd_stored):
    98                 fd_stored.close()
    99                 return key
    100             fd_stored.close()
    101         return None
    102 
    103     @property
    104     def curr_id(self):
    105         """The current basket id.
    106 
    107         An integer number which is not yet in use. If there are
    108         already `maxint` entries in the basket, a :exc:`ValueError` is
    109         raised. The latter is _highly_ unlikely. It would mean to have
    110         more than 2**32 hash collisions, i.e. so many files with the
    111         same MD5 sum.
    112         """
    113         num = 1
    114         while True:
    115             if str(num) not in self.keys():
    116                 return str(num)
    117             num += 1
    118             if num <= 0:
    119                 name = getattr(self, '__name__', None)
    120                 raise ValueError('Basket full: %s' % name)
    121 
    122     def storeFile(self, fd, filename):
    123         """Store the file in `fd` into the basket.
    124 
    125         The file will be stored in a Blob.
    126         """
    127         fd.seek(0)
    128         internal_id = self.getInternalId(fd) # Moves file pointer!
    129         if internal_id is None:
    130             internal_id = self.curr_id
    131             fd.seek(0)
    132             self[internal_id] = Blob()
    133             transaction.commit() # Urgently needed to make the Blob
    134                                  # persistent. Took me ages to find
    135                                  # out that solution, which makes some
    136                                  # design flaw in ZODB Blobs likely.
    137             self[internal_id].open('w').write(fd.read())
    138             fd.seek(0)
    139             self._p_changed = True
    140         return internal_id
    141 
    142     def retrieveFile(self, basket_id):
    143         """Retrieve a file open for reading with basket id `basket_id`.
    144 
    145         If there is no such id, ``None`` is returned. It is the
    146         callers responsibility to close the open file.
    147         """
    148         if basket_id in self.keys():
    149             return self[basket_id].open('r')
    150         return None
    151 
    152 class ImageStorage(grok.Container):
    153     """A container for image files.
     312    def getFile(self, file_id):
     313        """Get a file stored under file ID `file_id`.
     314
     315        Returns a file already opened for reading.
     316
     317        If the file cannot be found ``None`` is returned.
     318
     319        This method takes into account registered handlers for any
     320        marker put into the file_id.
     321
     322        .. seealso:: :class:`DefaultFileStoreHandler`
     323        """
     324        marker, filename, base, ext = self.extractMarker(file_id)
     325        handler = queryUtility(IFileStoreHandler, name=marker,
     326                               default=DefaultFileStoreHandler())
     327        path = handler.pathFromFileID(self, self.root, file_id)
     328        if not os.path.exists(path):
     329            return None
     330        fd = open(path, 'rb')
     331        return fd
     332
     333    def getFileByContext(self, context):
     334        """Get a file for given context.
     335
     336        Returns a file already opened for reading.
     337
     338        If the file cannot be found ``None`` is returned.
     339
     340        This method takes into account registered handlers and file
     341        name choosers for context types.
     342
     343        This is a convenience method that internally calls
     344        :meth:`getFile`.
     345
     346        .. seealso:: :class:`FileStoreNameChooser`,
     347                     :class:`DefaultFileStoreHandler`.
     348        """
     349        file_id = IFileStoreNameChooser(context).chooseName()
     350        return self.getFile(file_id)
     351
     352    def createFile(self, filename, f):
     353        """Store a file.
     354        """
     355        file_id = filename
     356        root = self.root # Calls to self.root are expensive
     357        marker, filename, base, ext = self.extractMarker(file_id)
     358        handler = queryUtility(IFileStoreHandler, name=marker,
     359                               default=DefaultFileStoreHandler())
     360        f, path, file_obj = handler.createFile(
     361            self, root, file_id, filename, f)
     362        dirname = os.path.dirname(path)
     363        if not os.path.exists(dirname):
     364            os.makedirs(dirname, 0755)
     365        open(path, 'wb').write(f.read())
     366        return file_obj
     367
     368    def extractMarker(self, file_id):
     369        """Split filename into marker, filename, basename, and extension.
     370
     371        A marker is a leading part of a string of form
     372        ``__MARKERNAME__`` followed by the real filename. This way we
     373        can put markers into a filename to request special processing.
     374
     375        Returns a quadruple
     376
     377          ``(marker, filename, basename, extension)``
     378
     379        where ``marker`` is the marker in lowercase, filename is the
     380        complete trailing real filename, ``basename`` is the basename
     381        of the filename and ``extension`` the filename extension of
     382        the trailing filename. See examples below.
     383
     384        Example:
     385
     386           >>> extractMarker('__MaRkEr__sample.jpg')
     387           ('marker', 'sample.jpg', 'sample', '.jpg')
     388
     389        If no marker is contained, we assume the whole string to be a
     390        real filename:
     391
     392           >>> extractMarker('no-marker.txt')
     393           ('', 'no-marker.txt', 'no-marker', '.txt')
     394
     395        Filenames without extension give an empty extension string:
     396
     397           >>> extractMarker('no-marker')
     398           ('', 'no-marker', 'no-marker', '')
     399
     400        """
     401        if not isinstance(file_id, basestring) or not file_id:
     402            return ('', '', '', '')
     403        parts = file_id.split('__', 2)
     404        marker = ''
     405        if len(parts) == 3 and parts[0] == '':
     406            marker = parts[1].lower()
     407            file_id = parts[2]
     408        basename, ext = os.path.splitext(file_id)
     409        return (marker, file_id, basename, ext)
     410
     411grok.global_utility(ExtFileStore, provides=IExtFileStore)
     412
     413class DefaultStorage(ExtFileStore):
     414    """Default storage for files.
     415
     416    Registered globally as utility for
     417    :class:`hurry.file.interfaces.IFileRetrieval`.
    154418    """
    155     def _del(self):
    156         for basket in self.values():
    157             try:
    158                 basket._del()
    159             except:
    160                 pass
    161 
    162     def storeFile(self, fd, filename):
    163         fd.seek(0)
    164         digest = md5digest(fd)
    165         fd.seek(0)
    166         if not digest in self.keys():
    167             self[digest] = Basket()
    168         basket_id = self[digest].storeFile(fd, filename)
    169         full_id = "%s-%s" % (digest, basket_id)
    170         return full_id
    171 
    172     def retrieveFile(self, file_id):
    173         if not '-' in file_id:
    174             return None
    175         full_id, basket_id = file_id.split('-', 1)
    176         if not full_id in self.keys():
    177             return None
    178         return self[full_id].retrieveFile(basket_id)
    179 
    180 class ImageStorageFileRetrieval(Persistent):
    181     grok.implements(IFileRetrieval)
    182 
    183     def getImageStorage(self):
    184         site = grok.getSite()
    185         if site is None:
    186             return None
    187         return site.get('images', None)
    188 
    189     def isImageStorageEnabled(self):
    190         site = grok.getSite()
    191         if site is None:
    192             return False
    193         if site.get('images', None) is None:
    194             return False
    195         return True
    196 
    197     def getFile(self, data):
    198         # ImageStorage is disabled, so give fall-back behaviour for
    199         # testing without ImageStorage
    200         if not self.isImageStorageEnabled():
    201             return StringIO(data)
    202         storage = self.getImageStorage()
    203         if storage is None:
    204             raise ValueError('Cannot find an image storage')
    205         result = storage.retrieveFile(data)
    206         if result is None:
    207             return StringIO(data)
    208         return storage.retrieveFile(data)
    209 
    210     def createFile(self, filename, f):
    211         if not self.isImageStorageEnabled():
    212             return WAeUPImageFile(filename, f.read())
    213         storage = self.getImageStorage()
    214         if storage is None:
    215             raise ValueError('Cannot find an image storage')
    216         file_id = storage.storeFile(f, filename)
    217         return WAeUPImageFile(filename, file_id)
     419    grok.provides(IFileRetrieval)
     420
     421grok.global_utility(DefaultStorage, provides=IFileRetrieval)
     422
     423class DefaultFileStoreHandler(grok.GlobalUtility):
     424    """A default handler for external file store.
     425
     426    This handler is the fallback called by external file stores when
     427    there is no or an unknown marker in the file id.
     428
     429    Registered globally as utility for
     430    :class:`waeup.sirp.interfaces.IFileStoreHandler`.
     431    """
     432    grok.implements(IFileStoreHandler)
     433
     434    def pathFromFileID(self, store, root, file_id):
     435        """Return the root path of external file store appended by file id.
     436        """
     437        return os.path.join(root, file_id)
     438
     439    def createFile(self, store, root, filename, file_id, f):
     440        """Infos about what to store exactly and where.
     441
     442        When a file should be handled by an external file storage, it
     443        looks up any handlers (like this one), passes runtime infos
     444        like the storage object, root path, filename, file_id, and the
     445        raw file object itself.
     446
     447        The handler can then change the file, raise exceptions or
     448        whatever and return the result.
     449
     450        This handler returns the input file as-is, a path returned by
     451        :meth:`pathFromFileID` and an instance of
     452        :class:`hurry.file.HurryFile` for further operations.
     453
     454        Please note: although a handler has enough infos to store the
     455        file itself, it should leave that task to the calling file
     456        store.
     457        """
     458        path = self.pathFromFileID(store, root, file_id)
     459        return f, path, HurryFile(filename, file_id)
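
The marker convention and the handler fallback introduced in this changeset can be tried outside the Zope/grok stack. The following is a minimal sketch, not the committed code: it assumes Python 3 (so `str` replaces `basestring`), and the names `extract_marker`, `HANDLERS`, `default_path` and `path_for` are hypothetical stand-ins. In particular, the plain dict `HANDLERS` only imitates the named-utility lookup that the changeset does with `queryUtility(IFileStoreHandler, name=marker, default=...)`.

```python
import os


def extract_marker(file_id):
    """Split a file id into (marker, filename, basename, extension).

    Mirrors the logic of ExtFileStore.extractMarker from the diff above,
    rewritten as a plain function for Python 3.
    """
    if not isinstance(file_id, str) or not file_id:
        return ('', '', '', '')
    # At most two splits: '' | MARKER | rest-of-filename
    parts = file_id.split('__', 2)
    marker = ''
    # A marker is only present if the id starts with '__' and a
    # second '__' closes the marker string.
    if len(parts) == 3 and parts[0] == '':
        marker = parts[1].lower()
        file_id = parts[2]
    basename, ext = os.path.splitext(file_id)
    return (marker, file_id, basename, ext)


# Hypothetical registry: lowercase marker -> callable(root, file_id) -> path.
# This stands in for named IFileStoreHandler utilities.
HANDLERS = {}


def default_path(root, file_id):
    """Fallback: store the file directly under root, named by its file id."""
    return os.path.join(root, file_id)


def path_for(root, file_id):
    """Pick the handler registered for the marker, or fall back to default."""
    marker = extract_marker(file_id)[0]
    handler = HANDLERS.get(marker, default_path)
    return handler(root, file_id)


# Example: route files marked __IMG_USER__ into a 'users' subdirectory.
HANDLERS['img_user'] = lambda root, file_id: os.path.join(
    root, 'users', extract_marker(file_id)[1])
```

With this registration, `path_for(root, '__IMG_USER__manfred.jpg')` resolves below `users/`, while an id without a marker (or with an unregistered one) goes through `default_path`, which keeps the complete file id as the filename.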