source: main/waeup.sirp/branches/ulif-extimgstore/src/waeup/sirp/imagestorage.py @ 7031

Last change on this file since 7031 was 7031, checked in by uli, 13 years ago
  • Pin Sphinx version to 1.0.7. Apparently this is the last version not producing errors.
  • Fix some ReST markup for sphinx.
File size: 21.9 KB
##
## imagestorage.py
## Login : <uli@pu.smp.net>
## Started on  Mon Jul  4 16:02:14 2011 Uli Fouquet
## $Id$
##
## Copyright (C) 2011 Uli Fouquet
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##
"""A storage for image files.

A few words about storing files with ``waeup.sirp``. The need for this
feature arose initially from the need to store passport files for
applicants and students. These files are dynamic (can be changed
anytime), mean a lot of traffic and cost a lot of memory/disk space.

**Design Basics**

While one *can* store images and similar 'large binary objects' aka
blobs in the ZODB, this approach quickly becomes cumbersome and
difficult to understand. The worst approach here would be to store
images as regular byte-stream objects. ZODB supports this but
access is slow (data must be looked up in the one ``Data.fs`` file,
each file has to be sent to the ZEO server and back, etc.).

A slightly better approach is to store images in the ZODB but as
Blobs. ZODB supports storing blobs in separate files in order to
accelerate lookup/retrieval of these files. The files, however, have
to be sent to the ZEO server (and back on lookups) which means a
bottleneck and will easily result in an increased number of
``ConflictErrors`` even on simple reads.

The advantage of both ZODB-geared approaches is, of course, complete
database consistency. ZODB will guarantee that your files are
available under some object name and can be handled as any other
Python object.

Another approach is to leave the ZODB behind and to store images and
other files in the filesystem directly. This is faster (no ZEO
contacts, etc.), reduces the probability of ``ConflictErrors``, keeps
the ZODB smaller, and enables direct access (over the filesystem) to
the files. Furthermore, the steps involved might be easier to
understand for third-party developers. We opted for this last option.

**External File Store**

Our file storage API is implemented in :class:`ExtFileStore`. An
instance of this file storage (which is also able to store non-image
files) is available at runtime as a global utility implementing
:class:`waeup.sirp.interfaces.IExtFileStore`.

The main task of this central component is to maintain a filesystem
root path for all files to be stored. It also provides methods to
store/get files under file ids which identify the files locally.

So, to store a file away, you can do something like this:

  >>> from StringIO import StringIO
  >>> from zope.component import getUtility
  >>> from waeup.sirp.interfaces import IExtFileStore
  >>> store = getUtility(IExtFileStore)
  >>> store.createFile('myfile.txt', StringIO('some file content'))

All you need is a filename and the file-like object containing the
real file data.

This will store the file somewhere (you shouldn't make too many
assumptions about the real filesystem path here).

Later, we can get the file back like this:

  >>> store.getFile('myfile.txt')
  <open file ...>

What we get back is a file or file-like object already opened for
reading:

  >>> store.getFile('myfile.txt').read()
  'some file content'

**Handlers: Special Places for Special Files**

The file store supports special handling for certain files. For
example we want applicant images to be stored in a different directory
than student images, etc. Because the file store cannot know all
details about this special treatment of certain files, it looks up
helpers (handlers) to provide the information it needs for really
storing the files at the correct location.

That a file stored in the file store needs special handling can be
indicated by special filenames. These filenames start with a marker
like this::

  __<MARKER-STRING>__real-filename.jpg

Please note the double underscores before and after the marker
string. They indicate that everything in between is a marker.

If you store a file in the file store with such a filename (we call
this a `file_id` to distinguish it from real world filenames), the
file store will look up a handler for ``<MARKER-STRING>`` and pass it
the file to store. The handler will then return the internal path to
store the file and possibly do additional things as well, like
validating the file.

Examples for such a file store handler can be found in the
:mod:`waeup.sirp.applicants.applicant` module. Please see also the
:class:`DefaultFileStoreHandler` class below for more details.

The file store looks up handlers by utility lookups: it looks for a
named utility providing
:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
marker string (without leading/trailing underscores) in lower
case. For example, if the file id is

  ``__IMG_USER__manfred.jpg``

then the looked up utility should be registered under name

  ``img_user``

and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
such utility can be found, a default handler is used instead
(see :class:`DefaultFileStoreHandler`).

**Context Adapters: Knowing Your Family**

Often the internal filename or file id of a file depends on a
context. For example when we store passport photographs of applicants,
then each image belongs to a certain applicant instance. It is not
difficult to maintain such a connection manually: say every applicant
had an id, then we could put this id into the filename as well and
would build the filename to store/get the connected file by using that
id. You would then create filenames of a format like this::

  __<MARKER-STRING>__applicant0001.jpg

where ``applicant0001`` would tell exactly which applicant you can see
on the photograph. Note that the internal file id might have nothing
to do with the filename used on upload. The file above could have
been uploaded with filename ``manfred.jpg`` but with the new file id
we are able to find the file again later.

Unfortunately it might soon get boring or cumbersome to retype this
building of filenames for a certain type of context, especially if
your filenames take more of the context into account than only a
simple id.

Therefore you can define filename building for a context as an adapter
that then could be looked up by other components simply by doing
something like:

  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
  >>> chooser = IFileStoreNameChooser(my_context_obj)
  >>> file_id = chooser.chooseName('manfred.jpg')

If you later want to change the way file ids are created from a
certain context, you only have to change the adapter implementation
accordingly.

Note that this is only a convenience component. You don't have to
define context adapters but it makes things easier for others if you
do, as you don't have to remember the exact file id creation method
all the time and can change things quickly and in only one location
if you need to do so.

Please see the :class:`FileStoreNameChooser` default implementation
below for details.

"""
import grok
import hashlib
import os
import tempfile
import transaction
import warnings
from StringIO import StringIO
from ZODB.blob import Blob
from persistent import Persistent
from hurry.file import HurryFile
from hurry.file.interfaces import IFileRetrieval
from zope.component import queryUtility
from zope.interface import Interface
from waeup.sirp.image import WAeUPImageFile
from waeup.sirp.interfaces import (
    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)
from waeup.sirp.utils.helpers import cmp_files

def md5digest(fd):
    """Get an MD5 hexdigest for the file stored in `fd`.

    `fd`
      a file object open for reading.

    """
    return hashlib.md5(fd.read()).hexdigest()

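# A minimal sketch, not part of the original module: `md5digest` above
# reads the complete file into memory before hashing. For large image
# files a chunked variant might be preferable. The name
# `md5digest_chunked` and the `chunk_size` parameter are assumptions
# made for illustration only.
import hashlib


def md5digest_chunked(fd, chunk_size=64 * 1024):
    """Get an MD5 hexdigest for the file in `fd`, reading it in chunks.

    `fd` must be a file object open for reading in binary mode.
    """
    digest = hashlib.md5()
    while True:
        chunk = fd.read(chunk_size)
        if not chunk:
            # An empty read signals end of file.
            break
        digest.update(chunk)
    return digest.hexdigest()
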
class FileStoreNameChooser(grok.Adapter):
    grok.context(Interface)
    grok.implements(IFileStoreNameChooser)

    def checkName(self, name):
        """Check whether an object name is valid.

        Raises a user error if the name is not valid.
        """
        pass

    def chooseName(self, name):
        """Choose a unique valid name for the object.

        The given name and object may be taken into account when
        choosing the name.

        chooseName is expected to always choose a valid name (that
        would pass the checkName test) and never raise an error.
        """
        return u'unknown_file'

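# A minimal sketch, not part of the original module: the default name
# chooser above always returns ``u'unknown_file'``. A context-specific
# chooser would presumably build a marked file id from some attribute
# of its context, as described in the module docstring. The helper
# name `build_marked_file_id` and its parameters are hypothetical.
def build_marked_file_id(marker, context_id, ext='.jpg'):
    """Build a file id of the form ``__<MARKER>__<context_id><ext>``."""
    if '__' in marker:
        # A double underscore inside the marker would make the file id
        # ambiguous when it is split into marker and filename later.
        raise ValueError('Marker must not contain "__": %r' % (marker,))
    return '__%s__%s%s' % (marker.upper(), context_id, ext)
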
class Basket(grok.Container):
    """A basket holds a set of image files with same hash.

    It is used for storing files as blobs in ZODB. Current sites do
    not use it.

    .. deprecated:: 0.2

    """

    def _del(self):
        """Remove temporary files associated with local blobs.

        A basket holds files as Blob objects. Unfortunately, if a
        basket was not committed (put into ZODB), those blobs linger
        around as real files in some temporary directory and won't be
        removed.

        This is a helper function to remove all those uncommitted
        blobs that has to be called explicitly, for instance in tests.
        """
        key_list = self.keys()
        for key in key_list:
            item = self[key]
            if getattr(item, '_p_oid', None):
                # Don't mess around with blobs in ZODB
                continue
            fd = item.open('r')
            name = getattr(fd, 'name', None)
            fd.close()
            if name is not None and os.path.exists(name):
                os.unlink(name)
            del self[key]
        return

    def getInternalId(self, fd):
        """Get the basket-internal id for the file stored in `fd`.

        `fd` must be a file open for reading. If a (byte-wise) equal
        file can be found in the basket, its internal id (basket id)
        is returned, ``None`` otherwise.
        """
        fd.seek(0)
        for key, val in self.items():
            fd_stored = val.open('r')
            file_len = os.stat(fd_stored.name)[6]
            if file_len == 0:
                # Nasty workaround. Blobs seem to suffer from being emptied
                # accidentally.
                site = grok.getSite()
                if site is not None:
                    site.logger.warn(
                        'Empty Blob detected: %s' % fd_stored.name)
                warnings.warn("EMPTY BLOB DETECTED: %s" % fd_stored.name)
                fd_stored.close()
                val.open('w').write(fd.read())
                return key
            fd_stored.seek(0)
            if cmp_files(fd, fd_stored):
                fd_stored.close()
                return key
            fd_stored.close()
        return None

    @property
    def curr_id(self):
        """The current basket id.

        An integer number which is not yet in use. If there are
        already `maxint` entries in the basket, a :exc:`ValueError` is
        raised. The latter is _highly_ unlikely. It would mean to have
        more than 2**32 hash collisions, i.e. so many files with the
        same MD5 sum.
        """
        num = 1
        while True:
            if str(num) not in self.keys():
                return str(num)
            num += 1
            # NOTE: Python integers do not wrap around on overflow, so
            # this guard is effectively never triggered.
            if num <= 0:
                name = getattr(self, '__name__', None)
                raise ValueError('Basket full: %s' % name)

    def storeFile(self, fd, filename):
        """Store the file in `fd` into the basket.

        The file will be stored in a Blob.
        """
        fd.seek(0)
        internal_id = self.getInternalId(fd) # Moves file pointer!
        if internal_id is None:
            internal_id = self.curr_id
            fd.seek(0)
            self[internal_id] = Blob()
            transaction.commit() # Urgently needed to make the Blob
                                 # persistent. Took me ages to find
                                 # out that solution, which makes some
                                 # design flaw in ZODB Blobs likely.
            self[internal_id].open('w').write(fd.read())
            fd.seek(0)
            self._p_changed = True
        return internal_id

    def retrieveFile(self, basket_id):
        """Retrieve a file open for reading with basket id `basket_id`.

        If there is no such id, ``None`` is returned. It is the
        caller's responsibility to close the open file.
        """
        if basket_id in self.keys():
            return self[basket_id].open('r')
        return None

class ImageStorage(grok.Container):
    """A container for image files.

    .. deprecated:: 0.2

       Use :class:`waeup.sirp.ExtFileStore` instead.
    """
    def _del(self):
        for basket in self.values():
            try:
                basket._del()
            except Exception:
                # Cleanup is best effort; do not swallow
                # KeyboardInterrupt or SystemExit, though.
                pass

    def storeFile(self, fd, filename):
        """Store contents of file addressed by `fd` under filename `filename`.

        Returns the internal file id (a string) for the file stored.
        """
        fd.seek(0)
        digest = md5digest(fd)
        fd.seek(0)
        if digest not in self.keys():
            self[digest] = Basket()
        basket_id = self[digest].storeFile(fd, filename)
        full_id = "%s-%s" % (digest, basket_id)
        return full_id

    def retrieveFile(self, file_id):
        if '-' not in file_id:
            return None
        full_id, basket_id = file_id.split('-', 1)
        if full_id not in self.keys():
            return None
        return self[full_id].retrieveFile(basket_id)

class ImageStorageFileRetrieval(Persistent):
    """A persistent object to retrieve files stored in ZODB.

    .. deprecated:: 0.2

       Since we have :class:`ExtFileStore` now we do not need this
       class anymore.
    """
    grok.implements(IFileRetrieval)

    def getImageStorage(self):
        site = grok.getSite()
        if site is None:
            return None
        return site.get('images', None)

    def isImageStorageEnabled(self):
        site = grok.getSite()
        if site is None:
            return False
        if site.get('images', None) is None:
            return False
        return True

    def getFile(self, data):
        # ImageStorage is disabled, so give fall-back behaviour for
        # testing without ImageStorage
        if not self.isImageStorageEnabled():
            return StringIO(data)
        storage = self.getImageStorage()
        if storage is None:
            raise ValueError('Cannot find an image storage')
        result = storage.retrieveFile(data)
        if result is None:
            return StringIO(data)
        # Return the file already opened instead of opening it a
        # second time (which would leak the first file handle).
        return result

    def createFile(self, filename, f):
        if not self.isImageStorageEnabled():
            return WAeUPImageFile(filename, f.read())
        storage = self.getImageStorage()
        if storage is None:
            raise ValueError('Cannot find an image storage')
        file_id = storage.storeFile(f, filename)
        return WAeUPImageFile(filename, file_id)


class ExtFileStore(object):
    """External file store.

    External file stores are meant to store files 'externally' of the
    ZODB, i.e. in the filesystem.

    The most important attribute of the external file store is the
    `root` path which gives the path to the location where files will
    be stored.

    By default `root` is a ``'media/'`` directory in the root of the
    datacenter root of a site.

    The `root` attribute is 'read-only' because you normally don't
    want to change this path -- it is dynamic. That means, if you call
    the file store from 'within' a site, the root path will be located
    inside this site (a :class:`waeup.sirp.University` instance). If
    you call it from 'outside' a site some temporary dir (always the
    same during lifetime of the file store instance) will be used. The
    term 'temporary' tells what you can expect from this path
    persistence-wise.

    If you insist, you can pass a root path on initialization to the
    constructor but when calling from within a site afterwards, the
    site will override your setting for security reasons. This way
    you can safely use one file store for different sites in a Zope
    instance simultaneously and files from one site won't show up in
    another.

    An ExtFileStore instance is available as a global utility
    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

    To add and retrieve files from the storage, use the appropriate
    methods below.
    """

    grok.implements(IExtFileStore)

    _root = None

    @property
    def root(self):
        """Root dir of this storage.

        The root dir is a readonly value determined dynamically. It
        holds media files for sites or other components.

        If a site is available we return a ``media/`` dir in the
        datacenter storage dir.

        Otherwise we create a temporary dir which will be remembered
        on next call.

        If a site exists and has a datacenter, it always has
        precedence over temporary dirs, also after a temporary
        directory was created.

        Please note that retrieving `root` is expensive. You might
        want to store a copy once retrieved in order to minimize the
        number of calls to `root`.

        """
        site = grok.getSite()
        if site is not None:
            root = os.path.join(site['datacenter'].storage, 'media')
            return root
        if self._root is None:
            self._root = tempfile.mkdtemp()
        return self._root

    def __init__(self, root=None):
        self._root = root
        return

    def getFile(self, file_id):
        """Get a file stored under file ID `file_id`.

        Returns a file already opened for reading.

        If the file cannot be found ``None`` is returned.

        This method takes into account registered handlers for any
        marker put into the file_id.
        """
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        path = handler.pathFromFileID(self, self.root, file_id)
        if not os.path.exists(path):
            return None
        fd = open(path, 'rb')
        return fd

    def createFile(self, filename, f):
        """Store a file.
        """
        file_id = filename
        root = self.root # Calls to self.root are expensive
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        f, path, file_obj = handler.createFile(
            self, root, file_id, filename, f)
        dirname = os.path.dirname(path)
        if not os.path.exists(dirname):
            os.makedirs(dirname, 0755)
        # Close the target file explicitly so the data is flushed to
        # disk before we return.
        out = open(path, 'wb')
        try:
            out.write(f.read())
        finally:
            out.close()
        return file_obj

    def extractMarker(self, file_id):
        """Split filename into marker, filename, basename, and extension.

        A marker is a leading part of a string of form
        ``__MARKERNAME__`` followed by the real filename. This way we
        can put markers into a filename to request special processing.

        Returns a quadruple

          ``(marker, filename, basename, extension)``

        where ``marker`` is the marker in lowercase, ``filename`` is
        the complete trailing real filename, ``basename`` is the
        basename of the filename and ``extension`` the filename
        extension of the trailing filename. See examples below.

        Example:

           >>> extractMarker('__MaRkEr__sample.jpg')
           ('marker', 'sample.jpg', 'sample', '.jpg')

        If no marker is contained, we assume the whole string to be a
        real filename:

           >>> extractMarker('no-marker.txt')
           ('', 'no-marker.txt', 'no-marker', '.txt')

        Filenames without extension give an empty extension string:

           >>> extractMarker('no-marker')
           ('', 'no-marker', 'no-marker', '')

        """
        if not isinstance(file_id, basestring) or not file_id:
            return ('', '', '', '')
        parts = file_id.split('__', 2)
        marker = ''
        if len(parts) == 3 and parts[0] == '':
            marker = parts[1].lower()
            file_id = parts[2]
        basename, ext = os.path.splitext(file_id)
        return (marker, file_id, basename, ext)

grok.global_utility(ExtFileStore, provides=IExtFileStore)

class DefaultStorage(ExtFileStore):
    grok.provides(IFileRetrieval)

grok.global_utility(DefaultStorage, provides=IFileRetrieval)

class DefaultFileStoreHandler(grok.GlobalUtility):
    """A default handler for external file store.

    This handler is the fallback called by external file stores when
    there is no or an unknown marker in the file id.
    """
    grok.implements(IFileStoreHandler)

    def pathFromFileID(self, store, root, file_id):
        """Return the root path of external file store appended by file id.
        """
        return os.path.join(root, file_id)

    def createFile(self, store, root, filename, file_id, f):
        """Information about what to store exactly and where.

        When a file should be handled by an external file storage, it
        looks up any handlers (like this one) and passes runtime
        information like the storage object, root path, filename,
        file_id, and the raw file object itself.

        The handler can then change the file, raise exceptions or
        whatever and return the result.

        This handler returns the input file as-is, a path returned by
        :meth:`pathFromFileID` and an instance of
        :class:`hurry.file.HurryFile` for further operations.

        Please note: although a handler has enough information to
        store the file itself, it should leave that task to the
        calling file store.
        """
        path = self.pathFromFileID(store, root, file_id)
        return f, path, HurryFile(filename, file_id)
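
# A minimal sketch, not part of the original module: a custom handler
# that places files for one marker in their own subdirectory, following
# the method signatures of `DefaultFileStoreHandler` above. The class
# name, the ``users`` subdirectory and the plain tuple returned in place
# of a `HurryFile` are assumptions made for illustration. A real handler
# would presumably also be registered as a named utility providing
# `IFileStoreHandler` under the lower-case marker name.
import os


class SubdirFileStoreHandlerSketch(object):
    """Hypothetical handler storing files below ``<root>/users/``."""

    subdir = 'users'  # hypothetical target directory below `root`

    def pathFromFileID(self, store, root, file_id):
        # Strip a leading ``__MARKER__`` part, if any, so that only the
        # real filename ends up in the filesystem path.
        parts = file_id.split('__', 2)
        if len(parts) == 3 and parts[0] == '':
            file_id = parts[2]
        return os.path.join(root, self.subdir, file_id)

    def createFile(self, store, root, filename, file_id, f):
        # Leave the actual writing to the calling file store; only
        # tell it what to store and where.
        path = self.pathFromFileID(store, root, file_id)
        return f, path, (filename, file_id)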