source: main/waeup.sirp/trunk/src/waeup/sirp/imagestorage.py @ 9506

Last change on this file since 9506 was 7193, checked in by Henrik Bettermann, 13 years ago

More copyright adjustments.

## $Id: imagestorage.py 7193 2011-11-25 07:21:29Z henrik $
##
## Copyright (C) 2011 Uli Fouquet & Henrik Bettermann
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##
"""A storage for image (and other) files.

A few words about storing files with ``waeup.sirp``. The need for this
feature arose initially from the need to store passport files for
applicants and students. These files are dynamic (they can be changed
at any time), generate a lot of traffic and take up a lot of
memory/disk space.

**Design Basics**

While one *can* store images and similar 'large binary objects' aka
blobs in the ZODB, this approach quickly becomes cumbersome and
difficult to understand. The worst approach here would be to store
images as regular byte-stream objects. ZODB supports this but
access is slow (data must be looked up in the one
``Data.fs`` file, each file has to be sent to the ZEO server and back,
etc.).

A slightly better approach is to store images in the ZODB but as
Blobs. ZODB supports storing blobs in separate files in order to
accelerate lookup/retrieval of these files. The files, however, have
to be sent to the ZEO server (and back on lookups) which means a
bottleneck and will easily result in an increased number of
``ConflictErrors`` even on simple reads.

The advantage of both ZODB-geared approaches is, of course, complete
database consistency. ZODB will guarantee that your files are
available under some object name and can be handled like any other
Python object.

Another approach is to leave the ZODB behind and to store images and
other files directly in the filesystem. This is faster (no ZEO
contacts, etc.), reduces the probability of ``ConflictErrors``, keeps
the ZODB smaller, and enables direct access (over the filesystem) to
the files. Furthermore, the steps involved might be easier to
understand for third-party developers. We opted for this last option.

**External File Store**

Our implementation of the file storage API is defined in
:class:`ExtFileStore`. An instance of this file storage (which is also
able to store non-image files) is available at runtime as a global
utility implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

The main task of this central component is to maintain a filesystem
root path for all files to be stored. It also provides methods to
store/get files under certain file ids which identify certain files
locally.

So, to store a file away, you can do something like this:

  >>> from StringIO import StringIO
  >>> from zope.component import getUtility
  >>> from waeup.sirp.interfaces import IExtFileStore
  >>> store = getUtility(IExtFileStore)
  >>> store.createFile('myfile.txt', StringIO('some file content'))

All you need is a filename and the file-like object containing the
real file data.

This will store the file somewhere (you shouldn't make too many
assumptions about the real filesystem path here).

Later, we can get the file back like this:

  >>> store.getFile('myfile')
  <open file ...>

Please note that we ask for ``myfile`` instead of ``myfile.txt`` as
the file id should not make a difference for different filename
extensions. The file id for ``sample.jpg`` thus could simply be
``sample``.

What we get back is a file or file-like object already opened for
reading:

  >>> store.getFile('myfile').read()
  'some file content'

**Handlers: Special Places for Special Files**

The file store supports special handling for certain files. For
example we want applicant images to be stored in a different directory
than student images, etc. Because the file store cannot know all
details about this special treatment of certain files, it looks up
helpers (handlers) to provide the information it needs for really
storing the files at the correct location.

That a file needs special handling is indicated by a special
filename. Such filenames start with a marker like this::

  __<MARKER-STRING>__real-filename

Please note the double underscores before and after the marker
string. They indicate that everything in between is a marker.

If you store a file in the file store with such a filename (we call
this a `file_id` to distinguish it from real-world filenames), the
file store will look up a handler for ``<MARKER-STRING>`` and pass it
the file to store. The handler then will return the internal path to
store the file and possibly do additional things as well, like
validating the file.

Examples of such a file store handler can be found in the
:mod:`waeup.sirp.applicants.applicant` module. Please see also the
:class:`DefaultFileStoreHandler` class below for more details.

The file store looks up handlers by utility lookups: it looks for a
named utility providing
:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
marker string (without leading/trailing underscores) in lower
case. For example if the file id were

  ``__IMG_USER__manfred``

then the looked up utility should be registered under name

  ``img_user``

and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
such utility can be found, a default handler is used instead
(see :class:`DefaultFileStoreHandler`).

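As a rough illustration of the lookup described above, the following
standalone sketch mimics marker extraction and handler selection
without any Zope machinery. The ``HANDLERS`` dict and
``lookup_handler`` are hypothetical stand-ins for the named utility
registry; ``extract_marker`` follows the logic of
:meth:`ExtFileStore.extractMarker` but is written for Python 3
(``str`` instead of ``basestring``).

```python
import os


def extract_marker(file_id):
    # Split '__MARKER__name.ext' into (marker, filename, basename, ext).
    if not isinstance(file_id, str) or not file_id:
        return ('', '', '', '')
    parts = file_id.split('__', 2)
    marker = ''
    if len(parts) == 3 and parts[0] == '':
        marker = parts[1].lower()
        file_id = parts[2]
    basename, ext = os.path.splitext(file_id)
    return (marker, file_id, basename, ext)


# Hypothetical registry standing in for named IFileStoreHandler utilities.
HANDLERS = {'img_user': 'user image handler'}


def lookup_handler(file_id, default='default handler'):
    # Handlers are looked up by the lowercased marker; unknown or
    # missing markers fall back to the default handler.
    marker = extract_marker(file_id)[0]
    return HANDLERS.get(marker, default)
```

With this sketch, a file id of ``__IMG_USER__manfred`` selects the
entry registered as ``img_user``, while a file id without a marker
falls back to the default.
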
**About File IDs and Filenames**

In the waeup.sirp package we want to store documents like CVs,
photographs, and similar. Each of these documents might come into the
system with different filename extensions. This could be a problem as
the browser components might have to set different response headers
for different filetypes and we nevertheless want to make sure that
only one file is stored per document. For instance we don't want
``passport.jpg`` *and* ``passport.png`` but only one of them.

The default components like :class:`DefaultFileStoreHandler` take care
of this by searching the filesystem for already existing files with
the same file id and removing them where necessary.

Therefore file ids should never include filename extensions (except if
you only support exactly one filename extension for a certain
document). The only place where you should add an extension (and it is
important to do so) is when creating new files: when a file was
uploaded you can pass in the filename (including the filename
extension) and the file stored in the external file store will (most
probably) have a different name but the same extension as the original
file.

When looking for the file, however, you only have to give the file id
and the handlers should find the right file for you, regardless of the
filename extension it has.

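The 'one file per file id' rule can be sketched in isolation as
follows. This is a simplified, hypothetical model (``find_stored``
and ``store`` are not part of this module) that works on a temporary
directory and, like the default handler, matches files by filename
prefix. It is written for Python 3.

```python
import glob
import os
import tempfile


def find_stored(root, file_id):
    # Return the path of the first file whose name starts with
    # file_id, or None if there is no such file.
    matches = sorted(glob.glob(os.path.join(root, file_id) + '*'))
    return matches[0] if matches else None


def store(root, filename, data):
    # The file id is the filename without its extension.
    file_id, ext = os.path.splitext(filename)
    old = find_stored(root, file_id)
    if old is not None and os.path.splitext(old)[1] != ext:
        # Drop e.g. passport.jpg before writing passport.png.
        os.unlink(old)
    path = os.path.join(root, file_id + ext)
    with open(path, 'wb') as out:
        out.write(data)
    return path


root = tempfile.mkdtemp()
store(root, 'passport.jpg', b'jpeg bytes')
store(root, 'passport.png', b'png bytes')
remaining = sorted(os.listdir(root))
found = find_stored(root, 'passport')
```

After both calls only ``passport.png`` is left, and looking up the
bare file id ``passport`` finds it without knowing the extension.
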
**Context Adapters: Knowing Your Family**

Often the internal filename or file id of a file depends on a
context. For example when we store passport photographs of applicants,
then each image belongs to a certain applicant instance. It is not
difficult to maintain such a connection manually: say every applicant
had an id, then we could put this id into the filename as well and
would build the filename to store/get the connected file by using that
id. You then would create filenames of a format like this::

  __<MARKER-STRING>__applicant0001

where ``applicant0001`` would tell exactly which applicant you can see
on the photograph. Note that the internal file id might have nothing
to do with the originally uploaded filename. The file above could have
been uploaded as ``manfred.jpg`` but with the new file id we are able
to find the file again later.

Unfortunately, repeating this filename construction by hand for a
certain type of context soon gets tedious, especially if your
filenames take more of the context into account than only a simple id.

Therefore you can define filename building for a context as an adapter
that then could be looked up by other components simply by doing
something like:

  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
  >>> chooser = IFileStoreNameChooser(my_context_obj)
  >>> file_id = chooser.chooseName(attr=None)

If you later want to change the way file ids are created from a
certain context, you only have to change the adapter implementation
accordingly.

Note that this is only a convenience component. You don't have to
define context adapters but it makes things easier for others if you
do, as you don't have to remember the exact file id creation method
all the time and can change things quickly and in only one location if
you need to do so.

Please see the :class:`FileStoreNameChooser` default implementation
below for details.

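A context-specific name chooser might look roughly like the sketch
below. All names in it (``Applicant``, ``ApplicantNameChooser``, the
``img_applicant`` marker) are hypothetical, and plain classes are used
instead of registered adapters so that the example runs without Zope.

```python
class Applicant(object):
    # Hypothetical context object; only the id matters here.
    def __init__(self, applicant_id):
        self.applicant_id = applicant_id


class ApplicantNameChooser(object):
    # Hypothetical stand-in for an adapter providing
    # IFileStoreNameChooser for applicants.
    marker = 'img_applicant'

    def __init__(self, context):
        self.context = context

    def chooseName(self, name=None, attr=None):
        # Build the file id from the context only; the uploaded
        # filename passed in as `name` is ignored on purpose.
        return '__%s__%s' % (self.marker, self.context.applicant_id)


chooser = ApplicantNameChooser(Applicant('applicant0001'))
file_id = chooser.chooseName(name='manfred.jpg')
```

Here the uploaded name ``manfred.jpg`` plays no role: the resulting
file id depends only on the applicant, so the file can be found again
from the context alone.
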
"""
import glob
import grok
import os
import tempfile
from hurry.file import HurryFile
from hurry.file.interfaces import IFileRetrieval
from zope.component import queryUtility
from zope.interface import Interface
from waeup.sirp.interfaces import (
    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)

class FileStoreNameChooser(grok.Adapter):
    """Default file store name chooser.

    File store name choosers pick a file id, a string, for a certain
    context object. They are normally registered as adapters for a
    certain content type and know how to build the file id for this
    special type of context.

    Provides the :class:`waeup.sirp.interfaces.IFileStoreNameChooser`
    interface.

    This default file name chooser accepts almost every name as long
    as it is a string or unicode object.
    """
    grok.context(Interface)
    grok.implements(IFileStoreNameChooser)

    def checkName(self, name, attr=None):
        """Check whether a given name (file id) is valid.

        Returns ``True`` if the name is valid, ``False`` otherwise.

        For the default file store name chooser any name is valid as
        long as it is a string.

        The `attr` is not taken into account here.
        """
        return isinstance(name, basestring)

    def chooseName(self, name, attr=None):
        """Choose a unique valid file id for the object.

        The given name may be taken into account when choosing the
        name (file id).

        chooseName is expected to always choose a valid name (that
        would pass the checkName test) and never raise an error.

        For this default name chooser we return the given name if it
        is valid and ``unknown_file`` otherwise. The `attr` param is
        not taken into account here.
        """
        if self.checkName(name):
            return name
        return u'unknown_file'

class ExtFileStore(object):
    """External file store.

    External file stores are meant to store files 'externally' of the
    ZODB, i.e. in the filesystem.

    The most important attribute of the external file store is the
    `root` path, the location under which files are stored.

    By default `root` is a ``'media/'`` directory inside the
    datacenter storage directory of a site.

    The `root` attribute is 'read-only' because you normally don't
    want to change this path -- it is dynamic. That means, if you call
    the file store from 'within' a site, the root path will be located
    inside this site (a :class:`waeup.sirp.University` instance). If
    you call it from 'outside' a site some temporary dir (always the
    same during lifetime of the file store instance) will be used. The
    term 'temporary' tells what you can expect from this path
    persistence-wise.

    If you insist, you can pass a root path on initialization to the
    constructor but when calling from within a site afterwards, the
    site will override your setting for security reasons. This way
    you can safely use one file store for different sites in a Zope
    instance simultaneously and files from one site won't show up in
    another.

    An ExtFileStore instance is available as a global utility
    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

    To add and retrieve files from the storage, use the appropriate
    methods below.
    """

    grok.implements(IExtFileStore)

    _root = None

    @property
    def root(self):
        """Root dir of this storage.

        The root dir is a readonly value determined dynamically. It
        holds media files for sites or other components.

        If a site is available we return a ``media/`` dir in the
        datacenter storage dir.

        Otherwise we create a temporary dir which will be remembered
        on next call.

        If a site exists and has a datacenter, it always takes
        precedence over temporary dirs, even after a temporary
        directory was created.

        Please note that retrieving `root` is expensive. You might
        want to store a copy once retrieved in order to minimize the
        number of calls to `root`.

        """
        site = grok.getSite()
        if site is not None:
            root = os.path.join(site['datacenter'].storage, 'media')
            return root
        if self._root is None:
            self._root = tempfile.mkdtemp()
        return self._root

    def __init__(self, root=None):
        self._root = root
        return

    def _pathFromFileID(self, file_id):
        """Helper method to create filesystem path from FileID.

        Used class-internally. Do not rely on this method when working
        with an :class:`ExtFileStore` instance from other components.
        """
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        path = handler.pathFromFileID(self, self.root, file_id)
        return path

    def getFile(self, file_id):
        """Get a file stored under file ID `file_id`.

        Returns a file already opened for reading.

        If the file cannot be found ``None`` is returned.

        This method takes into account registered handlers for any
        marker put into the file_id.

        .. seealso:: :class:`DefaultFileStoreHandler`
        """
        path = self._pathFromFileID(file_id)
        if not os.path.exists(path):
            return None
        fd = open(path, 'rb')
        return fd

    def getFileByContext(self, context, attr=None):
        """Get a file for given context.

        Returns a file already opened for reading.

        If the file cannot be found ``None`` is returned.

        This method takes into account registered handlers and file
        name choosers for context types to build an intermediate file
        id for the context and `attr` given.

        Both `context` and `attr` are used to find (`context`)
        and feed (`attr`) an appropriate file name chooser.

        This is a convenience method that internally calls
        :meth:`getFile`.

        .. seealso:: :class:`FileStoreNameChooser`,
                     :class:`DefaultFileStoreHandler`.
        """
        file_id = IFileStoreNameChooser(context).chooseName(attr=attr)
        return self.getFile(file_id)

    def deleteFile(self, file_id):
        """Delete file stored under `file_id` in storage.

        The file is physically removed from the filesystem. If no
        regular file exists under `file_id`, nothing happens.
        """
        path = self._pathFromFileID(file_id)
        # isfile() is also False for non-existent paths.
        if not os.path.isfile(path):
            return
        os.unlink(path)
        return

    def deleteFileByContext(self, context, attr=None):
        """Remove file identified by `context` and `attr` if it exists.

        This method takes into account registered handlers and file
        name choosers for context types to build an intermediate file
        id for the context and `attr` given.

        Both `context` and `attr` are used to find (`context`)
        and feed (`attr`) an appropriate file name chooser.

        This is a convenience method that internally calls
        :meth:`deleteFile`.

        .. seealso:: :class:`FileStoreNameChooser`,
                     :class:`DefaultFileStoreHandler`.

        """
        file_id = IFileStoreNameChooser(context).chooseName(attr=attr)
        return self.deleteFile(file_id)

    def createFile(self, filename, f):
        """Store a file.

        `filename` is the (possibly marked) filename including its
        filename extension, `f` a file-like object opened for reading.

        Returns the file object created by the handler in charge.
        """
        root = self.root # Calls to self.root are expensive
        file_id = os.path.splitext(filename)[0]
        marker, filename, base, ext = self.extractMarker(filename)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        f, path, file_obj = handler.createFile(
            self, root, filename, file_id, f)
        dirname = os.path.dirname(path)
        if not os.path.exists(dirname):
            os.makedirs(dirname, 0755)
        # Make sure the target file is closed after writing.
        with open(path, 'wb') as out:
            out.write(f.read())
        return file_obj

    def extractMarker(self, file_id):
        """Split filename into marker, filename, basename, and extension.

        A marker is a leading part of a string of form
        ``__MARKERNAME__`` followed by the real filename. This way we
        can put markers into a filename to request special processing.

        Returns a quadruple

          ``(marker, filename, basename, extension)``

        where ``marker`` is the marker in lowercase, ``filename`` is
        the complete trailing real filename, ``basename`` is the
        basename of the filename and ``extension`` the filename
        extension of the trailing filename. See examples below.

        Example:

           >>> extractMarker('__MaRkEr__sample.jpg')
           ('marker', 'sample.jpg', 'sample', '.jpg')

        If no marker is contained, we assume the whole string to be a
        real filename:

           >>> extractMarker('no-marker.txt')
           ('', 'no-marker.txt', 'no-marker', '.txt')

        Filenames without extension give an empty extension string:

           >>> extractMarker('no-marker')
           ('', 'no-marker', 'no-marker', '')

        """
        if not isinstance(file_id, basestring) or not file_id:
            return ('', '', '', '')
        parts = file_id.split('__', 2)
        marker = ''
        if len(parts) == 3 and parts[0] == '':
            marker = parts[1].lower()
            file_id = parts[2]
        basename, ext = os.path.splitext(file_id)
        return (marker, file_id, basename, ext)

grok.global_utility(ExtFileStore, provides=IExtFileStore)

class DefaultStorage(ExtFileStore):
    """Default storage for files.

    Registered globally as utility for
    :class:`hurry.file.interfaces.IFileRetrieval`.
    """
    grok.provides(IFileRetrieval)

grok.global_utility(DefaultStorage, provides=IFileRetrieval)

class DefaultFileStoreHandler(grok.GlobalUtility):
    """A default handler for external file store.

    This handler is the fallback called by external file stores when
    there is no or an unknown marker in the file id.

    Registered globally as utility for
    :class:`waeup.sirp.interfaces.IFileStoreHandler`.
    """
    grok.implements(IFileStoreHandler)

    def _searchInPath(self, path):
        """Get complete path of any existing file starting with `path`.

        If no such file can be found, return input path.

        If multiple such files exist, return the first one.

        **Example:**

        Looking for a `path`::

          '/tmp/myfile'

        will find any file like ``'/tmp/myfile.txt'``,
        ``'/tmp/myfile.jpg'`` and so on, if it exists.
        """
        result = path
        if os.path.isdir(os.path.dirname(path)):
            file_iter = glob.iglob('%s*' % (path,))
            try:
                # Take the first match, if any.
                result = next(file_iter)
            except StopIteration:
                pass
        return result

    def pathFromFileID(self, store, root, file_id):
        """Return a path for getting/storing a file with given file id.

        If there is already a file stored for the given file id, the
        path to this file is returned.

        If no such file exists yet (or the only file existing has
        no filename extension at all) a path to store the file but
        without any filename extension is returned.
        """
        path = os.path.join(root, file_id)
        return self._searchInPath(path)

    def createFile(self, store, root, filename, file_id, f):
        """Determine what exactly to store and where.

        When a file should be handled by an external file storage, it
        looks up any handlers (like this one) and passes runtime
        information like the storage object, root path, filename,
        file_id, and the raw file object itself.

        The handler can then change the file, raise exceptions or
        whatever and return the result.

        This handler returns the input file as-is, a path returned by
        :meth:`pathFromFileID` and an instance of
        :class:`hurry.file.HurryFile` for further operations.

        Please note: although a handler has enough information to
        store the file itself, it should leave that task to the
        calling file store.

        This method does, however, remove any existing file stored
        under the given file id with a different filename extension.
        """
        ext = os.path.splitext(filename)[1]
        path = self.pathFromFileID(store, root, file_id)
        base, old_ext = os.path.splitext(path)
        if old_ext != ext:
            if os.path.exists(path):
                os.unlink(path)
            path = base + ext
        return f, path, HurryFile(filename, file_id + ext)