source: main/waeup.sirp/trunk/src/waeup/sirp/imagestorage.py @ 7093

Last change on this file since 7093 was 7093, checked in by uli, 13 years ago

Implement deleteFile() and deleteFileByContext() for ExtFileStore. We
can now also remove files from ExtFileStores.

##
## imagestorage.py
## Login : <uli@pu.smp.net>
## Started on  Mon Jul  4 16:02:14 2011 Uli Fouquet
## $Id$
##
## Copyright (C) 2011 Uli Fouquet
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##
"""A storage for image files.

A few words about storing files with ``waeup.sirp``. The need for this
feature arose initially from the need to store passport files for
applicants and students. These files are dynamic (can be changed
anytime), mean a lot of traffic and cost a lot of memory/disk space.

**Design Basics**

While one *can* store images and similar 'large binary objects' aka
blobs in the ZODB, this approach quickly becomes cumbersome and
difficult to understand. The worst approach here would be to store
images as regular byte-stream objects. ZODB supports this but
obviously access is slow (data must be looked up in the one
``Data.fs`` file, each file has to be sent to the ZEO server and back,
etc.).

Slightly better is the approach of storing images in the ZODB as
Blobs. ZODB supports storing blobs in separate files in order to
accelerate lookup/retrieval of these files. The files, however, have
to be sent to the ZEO server (and back on lookups) which means a
bottleneck and will easily result in an increased number of
``ConflictErrors`` even on simple reads.

The advantage of both ZODB-geared approaches is, of course, complete
database consistency. ZODB will guarantee that your files are
available under some object name and can be handled as any other
Python object.

Another approach is to leave the ZODB behind and to store images and
other files directly in the filesystem. This is faster (no ZEO
contacts, etc.), reduces the probability of ``ConflictErrors``, keeps
the ZODB smaller, and enables direct access (over the filesystem) to
the files. Furthermore, the steps involved are likely easier for
third-party developers to understand. We opted for this last option.

**External File Store**

Our file storage API is implemented in
:class:`ExtFileStore`. An instance of this file storage (which is also
able to store non-image files) is available at runtime as a global
utility implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

The main task of this central component is to maintain a filesystem
root path for all files to be stored. It also provides methods to
store/get files under file ids which identify the files locally.

So, to store a file away, you can do something like this:

  >>> from StringIO import StringIO
  >>> from zope.component import getUtility
  >>> from waeup.sirp.interfaces import IExtFileStore
  >>> store = getUtility(IExtFileStore)
  >>> store.createFile('myfile.txt', StringIO('some file content'))

All you need is a filename and the file-like object containing the
real file data.

This will store the file somewhere (you shouldn't make too many
assumptions about the real filesystem path here).

Later, we can get the file back like this:

  >>> store.getFile('myfile.txt')
  <open file ...>

What we get back is a file or file-like object already opened for
reading:

  >>> store.getFile('myfile.txt').read()
  'some file content'
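
The store/get round trip can also be tried without a running site. The
following is only a minimal sketch, written for Python 3 (this module
itself targets Python 2); ``TinyFileStore`` is a hypothetical stand-in
that ignores markers and handlers and is not the real store:

```python
import io
import os
import tempfile


class TinyFileStore(object):
    # Hypothetical stand-in: keeps all files directly below `root`.
    def __init__(self, root=None):
        # Fall back to a temporary directory, like the real store does
        # when no site is available.
        self.root = root or tempfile.mkdtemp()

    def createFile(self, file_id, f):
        # Copy the content of file-like object `f` into the store.
        path = os.path.join(self.root, file_id)
        with open(path, 'wb') as out:
            out.write(f.read())

    def getFile(self, file_id):
        # Return a file opened for reading or None if nothing is stored.
        path = os.path.join(self.root, file_id)
        if not os.path.exists(path):
            return None
        return open(path, 'rb')


store = TinyFileStore()
store.createFile('myfile.txt', io.BytesIO(b'some file content'))
with store.getFile('myfile.txt') as fd:
    content = fd.read()
```

Here ``content`` holds the bytes that were stored, and asking for an
unknown file id gives ``None``.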

**Handlers: Special Places for Special Files**

The file store supports special handling for certain files. For
example we want applicant images to be stored in a different directory
than student images, etc. Because the file store cannot know all
details about this special treatment of certain files, it looks up
helpers (handlers) to provide the information it needs for really
storing the files at the correct location.

Special filenames indicate that a file stored in the file store needs
special handling. These filenames start with a marker like this::

  __<MARKER-STRING>__real-filename.jpg

Please note the double underscores before and after the marker
string. They indicate that everything in between is a marker.
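
As a rough illustration of this naming convention, a marker can be
separated from the real filename as sketched below. This is standalone
Python 3 code, not the store's own implementation, and
``split_marker`` is a hypothetical helper name:

```python
def split_marker(file_id):
    # Hypothetical helper: returns a (marker, real_filename) tuple.
    # Splitting on '__' at most twice yields ['', MARKER, rest] if the
    # file id starts with a marker.
    parts = file_id.split('__', 2)
    if len(parts) == 3 and parts[0] == '':
        return parts[1].lower(), parts[2]
    # No leading marker: the whole string is the real filename.
    return '', file_id


with_marker = split_marker('__IMG_USER__manfred.jpg')
without_marker = split_marker('manfred.jpg')
```

With these inputs ``with_marker`` is ``('img_user', 'manfred.jpg')``
and ``without_marker`` is ``('', 'manfred.jpg')``.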

If you store a file in the file store with such a filename (we call
this a `file_id` to distinguish it from real world filenames), the
file store will look up a handler for ``<MARKER-STRING>`` and pass it
the file to store. The handler will then return the internal path to
store the file and possibly do additional things as well, like
validating the file.

Examples of such a file store handler can be found in the
:mod:`waeup.sirp.applicants.applicant` module. Please see also the
:class:`DefaultFileStoreHandler` class below for more details.

The file store looks up handlers by utility lookups: it looks for a
named utility providing
:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
marker string (without leading/trailing underscores) in lower
case. For example if the file id is

  ``__IMG_USER__manfred.jpg``

then the looked up utility should be registered under name

  ``img_user``

and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
such utility can be found, a default handler is used instead
(see :class:`DefaultFileStoreHandler`).
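
The lookup-with-fallback described above can be sketched as follows.
This is a simplified Python 3 illustration only: a plain dict stands
in for the named-utility registry, and ``HANDLERS``,
``lookup_handler``, ``DefaultHandler`` and ``UserImageHandler`` are
hypothetical names, not part of this package:

```python
import os


class DefaultHandler(object):
    # Fallback: store the file directly below the root directory.
    def path_from_file_id(self, root, file_id):
        return os.path.join(root, file_id)


class UserImageHandler(DefaultHandler):
    # Hypothetical handler: keeps user images in a 'users' subdirectory.
    def path_from_file_id(self, root, file_id):
        # Only called for ids of the form '__MARKER__filename'.
        filename = file_id.split('__', 2)[2]
        return os.path.join(root, 'users', filename)


# Assumption: a dict replaces the component registry for this sketch.
HANDLERS = {'img_user': UserImageHandler()}


def lookup_handler(file_id):
    # Extract the lowercased marker, if any, and look up its handler.
    parts = file_id.split('__', 2)
    marker = ''
    if len(parts) == 3 and parts[0] == '':
        marker = parts[1].lower()
    return HANDLERS.get(marker, DefaultHandler())


file_id = '__IMG_USER__manfred.jpg'
path = lookup_handler(file_id).path_from_file_id('media', file_id)
```

File ids without a marker, or with a marker nobody registered a
handler for, end up with the default handler.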

**Context Adapters: Knowing Your Family**

Often the internal filename or file id of a file depends on a
context. For example when we store passport photographs of applicants,
then each image belongs to a certain applicant instance. It is not
difficult to maintain such a connection manually: say every applicant
had an id, then we could put this id into the filename as well and
would build the filename to store/get the connected file by using that
id. You would then create filenames of a format like this::

  __<MARKER-STRING>__applicant0001.jpg

where ``applicant0001`` would tell exactly which applicant you can see
on the photograph. Note that the internal file id might have nothing
to do with the filename used at upload time. The file above could have
been uploaded with filename ``manfred.jpg`` but with the new file id
we are able to find the file again later.

Unfortunately it might soon get boring or cumbersome to retype this
building of filenames for a certain type of context, especially if
your filenames take more of the context into account than only a
simple id.

Therefore you can define filename building for a context as an adapter
that then could be looked up by other components simply by doing
something like:

  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
  >>> file_id = IFileStoreNameChooser(my_context_obj).chooseName()

If you later want to change the way file ids are created from a
certain context, you only have to change the adapter implementation
accordingly.
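
A name chooser for a context type could look roughly like this. The
sketch is plain Python 3 without any adapter registration;
``Applicant``, ``ApplicantNameChooser``, the ``IMG_APPLICANT`` marker
and the ``passport`` default are assumptions made up for illustration:

```python
class Applicant(object):
    # Hypothetical context object that carries an id.
    def __init__(self, applicant_id):
        self.applicant_id = applicant_id


class ApplicantNameChooser(object):
    # Hypothetical adapter: builds file ids for Applicant objects.
    def __init__(self, context):
        self.context = context

    def chooseName(self, name=None, attr=None):
        # The id of the context and the requested attribute determine
        # the file id; the uploaded filename (`name`) is ignored here.
        return '__IMG_APPLICANT__%s_%s.jpg' % (
            self.context.applicant_id, attr or 'passport')


chooser = ApplicantNameChooser(Applicant('applicant0001'))
file_id = chooser.chooseName(attr='passport')
```

All knowledge about how file ids for applicants are built now lives
in one place, the ``chooseName`` method.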

Note that this is only a convenience component. You don't have to
define context adapters but it makes things easier for others if you
do, as you don't have to remember the exact file id creation method
all the time and can change things quickly and in only one location if
you need to do so.

Please see the :class:`FileStoreNameChooser` default implementation
below for details.

"""
import grok
import os
import tempfile
from hurry.file import HurryFile
from hurry.file.interfaces import IFileRetrieval
from zope.component import queryUtility
from zope.interface import Interface
from waeup.sirp.interfaces import (
    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)

class FileStoreNameChooser(grok.Adapter):
    """Default file store name chooser.

    File store name choosers pick a file id, a string, for a certain
    context object. They are normally registered as adapters for a
    certain content type and know how to build the file id for this
    special type of context.

    Provides the :class:`waeup.sirp.interfaces.IFileStoreNameChooser`
    interface.

    This default file name chooser accepts almost every name as long
    as it is a string or unicode object.
    """
    grok.context(Interface)
    grok.implements(IFileStoreNameChooser)

    def checkName(self, name, attr=None):
        """Check whether a given name (file id) is valid.

        Returns ``True`` if the name is valid, ``False`` otherwise.

        For the default file store name chooser any name is valid as
        long as it is a string.

        The `attr` is not taken into account here.
        """
        if isinstance(name, basestring):
            return True
        return False

    def chooseName(self, name, attr=None):
        """Choose a unique valid file id for the object.

        The given name may be taken into account when choosing the
        name (file id).

        chooseName is expected to always choose a valid name (that
        would pass the checkName test) and never raise an error.

        For this default name chooser we return the given name if it
        is valid and ``unknown_file`` otherwise. The `attr` param is
        not taken into account here.
        """
        if self.checkName(name):
            return name
        return u'unknown_file'

class ExtFileStore(object):
    """External file store.

    External file stores are meant to store files outside the ZODB,
    i.e. in the filesystem.

    The most important attribute of the external file store is the
    `root` path which gives the location where files will be stored.

    By default `root` is a ``'media/'`` directory in the datacenter
    storage directory of a site.

    The `root` attribute is 'read-only' because you normally don't
    want to change this path -- it is dynamic. That means, if you call
    the file store from 'within' a site, the root path will be located
    inside this site (a :class:`waeup.sirp.University` instance). If
    you call it from 'outside' a site some temporary dir (always the
    same during lifetime of the file store instance) will be used. The
    term 'temporary' tells what you can expect from this path
    persistence-wise.

    If you insist, you can pass a root path on initialization to the
    constructor but when calling from within a site afterwards, the
    site will override your setting for security reasons. This way
    you can safely use one file store for different sites in a Zope
    instance simultaneously and files from one site won't show up in
    another.

    An ExtFileStore instance is available as a global utility
    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.

    To add and retrieve files from the storage, use the appropriate
    methods below.
    """

    grok.implements(IExtFileStore)

    _root = None

    @property
    def root(self):
        """Root dir of this storage.

        The root dir is a readonly value determined dynamically. It
        holds media files for sites or other components.

        If a site is available we return a ``media/`` dir in the
        datacenter storage dir.

        Otherwise we create a temporary dir which will be remembered
        and returned again on subsequent calls.

        If a site exists and has a datacenter, it always takes
        precedence over temporary dirs, even after a temporary
        directory was created.

        Please note that retrieving `root` is expensive. You might
        want to store a copy once retrieved in order to minimize the
        number of calls to `root`.

        """
        site = grok.getSite()
        if site is not None:
            root = os.path.join(site['datacenter'].storage, 'media')
            return root
        if self._root is None:
            self._root = tempfile.mkdtemp()
        return self._root

    def __init__(self, root=None):
        self._root = root
        return

    def _pathFromFileID(self, file_id):
        """Helper method to create filesystem path from FileID.

        Used class-internally. Do not rely on this method when working
        with an :class:`ExtFileStore` instance from other components.
        """
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        path = handler.pathFromFileID(self, self.root, file_id)
        return path

    def getFile(self, file_id):
        """Get a file stored under file ID `file_id`.

        Returns a file already opened for reading.

        If the file cannot be found ``None`` is returned.

        This method takes into account registered handlers for any
        marker put into the file_id.

        .. seealso:: :class:`DefaultFileStoreHandler`
        """
        path = self._pathFromFileID(file_id)
        if not os.path.exists(path):
            return None
        fd = open(path, 'rb')
        return fd

    def getFileByContext(self, context, attr=None):
        """Get a file for given context.

        Returns a file already opened for reading.

        If the file cannot be found ``None`` is returned.

        This method takes into account registered handlers and file
        name choosers for context types to build an intermediate file
        id for the context and `attr` given.

        Both `context` and `attr` are used to find (`context`)
        and feed (`attr`) an appropriate file name chooser.

        This is a convenience method that internally calls
        :meth:`getFile`.

        .. seealso:: :class:`FileStoreNameChooser`,
                     :class:`DefaultFileStoreHandler`.
        """
        file_id = IFileStoreNameChooser(context).chooseName(attr=attr)
        return self.getFile(file_id)

    def deleteFile(self, file_id):
        """Delete file stored under `file_id` in storage.

        The file is physically removed from the filesystem. If no
        regular file is stored under `file_id`, nothing happens.
        """
        path = self._pathFromFileID(file_id)
        if not os.path.exists(path) or not os.path.isfile(path):
            return
        os.unlink(path)
        return

    def deleteFileByContext(self, context, attr=None):
        """Remove file identified by `context` and `attr` if it exists.

        This method takes into account registered handlers and file
        name choosers for context types to build an intermediate file
        id for the context and `attr` given.

        Both `context` and `attr` are used to find (`context`)
        and feed (`attr`) an appropriate file name chooser.

        This is a convenience method that internally calls
        :meth:`deleteFile`.

        .. seealso:: :class:`FileStoreNameChooser`,
                     :class:`DefaultFileStoreHandler`.

        """
        file_id = IFileStoreNameChooser(context).chooseName(attr=attr)
        return self.deleteFile(file_id)

    def createFile(self, filename, f):
        """Store a file.

        The content of file-like object `f` is stored under the file
        id given as `filename`. Returns the file object provided by
        the responsible handler.
        """
        file_id = filename
        root = self.root # Calls to self.root are expensive
        marker, filename, base, ext = self.extractMarker(file_id)
        handler = queryUtility(IFileStoreHandler, name=marker,
                               default=DefaultFileStoreHandler())
        f, path, file_obj = handler.createFile(
            self, root, file_id, filename, f)
        dirname = os.path.dirname(path)
        if not os.path.exists(dirname):
            os.makedirs(dirname, 0755)
        # Make sure the target file is closed after writing.
        with open(path, 'wb') as fd:
            fd.write(f.read())
        return file_obj

    def extractMarker(self, file_id):
        """Split filename into marker, filename, basename, and extension.

        A marker is a leading part of a string of form
        ``__MARKERNAME__`` followed by the real filename. This way we
        can put markers into a filename to request special processing.

        Returns a quadruple

          ``(marker, filename, basename, extension)``

        where ``marker`` is the marker in lowercase, ``filename`` is
        the complete trailing real filename, ``basename`` is the
        basename of the filename and ``extension`` the filename
        extension of the trailing filename. See examples below.

        Example:

           >>> extractMarker('__MaRkEr__sample.jpg')
           ('marker', 'sample.jpg', 'sample', '.jpg')

        If no marker is contained, we assume the whole string to be a
        real filename:

           >>> extractMarker('no-marker.txt')
           ('', 'no-marker.txt', 'no-marker', '.txt')

        Filenames without extension give an empty extension string:

           >>> extractMarker('no-marker')
           ('', 'no-marker', 'no-marker', '')

        """
        if not isinstance(file_id, basestring) or not file_id:
            return ('', '', '', '')
        parts = file_id.split('__', 2)
        marker = ''
        if len(parts) == 3 and parts[0] == '':
            marker = parts[1].lower()
            file_id = parts[2]
        basename, ext = os.path.splitext(file_id)
        return (marker, file_id, basename, ext)

grok.global_utility(ExtFileStore, provides=IExtFileStore)

class DefaultStorage(ExtFileStore):
    """Default storage for files.

    Registered globally as utility for
    :class:`hurry.file.interfaces.IFileRetrieval`.
    """
    grok.provides(IFileRetrieval)

grok.global_utility(DefaultStorage, provides=IFileRetrieval)

class DefaultFileStoreHandler(grok.GlobalUtility):
    """A default handler for external file store.

    This handler is the fallback called by external file stores when
    there is no or an unknown marker in the file id.

    Registered globally as utility for
    :class:`waeup.sirp.interfaces.IFileStoreHandler`.
    """
    grok.implements(IFileStoreHandler)

    def pathFromFileID(self, store, root, file_id):
        """Return the root path of external file store joined with file id.
        """
        return os.path.join(root, file_id)

    def createFile(self, store, root, filename, file_id, f):
        """Information about what to store exactly and where.

        When a file should be handled by an external file storage, the
        storage looks up a handler (like this one) and passes it
        runtime information like the storage object, root path,
        filename, file_id, and the raw file object itself.

        The handler can then change the file, raise exceptions or
        whatever and return the result.

        This handler returns the input file as-is, a path returned by
        :meth:`pathFromFileID` and an instance of
        :class:`hurry.file.HurryFile` for further operations.

        Please note: although a handler has enough information to
        store the file itself, it should leave that task to the
        calling file store.
        """
        path = self.pathFromFileID(store, root, file_id)
        return f, path, HurryFile(filename, file_id)