source: main/waeup.sirp/trunk/src/waeup/sirp/imagestorage.py @ 7125

Last change on this file since 7125 was 7120, checked in by uli, 13 years ago

Make imagestorage sensible for different filename extensions per upload doc.

File size: 21.3 KB
1##
2## imagestorage.py
3## Login : <uli@pu.smp.net>
4## Started on  Mon Jul  4 16:02:14 2011 Uli Fouquet
5## $Id$
6##
7## Copyright (C) 2011 Uli Fouquet
8## This program is free software; you can redistribute it and/or modify
9## it under the terms of the GNU General Public License as published by
10## the Free Software Foundation; either version 2 of the License, or
11## (at your option) any later version.
12##
13## This program is distributed in the hope that it will be useful,
14## but WITHOUT ANY WARRANTY; without even the implied warranty of
15## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
16## GNU General Public License for more details.
17##
18## You should have received a copy of the GNU General Public License
19## along with this program; if not, write to the Free Software
20## Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
21##
22"""A storage for image (and other) files.
23
24A few words about storing files with ``waeup.sirp``. The need for this
25feature arose initially from the need to store passport files for
26applicants and students. These files are dynamic (can be changed
27anytime), mean a lot of traffic and cost a lot of memory/disk space.
28
29**Design Basics**
30
31While one *can* store images and similar 'large binary objects' aka
32blobs in the ZODB, this approach quickly becomes cumbersome and
33difficult to understand. The worst approach here would be to store
34images as regular byte-stream objects. ZODB supports this but
35obviously access is slow (data must be looked up in the one
36``Data.fs`` file, each file has to be sent to the ZEO server and back,
37etc.).
38
39Slightly better is the approach of storing images in the ZODB, but as
40Blobs. ZODB supports storing blobs in separate files in order to
41accelerate lookup/retrieval of these files. The files, however, have
42to be sent to the ZEO server (and back on lookups) which means a
43bottleneck and will easily result in an increased number of
44``ConflictErrors`` even on simple reads.
45
46The advantage of both ZODB-geared approaches is, of course, complete
47database consistency. ZODB will guarantee that your files are
48available under some object name and can be handled as any other
49Python object.
50
51Another approach is to leave the ZODB behind and to store images and
52other files directly in the filesystem. This is faster (no ZEO contacts,
53etc.), reduces the probability of ``ConflictErrors``, keeps the ZODB
54smaller, and enables direct access (over the filesystem) to the
55files. Furthermore, the steps involved might be easier to understand for
56third-party developers. We opted for this last option.
57
58**External File Store**
59
60Our API for storing files is defined in
61:class:`ExtFileStore`. An instance of this file storage (which is also
62able to store non-image files) is available at runtime as a global
63utility implementing :class:`waeup.sirp.interfaces.IExtFileStore`.
64
65The main task of this central component is to maintain a filesystem
66root path for all files to be stored. It also provides methods to
67store/get files under certain file ids which identify certain files
68locally.
69
70So, to store a file away, you can do something like this:
71
72  >>> from StringIO import StringIO
73  >>> from zope.component import getUtility
74  >>> from waeup.sirp.interfaces import IExtFileStore
75  >>> store = getUtility(IExtFileStore)
76  >>> store.createFile('myfile.txt', StringIO('some file content'))
77
78All you need is a filename and the file-like object containing the
79real file data.
80
81This will store the file somewhere (you shouldn't make too many
82assumptions about the real filesystem path here).
83
84Later, we can get the file back like this:
85
86  >>> store.getFile('myfile')
87  <open file ...>
88
89Please note that we ask for ``myfile`` instead of ``myfile.txt``, as
90the file id should be the same regardless of the filename
91extension. The file id for ``sample.jpg`` thus could simply be
92``sample``.
93
94What we get back is a file or file-like object already opened for
95reading:
96
97  >>> store.getFile('myfile').read()
98  'some file content'
99
100**Handlers: Special Places for Special Files**
101
102The file store supports special handling for certain files. For
103example we want applicant images to be stored in a different directory
104than student images, etc. Because the file store cannot know all
105details about this special treatment of certain files, it looks up
106helpers (handlers) to provide the information it needs for really
107storing the files at the correct location.
108
109That a file stored in filestore needs special handling can be
110indicated by special filenames. These filenames start with a marker like
111this::
112
113  __<MARKER-STRING>__real-filename
114
115Please note the double underscores before and after the marker
116string. They indicate that everything in between is a marker.
117
118If you store a file in file store with such a filename (we call this a
119`file_id` to distinguish it from real-world filenames), the file store
120will look up a handler for ``<MARKER-STRING>`` and pass it the file to
121store. The handler then will return the internal path to store the
122file and possibly do additional things as well like validating the
123file or similar.
124
125Examples of such a file store handler can be found in the
126:mod:`waeup.sirp.applicants.applicant` module. Please see also the
127:class:`DefaultFileStoreHandler` class below for more details.
128
129The file store looks up handlers by utility lookups: it looks for a
130named utility providing
131:class:`waeup.sirp.interfaces.IFileStoreHandler` and named like the
132marker string (without leading/trailing underscores) in lower
133case. For example, if the file id is
134
135  ``__IMG_USER__manfred``
136
137then the utility looked up should be registered under the name
138
139  ``img_user``
140
141and provide :class:`waeup.sirp.interfaces.IFileStoreHandler`. If no
142such utility can be found, a default handler is used instead
143(see :class:`DefaultFileStoreHandler`).
144
145**About File IDs and Filenames**
146
147In the waeup.sirp package we want to store documents like CVs,
148photographs, and similar. Each of these documents might come into the
149system with different filename extensions. This could be a problem as
150the browser components might have to set different response headers
151for different filetypes and we nevertheless want to make sure that
152only one file is stored per document. For instance we don't want
153``passport.jpg`` *and* ``passport.png`` but only one of them.
154
155The default components like :class:`DefaultFileStoreHandler` take care
156of this by searching the filesystem for already existing files with
157the same file id and removing them where necessary.
158
159Therefore file ids should never include filename extensions (except if
160you only support exactly one filename extension for a certain
161document). The only part where you should add an extension (and it is
162important to do so) is when creating new files: when a file was
163uploaded you can pass in the filename (including the filename
164extension) and the file stored in external file store will (most
165probably) have a different name but the same extension as the original
166file.
167
168When looking for the file, however, you only have to give the file id
169and the handlers should find the right file for you, regardless of the
170filename extension it has.
171
172**Context Adapters: Knowing Your Family**
173
174Often the internal filename or file id of a file depends on a
175context. For example when we store passport photographs of applicants,
176then each image belongs to a certain applicant instance. It is not
177difficult to maintain such a connection manually: say every applicant
178has an id; then we could put this id into the filename and build the
179filename to store/get the connected file from that id. You would
180then create filenames of a format like this::
181
182  __<MARKER-STRING>__applicant0001
183
184where ``applicant0001`` would tell exactly which applicant you can see
185on the photograph. Note that the internal file id might have nothing
186to do with the name of the uploaded file. The file above could have
187been uploaded as ``manfred.jpg``, but with the new file id
188we are able to find the file again later.
189
190Unfortunately it might soon get boring or cumbersome to retype this
191building of filenames for a certain type of context, especially if
192your filenames take more of the context into account than only a
193simple id.
194
195Therefore you can define filename building for a context as an adapter
196that then could be looked up by other components simply by doing
197something like:
198
199  >>> from waeup.sirp.interfaces import IFileStoreNameChooser
200  >>> file_id = IFileStoreNameChooser(my_context_obj)
201
202If you later want to change the way file ids are created from a
203certain context, you only have to change the adapter implementation
204accordingly.
205
206Note that this is only a convenience component. You don't have to
207define context adapters but it makes things easier for others if you
208do, as you don't have to remember the exact file id creation method
209all the time and can change things quickly and in only one location if
210you need to do so.
211
212Please see the :class:`FileStoreNameChooser` default implementation
213below for details.
214
215"""
216import glob
217import grok
218import os
219import tempfile
220from hurry.file import HurryFile
221from hurry.file.interfaces import IFileRetrieval
222from zope.component import queryUtility
223from zope.interface import Interface
224from waeup.sirp.interfaces import (
225    IFileStoreNameChooser, IExtFileStore, IFileStoreHandler,)
226
227class FileStoreNameChooser(grok.Adapter):
228    """Default file store name chooser.
229
230    File store name choosers pick a file id, a string, for a certain
231    context object. They are normally registered as adapters for a
232    certain content type and know how to build the file id for this
233    special type of context.
234
235    Provides the :class:`waeup.sirp.interfaces.IFileStoreNameChooser`
236    interface.
237
238    This default file name chooser accepts almost every name as long
239    as it is a string or unicode object.
240    """
241    grok.context(Interface)
242    grok.implements(IFileStoreNameChooser)
243
244    def checkName(self, name, attr=None):
245        """Check whether a given name (file id) is valid.
246
247        Returns ``True`` if the name is valid, ``False`` otherwise.
248
249        For the default file store name chooser any name is valid as
250        long as it is a string.
251
252        The `attr` is not taken into account here.
253        """
254        if isinstance(name, basestring):
255            return True
256        return False
257
258    def chooseName(self, name, attr=None):
259        """Choose a unique valid file id for the object.
260
261        The given name may be taken into account when choosing the
262        name (file id).
263
264        chooseName is expected to always choose a valid name (that
265        would pass the checkName test) and never raise an error.
266
267        For this default name chooser we return the given name if it
268        is valid, or ``unknown_file`` otherwise. The `attr` param is not
269        taken into account here.
270        """
271        if self.checkName(name):
272            return name
273        return u'unknown_file'
274
275class ExtFileStore(object):
276    """External file store.
277
278    External file stores are meant to store files outside of the
279    ZODB, i.e. in the filesystem.
280
281    The most important attribute of the external file store is the
282    `root` path, which gives the location where files will be
283    stored.
284
285    By default `root` is a ``'media/'`` directory in the storage
286    directory of the datacenter of a site.
287
288    The `root` attribute is 'read-only' because you normally don't
289    want to change this path -- it is dynamic. That means, if you call
290    the file store from 'within' a site, the root path will be located
291    inside this site (a :class:`waeup.sirp.University` instance). If
292    you call it from 'outside' a site some temporary dir (always the
293    same during lifetime of the file store instance) will be used. The
294    term 'temporary' tells what you can expect from this path
295    persistence-wise.
296
297    If you insist, you can pass a root path on initialization to the
298    constructor but when calling from within a site afterwards, the
299    site will override your setting for security reasons. This way
300    you can safely use one file store for different sites in a Zope
301    instance simultaneously and files from one site won't show up in
302    another.
303
304    An ExtFileStore instance is available as a global utility
305    implementing :class:`waeup.sirp.interfaces.IExtFileStore`.
306
307    To add and retrieve files from the storage, use the appropriate
308    methods below.
309    """
310
311    grok.implements(IExtFileStore)
312
313    _root = None
314
315    @property
316    def root(self):
317        """Root dir of this storage.
318
319        The root dir is a readonly value determined dynamically. It
320        holds media files for sites or other components.
321
322        If a site is available we return a ``media/`` dir in the
323        datacenter storage dir.
324
325        Otherwise we create a temporary dir which will be remembered
326        on next call.
327
328        If a site exists and has a datacenter, it always takes
329        precedence over temporary dirs, even after a temporary
330        directory was created.
331
332        Please note that retrieving `root` is expensive. You might
333        want to store a copy once retrieved in order to minimize the
334        number of calls to `root`.
335
336        """
337        site = grok.getSite()
338        if site is not None:
339            root = os.path.join(site['datacenter'].storage, 'media')
340            return root
341        if self._root is None:
342            self._root = tempfile.mkdtemp()
343        return self._root
344
345    def __init__(self, root=None):
346        self._root = root
347        return
348
349    def _pathFromFileID(self, file_id):
350        """Helper method to create filesystem path from FileID.
351
352        Used class-internally. Do not rely on this method when working
353        with an :class:`ExtFileStore` instance from other components.
354        """
355        marker, filename, base, ext = self.extractMarker(file_id)
356        handler = queryUtility(IFileStoreHandler, name=marker,
357                               default=DefaultFileStoreHandler())
358        path = handler.pathFromFileID(self, self.root, file_id)
359        return path
360
361    def getFile(self, file_id):
362        """Get a file stored under file ID `file_id`.
363
364        Returns a file already opened for reading.
365
366        If the file cannot be found ``None`` is returned.
367
368        This method takes into account registered handlers for any
369        marker put into the file_id.
370
371        .. seealso:: :class:`DefaultFileStoreHandler`
372        """
373        path = self._pathFromFileID(file_id)
374        if not os.path.exists(path):
375            return None
376        fd = open(path, 'rb')
377        return fd
378
379    def getFileByContext(self, context, attr=None):
380        """Get a file for given context.
381
382        Returns a file already opened for reading.
383
384        If the file cannot be found ``None`` is returned.
385
386        This method takes into account registered handlers and file
387        name choosers for context types to build an intermediate file
388        id for the context and `attr` given.
389
390        Both `context` and `attr` are used to find (`context`)
391        and feed (`attr`) an appropriate file name chooser.
392
393        This is a convenience method that internally calls
394        :meth:`getFile`.
395
396        .. seealso:: :class:`FileStoreNameChooser`,
397                     :class:`DefaultFileStoreHandler`.
398        """
399        file_id = IFileStoreNameChooser(context).chooseName(attr=attr)
400        return self.getFile(file_id)
401
402    def deleteFile(self, file_id):
403        """Delete file stored under `file_id` in storage.
404
405        The file is physically removed from filesystem.
406        """
407        path = self._pathFromFileID(file_id)
408        if not os.path.exists(path) or not os.path.isfile(path):
409            return
410        os.unlink(path)
411        return
412
413    def deleteFileByContext(self, context, attr=None):
414        """Remove file identified by `context` and `attr` if it exists.
415
416        This method takes into account registered handlers and file
417        name choosers for context types to build an intermediate file
418        id for the context and `attr` given.
419
420        Both `context` and `attr` are used to find (`context`)
421        and feed (`attr`) an appropriate file name chooser.
422
423        This is a convenience method that internally calls
424        :meth:`deleteFile`.
425
426        .. seealso:: :class:`FileStoreNameChooser`,
427                     :class:`DefaultFileStoreHandler`.
428
429        """
430        file_id = IFileStoreNameChooser(context).chooseName(attr=attr)
431        return self.deleteFile(file_id)
432
433    def createFile(self, filename, f):
434        """Store the contents of file-like object `f` for `filename`.
435        """
436        root = self.root # Calls to self.root are expensive
437        file_id = os.path.splitext(filename)[0]
438        marker, filename, base, ext = self.extractMarker(filename)
439        handler = queryUtility(IFileStoreHandler, name=marker,
440                               default=DefaultFileStoreHandler())
441        f, path, file_obj = handler.createFile(
442            self, root, filename, file_id, f)
443        dirname = os.path.dirname(path)
444        if not os.path.exists(dirname):
445            os.makedirs(dirname, 0755)
446        open(path, 'wb').write(f.read())
447        return file_obj
448
449    def extractMarker(self, file_id):
450        """Split filename into marker, filename, basename, and extension.
451
452        A marker is a leading part of a string of form
453        ``__MARKERNAME__`` followed by the real filename. This way we
454        can put markers into a filename to request special processing.
455
456        Returns a quadruple
457
458          ``(marker, filename, basename, extension)``
459
460        where ``marker`` is the marker in lowercase, filename is the
461        complete trailing real filename, ``basename`` is the basename
462        of the filename and ``extension`` the filename extension of
463        the trailing filename. See examples below.
464
465        Example:
466
467           >>> extractMarker('__MaRkEr__sample.jpg')
468           ('marker', 'sample.jpg', 'sample', '.jpg')
469
470        If no marker is contained, we assume the whole string to be a
471        real filename:
472
473           >>> extractMarker('no-marker.txt')
474           ('', 'no-marker.txt', 'no-marker', '.txt')
475
476        Filenames without extension give an empty extension string:
477
478           >>> extractMarker('no-marker')
479           ('', 'no-marker', 'no-marker', '')
480
481        """
482        if not isinstance(file_id, basestring) or not file_id:
483            return ('', '', '', '')
484        parts = file_id.split('__', 2)
485        marker = ''
486        if len(parts) == 3 and parts[0] == '':
487            marker = parts[1].lower()
488            file_id = parts[2]
489        basename, ext = os.path.splitext(file_id)
490        return (marker, file_id, basename, ext)
491
492grok.global_utility(ExtFileStore, provides=IExtFileStore)
493
494class DefaultStorage(ExtFileStore):
495    """Default storage for files.
496
497    Registered globally as utility for
498    :class:`hurry.file.interfaces.IFileRetrieval`.
499    """
500    grok.provides(IFileRetrieval)
501
502grok.global_utility(DefaultStorage, provides=IFileRetrieval)
503
504class DefaultFileStoreHandler(grok.GlobalUtility):
505    """A default handler for external file store.
506
507    This handler is the fallback called by external file stores when
508    there is no or an unknown marker in the file id.
509
510    Registered globally as utility for
511    :class:`waeup.sirp.interfaces.IFileStoreHandler`.
512    """
513    grok.implements(IFileStoreHandler)
514
515    def _searchInPath(self, path):
516        """Get complete path of any existing file starting with `path`.
517
518        If no such file can be found, return input path.
519
520        If multiple such files exist, return the first one.
521
522        **Example:**
523
524        Looking for a `path`::
525
526          '/tmp/myfile'
527
528        will find any file like ``'/tmp/myfile.txt'``,
529        ``'/tmp/myfile.jpg'`` and so on, if it exists.
530        """
531        result = path
532        if os.path.isdir(os.path.dirname(path)):
533            file_iter = glob.iglob('%s*' % (path,))
534            try:
535                result = file_iter.next()
536            except StopIteration:
537                pass
538        return result
539
540    def pathFromFileID(self, store, root, file_id):
541        """Return a path for getting/storing a file with given file id.
542
543        If there is already a file stored for the given file id, the
544        path to this file is returned.
545
546        If no such file exists yet (or the only file existing has
547        no filename extension at all) a path to store the file but
548        without any filename extension is returned.
549        """
550        path = os.path.join(root, file_id)
551        return self._searchInPath(path)
552
553    def createFile(self, store, root, filename, file_id, f):
554        """Information about what exactly to store and where.
555
556        When a file should be handled by an external file storage, it
557        looks up any handlers (like this one) and passes them runtime
558        information: the storage object, root path, filename, file_id,
559        and the raw file object itself.
560
561        The handler can then change the file, raise exceptions or
562        whatever and return the result.
563
564        This handler returns the input file as-is, a path returned by
565        :meth:`pathFromFileID` and an instance of
566        :class:`hurry.file.HurryFile` for further operations.
567
568        Please note: although a handler has enough information to store the
569        file itself, it should leave that task to the calling file
570        store.
571
572        This method does, however, remove any existing files stored
573        under the given file id.
574        """
575        ext = os.path.splitext(filename)[1]
576        path = self.pathFromFileID(store, root, file_id)
577        base, old_ext = os.path.splitext(path)
578        if old_ext != ext:
579            if os.path.exists(path):
580                os.unlink(path)
581            path = base + ext
582        return f, path, HurryFile(filename, file_id + ext)
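
The two core ideas of the module above (marker extraction and keeping only one stored file per file id) can be tried without Zope or grok. The following is a minimal standalone sketch under stated assumptions: it is written for Python 3 (the module itself is Python 2), `extract_marker` mirrors the logic of `ExtFileStore.extractMarker`, and `store_replacing` is a hypothetical helper that is not part of `waeup.sirp`. Unlike `DefaultFileStoreHandler._searchInPath`, which globs by path prefix, the sketch matches on the exact base name.

```python
import os
import tempfile


def extract_marker(file_id):
    """Split a file id into (marker, filename, basename, extension).

    Mirrors the logic of ExtFileStore.extractMarker, but for Python 3
    strings (the original checks against ``basestring``).
    """
    if not isinstance(file_id, str) or not file_id:
        return ('', '', '', '')
    # A marker looks like ``__MARKER__`` at the very start of the id.
    parts = file_id.split('__', 2)
    marker = ''
    if len(parts) == 3 and parts[0] == '':
        marker = parts[1].lower()
        file_id = parts[2]
    basename, ext = os.path.splitext(file_id)
    return (marker, file_id, basename, ext)


def store_replacing(root, filename, data):
    """Hypothetical helper: store `data` and keep one file per file id.

    The file id is `filename` without its extension. Any file already
    stored in `root` under the same id (whatever its extension) is
    removed before the new file is written.
    """
    file_id, ext = os.path.splitext(filename)
    for name in os.listdir(root):
        if os.path.splitext(name)[0] == file_id:
            os.unlink(os.path.join(root, name))
    path = os.path.join(root, file_id + ext)
    with open(path, 'wb') as fd:
        fd.write(data)
    return path


# The marker (lowercased) is the name under which a handler utility
# would be looked up.
print(extract_marker('__IMG_USER__manfred.jpg'))
# ('img_user', 'manfred.jpg', 'manfred', '.jpg')

# Uploading the same document with a new extension replaces the old file.
demo_root = tempfile.mkdtemp()
store_replacing(demo_root, 'passport.jpg', b'jpeg bytes')
store_replacing(demo_root, 'passport.png', b'png bytes')
print(os.listdir(demo_root))
# ['passport.png']
```

Matching on the exact base name avoids removing unrelated files that merely share a prefix (for instance `passport2.jpg`); whether that stricter behaviour is wanted in the real handler is a design decision this sketch does not settle.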