source: waeup/branches/ulif-rewrite/src/waeup/csvfile/README.txt @ 4288

Last change on this file since 4288 was 4288, checked in by uli, 15 years ago

More sphinx aware tagging.

:mod:`waeup.csvfile` -- generic support for handling CSV files
**************************************************************

:Test-Layer: unit

.. module:: waeup.csvfile
   :synopsis: generic support for handling CSV files.


.. note::

   This version of the :mod:`waeup.csvfile` module doesn't support
   Unicode input.  Also, there are currently some issues regarding
   ASCII NUL characters.  Accordingly, all input should be UTF-8 or
   printable ASCII to be safe. These restrictions will be removed in
   the future.

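Until these restrictions are lifted, callers may want to check input
themselves before handing a file to the module. The following is only
a sketch in plain standard-library Python; ``is_safe_csv_input`` is a
hypothetical helper and not part of :mod:`waeup.csvfile`:

```python
def is_safe_csv_input(raw):
    """Return True if `raw` (bytes) has no NUL bytes and is valid UTF-8."""
    if b"\x00" in raw:
        # ASCII NUL characters are a known problem, so reject them.
        return False
    try:
        # Printable ASCII is a subset of UTF-8, so one check covers both.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

The helper works on raw bytes, so it can be applied to the file
contents before any path is passed on.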

Module Contents
================

:class:`CSVFile`
----------------

.. class:: CSVFile(filepath)

   Wrapper around the path to a real CSV file. :class:`CSVFile` is an
   adapter that adapts basestring objects (aka regular and unicode
   strings).

   :class:`CSVFile` is designed as a base for derived, more
   specialized types of CSV file wrappers, although it can serve as a
   basic wrapper for simple, unspecified CSV files.

   .. method:: grok.context(basestring)
      :noindex:

      We bind to basestring objects.

   .. method:: grok.implements(ICSVFile)
      :noindex:

   .. attribute:: required_fields=[]

      A list of header fields (strings) required for this kind of
      CSVFile. Using the default constructor will fail with paths to
      files that do *not* provide those fields.

      The default value (empty list) means: no special fields required
      at all.

      Deriving classes can override this attribute to accept only
      files that provide the appropriate header fields. The default
      constructor already checks this.

   .. attribute:: path

      A string describing the path to the associated CSV file.

   .. method:: getData()

      Returns a generator delivering one data row of the denoted file
      at a time.

      Each data row is delivered as a dictionary mapping from header
      field names to the values. Therefore a source line like this::

        field1,field2
        ...
        data 1,data 2

      will become a dictionary like this::

        {'field1' : 'data 1',
         'field2' : 'data 2'}

      The :meth:`getData` method does not validate or convert the
      data types of the values it reads.

   .. method:: getHeaderFields()

      Get a sorted list of header fields in the wrapped CSV file.

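The behaviour described for :meth:`getData` and
:meth:`getHeaderFields` can be approximated with the standard ``csv``
module. This is a sketch, not the module's implementation: the names
``read_rows`` and ``header_fields`` are hypothetical, and the functions
take an open text stream instead of a path to stay self-contained:

```python
import csv
import io


def read_rows(stream):
    """Yield one dict per data row, keyed by header field name."""
    for row in csv.DictReader(stream):
        # Values are passed through as read; no type conversion happens.
        yield dict(row)


def header_fields(stream):
    """Return the sorted header fields of a CSV stream."""
    reader = csv.DictReader(stream)
    return sorted(reader.fieldnames or [])


sample = "field1,field2\ndata 1,data 2\n"
rows = list(read_rows(io.StringIO(sample)))
fields = header_fields(io.StringIO(sample))
```

Because ``read_rows`` is a generator, nothing is read from the stream
until a caller starts iterating over the result.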

:func:`getCSVFile`
------------------

.. function:: getCSVFile(filepath)

   `filepath`
      String with a filepath to an existing CSV file.

   Get a CSV file wrapper for the given `filepath`. :func:`getCSVFile`
   knows about all registered :class:`CSVFile` wrappers and
   searches them for the most appropriate one.

   If none can be found, ``None`` is returned.

   .. seealso::

      :ref:`getcsvfilewrapper`,

      :ref:`getcsvfiledecision`

Basic example
=============

To initialize the whole framework we have to grok the :mod:`waeup`
package first:

    >>> import grok
    >>> grok.testing.grok('waeup')

Create a file:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1,dataitem2
    ... item3,item4
    ... """)

A plain file path is awkward to handle within a component
framework. Therefore we get a wrapper for it that turns the path
string into a CSV file object:

    >>> from waeup.csvfile.interfaces import ICSVFile
    >>> src = ICSVFile(path)
    >>> src
    <waeup.csvfile.csvfile.CSVFile object at 0x...>

Create a receiver:

    >>> from waeup.csvfile.interfaces import ICSVDataReceiver
    >>> class Receiver(object):
    ...   grok.implements(ICSVDataReceiver)
    ...   def receive(self, data):
    ...     print "RECEIVED: ", data

    >>> recv = Receiver()

Find a connector:

    >>> from zope.component import getMultiAdapter
    >>> from waeup.csvfile.interfaces import ICSVDataConnector
    >>> conn = getMultiAdapter((src, recv), ICSVDataConnector)
    Traceback (most recent call last):
    ...
    ComponentLookupError: ((<waeup.csvfile.csvfile.CSVFile object at 0x...>,
                            <Receiver object at 0x...>),
        <InterfaceClass waeup.csvfile.interfaces.ICSVDataConnector>, u'')

No connector is registered yet, so we create one:

    >>> class Connector1(grok.MultiAdapter):
    ...   grok.adapts(ICSVFile, ICSVDataReceiver)
    ...   grok.provides(ICSVDataConnector)
    ...   def __init__(self, source, receiver):
    ...     self.source = source
    ...     self.receiver = receiver
    ...   def doImport(self):
    ...     self.receiver.receive(
    ...        self.source.getData())

    >>> grok.testing.grok_component('Connector1', Connector1)
    True

Try again...

    >>> conn = getMultiAdapter((src, recv), ICSVDataConnector)
    >>> conn
    <Connector1 object at 0x...>

    >>> conn.doImport()
    RECEIVED: <generator object at 0x...>

Clean up:

    >>> import os
    >>> os.unlink(path)

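In the example above the receiver is handed the generator itself, so
no row is read until the receiver iterates over it. The same wiring
can be sketched without grok; all class names below are hypothetical
stand-ins, not part of :mod:`waeup.csvfile`:

```python
class ListSource(object):
    """Stand-in for a CSV file wrapper: yields rows lazily."""

    def __init__(self, rows):
        self.rows = rows

    def getData(self):
        for row in self.rows:
            yield row


class CollectingReceiver(object):
    """Receiver that consumes the generator it is handed."""

    def __init__(self):
        self.received = []

    def receive(self, data):
        # Iterating is what actually pulls rows out of the source.
        for row in data:
            self.received.append(row)


class Connector(object):
    """Bridge between one source and one receiver."""

    def __init__(self, source, receiver):
        self.source = source
        self.receiver = receiver

    def doImport(self):
        self.receiver.receive(self.source.getData())


source = ListSource([{'col1': 'dataitem1', 'col2': 'dataitem2'},
                     {'col1': 'item3', 'col2': 'item4'}])
receiver = CollectingReceiver()
Connector(source, receiver).doImport()
```

After ``doImport()`` the receiver holds both rows as dictionaries.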

CSV file wrappers
=================

CSV file wrappers can extract data from CSV files denoted by a path.

We create a CSV file:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1,dataitem2
    ... item3,item4
    ... """)

Now we get a CSV file wrapper for it. This is simply done by asking
for an adapter to the path string:

    >>> from waeup.csvfile.interfaces import ICSVFile
    >>> wrapper = ICSVFile(path)
    >>> wrapper
    <waeup.csvfile.csvfile.CSVFile object at 0x...>

This wrapper can return the CSV data as a sequence of dicts:

    >>> wrapper.getData()
    <generator object at 0x...>

As we see, the single dicts (each representing a row) are returned as
a generator. We can list them:

    >>> list(wrapper.getData())
    [{'col2': 'dataitem2', 'col1': 'dataitem1'},
     {'col2': 'item4', 'col1': 'item3'}]

We can get a list of header fields found in the file:

    >>> wrapper.getHeaderFields()
    ['col1', 'col2']

.. _getcsvfilewrapper:

Getting a wrapper
=================

If we want to get a wrapper best suited for our purposes, we can also
use the :func:`getCSVFile` function:

    >>> from waeup.csvfile.csvfile import getCSVFile
    >>> wrapper = getCSVFile(path)
    >>> wrapper
    <waeup.csvfile.csvfile.CSVFile object at 0x...>

As we currently have only one type of wrapper, this is the one we
get. Let's create another wrapper that requires a column 'col1':

    >>> from waeup.csvfile.interfaces import ICSVFile
    >>> from waeup.csvfile import CSVFile
    >>> class ICSVFileWithCol1(ICSVFile):
    ...   """A CSV file that contains a 'col1' column.
    ...   """

    >>> class CSVFileWithCol1(CSVFile):
    ...   required_fields = ['col1']
    ...   grok.implements(ICSVFileWithCol1)
    ...   grok.provides(ICSVFileWithCol1)

We have to grok:

    >>> grok.testing.grok_component('CSVFileWithCol1', CSVFileWithCol1)
    True

Now we can ask for a wrapper again, but this time we will get a
CSVFileWithCol1 instance:

    >>> getCSVFile(path)
    <CSVFileWithCol1 object at 0x...>

If we cannot get a wrapper at all, ``None`` is returned:

    >>> getCSVFile('not-existent-file') is None
    True

.. _getcsvfiledecision:


How :func:`getCSVFile` decides which wrapper to use
---------------------------------------------------

Apparently, :func:`getCSVFile` performs some magic: given a certain
CSV file, it decides which one of all registered wrappers suits the
file best.

This decision is based on a score, which is computed as shown below
for each registered wrapper.

Before we can show this, we create some more CSV files.

One file that does not contain valid CSV data:

    >>> nocsvpath = 'nocsvfile.csv'
    >>> open(nocsvpath, 'wb').write(
    ... """blah blah blah.
    ... blubb blubb.
    ... """)

One file that contains a 'col1' and a 'col3' column:

    >>> path2 = 'mycsvfile2.csv'
    >>> open(path2, 'wb').write(
    ... """col1,col3
    ... dataitem1b,dataitem2b
    ... item3b,item4b
    ... """)

We create a wrapper that requires 'col1' and 'col2':

    >>> from waeup.csvfile.interfaces import ICSVFile
    >>> from waeup.csvfile import CSVFile
    >>> class ICSVFile12(ICSVFile):
    ...   """A CSV file that contains 'col1' and 'col2' columns.
    ...   """

    >>> class CSVFile12(CSVFile):
    ...   required_fields = ['col1', 'col2']
    ...   grok.context(basestring)
    ...   grok.implements(ICSVFile12)
    ...   grok.provides(ICSVFile12)

    >>> grok.testing.grok_component('CSVFile12',  CSVFile12)
    True


* If no instance of a certain wrapper can be created from the given
  path (i.e. __init__ raises some kind of exception): score is -1:

    >>> from waeup.csvfile.csvfile import getScore
    >>> getScore('nonexistant', CSVFile)
    -1

* If a wrapper requires at least one header field and the given file
  does not provide all of the required fields: score is -1:

    >>> getScore(path2, CSVFile12)
    -1

* If a wrapper requires no header fields at all
  (i.e. `required_fields` equals empty list): score is 0 (zero):

    >>> getScore(path, CSVFile)
    0

* If a wrapper requires at least one header field and all required
  fields also appear in the file: score is the number of required
  fields:

    >>> getScore(path, CSVFileWithCol1)
    1

    >>> getScore(path, CSVFile12)
    2

If several wrappers get the same score for a certain file, it is
undefined which of them is returned.

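The scoring rules above can be summarized in a few lines of plain
Python. This is a sketch only: ``get_score`` and ``best_wrapper`` are
hypothetical names, header fields are passed in directly instead of
being read from a file, and we assume that the highest score wins, as
the examples in this document suggest:

```python
def get_score(header_fields, required_fields):
    """Score required fields against the header fields of a file.

    `header_fields` is None when the file could not be read at all.
    """
    if header_fields is None:
        return -1   # the wrapper could not be instantiated
    if not required_fields:
        return 0    # the wrapper accepts any CSV file
    if not set(required_fields).issubset(header_fields):
        return -1   # at least one required field is missing
    return len(required_fields)


def best_wrapper(header_fields, wrappers):
    """Return the name of the best-scoring wrapper, or None."""
    if not wrappers:
        return None
    name = max(wrappers,
               key=lambda n: get_score(header_fields, wrappers[n]))
    if get_score(header_fields, wrappers[name]) < 0:
        return None
    return name


# Wrapper names mapped to their required fields.
wrappers = {
    'CSVFile': [],
    'CSVFileWithCol1': ['col1'],
    'CSVFile12': ['col1', 'col2'],
}
```

For a file with the columns 'col1' and 'col2' this picks
``'CSVFile12'``; for 'col1' and 'col3' it picks ``'CSVFileWithCol1'``.
On a tie, ``max`` returns whichever candidate it meets first, which
mirrors the unspecified result mentioned above.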

How to build custom CSV file wrappers
=====================================

A typical CSV file wrapper can be built like this:

    >>> import grok
    >>> from waeup.csvfile.interfaces import ICSVFile
    >>> from waeup.csvfile import CSVFile

    >>> class ICustomCSVFile(ICSVFile):
    ...   """A marker for custom CSV files."""

    >>> class CustomCSVFile(CSVFile):
    ...   required_fields = ['somecol', 'othercol']
    ...   grok.implements(ICustomCSVFile)
    ...   grok.provides(ICustomCSVFile)

    >>> grok.testing.grok_component('CustomCSVFile',  CustomCSVFile)
    True

The special things here are:

* Derive from :class:`CSVFile`

  :func:`getCSVFile` looks only for classes that are derived from
  :class:`CSVFile`. So if you want your wrapper to be found by this function,
  derive from :class:`CSVFile`.

  As :class:`CSVFile` is an adapter, our custom wrapper becomes one
  as well (adapting strings). Our first file lacks the required
  columns, so adaptation fails:

     >>> ICustomCSVFile(path)
     Traceback (most recent call last):
     ...
     TypeError: Missing columns in CSV file: ['somecol', 'othercol']

  If our input file provides the correct columns, it will work:

     >>> custompath = 'mycustom.csv'
     >>> open(custompath, 'wb').write(
     ... """somecol,othercol,thirdcol
     ... dataitem1,dataitem2,dataitem3
     ... """)

     >>> ICustomCSVFile(custompath)
     <CustomCSVFile object at 0x...>

* Provide and implement a custom interface

  A custom CSV file wrapper should provide and implement its own
  interface. Otherwise other components could not require it
  explicitly.

  We have to provide *and* implement the custom interface because
  otherwise instances would not implement the required interface
  (maybe due to a flaw in `grok`/`martian`).

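The constructor check that makes ``required_fields`` work can be
pictured as follows. This is a sketch under assumptions, not the
actual :class:`CSVFile` code: the ``Sketch...`` classes are
hypothetical, they read from a text stream instead of a path, and only
the error message format is taken from the examples above:

```python
import csv
import io


class SketchCSVFile(object):
    """Base wrapper: checks required header fields on construction."""

    required_fields = []

    def __init__(self, stream):
        self.header_fields = sorted(csv.DictReader(stream).fieldnames or [])
        missing = [field for field in self.required_fields
                   if field not in self.header_fields]
        if missing:
            raise TypeError("Missing columns in CSV file: %s" % missing)


class SketchCustomCSVFile(SketchCSVFile):
    # Overriding the class attribute is all a subclass has to do.
    required_fields = ['somecol', 'othercol']


good = SketchCustomCSVFile(
    io.StringIO("somecol,othercol,thirdcol\n1,2,3\n"))

try:
    SketchCustomCSVFile(io.StringIO("col1,col2\na,b\n"))
    error = None
except TypeError as exc:
    error = str(exc)
```

Keeping the check in the base class constructor means a derived
wrapper only declares which columns it needs.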

Common Use Cases
================

Get a wrapper for a certain type of CSV file
--------------------------------------------

The type of a :class:`CSVFile` is determined by the interfaces it
provides.

If we want a wrapper that is guaranteed to support certain fields
(or no wrapper at all), we already know the type we want and can ask
for its interface directly.

We create a file that does not have a 'special_col' field:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1,dataitem2
    ... item3,item4
    ... """)

Now we create a wrapper that requires that field:

    >>> from waeup.csvfile.interfaces import ICSVFile
    >>> from waeup.csvfile import CSVFile
    >>> class ICSVFileWithSpecialCol(ICSVFile):
    ...   """A CSV file that contains a 'special_col' column.
    ...   """

    >>> class CSVFileWithSpecialCol(CSVFile):
    ...   required_fields = ['special_col']
    ...   grok.provides(ICSVFileWithSpecialCol)

    >>> grok.testing.grok_component('CSVFileWithSpecialCol',
    ...                             CSVFileWithSpecialCol)
    True

If we now ask for a wrapper of that kind for our file, adaptation
fails:

    >>> ICSVFileWithSpecialCol(path)
    Traceback (most recent call last):
    ...
    TypeError: Missing columns in CSV file: ['special_col']

If the required column is available, however, it works:

    >>> path2 = 'mycsvfile2.csv'
    >>> open(path2, 'wb').write(
    ... """col1,col2,special_col
    ... dataitem1,dataitem2,dataitem3
    ... item4,item5,item6
    ... """)
    >>> ICSVFileWithSpecialCol(path2)
    <CSVFileWithSpecialCol object at 0x...>

Build an importer framework
---------------------------

We can also build an importer framework with CSV file support using
the components described above.

To model this, we start with two files that we will import later on:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1a,dataitem2a
    ... item3a,item4a
    ... """)

    >>> path2 = 'mycsvfile2.csv'
    >>> open(path2, 'wb').write(
    ... """col1,col3
    ... dataitem1b,dataitem2b
    ... item3b,item4b
    ... """)

Then we create two receivers for CSV file data:

    >>> from zope.interface import Interface
    >>> class IReceiver1(Interface):
    ...   """A CSV data receiver."""

    >>> class IReceiver2(Interface):
    ...   """Another CSV data receiver."""

    >>> class Receiver1(object):
    ...   grok.implements(IReceiver1)
    ...   def receive(self, data):
    ...     print "Receiver1 received: ", data

    >>> class Receiver2(object):
    ...   grok.implements(IReceiver2)
    ...   def receive(self, data):
    ...     print "Receiver2 received: ", data


If we want to be sure that a wrapper requires the fields 'col1' and
'col2', we ask for the ICSVFile12 interface defined above:

    >>> wrapper1 = ICSVFile12(path)
    >>> wrapper1
    <CSVFile12 object at 0x...>

We could not use this interface (adapter) with the other CSV file:

    >>> wrapper2 = ICSVFile12(path2)
    Traceback (most recent call last):
    ...
    TypeError: Missing columns in CSV file: ['col2']

The last step is to build a bridge between the receivers and the
sources. We call it connector or importer here:

    >>> class IImporter(Interface):
    ...   """Import sources to receivers."""

    >>> class IImporter12(IImporter):
    ...   """Imports ICSVFile12 data into IReceiver1 objects."""

    >>> class Importer12(grok.MultiAdapter):
    ...   grok.adapts(ICSVFile12, IReceiver1)
    ...   grok.implements(IImporter12)
    ...   def __init__(self, csvfile, receiver):
    ...     self.csvfile = csvfile
    ...     self.receiver = receiver
    ...   def doImport(self):
    ...     self.receiver.receive(self.csvfile.getData())

    >>> grok.testing.grok_component('Importer12',  Importer12)
    True

We can create an importer if we know the type of CSV file:

    >>> myrecv = Receiver1()
    >>> myfile = ICSVFile12(path)
    >>> myfile
    <CSVFile12 object at 0x...>

    >>> ICSVFile12.providedBy(myfile)
    True

    >>> from zope.component import getMultiAdapter
    >>> myimporter = getMultiAdapter((myfile, myrecv), IImporter12)
    >>> myimporter
    <Importer12 object at 0x...>


We can also create an importer without knowing the type of CSV file
beforehand, using the :func:`getCSVFile` function:

    >>> from waeup.csvfile import getCSVFile
    >>> myfile = getCSVFile(path)
    >>> ICSVFile12.providedBy(myfile)
    True

Apparently, :func:`getCSVFile` knows that a CSVFile12 suits the
contents of our file best, although we did not specify what type of
file wrapper we want.

Getting an importer now is easy:

    >>> myimporter = getMultiAdapter((myfile, myrecv), IImporter12)
    >>> myimporter
    <Importer12 object at 0x...>

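The lookup by pair (file type, receiver type) that
``getMultiAdapter`` performs for us can be pictured with a small
registry. The sketch below is plain Python and far simpler than the
real lookup in ``zope.component``; ``Registry`` and the stand-in
classes are hypothetical names used for illustration only:

```python
class Registry(object):
    """Tiny stand-in for a multi-adapter registry."""

    def __init__(self):
        self._factories = {}

    def register(self, source_type, receiver_type, factory):
        self._factories[(source_type, receiver_type)] = factory

    def lookup(self, source, receiver):
        # Try the most specific classes first, then their base classes.
        for source_type in type(source).__mro__:
            for receiver_type in type(receiver).__mro__:
                factory = self._factories.get((source_type, receiver_type))
                if factory is not None:
                    return factory(source, receiver)
        raise LookupError("no importer for %r" % ((source, receiver),))


class PlainFile(object):
    """Stand-in for a wrapper without required fields."""

    def __init__(self, rows):
        self.rows = rows

    def getData(self):
        return iter(self.rows)


class File12(PlainFile):
    """Stand-in for a wrapper that guarantees 'col1' and 'col2'."""


class ListReceiver(object):
    def __init__(self):
        self.items = []

    def receive(self, data):
        self.items.extend(data)


class SketchImporter12(object):
    def __init__(self, csvfile, receiver):
        self.csvfile = csvfile
        self.receiver = receiver

    def doImport(self):
        self.receiver.receive(self.csvfile.getData())


registry = Registry()
registry.register(File12, ListReceiver, SketchImporter12)

receiver = ListReceiver()
importer = registry.lookup(File12([{'col1': 'a', 'col2': 'b'}]), receiver)
importer.doImport()
```

A ``PlainFile`` is not matched by this registration, so looking up an
importer for it raises ``LookupError``, much like the
``ComponentLookupError`` shown in the basic example.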

Clean up:

    >>> import os
    >>> os.unlink(path)
    >>> os.unlink(path2)
    >>> os.unlink(nocsvpath)
    >>> os.unlink(custompath)