:mod:`waeup.sirp.csvfile` -- generic support for handling CSV files
*******************************************************************

:Test-Layer: unit

.. module:: waeup.sirp.csvfile
   :synopsis: generic support for handling CSV files.

.. note::

   This version of the :mod:`waeup.sirp.csvfile` module doesn't
   support Unicode input. Also, there are currently some issues
   regarding ASCII NUL characters. Accordingly, all input should be
   UTF-8 or printable ASCII to be safe. These restrictions will be
   removed in the future.

Module Contents
================

:class:`CSVFile`
----------------

.. class:: CSVFile(filepath)

   Wrapper around the path to a real CSV file.

   :class:`CSVFile` is an adapter that adapts basestring objects (aka
   regular and unicode strings).

   :class:`CSVFile` is designed as a base for derived, more
   specialized types of CSV file wrappers, although it can serve as a
   basic wrapper for simple, unspecified CSV files.

   .. method:: grok.context(basestring)
      :noindex:

      We bind to basestring objects.

   .. method:: grok.implements(ICSVFile)
      :noindex:

   .. attribute:: required_fields=[]

      A list of header fields (strings) required for this kind of
      CSVFile. Using the default constructor will fail with paths to
      files that do *not* provide those fields.

      The default value (empty list) means: no special fields are
      required at all.

      Deriving classes can override this attribute to accept only
      files that provide the appropriate header fields. The default
      constructor already checks this.

   .. attribute:: path

      A string describing the path to the associated CSV file.

   .. method:: getData()

      Returns a generator delivering one data row of the denoted file
      at a time. Each data row is delivered as a dictionary mapping
      from header field names to the values.

      Therefore a source line like this::

        field1,field2
        ...
        data 1,data 2

      will become a dictionary like this::

        {'field1' : 'data 1', 'field2' : 'data 2'}

      The :meth:`getData` method does not validate or convert the
      data types of the values it reads.

   .. method:: getHeaderFields()

      Get a sorted list of header fields in the wrapped CSV file.

:func:`getCSVFile`
------------------

.. function:: getCSVFile(filepath)

   `filepath`
      String with a filepath to an existing CSV file.

   Get a CSV file wrapper for the given `filepath`.

   :func:`getCSVFile` knows about all registered :class:`CSVFile`
   wrappers and searches them for the most appropriate one. If none
   can be found, ``None`` is returned.

   .. seealso:: :ref:`getcsvfilewrapper`, :ref:`getcsvfiledecision`

Helpers
=======

Some helper functions make handling CSV data more convenient.

:func:`toBool`
--------------

.. function:: toBool(string)

   `string`
      String containing some CSV data.

   Turn a string into a boolean value.

   If the string contains one of the values ``'true'``, ``'yes'``,
   ``'y'``, ``'on'`` or ``'checked'`` then ``True`` is returned,
   ``False`` otherwise. The string can be uppercase, lowercase or
   mixed:

     >>> from waeup.sirp.csvfile import toBool
     >>> toBool('y')
     True

     >>> toBool('Yes')
     True

     >>> toBool('TRUE')
     True

     >>> toBool('no')
     False

   If we pass in a boolean then this will be returned unchanged:

     >>> toBool(True)
     True

     >>> toBool(False)
     False

Basic example
=============

To initialize the whole framework we have to grok the :mod:`waeup.sirp`
package first:

    >>> import grok
    >>> grok.testing.grok('waeup.sirp')

Create a file:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1,dataitem2
    ... item3,item4
    ... """)

A plain file path is difficult to handle in a component framework.
Therefore we get a wrapper for it that makes a CSV file object out of
a path string:

    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> src = ICSVFile(path)
    >>> src

Create a receiver:

    >>> from waeup.sirp.csvfile.interfaces import ICSVDataReceiver
    >>> class Receiver(object):
    ...     grok.implements(ICSVDataReceiver)
    ...     def receive(self, data):
    ...         print "RECEIVED: ", data

    >>> recv = Receiver()

Find a connector:

    >>> from zope.component import getMultiAdapter
    >>> from waeup.sirp.csvfile.interfaces import ICSVDataConnector
    >>> conn = getMultiAdapter((src, recv), ICSVDataConnector)
    Traceback (most recent call last):
    ...
    ComponentLookupError: ((, ), , u'')

Okay, create a connector:

    >>> class Connector1(grok.MultiAdapter):
    ...     grok.adapts(ICSVFile, ICSVDataReceiver)
    ...     grok.provides(ICSVDataConnector)
    ...     def __init__(self, source, receiver):
    ...         self.source = source
    ...         self.receiver = receiver
    ...     def doImport(self):
    ...         self.receiver.receive(
    ...             self.source.getData())

    >>> grok.testing.grok_component('Connector1', Connector1)
    True

Try again...

    >>> conn = getMultiAdapter((src, recv), ICSVDataConnector)
    >>> conn

    >>> conn.doImport()
    RECEIVED: 

Clean up:

    >>> import os
    >>> os.unlink(path)

CSV file wrappers
=================

CSV file wrappers can extract data from CSV files denoted by a path.

We create a CSV file:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1,dataitem2
    ... item3,item4
    ... """)

Now we get a CSV file wrapper for it. This is simply done by asking
for an adapter to the path string:

    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> wrapper = ICSVFile(path)
    >>> wrapper

This wrapper can return the CSV data as a sequence of dicts:

    >>> wrapper.getData()

As we see, the single dicts (each representing a row) are returned as
a generator. We can list them:

    >>> list(wrapper.getData())
    [{'col2': 'dataitem2', 'col1': 'dataitem1'},
     {'col2': 'item4', 'col1': 'item3'}]

We can get a list of header fields found in the file:

    >>> wrapper.getHeaderFields()
    ['col1', 'col2']

.. _getcsvfilewrapper:

Getting a wrapper
=================

If we want to get a wrapper best suited for our purposes, we can also
use the :func:`getCSVFile` function:

    >>> from waeup.sirp.csvfile.csvfile import getCSVFile
    >>> wrapper = getCSVFile(path)
    >>> wrapper

As we currently have only one type of wrapper, this is the one we get.
Let's create another wrapper that requires a column 'col1':

    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> from waeup.sirp.csvfile import CSVFile
    >>> class ICSVFileWithCol1(ICSVFile):
    ...     """A CSV file that contains a 'col1' column.
    ...     """

    >>> class CSVFileWithCol1(CSVFile):
    ...     required_fields = ['col1']
    ...     grok.implements(ICSVFileWithCol1)
    ...     grok.provides(ICSVFileWithCol1)

We have to grok:

    >>> grok.testing.grok_component('CSVFileWithCol1', CSVFileWithCol1)
    True

Now we can ask for a wrapper again, but this time we will get a
CSVFileWithCol1 instance:

    >>> getCSVFile(path)

If we cannot get a wrapper at all, ``None`` is returned:

    >>> getCSVFile('not-existent-file') is None
    True

.. _getcsvfiledecision:

How :func:`getCSVFile` decides which wrapper to use
---------------------------------------------------

Apparently, :func:`getCSVFile` performs some magic: given a certain
CSV file, it decides which one of all registered wrappers suits the
file best. This decision is based on a score, which is computed for
each registered wrapper as shown below.

Before we can show this, we create some more CSV files.

One file that does not contain valid CSV data:

    >>> nocsvpath = 'nocsvfile.csv'
    >>> open(nocsvpath, 'wb').write(
    ... """blah blah blah.
    ... blubb blubb.
    ... """)

One file that contains a 'col1' and a 'col3' column:

    >>> path2 = 'mycsvfile2.csv'
    >>> open(path2, 'wb').write(
    ... """col1,col3
    ... dataitem1b,dataitem2b
    ... item3b,item4b
    ... """)

We create a wrapper that requires 'col1' and 'col2' but does not check
in its constructor whether this requirement is met:

    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> from waeup.sirp.csvfile import CSVFile
    >>> class ICSVFile13(ICSVFile):
    ...     """A CSV file that requires 'col1' and 'col2' columns.
    ...     """

    >>> class CSVFile13(CSVFile):
    ...     required_fields = ['col1', 'col2']
    ...     grok.context(basestring)
    ...     grok.implements(ICSVFile13)
    ...     grok.provides(ICSVFile13)
    ...     def __init__(self, context):
    ...         self.path = context

    >>> grok.testing.grok_component('CSVFile13', CSVFile13)
    True

.. warning:: This is bad design as :class:`ICSVFile` instances should
          always raise an exception in their constructor if a
          file does not meet the basic requirements.

          The base constructor will check for the correct
          values. So, if you do not override the base
          constructor, instances will check at creation time
          whether they can handle the desired file.

We create a wrapper that requires 'col1' and 'col2':

    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> from waeup.sirp.csvfile import CSVFile
    >>> class ICSVFile12(ICSVFile):
    ...     """A CSV file that requires 'col1' and 'col2' columns.
    ...     """

    >>> class CSVFile12(CSVFile):
    ...     required_fields = ['col1', 'col2']
    ...     grok.context(basestring)
    ...     grok.implements(ICSVFile12)
    ...     grok.provides(ICSVFile12)

    >>> grok.testing.grok_component('CSVFile12', CSVFile12)
    True

Now the rules applied to all wrappers are:

* If no instance of a certain wrapper can be created from the given
  path (i.e. ``__init__`` raises some kind of exception): score is -1:

    >>> from waeup.sirp.csvfile.csvfile import getScore
    >>> getScore('nonexistant', CSVFile)
    -1

* If a wrapper requires at least one header field and the given file
  does not provide all of the required fields: score is -1:

    >>> getScore(path2, CSVFile12)
    -1

* If a wrapper requires no header fields at all
  (i.e. `required_fields` equals empty list): score is 0 (zero):

    >>> getScore(path, CSVFile)
    0

* If a wrapper requires at least one header field and all required
  header fields also appear in the file: score is the number of
  required fields.

    >>> getScore(path, CSVFileWithCol1)
    1

    >>> getScore(path, CSVFile12)
    2

If several wrappers get the same score for a certain file, it is
undetermined which of them is returned.

How to build custom CSV file wrappers
=====================================

A typical CSV file wrapper can be built like this:

    >>> import grok
    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> from waeup.sirp.csvfile import CSVFile

    >>> class ICustomCSVFile(ICSVFile):
    ...     """A marker for custom CSV files."""

    >>> class CustomCSVFile(CSVFile):
    ...     required_fields = ['somecol', 'othercol']
    ...     grok.implements(ICustomCSVFile)
    ...     grok.provides(ICustomCSVFile)

    >>> grok.testing.grok_component('CustomCSVFile', CustomCSVFile)
    True

The special things here are:

* Derive from :class:`CSVFile`

  :func:`getCSVFile` looks only for classes that are derived from
  :class:`CSVFile`. So if you want your wrapper to be found by this
  function, derive from :class:`CSVFile`.

  As :class:`CSVFile` is an adapter, our custom wrapper will also
  become one (adapting strings):

    >>> ICustomCSVFile(path)
    Traceback (most recent call last):
    ...
    TypeError: Missing columns in CSV file: ['somecol', 'othercol']

  If our input file provides the correct columns, it will work:

    >>> custompath = 'mycustom.csv'
    >>> open(custompath, 'wb').write(
    ... """somecol,othercol,thirdcol
    ... dataitem1,dataitem2,dataitem3
    ... """)

    >>> ICustomCSVFile(custompath)

* Provide and implement a custom interface

  A custom CSV file wrapper should provide and implement its own
  interface. Otherwise it could not be required explicitly by other
  components.

  We have to provide *and* implement the custom interface because
  otherwise instances would not implement the required interface
  (maybe due to a flaw in `grok`/`martian`).

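
The required-field check that custom wrappers inherit and the scoring
rules from :ref:`getcsvfiledecision` can be illustrated with a small
standalone sketch. It is not the module's implementation: the names
``header_fields`` and ``score`` are hypothetical, it works on CSV text
instead of file paths, it uses only the Python 3 standard library, and
it leaves out the rule that a failing constructor yields -1.

```python
import csv
import io


def header_fields(csv_text):
    """Return the sorted header fields of CSV data given as a string.

    Hypothetical helper, loosely modelled on getHeaderFields().
    """
    reader = csv.reader(io.StringIO(csv_text))
    # The first row is taken as the header; empty input gives [].
    return sorted(next(reader, []))


def score(csv_text, required_fields):
    """Score a list of required fields against CSV data.

    -1: at least one required field is missing,
     0: no fields are required at all,
     n: all n required fields are present.
    """
    available = set(header_fields(csv_text))
    if not set(required_fields) <= available:
        return -1
    return len(required_fields)


sample = "col1,col2\ndataitem1,dataitem2\nitem3,item4\n"
other = "col1,col3\ndataitem1b,dataitem2b\n"

print(score(sample, []))                # no requirements
print(score(sample, ["col1"]))          # one required field, present
print(score(sample, ["col1", "col2"]))  # two required fields, present
print(score(other, ["col1", "col2"]))   # 'col2' is missing
```

Under these assumptions, a wrapper with more matching required fields
always outscores a less specific one, which is why a more specialized
wrapper is preferred over the plain :class:`CSVFile`.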

Common Use Cases
================

Get a wrapper for a certain type of CSV file
--------------------------------------------

The type of a :class:`CSVFile` is determined by the interfaces it
provides. If we want a wrapper that guarantees support for certain
fields (or none at all), then we already know the wanted type.

We create a file that does not have a 'special_col' field:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1,dataitem2
    ... item3,item4
    ... """)

Now we create a wrapper that requires that field:

    >>> from waeup.sirp.csvfile.interfaces import ICSVFile
    >>> from waeup.sirp.csvfile import CSVFile
    >>> class ICSVFileWithSpecialCol(ICSVFile):
    ...     """A CSV file that contains a 'special_col' column.
    ...     """

    >>> class CSVFileWithSpecialCol(CSVFile):
    ...     required_fields = ['special_col']
    ...     grok.provides(ICSVFileWithSpecialCol)

    >>> grok.testing.grok_component('CSVFileWithSpecialCol',
    ...                             CSVFileWithSpecialCol)
    True

If we ask for a wrapper for that kind of file, the request fails
because the column is missing:

    >>> ICSVFileWithSpecialCol(path)
    Traceback (most recent call last):
    ...
    TypeError: Missing columns in CSV file: ['special_col']

If the required column is available, however:

    >>> path2 = 'mycsvfile2.csv'
    >>> open(path2, 'wb').write(
    ... """col1,col2,special_col
    ... dataitem1,dataitem2,dataitem3
    ... item4,item5,item6
    ... """)

    >>> ICSVFileWithSpecialCol(path2)

Build an importer framework
---------------------------

We can also build an importer framework with CSV file support using
the components described above.

To model this, we start with two files that we will import later on:

    >>> path = 'mycsvfile.csv'
    >>> open(path, 'wb').write(
    ... """col1,col2
    ... dataitem1a,dataitem2a
    ... item3a,item4a
    ... """)

    >>> path2 = 'mycsvfile2.csv'
    >>> open(path2, 'wb').write(
    ... """col1,col3
    ... dataitem1b,dataitem2b
    ... item3b,item4b
    ... """)

Then we create two receivers for CSV file data:

    >>> from zope.interface import Interface
    >>> class IReceiver1(Interface):
    ...     """A CSV data receiver."""

    >>> class IReceiver2(Interface):
    ...     """Another CSV data receiver."""

    >>> class Receiver1(object):
    ...     grok.implements(IReceiver1)
    ...     def receive(self, data):
    ...         print "Receiver1 received: ", data

    >>> class Receiver2(object):
    ...     grok.implements(IReceiver2)
    ...     def receive(self, data):
    ...         print "Receiver2 received: ", data

If we want to be sure that a wrapper requires the 'col1' and 'col2'
fields, we ask for ICSVFile12:

    >>> wrapper1 = ICSVFile12(path)
    >>> wrapper1

We cannot use this interface (adapter) with the other CSV file:

    >>> wrapper2 = ICSVFile12(path2)
    Traceback (most recent call last):
    ...
    TypeError: Missing columns in CSV file: ['col2']

The last step is to build a bridge between the receivers and the
sources. We call it connector or importer here:

    >>> class IImporter(Interface):
    ...     """Import sources to receivers."""

    >>> class IImporter12(IImporter):
    ...     """Imports ICSVFile12 data into IReceiver1 objects."""

    >>> class Importer12(grok.MultiAdapter):
    ...     grok.adapts(ICSVFile12, IReceiver1)
    ...     grok.implements(IImporter12)
    ...     def __init__(self, csvfile, receiver):
    ...         self.csvfile = csvfile
    ...         self.receiver = receiver
    ...     def doImport(self):
    ...         self.receiver.receive(self.csvfile.getData())

    >>> grok.testing.grok_component('Importer12', Importer12)
    True

We can create an importer if we know the type of CSV file:

    >>> myrecv = Receiver1()
    >>> myfile = ICSVFile12(path)
    >>> myfile

    >>> ICSVFile12.providedBy(myfile)
    True

    >>> from zope.component import getMultiAdapter
    >>> myimporter = getMultiAdapter((myfile, myrecv), IImporter12)
    >>> myimporter

We can also create an importer without knowing the type of CSV file
beforehand, using the :func:`getCSVFile` function:

    >>> from waeup.sirp.csvfile import getCSVFile
    >>> myfile = getCSVFile(path)
    >>> ICSVFile12.providedBy(myfile)
    True

Apparently, :func:`getCSVFile` knows that a CSVFile12 suits the
contents of our file best. We did not specify what type of file
wrapper we want.

Getting an importer now is easy:

    >>> myimporter = getMultiAdapter((myfile, myrecv), IImporter12)
    >>> myimporter

Clean up:

    >>> import os
    >>> os.unlink(path)
    >>> os.unlink(path2)
    >>> os.unlink(nocsvpath)
    >>> os.unlink(custompath)
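
The receivers in this document only print the generator object they
are handed. As a rough sketch of how a receiver might actually consume
the rows, the following standalone example mimics the
source/receiver/importer split with plain classes. Everything in it is
an assumption made for illustration: ``get_data``, ``ListReceiver`` and
``Importer`` are hypothetical names, no component lookup takes place,
CSV text is used instead of a file path, and the code targets Python 3
with the standard library only.

```python
import csv
import io


def get_data(csv_text):
    """Yield one dict per data row, keyed by header field name.

    Hypothetical stand-in for a wrapper's getData() generator.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield dict(row)


class ListReceiver(object):
    """Hypothetical receiver that stores every row it is handed."""

    def __init__(self):
        self.rows = []

    def receive(self, data):
        # ``data`` is a generator; iterating over it reads row by row.
        for row in data:
            self.rows.append(row)


class Importer(object):
    """Hypothetical bridge between a CSV source and a receiver."""

    def __init__(self, csv_text, receiver):
        self.csv_text = csv_text
        self.receiver = receiver

    def do_import(self):
        self.receiver.receive(get_data(self.csv_text))


receiver = ListReceiver()
importer = Importer(
    "col1,col2\ndataitem1a,dataitem2a\nitem3a,item4a\n", receiver)
importer.do_import()
for row in receiver.rows:
    print(row)
```

Because the rows arrive as a generator, a receiver of this kind never
needs the whole file in memory unless it chooses to keep the rows, as
``ListReceiver`` does here.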