source: waeup/branches/ulif-namespace/src/waeup/sirp/datacenter.txt @ 5103

Last change on this file since 5103 was 4920, checked in by uli, 15 years ago
Make unit tests run again with the new package layout.

WAeUP Data Center
*****************

The WAeUP data center takes care of managing CSV files and importing them.

:Test-Layer: unit

Creating a data center
======================

A data center can be created easily:

>>> from waeup.sirp.datacenter import DataCenter
>>> mydatacenter = DataCenter()
>>> mydatacenter
<waeup.sirp.datacenter.DataCenter object at 0x...>

Each data center has a location in the file system where files are stored:

>>> storagepath = mydatacenter.storage
>>> storagepath
'/.../waeup/sirp/files'


Managing the storage path
-------------------------

We can set another storage path:

>>> import os
>>> os.mkdir('newlocation')
>>> newpath = os.path.abspath('newlocation')
>>> mydatacenter.setStoragePath(newpath)
[]

The result here is a list of filenames that could not be
copied. Luckily, this list is empty.

When we set a new storage path, we can ask the data center to move all
files from the old location to the new one. To see this feature in
action, we first have to put a file into the old location:

>>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')

Now we can set a new location and the file will be moved:

>>> verynewpath = os.path.abspath('verynewlocation')
>>> os.mkdir(verynewpath)

>>> mydatacenter.setStoragePath(verynewpath, move=True)
[]

>>> storagepath = mydatacenter.storage
>>> 'myfile.txt' in os.listdir(verynewpath)
True

We remove the created file to have a clean testing environment for
upcoming examples:

>>> os.unlink(os.path.join(storagepath, 'myfile.txt'))
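
The move behaviour described above can be pictured with a small
standalone sketch. It is not the `DataCenter` implementation: the
helper name ``move_storage_files`` and its error handling are
assumptions made for illustration. It only mirrors the documented
contract of returning the names of files that could not be copied.

```python
import os
import shutil
import tempfile


def move_storage_files(old_path, new_path):
    """Move regular files from old_path to new_path.

    Hypothetical helper: returns the names that could not be moved.
    """
    not_copied = []
    for name in sorted(os.listdir(old_path)):
        source = os.path.join(old_path, name)
        if not os.path.isfile(source):
            continue  # this sketch ignores subdirectories
        try:
            shutil.copy2(source, os.path.join(new_path, name))
            os.remove(source)
        except (IOError, OSError):
            not_copied.append(name)
    return not_copied


# Usage: one file in the old location ends up in the new one.
old_dir = tempfile.mkdtemp()
new_dir = tempfile.mkdtemp()
with open(os.path.join(old_dir, 'myfile.txt'), 'w') as fd:
    fd.write('hello')

failed = move_storage_files(old_dir, new_dir)
moved_names = os.listdir(new_dir)
shutil.rmtree(old_dir)
shutil.rmtree(new_dir)
```

An empty ``failed`` list corresponds to the ``[]`` shown in the
examples above.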

Uploading files
===============

We can get a list of files stored in that location:

>>> mydatacenter.getFiles()
[]

Let's put a file into the storage:

>>> import os
>>> filepath = os.path.join(storagepath, 'data.csv')
>>> open(filepath, 'wb').write('Some Content\n')

Now we can find a file:

>>> mydatacenter.getFiles()
[<waeup.sirp.datacenter.DataCenterFile object at 0x...>]

As we can see, the actual file is wrapped by a convenience wrapper
that enables us to fetch some data about the file. The data returned
is formatted as strings, so that it can easily be put into output
pages:

>>> datafile = mydatacenter.getFiles()[0]
>>> datafile.getSize()
'13 bytes'

>>> datafile.getDate() # Nearly current datetime...
'...'
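
To illustrate the idea of a wrapper that returns display-ready
strings, here is a minimal sketch using only the standard library. The
class ``FileInfo`` and its method names are hypothetical, and the date
format is an assumption; the format used by `DataCenterFile` is not
shown in this document.

```python
import os
import tempfile
import time


class FileInfo(object):
    """Hypothetical stand-in for a file wrapper."""

    def __init__(self, path):
        self.path = path

    def get_size(self):
        # Report the size as a display string, e.g. '13 bytes'.
        return '%d bytes' % os.path.getsize(self.path)

    def get_date(self):
        # Format the modification time for direct use in a page.
        mtime = os.path.getmtime(self.path)
        return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))


# Usage: same content as the 'data.csv' example above.
handle, path = tempfile.mkstemp(suffix='.csv')
os.write(handle, b'Some Content\n')
os.close(handle)

info = FileInfo(path)
size_text = info.get_size()
date_text = info.get_date()
os.remove(path)
```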

Clean up:

>>> import shutil
>>> shutil.rmtree(newpath)
>>> shutil.rmtree(verynewpath)


Distributing processed files
============================

When files have been processed by a batch processor, we can put the
resulting files into their desired destinations.

We recreate the datacenter root in case it is missing:

>>> import os
>>> dc_root = mydatacenter.storage
>>> fin_dir = os.path.join(dc_root, 'finished')
>>> unfin_dir = os.path.join(dc_root, 'unfinished')

>>> def recreate_dc_storage():
...     if os.path.exists(dc_root):
...         shutil.rmtree(dc_root)
...     os.mkdir(dc_root)
...     mydatacenter.setStoragePath(mydatacenter.storage)
>>> recreate_dc_storage()

We define a function that creates a set of faked result files:

>>> import os
>>> import tempfile
>>> def create_fake_results(source_basename, create_pending=True):
...     tmp_dir = tempfile.mkdtemp()
...     src = os.path.join(dc_root, source_basename)
...     pending_src = None
...     if create_pending:
...         pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
...     finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
...     for path in (src, pending_src, finished_src):
...         if path is not None:
...             open(path, 'wb').write('blah')
...     return tmp_dir, src, finished_src, pending_src

Now we can create the set of result files that typically follows the
processing of a regular source, and try to distribute those files.
Let's start with a source file that was processed successfully:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...                                 pending_src)
>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The temporary dir we created will be removed for us by the datacenter.
This way we can be assured that fewer temporary dirs are left hanging
around:

>>> os.path.exists(tmp_dir)
False

The root dir contains no input files anymore: the original file and
the file containing all processed data were moved to 'finished/'.

Now we restart, but this time we fake an erroneous action:

>>> recreate_dc_storage()
>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)
>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'mysource.pending.csv', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
['mysource.csv']

While the original source was moved to the 'unfinished' dir, the
pending file went to the root and the set of already processed items
is stored in 'finished/'.

We fake processing the pending file and assume that everything went
well this time:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...                                 pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The result is the same as in the first case shown above.
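
The file movements observed so far can be summarized in a standalone
sketch. This is not the code behind `distProcessedFiles`: the function
``dist_processed_files``, its extra ``root`` parameter and the helper
``fake_results`` are hypothetical. The sketch also makes assumptions
the examples do not confirm: it overwrites an existing finished file
rather than merging it, it removes the directory that holds the
finished file, and it does not create the 'logs' directory.

```python
import os
import shutil
import tempfile

PENDING_SUFFIX = '.pending.csv'


def dist_processed_files(root, successful, source_path,
                         finished_file, pending_file):
    """Sort the files of one processing run into the storage below root."""
    fin_dir = os.path.join(root, 'finished')
    unfin_dir = os.path.join(root, 'unfinished')
    for path in (fin_dir, unfin_dir):
        if not os.path.isdir(path):
            os.makedirs(path)

    # 'mysource.csv' and 'mysource.pending.csv' share the stem 'mysource'.
    name = os.path.basename(source_path)
    was_pending = name.endswith(PENDING_SUFFIX)
    if was_pending:
        stem = name[:-len(PENDING_SUFFIX)]
    else:
        stem = os.path.splitext(name)[0]
    original = stem + '.csv'

    # Rows processed so far always go to finished/ (overwriting here).
    shutil.copyfile(
        finished_file, os.path.join(fin_dir, stem + '.finished.csv'))

    if was_pending:
        os.remove(source_path)  # the old pending file has been consumed
    if not successful:
        # Leftover rows wait in the root for the next attempt.
        shutil.copyfile(
            pending_file, os.path.join(root, stem + PENDING_SUFFIX))
    if not was_pending:
        # First run: file the original source according to the outcome.
        target_dir = fin_dir if successful else unfin_dir
        shutil.move(source_path, os.path.join(target_dir, original))
    elif successful:
        # A retry succeeded: the original source leaves unfinished/.
        shutil.move(os.path.join(unfin_dir, original),
                    os.path.join(fin_dir, original))

    # Assumption: all result files live in one temporary directory.
    shutil.rmtree(os.path.dirname(finished_file))


def fake_results(root, source_name, create_pending=True):
    """Create a source file in root and result files in a temp dir."""
    tmp_dir = tempfile.mkdtemp()
    paths = [os.path.join(root, source_name),
             os.path.join(tmp_dir, 'finished.csv'),
             None]
    if create_pending:
        paths[2] = os.path.join(tmp_dir, 'pending.csv')
    for path in paths:
        if path is not None:
            with open(path, 'w') as fd:
                fd.write('blah')
    return paths


# Usage: a failed run followed by a successful retry.
root = tempfile.mkdtemp()
src, finished, pending = fake_results(root, 'mysource.csv')
dist_processed_files(root, False, src, finished, pending)
after_failure = sorted(os.listdir(root))

src, finished, pending = fake_results(root, 'mysource.pending.csv', False)
dist_processed_files(root, True, src, finished, pending)
after_retry = sorted(os.listdir(root))
finished_names = sorted(os.listdir(os.path.join(root, 'finished')))
shutil.rmtree(root)
```

Deriving all target names from the stem of the source filename is what
lets a retry of 'mysource.pending.csv' find the original
'mysource.csv' in 'unfinished/' again.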

We restart again, but this time we fake several non-working imports in
a row.

We start with a faulty initial import:

>>> recreate_dc_storage()
>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)

We try to process the pending file, which fails again:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)

We try to process the new pending file:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'mysource.pending.csv', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
['mysource.csv']

Finally, we process the pending file and everything works:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...                                 pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The root dir contains no input files and only the files in the
'finished' subdirectory remain.

Clean up:

>>> shutil.rmtree(verynewpath)

Handling imports
================

Data centers can find objects ready for CSV imports and associate
appropriate importers with them.

Getting importers
-----------------

To do so, data centers search their parents for the nearest ancestor
that implements `ICSVDataReceivers` and collect all of its attributes
for which an importer is available.

We therefore have to set up a proper scenario first.

We start by creating a simple thing that is ready for receiving CSV
data:

>>> class MyCSVReceiver(object):
...     pass

Then we create a container for such a CSV receiver:

>>> import grok
>>> from waeup.sirp.interfaces import ICSVDataReceivers
>>> from waeup.sirp.datacenter import DataCenter
>>> class SomeContainer(grok.Container):
...     grok.implements(ICSVDataReceivers)
...     def __init__(self):
...         self.some_receiver = MyCSVReceiver()
...         self.other_receiver = MyCSVReceiver()
...         self.datacenter = DataCenter()

By implementing `ICSVDataReceivers`, a pure marker interface, we
indicate that we want instances of this class to be searched for CSV
receivers.

This root container has two CSV receivers.

The datacenter is also an attribute of our root container.

Before we can go into action, we also need an importer that is able
to import data into instances of MyCSVReceiver:

>>> from waeup.sirp.csvfile.interfaces import ICSVFile
>>> from waeup.sirp.interfaces import IWAeUPCSVImporter
>>> from waeup.sirp.utils.importexport import CSVImporter
>>> class MyCSVImporter(CSVImporter):
...     grok.adapts(ICSVFile, MyCSVReceiver)
...     grok.provides(IWAeUPCSVImporter)
...     datatype = u'My Stuff'
...     def doImport(self, filepath, clear_old_data=True,
...                  overwrite=True):
...         print "Data imported!"

We grok the components to get the importer (which is actually an
adapter) registered with the component architecture:

>>> grok.testing.grok('waeup')
>>> grok.testing.grok_component('MyCSVImporter', MyCSVImporter)
True

Now we can create an instance of `SomeContainer`:

>>> mycontainer = SomeContainer()

As we are not creating real sites and the objects are 'placeless' from
the ZODB point of view, we fake a location by telling the datacenter
that its parent is the container:

>>> mycontainer.datacenter.__parent__ = mycontainer
>>> datacenter = mycontainer.datacenter

When a datacenter is stored in the ZODB, this step will happen
automatically.

Before we can go on, we have to set a usable path where we can store
files without doing harm:

>>> os.mkdir('filestore')
>>> filestore = os.path.abspath('filestore')
>>> datacenter.setStoragePath(filestore)
[]

Furthermore, we must create a file for possible import, as we will
only get importers for which an importable file is also available:

>>> import os
>>> filepath = os.path.join(datacenter.storage, 'mydata.csv')
>>> open(filepath, 'wb').write("""col1,col2
... 'ATerm','Something'
... """)

The datacenter is now able to find the CSV receivers in its parents:

>>> datacenter.getImporters()
[<MyCSVImporter object at 0x...>, <MyCSVImporter object at 0x...>]
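
The lookup performed here can be sketched without grok or the
component architecture. Everything in the following block is a
stand-in: a class attribute replaces the marker interface, a plain
dictionary replaces the adapter registry, and all names are
hypothetical. It only shows the two steps described above: walk up the
``__parent__`` chain to the nearest marked ancestor, then collect an
importer for every attribute that has one.

```python
class Receiver(object):
    """Something that could receive CSV data."""


class Root(object):
    """Plays the role of the container marked with ICSVDataReceivers."""
    is_receiver_container = True  # stand-in for the marker interface

    def __init__(self):
        self.some_receiver = Receiver()
        self.other_receiver = Receiver()
        self.title = 'not a receiver'


class Center(object):
    __parent__ = None


# Stand-in for the adapter registry: receiver type -> importer factory.
IMPORTERS = {Receiver: lambda receiver: ('importer for', receiver)}


def find_receiver_container(obj):
    """Walk up the __parent__ chain to the nearest marked ancestor."""
    while obj is not None:
        if getattr(obj, 'is_receiver_container', False):
            return obj
        obj = getattr(obj, '__parent__', None)
    return None


def get_importers(center):
    """Collect one importer per attribute that has a registered one."""
    container = find_receiver_container(center)
    if container is None:
        return []
    importers = []
    for name in sorted(vars(container)):
        value = getattr(container, name)
        factory = IMPORTERS.get(type(value))
        if factory is not None:  # attributes without importer are skipped
            importers.append(factory(value))
    return importers


# Usage: the center is 'placed' by setting its parent by hand.
root = Root()
center = Center()
center.__parent__ = root
importers = get_importers(center)
```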


Imports with the WAeUP portal
-----------------------------

The examples above look complicated, but this is the price of
modularity. If you create a new container type, you can define an
importer and it will be used automatically by other components.

In the WAeUP portal the only component that actually provides CSV data
importables is the `University` object.


Getting imports (not: importers)
--------------------------------

We can get 'imports':

>>> datacenter.getPossibleImports()
[(<...DataCenterFile object at 0x...>,
  [(<MyCSVImporter object at 0x...>, '...'),
   (<MyCSVImporter object at 0x...>, '...')])]

As we can see, an import is defined here as a tuple of a
DataCenterFile and a list of available importers, each with an
associated data receiver (the thing where the data should go to).

The data receiver is given as a ZODB object id (if the data receiver
is persistent) or a simple id (if it is not).

Clean up:

>>> import shutil
>>> shutil.rmtree(filestore)


Data center helpers
===================

Data centers provide several helper methods to make their usage more
convenient.


Receivers and receiver ids
--------------------------

As already mentioned above, imports are defined as triples containing

* a file to import,

* an importer to do the import and

* an object, which should be updated by the data file.

The latter normally is some kind of container, like a faculty
container or similar. This is what we call a ``receiver`` as it
receives the data from the file via the importer.

The datacenter finds receivers by searching its parents for a
component that implements `ICSVDataReceivers` and scanning that
component for attributes that can be adapted to `ICSVImporter`.

I.e., once an `ICSVDataReceivers` parent has been found, the
datacenter gets all importers that can be applied to attributes of
this component. For each attribute there can be at most one importer.

When building the importer list for a certain file, we also check
that the headers of the file comply with what the respective importers
expect. So, if a file contains broken headers, the file won't be
offered for import at all.
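
A header check of this kind might look like the following sketch. The
function ``headers_match`` is hypothetical, and the rule it applies
(every required column name must appear in the first row) is an
assumption; the exact compliance rule used by the importers is not
spelled out in this document.

```python
import csv


def headers_match(csv_text, required_columns):
    """Return True if the first CSV row names every required column."""
    reader = csv.reader(csv_text.splitlines())
    try:
        header = next(reader)
    except StopIteration:
        return False  # empty file: nothing to import
    return set(required_columns).issubset(header)


# Usage: the sample file from above passes, a broken header does not.
good = headers_match("col1,col2\n'ATerm','Something'\n", ['col1', 'col2'])
broken = headers_match("colX,col2\n'ATerm','Something'\n", ['col1', 'col2'])
empty = headers_match("", ['col1'])
```

With such a predicate, a file is only offered for import together with
those importers whose check succeeds.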

The contexts of the importers found then make up our list of available
receivers. This also means that for each receiver provided by the
datacenter, there is also an importer available.

If no importer can be found for a potential receiver, this receiver
will be skipped.

As one type of importer might be able to serve several receivers, we
also have to provide a unique id for each receiver. This is where
``receiver ids`` come into play.

Receiver ids of objects are determined as

* the ZODB oid of the object if the object is persistent

* the result of id(obj) otherwise.

The value obtained this way is a long integer, which we turn into a
string. If the value was taken from the ZODB oid, we also prefix it
with a ``z`` to avoid any clash with non-ZODB objects (they might
deliver the same id, although this is very unlikely).
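
The rule above can be sketched as follows. The function name
``get_receiver_id`` and the class ``FakePersistent`` are hypothetical,
and how the datacenter reads the oid is an assumption: the sketch
relies only on persistent objects carrying their oid as an 8-byte
string in ``_p_oid`` and decodes it as a big-endian integer.

```python
import struct


def get_receiver_id(obj):
    """Return a string id, prefixed with 'z' for persistent objects."""
    oid = getattr(obj, '_p_oid', None)
    if oid is not None:
        # ZODB oids are 8-byte strings; read them as a big-endian integer.
        return 'z%s' % struct.unpack('>Q', oid)[0]
    return str(id(obj))


class FakePersistent(object):
    """Stand-in for an object that was stored in a ZODB."""
    _p_oid = b'\x00\x00\x00\x00\x00\x00\x00\x2a'


class Plain(object):
    pass


plain = Plain()
persistent_id = get_receiver_id(FakePersistent())
plain_id = get_receiver_id(plain)
```

The ``z`` prefix keeps both id spaces apart even if an oid and a
memory address happen to have the same numeric value.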
