
source: waeup/branches/ulif-namespace/src/sirp/datacenter.txt @ 4915

Last change on this file since 4915 was 4907, checked in by uli, 15 years ago
Update tests.
File size: 13.6 KB

WAeUP Data Center
*****************

The WAeUP data center takes care of managing CSV files and importing
them.

:Test-Layer: unit

Creating a data center
======================

A data center can be created easily:

>>> from waeup.datacenter import DataCenter
>>> mydatacenter = DataCenter()
>>> mydatacenter
<waeup.datacenter.DataCenter object at 0x...>

Each data center has a location in the file system where files are stored:

>>> storagepath = mydatacenter.storage
>>> storagepath
'/.../src/waeup/files'


Managing the storage path
-------------------------

We can set another storage path:

>>> import os
>>> os.mkdir('newlocation')
>>> newpath = os.path.abspath('newlocation')
>>> mydatacenter.setStoragePath(newpath)
[]

The result is a list of filenames that could not be copied. Luckily,
this list is empty.

When we set a new storage path, we can request that all files in the
old location be moved to the new one. To see this feature in action,
we first have to put a file into the old location:

>>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')

Now we can set a new location and the file will be moved there:

>>> verynewpath = os.path.abspath('verynewlocation')
>>> os.mkdir(verynewpath)

>>> mydatacenter.setStoragePath(verynewpath, move=True)
[]

>>> storagepath = mydatacenter.storage
>>> 'myfile.txt' in os.listdir(verynewpath)
True
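The move behaviour described above can be modelled roughly as follows. This is a minimal sketch, not the actual `setStoragePath` implementation: the function name `set_storage_path` and its signature are hypothetical, and it assumes that "moving" means copying each plain file and then deleting the original, with failures collected instead of raised.

```python
import os
import shutil


def set_storage_path(old_path, new_path, move=False):
    """Hypothetical stand-in for DataCenter.setStoragePath().

    If `move` is true, every plain file in `old_path` is transferred to
    `new_path`. Returns the names of files that could not be
    transferred (an empty list when everything worked).
    """
    not_copied = []
    if move and os.path.isdir(old_path):
        for name in sorted(os.listdir(old_path)):
            src = os.path.join(old_path, name)
            if not os.path.isfile(src):
                continue  # only plain files are handled in this sketch
            try:
                # copy first, then delete: a failed copy keeps the original
                shutil.copy2(src, os.path.join(new_path, name))
                os.unlink(src)
            except (IOError, OSError):
                not_copied.append(name)
    return not_copied
```

Returning the failed names rather than raising lets a caller report every file that stayed behind; an empty list, as in the doctest, means everything was transferred.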

We remove the created file to have a clean testing environment for
upcoming examples:

>>> os.unlink(os.path.join(storagepath, 'myfile.txt'))

Uploading files
===============

We can get a list of the files stored in that location:

>>> mydatacenter.getFiles()
[]

Let's put a file into the storage:

>>> import os
>>> filepath = os.path.join(storagepath, 'data.csv')
>>> open(filepath, 'wb').write('Some Content\n')

Now we can find the file:

>>> mydatacenter.getFiles()
[<waeup.datacenter.DataCenterFile object at 0x...>]

As we can see, the actual file is wrapped in a convenience wrapper
that lets us fetch some data about the file. The data is returned as
formatted strings, so that it can easily be put into output pages:

>>> datafile = mydatacenter.getFiles()[0]
>>> datafile.getSize()
'13 bytes'

>>> datafile.getDate() # Nearly current datetime...
'...'
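A wrapper like `DataCenterFile` can be sketched in a few lines. The class name `DataCenterFileSketch` is hypothetical, and the date format it uses is an assumption; only the `'13 bytes'` style of `getSize()` is taken from the doctest above.

```python
import os
import time


class DataCenterFileSketch(object):
    """Hypothetical wrapper modelled on DataCenterFile.

    Every getter returns a preformatted string, ready to be put into
    an output page.
    """

    def __init__(self, path):
        self.path = path

    def getSize(self):
        # e.g. '13 bytes'; the real class may switch to larger units
        return '%d bytes' % os.path.getsize(self.path)

    def getDate(self):
        # modification time as 'YYYY-MM-DD HH:MM:SS' (assumed format)
        mtime = os.path.getmtime(self.path)
        return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
```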

Clean up:

>>> import shutil
>>> shutil.rmtree(newpath)
>>> shutil.rmtree(verynewpath)


Distributing processed files
============================

After a batch processor has processed files, we can move the resulting
files to their desired destinations.

We (re)create an empty datacenter root, as it might be missing after
the clean-up above:

>>> import os
>>> dc_root = mydatacenter.storage
>>> fin_dir = os.path.join(dc_root, 'finished')
>>> unfin_dir = os.path.join(dc_root, 'unfinished')

>>> def recreate_dc_storage():
...     if os.path.exists(dc_root):
...         shutil.rmtree(dc_root)
...     os.mkdir(dc_root)
...     mydatacenter.setStoragePath(mydatacenter.storage)
>>> recreate_dc_storage()

We define a function that creates a set of faked result files:

>>> import os
>>> import tempfile
>>> def create_fake_results(source_basename, create_pending=True):
...     tmp_dir = tempfile.mkdtemp()
...     src = os.path.join(dc_root, source_basename)
...     pending_src = None
...     if create_pending:
...         pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
...     finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
...     for path in (src, pending_src, finished_src):
...         if path is not None:
...             open(path, 'wb').write('blah')
...     return tmp_dir, src, finished_src, pending_src

Now we can create the set of result files that typically results from
successfully processing a regular source, and try to distribute those
files. Let's start with a source file that was processed successfully:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...                                 pending_src)
>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The datacenter removes the temporary dir for us. This way we can be
sure that fewer temporary dirs are left hanging around:

>>> os.path.exists(tmp_dir)
False

The root dir contains no input files any more, while the original file
and the file containing all processed data were moved to 'finished/'.

Now we restart, but this time we fake an erroneous action:

>>> recreate_dc_storage()
>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)
>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'mysource.pending.csv', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
['mysource.csv']

While the original source was moved to the 'unfinished' dir, the
pending file went to the root and the already processed items were
stored in 'finished/'.

We fake processing the pending file and assume that everything went
well this time:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...                                 pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The result is the same as in the first case shown above.

We restart again, but this time we fake several failed imports in a
row.

We start with a failing initial import:

>>> recreate_dc_storage()
>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)

We try to process the pending file, which fails again:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)

We try to process the new pending file, which fails once more:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...                                 pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'mysource.pending.csv', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
['mysource.csv']

Finally, we process the pending file and everything works:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...                                 pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The root dir contains no input files, and only the files in the
'finished' subdirectory remain.
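The walk-through above pins down where each file ends up. As a rough model of that behaviour (not the actual `distProcessedFiles` code), the following sketch reproduces the observed directory layouts with plain `os` and `shutil` calls. The function name `dist_processed_files`, its explicit `root` parameter and the choice to append finished rows to an existing `*.finished.csv` file are assumptions, and the `logs` dir is left out.

```python
import os
import shutil


def dist_processed_files(successful, source_path, finished_file,
                         pending_file, root):
    """Hypothetical model of the file distribution shown above.

    `source_path` lives in `root`; `finished_file` and the optional
    `pending_file` live in one temporary dir that is removed at the end.
    """
    fin_dir = os.path.join(root, 'finished')
    unfin_dir = os.path.join(root, 'unfinished')
    for path in (fin_dir, unfin_dir):
        if not os.path.isdir(path):
            os.mkdir(path)

    # 'mysource.pending.csv' and 'mysource.csv' share the base 'mysource'.
    base, ext = os.path.splitext(os.path.basename(source_path))
    if base.endswith('.pending'):
        base = base[:-len('.pending')]
    orig_name = base + ext

    # Collect processed rows in finished/<base>.finished<ext>.
    # Appending to an existing file is an assumption of this sketch.
    target = os.path.join(fin_dir, base + '.finished' + ext)
    with open(finished_file, 'rb') as src, open(target, 'ab') as out:
        out.write(src.read())

    if os.path.basename(source_path) != orig_name:
        # A pending file was processed: it is superseded either way.
        os.unlink(source_path)
        if successful:
            # The original source waited in unfinished/ until now.
            shutil.move(os.path.join(unfin_dir, orig_name),
                        os.path.join(fin_dir, orig_name))
    else:
        target_dir = fin_dir if successful else unfin_dir
        shutil.move(source_path, os.path.join(target_dir, orig_name))

    if not successful and pending_file is not None:
        # Rows that still need work go back to the root dir.
        shutil.move(pending_file,
                    os.path.join(root, base + '.pending' + ext))

    # Remove the temporary dir holding the result files.
    shutil.rmtree(os.path.dirname(finished_file))
```

Deriving the base name from the source file is what lets a chain of failed runs keep reusing the single `mysource.pending.csv` slot in the root dir, as in the walk-through.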

Clean up:

>>> shutil.rmtree(verynewpath)

Handling imports
================

Data centers can find objects ready for CSV imports and associate
appropriate importers with them.

Getting importers
-----------------

To do so, a data center searches its parents for the nearest ancestor
that implements `ICSVDataReceivers` and collects all attributes of that
ancestor for which some importer is available.

We therefore have to set up a proper scenario first.

We start by creating a simple thing that is ready for receiving CSV
data:

>>> class MyCSVReceiver(object):
...     pass

Then we create a container for such a CSV receiver:

>>> import grok
>>> from waeup.interfaces import ICSVDataReceivers
>>> from waeup.datacenter import DataCenter
>>> class SomeContainer(grok.Container):
...     grok.implements(ICSVDataReceivers)
...     def __init__(self):
...         self.some_receiver = MyCSVReceiver()
...         self.other_receiver = MyCSVReceiver()
...         self.datacenter = DataCenter()

By implementing `ICSVDataReceivers`, a pure marker interface, we
indicate that we want instances of this class to be searched for CSV
receivers.

This root container has two CSV receivers.

The datacenter is also an attribute of our root container.

Before we can go into action, we also need an importer that is able
to import data into instances of `MyCSVReceiver`:

>>> from waeup.csvfile.interfaces import ICSVFile
>>> from waeup.interfaces import IWAeUPCSVImporter
>>> from waeup.utils.importexport import CSVImporter
>>> class MyCSVImporter(CSVImporter):
...     grok.adapts(ICSVFile, MyCSVReceiver)
...     grok.provides(IWAeUPCSVImporter)
...     datatype = u'My Stuff'
...     def doImport(self, filepath, clear_old_data=True,
...                  overwrite=True):
...         print "Data imported!"

We grok the components to get the importer (which is actually an
adapter) registered with the component architecture:

>>> grok.testing.grok('waeup')
>>> grok.testing.grok_component('MyCSVImporter', MyCSVImporter)
True

Now we can create an instance of `SomeContainer`:

>>> mycontainer = SomeContainer()

As we are not creating real sites and the objects are 'placeless' from
the ZODB point of view, we fake a location by telling the datacenter
that its parent is the container:

>>> mycontainer.datacenter.__parent__ = mycontainer
>>> datacenter = mycontainer.datacenter

When a datacenter is stored in the ZODB, this step will happen
automatically.
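The parent lookup relies on the `__parent__` chain, which is why the doctest assigns `__parent__` by hand. A minimal sketch of such a walk follows; here a plain base class, `CSVDataReceiversMarker` (hypothetical), stands in for the `ICSVDataReceivers` marker interface, and `isinstance()` stands in for a check such as `ICSVDataReceivers.providedBy(obj)` from `zope.interface`. The function name `find_receivers_container` is also hypothetical.

```python
class CSVDataReceiversMarker(object):
    """Hypothetical stand-in for the ICSVDataReceivers marker interface."""


def find_receivers_container(obj):
    """Return the nearest ancestor of `obj` (obj itself excluded) that
    carries the marker, following `__parent__` links; None if none does.
    """
    current = getattr(obj, '__parent__', None)
    while current is not None:
        if isinstance(current, CSVDataReceiversMarker):
            return current
        current = getattr(current, '__parent__', None)
    return None
```

An object without any `__parent__` simply yields `None`, which is the situation a placeless datacenter would be in without the manual assignment above.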

Before we can go on, we have to set a usable path where we can store
files without doing harm:

>>> os.mkdir('filestore')
>>> filestore = os.path.abspath('filestore')
>>> datacenter.setStoragePath(filestore)
[]

Furthermore, we must create a file that can be imported, as we only
get importers for which an importable file is available:

>>> import os
>>> filepath = os.path.join(datacenter.storage, 'mydata.csv')
>>> open(filepath, 'wb').write("""col1,col2
... 'ATerm','Something'
... """)

The datacenter is now able to find the CSV receivers in its parents:

>>> datacenter.getImporters()
[<MyCSVImporter object at 0x...>, <MyCSVImporter object at 0x...>]


Imports with the WAeUP portal
-----------------------------

The examples above look complicated, but this is the price for
modularity. If you create a new container type, you can define an
importer for it and it will be used automatically by other components.

In the WAeUP portal, the only component that actually provides
receivers for CSV data imports is the `University` object.


Getting imports (not: importers)
--------------------------------

We can get 'imports':

>>> datacenter.getPossibleImports()
[(<...DataCenterFile object at 0x...>,
  [(<MyCSVImporter object at 0x...>, '...'),
   (<MyCSVImporter object at 0x...>, '...')])]

As we can see, an import is defined here as a tuple of a
DataCenterFile and a list of available importers, each paired with its
associated data receiver (the object the data should go to).
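The shape of these 'imports' can be illustrated with a small sketch. The function `get_possible_imports` and its `importers_for_file` callback are hypothetical; the sketch only shows how each file is paired with its list of `(importer, receiver id)` tuples and that a file without any usable importer is not offered at all.

```python
def get_possible_imports(files, importers_for_file):
    """Hypothetical sketch of the structure getPossibleImports() returns.

    `files` is a list of file wrappers; `importers_for_file(f)` is a
    callable yielding (importer, receiver_id) pairs usable for `f`.
    """
    result = []
    for datafile in files:
        pairs = list(importers_for_file(datafile))
        if pairs:  # files nobody can import are left out
            result.append((datafile, pairs))
    return result
```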

The data receiver is given as a ZODB object id (if the data receiver
is persistent) or a simple id (if it is not).

Clean up:

>>> import shutil
>>> shutil.rmtree(filestore)


Data center helpers
===================

Data centers provide several helper methods to make their usage more
convenient.


Receivers and receiver ids
--------------------------

As already mentioned above, imports are defined as triples containing

* a file to import,

* an importer to do the import and

* an object that should be updated with the data from the file.

The latter is normally some kind of container, such as a faculty
container. This is what we call a ``receiver``, as it receives the
data from the file via the importer.

The datacenter finds receivers by searching its parents for a
component that implements `ICSVDataReceivers` and scanning that
component for attributes that can be adapted to `ICSVImporter`.

That is, once it has found an `ICSVDataReceivers` parent, the
datacenter gets all importers that can be applied to attributes of
this component. For each attribute there can be at most one importer.
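Scanning a receivers container could look roughly like this. All names are hypothetical, and a plain dict keyed by receiver type replaces the adapter lookup that the real code performs through the component architecture. Because the dict maps each type to exactly one factory, every attribute gets at most one importer, attributes without a registered importer are skipped, and the importers' contexts form the list of receivers.

```python
class ReceiverSketch(object):
    """Hypothetical receiver type."""


class ImporterSketch(object):
    """Hypothetical importer; `context` is the receiver it writes to."""

    def __init__(self, context):
        self.context = context


# Hypothetical registry: receiver type -> importer factory. In the real
# code, the component architecture's adapter lookup plays this role.
IMPORTER_REGISTRY = {ReceiverSketch: ImporterSketch}


def get_importers(container):
    """Return at most one importer per attribute of `container`,
    ordered by attribute name; attributes without one are skipped."""
    importers = []
    for name, value in sorted(vars(container).items()):
        factory = IMPORTER_REGISTRY.get(type(value))
        if factory is not None:
            importers.append(factory(value))
    return importers


def get_receivers(container):
    # The contexts of the found importers are the available receivers.
    return [importer.context for importer in get_importers(container)]
```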

When building the importer list for a certain file, we also check
that the headers of the file comply with what the respective importers
expect. So, if a file contains broken headers, the file won't be
offered for import at all.
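A header check of this kind might look as follows. The function `headers_comply` is hypothetical, and the rule it applies (every expected header must appear in the file's first row, extra columns are tolerated) is an assumption; the real importers may be stricter.

```python
import csv
import io


def headers_comply(csv_text, expected_headers):
    """Return True if the first row of `csv_text` contains every header
    in `expected_headers` (the rule assumed by this sketch)."""
    reader = csv.reader(io.StringIO(csv_text))
    try:
        headers = next(reader)
    except StopIteration:  # empty file: nothing to import
        return False
    found = set(h.strip() for h in headers)
    return set(expected_headers) <= found
```

A file failing this check would simply produce no `(importer, receiver id)` pairs and therefore never show up among the possible imports.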

The contexts of the found importers then form our list of available
receivers. This also means that for each receiver provided by the
datacenter, there is an importer available.

If no importer can be found for a potential receiver, this receiver
will be skipped.

As one type of importer might be able to serve several receivers, we
also have to provide a unique id for each receiver. This is where
``receiver ids`` come into play.

Receiver ids of objects are determined as

* the ZODB oid of the object if the object is persistent,

* the result of ``id(obj)`` otherwise.

The value obtained this way is a long integer, which we turn into a
string. If the value was taken from the ZODB oid, we also prefix it
with a ``z`` to avoid any clash with ids of non-ZODB objects (they
might yield the same number, although this is very unlikely).
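The receiver-id rule can be written down in a few lines. This is a sketch under stated assumptions: the function name `receiver_id` is hypothetical, persistence is detected through the `_p_oid` attribute that persistent objects carry (it is `None` until the object has been stored, so such objects fall back to `id()` here), and the 8-byte oid is unpacked to an integer the same way `ZODB.utils.u64` does.

```python
import struct


def receiver_id(obj):
    """Hypothetical sketch of the receiver-id rule described above."""
    oid = getattr(obj, '_p_oid', None)
    if oid is not None:
        # A ZODB oid is an 8-byte big-endian string; unpacking it like
        # this mirrors ZODB.utils.u64. The 'z' marks ZODB-derived ids.
        return 'z%d' % struct.unpack('>Q', oid)[0]
    # Non-persistent (or not yet stored) objects: use the builtin id().
    return '%d' % id(obj)
```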
