source: main/waeup.kofa/trunk/src/waeup/kofa/doctests/datacenter.txt @ 16262

Last change on this file since 16262 was 15416, checked in by Henrik Bettermann, 6 years ago

Backup deleted graduated student data somewhere else to ease graduated student data migration.

Data Center
***********

The Kofa data center cares for managing CSV files and importing them.

.. :doctest:
.. :layer: waeup.kofa.testing.KofaUnitTestLayer

Creating a data center
======================

A data center can be created easily:

    >>> from waeup.kofa.datacenter import DataCenter
    >>> mydatacenter = DataCenter()
    >>> mydatacenter
    <waeup.kofa.datacenter.DataCenter object at 0x...>

Each data center has a location in the file system where files are
stored:

    >>> storagepath = mydatacenter.storage
    >>> storagepath
    '/tmp/tmp...'

Among other things, it provides two locations to put data of deleted
items into:

    >>> import os
    >>> del_path = mydatacenter.deleted_path
    >>> os.path.isdir(del_path)
    True
    >>> grad_path = mydatacenter.graduated_path
    >>> os.path.isdir(grad_path)
    True

Overall it complies with the `IDataCenter` interface:

    >>> from zope.interface import verify
    >>> from waeup.kofa.interfaces import IDataCenter
    >>> verify.verifyObject(IDataCenter, DataCenter())
    True

    >>> verify.verifyClass(IDataCenter, DataCenter)
    True

Managing the storage path
=========================

We can set another storage path:

    >>> import os
    >>> os.mkdir('newlocation')
    >>> newpath = os.path.abspath('newlocation')
    >>> mydatacenter.setStoragePath(newpath)
    []

The result is a list of the filenames that could not be copied.
Luckily, this list is empty.
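
The copy-with-failure-report behaviour seen above can be modeled with
a small helper. The sketch below is illustrative only: the function
name ``copy_files`` is made up and is not part of the Kofa API; it
merely shows how a list of failed basenames (empty on full success)
can be collected.

```python
import os
import shutil

def copy_files(src_dir, dst_dir):
    # Hypothetical helper, not part of waeup.kofa: copy every regular
    # file from src_dir to dst_dir and collect the basenames of files
    # that could not be copied. An empty result means full success.
    failed = []
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue
        try:
            shutil.copy2(path, os.path.join(dst_dir, name))
        except (IOError, OSError):
            failed.append(name)
    return failed
```

As with ``setStoragePath()``, callers can simply check the returned
list for emptiness instead of catching exceptions per file.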

When we set a new storage path, we can request that all files in the
old location be moved to the new one. To see this feature in action,
we first have to put a file into the old location:

    >>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')

Now we can set a new location and the file will be moved:

    >>> verynewpath = os.path.abspath('verynewlocation')
    >>> os.mkdir(verynewpath)

    >>> mydatacenter.setStoragePath(verynewpath, move=True)
    []

    >>> storagepath = mydatacenter.storage
    >>> 'myfile.txt' in os.listdir(verynewpath)
    True

We remove the created file to have a clean testing environment for
the upcoming examples:

    >>> os.unlink(os.path.join(storagepath, 'myfile.txt'))
Uploading files
===============

We can get a list of files stored in that location:

    >>> mydatacenter.getPendingFiles()
    []

Let's put some file into the storage:

    >>> import os
    >>> filepath = os.path.join(storagepath, 'data.csv')
    >>> open(filepath, 'wb').write('Some Content\n')

Now we can find that file:

    >>> mydatacenter.getPendingFiles()
    [<waeup.kofa.datacenter.DataCenterFile object at 0x...>]

As we can see, the actual file is wrapped by a convenience wrapper
that enables us to fetch some data about the file. The data is
returned as strings, so that it can easily be put into output pages:

    >>> datafile = mydatacenter.getPendingFiles()[0]
    >>> datafile.getSize()
    '13 bytes'

    >>> datafile.getDate() # Nearly current datetime...
    '...'
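
A display string like ``'13 bytes'`` could be produced by a formatter
along the following lines. This is a hypothetical re-implementation
for illustration; the actual ``getSize()`` code may use different
thresholds and rounding:

```python
def format_size(num_bytes):
    # Hypothetical sketch of a byte-count formatter, not the real
    # DataCenterFile.getSize(): small counts are reported verbatim,
    # larger ones scaled to the next unit with two decimals.
    if num_bytes < 1024:
        return '%s bytes' % num_bytes
    size = float(num_bytes)
    for unit in ('kB', 'MB', 'GB'):
        size /= 1024.0
        if size < 1024:
            return '%.2f %s' % (size, unit)
    return '%.2f TB' % (size / 1024.0)
```

Returning preformatted strings keeps the page templates free of any
number-formatting logic.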

Clean up:

    >>> import shutil
    >>> shutil.rmtree(newpath)
    >>> shutil.rmtree(verynewpath)


Distributing processed files
============================

When files have been processed by a batch processor, we can put the
resulting files into the desired destinations.

We recreate the datacenter root in case it is missing:

    >>> import os
    >>> dc_root = mydatacenter.storage
    >>> fin_dir = os.path.join(dc_root, 'finished')
    >>> unfin_dir = os.path.join(dc_root, 'unfinished')

    >>> def recreate_dc_storage():
    ...   if os.path.exists(dc_root):
    ...     shutil.rmtree(dc_root)
    ...   os.mkdir(dc_root)
    ...   mydatacenter.setStoragePath(mydatacenter.storage)
    >>> recreate_dc_storage()

We define a function that creates a set of faked result files:

    >>> import os
    >>> import tempfile
    >>> def create_fake_results(source_basename, create_pending=True):
    ...   tmp_dir = tempfile.mkdtemp()
    ...   src = os.path.join(dc_root, source_basename)
    ...   pending_src = None
    ...   if create_pending:
    ...     pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
    ...   finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
    ...   for path in (src, pending_src, finished_src):
    ...     if path is not None:
    ...       open(path, 'wb').write('blah')
    ...   return tmp_dir, src, finished_src, pending_src

Now we can create the set of result files that typically remains
after processing a regular source, and try to distribute them. Let's
start with a source file that was processed successfully:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                            pending_src, mode='create')
    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The created temporary dir is removed for us by the datacenter. This
way we can be assured that fewer temporary dirs are left hanging
around:

    >>> os.path.exists(tmp_dir)
    False

The root dir is empty, while the original file and the file containing
all processed data were moved to 'finished/'.

Now we restart, but this time we fake an erroneous action:

    >>> recreate_dc_storage()
    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')
    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'mysource.create.pending.csv', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv']

    >>> sorted(os.listdir(unfin_dir))
    ['mysource.csv']

While the original source was moved to the 'unfinished' dir, the
pending file went to the root, and the set of already processed items
is stored in 'finished/'.
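
The placement rules observed in the listings can be summarized in a
simplified model. The sketch below is an assumption: it is not the
real ``distProcessedFiles()`` and deliberately ignores the renaming of
result files and the removal of the temporary source dir.

```python
import os
import shutil

def distribute(successful, source, finished_file, pending_file, root):
    # Simplified model of the distribution logic described above
    # (illustrative only, not the waeup.kofa implementation).
    fin_dir = os.path.join(root, 'finished')
    unfin_dir = os.path.join(root, 'unfinished')
    for d in (fin_dir, unfin_dir):
        if not os.path.isdir(d):
            os.makedirs(d)
    # The file with the already processed rows always goes to finished/.
    shutil.move(finished_file,
                os.path.join(fin_dir, os.path.basename(finished_file)))
    if successful:
        # On success the source joins the finished data.
        shutil.move(source, os.path.join(fin_dir, os.path.basename(source)))
    else:
        # On failure the source is parked in unfinished/ and the
        # pending rows return to the root for reprocessing.
        shutil.move(source, os.path.join(unfin_dir, os.path.basename(source)))
        if pending_file is not None:
            shutil.move(pending_file,
                        os.path.join(root, os.path.basename(pending_file)))
```

Keeping the pending file in the root means it shows up again as an
ordinary input file, ready for the next import attempt.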

We fake processing the pending file and assume that everything went
well this time:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The result is the same as in the first case shown above.
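
The filenames in the listings suggest a stable naming scheme for the
result files. The helper below is a guess reconstructed from the
listings, not taken from the Kofa sources: result names are derived
from the stem of the source basename plus the import mode.

```python
def result_names(source_basename, mode):
    # Guess at the naming scheme visible in the directory listings
    # above (hypothetical, not Kofa code): take the part before the
    # first dot and append the mode and a status suffix.
    stem = source_basename.split('.')[0]
    return ('%s.%s.pending.csv' % (stem, mode),
            '%s.%s.finished.csv' % (stem, mode))
```

Because the stem of 'mysource.create.pending.csv' is again 'mysource',
the pending filename stays stable across repeated failed imports,
which matches the listings in the next example.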

We restart again, but this time we fake several non-working imports
in a row.

We start with a faulty initial import:

    >>> recreate_dc_storage()
    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

We try to process the pending file, which fails again:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

We try to process the new pending file, which fails once more:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'mysource.create.pending.csv', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv']

    >>> sorted(os.listdir(unfin_dir))
    ['mysource.csv']

Finally, we process the pending file and everything works:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The root dir contains no input files anymore; only the files in the
finished subdirectory remain.


We can get a list of imported files stored in the finished subfolder:

    >>> mydatacenter.getFinishedFiles()
    [<waeup.kofa.datacenter.DataCenterFile object at ...>]

    >>> datafile = mydatacenter.getFinishedFiles()[0]
    >>> datafile.getSize()
    '2 bytes'

    >>> datafile.getDate() # Nearly current datetime...
    '...'


Clean up:

    >>> shutil.rmtree(verynewpath)