source: main/waeup.kofa/trunk/src/waeup/kofa/doctests/datacenter.txt @ 15702

Last change on this file since 15702 was 15416, checked in by Henrik Bettermann, 6 years ago

Backup deleted graduated student data somewhere else to ease graduated student data migration.

File size: 8.8 KB
Line 
1Data Center
2***********
3
4The Kofa data center cares for managing CSV files and importing then.
5
6.. :doctest:
7.. :layer: waeup.kofa.testing.KofaUnitTestLayer
8
9Creating a data center
10======================
11
12A data center can be created easily:
13
14    >>> from waeup.kofa.datacenter import DataCenter
15    >>> mydatacenter = DataCenter()
16    >>> mydatacenter
17    <waeup.kofa.datacenter.DataCenter object at 0x...>
18
19Each data center has a location in file system where files are stored:
20
21    >>> storagepath = mydatacenter.storage
22    >>> storagepath
23    '/tmp/tmp...'
24
25Beside other things it provides two locations to put data of deleted
26items into:
27
28    >>> import os
29    >>> del_path = mydatacenter.deleted_path
30    >>> os.path.isdir(del_path)
31    True
32    >>> grad_path = mydatacenter.graduated_path
33    >>> os.path.isdir(grad_path)
34    True
35
36Overall it complies with the `IDataCenter` interface:
37
38    >>> from zope.interface import verify
39    >>> from waeup.kofa.interfaces import IDataCenter
40    >>> verify.verifyObject(IDataCenter, DataCenter() )
41    True
42
43    >>> verify.verifyClass(IDataCenter, DataCenter)
44    True
45
46Managing the storage path
47=========================
48
49We can set another storage path:
50
51    >>> import os
52    >>> os.mkdir('newlocation')
53    >>> newpath = os.path.abspath('newlocation')
54    >>> mydatacenter.setStoragePath(newpath)
55    []
56
57The result here is a list of filenames, that could not be
58copied. Luckily, this list is empty.
59
60When we set a new storage path, we can tell to move all files in the
61old location to the new one. To see this feature in action, we first
62have to put a file into the old location:
63
64    >>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')
65
66Now we can set a new location and the file will be copied:
67
68    >>> verynewpath = os.path.abspath('verynewlocation')
69    >>> os.mkdir(verynewpath)
70
71    >>> mydatacenter.setStoragePath(verynewpath, move=True)
72    []
73
74    >>> storagepath = mydatacenter.storage
75    >>> 'myfile.txt' in os.listdir(verynewpath)
76    True
77
78We remove the created file to have a clean testing environment for
79upcoming examples:
80
81    >>> os.unlink(os.path.join(storagepath, 'myfile.txt'))
82
83Uploading files
84===============
85
86We can get a list of files stored in that location:
87
88    >>> mydatacenter.getPendingFiles()
89    []
90
91Let's put some file in the storage:
92
93    >>> import os
94    >>> filepath = os.path.join(storagepath, 'data.csv')
95    >>> open(filepath, 'wb').write('Some Content\n')
96
97Now we can find a file:
98
99    >>> mydatacenter.getPendingFiles()
100    [<waeup.kofa.datacenter.DataCenterFile object at 0x...>]
101
102As we can see, the actual file is wrapped by a convenience wrapper,
103that enables us to fetch some data about the file. The data returned
104is formatted in strings, so that it can easily be put into output
105pages:
106
107    >>> datafile = mydatacenter.getPendingFiles()[0]
108    >>> datafile.getSize()
109    '13 bytes'
110
111    >>> datafile.getDate() # Nearly current datetime...
112    '...'
113
114Clean up:
115
116    >>> import shutil
117    >>> shutil.rmtree(newpath)
118    >>> shutil.rmtree(verynewpath)
119
120
121Distributing processed files
122============================
123
124When files were processed by a batch processor, we can put the
125resulting files into desired destinations.
126
127We recreate the datacenter root in case it is missing:
128
129    >>> import os
130    >>> dc_root = mydatacenter.storage
131    >>> fin_dir = os.path.join(dc_root, 'finished')
132    >>> unfin_dir = os.path.join(dc_root, 'unfinished')
133
134    >>> def recreate_dc_storage():
135    ...   if os.path.exists(dc_root):
136    ...     shutil.rmtree(dc_root)
137    ...   os.mkdir(dc_root)
138    ...   mydatacenter.setStoragePath(mydatacenter.storage)
139    >>> recreate_dc_storage()
140
141We define a function that creates a set of faked result files:
142
143    >>> import os
144    >>> import tempfile
145    >>> def create_fake_results(source_basename, create_pending=True):
146    ...   tmp_dir = tempfile.mkdtemp()
147    ...   src = os.path.join(dc_root, source_basename)
148    ...   pending_src = None
149    ...   if create_pending:
150    ...     pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
151    ...   finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
152    ...   for path in (src, pending_src, finished_src):
153    ...     if path is not None:
154    ...       open(path, 'wb').write('blah')
155    ...   return tmp_dir, src, finished_src, pending_src
156
157Now we can create the set of result files, that typically come after a
158successful processing of a regular source:
159
160Now we can try to distribute those files. Let's start with a source
161file, that was processed successfully:
162
163    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
164    ...  'mysource.csv', create_pending=False)
165    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
166    ...                            pending_src, mode='create')
167    >>> sorted(os.listdir(dc_root))
168    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']
169
170    >>> sorted(os.listdir(fin_dir))
171    ['mysource.create.finished.csv', 'mysource.csv']
172
173    >>> sorted(os.listdir(unfin_dir))
174    []
175
176The created dir will be removed for us by the datacenter. This way we
177can assured, that less temporary dirs are left hanging around:
178
179    >>> os.path.exists(tmp_dir)
180    False
181
182The root dir is empty, while the original file and the file containing
183all processed data were moved to'finished/'.
184
185Now we restart, but this time we fake an erranous action:
186
187    >>> recreate_dc_storage()
188    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
189    ...  'mysource.csv')
190    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
191    ...                                 pending_src, mode='create')
192    >>> sorted(os.listdir(dc_root))
193    ['deleted', 'finished', 'graduated', 'logs', 'mysource.create.pending.csv', 'unfinished']
194
195    >>> sorted(os.listdir(fin_dir))
196    ['mysource.create.finished.csv']
197
198    >>> sorted(os.listdir(unfin_dir))
199    ['mysource.csv']
200
201While the original source was moved to the 'unfinished' dir, the
202pending file went to the root and the set of already processed items
203are stored in finished/.
204
205We fake processing the pending file and assume that everything went
206well this time:
207
208    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
209    ...  'mysource.create.pending.csv', create_pending=False)
210    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
211    ...                                 pending_src, mode='create')
212
213    >>> sorted(os.listdir(dc_root))
214    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']
215
216    >>> sorted(os.listdir(fin_dir))
217    ['mysource.create.finished.csv', 'mysource.csv']
218
219    >>> sorted(os.listdir(unfin_dir))
220    []
221
222The result is the same as in the first case shown above.
223
224We restart again, but this time we fake several non-working imports in
225a row.
226
227We start with a faulty start-import:
228
229    >>> recreate_dc_storage()
230    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
231    ...  'mysource.csv')
232    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
233    ...                                 pending_src, mode='create')
234
235We try to process the pending file, which fails again:
236
237    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
238    ...  'mysource.create.pending.csv')
239    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
240    ...                                 pending_src, mode='create')
241
242We try to process the new pending file:
243
244    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
245    ...  'mysource.create.pending.csv')
246    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
247    ...                                 pending_src, mode='create')
248
249    >>> sorted(os.listdir(dc_root))
250    ['deleted', 'finished', 'graduated', 'logs', 'mysource.create.pending.csv', 'unfinished']
251
252    >>> sorted(os.listdir(fin_dir))
253    ['mysource.create.finished.csv']
254
255    >>> sorted(os.listdir(unfin_dir))
256    ['mysource.csv']
257
258Finally, we process the pending file and everything works:
259
260    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
261    ...  'mysource.create.pending.csv', create_pending=False)
262    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
263    ...                                 pending_src, mode='create')
264
265    >>> sorted(os.listdir(dc_root))
266    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']
267
268    >>> sorted(os.listdir(fin_dir))
269    ['mysource.create.finished.csv', 'mysource.csv']
270
271    >>> sorted(os.listdir(unfin_dir))
272    []
273
274The root dir is empty (contains no input files) and only the files in
275finished-subdirectory remain.
276
277
278We can get a list of imported files stored in the finished subfolder:
279
280    >>> mydatacenter.getFinishedFiles()
281    [<waeup.kofa.datacenter.DataCenterFile object at ...>]
282
283    >>> datafile = mydatacenter.getFinishedFiles()[0]
284    >>> datafile.getSize()
285    '2 bytes'
286
287    >>> datafile.getDate() # Nearly current datetime...
288    '...'
289
290
291Clean up:
292
293    >>> shutil.rmtree(verynewpath)
Note: See TracBrowser for help on using the repository browser.