source: main/waeup.kofa/trunk/src/waeup/kofa/doctests/datacenter.txt @ 16901

Last change on this file since 16901 was 15416, checked in by Henrik Bettermann, 6 years ago

Backup deleted graduated student data somewhere else to ease graduated student data migration.

Data Center
***********

The Kofa data center takes care of managing CSV files and importing them.

.. :doctest:
.. :layer: waeup.kofa.testing.KofaUnitTestLayer

Creating a data center
======================

A data center can be created easily:

    >>> from waeup.kofa.datacenter import DataCenter
    >>> mydatacenter = DataCenter()
    >>> mydatacenter
    <waeup.kofa.datacenter.DataCenter object at 0x...>

Each data center has a location in the file system where files are stored:

    >>> storagepath = mydatacenter.storage
    >>> storagepath
    '/tmp/tmp...'

Among other things, it provides two locations where the data of deleted
items is stored:

    >>> import os
    >>> del_path = mydatacenter.deleted_path
    >>> os.path.isdir(del_path)
    True
    >>> grad_path = mydatacenter.graduated_path
    >>> os.path.isdir(grad_path)
    True

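As a rough illustration of such a layout, the following standalone
sketch creates the subdirectories that appear in the directory listings
of this document below a storage root. The helper name
``ensure_layout`` is hypothetical, and treating ``deleted_path`` and
``graduated_path`` as plain subdirectories of the storage path is an
assumption, not something taken from the Kofa sources.

```python
import os
import tempfile

# Subdirectory names as they appear in the listings in this document.
SUBDIRS = ("deleted", "finished", "graduated", "logs", "unfinished")


def ensure_layout(storage):
    """Create all expected subdirectories below `storage` (hypothetical helper).

    Returns a dict mapping each subdirectory name to its absolute path.
    """
    paths = {}
    for name in SUBDIRS:
        path = os.path.join(os.path.abspath(storage), name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths[name] = path
    return paths


storage = tempfile.mkdtemp()
layout = ensure_layout(storage)
print(sorted(os.listdir(storage)))
```

Calling the helper a second time on the same root leaves existing
directories untouched.
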
Overall it complies with the `IDataCenter` interface:

    >>> from zope.interface import verify
    >>> from waeup.kofa.interfaces import IDataCenter
    >>> verify.verifyObject(IDataCenter, DataCenter() )
    True

    >>> verify.verifyClass(IDataCenter, DataCenter)
    True

Managing the storage path
=========================

We can set another storage path:

    >>> import os
    >>> os.mkdir('newlocation')
    >>> newpath = os.path.abspath('newlocation')
    >>> mydatacenter.setStoragePath(newpath)
    []

The result here is a list of filenames that could not be
copied. Luckily, this list is empty.

When we set a new storage path, we can ask the data center to move all
files from the old location to the new one. To see this feature in
action, we first have to put a file into the old location:

    >>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')

Now we can set a new location and the file will be copied:

    >>> verynewpath = os.path.abspath('verynewlocation')
    >>> os.mkdir(verynewpath)

    >>> mydatacenter.setStoragePath(verynewpath, move=True)
    []

    >>> storagepath = mydatacenter.storage
    >>> 'myfile.txt' in os.listdir(verynewpath)
    True

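The idea of transferring files and reporting the ones that were left
behind can be sketched with the standard library alone. The function
``move_files`` below is a hypothetical helper, not Kofa's
``setStoragePath``. In particular, the reason a file ends up in the
returned list (a file of the same name already exists at the target) is
an assumption made for this sketch.

```python
import os
import shutil
import tempfile


def move_files(old_dir, new_dir):
    """Move regular files from `old_dir` to `new_dir` (hypothetical helper).

    Returns the names of files that were NOT moved because a file of
    the same name already exists in `new_dir`.
    """
    not_moved = []
    for name in sorted(os.listdir(old_dir)):
        src = os.path.join(old_dir, name)
        dst = os.path.join(new_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch ignores subdirectories
        if os.path.exists(dst):
            not_moved.append(name)  # never overwrite existing data
            continue
        shutil.move(src, dst)
    return not_moved


old_dir = tempfile.mkdtemp()
new_dir = tempfile.mkdtemp()
with open(os.path.join(old_dir, "myfile.txt"), "w") as fd:
    fd.write("hello")

first = move_files(old_dir, new_dir)

# A second file with the same name collides with the one already moved.
with open(os.path.join(old_dir, "myfile.txt"), "w") as fd:
    fd.write("other")
second = move_files(old_dir, new_dir)
print(first, second)
```

Returning the list of skipped names instead of raising an error lets the
caller decide how to handle leftovers, which mirrors the empty list seen
in the examples above.
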
We remove the created file to have a clean testing environment for
upcoming examples:

    >>> os.unlink(os.path.join(storagepath, 'myfile.txt'))

Uploading files
===============

We can get a list of files stored in that location:

    >>> mydatacenter.getPendingFiles()
    []

Let's put a file into the storage:

    >>> import os
    >>> filepath = os.path.join(storagepath, 'data.csv')
    >>> open(filepath, 'wb').write('Some Content\n')

Now the file is listed:

    >>> mydatacenter.getPendingFiles()
    [<waeup.kofa.datacenter.DataCenterFile object at 0x...>]

As we can see, the actual file is wrapped by a convenience wrapper
that enables us to fetch some data about the file. The data returned
is formatted as strings, so that it can easily be put into output
pages:

    >>> datafile = mydatacenter.getPendingFiles()[0]
    >>> datafile.getSize()
    '13 bytes'

    >>> datafile.getDate() # Nearly current datetime...
    '...'

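A wrapper of this kind can be approximated as follows. This is a
minimal standalone sketch: the class ``FileInfo`` and the function
``list_pending`` are hypothetical names, the timestamp layout is an
assumption, and only the plain byte count is formatted because that is
the only size format shown in this document.

```python
import os
import tempfile
import time


class FileInfo(object):
    """Hypothetical stand-in for a file wrapper with string output."""

    def __init__(self, path):
        self.path = path

    def get_size(self):
        # Only the plain byte count is shown; other units are left out.
        return "%d bytes" % os.path.getsize(self.path)

    def get_date(self):
        # The timestamp layout is an assumption of this sketch.
        mtime = os.path.getmtime(self.path)
        return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))


def list_pending(storage):
    """Wrap every regular file directly below `storage`, skipping dirs."""
    names = sorted(os.listdir(storage))
    paths = [os.path.join(storage, name) for name in names]
    return [FileInfo(path) for path in paths if os.path.isfile(path)]


storage = tempfile.mkdtemp()
os.mkdir(os.path.join(storage, "finished"))  # subdirectories are ignored
with open(os.path.join(storage, "data.csv"), "wb") as fd:
    fd.write(b"Some Content\n")

pending = list_pending(storage)
print(len(pending), pending[0].get_size())
```

Skipping subdirectories is what keeps a listing of pending files empty
even when the storage root already contains folders.
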
Clean up:

    >>> import shutil
    >>> shutil.rmtree(newpath)
    >>> shutil.rmtree(verynewpath)


Distributing processed files
============================

After files have been processed by a batch processor, we can put the
resulting files into their designated destinations.

We recreate the datacenter root in case it is missing:

    >>> import os
    >>> dc_root = mydatacenter.storage
    >>> fin_dir = os.path.join(dc_root, 'finished')
    >>> unfin_dir = os.path.join(dc_root, 'unfinished')

    >>> def recreate_dc_storage():
    ...   if os.path.exists(dc_root):
    ...     shutil.rmtree(dc_root)
    ...   os.mkdir(dc_root)
    ...   mydatacenter.setStoragePath(mydatacenter.storage)
    >>> recreate_dc_storage()

We define a function that creates a set of faked result files:

    >>> import os
    >>> import tempfile
    >>> def create_fake_results(source_basename, create_pending=True):
    ...   tmp_dir = tempfile.mkdtemp()
    ...   src = os.path.join(dc_root, source_basename)
    ...   pending_src = None
    ...   if create_pending:
    ...     pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
    ...   finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
    ...   for path in (src, pending_src, finished_src):
    ...     if path is not None:
    ...       open(path, 'wb').write('blah')
    ...   return tmp_dir, src, finished_src, pending_src

Now we can create the set of result files that typically results from
processing a regular source, and try to distribute those files. Let's
start with a source file that was processed successfully:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')
    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The created dir will be removed for us by the datacenter. This way we
can be assured that fewer temporary dirs are left hanging around:

    >>> os.path.exists(tmp_dir)
    False

The root dir contains no input files any more, while the original file
and the file containing all processed data were moved to 'finished/'.

Now we restart, but this time we fake an erroneous action:

    >>> recreate_dc_storage()
    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')
    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'mysource.create.pending.csv', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv']

    >>> sorted(os.listdir(unfin_dir))
    ['mysource.csv']

While the original source was moved to the 'unfinished' dir, the
pending file went to the root and the set of already processed items
is stored in 'finished/'.

We fake processing the pending file and assume that everything went
well this time:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The result is the same as in the first case shown above.

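The file movements observed so far can be summarised in a small
standalone sketch. The function ``distribute`` is hypothetical and only
reproduces the directory listings shown above; it is not the code behind
``distProcessedFiles``. As simplifying assumptions, it replaces an older
finished file of the same name instead of combining the two, and it does
not clean up the temporary directory.

```python
import os
import shutil
import tempfile


def _move(src, dst):
    """Move `src` to `dst`, replacing an existing file at `dst`."""
    if os.path.exists(dst):
        os.unlink(dst)
    shutil.move(src, dst)


def distribute(root, success, source, finished_file, pending_file,
               mode="create"):
    """Sort the files of one import run into `root` (hypothetical sketch)."""
    fin_dir = os.path.join(root, "finished")
    unfin_dir = os.path.join(root, "unfinished")
    for path in (fin_dir, unfin_dir):
        os.makedirs(path, exist_ok=True)

    name = os.path.basename(source)
    pending_suffix = ".%s.pending.csv" % mode
    is_retry = name.endswith(pending_suffix)
    if is_retry:
        base = name[:-len(pending_suffix)]
    else:
        base = os.path.splitext(name)[0]
    original = base + ".csv"

    # Processed rows always end up in finished/ (older file is replaced).
    _move(finished_file,
          os.path.join(fin_dir, "%s.%s.finished.csv" % (base, mode)))

    if is_retry:
        # The old pending file is used up; the original source has been
        # waiting in unfinished/ since the first failed run.
        os.unlink(source)
        original_path = os.path.join(unfin_dir, original)
    else:
        original_path = source

    if success:
        _move(original_path, os.path.join(fin_dir, original))
    else:
        if not is_retry:
            _move(original_path, os.path.join(unfin_dir, original))
        _move(pending_file, os.path.join(root, base + pending_suffix))


def write(path, text):
    with open(path, "w") as fd:
        fd.write(text)


root = tempfile.mkdtemp()
tmp = tempfile.mkdtemp()

# First run fails: one pending file and one finished file come back.
write(os.path.join(root, "mysource.csv"), "blah")
write(os.path.join(tmp, "pending.csv"), "blah")
write(os.path.join(tmp, "finished.csv"), "blah")
distribute(root, False, os.path.join(root, "mysource.csv"),
           os.path.join(tmp, "finished.csv"),
           os.path.join(tmp, "pending.csv"))
after_failure = sorted(os.listdir(root))

# Second run processes the pending file successfully.
write(os.path.join(tmp, "finished.csv"), "blah")
distribute(root, True, os.path.join(root, "mysource.create.pending.csv"),
           os.path.join(tmp, "finished.csv"), None)
after_success = sorted(os.listdir(root))
finished = sorted(os.listdir(os.path.join(root, "finished")))
print(after_failure, after_success, finished)
```

Deriving the base name from the pending file name is what allows a retry
to find the original source in the unfinished folder again.
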
We restart again, but this time we fake several failing imports in
a row.

We start with a faulty initial import:

    >>> recreate_dc_storage()
    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

We try to process the pending file, which fails again:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

We try to process the new pending file:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'mysource.create.pending.csv', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv']

    >>> sorted(os.listdir(unfin_dir))
    ['mysource.csv']

Finally, we process the pending file and everything works:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['deleted', 'finished', 'graduated', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The root dir is empty (contains no input files) and only the files in
the finished subdirectory remain.


We can get a list of imported files stored in the finished subfolder:

    >>> mydatacenter.getFinishedFiles()
    [<waeup.kofa.datacenter.DataCenterFile object at ...>]

    >>> datafile = mydatacenter.getFinishedFiles()[0]
    >>> datafile.getSize()
    '2 bytes'

    >>> datafile.getDate() # Nearly current datetime...
    '...'


Clean up:

    >>> shutil.rmtree(verynewpath)