WAeUP Data Center
*****************

The WAeUP data center takes care of managing CSV files and importing
them.

:Test-Layer: unit

Creating a data center
======================

A data center can be created easily:

    >>> from waeup.sirp.datacenter import DataCenter
    >>> mydatacenter = DataCenter()
    >>> mydatacenter
    <waeup.sirp.datacenter.DataCenter object at 0x...>

Each data center has a location in the file system where files are
stored:

    >>> storagepath = mydatacenter.storage
    >>> storagepath
    '/.../waeup/sirp/files'


Managing the storage path
-------------------------

We can set another storage path:

    >>> import os
    >>> os.mkdir('newlocation')
    >>> newpath = os.path.abspath('newlocation')
    >>> mydatacenter.setStoragePath(newpath)
    []

The result is a list of filenames that could not be copied. Luckily,
this list is empty.

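Internally, such a list can be collected by attempting to copy each
file and remembering the failures. The following is only a minimal
sketch of that idea; ``copy_files`` is a hypothetical helper, not the
actual implementation::

    import os
    import shutil

    def copy_files(from_dir, to_dir):
        """Copy all files in `from_dir` to `to_dir`.

        Return a list of filenames that could not be copied.
        """
        failed = []
        for filename in os.listdir(from_dir):
            try:
                shutil.copyfile(os.path.join(from_dir, filename),
                                os.path.join(to_dir, filename))
            except (IOError, OSError):
                # Remember the failure and continue with the rest.
                failed.append(filename)
        return failed
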
When we set a new storage path, we can request that all files in the
old location be moved to the new one. To see this feature in action,
we first have to put a file into the old location:

    >>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')

Now we can set a new location and the file will be moved along:

    >>> verynewpath = os.path.abspath('verynewlocation')
    >>> os.mkdir(verynewpath)

    >>> mydatacenter.setStoragePath(verynewpath, move=True)
    []

    >>> storagepath = mydatacenter.storage
    >>> 'myfile.txt' in os.listdir(verynewpath)
    True

We remove the created file to have a clean testing environment for
the upcoming examples:

    >>> os.unlink(os.path.join(storagepath, 'myfile.txt'))

Uploading files
===============

We can get a list of files stored in that location:

    >>> mydatacenter.getFiles()
    []

Let's put a file into the storage:

    >>> import os
    >>> filepath = os.path.join(storagepath, 'data.csv')
    >>> open(filepath, 'wb').write('Some Content\n')

Now we can find a file:

    >>> mydatacenter.getFiles()
    [<waeup.sirp.datacenter.DataCenterFile object at 0x...>]

As we can see, the actual file is wrapped by a convenience wrapper
that enables us to fetch some data about the file. The data returned
is formatted as strings, so that it can easily be put into output
pages:

    >>> datafile = mydatacenter.getFiles()[0]
    >>> datafile.getSize()
    '13 bytes'

    >>> datafile.getDate() # Nearly current datetime...
    '...'

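A wrapper of this kind can be built with little more than the stat
information of the file. Here is a minimal sketch; ``FileWrapper`` is
a hypothetical stand-in, not the actual ``DataCenterFile`` class::

    import os
    from datetime import datetime

    class FileWrapper(object):
        """Wrap a file on disk and render its stats as strings."""

        def __init__(self, path):
            self.path = path

        def getSize(self):
            # Size as a human-readable string; a real implementation
            # would switch to kB/MB for larger files.
            return '%s bytes' % os.path.getsize(self.path)

        def getDate(self):
            # Last modification time, formatted for output pages.
            mtime = os.path.getmtime(self.path)
            return datetime.fromtimestamp(mtime).strftime(
                '%Y-%m-%d %H:%M:%S')
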
Clean up:

    >>> import shutil
    >>> shutil.rmtree(newpath)
    >>> shutil.rmtree(verynewpath)


Distributing processed files
============================

After files have been processed by a batch processor, we can
distribute the resulting files to their desired destinations.

We recreate the datacenter root in case it is missing:

    >>> import os
    >>> dc_root = mydatacenter.storage
    >>> fin_dir = os.path.join(dc_root, 'finished')
    >>> unfin_dir = os.path.join(dc_root, 'unfinished')

    >>> def recreate_dc_storage():
    ...   if os.path.exists(dc_root):
    ...     shutil.rmtree(dc_root)
    ...   os.mkdir(dc_root)
    ...   mydatacenter.setStoragePath(mydatacenter.storage)
    >>> recreate_dc_storage()

We define a function that creates a set of faked result files:

    >>> import os
    >>> import tempfile
    >>> def create_fake_results(source_basename, create_pending=True):
    ...   tmp_dir = tempfile.mkdtemp()
    ...   src = os.path.join(dc_root, source_basename)
    ...   pending_src = None
    ...   if create_pending:
    ...     pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
    ...   finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
    ...   for path in (src, pending_src, finished_src):
    ...     if path is not None:
    ...       open(path, 'wb').write('blah')
    ...   return tmp_dir, src, finished_src, pending_src

Now we can create the set of result files that typically comes from
successful processing of a regular source, and try to distribute
those files. Let's start with a source file that was processed
successfully:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')
    >>> sorted(os.listdir(dc_root))
    ['finished', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The temporary dir is removed for us by the datacenter. This way we
can be assured that fewer temporary dirs are left hanging around:

    >>> os.path.exists(tmp_dir)
    False

The root dir is empty, while the original file and the file
containing all processed data were moved to 'finished/'.

Now we restart, but this time we fake an erroneous action:

    >>> recreate_dc_storage()
    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')
    >>> sorted(os.listdir(dc_root))
    ['finished', 'logs', 'mysource.create.pending.csv', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv']

    >>> sorted(os.listdir(unfin_dir))
    ['mysource.csv']

While the original source was moved to the 'unfinished' dir, the
pending file went to the root and the set of already processed items
is stored in 'finished/'.

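The distribution rules observed above can be summarized in a few
lines. The following is only a sketch of the behaviour with
hypothetical names, inferred from the directory listings above, not
the actual ``distProcessedFiles`` code::

    import os
    import shutil

    def dist_processed_files(successful, source_path, finished_file,
                             pending_file, storage, mode='create'):
        # Derive the base name, e.g. 'mysource' from 'mysource.csv'
        # or 'mysource.create.pending.csv'.
        basename = os.path.basename(source_path).split('.')[0]
        finished_dir = os.path.join(storage, 'finished')
        unfinished_dir = os.path.join(storage, 'unfinished')
        # The file of already processed rows always goes to finished/.
        shutil.move(finished_file, os.path.join(
            finished_dir, '%s.%s.finished.csv' % (basename, mode)))
        if successful:
            # The source itself is finished as well.
            shutil.move(source_path, os.path.join(
                finished_dir, basename + '.csv'))
        else:
            # Keep the source in unfinished/ and put the pending rows
            # into the storage root for another run.
            shutil.move(source_path, os.path.join(
                unfinished_dir, basename + '.csv'))
            shutil.move(pending_file, os.path.join(
                storage, '%s.%s.pending.csv' % (basename, mode)))
        # The temporary dir holding the result files is removed, so
        # that no temporary dirs are left hanging around.
        shutil.rmtree(os.path.dirname(finished_file))
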
We fake processing the pending file and assume that everything went
well this time:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['finished', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The result is the same as in the first case shown above.

We restart again, but this time we fake several failing imports in a
row.

We start with a failing initial import:

    >>> recreate_dc_storage()
    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

We try to process the pending file, which fails again:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

We try to process the new pending file, which also fails:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv')
    >>> mydatacenter.distProcessedFiles(False, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['finished', 'logs', 'mysource.create.pending.csv', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv']

    >>> sorted(os.listdir(unfin_dir))
    ['mysource.csv']

Finally, we process the pending file and everything works:

    >>> tmp_dir, src, finished_src, pending_src = create_fake_results(
    ...  'mysource.create.pending.csv', create_pending=False)
    >>> mydatacenter.distProcessedFiles(True, src, finished_src,
    ...                                 pending_src, mode='create')

    >>> sorted(os.listdir(dc_root))
    ['finished', 'logs', 'unfinished']

    >>> sorted(os.listdir(fin_dir))
    ['mysource.create.finished.csv', 'mysource.csv']

    >>> sorted(os.listdir(unfin_dir))
    []

The root dir is empty (contains no input files) and only the files in
the 'finished' subdirectory remain.

Clean up:

    >>> shutil.rmtree(verynewpath)