source: main/waeup.sirp/trunk/src/waeup/sirp/datacenter.txt @ 4969

Last change on this file since 4969 was 4962, checked in by uli, 15 years ago
Remove quick-import tests.
File size: 7.6 KB

WAeUP Data Center
*****************

The WAeUP data center takes care of managing CSV files and importing them.

:Test-Layer: unit

Creating a data center
======================

A data center can be created easily:

>>> from waeup.sirp.datacenter import DataCenter
>>> mydatacenter = DataCenter()
>>> mydatacenter
<waeup.sirp.datacenter.DataCenter object at 0x...>

Each data center has a location in the file system where files are stored:

>>> storagepath = mydatacenter.storage
>>> storagepath
'/.../waeup/sirp/files'


Managing the storage path
-------------------------

We can set another storage path:

>>> import os
>>> os.mkdir('newlocation')
>>> newpath = os.path.abspath('newlocation')
>>> mydatacenter.setStoragePath(newpath)
[]

The result is a list of filenames that could not be copied. Luckily,
this list is empty.

When we set a new storage path, we can ask for all files in the old
location to be moved to the new one. To see this feature in action, we
first have to put a file into the old location:

>>> open(os.path.join(newpath, 'myfile.txt'), 'wb').write('hello')

Now we set a new location and ask for the file to be moved along:

>>> verynewpath = os.path.abspath('verynewlocation')
>>> os.mkdir(verynewpath)

>>> mydatacenter.setStoragePath(verynewpath, move=True)
[]

>>> storagepath = mydatacenter.storage
>>> 'myfile.txt' in os.listdir(verynewpath)
True

We remove the created file to have a clean testing environment for
upcoming examples:

>>> os.unlink(os.path.join(storagepath, 'myfile.txt'))
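
For illustration, the following is a minimal sketch of the file handling a storage path change with ``move=True`` might involve, in plain Python. It is not the ``DataCenter.setStoragePath`` implementation. The function name ``set_storage_path`` is hypothetical, and so are two rules: existing files are never overwritten, and each original is deleted after a successful copy.

```python
import os
import shutil


def set_storage_path(old_path, new_path, move=False):
    """Hypothetical sketch of the file handling when switching storage.

    With move=True, every plain file in old_path is copied to new_path
    and then deleted from old_path. Returns the names of the files that
    could not be copied (an empty list if everything worked).
    """
    not_copied = []
    if not move or os.path.abspath(old_path) == os.path.abspath(new_path):
        return not_copied
    for name in sorted(os.listdir(old_path)):
        src = os.path.join(old_path, name)
        if not os.path.isfile(src):
            continue  # subdirectories are left alone in this sketch
        dst = os.path.join(new_path, name)
        if os.path.exists(dst):
            not_copied.append(name)  # assumption: never overwrite
            continue
        try:
            shutil.copy2(src, dst)
        except (IOError, OSError):
            not_copied.append(name)
            continue
        os.unlink(src)  # assumption: "move" removes the original
    return not_copied
```

Collecting failed names instead of raising mirrors the list-of-filenames result shown above. It lets a caller report partial failures while still switching locations.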

Uploading files
===============

We can get a list of the files stored in the current storage location:

>>> mydatacenter.getFiles()
[]

Let's put a file into the storage:

>>> import os
>>> filepath = os.path.join(storagepath, 'data.csv')
>>> open(filepath, 'wb').write('Some Content\n')

Now we can find a file:

>>> mydatacenter.getFiles()
[<waeup.sirp.datacenter.DataCenterFile object at 0x...>]

As we can see, the actual file is wrapped in a convenience wrapper
that lets us fetch some data about the file. The data is returned as
formatted strings, so that it can easily be put into output pages:

>>> datafile = mydatacenter.getFiles()[0]
>>> datafile.getSize()
'13 bytes'

>>> datafile.getDate() # Nearly current datetime...
'...'
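
The next block is a hedged sketch of the wrapper idea: a tiny class that renders file metadata as display-ready strings. The class ``FileInfo``, its method names, the date format and the optional extension filter in ``list_files`` are all hypothetical. ``DataCenterFile`` may format its values differently.

```python
import os
import time


class FileInfo(object):
    """Hypothetical wrapper that renders file metadata as strings."""

    def __init__(self, path):
        self.path = path
        self.name = os.path.basename(path)

    def get_size(self):
        # File size in bytes, e.g. '13 bytes'.
        return '%d bytes' % os.path.getsize(self.path)

    def get_date(self):
        # Last modification time in local time, as 'YYYY-MM-DD HH:MM:SS'.
        mtime = os.path.getmtime(self.path)
        return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))


def list_files(storage, ext=None):
    """Wrap every regular file in `storage`, optionally filtered by extension."""
    result = []
    for name in sorted(os.listdir(storage)):
        path = os.path.join(storage, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as finished/ or logs/
        if ext is not None and not name.endswith(ext):
            continue
        result.append(FileInfo(path))
    return result
```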

Clean up:

>>> import shutil
>>> shutil.rmtree(newpath)
>>> shutil.rmtree(verynewpath)


Distributing processed files
============================

When files have been processed by a batch processor, we can move the
resulting files to their destinations.

We recreate the data center root, which was removed during the clean
up above. The helper also wipes an existing root, so we can reuse it
later to start from scratch:

>>> import os
>>> dc_root = mydatacenter.storage
>>> fin_dir = os.path.join(dc_root, 'finished')
>>> unfin_dir = os.path.join(dc_root, 'unfinished')

>>> def recreate_dc_storage():
...     if os.path.exists(dc_root):
...         shutil.rmtree(dc_root)
...     os.mkdir(dc_root)
...     mydatacenter.setStoragePath(mydatacenter.storage)
>>> recreate_dc_storage()

We define a function that creates a set of fake result files:

>>> import os
>>> import tempfile
>>> def create_fake_results(source_basename, create_pending=True):
...     # Result files go into a fresh temp dir, the source into dc_root.
...     tmp_dir = tempfile.mkdtemp()
...     src = os.path.join(dc_root, source_basename)
...     pending_src = None
...     if create_pending:
...         pending_src = os.path.join(tmp_dir, 'mypendingsource.csv')
...     finished_src = os.path.join(tmp_dir, 'myfinishedsource.csv')
...     for path in (src, pending_src, finished_src):
...         if path is not None:
...             open(path, 'wb').write('blah')
...     return tmp_dir, src, finished_src, pending_src

We start with a regular source file that was processed successfully:
we create the typical set of result files for it and let the data
center distribute them:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...     pending_src)
>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The temporary directory is removed for us by the data center. This way
we can be sure that fewer temporary dirs are left lying around:

>>> os.path.exists(tmp_dir)
False

The root dir contains no input files anymore, while the original file
and the file containing all processed data were moved to 'finished/'.
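
The success case can be modelled roughly as follows. This is a sketch under assumptions, not what ``distProcessedFiles`` actually does. The name ``distribute_success`` is hypothetical, and the sketch only handles a regular (non-pending) source. It also assumes that the finished file lives in a temporary directory that may be deleted afterwards, as with ``create_fake_results`` above.

```python
import os
import shutil


def distribute_success(root, src, finished_src):
    """Hypothetical sketch: file away the results of a successful import.

    Assumes `src` is a regular (non-pending) source inside `root` and
    that `finished_src` sits in a temporary directory which may be
    deleted once its content has been moved.
    """
    fin_dir = os.path.join(root, 'finished')
    if not os.path.isdir(fin_dir):
        os.makedirs(fin_dir)
    base, ext = os.path.splitext(os.path.basename(src))
    # The original source and the processed data end up side by side.
    shutil.move(src, os.path.join(fin_dir, base + ext))
    shutil.move(finished_src, os.path.join(fin_dir, base + '.finished' + ext))
    # Drop the temporary directory (and anything else left in it).
    shutil.rmtree(os.path.dirname(finished_src))
```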

Now we start over, but this time we fake an erroneous import:

>>> recreate_dc_storage()
>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...     pending_src)
>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'mysource.pending.csv', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
['mysource.csv']

The original source was moved to the 'unfinished' dir, the pending
file went to the root as 'mysource.pending.csv', and the set of
already processed items is stored in 'finished/'.
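
A corresponding sketch for a failed import of a regular source follows. Again, ``distribute_failure`` is a hypothetical name. The sketch assumes that the finished and pending files share one temporary directory, as produced by ``create_fake_results``. It does not guard against existing target files.

```python
import os
import shutil


def distribute_failure(root, src, finished_src, pending_src):
    """Hypothetical sketch: file away the results of a failed import.

    The original goes to root/unfinished, the rows already imported go
    to root/finished/<base>.finished.csv and the rows still to be
    imported land in root/<base>.pending.csv.
    """
    base, ext = os.path.splitext(os.path.basename(src))
    fin_dir = os.path.join(root, 'finished')
    unfin_dir = os.path.join(root, 'unfinished')
    for path in (fin_dir, unfin_dir):
        if not os.path.isdir(path):
            os.makedirs(path)
    shutil.move(src, os.path.join(unfin_dir, base + ext))
    shutil.move(finished_src, os.path.join(fin_dir, base + '.finished' + ext))
    # The pending rows become the input for the next import attempt.
    shutil.move(pending_src, os.path.join(root, base + '.pending' + ext))
    shutil.rmtree(os.path.dirname(finished_src))
```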

We fake processing the pending file and assume that everything went
well this time:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...     pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The result is the same as in the first case shown above.
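
That the outcome equals the first case suggests that a pending source is traced back to the name of its original. A minimal sketch of such a name mapping follows. The helper ``result_names`` and the ``PENDING_MARK`` constant are hypothetical, and the naming scheme is inferred only from the file listings above.

```python
import os

PENDING_MARK = '.pending'  # hypothetical constant for this sketch


def result_names(source_path):
    """Hypothetical helper: map a source file to its related file names.

    A pending source such as 'mysource.pending.csv' maps back to the
    original 'mysource.csv', so repeated runs target the same files.
    """
    base, ext = os.path.splitext(os.path.basename(source_path))
    is_pending = base.endswith(PENDING_MARK)
    if is_pending:
        base = base[:-len(PENDING_MARK)]
    return {
        'is_pending': is_pending,
        'original': base + ext,
        'finished': base + '.finished' + ext,
        'pending': base + PENDING_MARK + ext,
    }
```

For example, ``result_names('mysource.pending.csv')['original']`` yields ``'mysource.csv'``, the same name a regular run on ``mysource.csv`` would use.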

We start over again, but this time we fake several failed imports in
a row.

We start with a failing initial import:

>>> recreate_dc_storage()
>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...     pending_src)

We try to process the pending file, which fails again:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...     pending_src)

We try to process the new pending file, which fails as well:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv')
>>> mydatacenter.distProcessedFiles(False, src, finished_src,
...     pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'mysource.pending.csv', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
['mysource.csv']

Finally, we process the pending file and everything works:

>>> tmp_dir, src, finished_src, pending_src = create_fake_results(
...     'mysource.pending.csv', create_pending=False)
>>> mydatacenter.distProcessedFiles(True, src, finished_src,
...     pending_src)

>>> sorted(os.listdir(dc_root))
['finished', 'logs', 'unfinished']

>>> sorted(os.listdir(fin_dir))
['mysource.csv', 'mysource.finished.csv']

>>> sorted(os.listdir(unfin_dir))
[]

The root dir contains no input files anymore, and only the files in
the 'finished/' subdirectory remain.
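
The whole sequence can be summarised as a small state model that predicts the listings shown above. The model ignores the 'finished', 'logs' and 'unfinished' directory entries in the root. It is a hypothetical, filesystem-free sketch for reasoning about the observed rules and is not part of the data center API. The first run processes ``<name>.csv``, and every later run processes the pending file left behind by the previous failure.

```python
def simulate(outcomes, name='mysource'):
    """Hypothetical model of the observed directory layout.

    `outcomes` holds one boolean per import run. Returns sorted listings
    of the input files in the root, of finished/ and of unfinished/.
    """
    original = name + '.csv'
    pending = name + '.pending.csv'
    done = name + '.finished.csv'
    root, finished, unfinished = {original}, set(), set()
    for run, ok in enumerate(outcomes):
        source = original if run == 0 else pending
        if source not in root:
            raise ValueError('nothing left to import in run %d' % run)
        root.remove(source)  # the source is consumed by every run
        finished.add(done)
        if ok:
            # Success: the original ends up in finished/, wherever it was.
            unfinished.discard(original)
            finished.add(original)
        else:
            if source == original:
                unfinished.add(original)
            root.add(pending)
    return sorted(root), sorted(finished), sorted(unfinished)
```

The model makes the invariant of this section explicit: however many failures precede it, one successful run leaves the same layout as an immediate success.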

Clean up:

>>> shutil.rmtree(verynewpath)
