Timestamp: 21 Apr 2015, 20:56:58
Location: main/waeup.kofa/trunk
Files: 6 edited
main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import.rst
r12867 r12868
@@ -4,8 +4,11 @@
 ***********
 
-The term 'Data Import' actually understates the range of functions importers really have. As already stated, many importers do not only restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged data processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The staged import process is described in the following.
+Stages of Batch Processing
+==========================
 
-1. File Upload
-==============
+The term 'data import' actually understates the range of functions importers really have. As already stated, many importers do not only restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged data processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The stages of the import process are as follows.
+
+Stage 1: File Upload
+--------------------
 
 Users with permission …
@@ -17,19 +20,21 @@
 .. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
 
-When the upload succeeded the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal that a new file was uploaded.
+If the upload succeeded the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal that a new file was uploaded.
 
-The uploader changes the filename. An uploaded file foo.csv will be stored as foo_USERNAME.csv where username is the user id of the currently logged in user. Spaces in filename are replaced by underscores. Pending data filenames remain unchanged (see below)
+The uploader changes the filename. An uploaded file ``foo.csv`` will be stored as ``foo_USERNAME.csv`` where username is the user id of the currently logged in user. Spaces in filename are replaced by underscores. Pending data filenames remain unchanged (see below).
 
-After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (import step 1). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (import step 2). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user can proceed to the next step (import step 3) by selecting the appropriate processor and an import mode.
+After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (**import step 1**). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (**import step 2**). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user is able to proceed to the next step (**import step 3**) by selecting the appropriate processor and an import mode. In import mode ``create`` new objects are added to the database, `in `update`` mode existing objects are modified and in ``remove`` mode deleted.
 
-2. File Header Validation
-=========================
+Stage 2: File Header Validation
+-------------------------------
 
-Import step 3 is the stage where the file content is assessed for the first time and checked if the column titles correspond with the fields of the processor chosen. The page shows the header and the first record of the uploaded file. The page allows to change column fields or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person, who created the file, ignored the case-sensitivity of field names. Then the data center manager can easily fix this by selecting the correct title and click the 'Set headerfields' button. Setting the header fields is temporary, it does not change the file itself.
+Import step 3 is the stage where the file content is assessed for the first time and checked if the column titles correspond with the fields of the processor chosen. The page shows the header and the first record of the uploaded file. The page allows to change column titles or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person, who created the file, ignored the case-sensitivity of field names. Then the data import manager can easily fix this by selecting the correct title and click the 'Set headerfields' button. Setting the column titles is temporary, it does not modify the uploaded file. Consequently, it does not make sense to set new column titles if the file is not imported afterwards.
 
-The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step.
+The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step (**import step 4**).
 
-3. Data Validation and Import
-=============================
+Stage 3: Data Validation and Import
+-----------------------------------
+
+Import step 4 is the actual data import. The import is started by clicking the 'Perform import' button.
 
 Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often contain thousands or even tenthousands of records. It is not feasable for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then it can be edited by the batch processor.
@@ -40,4 +45,11 @@
 :noindex:
 
+Stage 4: Post-Processing
+------------------------
 
+The data import is finalized by calling :py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`. This method moves the ``.pending`` and ``.finished`` files from their temporary to their final location in the storage path of the filesystem from where they can be accessed through browser user interface.
 
+Batch Processors
+================
+
+All batch processors inherit their methods from the :py:class:`waeup.kofa.utils.batching.BatchProcessor` base class. The core ``doImport`` method always remains unchanged.
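The header checks described in stages 1 and 2 of the documentation above can be illustrated with a short sketch. It is not Kofa's `checkHeaders` implementation: the function name `check_header`, its parameters and the wording of the messages are assumptions made for this example. It only mirrors the documented rules, namely that duplicate column titles are rejected, that titles are matched case-sensitively against the processor's fields, and that required fields must be present.

```python
import csv
import io


def check_header(csv_text, available_fields, required_fields):
    """Return a list of problems found in the header line of a CSV text.

    Hypothetical helper; names and messages are not taken from Kofa.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []

    # Rule from import step 2: duplicate column titles are an error.
    duplicates = sorted({title for title in header if header.count(title) > 1})
    if duplicates:
        problems.append("duplicate column titles: " + ", ".join(duplicates))

    # Rule from import step 3: titles must match field names, case-sensitively.
    unknown = sorted({title for title in header if title not in available_fields})
    if unknown:
        problems.append("unknown column titles: " + ", ".join(unknown))

    # Rule from checkHeaders: every required field needs a column.
    missing = [field for field in required_fields if field not in header]
    if missing:
        problems.append("missing required column titles: " + ", ".join(missing))

    return problems
```

Calling `check_header("code,Title,code\r\n", {"code", "title"}, ["code", "title"])` reports the duplicate `code` column, the unknown title `Title` and the missing required title `title`. In the real workflow the import manager would fix the misspelled title on the page rather than in the file.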
main/waeup.kofa/trunk/src/waeup/kofa/browser/batchprocessing.txt
r12439 r12868
@@ -574,5 +574,5 @@
 >>> print open(pending_file).read()
 title_prefix,code,title,--ERRORS--
-faculty,FAC1,Faculty 1,This object already exists. Skipping.
+faculty,FAC1,Faculty 1,This object already exists.
 faculty,FAC 5,Faculty 5,code: Invalid input
 
main/waeup.kofa/trunk/src/waeup/kofa/students/tests/test_batching.py
r12811 r12868
@@ -548,5 +548,5 @@
 '2,Aaren,C123456,m,aa@aa.ng,1234,admitted,1990-01-04,Berson,mypw1,100000,matric_number: Invalid input\r\n'
 '1,Frank,F123456,m,aa@aa.ng,1234,,1990-01-06,Meyer,,100000,reg_number: Invalid input; matric_number: Invalid input\r\n'
-'3,Uli,A123456,m,aa@aa.ng,1234,,1990-01-07,Schulz,,100002,This object already exists. Skipping.\r\n'
+'3,Uli,A123456,m,aa@aa.ng,1234,,1990-01-07,Schulz,,100002,This object already exists.\r\n'
 )
 shutil.rmtree(os.path.dirname(fin_file))
@@ -891,5 +891,5 @@
 self.assertEqual(fail_file,
 'reg_number,code,mandatory,level,level_session,score,matric_number,--ERRORS--\r\n'
-'1,COURSE1,,nonsense,,5,,Not all parents do exist yet. Skipping\r\n'
+'1,COURSE1,,nonsense,,5,,Not all parents do exist yet.\r\n'
 '1,NONSENSE,,100,,5,,code: non-existent\r\n'
 '1,COURSE1,,200,2004,6,,level_session: does not match 2008\r\n'
@@ -1176,5 +1176,5 @@
 '1,942,online,BTECHBDT,2010/11/26 19:59:33.744 GMT+1,0,'
 '19500,schoolfee,19500,2015,paid,'
-'Same payment has already been made. Skipping.'
+'Same payment has already been made.'
 in content)
 shutil.rmtree(os.path.dirname(fin_file))
main/waeup.kofa/trunk/src/waeup/kofa/university/tests/test_batching.py
r11790 r12868
@@ -448,5 +448,5 @@
 'local_roles: user_name or local_role missing\r\n'
 'FAC11,DEP2,"[{\'user_name\':\'anne\',\'local_role\':\'waeup.local.DepartmentManager\'}]",'
-'Not all parents do exist yet. Skipping\r\n'
+'Not all parents do exist yet.\r\n'
 )
 # Anne got a local role in department DEP2.
@@ -537,5 +537,5 @@
 'local_roles: user_name or local_role missing\r\n'
 'FAC11,DEP2,CRS2,"[{\'user_name\':\'anne\',\'local_role\':\'waeup.local.Lecturer\'}]",'
-'Not all parents do exist yet. Skipping\r\n'
+'Not all parents do exist yet.\r\n'
 )
 # Anne got a local role in course CRS2.
@@ -636,5 +636,5 @@
 'local_roles: user_name or local_role missing\r\n'
 'FAC11,DEP2,CRT2,"[{\'user_name\':\'anne\',\'local_role\':\'waeup.local.CourseAdviser100\'}]",'
-'Not all parents do exist yet. Skipping\r\n'
+'Not all parents do exist yet.\r\n'
 )
 # Anne got a local role in certificate CRT2.
@@ -726,6 +726,6 @@
 'faculty_code,course,level,department_code,certificate_code,'
 '--ERRORS--\r\nFAC1,CRS1,100,DEP1,CRT1,'
-'This object already exists. Skipping.\r\nFAC1,CRS1,100,DEP1,CRT2,'
-'Not all parents do exist yet. Skipping\r\n'
+'This object already exists.\r\nFAC1,CRS1,100,DEP1,CRT2,'
+'Not all parents do exist yet.\r\n'
 
 )
main/waeup.kofa/trunk/src/waeup/kofa/utils/batching.py
r12867 r12868
@@ -298,14 +298,55 @@
 1. An empty row is skipped.
 
-2. Empty strings are replaced by ignore-markers.
-
-3. The `BatchProcessor.checkConversion` method validates all
-   values in the row. If the validation fails
-
-4.
-
-5.
-
-6.
+2. Empty strings in the row are replaced by ignore-markers.
+
+3. The `BatchProcessor.checkConversion` method validates and converts
+   all values in the row. Conversion means the transformation of strings
+   into Python objects. For instance, number expressions have to be
+   transformed into integers, dates into datetime objects, phone number
+   expressions into phone number objects, etc. The converter returns a
+   dictionary with converted values or, if the validation of one of the
+   elements fails, an appropriate warning message. If the conversion
+   fails a pending record is created and stored in the pending data file
+   together with a warning message the converter has raised.
+
+4. In **create mode** only:
+
+   The parent object must be found and a child
+   object with same object id must not exist. Otherwise the row
+   is skipped, a corresponding warning message is raised and a
+   record is stored in the pending data file.
+
+   Now ``doImport`` tries to add the new object with the data
+   from the conversion dictionary. In some cases this
+   may fail and a DuplicationError is raised. For example, a new
+   payment ticket is created but the same payment for same session
+   has already been made. In this case the object id is unique, no
+   other object with same id exists, but making the 'same' payment
+   twice does not make sense. The import is skipped and a
+   record is stored in the pending data file.
+
+5. In **update mode** only:
+
+   If the object can't be found, the row is skipped,
+   a ``no such entry`` warning message is raised and a record is
+   stored in the pending data file.
+
+   The `BatchProcessor.checkUpdateRequirements` method checks additional
+   requirements the object must fulfill before being updated. These
+   requirements are not imposed by the data type but the context
+   of the object. For example, post-graduate students have a different
+   registration workflow. With this method we do forbid certain workflow
+   transitions or states.
+
+   Finally, ``doImport`` updates the existing object with the data
+   from the conversion dictionary.
+
+6. In **remove mode** only:
+
+   If the object can't be found, the row is skipped,
+   a ``no such entry`` warning message is raised and a record is
+   stored in the pending data file.
+
+   Finally, ``doImport`` removes the existing object.
 
 """
@@ -364,5 +405,5 @@
     self.writeFailedRow(
         failed_writer, string_row,
-        "Not all parents do exist yet. Skipping")
+        "Not all parents do exist yet.")
     continue
 if self.entryExists(row, site):
@@ -370,5 +411,5 @@
     self.writeFailedRow(
         failed_writer, string_row,
-        "This object already exists. Skipping.")
+        "This object already exists.")
     continue
 obj = self.callFactory()
@@ -382,12 +423,10 @@
         num_warns += 1
         self.writeFailedRow(
-            failed_writer, string_row,
-            "%s Skipping." % error.message)
+            failed_writer, string_row, error.message)
         continue
     except DuplicationError, error:
         num_warns += 1
         self.writeFailedRow(
-            failed_writer, string_row,
-            "%s Skipping." % error.msg)
+            failed_writer, string_row, error.msg)
         continue
 elif mode == 'remove':
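The create-mode steps listed in the new docstring can be sketched in a few lines of modern Python. This is a simplified illustration, not the `doImport` method itself: the function `import_create`, the `convert` callback and the keys `id` and `parent_id` are hypothetical, objects live in a plain dictionary instead of the ZODB, and the ignore-marker step is left out. The warning texts are the ones this changeset introduces (without the former "Skipping" suffix).

```python
import csv
import io


def import_create(csv_text, store, parent_ids, convert):
    """Try to create one object per CSV row; return (created_ids, pending_csv).

    `store` maps object ids to objects, `parent_ids` is the set of existing
    parents, `convert(row)` returns (errors, values). All names are
    assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pending = io.StringIO()
    writer = csv.DictWriter(
        pending,
        fieldnames=list(reader.fieldnames) + ["--ERRORS--"],
        lineterminator="\r\n",
    )
    writer.writeheader()
    created = []
    for row in reader:
        if not any(row.values()):
            continue  # an empty row is skipped
        errors, values = convert(row)  # validate and convert all values
        if errors:
            warning = "; ".join(errors)
        elif values["parent_id"] not in parent_ids:
            warning = "Not all parents do exist yet."
        elif values["id"] in store:
            warning = "This object already exists."
        else:
            store[values["id"]] = values  # add the new object
            created.append(values["id"])
            continue
        # The row could not be imported: keep it, with its warning, as pending.
        failed = dict(row)
        failed["--ERRORS--"] = warning
        writer.writerow(failed)
    return created, pending.getvalue()


def convert_department(row):
    """Toy converter: codes must be alphanumeric (an assumed rule)."""
    errors = [] if row["code"].isalnum() else ["code: Invalid input"]
    return errors, {"id": row["code"], "parent_id": row["faculty_code"],
                    "title": row["title"]}
```

Because the failed rows are written back with all original columns plus `--ERRORS--`, the pending output has the same shape as the pending files shown in the doctests of this changeset.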
main/waeup.kofa/trunk/src/waeup/kofa/utils/batching.txt
r9739 r12868
@@ -309,8 +309,8 @@
 >>> print open(result[3]).read()
 owner,name,taxpayer,dinoports,--ERRORS--
-Barney,Barneys Home,1,2,This object already exists. Skipping.
-Wilma,Wilmas Asylum,1,1,This object already exists. Skipping.
-Fred,Freds Dinoburgers,0,10,This object already exists. Skipping.
-Joey,Joeys Drive-in,0,110,This object already exists. Skipping.
+Barney,Barneys Home,1,2,This object already exists.
+Wilma,Wilmas Asylum,1,1,This object already exists.
+Fred,Freds Dinoburgers,0,10,This object already exists.
+Joey,Joeys Drive-in,0,110,This object already exists.
 
 This way we can correct the faulty entries and afterwards retry without
@@ -350,8 +350,8 @@
 >>> print open(result[3], 'rb').read()
 name,dinoports,--ERRORS--
-Barneys Home,2,This object already exists. Skipping.
-Wilmas Asylum,1,This object already exists. Skipping.
-Freds Dinoburgers,10,This object already exists. Skipping.
-Joeys Drive-in,110,This object already exists. Skipping.
+Barneys Home,2,This object already exists.
+Wilmas Asylum,1,This object already exists.
+Freds Dinoburgers,10,This object already exists.
+Joeys Drive-in,110,This object already exists.
 
 
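Pending files like the ones in these doctests carry the warning for each row in the trailing `--ERRORS--` column, and the test data in this changeset shows several warnings joined by `; `. A small helper for counting the warnings in such a file might look as follows. It is an illustrative sketch only: `summarize_pending` is a hypothetical name and not part of Kofa, and the separator is inferred from the test data rather than from a documented format.

```python
import csv
import io
from collections import Counter


def summarize_pending(pending_text, error_column="--ERRORS--"):
    """Count how often each warning occurs in the text of a pending file."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(pending_text)):
        # Several warnings for one row are assumed to be joined by '; '.
        for message in (row.get(error_column) or "").split("; "):
            if message:
                counts[message] += 1
    return counts
```

Such a summary can help to decide whether a pending file needs manual correction (invalid input) or merely has to wait until missing parent objects exist.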