Changeset 12871
- Timestamp: 23 Apr 2015, 07:26:37
- File: 1 edited
main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import_stages.rst
r12870 r12871

**************************

The term 'data import' actually understates the range of functions
importers really have. As already stated, many importers do not only
restore data once backed up by exporters or, in other words, take
values from CSV files and write them one-to-one into the database.
The data undergo a complex, staged processing algorithm. Therefore,
we prefer calling them 'batch processors' instead of importers. The
stages of the import process are as follows.

Stage 1: File Upload
… …

Users with permission
:py:class:`waeup.manageDataCenter<waeup.kofa.permissions.ManageDataCenter>`
are allowed to access the data center and also to use the upload
page. On this page they can see a long table of available batch
processors. The table lists required, optional and non-schema fields
(see below) for each processor. It also provides a CSV file template
which can be filled and uploaded to avoid header errors.

Data center managers can upload any kind of CSV file from their
local computer.
The uploader does not check the integrity of the content, only the
validity of its CSV encoding (see
:py:func:`check_csv_charset<waeup.kofa.utils.helpers.check_csv_charset>`).
It also checks the filename extension and allows only a limited
number of files in the data center.

.. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
   :noindex:

If the upload succeeds, the uploader sends an email to all import
managers (users with role
:py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`)
of the portal, informing them that a new file was uploaded.

The uploader changes the filename. An uploaded file ``foo.csv`` will
be stored as ``foo_USERNAME.csv`` where ``USERNAME`` is the user id
of the currently logged-in user. Spaces in the filename are replaced
by underscores. Pending data filenames remain unchanged (see below).
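The renaming rule can be illustrated with a minimal sketch. The helper
name ``stored_filename`` is hypothetical and not part of Kofa. Treating
the extension check as case-insensitive is an assumption, and pending
data files, whose names stay unchanged, are not modelled here.

```python
import os


def stored_filename(uploaded_name, user_id):
    """Return the name an uploaded CSV file would be stored under.

    Hypothetical helper for illustration only, not Kofa's own code.
    """
    base, ext = os.path.splitext(uploaded_name)
    # Assumption: only the '.csv' extension is accepted, in any letter case.
    if ext.lower() != ".csv":
        raise ValueError("only .csv files are accepted")
    # Spaces become underscores, then the user id is appended to the base name.
    return "%s_%s%s" % (base.replace(" ", "_"), user_id, ext)
```

With this sketch, ``stored_filename("my data.csv", "bob")`` yields
``my_data_bob.csv``.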
After file upload the data center manager can click the 'Process
data' button to open the page where files can be selected for import
(**import step 1**). After selecting a file the data center manager
can preview the header and the first three records of the uploaded
file (**import step 2**). If the preview fails or the header
contains duplicate column titles, an error message is shown. The
user cannot proceed but is requested to replace the uploaded file.
If the preview succeeds the user is able to proceed to the next step
(**import step 3**) by selecting the appropriate processor and an
import mode. In import mode ``create`` new objects are added to the
database, in ``update`` mode existing objects are modified and in
``remove`` mode they are deleted.

Stage 2: File Header Validation
===============================

Import step 3 is the stage where the file content is assessed for
the first time and checked as to whether the column titles correspond
with the fields of the processor chosen. The page shows the header
and the first record of the uploaded file. The page allows changing
column titles or ignoring entire columns during import. It might have
happened that one or more column titles are misspelled or that the
person who created the file ignored the case-sensitivity of field
names.
In that case the data import manager can easily fix this by selecting
the correct title and clicking the 'Set headerfields' button. Setting
the column titles is temporary; it does not modify the uploaded
file. Consequently, it does not make sense to set new column titles
if the file is not imported afterwards.

The page also calls the `checkHeaders` method of the batch processor
which checks for required fields. If a required column title is
missing, a warning message is shown and the user can't proceed to
the next step (**import step 4**).

Stage 3: Data Validation and Import
===================================

Import step 4 is the actual data import. The import is started by
clicking the 'Perform import' button. This action requires the
:py:class:`waeup.importData<waeup.kofa.permissions.ImportData>`
permission. If data managers don't have this permission, they will
be redirected to the login page.
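For illustration, the header checks described for import steps 2 and 3
(duplicate column titles, missing required fields, case-sensitive
names) might look roughly like the sketch below. The function
``check_header`` and the field names used with it are hypothetical. This
is not the batch processor's actual `checkHeaders` implementation.

```python
import csv
import io


def check_header(csv_text, required_fields):
    """Return a list of problems found in the first line of a CSV text.

    Hypothetical helper for illustration only.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    # A column title that occurs more than once is reported as a duplicate.
    duplicates = sorted({name for name in header if header.count(name) > 1})
    if duplicates:
        problems.append("duplicate column titles: " + ", ".join(duplicates))
    # Field names are compared case-sensitively.
    missing = [name for name in required_fields if name not in header]
    if missing:
        problems.append("missing required fields: " + ", ".join(missing))
    return problems
```

An empty result means the header would pass both checks. A title such
as ``Student_ID`` would not satisfy a required field ``student_id``,
because the comparison is case-sensitive.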
Kofa does not validate the data in advance. It tries to import the
data row-by-row while reading the CSV file. The reason is that
import files very often contain thousands or even tens of thousands
of records. It is not feasible for data managers to edit import files
until they are error-free. Very often such an error is not really a
mistake made by the person who compiled the file. Example: The
import file contains course results although the student has not yet
registered for the courses. Then the import of this single record has
to wait, i.e. it has to be marked pending, until the student has
added the course ticket. Only then can it be edited by the batch
processor.

The core import method is:

… …

========================

The data import is finalized by calling
:py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`.
This method moves the ``.pending`` and ``.finished`` files from
their temporary to their final location in the storage path of the
filesystem, from where they can be accessed through the browser user
interface.
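The row-by-row approach of Stage 3, which separates imported records
from pending ones, can be sketched as follows. This is a simplified
illustration under stated assumptions, not Kofa's core import method:
the function ``import_rows``, the ``import_row`` callback, the use of
``ValueError`` to signal a failed row and the ``error`` key are all
hypothetical.

```python
import csv
import io


def import_rows(csv_text, import_row):
    """Try to import every row; return (finished_rows, pending_rows).

    Hypothetical sketch: import_row is a callable that raises ValueError
    when a single record cannot be imported yet.
    """
    finished, pending = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            import_row(row)
        except ValueError as err:
            # Keep the failed row together with the reason for a later retry.
            pending.append(dict(row, error=str(err)))
        else:
            finished.append(row)
    return finished, pending
```

A failing row does not stop the run. It ends up in the pending list and
can be imported again once the missing prerequisite, such as a course
ticket, exists.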