Changeset r12868, 21 Apr 2015, by Henrik Bettermann: "More docs."
Edited file: main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import.rst

***********

Stages of Batch Processing
==========================

The term 'data import' actually understates the range of functions importers really have. As already stated, many importers do more than restore data once backed up by exporters, i.e. take values from CSV files and write them one-to-one into the database. The data undergo a complex, staged processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The stages of the import process are as follows.

Stage 1: File Upload
--------------------

Users with permission

.. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files

If the upload succeeds, the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal, notifying them that a new file was uploaded.

The uploader changes the filename. An uploaded file ``foo.csv`` will be stored as ``foo_USERNAME.csv``, where ``USERNAME`` is the user id of the currently logged-in user. Spaces in the filename are replaced by underscores. Pending data filenames remain unchanged (see below).

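The renaming rule can be sketched as follows; this is an illustrative stand-in (the function name and details are assumptions, not the actual Kofa uploader code):

```python
import os

def upload_filename(original, username):
    """Return the storage name for an uploaded file.

    Sketch of the renaming rule described above: ``foo.csv``
    uploaded by user ``bob`` is stored as ``foo_bob.csv``;
    spaces in the filename are replaced by underscores.
    """
    stem, ext = os.path.splitext(original)
    stem = stem.replace(' ', '_')
    return f"{stem}_{username}{ext}"
```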
After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (**import step 1**). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (**import step 2**). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user can proceed to the next step (**import step 3**) by selecting the appropriate processor and an import mode. In import mode ``create`` new objects are added to the database, in ``update`` mode existing objects are modified and in ``remove`` mode deleted.

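The three import modes can be illustrated with a toy dict-backed store (names and error handling are assumptions for illustration, not Kofa's actual ``doImport`` logic):

```python
def apply_row(store, key, row, mode):
    """Apply one CSV row to a simple dict-backed store.

    Toy model of the three import modes: ``create`` adds a new
    object, ``update`` modifies an existing one, ``remove``
    deletes it.
    """
    if mode == 'create':
        if key in store:
            raise KeyError(f'object already exists: {key}')
        store[key] = dict(row)
    elif mode == 'update':
        store[key].update(row)
    elif mode == 'remove':
        del store[key]
    else:
        raise ValueError(f'unknown import mode: {mode}')
```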
Stage 2: File Header Validation
-------------------------------

Import step 3 is the stage where the file content is assessed for the first time and checked whether the column titles correspond with the fields of the chosen processor. The page shows the header and the first record of the uploaded file. The page allows the user to change column titles or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person who created the file ignored the case-sensitivity of field names. The data import manager can easily fix this by selecting the correct title and clicking the 'Set headerfields' button. Setting the column titles is temporary; it does not modify the uploaded file. Consequently, it does not make sense to set new column titles if the file is not imported afterwards.

The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step (**import step 4**).

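A hypothetical stand-in for such a header check might look like this (the name ``check_headers`` and the return shape are assumptions, not the real `checkHeaders` signature):

```python
def check_headers(headers, required, available):
    """Validate CSV column titles against a processor's schema.

    Collects problems instead of raising, so all issues can be
    reported at once: duplicate titles, titles unknown to the
    processor, and missing required fields.
    """
    problems = []
    seen = set()
    for title in headers:
        if title in seen:
            problems.append(f'duplicate column: {title}')
        seen.add(title)
        if title not in available:
            problems.append(f'unknown column: {title}')
    for field in required:
        if field not in headers:
            problems.append(f'missing required column: {field}')
    return problems
```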
Stage 3: Data Validation and Import
-----------------------------------

Import step 4 is the actual data import. The import is started by clicking the 'Perform import' button.

Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: the import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then can it be edited by the batch processor.
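The row-by-row strategy, importing what can be imported and collecting failing rows as pending instead of aborting the whole run, can be sketched as follows (a simplified model with assumed names, not Kofa's actual importer):

```python
import csv
import io

def import_rows(csv_text, handler):
    """Process a CSV file row-by-row.

    Each row is passed to ``handler``; rows the handler rejects
    are collected as pending together with the error message, so
    one bad record never aborts the remaining import.
    """
    finished, pending = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            handler(row)
        except Exception as err:
            pending.append((row, str(err)))
        else:
            finished.append(row)
    return finished, pending
```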

   :noindex:

Stage 4: Post-Processing
------------------------

The data import is finalized by calling :py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`. This method moves the ``.pending`` and ``.finished`` files from their temporary to their final location in the storage path of the filesystem, from where they can be accessed through the browser user interface.

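A minimal sketch of this move step, assuming plain directories rather than Kofa's data center storage (function and directory names are illustrative):

```python
import shutil
from pathlib import Path

def dist_processed_files(tmp_dir, storage_dir):
    """Move ``.pending`` and ``.finished`` report files from a
    temporary directory to their final storage location, returning
    the names of the files that were moved."""
    storage = Path(storage_dir)
    storage.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in Path(tmp_dir).iterdir():
        if path.suffix in ('.pending', '.finished'):
            shutil.move(str(path), storage / path.name)
            moved.append(path.name)
    return sorted(moved)
```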
Batch Processors
================

All batch processors inherit their methods from the :py:class:`waeup.kofa.utils.batching.BatchProcessor` base class. The core ``doImport`` method always remains unchanged.