Changeset 12871


Timestamp:
23 Apr 2015, 07:26:37
Author:
Henrik Bettermann
Message:

Wrap lines.

File:
1 edited

  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import_stages.rst

    r12870 r12871  
    4 4 **************************
    5 5
    6 The term 'data import' actually understates the range of functions importers really have. As already stated, many importers do not only restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged data processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The stages of the import process are as follows.
     6The term 'data import' actually understates the range of functions
     7importers really have. As already stated, many importers do not only
     8restore data once backed up by exporters or, in other words, take
     9values from CSV files and write them one-on-one into the database.
     10The data undergo a complex staged data processing algorithm.
     11Therefore, we prefer calling them 'batch processors' instead of
     12importers. The stages of the import process are as follows.
    7 13
    8 14 Stage 1: File Upload
     
    11 17 Users with permission
    12 18 :py:class:`waeup.manageDataCenter<waeup.kofa.permissions.ManageDataCenter>`
    13 are allowed to access the data center and also to use the upload page. On this page they can see a long table of available batch processors. The table lists required, optional and non-schema fields (see below) for each processor. It also provides a CSV file template which can be filled and uploaded to avoid header errors.
     19are allowed to access the data center and also to use the upload
     20page. On this page they can see a long table of available batch
     21processors. The table lists required, optional and non-schema fields
     22(see below) for each processor. It also provides a CSV file template
     23which can be filled and uploaded to avoid header errors.
    14 24
    15 Data center managers can upload any kind of CSV file from their local computer. The uploader does not check the integrity of the content but the validity of its CSV encoding (see :py:func:`check_csv_charset<waeup.kofa.utils.helpers.check_csv_charset>`). It also checks the filename extension and allows only a limited number of files in the data center.
     25Data center managers can upload any kind of CSV file from their
     26local computer. The uploader does not check the integrity of the
     27content but the validity of its CSV encoding (see
     28:py:func:`check_csv_charset<waeup.kofa.utils.helpers.check_csv_charset>`).
     29It also checks the filename extension and allows only a limited
     30number of files in the data center.
    16 31
    17 32 .. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
     33   :noindex:
    18 34
    19 If the upload succeeded the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal that a new file was uploaded.
     35If the upload succeeded the uploader sends an email to all import
     36managers (users with role
     37:py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`)
     38of the portal that a new file was uploaded.
    20 39
    21 The uploader changes the filename. An uploaded file ``foo.csv`` will be stored as ``foo_USERNAME.csv`` where username is the user id of the currently logged in user. Spaces in filename are replaced by underscores. Pending data filenames remain unchanged (see below).
     40The uploader changes the filename. An uploaded file ``foo.csv`` will
     41be stored as ``foo_USERNAME.csv`` where username is the user id of
     42the currently logged in user. Spaces in filename are replaced by
     43underscores. Pending data filenames remain unchanged (see below).
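
As an illustration only, the renaming rule described above could be sketched in Python as follows. The function name ``stored_filename`` is hypothetical and not part of Kofa, and the sketch does not model the exception for pending data files.

```python
import os


def stored_filename(uploaded_name, user_id):
    """Return the name an uploaded file would be stored under (sketch)."""
    base, ext = os.path.splitext(uploaded_name)
    # Spaces in the filename are replaced by underscores.
    base = base.replace(" ", "_")
    # The id of the logged-in user is appended before the extension.
    return "{}_{}{}".format(base, user_id, ext)


print(stored_filename("foo.csv", "bob"))        # foo_bob.csv
print(stored_filename("my data.csv", "alice"))  # my_data_alice.csv
```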
    22 44
    23 After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (**import step 1**). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (**import step 2**). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user is able to proceed to the next step (**import step 3**) by selecting the appropriate processor and an import mode. In import mode ``create`` new objects are added to the database, in ``update`` mode existing objects are modified and in ``remove`` mode they are deleted.
     45After file upload the data center manager can click the 'Process
     46data' button to open the page where files can be selected for import
     47(**import step 1**). After selecting a file the data center manager
     48can preview the header and the first three records of the uploaded
     49file (**import step 2**). If the preview fails or the header
     50contains duplicate column titles, an error message is raised. The
     51user cannot proceed but is requested to replace the uploaded file.
     52If the preview succeeds the user is able to proceed to the next step
     53(**import step 3**) by selecting the appropriate processor and an
     54import mode. In import mode ``create`` new objects are added to the
     55database, in ``update`` mode existing objects are modified and in
     56``remove`` mode they are deleted.
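
The preview in import step 2 can be pictured with the following minimal sketch, which uses only Python's standard ``csv`` module. The function ``preview_csv`` and the sample column titles are assumptions made for illustration; Kofa's actual preview code may work differently.

```python
import csv
import io
from itertools import islice


def preview_csv(csv_text, max_records=3):
    """Return the header and the first records, or raise ValueError."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        raise ValueError("no header row found")
    # Duplicate column titles block the import, as described above.
    duplicates = sorted(set(t for t in header if header.count(t) > 1))
    if duplicates:
        raise ValueError("duplicate column titles: " + ", ".join(duplicates))
    records = [dict(zip(header, row)) for row in islice(reader, max_records)]
    return header, records


# Hypothetical sample file with four records; only three are previewed.
sample = "student_id,level\nK1,100\nK2,200\nK3,100\nK4,300\n"
header, records = preview_csv(sample)
print(header)
print(len(records))
```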
    24 57
    25 58 Stage 2: File Header Validation
    26 59 ===============================
    27 60
    28 Import step 3 is the stage where the file content is assessed for the first time and checked if the column titles correspond with the fields of the processor chosen. The page shows the header and the first record of the uploaded file. The page allows changing column titles or ignoring entire columns during import. It might have happened that one or more column titles are misspelled or that the person who created the file ignored the case-sensitivity of field names. Then the data import manager can easily fix this by selecting the correct title and clicking the 'Set headerfields' button. Setting the column titles is temporary; it does not modify the uploaded file. Consequently, it does not make sense to set new column titles if the file is not imported afterwards.
     61Import step 3 is the stage where the file content is assessed for
     62the first time and checked if the column titles correspond with the
     63fields of the processor chosen. The page shows the header and the
     64first record of the uploaded file. The page allows changing column
     65titles or ignoring entire columns during import. It might have
     66happened that one or more column titles are misspelled or that the
     67person who created the file ignored the case-sensitivity of field
     68names. Then the data import manager can easily fix this by selecting
     69the correct title and clicking the 'Set headerfields' button. Setting
     70the column titles is temporary; it does not modify the uploaded
     71file. Consequently, it does not make sense to set new column titles
     72if the file is not imported afterwards.
    29 73
    30 The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step (**import step 4**).
     74The page also calls the `checkHeaders` method of the batch processor
     75which checks for required fields. If a required column title is
     76missing, a warning message is raised and the user can't proceed to
     77the next step (**import step 4**).
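
The required-field check can be illustrated with a short sketch. This is not the code of ``checkHeaders``; the helper ``missing_required_fields`` and the field names used below are hypothetical and only show the idea of a case-sensitive comparison between column titles and required fields.

```python
def missing_required_fields(header, required_fields):
    """Return the required field names that do not appear in the header."""
    present = set(header)
    # Field names are compared case-sensitively, so 'Level' does not
    # satisfy a required field named 'level'.
    return [name for name in required_fields if name not in present]


print(missing_required_fields(["student_id", "Level"], ["student_id", "level"]))
# prints ['level']
```

An empty result would mean that all required column titles are present and the user could proceed to the next step.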
    31 78
    32 79 Stage 3: Data Validation and Import
    33 80 ===================================
    34 81
    35 Import step 4 is the actual data import. The import is started by clicking the 'Perform import' button. This action requires the :py:class:`waeup.importData<waeup.kofa.permissions.ImportData>`. If data managers don't have this permission they will be redirected to the login page.
     82Import step 4 is the actual data import. The import is started by
     83clicking the 'Perform import' button. This action requires the
     84:py:class:`waeup.importData<waeup.kofa.permissions.ImportData>`
     85permission. If data managers don't have this permission, they will
     86be redirected to the login page.
    3687
    37 Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often  contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then it can be edited by the batch processor.
     88Kofa does not validate the data in advance. It tries to import the
     89data row-by-row while reading the CSV file. The reason is that
     90import files very often contain thousands or even tens of thousands of
     91records. It is not feasible for data managers to edit import files
     92until they are error-free. Very often such an error is not really a
     93mistake made by the person who compiled the file. Example: The
     94import file contains course results although the student has not yet
     95registered the courses. Then the import of this single record has to
     96wait, i.e. it has to be marked pending, until the student has added
     97the course ticket. Only then it can be edited by the batch processor.
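
The row-by-row approach can be sketched as follows. This is a simplified illustration under stated assumptions, not Kofa's import method: the function ``import_rows``, the ``error`` key, the sample field names and the ``registered`` set are all hypothetical.

```python
import csv
import io


def import_rows(csv_text, import_row):
    """Try to import each row; collect failures instead of aborting."""
    finished, pending = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            import_row(row)
        except ValueError as err:
            # Keep the row and the reason so it can be retried later.
            pending.append(dict(row, error=str(err)))
        else:
            finished.append(dict(row))
    return finished, pending


# Hypothetical data: only K1000000 has registered the course so far.
registered = {("K1000000", "MTH101")}


def add_course_result(row):
    if (row["student_id"], row["course"]) not in registered:
        raise ValueError("course ticket not found")


finished, pending = import_rows(
    "student_id,course,score\nK1000000,MTH101,65\nK1000001,MTH101,70\n",
    add_course_result,
)
print(len(finished), len(pending))  # 1 1
```

A single failing record ends up in the pending list while all other records are still imported, which matches the course result example given above.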
    38 98
    39 99 The core import method is:
     
    45 105 ========================
    46 106
    47 The data import is finalized by calling :py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`. This method moves the ``.pending`` and ``.finished`` files from their temporary to their final location in the storage path of the filesystem from where they can be accessed through browser user interface.
     107The data import is finalized by calling
     108:py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`.
     109This method moves the ``.pending`` and ``.finished`` files from
     110their temporary to their final location in the storage path of the
     111filesystem from where they can be accessed through browser user
     112interface.