Changeset 12867


Timestamp:
20 Apr 2015, 17:32:55
Author:
Henrik Bettermann
Message:

Backup work in progress.

Location:
main/waeup.kofa/trunk
Files:
4 edited

  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import.rst

    r12863 r12867  
    44***********
    55
    6 .. contents:: Table of Contents
    7    :local:
     6The term 'Data Import' understates what importers actually do. As already stated, many importers do more than restore data previously backed up by exporters or, in other words, take values from CSV files and write them one-to-one into the database. The data pass through a complex, staged processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The staged import process is described below.
     7
     81. File Upload
     9==============
     10
     11Users with permission
     12:py:class:`waeup.manageDataCenter<waeup.kofa.permissions.ManageDataCenter>`
     13are allowed to access the data center and also to use the upload page. On this page they can see a long table of available batch processors. The table lists required, optional and non-schema fields (see below) for each processor. It also provides a CSV file template which can be filled and uploaded to avoid header errors.
     14
     15Data center managers can upload any kind of CSV file from their local computer. The uploader does not check the integrity of the content, only the validity of its CSV encoding (see :py:func:`check_csv_charset<waeup.kofa.utils.helpers.check_csv_charset>`). It also checks the filename extension and allows only a limited number of files in the data center.
     16
     17.. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
     18
     19When the upload succeeds, the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal, notifying them that a new file was uploaded.
     20
     21The uploader changes the filename. An uploaded file foo.csv will be stored as foo_USERNAME.csv, where USERNAME is the user id of the currently logged-in user. Spaces in the filename are replaced by underscores. Pending data filenames remain unchanged (see below).
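
The renaming rule can be illustrated with a minimal sketch. The function name ``stored_filename`` is hypothetical and not part of Kofa; the sketch only mirrors the rule as described above (user id appended to the base name, spaces replaced by underscores) and does not cover the special handling of pending data files.

```python
import os


def stored_filename(original_name, user_id):
    """Return the name an uploaded file would be stored under.

    Hypothetical helper: spaces become underscores and the user id
    is appended to the base name, in front of the extension.
    """
    base, ext = os.path.splitext(original_name.replace(' ', '_'))
    return '%s_%s%s' % (base, user_id, ext)


print(stored_filename('foo.csv', 'bob'))
print(stored_filename('course results.csv', 'bob'))
```

With the user id ``bob``, the two calls yield ``foo_bob.csv`` and ``course_results_bob.csv``.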
     22
     23After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (import step 1). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (import step 2). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user can proceed to the next step (import step 3) by selecting the appropriate processor and an import mode.
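
The preview step (import step 2) might look roughly like the following sketch. It is not Kofa's implementation: the function name ``preview`` and the use of ``ValueError`` are assumptions made for illustration, and the code targets Python 3. It reads the header and up to three records and rejects headers with duplicate column titles.

```python
import csv
import io


def preview(csv_text, max_records=3):
    """Return (header, first records) or raise ValueError.

    Hypothetical helper illustrating the checks described above.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError('File is empty')
    # Collect every title that occurs more than once in the header.
    duplicates = sorted({t for t in header if header.count(t) > 1})
    if duplicates:
        raise ValueError(
            'Duplicate column titles: %s' % ', '.join(duplicates))
    # zip() stops after max_records rows without reading the whole file.
    records = [row for _, row in zip(range(max_records), reader)]
    return header, records


header, records = preview('a,b\n1,2\n3,4\n5,6\n7,8\n')
print(header, len(records))
```

Only if this step succeeds would the user be offered the choice of processor and import mode.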
     24
     252. File Header Validation
     26=========================
     27
     28Import step 3 is the stage where the file content is assessed for the first time: the column titles are checked against the fields of the chosen processor. The page shows the header and the first record of the uploaded file. It allows the user to change column fields or to ignore entire columns during import. One or more column titles may be misspelled, or the person who created the file may have ignored the case-sensitivity of field names. The data center manager can then easily fix this by selecting the correct title and clicking the 'Set headerfields' button. Setting the header fields is temporary; it does not change the file itself.
     29
     30The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step.
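
A check for required fields can be sketched as follows. This does not reproduce the actual `checkHeaders` method, whose signature is not shown here; ``missing_required_fields`` and the field names are hypothetical. Note that the comparison is case-sensitive, which matches the remark about field names above.

```python
def missing_required_fields(headerfields, required_fields):
    """Return the required field names absent from the header.

    Hypothetical helper; the comparison is case-sensitive.
    """
    present = set(headerfields)
    return [name for name in required_fields if name not in present]


# 'Level' does not satisfy the required field 'level'.
missing = missing_required_fields(
    ['student_id', 'Level'], ['student_id', 'level'])
print(missing)
```

A non-empty result corresponds to the situation in which the user cannot proceed to the next step.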
     31
     323. Data Validation and Import
     33=============================
     34
     35Kofa does not validate the data in advance. It tries to import the data row by row while reading the CSV file. The reason is that import files very often contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then can it be edited by the batch processor.
     36
     37The core import method is:
     38
     39.. automethod:: waeup.kofa.utils.batching.BatchProcessor.doImport()
     40   :noindex:
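
The row-by-row strategy can be illustrated with a simplified sketch. It is not the ``doImport`` implementation: ``process_rows`` and ``to_result`` are hypothetical names, failures are signalled with ``ValueError`` by assumption, and the results are kept in in-memory lists rather than written to files. The code targets Python 3.

```python
import csv
import io


def process_rows(csv_text, convert):
    """Process rows one by one; failing rows are marked pending.

    Hypothetical sketch: ``convert`` raises ValueError for rows
    that cannot be imported yet.
    """
    finished, pending = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not any(row.values()):
            continue  # skip rows without any values
        try:
            finished.append(convert(row))
        except ValueError as err:
            # Keep the row and the reason so it can be retried later.
            pending.append((row, str(err)))
    return finished, pending


def to_result(row):
    # int() raises ValueError if the score is not a number.
    return {'student_id': row['student_id'], 'score': int(row['score'])}


data = 'student_id,score\nA1,70\n,\nB2,n/a\n'
finished, pending = process_rows(data, to_result)
print(len(finished), len(pending))
```

In this example the row for ``A1`` is processed, the empty row is skipped, and the row for ``B2`` ends up in the pending list because its score cannot be converted.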
     41
     42
     43
  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/intro.rst

    r12866 r12867  
    2424Administrators of web portals, which store their data in relational databases, are used to getting direct access to the portal's database. There are even tools to handle the administration of these databases over the Internet, like phpMyAdmin or phpPgAdmin to handle MySQL or PostgreSQL databases respectively. These user interfaces bypass the portals' user interfaces and give direct access to the database. They make it easy to import or export (dump) data tables or the entire database structure into CSV or SQL files. What at first sight appears to be very helpful and administration-friendly proves to be very dangerous on closer inspection. Data structures can be easily damaged or destroyed, or data can be easily manipulated by circumventing the portal's security machinery or logging system. Kofa does not provide any external user interface to access the ZODB_ directly, neither for viewing nor for editing data. This also includes the export and import of sets of data. Exports and imports are handled via the Kofa user interface itself. This is called batch processing, which means either producing CSV files (comma-separated values) from portal data (export) or processing CSV files in order to add, update or remove portal data (import). The main premise of Kofa's batch processing technology is that the data stored in the ZODB_ can be specifically backed up and restored by exporting and importing data. But that's not all. Batch processors can do much more. They are an integral part of the student registration management.
    2525
     26.. note::
     27
     28  Although exporters are part of Kofa's batch processing module, we will not call them batch processors. Only importers are called batch processors. Exporters produce CSV files, importers process them.
     29
    2630
    2731.. _ZODB: http://www.zodb.org/
  • main/waeup.kofa/trunk/src/waeup/kofa/browser/templates/datacenteruploadpage.pt

    r11558 r12867  
    8484      <a i18n:translate="" class="btn btn-primary btn-xs"
    8585         tal:attributes="href python: 'skeleton?name=' + importer['name']">
    86          Download CSV Skeleton File
     86         Download CSV File Template
    8787      </a>
    8888    </td>
  • main/waeup.kofa/trunk/src/waeup/kofa/utils/batching.py

    r12861 r12867  
    284284    def doImport(self, path, headerfields, mode='create', user='Unknown',
    285285                 logger=None, ignore_empty=True):
    286         """Perform actual import.
     286        """In contrast to most other methods, ``doImport`` is not supposed to
     287        be customized, neither in custom packages nor in derived batch
     288        processor classes. Therefore, this is the only place where we
     289        do import data.
     290
     291        Before this method starts creating or updating persistent data, it
     292        prepares two more files in a temporary folder of the filesystem: (1)
     293        a file for pending data with file extension ``.pending`` and (2)
     294        a file for successfully processed data with file extension
     295        ``.finished``. Then the method starts iterating over all rows of
     296        the CSV file. Each row is treated as follows:
     297
     298        1. An empty row is skipped.
     299
     300        2. Empty strings are replaced by ignore-markers.
     301
     302        3. The `BatchProcessor.checkConversion` method validates all
     303           values in the row. If the validation fails
     304
     305        4.
     306
     307        5.
     308
     309        6.
     310
    287311        """
    288312        time_start = time.time()