Changeset 12867 for main


Timestamp:
20 Apr 2015, 17:32:55
Author:
Henrik Bettermann
Message:

Backup work in progress.

Location:
main/waeup.kofa/trunk
Files:
4 edited

  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import.rst

    r12863 r12867  
    44***********
    55
    6 .. contents:: Table of Contents
    7    :local:
     6The term 'Data Import' actually understates the range of functions importers really have. As already stated, many importers do more than just restore data once backed up by exporters; in other words, they do more than take values from CSV files and write them one-to-one into the database. The data undergo a complex staged processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The staged import process is described in the following sections.
     7
     81. File Upload
     9==============
     10
     11Users with permission
     12:py:class:`waeup.manageDataCenter<waeup.kofa.permissions.ManageDataCenter>`
     13are allowed to access the data center and also to use the upload page. On this page they can see a long table of available batch processors. The table lists required, optional and non-schema fields (see below) for each processor. It also provides a CSV file template which can be filled and uploaded to avoid header errors.
     14
    15Data center managers can upload any kind of CSV file from their local computer. The uploader does not check the integrity of the content, only the validity of its CSV encoding (see :py:func:`check_csv_charset<waeup.kofa.utils.helpers.check_csv_charset>`). It also checks the filename extension and allows only a limited number of files in the data center.
     16
     17.. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
     18
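A minimal sketch of such an upload-time check, assuming a simplified standalone helper rather than Kofa's actual ``check_csv_charset`` implementation, might look like this:

.. code-block:: python

   import csv

   ALLOWED_EXTENSIONS = ('.csv',)  # assumption: only CSV uploads are accepted

   def check_upload(path):
       """Reject files with a wrong extension and files whose content
       cannot be decoded and parsed as CSV. Returns an error message,
       or ``None`` if the file looks fine.
       """
       if not path.lower().endswith(ALLOWED_EXTENSIONS):
           return 'invalid filename extension'
       try:
           with open(path, 'rb') as source:
               sample = source.read(1024).decode('utf-8')  # assumption: UTF-8
           csv.Sniffer().sniff(sample)  # content must look like CSV
       except (UnicodeDecodeError, csv.Error):
           return 'not a valid CSV file'
       return None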
    19When the upload has succeeded, the uploader sends an email to all import managers of the portal (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) informing them that a new file has been uploaded.
     20
    21The uploader changes the filename. An uploaded file foo.csv will be stored as foo_USERNAME.csv, where USERNAME is the user id of the currently logged-in user. Spaces in filenames are replaced by underscores. Filenames of pending data remain unchanged (see below).
     22
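The renaming rule itself is simple; a sketch with a hypothetical helper name (not taken from the Kofa sources):

.. code-block:: python

   import os

   def stored_filename(filename, username):
       # foo.csv uploaded by user 'jdoe' is stored as foo_jdoe.csv;
       # spaces in the filename are replaced by underscores.
       base, ext = os.path.splitext(filename.replace(' ', '_'))
       return '%s_%s%s' % (base, username, ext)

   # stored_filename('my data.csv', 'jdoe')  ->  'my_data_jdoe.csv'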
     23After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (import step 1). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (import step 2). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user can proceed to the next step (import step 3) by selecting the appropriate processor and an import mode.
     24
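The duplicate-title check in import step 2 can be illustrated in a few lines (an illustrative sketch, not the actual Kofa code):

.. code-block:: python

   def duplicate_titles(header_row):
       # A header like ['id', 'name', 'id'] blocks the import.
       seen, duplicates = set(), []
       for title in header_row:
           if title in seen and title not in duplicates:
               duplicates.append(title)
           seen.add(title)
       return duplicates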
     252. File Header Validation
     26=========================
     27
    28Import step 3 is the stage where the file content is assessed for the first time and checked to see whether the column titles correspond with the fields of the chosen processor. The page shows the header and the first record of the uploaded file. The page allows the user to change column titles or to ignore entire columns during the import. It might have happened that one or more column titles are misspelled, or that the person who created the file ignored the case-sensitivity of field names. The data center manager can easily fix this by selecting the correct title and clicking the 'Set headerfields' button. Setting the header fields is temporary; it does not change the file itself.
     29
    30The page also calls the `checkHeaders` method of the batch processor, which checks for required fields. If a required column title is missing, a warning message is raised and the user cannot proceed to the next step.
     31
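In essence, ``checkHeaders`` has to verify that every required field of the chosen processor appears among the (possibly remapped) column titles. A minimal sketch, assuming the required fields are given as a plain list:

.. code-block:: python

   def check_headers(headerfields, required_fields):
       missing = [field for field in required_fields
                  if field not in headerfields]
       if missing:
           # Kofa shows a warning message instead of raising an exception.
           raise ValueError(
               'missing required column(s): %s' % ', '.join(missing))
       return True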
     323. Data Validation and Import
     33=============================
     34
    35Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then can it be processed by the batch processor.
     36
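The row-by-row strategy can be sketched as follows (illustrative only; the real implementation is ``BatchProcessor.doImport``, see below). Rows that cannot be imported yet are collected as pending records instead of aborting the whole import:

.. code-block:: python

   import csv

   def import_rows(path, process_row):
       finished, pending = [], []
       with open(path) as source:
           for row in csv.DictReader(source):
               success, message = process_row(row)
               if success:
                   finished.append(row)
               else:
                   # Assumption: failed rows keep their error message so
                   # they can be fixed and re-imported later.
                   row['--ERRORS--'] = message
                   pending.append(row)
       return finished, pending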
     37The core import method is:
     38
     39.. automethod:: waeup.kofa.utils.batching.BatchProcessor.doImport()
     40   :noindex:
     41
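Given the signature ``doImport(path, headerfields, mode='create', user='Unknown', logger=None, ignore_empty=True)`` shown in the source, a call might look like this (a hypothetical example; the processor instance and column names are made up):

.. code-block:: python

   # Hypothetical call; ``processor`` stands for any batch processor
   # instance, and the column names are illustrative only.
   processor.doImport(
       'students_jdoe.csv',
       headerfields=['student_id', 'firstname', 'lastname'],
       mode='create',
       user='jdoe')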
     42
     43
  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/intro.rst

    r12866 r12867  
    2424Administrators of web portals, which store their data in relational databases, are used to getting direct access to the portal's database. There are even tools to handle the administration of these databases over the Internet, like phpMyAdmin or phpPgAdmin for MySQL or PostgreSQL databases respectively. These user interfaces bypass the portals' user interfaces and give direct access to the database. They make it easy to import or export (dump) data tables, or the entire database structure, into CSV or SQL files. What at first sight appears to be very helpful and administration-friendly proves to be very dangerous on closer inspection. Data structures can easily be damaged or destroyed, and data can easily be manipulated by circumventing the portal's security machinery or logging system. Kofa does not provide any external user interface to access the ZODB_ directly, neither for viewing nor for editing data. This also includes the export and import of data sets. Exports and imports are handled via the Kofa user interface itself. This is called batch processing, which means either producing CSV files (comma-separated values) from portal data (export) or processing CSV files in order to add, update or remove portal data (import). The main premise of Kofa's batch processing technology is that the data stored in the ZODB_ can be specifically backed up and restored by exporting and importing data. But that's not all. Batch processors can do much more. They are an integral part of the student registration management.
    2525
     26.. note::
     27
     28  Although exporters are part of Kofa's batch processing module, we will not call them batch processors. Only importers are called batch processors. Exporters produce CSV files; importers process them.
     29
    2630
    2731.. _ZODB: http://www.zodb.org/
  • main/waeup.kofa/trunk/src/waeup/kofa/browser/templates/datacenteruploadpage.pt

    r11558 r12867  
    8484      <a i18n:translate="" class="btn btn-primary btn-xs"
    8585         tal:attributes="href python: 'skeleton?name=' + importer['name']">
    86          Download CSV Skeleton File
     86         Download CSV File Template
    8787      </a>
    8888    </td>
  • main/waeup.kofa/trunk/src/waeup/kofa/utils/batching.py

    r12861 r12867  
    284284    def doImport(self, path, headerfields, mode='create', user='Unknown',
    285285                 logger=None, ignore_empty=True):
    286         """Perform actual import.
     286        """In contrast to most other methods, ``doImport`` is not supposed to
     287        be customized, neither in custom packages nor in derived batch
     288        processor classes. Therefore, this is the only place where we
     289        do import data.
     290
     291        Before this method starts creating or updating persistent data, it
     292        prepares two more files in a temporary folder of the filesystem: (1)
     293        a file for pending data with file extension ``.pending`` and (2)
     294        a file for successfully processed data with file extension
     295        ``.finished``. Then the method starts iterating over all rows of
     296        the CSV file. Each row is treated as follows:
     297
     298        1. An empty row is skipped.
     299
     300        2. Empty strings are replaced by ignore-markers.
     301
     302        3. The `BatchProcessor.checkConversion` method validates all
     303           values in the row. If the validation fails, the row is appended to the pending data file together with a warning message.
     304
     305        4.
     306
     307        5.
     308
     309        6.
     310
    287311        """
    288312        time_start = time.time()