Timestamp:
21 Apr 2015, 20:56:58 (10 years ago)
Author:
Henrik Bettermann
Message:

More docs.

Location:
main/waeup.kofa/trunk
Files:
6 edited

  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import.rst

    r12867 r12868  
    44***********
    55
    6 The term 'Data Import' actually understates the range of functions importers really have. As already stated, many importers do not only restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged data processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The staged import process is described in the following.
     6Stages of Batch Processing
     7==========================
    78
    8 1. File Upload
    9 ==============
     9The term 'data import' actually understates the range of functions importers really have. As already stated, many importers do not only restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged data processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The stages of the import process are as follows.
     10
     11Stage 1: File Upload
     12--------------------
    1013
    1114Users with permission
     
    1720.. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
    1821
    19 When the upload succeeded the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal that a new file was uploaded.
     22If the upload succeeded the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal that a new file was uploaded.
    2023
    21 The uploader changes the filename. An uploaded file foo.csv will be stored as foo_USERNAME.csv where username is the user id of the currently logged in user. Spaces in filename are replaced by underscores. Pending data filenames remain unchanged (see below)
     24The uploader changes the filename. An uploaded file ``foo.csv`` will be stored as ``foo_USERNAME.csv`` where username is the user id of the currently logged in user. Spaces in filename are replaced by underscores. Pending data filenames remain unchanged (see below).
    2225
    23 After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (import step 1). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (import step 2). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user can proceed to the next step (import step 3) by selecting the appropriate processor and an import mode.
     26After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (**import step 1**). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (**import step 2**). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user is able to proceed to the next step (**import step 3**) by selecting the appropriate processor and an import mode. In import mode ``create`` new objects are added to the database, in ``update`` mode existing objects are modified and in ``remove`` mode they are deleted.
    2427
    25 2. File Header Validation
    26 =========================
     28Stage 2: File Header Validation
     29-------------------------------
    2730
    28 Import step 3 is the stage where the file content is assessed for the first time and checked if the column titles correspond with the fields of the processor chosen. The page shows the header and the first record of the uploaded file. The page allows to change column fields or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person, who created the file, ignored the case-sensitivity of field names. Then the data center manager can easily fix this by selecting the correct title and click the 'Set headerfields' button. Setting the header fields is temporary, it does not change the file itself.
     31Import step 3 is the stage where the file content is assessed for the first time and checked if the column titles correspond with the fields of the processor chosen. The page shows the header and the first record of the uploaded file. The page allows to change column titles or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person, who created the file, ignored the case-sensitivity of field names. Then the data import manager can easily fix this by selecting the correct title and click the 'Set headerfields' button. Setting the column titles is temporary, it does not modify the uploaded file. Consequently, it does not make sense to set new column titles if the file is not imported afterwards.
    2932
    30 The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step.
     33The page also calls the `checkHeaders` method of the batch processor which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step (**import step 4**).
    3134
    32 3. Data Validation and Import
    33 =============================
     35Stage 3: Data Validation and Import
     36-----------------------------------
     37
     38Import step 4 is the actual data import. The import is started by clicking the 'Perform import' button.
    3439
    3540Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then it can be edited by the batch processor.
     
    4045   :noindex:
    4146
     47Stage 4: Post-Processing
     48------------------------
    4249
     50The data import is finalized by calling :py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`. This method moves the ``.pending`` and ``.finished`` files from their temporary to their final location in the storage path of the filesystem from where they can be accessed through browser user interface.
    4351
     52Batch Processors
     53================
     54
     55All batch processors inherit their methods from the :py:class:`waeup.kofa.utils.batching.BatchProcessor` base class. The core ``doImport`` method always remains unchanged.
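
Reviewer sketch: the renaming rule documented under Stage 1 (``foo.csv`` is stored as ``foo_USERNAME.csv``, spaces become underscores) could look roughly like the following. The helper name ``stored_filename`` is hypothetical and not taken from Kofa. The sketch is written for Python 3 and does not cover pending data files, whose names the docs say remain unchanged.

```python
import os


def stored_filename(filename, username):
    """Hypothetical helper illustrating the documented renaming rule.

    This is not Kofa's uploader code, only a sketch of the behaviour
    described in the docs above.
    """
    # Spaces in the filename are replaced by underscores.
    cleaned = filename.replace(' ', '_')
    # The user id is inserted between the base name and the extension.
    base, ext = os.path.splitext(cleaned)
    return '%s_%s%s' % (base, username, ext)


print(stored_filename('foo.csv', 'bob'))
print(stored_filename('my data file.csv', 'anne'))
```

Under these assumptions the two calls yield ``foo_bob.csv`` and ``my_data_file_anne.csv``.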
  • main/waeup.kofa/trunk/src/waeup/kofa/browser/batchprocessing.txt

    r12439 r12868  
    574574    >>> print open(pending_file).read()
    575575    title_prefix,code,title,--ERRORS--
    576     faculty,FAC1,Faculty 1,This object already exists. Skipping.
     576    faculty,FAC1,Faculty 1,This object already exists.
    577577    faculty,FAC 5,Faculty 5,code: Invalid input
    578578
  • main/waeup.kofa/trunk/src/waeup/kofa/students/tests/test_batching.py

    r12811 r12868  
    548548            '2,Aaren,C123456,m,aa@aa.ng,1234,admitted,1990-01-04,Berson,mypw1,100000,matric_number: Invalid input\r\n'
    549549            '1,Frank,F123456,m,aa@aa.ng,1234,,1990-01-06,Meyer,,100000,reg_number: Invalid input; matric_number: Invalid input\r\n'
    550             '3,Uli,A123456,m,aa@aa.ng,1234,,1990-01-07,Schulz,,100002,This object already exists. Skipping.\r\n'
     550            '3,Uli,A123456,m,aa@aa.ng,1234,,1990-01-07,Schulz,,100002,This object already exists.\r\n'
    551551            )
    552552        shutil.rmtree(os.path.dirname(fin_file))
     
    891891        self.assertEqual(fail_file,
    892892            'reg_number,code,mandatory,level,level_session,score,matric_number,--ERRORS--\r\n'
    893             '1,COURSE1,,nonsense,,5,,Not all parents do exist yet. Skipping\r\n'
     893            '1,COURSE1,,nonsense,,5,,Not all parents do exist yet.\r\n'
    894894            '1,NONSENSE,,100,,5,,code: non-existent\r\n'
    895895            '1,COURSE1,,200,2004,6,,level_session: does not match 2008\r\n'
     
    11761176            '1,942,online,BTECHBDT,2010/11/26 19:59:33.744 GMT+1,0,'
    11771177            '19500,schoolfee,19500,2015,paid,'
    1178             'Same payment has already been made. Skipping.'
     1178            'Same payment has already been made.'
    11791179            in content)
    11801180        shutil.rmtree(os.path.dirname(fin_file))
  • main/waeup.kofa/trunk/src/waeup/kofa/university/tests/test_batching.py

    r11790 r12868  
    448448            'local_roles: user_name or local_role missing\r\n'
    449449            'FAC11,DEP2,"[{\'user_name\':\'anne\',\'local_role\':\'waeup.local.DepartmentManager\'}]",'
    450             'Not all parents do exist yet. Skipping\r\n'
     450            'Not all parents do exist yet.\r\n'
    451451            )
    452452        # Anne got a local role in department DEP2.
     
    537537            'local_roles: user_name or local_role missing\r\n'
    538538            'FAC11,DEP2,CRS2,"[{\'user_name\':\'anne\',\'local_role\':\'waeup.local.Lecturer\'}]",'
    539             'Not all parents do exist yet. Skipping\r\n'
     539            'Not all parents do exist yet.\r\n'
    540540            )
    541541        # Anne got a local role in course CRS2.
     
    636636            'local_roles: user_name or local_role missing\r\n'
    637637            'FAC11,DEP2,CRT2,"[{\'user_name\':\'anne\',\'local_role\':\'waeup.local.CourseAdviser100\'}]",'
    638             'Not all parents do exist yet. Skipping\r\n'
     638            'Not all parents do exist yet.\r\n'
    639639            )
    640640        # Anne got a local role in certificate CRT2.
     
    726726            'faculty_code,course,level,department_code,certificate_code,'
    727727            '--ERRORS--\r\nFAC1,CRS1,100,DEP1,CRT1,'
    728             'This object already exists. Skipping.\r\nFAC1,CRS1,100,DEP1,CRT2,'
    729             'Not all parents do exist yet. Skipping\r\n'
     728            'This object already exists.\r\nFAC1,CRS1,100,DEP1,CRT2,'
     729            'Not all parents do exist yet.\r\n'
    730730
    731731            )
  • main/waeup.kofa/trunk/src/waeup/kofa/utils/batching.py

    r12867 r12868  
    298298        1. An empty row is skipped.
    299299
    300         2. Empty strings are replaced by ignore-markers.
    301 
    302         3. The `BatchProcessor.checkConversion` method validates all
    303            values in the row. If the validation fails
    304 
    305         4.
    306 
    307         5.
    308 
    309         6.
     300        2. Empty strings in the row are replaced by ignore-markers.
     301
     302        3. The `BatchProcessor.checkConversion` method validates and converts
     303           all values in the row. Conversion means the transformation of strings
     304           into Python objects. For instance, number expressions have to be
     305           transformed into integers, dates into datetime objects, phone number
     306           expressions into phone number objects, etc. The converter returns a
     307           dictionary with converted values or, if the validation of one of the
     308           elements fails, an appropriate warning message. If the conversion
     309           fails a pending record is created and stored in the pending data file
     310           together with a warning message the converter has raised.
     311
     312        4. In **create mode** only:
     313
     314           The parent object must be found and a child
     315           object with same object id must not exist. Otherwise the row
     316           is skipped, a corresponding warning message is raised and a
     317           record is stored in the pending data file.
     318
     319           Now ``doImport`` tries to add the new object with the data
     320           from the conversion dictionary. In some cases this
     321           may fail and a DuplicationError is raised. For example, a new
     322           payment ticket is created but the same payment for same session
     323           has already been made. In this case the object id is unique, no
     324           other object with same id exists, but making the 'same' payment
     325           twice does not make sense. The import is skipped and a
     326           record is stored in the pending data file.
     327
     328        5. In **update mode** only:
     329
     330           If the object can't be found, the row is skipped,
     331           a ``no such entry`` warning message is raised and a record is
     332           stored in the pending data file.
     333
     334           The `BatchProcessor.checkUpdateRequirements` method checks additional
     335           requirements the object must fulfill before being updated. These
     336           requirements are not imposed by the data type but the context
     337           of the object. For example, post-graduate students have a different
     338           registration workflow. With this method we do forbid certain workflow
     339           transitions or states.
     340
     341           Finally, ``doImport`` updates the existing object with the data
     342           from the conversion dictionary.
     343
     344        6. In **remove mode** only:
     345
     346           If the object can't be found, the row is skipped,
     347           a ``no such entry`` warning message is raised and a record is
     348           stored in the pending data file.
     349
     350           Finally, ``doImport`` removes the existing object.
    310351
    311352        """
     
    364405                    self.writeFailedRow(
    365406                        failed_writer, string_row,
    366                         "Not all parents do exist yet. Skipping")
     407                        "Not all parents do exist yet.")
    367408                    continue
    368409                if self.entryExists(row, site):
     
    370411                    self.writeFailedRow(
    371412                        failed_writer, string_row,
    372                         "This object already exists. Skipping.")
     413                        "This object already exists.")
    373414                    continue
    374415                obj = self.callFactory()
     
    382423                    num_warns += 1
    383424                    self.writeFailedRow(
    384                         failed_writer, string_row,
    385                         "%s Skipping." % error.message)
     425                        failed_writer, string_row, error.message)
    386426                    continue
    387427                except DuplicationError, error:
    388428                    num_warns += 1
    389429                    self.writeFailedRow(
    390                         failed_writer, string_row,
    391                         "%s Skipping." % error.msg)
     430                        failed_writer, string_row, error.msg)
    392431                    continue
    393432            elif mode == 'remove':
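
Reviewer sketch: the create-mode stages listed in the new ``doImport`` docstring can be illustrated with a toy loop. This is not the ``BatchProcessor`` code. The function ``import_rows_create``, its callable arguments and the ``IGNORE_MARKER`` value are assumptions made for illustration, and the sketch targets Python 3. Only the two warning strings are taken from this changeset.

```python
IGNORE_MARKER = '<IGNORE>'  # placeholder value, assumed for this sketch


def import_rows_create(rows, convert, parents_exist, entry_exists, add):
    """Toy create-mode loop; every callable argument is a hypothetical stand-in."""
    num_ok = 0
    pending = []  # (original row, warning message) pairs
    for row in rows:
        # Stage 1: an empty row is skipped.
        if not any(row.values()):
            continue
        # Stage 2: empty strings in the row are replaced by ignore-markers.
        marked = {key: (value if value != '' else IGNORE_MARKER)
                  for key, value in row.items()}
        # Stage 3: validate and convert; a failure yields a pending record.
        errors, converted = convert(marked)
        if errors:
            pending.append((row, '; '.join(errors)))
            continue
        # Stage 4 (create mode): the parent must exist, the object must not.
        if not parents_exist(converted):
            pending.append((row, 'Not all parents do exist yet.'))
            continue
        if entry_exists(converted):
            pending.append((row, 'This object already exists.'))
            continue
        add(converted)
        num_ok += 1
    return num_ok, pending


# Usage with an in-memory "database" of faculties.
faculties = {'FAC1': 'Faculty 1'}


def convert(row):
    # Stand-in converter: a code containing a space is invalid.
    errors = []
    if ' ' in row['code']:
        errors.append('code: Invalid input')
    return errors, row


rows = [
    {'code': '', 'title': ''},
    {'code': 'FAC1', 'title': 'Faculty 1'},
    {'code': 'FAC 5', 'title': 'Faculty 5'},
    {'code': 'FAC2', 'title': 'Faculty 2'},
]
num_ok, pending = import_rows_create(
    rows, convert,
    parents_exist=lambda row: True,
    entry_exists=lambda row: row['code'] in faculties,
    add=lambda row: faculties.update({row['code']: row['title']}),
)
for row, message in pending:
    print(row['code'], message)
```

In this toy run one row is imported and two rows end up pending, mirroring the shape of the ``--ERRORS--`` column in the doctests touched by this changeset.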
  • main/waeup.kofa/trunk/src/waeup/kofa/utils/batching.txt

    r9739 r12868  
    309309    >>> print open(result[3]).read()
    310310    owner,name,taxpayer,dinoports,--ERRORS--
    311     Barney,Barneys Home,1,2,This object already exists. Skipping.
    312     Wilma,Wilmas Asylum,1,1,This object already exists. Skipping.
    313     Fred,Freds Dinoburgers,0,10,This object already exists. Skipping.
    314     Joey,Joeys Drive-in,0,110,This object already exists. Skipping.
     311    Barney,Barneys Home,1,2,This object already exists.
     312    Wilma,Wilmas Asylum,1,1,This object already exists.
     313    Fred,Freds Dinoburgers,0,10,This object already exists.
     314    Joey,Joeys Drive-in,0,110,This object already exists.
    315315
    316316This way we can correct the faulty entries and afterwards retry without
     
    350350    >>> print open(result[3], 'rb').read()
    351351    name,dinoports,--ERRORS--
    352     Barneys Home,2,This object already exists. Skipping.
    353     Wilmas Asylum,1,This object already exists. Skipping.
    354     Freds Dinoburgers,10,This object already exists. Skipping.
    355     Joeys Drive-in,110,This object already exists. Skipping.
     352    Barneys Home,2,This object already exists.
     353    Wilmas Asylum,1,This object already exists.
     354    Freds Dinoburgers,10,This object already exists.
     355    Joeys Drive-in,110,This object already exists.
    356356
    357357