Changeset 12870


Timestamp:
23 Apr 2015, 06:45:54 (10 years ago)
Author:
Henrik Bettermann
Message:

Restructure the import data section.

Location:
main/waeup.kofa/trunk
Files:
2 added
4 edited

  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/export.rst

    r12865 r12870  
    44***********
    55
    6 Regular data exporters (1) collect objects from specific containers, (2) iterate over the collected objects, (3) extract and mangle information from each object, (4) write the information of each object into a row of a CSV file and (5) finally provide the file for download. The CSV file is neither stored in the database nor archived in the filesystem. Steps (3) and (4) mean a flattening of the hierarchical data structure, i.e. a mapping of objects to flat relational data to be stored in a CSV table. The extracted information need not be based only on static attributes of the collected object. The data finally stored in the CSV file can also be derived from parent or child objects, or dynamically computed by the object's methods and property attributes. These methods and properties can retrieve information from everywhere in the portal's database.
    7 
    8 In the following we list all exporter classes including two attributes and a method description. The `fields` attribute contains the column titles of the export file. These are not necessarily only attributes of the exported objects.
     6Regular data exporters (1) collect objects from specific containers,
     7(2) iterate over the collected objects, (3) extract and mangle
     8information from each object, (4) write the information of each object
     9into a row of a CSV file and (5) finally provide the file for
     10download. The CSV file is neither stored in the database nor archived
     11in the filesystem. Steps (3) and (4) mean a flattening of the hierarchical
     12data structure, i.e. a mapping of objects to flat relational data to
     13be stored in a CSV table. The extracted information need not be
     14based only on static attributes of the collected object. The data
     15finally stored in the CSV file can also be derived
     16from parent or child objects, or dynamically computed by the object's
     17methods and property attributes. These methods and properties can
     18retrieve information from everywhere in the portal's database. In the
     19following we list all exporter classes including two attributes and a
     20method description. The `fields` attribute contains the column titles
     21of the export file. These are not necessarily only attributes of the
     22exported objects.
    923
    1024.. note::
    1125
    12   The list of exported columns is usually subject to heavy customization. In the Kofa base package only very few columns are exported. In some Kofa custom packages large amounts of data are gathered from applicants and students, and the number of columns increases accordingly.
    13 
    14 The `title` attribute specifies the name under which the exporter will be displayed in the user interface. The `mangle_value()` method shows how some of the fields are being dynamically computed.
     26The list of exported columns is usually subject to heavy
     27customization. In the Kofa base package only very few columns are
     28exported. In some Kofa custom packages large amounts of data are
     29gathered from applicants and students, and the number of columns increases accordingly.
     30
     31The `title` attribute specifies the name under which the exporter
     32will be displayed in the user interface. The
     33`mangle_value()` method shows how some of the fields are being
     34dynamically computed.
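To make `fields`, `title` and `mangle_value()` more concrete, here is a minimal, self-contained sketch of the exporter pattern described above. All names, the attributes of the exported objects and the CSV layout are invented for this example; they do not show Kofa's actual implementation:

.. code-block:: python

   import csv

   class StudentExporterSketch:
       """Hypothetical exporter illustrating steps (1)-(5) above.

       Assumes exported objects carry `student_id`, `faculty` and
       `scores` attributes.
       """

       #: Column titles of the export file; not necessarily plain
       #: attributes of the exported objects.
       fields = ('student_id', 'faculty', 'average_score')

       #: Name under which the exporter is displayed in the UI.
       title = 'Students'

       def mangle_value(self, value, name, context=None):
           # 'average_score' is no static attribute; it is computed
           # dynamically from child data of the exported object.
           if name == 'average_score':
               return sum(context.scores) / len(context.scores)
           return value

       def export(self, objects, filepath):
           with open(filepath, 'w', newline='') as f:
               writer = csv.DictWriter(f, fieldnames=self.fields)
               writer.writeheader()
               for obj in objects:  # (2) iterate over collected objects
                   writer.writerow({  # (4) one CSV row per object
                       name: self.mangle_value(
                           getattr(obj, name, None), name, context=obj)
                       for name in self.fields})
           return filepath  # (5) offered for download, never archived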
    1535
    1636Regular Exporters
     
    139159======================
    140160
    141 When starting a Student Data Exporter in the Data Center all student records will be taken into consideration, no matter what or where a student is studying. The exporter can also be started 'locally' at various levels in the academic section. Starting one of the exporters e.g. at faculty or department level means that only the data of students who study in this faculty or department are exported. The exporter can also be started at certificate level. Then only the data of students who are studying the named study course will be taken into account. At course level only the data of those students who have attended or taken this specific course are exported.
    142 
    143 Student Data Exporters can be further configured through a configuration page. Search parameters like the student's current level, current session and current study mode can be set to filter sets of students in order to decrease the size of the export file. The set of filter parameters varies and depends on the 'location' from where the exporter is called. A completely different set of filter parameters is provided for courses. In this case the session and level in which the course was taken by the student can be selected.
     161When starting a Student Data Exporter in the Data Center all student
     162records will be taken into consideration, no matter what or where a
     163student is studying. The exporter can also be started 'locally' at
     164various levels in the academic section. Starting one of the exporters
     165e.g. at faculty or department level means that only the data of
     166students who study in this faculty or department are exported.
     167The exporter can also be started at certificate level.
     168Then only the data of students who are studying the named study
     169course will be taken into account. At course level only the data of
     170those students who have attended or taken this specific course are
     171exported.
     172
     173Student Data Exporters can be further configured through a
     174configuration page. Search parameters like the student's current level,
     175current session and current study mode can be set to filter sets of
     176students in order to decrease the size of the export file. The set of
     177filter parameters varies and depends on the 'location' from where
     178the exporter is called. A completely different set of filter
     179parameters is provided for courses. In this case the session and level
     180in which the course was taken by the student can be selected.
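As an illustration of such filtering, the sketch below narrows a set of student objects by the parameters named above before export. Parameter and attribute names are assumptions for this example, not Kofa's API:

.. code-block:: python

   def filter_students(students, current_level=None,
                       current_session=None, current_mode=None):
       """Yield only students matching all given filter parameters."""
       for student in students:
           if current_level is not None \
                   and student.current_level != current_level:
               continue
           if current_session is not None \
                   and student.current_session != current_session:
               continue
           if current_mode is not None \
                   and student.current_mode != current_mode:
               continue
           yield student

   # e.g. export only level 100 students of session 2014:
   # exporter.export(filter_students(students, current_level=100,
   #                                 current_session=2014), 'out.csv')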
    144181
    145182Student Exporter
     
    203240  .. automethod:: waeup.kofa.students.export.BedTicketExporter.mangle_value()
    204241
    205 The above exporters refer to a specific content type (object class). They export all attributes of these objects and a few additional parameters derived from the parent objects. These exporters can be used for reimport, or more precisely for backing up and restoring data. The following 'special' exporters were created at the request of some universities to collect and compose student data for analysis and postprocessing by the university.
     242The above exporters refer to a specific content type (object class).
     243They export all attributes of these objects and a few additional
     244parameters derived from the parent objects. These exporters can be
     245used for reimport, or more precisely for backing up and restoring
     246data. The following 'special' exporters were created at the request
     247of some universities to collect and compose student data for analysis
     248and postprocessing by the university.
    206249
    207250DataForBursary Exporter
  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/import.rst

    r12869 r12870  
    44***********
    55
    6 Stages of Batch Processing
    7 ==========================
     6.. toctree::
     7   :maxdepth: 3
    88
    9 The term 'data import' actually understates the range of functions importers really have. As already stated, many importers do not merely restore data once backed up by exporters or, in other words, take values from CSV files and write them one-on-one into the database. The data undergo a complex staged processing algorithm. Therefore, we prefer calling them 'batch processors' instead of importers. The stages of the import process are as follows.
     9   import_stages
     10   import_processors
    1011
    11 Stage 1: File Upload
    12 --------------------
    13 
    14 Users with permission
    15 :py:class:`waeup.manageDataCenter<waeup.kofa.permissions.ManageDataCenter>`
    16 are allowed to access the data center and also to use the upload page. On this page they can see a long table of available batch processors. The table lists required, optional and non-schema fields (see below) for each processor. It also provides a CSV file template which can be filled and uploaded to avoid header errors.
    17 
    18 Data center managers can upload any kind of CSV file from their local computer. The uploader does not check the integrity of the content, only the validity of its CSV encoding (see :py:func:`check_csv_charset<waeup.kofa.utils.helpers.check_csv_charset>`). It also checks the filename extension and allows only a limited number of files in the data center.
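The spirit of such an encoding check, as a rough sketch; the real `check_csv_charset` helper may behave differently:

.. code-block:: python

   import csv

   def check_csv_readable(filepath):
       """Return an error message if the file is not readable as CSV."""
       try:
           with open(filepath, encoding='utf-8', newline='') as f:
               for _row in csv.reader(f):
                   pass
       except (UnicodeDecodeError, csv.Error) as error:
           return str(error)
       return None  # file looks fine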
    19 
    20 .. autoattribute:: waeup.kofa.browser.pages.DatacenterUploadPage.max_files
    21 
    22 If the upload succeeds, the uploader sends an email to all import managers (users with role :py:class:`waeup.ImportManager<waeup.kofa.permissions.ImportManager>`) of the portal, informing them that a new file was uploaded.
    23 
    24 The uploader changes the filename. An uploaded file ``foo.csv`` will be stored as ``foo_USERNAME.csv`` where USERNAME is the user id of the currently logged-in user. Spaces in the filename are replaced by underscores. Pending data filenames remain unchanged (see below).
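A minimal sketch of this renaming rule; the function name is made up for illustration:

.. code-block:: python

   def stored_filename(filename, username):
       """foo.csv uploaded by 'bob' becomes foo_bob.csv."""
       base, _dot, ext = filename.rpartition('.')
       return ('%s_%s.%s' % (base, username, ext)).replace(' ', '_')

   assert stored_filename('my data.csv', 'admin') == 'my_data_admin.csv'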
    25 
    26 After file upload the data center manager can click the 'Process data' button to open the page where files can be selected for import (**import step 1**). After selecting a file the data center manager can preview the header and the first three records of the uploaded file (**import step 2**). If the preview fails or the header contains duplicate column titles, an error message is raised. The user cannot proceed but is requested to replace the uploaded file. If the preview succeeds the user is able to proceed to the next step (**import step 3**) by selecting the appropriate processor and an import mode. In import mode ``create`` new objects are added to the database, in ``update`` mode existing objects are modified and in ``remove`` mode they are deleted.
    27 
    28 Stage 2: File Header Validation
    29 -------------------------------
    30 
    31 Import step 3 is the stage where the file content is assessed for the first time and checked whether the column titles correspond with the fields of the chosen processor. The page shows the header and the first record of the uploaded file. The page allows the user to change column titles or to ignore entire columns during import. It might have happened that one or more column titles are misspelled or that the person who created the file ignored the case-sensitivity of field names. Then the data import manager can easily fix this by selecting the correct title and clicking the 'Set headerfields' button. Setting the column titles is temporary; it does not modify the uploaded file. Consequently, it does not make sense to set new column titles if the file is not imported afterwards.
    32 
    33 The page also calls the `checkHeaders` method of the batch processor, which checks for required fields. If a required column title is missing, a warning message is raised and the user can't proceed to the next step (**import step 4**).
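The gist of such a header check, as a hedged sketch; the real `checkHeaders` may differ in its details:

.. code-block:: python

   def check_headers(headerfields, required_fields):
       """Fail early if a required column title is missing."""
       missing = [field for field in required_fields
                  if field not in headerfields]
       if missing:
           raise ValueError(
               'Missing required column(s): %s' % ', '.join(missing))
       return True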
    34 
    35 Stage 3: Data Validation and Import
    36 -----------------------------------
    37 
    38 Import step 4 is the actual data import. The import is started by clicking the 'Perform import' button.
    39 
    40 Kofa does not validate the data in advance. It tries to import the data row-by-row while reading the CSV file. The reason is that import files very often contain thousands or even tens of thousands of records. It is not feasible for data managers to edit import files until they are error-free. Very often such an error is not really a mistake made by the person who compiled the file. Example: The import file contains course results although the student has not yet registered the courses. Then the import of this single record has to wait, i.e. it has to be marked pending, until the student has added the course ticket. Only then can it be processed by the batch processor.
    41 
    42 The core import method is:
    43 
    44 .. automethod:: waeup.kofa.utils.batching.BatchProcessor.doImport()
    45    :noindex:
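The following is a much-simplified sketch of what such a row-by-row import with pending records could look like. The method names on `processor` are assumptions for this example, not the actual internals of Kofa's `doImport`:

.. code-block:: python

   import csv

   def do_import(processor, filepath, mode):
       """Import rows one by one; collect failures as pending rows."""
       finished, pending = [], []
       with open(filepath, newline='') as f:
           for row in csv.DictReader(f):
               try:
                   if mode == 'create':
                       processor.add_entry(row)
                   elif mode == 'update':
                       processor.update_entry(row)
                   elif mode == 'remove':
                       processor.del_entry(row)
                   finished.append(row)
               except Exception as error:
                   row['--ERRORS--'] = str(error)
                   pending.append(row)  # can be re-imported later
       return finished, pending  # written to .finished/.pending files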
    46 
    47 Stage 4: Post-Processing
    48 ------------------------
    49 
    50 The data import is finalized by calling :py:meth:`distProcessedFiles<waeup.kofa.datacenter.DataCenter.distProcessedFiles>`. This method moves the ``.pending`` and ``.finished`` files from their temporary to their final location in the storage path of the filesystem, from where they can be accessed through the browser user interface.
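In essence this amounts to a move of the marker files, sketched here with made-up directory names:

.. code-block:: python

   import os
   import shutil

   def dist_processed_files(tmp_dir, storage_dir):
       """Move .pending/.finished files to their final location."""
       for name in os.listdir(tmp_dir):
           if name.endswith(('.pending', '.finished')):
               shutil.move(os.path.join(tmp_dir, name),
                           os.path.join(storage_dir, name))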
    51 
    52 Batch Processors
    53 ================
    54 
    55 All batch processors inherit from the :py:class:`waeup.kofa.utils.batching.BatchProcessor` base class. The `doImport` method, described above, always remains unchanged. All processors have a property `available_fields` which defines the set of importable data. These fields correspond to the column titles of the import file. Available fields are usually composed of location fields, interface fields and additional fields. Overlaps are possible. Location fields define the minimum set of fields which are necessary to locate an existing object in order to update or remove it. Interface fields (schema fields) are the fields defined in the interface of the data entity. Additional fields are extra fields needed for data processing. We further distinguish between required and optional fields, or between schema and non-schema fields.
    56 
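A hypothetical illustration of how such a field set could be composed; the names are invented for the example:

.. code-block:: python

   class ProcessorSketch:
       """Illustrates location, interface and additional fields."""
       location_fields = ['code', 'faculty_code']   # locate the object
       interface_fields = ['title', 'study_mode']   # schema fields
       additional_fields = ['old_code']             # non-schema fields

       @property
       def available_fields(self):
           # Overlaps between the three groups are possible,
           # hence the set() deduplication.
           return sorted(set(self.location_fields + self.interface_fields
                             + self.additional_fields))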
    57 In the following we list all available processors of the Kofa base package, including some important methods which describe them best. We do not list the available fields of each processor here. Available fields are shown in the browser user interface on the upload page of the portal.
    58 
    59 User Processor
    60 --------------
    61 
    62 .. autoclass:: waeup.kofa.authentication.UserProcessor()
    63   :noindex:
    64 
    65 Faculty Processor
    66 -----------------
    67 
    68 .. autoclass:: waeup.kofa.university.batching.FacultyProcessor()
    69   :noindex:
    70 
    71 Department Processor
    72 --------------------
    73 
    74 .. autoclass:: waeup.kofa.university.batching.DepartmentProcessor()
    75   :noindex:
    76 
    77 Certificate Processor
    78 ---------------------
    79 
    80 .. autoclass:: waeup.kofa.university.batching.CertificateProcessor()
    81   :noindex:
    82 
    83 Course Processor
    84 ----------------
    85 
    86 .. autoclass:: waeup.kofa.university.batching.CourseProcessor()
    87   :noindex:
    88 
    89 
    90 Certificate Course Processor
    91 ----------------------------
    92 
    93 .. autoclass:: waeup.kofa.university.batching.CertificateCourseProcessor()
    94   :noindex:
    95 
    96 Applicants Container Processor
    97 ------------------------------
    98 
    99 .. autoclass:: waeup.kofa.applicants.batching.ApplicantsContainerProcessor()
    100   :noindex:
    101 
    102 Applicant Processor
    103 -------------------
    104 
    105 .. autoclass:: waeup.kofa.applicants.batching.ApplicantProcessor()
    106   :noindex:
  • main/waeup.kofa/trunk/docs/source/userdocs/datacenter/intro.rst

    r12867 r12870  
    77===================
    88
    9 Most web portals store their data in a relational database like PostgreSQL, MySQL or Oracle. A relational database is organized in tables of rows and columns, with a unique key for each row. Each data entity gets its own table. Rows in tables can be linked to rows in other tables by storing the unique key of the row to which it should be linked. This sounds quite simple. Many computer users are familiar with this kind of data storage because they are used to spreadsheet programmes like Excel or Calc which also organize data in tables.
     9Most web portals store their data in a relational database like
     10PostgreSQL, MySQL or Oracle. A relational database is organized in
     11tables of rows and columns, with a unique key for each row. Each data
     12entity gets its own table. Rows in tables can be linked to rows in
     13other tables by storing the unique key of the row to which it should
     14be linked. This sounds quite simple. Many computer users are familiar
     15with this kind of data storage because they are used to spreadsheet
     16programmes like Excel or Calc which also organize data in tables.
     17Kofa's persistent data are stored in a native object database designed
     18for the Python programming language, the so-called ZODB_. An object
     19database stores objects with attributes and not records as rows with
     20columns in tables. These persistent objects can hold any kind of
     21information in attributes and need not adhere to a specific schema
     22like records in tables of a relational database.
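A tiny sketch of this schema-less flexibility, using the `persistent` package from the ZODB project; the `Student` class and its attributes are invented for the example:

.. code-block:: python

   import persistent

   class Student(persistent.Persistent):
       """A persistent object; its attributes follow no fixed schema."""
       def __init__(self, student_id):
           self.student_id = student_id

   student = Student('K1000000')
   # Unlike a table row, any attribute can be added later on:
   student.phone = '+2341234567'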
    1023
    11 Kofa's persistent data are stored in a native object database designed for the Python programming language, the so-called ZODB_. An object database stores objects with attributes and not records as rows with columns in tables. These persistent objects can hold any kind of information in attributes and need not adhere to a specific schema like records in tables of a relational database.
    12 
    13 The ZODB_ also supports a hierarchical, treelike storage of objects. Objects can contain other objects if they are declared as containers. Objects are stored like folders and files in a filesystem. This makes the object handling very fast and transparent because we can access objects, or more precisely views of objects, by indicating their path in the database, i.e. by traversing the database tree to the object's location. Furthermore, we access the views of objects through a web browser by entering a URL (Uniform Resource Locator). This publication path corresponds more or less to the traversal path of our objects. In Kofa the path always contains the object identifiers of all objects which are passed when traversing the database tree. Example:
     24The ZODB_ also supports a hierarchical, treelike storage of objects.
     25Objects can contain other objects if they are declared as containers.
     26Objects are stored like folders and files in a filesystem. This makes
     27the object handling very fast and transparent because we can access
     28objects, or more precisely views of objects, by indicating their path
     29in the database, i.e. by traversing the database tree to the object's
     30location. Furthermore, we access the views of objects through a
     31web browser by entering a URL (Uniform Resource Locator). This
     32publication path corresponds more or less to the traversal path of our
     33objects. In Kofa the path always contains the object identifiers of
     34all objects which are passed when traversing the database tree.
     35Example:
    1436
    1537https://kofa-demo.waeup.org/students/K1000000/studycourse/100/DCO
    1638
    17 is the URL which requests a display view of a course ticket with id ``DCO``. This object is stored in a study level container object with id ``100``, stored in a study course container object with id ``studycourse``, stored in the student container object with id ``K1000000``, stored in the students root container, stored in the root container of the application, stored in the root of the database itself.
     39is the URL which requests a display view of a course ticket with id
     40``DCO``. This object is stored in a study level container object with
     41id ``100``, stored in a study course container object with id
     42``studycourse``, stored in the student container object with id
     43``K1000000``, stored in the students root container, stored in the
     44root container of the application, stored in the root of the database
     45itself.
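The mapping from URL path to object can be pictured with ordinary nested containers; the dictionary below merely mimics the real database layout:

.. code-block:: python

   # Toy stand-in for the hierarchical database described above.
   root = {'students': {'K1000000': {'studycourse': {'100': {'DCO': 'ticket'}}}}}

   def traverse(container, path):
       """Walk the container tree along the object ids in the path."""
       obj = container
       for object_id in path.strip('/').split('/'):
           obj = obj[object_id]  # each id is unique within its container
       return obj

   print(traverse(root, 'students/K1000000/studycourse/100/DCO'))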
    1846
    19 This kind of storage requires that each object gets a unique object identifier (object id) within its container. The id string is visible in the browser address bar. Though it's technically possible for ids to contain spaces or slashes, we do not allow these kinds of special characters in object ids to facilitate the readability of URLs.
     47This kind of storage requires that each object gets a unique object
     48identifier (object id) within its container. The id string is visible
     49in the browser address bar. Though it's technically possible for ids
     50to contain spaces or slashes, we do not allow these kinds of special
     51characters in object ids to facilitate the readability of URLs.
    2052
    2153Batch Processing
    2254================
    2355
    24 Administrators of web portals that store their data in relational databases are used to getting direct access to the portal's database. There are even tools to handle the administration of these databases over the Internet, like phpMyAdmin or phpPgAdmin to handle MySQL or PostgreSQL databases respectively. These user interfaces bypass the portals' user interfaces and give direct access to the database. They make it easy to import or export (dump) data tables or the entire database structure into CSV or SQL files. What at first sight appears to be very helpful and administration-friendly proves to be very dangerous on closer inspection. Data structures can be easily damaged or destroyed, or data can be easily manipulated by circumventing the portal's security machinery or logging system. Kofa does not provide any external user interface to access the ZODB_ directly, neither for viewing nor for editing data. This also includes the export and import of sets of data. Exports and imports are handled via the Kofa user interface itself. This is called batch processing, which means either producing CSV files (comma-separated values) from portal data (export) or processing CSV files in order to add, update or remove portal data (import). The main premise of Kofa's batch processing technology is that the data stored in the ZODB_ can be specifically backed up and restored by exporting and importing data. But that's not all. Batch processors can do much more. They are an integral part of the student registration management.
     56Administrators of web portals that store their data in relational
     57databases are used to getting direct access to the portal's database.
     58There are even tools to handle the administration of these databases
     59over the Internet, like phpMyAdmin or phpPgAdmin to handle MySQL or
     60PostgreSQL databases respectively. These user interfaces bypass the
     61portals' user interfaces and give direct access to the database. They
     62make it easy to import or export (dump) data tables or the entire
     63database structure into CSV or SQL files. What at first sight appears
     64to be very helpful and administration-friendly proves to be very
     65dangerous on closer inspection. Data structures can be easily damaged
     66or destroyed, or data can be easily manipulated by circumventing the
     67portal's security machinery or logging system. Kofa does not provide
     68any external user interface to access the ZODB_ directly, neither for
     69viewing nor for editing data. This also includes the export and import
     70of sets of data. Exports and imports are handled via the Kofa user
     71interface itself. This is called batch processing, which means either
     72producing CSV files (comma-separated values) from portal data (export)
     73or processing CSV files in order to add, update or remove portal data
     74(import). The main premise of Kofa's batch processing technology is that
     75the data stored in the ZODB_ can be specifically backed up and
     76restored by exporting and importing data. But that's not all. Batch
     77processors can do much more. They are an integral part of the student
     78registration management.
    2579
    2680.. note::
    2781
    28   Although exporters are part of Kofa's batch processing module, we will not call them batch processors. Only importers are called batch processors. Exporters produce CSV files; importers process them.
     82Although exporters are part of Kofa's batch processing module, we will
     83not call them batch processors. Only importers are called batch
     84processors. Exporters produce CSV files; importers process them.
    2985
    3086
  • main/waeup.kofa/trunk/src/waeup/kofa/applicants/batching.py

    r12869 r12870  
    8787
    8888    In update or remove mode `container_code` and `application_number` columns
    89     must not exist. The applicant object is solely located by
    90     `applicant_id` or `reg_number`.
     89    must not exist. The applicant object is solely located by searching
     90    the applicants catalog for `reg_number` or `applicant_id`.
    9191    """
    9292    grok.implements(IBatchProcessor)