October 13, 2009
The next function looked at was one that runs its processing through the event queue. There were some core structural changes here because of limitations in PostgreSQL triggers. The old method was based on statement data set processing, which added multiple entries for an event to the queue. The event daemon deleted events from the queue on notification, which triggered the processing functions again via a statement data set. The event notification system itself can be replaced almost like for like in PostgreSQL with NOTIFY/LISTEN. The new structure involves the event daemon updating records in the queue to flag a particular set of records for an event to be processed. The processing function can then query the queue for these updated records associated with its event handling as a virtual data set, in much the same way as the original virtual statement data set. The function must then explicitly delete the entries from the event queue after processing is finished. This is all working now: there is a PostgreSQL equivalent event daemon (still in testing) and a re-implemented function carrying out queue entry and queue removal for unit work marks. Some new issues to be aware of in conversion are below.
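A minimal sketch of the reworked queue flow, assuming a hypothetical queue table (the table and column names here are invented for illustration, not the real schema):

```sql
-- Hypothetical event queue: rows are added by triggers, flagged by the
-- event daemon, then processed and deleted by the processing function.
CREATE TABLE event_queue (
    id      SERIAL PRIMARY KEY,
    event   TEXT NOT NULL,               -- e.g. 'unit_work_mark'
    row_key INTEGER NOT NULL,            -- key of the affected row
    flagged BOOLEAN NOT NULL DEFAULT FALSE
);

-- The daemon flags a batch of records for one event and wakes the processor:
UPDATE event_queue SET flagged = TRUE WHERE event = 'unit_work_mark';
NOTIFY unit_work_mark;

-- The processing function treats the flagged rows as its virtual data set,
-- joining event_queue against the real tables, and must then clean up:
DELETE FROM event_queue WHERE flagged AND event = 'unit_work_mark';
```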
- PostgreSQL does not do short circuit evaluation, so IF OLD IS NOT NULL AND OLD.X = S will not work when OLD is NULL, as the remainder of the condition may still be evaluated, causing an error. Conditions must instead be nested: IF OLD IS NOT NULL THEN IF OLD.X = S THEN … ELSE … END IF; ELSE … END IF; which tends to be a lot more clunky.
- Most of the unit processing functions are multi-talented and cover different sets of update types with the same code base. Handling of this is controlled by trigger updates based on changes to specific named columns. Of course you can't do this in PostgreSQL, so all the separate trigger conditions must be amalgamated into one and folded into the start of the function itself, which of course doesn't do short circuit evaluation … This logic is generally attached to the function pushing the event into the queue, so it is at least contained in a reasonably clear and consistent place.
- Again, due to the lack of a CASE statement in OpenIngres, a table called "er_relop_map" was used to provide a query-based mechanism to do IF A > B THEN 1 ELSE 0 type mappings using normal joins. This could probably be replaced by simply using CASE expressions directly, removing the need for the additional table and joins. This hasn't been done yet (waiting until the core unit processing functions have been re-implemented).
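Sketches of the first and last points above, in plpgsql and plain SQL respectively (the column names x, s, a, b and the table marks are all invented for illustration):

```sql
-- (1) Short-circuit workaround for "IF OLD IS NOT NULL AND OLD.x = s":
--     nest the conditions so OLD.x is never evaluated when OLD is NULL.
IF OLD IS NOT NULL THEN
    IF OLD.x = s THEN
        NULL;  -- processing when the condition holds
    ELSE
        NULL;  -- fall-through processing
    END IF;
ELSE
    NULL;      -- fall-through processing, duplicated (hence the clunkiness)
END IF;

-- (2) er_relop_map replacement: the IF A > B THEN 1 ELSE 0 mapping done
--     directly with a CASE expression instead of a join against the table:
SELECT CASE WHEN a > b THEN 1 ELSE 0 END AS relop
FROM marks;
```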
October 5, 2009
Some monologue on converting the first of the unit assessment functions. This was the us function, standing for unit score. It recalculates statistical values for any given "unit" (built from templates for each of a module, course and year). It is triggered via a touch process from any changes in associated unit marks. We aren't looking at that specific trigger process here, although the re-implementation of the touch process is included. It is also triggered when unit marks are deleted or the ignore status changes. We aren't going to cover any of the general substitution conversion parts of the process here, just interesting things specific to this function.
- Using the old nomenclature, the function is triggered on U(s=s)cs; D,U(i)cm. That is: an UPDATE on class_score when the update affects the session@ column and the value is unchanged, a DELETE on the class_mark table, or an UPDATE to the ignore column on the class_mark table. These have been replaced by Ucs; D,Ucm in PostgreSQL, as separate triggers. In the latter case logic was added so that the function drops out when TG_OP is UPDATE, TG_TABLE_NAME is class_mark and OLD.ignore = NEW.ignore. In the former case logic was added so that the function drops out when TG_OP is UPDATE and the update changes any non-derived column, or does not change a derived column to NULL (this latter is critical to avoid loops, as the function itself updates the class_score table and would otherwise re-trigger itself).
- Logic was added so that the Vsession and Vunit variables always have values (by assigning from NEW.session or OLD.session depending on TG_OP).
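The drop-out and variable-assignment logic above could be sketched like this at the top of the trigger function (the unit column name is invented for illustration):

```sql
-- Guard logic at the start of the converted us() trigger function.
IF TG_OP = 'UPDATE' AND TG_TABLE_NAME = 'class_mark' THEN
    IF OLD.ignore = NEW.ignore THEN
        RETURN NEW;  -- ignore status unchanged: nothing to do
    END IF;
END IF;

-- Ensure Vsession and Vunit always have values, whatever the event:
IF TG_OP = 'DELETE' THEN
    Vsession := OLD.session;
    Vunit    := OLD.unit;
ELSE
    Vsession := NEW.session;
    Vunit    := NEW.unit;
END IF;
```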
Let's consider an alternative approach.
- We could make the function an ordinary non-trigger function. It would be called directly by the function that normally forces the touch, and called indirectly via a trigger wrapper function to cover the triggers from class_mark. The wrapper function in this case handles the ignore column check logic. The problem with this is that the function causing the touch is using statement based set processing to update batches of rows. This could be done with a select loop; however, it is as yet unclear how to deal with statement based processing in PostgreSQL – that should be the next function to convert!
- Alternatively the function that normally forces the touch could do so by deleting the corresponding row(s). The function would have to then re-insert, but it could do this using a single SELECT. However this approach may not work for other cases.
Finally, although the new function behaves the same way, there are some improvements that could be made.
- The original OpenIngres function executes 14 separate SELECT statements to collate numbers of different grades. In PostgreSQL, with the use of the CASE expression, these could all be reduced to a single SELECT with 14 column results instead.
- The original OpenIngres function executes two SELECT statements – one to collate MIN, MAX and AVG aggregates and one to calculate the standard deviation as a further aggregation using the previously calculated AVG values. However, PostgreSQL has a STDDEV aggregate function that produces identical results, so that could be used instead to reduce this to a single SELECT. It would also remove the need to nullify done counts of less than 2 and coalesce them back to 0 (a workaround to avoid a division by zero error when the standard deviation was aggregated over one row).
- The two new PostgreSQL SELECT statements could then be rolled into one, so the original 16 would be done as one.
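The consolidation described above could look something like this sketch (the grade values and the unit/session column names are invented; the real function has 14 grade counts):

```sql
-- One SELECT replacing the original 16: per-grade counts via CASE,
-- plus MIN/MAX/AVG and the built-in STDDEV aggregate in one pass.
SELECT COUNT(CASE WHEN grade = 'A' THEN 1 END) AS count_a,
       COUNT(CASE WHEN grade = 'B' THEN 1 END) AS count_b,
       COUNT(CASE WHEN grade = 'C' THEN 1 END) AS count_c,
       -- ... one CASE column per grade, 14 in all ...
       MIN(mark)    AS min_mark,
       MAX(mark)    AS max_mark,
       AVG(mark)    AS avg_mark,
       STDDEV(mark) AS sd_mark   -- replaces the manual two-pass calculation
FROM class_mark
WHERE unit = Vunit AND session = Vsession;
```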
None of the refinements above have been made yet pending alterations to other key functions to see how everything will work together.
October 5, 2009
The following are specific implementation issues, some with functional effect, in converting from OpenIngres rules and procedures to PostgreSQL triggers and functions.
One of the things that happens a lot is the use of a "touch" to push changes. This is implemented by one function updating a column value in another table to the same value, generally session@. So the triggering SQL would be something like "UPDATE class_score SET session@ = session@ WHERE …", constrained to the current processing set of data. This triggers the procedure by a rule of the form "AFTER UPDATE(session@) ON class_score WHERE session@ = session@". Basically the procedure is only triggered when session@ is updated to the same value. This cannot easily be mapped to PostgreSQL at all. Firstly, it has neither the column restricted update nor the constraining clause; you can simply do "AFTER UPDATE ON class_score". Secondly, as a result you can't test for "session@ = session@" within the procedure body, because it could very well equal itself on any other update to that table which changes other columns but not "session@".
There is no simple (compatible) fix to this problem. Instead a different approach will be taken, which also offers enhanced functionality. Changes are pushed in this way to force derived marks to update (from known underlying events, to avoid too many multiple re-calculations). Instead of pushing a "touch", a change is now pushed by NULLifying all derived column values in the target table. The function triggers on update, but internal logic now only triggers re-calculation when the update has done nothing except set derived columns to NULL. This loosely replicates the previous behaviour with the advantage that derived columns are given an indeterminate state prior to re-calculation: any glitch that causes re-calculation to fail will leave them in the indeterminate state rather than with previously calculated values which may now be incorrect. To further clean things up, permissions on the table should prevent derived columns being changed except via the function.
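A sketch of the NULLify push and the matching guard, with invented derived column names (mark and grade stand in for the real derived columns):

```sql
-- Pushing a change: instead of "UPDATE class_score SET session = session",
-- blank out the derived columns for the current processing set.
UPDATE class_score SET mark = NULL, grade = NULL
WHERE session = Vsession;

-- Inside the trigger function: re-calculate only when the update did
-- nothing except set derived columns to NULL.
IF TG_OP = 'UPDATE'
   AND NEW.mark IS NULL AND NEW.grade IS NULL
   AND OLD.session = NEW.session  -- non-derived columns unchanged
THEN
    NULL;  -- re-calculation goes here; it writes non-NULL derived values
           -- back to class_score, so its own update fails this guard
           -- and the function does not loop
END IF;
```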
October 5, 2009
Carrying on with function conversion, again without functional change. The following are more structural changes due to language differences between OpenIngres stored procedures and PostgreSQL plpgsql.
- Return value. OpenIngres does not need one. In the case of functions called via rules (triggers in PostgreSQL parlance) "before" an event, the return value does not matter; OpenIngres will carry on with the event regardless (except where the function has raised an exception). In PostgreSQL a return value is required. If a "before" trigger function returns NULL then the event triggering the function is silently skipped (though unlike an exception, no error is raised). Hence some care is required to ensure that the function converted to PostgreSQL behaves the same way. This means ensuring that all function exit points return either NEW or OLD depending on what they are triggered on. This matters less for functions triggered "after" events, as the return value is discarded (however a return value is still required for any trigger function, so at the least a "RETURN NULL" is needed for these).
- In OpenIngres, on a delete or an insert triggered function, the NEW and OLD content is the same and can be used interchangeably. In PostgreSQL, on a delete NEW is NULL (and any reference to NEW.x would raise an exception) and on an insert OLD is NULL (and any reference to OLD.x would raise an exception). This is a bit of a pain. In general the function needs to do more checking about how it has been called. This can be done using the TG_OP value. For example, a function that can be called on any event and just returns the event row value, which in OpenIngres would need nothing, in PostgreSQL needs to be written as "IF TG_OP = 'DELETE' THEN RETURN OLD; ELSE RETURN NEW; END IF;".
- Arguments. In OpenIngres these are named in the function definition, for example "CREATE PROCEDURE a ( x = CHAR(2), y = INT ) AS …". Although you can do similarly for PostgreSQL in normal functions, you cannot in trigger functions; you must look at the TG_ARGV content. The example would be re-written as "CREATE FUNCTION a() RETURNS TRIGGER AS … DECLARE x CHAR(2) := TG_ARGV[0]; y INT := TG_ARGV[1]; …". However in some cases you may be able to do away with arguments altogether, replacing them with for example TG_NAME (name of the trigger that triggered the function), TG_TABLE_NAME (name of the table that triggered the function), TG_OP etc.
- Arguments. In OpenIngres value arguments can be passed by the rule depending on the triggering row. In PostgreSQL only string literal arguments can be passed. In cases so far the equivalent NEW.x or OLD.x within the function can be used.
- Update columns. In OpenIngres rules the update trigger can be constrained to work on only certain columns; for example "ON UPDATE(x)" would only trigger on an update writing a value to the "x" column (even if the value is the same as the current value, which causes other problems for our local conversion). In PostgreSQL you can only do "ON UPDATE", so any column changing triggers the procedure. These constraints must be folded back into the function itself as additional logic (particularly if it assumes it can only be called on a change to particular columns). For example, "IF TG_OP = 'UPDATE' AND OLD.x <> NEW.x THEN … END IF;".
- Constraints. In OpenIngres rules the triggering can be further constrained by a "WHERE" clause; for example "ON UPDATE(x) WHERE x <> y" will only trigger when "x" is updated to be not the same as "y". PostgreSQL does not support this; the logic must be folded back into the function itself.
- Statement processing. In PostgreSQL functions triggered by statement have no content in OLD or NEW. In OpenIngres you can pass the whole set of affected data as (in essence) a virtual table that can be used as a normal table in SQL statements within the function. This is powerful and makes large batch updates very efficient. Haven’t worked out how to translate this across to PostgreSQL yet …
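A skeleton pulling the points above together: explicit returns, TG_ARGV arguments and folded-in column/WHERE constraints. The function name, column names and argument types are placeholders, not real schema (note the "V" prefix on variables to avoid clashing with column names):

```sql
CREATE FUNCTION a() RETURNS TRIGGER AS $$
DECLARE
    Vx CHAR(2) := TG_ARGV[0];  -- trigger arguments arrive as string literals
    Vy INT     := TG_ARGV[1];
BEGIN
    -- Folded-in equivalent of "ON UPDATE(x) WHERE x <> y":
    IF TG_OP = 'UPDATE' AND OLD.x <> NEW.x AND NEW.x <> NEW.y THEN
        NULL;  -- the work previously guarded by the rule goes here
    END IF;
    -- Every exit point must return something appropriate to the event:
    IF TG_OP = 'DELETE' THEN
        RETURN OLD;
    ELSE
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;
```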
October 5, 2009
Started on converting triggers and functions from OpenIngres to PostgreSQL. This is a list of the main easy things to do first for function conversion. These are all non-affective (they should not cause any functional change) and are just substituting one thing for another in a simple way.
- Convert the actual function name, in general just replace “$” with “_”.
- Convert referenced table and column names. This was covered in a previous post but there have been some additions, so below is the current rule set.
- First table name mapping.
- if OI_TABLE_NAME = “order” then PG_TABLE_NAME = “ing_order”
- else if OI_TABLE_NAME ends with “@” then PG_TABLE_NAME = OI_TABLE_NAME – “@” + “_key”
- else if OI_TABLE_NAME ends with “#” then PG_TABLE_NAME = OI_TABLE_NAME – “#” + “_num”
- else PG_TABLE_NAME = OI_TABLE_NAME
- Next column name mapping.
- repeat the following until no clause matches:
- if OI_COLUMN_NAME = “oid@” then PG_COLUMN_NAME = “_gui_oid”
- else if OI_COLUMN_NAME = “order” then PG_COLUMN_NAME = “ing_order”
- else if OI_COLUMN_NAME = “grant” then PG_COLUMN_NAME = “ing_grant”
- else if OI_COLUMN_NAME = “group” then PG_COLUMN_NAME = “ing_group”
- else if OI_COLUMN_NAME = “left” then PG_COLUMN_NAME = “ing_left”
- else if OI_COLUMN_NAME = “right” then PG_COLUMN_NAME = “ing_right”
- else if OI_COLUMN_NAME ends with “@” then PG_COLUMN_NAME = OI_COLUMN_NAME – “@”
- else if OI_COLUMN_NAME ends with “#” then PG_COLUMN_NAME = OI_COLUMN_NAME – “#”
- if ( OI_COLUMN_NAME ends with "term" and ( OI_TABLE_NAME = "class_group_event" or OI_TABLE_NAME = "module_work_event" or OI_TABLE_NAME = "class_work_event" or OI_TABLE_NAME = "module_event" or OI_TABLE_NAME = "module_group_event" or OI_TABLE_NAME = "year_group_event" ) ) or ( OI_COLUMN_NAME ends with "pages" and OI_TABLE_NAME = "publication" ) or ( OI_COLUMN_NAME ends with "session" and OI_TABLE_NAME = "class" ) then PG_COLUMN_NAME = OI_COLUMN_NAME + "_num"
- else if OI_COLUMN_NAME starts with “_” and OI_COLUMN_NAME ends with “$d” then PG_COLUMN_NAME = OI_COLUMN_NAME – “$d”
- else if OI_COLUMN_NAME ends with "$d" then PG_COLUMN_NAME = "_" + OI_COLUMN_NAME – "$d"
- Variable name mapping. In OpenIngres variables in a function are preceded by ":". This has the advantage of clearly distinguishing variables with the same names as tables and columns from the actual tables and columns. PostgreSQL does not tag variables in such a way. For example, in the OpenIngres statement "select * from x where a = :b and b = :a" it is clear that ":b" is a reference to variable "b" and ":a" is a reference to variable "a". However, in the PostgreSQL statement "select * from x where a = b and b = a" the "a" and "b" references are all interpreted as columns in table "x". To be safe we should have a standard mapping for variable names. Since variable names often match column names, the first mapping to apply is the standard column name mapping above; then precede the variable name with a "V" (although PostgreSQL is case insensitive here, the uppercase helps to distinguish the variable). For example, ":mark$d" would become "V_mark" and ":session@" would become "Vsession". The same applies in variable declarations; here OpenIngres does not require the leading ":", but the variable must obviously still be prefixed with "V" in PostgreSQL to match later references.
- Variable declaration mapping. In OpenIngres the following syntax is used “NAME1, NAME2, NAME3 TYPE” which declares all NAME variables as the same type. In PostgreSQL this has to be rewritten as “NAME1 TYPE; NAME2 TYPE; NAME3 TYPE”.
- Variable type mapping. The following apply so far: "INTEGER2" (or "INT2") -> "SMALLINT"; "FLOAT8" -> "FLOAT"; "FLOAT4" -> "REAL". For consistency also map all CHAR(X) types to VARCHAR(X).
- Cast mapping. The following OpenIngres functions are used for casting: "CHAR(X)" and "FLOAT4(X)". In PostgreSQL these are re-written as "CAST(X AS TEXT)" and "CAST(X AS REAL)" respectively.
- Null mapping. In OpenIngres the IFNULL(X,Y) function is used a lot. In PostgreSQL this is re-written as "COALESCE(X,Y)". In fact "COALESCE()" can take multiple arguments, so nested IFNULLs can be replaced by one COALESCE. For example, "IFNULL(X,IFNULL(Y,Z))" would just become "COALESCE(X,Y,Z)".
- Boolean mapping. In OpenIngres there was no boolean type; instead an INT type with the value 0 for false and -1 for true was used (-1 because this worked more naturally with the graphical interface). This means there are conditional tests like "_ignore <> -1" for example. Since in PostgreSQL we have converted these columns to real boolean types, conditions like this would now fail and need to be rewritten as "_ignore <> TRUE" for example. Use the literals "TRUE" and "FALSE" in place of "-1" and "0" where a comparison is made on a boolean column.
- The behaviour of the MIN(), MAX() and AVG() aggregate functions in PostgreSQL differs from OpenIngres when the SELECT returns no rows. In OpenIngres they will have the value 0, whereas in PostgreSQL it will be NULL. For compatibility they should be rewritten; for example, "MAX(X)" should be re-written as "COALESCE(MAX(X),0)" in PostgreSQL.
- Statement terminators. This is a semi-colon in both. However in OpenIngres you can get away without a terminator in a number of places (for example single statements within “IF … ELSE … ENDIF” clauses). In PostgreSQL make sure every statement is terminated with a semi-colon.
- Functions in PostgreSQL need to be marked as "SECURITY DEFINER" so they run as the user that created them rather than the user that invokes/triggers them. This was the default in OpenIngres. It allows the procedure to carry out any operation even if the user does not have permission to do the same manually. In addition to this change, functions should have the default public right to execute revoked and specific execute permissions granted as per base tables. The Theon schema can handle this easily enough by adding capability elements to the function definition, although this isn't implemented yet.
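A before/after fragment showing several of the substitutions above together. The procedure, columns and role are invented for illustration; the "before" is OpenIngres-style pseudocode shown as comments:

```sql
-- OpenIngres (before):
--   CREATE PROCEDURE mk$calc AS
--   DECLARE tot, done INTEGER2;
--   ...
--   SELECT IFNULL(MAX(mark$d), 0) ... WHERE _ignore$d <> -1;

-- PostgreSQL (after): name, types, IFNULL, empty-set aggregates and
-- booleans mapped, plus SECURITY DEFINER and tightened permissions.
CREATE FUNCTION mk_calc() RETURNS TRIGGER AS $$
DECLARE
    Vtot  SMALLINT;   -- INTEGER2 -> SMALLINT, one declaration per variable
    Vdone SMALLINT;
BEGIN
    SELECT COALESCE(MAX(_mark), 0)   -- IFNULL -> COALESCE, NULL-safe MAX
    INTO Vtot
    FROM class_mark
    WHERE _ignore <> TRUE;           -- "<> -1" -> "<> TRUE" on booleans
    RETURN NEW;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE EXECUTE ON FUNCTION mk_calc() FROM PUBLIC;
GRANT  EXECUTE ON FUNCTION mk_calc() TO theon_staff;  -- hypothetical role
```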
July 23, 2009
Ok, looks like we should use namespace extensions under <xs:appinfo> to do extensions, as it's the standard way and won't give us any back end problems. This does make the resulting schema definition clunky; however, it's not necessarily the case that it would always be hand edited – we would expect a simple ER design tool to generate it, for example, or we envisage translators to go from SQL DDL or any other data model representation to a Theon XML Schema. We could even provide a slightly abstracted and more user friendly version of the schema language that is then translated via XSLT to the base standard. So we're not overly concerned about the clunkiness at this level. Next is to define the structures under <xs:appinfo> to represent: a simple RBAC (Role Based Access Control) system (because XACML looks fiendish and massively over complex for what we need); triggers and procedure wrappers (procedures themselves are code and could be in all manner of different base languages, so we don't intend to do anything other than provide a way of encapsulating them for version control and distribution); and indexes, as these are used extensively for performance improvement in the current system.
July 22, 2009
Natively in XML Schema we can nicely express tables, columns, keys and referential integrity. However, our data model schema needs more than this – we need to express permissions/roles, triggers/procedures and indexes/views at the very least. We already use the annotation extension feature of XML Schema for adding some meta data (documentation and data supporting the existing user interface) so what are the options for adding these other structures, which are not strictly speaking meta data (although they would not be present in an XML document that instantiated content from the schema)?
- Use a separate name space with structures under <xs:appinfo>, as we do for meta data.
- Use foreign namespaced attributes. These can be added to any XSD element and work a bit like extensions under <xs:appinfo>, but are more restricted since they can only take simple types.
- Extend our "implicit" structure of database->clusters->tables->columns with a parallel "abstract" structure representing the additional information. This would mean, for example, defining <xs:element name="wibble" abstract="true" type="Role"> where "Role" is defined like any other complexType. Such elements could never be instantiated but are still part of the schema, which we process directly. This is probably the most user friendly approach, but we would need a separate schema to define it (i.e. this would move out of our existing schema representation to an "instantiation" (substituted type) of some kind of "ancillary data" schema).
- Copy and extend the base XML-Schema directly. Problem with this approach is it would likely break standard parsing tools.
For now we will assume a structure under <xs:appinfo> with a separate namespace defining the schema for it as we do for meta data.
July 21, 2009
So, as anticipated, we are going to have to do significant rewrites of the existing triggers and procedures. PostgreSQL triggers are unfortunately just not as sophisticated as Ingres rules. PostgreSQL does not support specific column updates in triggers. PostgreSQL does not support WHERE type clauses to further constrain when triggers fire. Most critically PostgreSQL does not support passing of triggering data in “statement level” triggers (it does for “row level” triggers) – this is used a lot in existing mark processing for efficiency as the data set from the triggering statement is used as a virtual temporary table to join against (limit data sets on) other tables for updates.
July 21, 2009
We now have a prototype XML Schema built from existing dictionary information. It includes some annotation which preserves meta-data from the dictionary that just passes through to the backend. Roughly the schema structure matches the proposal. Next is to produce SQL DDL from this to load the schema into PostgreSQL as well as load the user interface data into PostgreSQL. Also now looking at how to incorporate rules/triggers, procedures and permissions into all of this.
July 16, 2009
Our new database system (or revamped system if you prefer) needs a proper name. Following a suggestion from Iain Rae I think we should call it Theon. Euclid (the mathematician) is most famous for writing a set of 13 books called Elements. However, historians no longer believe Euclid had sole authorship. Many other mathematicians at the time wrote commentaries that later became incorporated into the piece of work we know today as Elements. Theon of Alexandria was one of these. Given our local system was originally to be replaced by the Euclid project but is now going to be retained to preserve local functionality supplemental to Euclid this name seems most appropriate and heralds perhaps a more collaborative approach to future development.