Adopting Theon – Part 3.2

The first part of this series discussed why you might use Theon as a Schema Management Tool for your PostgreSQL database. The second part part covered the necessary packages and installation. This final part is an introductory tutorial.

In the first part of this tutorial we created a database and started to manage it in Theon.  In this second part we will use TheonCoupler to load data automatically into the database. In the third part we will use TheonUI and create a desktop for the database.  In the fourth and final part we will package and distribute the result.

External data can be brought into (or accessed from) PostgreSQL in many ways. Here we will transfer it in at regular intervals from an external source. The command below will act as our live source of data.

rpm -qa --qf '"%{NAME}","%{VERSION}","%{RELEASE}","%{ARCH}","%{BUILDTIME}"\n'

This returns information on all installed packages in a CSV format. We need to create a table in our database to hold this data. In this example it will be the same as our package table, we can’t use that table directly since it is the sync target table – in most cases the external source data is not a direct alignment for an existing table like here. Though in this case, since it is, we can create it by cheating and using our existing table as a template.

psql packages -c "CREATE TABLE rpmdata AS SELECT * FROM package;"

Since this change was made on the live database we need to import it back into Theon.

ttkm import

Next we need to configure the Stream and Couple that will process the data. This needs to be done via TheonUI. So first create an instance of the server for the Theon management database. The default configuration will suffice.

ttkm self server ui

Then using a locally running browser point it at localhost:8000/ui.html to bring up the user interface. Follow the steps below to add the Stream.

Open the Streaming desktop.
On the first panel (Choose Model) ensure the right model is shown (in this case packages should be the only one) and then:
Add a new row in the Streams box:
 Stream-Label = rpm
 Origin = RPM
 ? Feed Entity = rpmdata
 Source = Pipe
 Format = CSV
 Name[Info] = rpm

Above creates a stream agent called rpm which takes data from a standard unix pipe in CSV format and maintains the rpmdata table as a copy of that content.

Now define a Couple to sync data from the source to our target table. We only need one in this instance.

Open the Stream panel and ensure rpm is shown
and then add one new row into the Couples box:
 Position SourceEntity TargetEntity Name[Info]
 1 rpmdata package rpmpackage

Open the Couple panel and enter and save the following for the couple (keep quotes):
 Source Origin Value = 'RPM'
 Then add two new rows in the Bonds box:
 Position Type
 1 Handle
 2 Column

Open the Bond panel and add rows in the Pairs box for each bond type as below:
 For Handle:
 Position SourceColumn TargetColumn
 1 name name
 2 architecture architecture
 For Column:
 Position SourceColumn TargetColumn
 1 version version
 2 release release
 3 buildtime buildtime

For the couple we have defined the source table (rpmdata) and target table (package). We have also defined the unique handle pair (name:name + architecture:architecture) to tie records together between source and target. Finally we have defined the data column pairs, the values of which will be maintained by the stream process.

Now exit the server process with Control-C.

Export the modified schema and review the new content.

ttkm export
[timc@babel packages]$ ls -R schemat
schemat:
live  packages.xsd  process  relation  stream

schemat/live:
packages.xsd

schemat/process:
_th_coupler_package_check.xsd  _th_rpm_package_couple.xsd

schemat/relation:
archcount.xsd  package.xsd  rpmdata.xsd

schemat/stream:
rpm.xsd

The rpmdata table has been added. The stream configuration has been added. Some stream management processes have been created. There will also be some changes to the target table (package) structure (metadata columns and triggers that have been added to support the content being managed by TheonCoupler).

Now we need to upgrade the live database structures. First remove the temporarily created rpmdata table (see part one for why).

psql packages -c "DROP TABLE rpmdata;"
ttkm upgrade

Create the stream configuration for the toolkit and have a look at it.

ttkm derive coupler
cat derived/coupler/rpm.cpl

Now we can run the stream, load the data and process all associated couples (just the one here) to load the database content. Then check that content using the view.

rpm -qa --qf '"%{NAME}","%{VERSION}","%{RELEASE}","%{ARCH}","%{BUILDTIME}"\n'| ttkm stream agent rpm refresh
psql packages -c "SELECT * FROM archcount;"
  arch  | total 
--------+-------
 (none) |     7
 i686   |   111
 x86_64 |  2614
 i486   |     1
 noarch |  1750
(5 rows)

Rows in package will now be added, deleted and updated automatically based on what is in the source table (rpmdata) and ultimately the content of the live data fed in from the rpm command every time the above is run.

Next creating configuration for TheonUI to support easier user access to the database content.