Monitoring Tables: The first phase of the
Database Change Notifications-to-RSS Project
Overview:
Create an RSS item for each INSERT, UPDATE, or
DELETE of a row from selectively monitored
tables
Operational
Status:
Can be determined by reading the directions on
this page
Subscribe:
Subsrcibe to Scott's RSS using http://www.scharp.org/rss/finance_tables.rss
Requirements:
- Monitor the following tables on the main
database for changes on the server
db.scharp.org
- pdb.protocol (network_id field tag in
the xml file for filtering; example at
W:\web\cgi-bin\rss)
- admin.employee (employee_id tag)
- admin.employee_job (supervisor_id tag)
- admin.tbl_tbn (supervisor_id tag)
- public.grants (network tag)
- public.projectidcontractmapping
(project_id tag)
- public.projectid_budref(project_id tag)
- vidi_secure.tbl_employee (employee_id)
- Create a separate RSS feed for each
monitored table.
- Feed entries will be in the form of this
sample:
- The RSS Feeds will have no inherit security
other than being located inside the Hutch
firewall.
- We will utilize Wolfe's Email-to-RSS script
for creating the new RSS items. This we'll
prepend new RSS items to the top of the feed.
Project Milestones & Timeline
- TBD - Rough estimate is 80 hours of
development time - estimate July 15th for
testing with Finance.
-
Mija estimates that
the admin and pdb
schemas will no longer be
replicated sometime in the 2nd half of May.
That's a prequisite for using the
trigger-based mechanism for monitoring changes
made to a table. Done.
-
Test whether
triggers will work on the admin and pdb
schemas after the new sync'ng mechanism is
in place. Chuck has given me some guidance on
how to test this ( 6 hours). Yes, they do
work. More rigorous testing (2 more
hours) as recommended by Chuck also
successful..
-
Implement a common pg/PSQL
trigger for monitoring tables and recording
changes in a new event table and a shadow or
history table for each monitored table. Using
Robert's shadow table PL/pgperl script
as a base. (12 hours). Mostly done. Forecast 8
more hours.
- Need to ensure recorded data is in a
format that is easy to extract for
monitoring script and html page generation
script. (+4 hours)
- Optimize functions for database
performance by batching table INSERT's,
preferring pg_catalog over
information_schema tables, and adding
appropriate indexes. (+4 hours)
- Because the PL/perl language doesn't
allow you to catch database exceptions, if
one of the triggers for this project
throws an exception, it may (probably
will) cause the nightly sync process to
fail. As discussed with Robert, we will
either have to turn on the unsafe
(unsecure) version of PL/perl language
that does have the ability to catch
database exceptions; or I will need to
rewrite our trigger functions in a safe PL
language that can handle exceptions. The
only such language appears to be PL/pgsql.
(Probable +16 hours).
- Working
with Robert on the issue of how to
extract the fields of a RECORD Pseudo
Data Type, e.g., OLD and NEW, into an
array of indexable TEXT values. May
involve coding a user-function for PostgreSQL in C..
- Aborting
the trigger mechanism for now. Should be
more easily doable in PostgreSQL 8.4..
-
Due to the
difficulty of implementing the above and the
concerns expressed by Robert, Mija, and Chuck,
we've decided to switch from using database
triggers for our implementation to using
polling. Estimate 40 hour to implement the
polling version.
- Worked with Lloyd Albin to modify his Materialized view
functionality to allow storing of
both the current and last version of a
materialized view, so that differences
between the two, i.e., the changes, can
easily be detected. (Done.)
- Write a PL/pgSQL function to record the
primary keys of a monitored table into a
meta-table to be used by the difference
function (2 hours).
- Write a PL/pgSQL function to record the
differences between two versions of a
materialized view into event tables. (6
hours). Had to re-write this as a PL/Perl
function in order to extract the column
names, due to their dynamic nature. (7
more hours).
-
Implement a script
to monitor the event table for new events and
send the results to the email-to-RSS program.
As per Robert, the scripts will be written in
the python language. (16 hours)
-
Create an RSS feed for each table to be
monitored. (Should this be automated? Should a
master index of available RSS feeds be created
somewhere?) (1 hour or 4 hours) For now, went
with a single rss feed that receives the
updates for all monitored tables -
finance_tables.rss.
-
Implement a script
to create a html page for each database event
to display the details of the database change
- to be linked to from the RSS entries. (22
hours)
-
Doug expressed a
desire to have a PostgreSQL view for
each monitored table that records the complete
status of a row as it changes over time: it's
current values; how it was changed; and the
date of change; such that these views could be
imported into Microsoft Excel and be acted on
as a 'pivot table'. This would facilitate them
being able to filter and drill-down with the
data in a way that suits their work practice.
When certain fields on a row are UPDATE'd, it
will not be necessary to record the previous
values side-by-side with the current values.
Instead, Finance will be able to compare the
newer version of the row to the earlier
version of the row - as long as the 'original'
copy of the row is loaded into the view at the
time monitoring begins.
Future TODO's (updated 9/1/09, DL)
- Request: In the RSS view,
we need views that return condensed, relevant
information as opposed to the entire table
view. For instance, change in admin database
for Bartz, Traci... we have to use a separate
tool (Pg III) to obtain the meaningful
metadata. A second example was from the
protocol table, where change from stage 2 to
3... it would be best to have the description
or name of the stage instead of 2 and 3. The
goal is to reach a condensed
screen shot that the ordinary user could
quickly scan in the morning.
- pdb_protocol
- Instead of using tbl_protocol,
please use
views.views.rss_update_protocol. The
primary key is probably a composite
key and we may have to add protocol_id
to the view. We stripped it since it
was not as useful. Let's talk about
this.
- admin_tbl_employee
- use views.views.rss_update_employee
- admin_tbl_employee_job
- use
views.views.rss_update_employee_job
- admin_tbl_tbn
- No change to view. Continue with
table only.
- public.grants
- No change to view. Continue with
table only.
- public.projectidcontractmapping
- No change to view. Continue with
table only.
- public.projectid_budref
- No change to view. Continue with
table only.
- vidi.views.employee_list.
- This was suggested as an addition,
as table only on 9/2/09.
- Request: Pursue the
sophisticated version of query. Higher levels
of changes https://twiki.vidi.org/bin/view/PopSci/SCHARP/CustomQueryDatabaseMonitoring
- Request: List of what we
can answer by using a view.
- Note: The "Open History
Table" did work IF we were already logged into
phpPGadmin.
- Cancelled: Since we would
like to rollout specific feeds to various
users (e.g. Admin - Rose or HVTN - Drienna), it
would be best if the RSS could be subscribed
to using separate feeds as opposed to a single
feed.
- Three user groups:
- All current views and tables -
Finance & DDs, where feed name
Finance_tables
- Admin tables (tbn, employee, and
employee job) - Rose and Helen, where
feed name admin_tables
- Protocol and Finance table
(protocol, grants,
projectidcontractmapping,
projectid_budref, where feed name is
public_pdb_tables
- Cancelled on 9/1/09. Can
we have the table open in Google spreadsheets
on select (Ben). Our original thought was to
cut/paste the html in excel or use the history
table view in tableau. Doug talked with Ben
about this and demonstrated Excel solution and
this seems good enough for now.
Bug's and
Issues
- Can only monitor database tables with a
primary key defined. (Composite - or multiple
- primary keys are also supported). Lloyd has
indicated that matviews are basically tables
and do therefore have a primary key.
- If a change to the structure (columns) or
constraints is made to the monitored table,
those changes may cause the monitoring of that
table to completely fail - with an error
message - or for some data to be silently
ignored. This is because the 'definition' of
the monitored table is recorded the first time
monitoring is setup on a table and never
updated subsequently. (The first version of
the materialized view table contains an exact
copy of the original's table structure and
constraints.) Being able to accurately detect
these circumstances would be possible with
more coding - checking the table structure and
constraints on every poll and comparing with
the previously detected values. Being able to
automatically handle/compensate for such
changes without manual intervention may not be
possible in the current implementation.
- Security for this system, as to who can
view database changes, is only partially
implemented at this time. The content of
changes is secure on the webserver
but not on the filesystem.
- If the target RSS feed does not already
exist, the RSS notifications will fail. The
only indication for such failures will be
bounced emails that say Undelivered Mail
Returned to Sender, unknown user. There is
currently no easy way to resend such failed
RSS notifications. From a engineering point of
view, it would be better to abandon the use of
the email-to-RSS.pl script in order to make
this system more robust.
- Because we are using polling - instead of
triggers - we are not capturing the
information on which database user made a
change to the data.
- There is no user-interface for setting up
the monitoring on particular tables, nor for
monitoring the status and health of the
application. Everything is done via Postgresql
tables and functions.
Project
Deliverables
Current Program Source Code
-- ScottLangley - 07 May
2009