PostgreSQL database model

Database structure

cust_cluster

The description of a single deployed StorPool cluster, whether in our internal testing environment or at a customer site.

Data model fields:

  • name
  • cltype

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • cust_cluster_record_trigger

cust_cluster_type

The type (support state) of a StorPool cluster, whether deployed in our testing environment or in a customer's production site.

Data model fields:

  • id
  • name

Additional fields:

None

Automated event journalling:

None

cust_host_kver

A single kernel version installed on a host in a StorPool cluster.

Data model fields:

  • cluster_name
  • host_name
  • distro
  • version
  • current

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • cust_host_kver_record_trigger

distro

A Linux distribution that StorPool can be installed on.

Data model fields:

  • name
  • variant
  • supported

Additional fields:

None

Automated event journalling:

  • distro_record_trigger

event

An automatically-maintained, log-like table of changes made to some types of records, e.g. distro, kver, and sp_build.

Data model fields:

  • table_name
  • pkey
  • action
  • generation
  • fields
  • tstamp

Additional fields:

  • id (integer): a unique identifier of this event, assigned by the database management system

Functions invoked by clients:

  • generation_drop_empty(generation_id)
  • generation_drop_empty_current()

generation

A single set of related changes made to the sqern database.

Data model fields:

  • id
  • comment
  • username
  • ipaddr
  • tstamp

Additional fields:

None

Functions invoked by clients:

  • generation_id()
  • generation_start(comment)

Internal functions:

  • generation_id_if_assigned()
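
Taken together, these functions bracket one set of related changes. A minimal sketch of the intended call sequence, wrapped for use with psql (the service name sqern-dev is a placeholder assumption, and the BEGIN/COMMIT pair mirrors the Python workflow described later):

```shell
#!/bin/bash
# Sketch: a change transaction that records a generation.
# The service name "sqern-dev" below is a placeholder assumption.

sql=$(cat <<'EOF'
BEGIN;
SELECT generation_start('import new kernel versions');
-- ... INSERT/UPDATE/DELETE statements go here ...
SELECT generation_drop_empty_current();
COMMIT;
EOF
)
printf '%s\n' "$sql"
# In real use: printf '%s\n' "$sql" | psql 'service=sqern-dev' -v ON_ERROR_STOP=on
```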

kver

A Linux kernel version for a specific Linux distribution. These objects carry no state at all by themselves; they exist for the purposes of referential integrity (foreign key constraints).

Informational queries should probably use the kver_not_ignored view instead.

Data model fields:

  • distro
  • version

Additional fields:

  • archived (boolean): whether this kernel version should no longer be taken into consideration

Referential integrity:

Automated event journalling:

  • kver_record_trigger

kver_fetched

A Linux kernel version that our build infrastructure can attempt to build a StorPool package for, i.e. one for which the OS packages containing the kernel header files have been successfully fetched from the Linux distribution's upstream repositories.

Data model fields:

  • distro
  • version
  • tstamp

Additional fields:

None

Referential integrity:

Automated event journalling:

  • kver_fetched_record_trigger

kver_ignore

A Linux distribution-specific pattern specifying kernel versions that should be ignored.

Informational queries should not use this table directly, but the kver_not_ignored view instead.

Data model fields:

  • distro
  • pattern
  • comment

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Automated event journalling:

  • kver_ignore_record_trigger

kver_not_ignored

A view over the kver table that skips any versions that match the patterns recorded in the kver_ignore table for the corresponding Linux distribution. This view is automatically refreshed after any changes to the kver and kver_ignore tables by triggers invoking the kver_update_not_ignored() function.

Fields inherited from the kver table:

  • distro
  • version

Additional fields:

None

Automated refreshing:

  • kver_update_not_ignored()
  • kver_update_not_ignored_kver
  • kver_update_not_ignored_kver_ignore

kver_seen

A Linux kernel version that has been seen in the upstream package repositories of a specific Linux distribution.

Data model fields:

  • distro
  • version
  • tstamp

Additional fields:

None

Referential integrity:

Automated event journalling:

  • kver_seen_record_trigger

sp_available_kver

FIXME: something about StorPool packages uploaded to the vault that may be used for testing or deployment or something.

Data model fields:

FIXME: this one is not even described in the data model...

Additional fields:

  • storpool_revision (string): the "21.0.243.620bc3729" version string of the StorPool source tree
  • release_target (string): the StorPool testing infrastructure suite that this build is initially meant for, e.g. "sp-lab"
  • distro (string): the Linux distribution the modules were built for
  • version (string): the kernel version the modules were built for
  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • sp_available_kver_record_trigger

sp_build

An attempt to build StorPool from a specific source tree.

Data model fields:

  • id
  • tstamp
  • result
  • success (generated)
  • jenkins_job
  • storpool_revision
  • change_id
  • official
  • release_target
  • dockernel_profile

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Automated event journalling:

  • sp_build_record_trigger

sp_build_kver

A single kernel version that a particular StorPool build attempts to build the StorPool kernel modules for.

Data model fields:

  • build_id
  • distro
  • version

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • sp_build_kver_record_trigger

sp_test

An attempt to run the StorPool testing suite on the packages produced by a StorPool build.

Data model fields:

  • id
  • tstamp
  • build_id
  • test_flavor
  • test_cluster

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • sp_test_record_trigger

sp_test_kver

A single kernel version that a particular StorPool test run attempts to validate the StorPool kernel modules for.

Data model fields:

  • test_id
  • distro
  • version

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • sp_test_kver_record_trigger

sp_test_result

The result of a test run.

Data model fields:

  • test_id
  • tstamp
  • result
  • success

Additional fields:

  • deleted_at (timestamp): the moment this record was marked as deleted, or the special value -infinity otherwise

Referential integrity:

Automated event journalling:

  • sp_test_result_record_trigger

General workflow recommendations

Use the psql command-line tool in shell programs, but carefully!

When passing string values to INSERT, SELECT, UPDATE, DELETE, etc. queries, the values MUST be quoted appropriately: any single-quote characters must be doubled. The following shell function may come in useful:

# Escape a string value for use inside a single-quoted SQL literal
# by doubling any embedded single-quote characters.
psql_qstring()
{
    local value="$1"

    printf -- '%s\n' "$value" | sed -e "s/'/''/g"
}

Newline characters within the string values will still pose a problem. If possible, avoid them. If not, take extra care and use PostgreSQL's string constants with C-style escapes (the E'...' syntax), but only with extreme caution.

For multiline queries, as well as queries that insert more than one row, it is better to construct the query beforehand in a temporary file and use the psql tool's --file (-f) option.
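
As a concrete illustration, the helper above can be used to embed an arbitrary string value in a query; the table columns match the kver_ignore description above, but the values and the commented-out service name are only an example:

```shell
#!/bin/bash
# Escape a string value for use inside a single-quoted SQL literal.
psql_qstring()
{
    local value="$1"

    printf -- '%s\n' "$value" | sed -e "s/'/''/g"
}

# A value that deliberately contains a single-quote character.
comment="don't build for these EOL kernels"
qcomment="$(psql_qstring "$comment")"

query="INSERT INTO kver_ignore (distro, pattern, comment) VALUES ('centos-8', '4.18.0-80%', '$qcomment');"
printf '%s\n' "$query"
# In real use: psql 'service=sqern-dev' -v ON_ERROR_STOP=on -c "$query"
```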

Run informational queries against the database (Python)

  • connect to the database using the appropriate PostgreSQL service=sqern-... record
  • do not issue a generation_start(...) or generation_id() query
  • issue SELECT ... queries as needed
  • disconnect from the database

Run informational queries against the database (shell programs)

  • use the psql command-line tool
  • specify the appropriate PostgreSQL service=sqern-... record
  • do not issue a generation_start(...) or generation_id() query
  • pass one or more -c 'SELECT ...' query parameters as needed
  • IMPORTANT: quote string values by doubling any single-quote characters!
  • use the psql tool's --no-align (-A), --field-separator (-F), --tuples-only (-t), and other output formatting options as needed
  • IMPORTANT: check the exit code of the psql tool
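
The steps above might be sketched as follows; the query is illustrative, and the real psql invocation (with its service name) is shown only as a comment since it needs a live database:

```shell
#!/bin/bash
# Sketch: an informational query against the distro table.
# The service name "sqern-dev" in the comment is a placeholder assumption.

query='SELECT name, variant FROM distro WHERE supported;'
printf '%s\n' "$query"

# Real invocation with bare, tab-separated output, checking the exit code:
#   if ! psql 'service=sqern-dev' -At -c "$query"; then
#       echo 'query failed' >&2
#       exit 1
#   fi
```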

Add, update, or delete database records (Python)

  • connect to the database using the appropriate PostgreSQL service=sqern-... record
  • start a transaction using a BEGIN query
  • optionally (recommended), issue a generation_start() query with a brief comment describing the purpose of the changes to the database
  • insert new records using INSERT INTO ... (ON CONFLICT DO NOTHING if needed) or COPY ... FROM STDIN queries
  • optionally (recommended), issue a generation_drop_empty_current() or generation_drop_empty(generation_id) query if there is the possibility that no new records were inserted at all
  • commit the transaction using a COMMIT query

Note that several of the above steps (connect, start a transaction, invoke generation_start(), invoke generation_drop_empty(), commit the transaction) are made easier by the sqern_tools.util.connect() helper function:

with util.connect(comment="update customer host kernel versions") as (conn, cur, generation):
    cur.execute(...)  # use the psycopg client cursor object cur's methods

If any of the values returned by the connect() helper function are not needed, _ placeholders may be used (e.g. with connect(...) as (_, cur, _):). FIXME: make connect() return a frozen dataclass instead.

Add, update, or delete database records (shell programs)

  • prepare the queries to run, either in shell variables or, if more than one record should be inserted, into a temporary file
  • IMPORTANT: if using INSERT, UPDATE, or DELETE queries, quote any string values by doubling any single-quote characters!
  • IMPORTANT: if using COPY ... FROM STDIN queries, make sure that the values do not contain the separator character!
  • use the psql command-line tool and pass at least the --single-transaction (-1) and -v 'ON_ERROR_STOP=on' command-line options
  • specify the appropriate PostgreSQL service=sqern-... record
  • optionally (recommended), pass a -c 'SELECT generation_start(...)' query with a brief comment describing the purpose of the changes to the database
  • pass one or more -c 'INSERT/UPDATE/DELETE ...' or -f "$copyfile" query parameters as needed
  • optionally (recommended), pass a -c 'SELECT generation_drop_empty_current()' query if there is the possibility that no new records were inserted at all
  • IMPORTANT: check the exit code of the psql tool
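
The steps above can be sketched as follows; the kernel versions are placeholder values, and the real psql invocation (with its assumed service name) is shown as a comment:

```shell
#!/bin/bash
# Sketch: prepare a multi-record insert in a temporary file, then
# run it in a single transaction. Values are placeholder assumptions.

copyfile="$(mktemp)"
trap 'rm -f "$copyfile"' EXIT

cat > "$copyfile" <<'EOF'
SELECT generation_start('record newly-seen kernel versions');
INSERT INTO kver (distro, version) VALUES
    ('centos-8', '4.18.0-513.el8'),
    ('ubuntu-22.04', '5.15.0-91-generic')
    ON CONFLICT DO NOTHING;
SELECT generation_drop_empty_current();
EOF

printf 'Prepared queries:\n'
cat "$copyfile"
# Real invocation (note --single-transaction and ON_ERROR_STOP):
#   psql 'service=sqern-dev' -1 -v ON_ERROR_STOP=on -f "$copyfile" || exit 1
```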

Sample workflow scenarios

Record the kernel versions used by customer hosts

  • see the sample load-cust-data Python tool (python/src/sqern_tools/load_cust_data.py)
  • grab data from the monitoring reports
  • insert newly-seen kernel versions into the kver table
  • insert, update, and/or mark as deleted cluster records in the cust_cluster table (one record for each customer cluster)
  • insert, update, and/or mark as deleted host records in the cust_host_kver table: cluster_name, host_name, distro, version (many records for each customer host)
  • we do not need a separate cust_host table, right? (aggregate queries exist after all)

The StorPool automated build-and-test pipelines

  • check for new kernel versions in the upstream distros (sh/check-new-distro-kver.sh):
    • insert newly-seen kernel versions into the kver table
    • insert newly-seen kernel versions into the kver_seen table; this is the input table for the next step
  • fetch the development packages for newly-seen kernel versions (sh/fetch-distro-kver.sh):
    • grab kver_seen records that are not yet present in kver_fetched
    • run kernq to fetch the files
    • insert newly-fetched kernel versions into the kver_fetched table; this is the input table for the next step
  • build StorPool for kernel versions that we have the devel files for (not yet implemented, autobuild? the Jenkins pipelines?):
    • if we're building for newly-seen kernel versions, grab kver_fetched records that are not yet present in the sp_build table
    • otherwise, make sure there are kver_fetched records for what we want to build
    • when a build is started, insert a record into the sp_build table with a null result field, and note the id of the record (e.g. PostgreSQL's INSERT INTO ... RETURNING id syntax). Make the build ID part of the build artifacts so that the next step can use it.
    • if we know which kernel versions we are building for, immediately insert records into the sp_build_kver table
    • when a build succeeds or fails, update the result field (this will automatically make the success field non-null)
    • possibly, only update the kernel versions now... figure this out
  • run the autotest suite for ready StorPool builds:
    • we should always know what we are testing for; make sure there is a sp_build record for that
    • when a test run is started, insert a record into the sp_test table with a null result field, and note the id of the record (e.g. PostgreSQL's INSERT INTO ... RETURNING id syntax). Make the test run ID part of the run artifacts so that any reports can use it.
    • we should always know which kernel versions we are testing for; immediately insert records into the sp_test_kver table
    • when a test run succeeds or fails, update the result field (this will automatically make the success field non-null)
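
The INSERT INTO ... RETURNING id step used by both pipelines above might be sketched like this; the column values and the commented-out service name are illustrative assumptions, not the real pipeline's parameters:

```shell
#!/bin/bash
# Sketch: start a build record and capture its id via RETURNING.
# All values and the service name are illustrative assumptions.

query="INSERT INTO sp_build (tstamp, result, jenkins_job, storpool_revision, official, release_target)
    VALUES (now(), NULL, 'build-sp', '21.0.243.620bc3729', false, 'sp-lab')
    RETURNING id;"
printf '%s\n' "$query"

# Real invocation, capturing the new build id for the later steps:
#   build_id="$(psql 'service=sqern-dev' -At -v ON_ERROR_STOP=on -c "$query")" || exit 1
#   printf '%s\n' "$build_id" > "$artifacts_dir/sqern-build-id"
```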