Implementation: various options

Pull-based model

One suggested option is to have each part of the StorPool build/test infrastructure store the results of its own actions in its own set of files (or via another mechanism that allows outside queries), and to have a central aggregator periodically (or when triggered) connect to all of them, collect their current state, and produce the aggregated view: either by performing the necessary updates to the database, or by directly generating a set of CSV, JSON, etc. files representing the current state.
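
A minimal sketch of such an aggregator, assuming each service drops its current state into a JSON file below a shared directory; the directory layout, the state.json file name, and the output path are illustrative only, not part of the design:

```python
#!/usr/bin/env python3
"""Collect the per-service state files and emit a single aggregated JSON file."""

import json
import pathlib

# Hypothetical locations; the real layout would be whatever the services agree on.
SERVICE_STATE_DIR = pathlib.Path("/var/lib/build-infra/state")
OUTPUT_FILE = pathlib.Path("/var/lib/build-infra/aggregated.json")


def collect() -> dict:
    """Read every service's state file and merge them into a single document."""
    aggregated = {}
    for state_file in sorted(SERVICE_STATE_DIR.glob("*/state.json")):
        service_name = state_file.parent.name
        with state_file.open() as handle:
            aggregated[service_name] = json.load(handle)
    return aggregated


def publish(aggregated: dict) -> None:
    """Write the merged state atomically so that readers never see a partial file."""
    tmp_file = OUTPUT_FILE.with_suffix(".tmp")
    tmp_file.write_text(json.dumps(aggregated, indent=2, sort_keys=True))
    tmp_file.rename(OUTPUT_FILE)


if __name__ == "__main__":
    publish(collect())
```

Such an aggregator could run from cron or be triggered explicitly; any conflict resolution between the services' data would have to be added on top of the merge step.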

Required changes:

  • teach each service to store its data locally in a structured way (a minimal sketch follows this list)
  • implement an aggregator that collects the data, processes it, and presents it
  • teach each service to fetch and consume (possibly filter) the aggregated data
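
On the services' side, the first item above might look like the following sketch; the state file path and the record fields are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch: a service stores the result of one of its runs as local structured data."""

import json
import pathlib
import time

# Hypothetical path and record shape; each service would define its own.
STATE_FILE = pathlib.Path("/var/lib/build-infra/state/unit-tests/state.json")


def record_result(build_id: str, passed: bool) -> None:
    """Store the latest result so the aggregator can pick it up on its next pass."""
    record = {
        "build_id": build_id,
        "passed": passed,
        "finished_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    tmp = STATE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(record, indent=2))
    # Atomic replace, so the aggregator never reads a half-written file.
    tmp.rename(STATE_FILE)


if __name__ == "__main__":
    record_result("build-1234", passed=True)
```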

Pros:

  • no API server, so no custom client invocations needed on the services side
  • no SQL client invocations needed at the client side, either
  • the knowledge about what to update and how is contained in a single place - the aggregator
  • any restrictions on which object attributes should never be updated are also needed only in the aggregator, unless those attributes come from the data provided by the services themselves
  • if the aggregated data is not presented as a database, the services need not be aware of any database structure at all, only of the format of the presented files
  • no authentication credentials needed at all on the services side

Cons:

  • each service must store the results of its operation locally; for some parts of the build/testing infrastructure this may mean designing new persistent ways to store structured data
  • not possible to implement any consistency restrictions: the local data generated by the services may conflict, and the aggregator may need to resolve those conflicts before producing the output data
  • not possible to implement any restrictions on which object attributes should probably never be modified after the initial creation
  • the aggregator needs to be aware of the way each service stores its data
  • the aggregator needs to be immediately aware of new or decommissioned services
  • if the aggregated data is not presented as a database, each service must grab it all and perform any filtering locally
  • the information is only available when the aggregator runs, so some time may pass before one service can see the result of another service's actions; this may lead to race conditions if one service triggers another before the data has been aggregated

Push-based model: direct database access

In this option each service records the results of its actions (new records, updated records, or combinations of both) directly into a central database managed by a suitable RDBMS (e.g. PostgreSQL).

Required changes:

  • teach each service to connect to the database and fetch the input data it needs
  • teach each service to connect to the database and record the results of its actions (both of these changes are sketched after this list)
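
A minimal sketch of both changes, assuming Python with the psycopg2 driver and the two-step-write ("database generations") design mentioned in the cons below; all table, column, and credential names are illustrative only:

```python
#!/usr/bin/env python3
"""Sketch: a service reads its input from the database and records its results directly."""

import psycopg2

# Hypothetical connection parameters; the real credentials come from the deployment.
DSN = "dbname=buildinfra user=svc_unit_tests host=db.internal"


def fetch_pending_builds(conn) -> list:
    """Read-only informational query: which builds still need this service's attention."""
    with conn.cursor() as cur:
        cur.execute("SELECT build_id, branch FROM builds WHERE tested = FALSE")
        return cur.fetchall()


def record_result(conn, build_id: str, passed: bool) -> None:
    """Two mostly-independent statements in a single transaction:
    bump the data generation, then store the actual result."""
    with conn:  # commits on success, rolls back on error
        with conn.cursor() as cur:
            cur.execute("UPDATE db_generation SET value = value + 1")
            cur.execute(
                "INSERT INTO test_results (build_id, passed) VALUES (%s, %s)",
                (build_id, passed),
            )


if __name__ == "__main__":
    connection = psycopg2.connect(DSN)
    for build_id, branch in fetch_pending_builds(connection):
        record_result(connection, build_id, passed=True)
    connection.close()
```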

Pros:

  • no API server, so no custom client invocations needed on the services side
  • standard SQL as the way for services to perform the updates
  • if PostgreSQL is chosen as the RDBMS, the standard psql client has a simple way to execute two mostly-independent statements in a single transaction
  • the database will guarantee the data consistency
  • changes in a single service's behavior do not necessarily need to be reflected elsewhere; the way it gathers, processes, and outputs data and the changes it makes to the database are contained within the service itself
  • new services can be deployed or existing ones can be decommissioned completely independently of the operation of the other services
  • the updated information is available immediately for anyone who queries the database

Cons:

  • development and maintenance costs for the database access within the services
  • quoting values correctly may be difficult and/or cumbersome in shell programs that use the psql, mysql, etc. command-line clients
  • each service must learn to access the RDBMS, send read-only informational queries, and (in the suggested design that uses database generations) perform transactions comprising two queries
  • each service must be fully aware of the database structure, at least for the tables describing the objects that it handles
  • difficult to implement restrictions on which object attributes should probably never be modified after the initial creation (the RDBMS needs to support ACLs with table column granularity and differentiate between INSERT and UPDATE statements)
  • each service must have access to authentication credentials for the database

Push-based model: a JSON-over-HTTP API

In this option the database is accessed via a custom web service that handles requests via a Kubernetes-like API. The API has an OpenAPI specification, so that clients have a choice:

  • access the API directly using HTTP requests made using the library or tool of their choice (sketched after this list)
  • access the API through language bindings generated from the OpenAPI specification
  • access the API via a single, self-contained command-line client
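
For example, the first of these options (plain HTTP requests) might look like the sketch below; the base URL, endpoints, payload fields, and authentication scheme are all hypothetical, since the API is yet to be designed:

```python
#!/usr/bin/env python3
"""Sketch: a service talks to the (yet to be designed) API directly over HTTP."""

import requests

# Hypothetical endpoint and credentials; the real ones would come from the OpenAPI
# specification and from the deployment respectively.
API_BASE = "https://buildinfra.example.com/api/v1"
TOKEN = "replace-me"


def fetch_pending_builds() -> list:
    """Fetch the input data the service needs."""
    response = requests.get(
        f"{API_BASE}/builds",
        params={"tested": "false"},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def record_result(build_id: str, passed: bool) -> None:
    """Record the result of the service's actions."""
    response = requests.post(
        f"{API_BASE}/builds/{build_id}/test-results",
        json={"passed": passed},
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()


if __name__ == "__main__":
    for build in fetch_pending_builds():
        record_result(build["id"], passed=True)
```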

Required changes:

  • design the API and provide an OpenAPI specification
  • implement a webserver that accepts the requests and performs the single-read or two-step-write database queries as necessary (a minimal sketch follows this list)
  • teach each service to access the API and fetch the input data it needs
  • teach each service to access the API and record the results of its actions
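
As an illustration of the second item, a minimal API server sketch follows; the choice of framework (FastAPI), the endpoints, and the in-memory stand-in for the database are assumptions made only to show the single-read / two-step-write split:

```python
#!/usr/bin/env python3
"""Sketch of the API server; run with e.g. `uvicorn api_server:app`."""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Stand-in for the real database: a generation counter plus one result table.
state = {"generation": 0, "results": {}}


class TestResult(BaseModel):
    passed: bool


@app.get("/api/v1/builds/{build_id}/test-results")
def read_result(build_id: str) -> dict:
    """Single-read query: return the stored result, if any."""
    if build_id not in state["results"]:
        raise HTTPException(status_code=404, detail="no result recorded")
    return state["results"][build_id]


@app.post("/api/v1/builds/{build_id}/test-results")
def write_result(build_id: str, result: TestResult) -> dict:
    """Two-step write: bump the data generation, then store the record itself."""
    state["generation"] += 1
    state["results"][build_id] = {
        "passed": result.passed,
        "generation": state["generation"],
    }
    return state["results"][build_id]
```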

Pros:

  • the API server and the database will guarantee the data consistency
  • the API server will ensure that no service can modify any object attributes other than the few that are expected to ever change
  • services do not need to be aware of the underlying database structure, only of the object types defined in the data model section
  • changes in a single service's behavior do not necessarily need to be reflected elsewhere; the way it gathers, processes, and outputs data and the changes it makes to the database are contained within the service itself
  • new services can be deployed or existing ones can be decommissioned completely independently of the operation of the other services
  • the updated information is available immediately for anyone who queries the API

Cons:

  • development and maintenance costs for the API server and the various ways to access it in the services
  • the custom API requires custom code in each service (even if only "invoke this command-line tool with these parameters")
  • each service must learn to access the API and send requests
  • each service must have access to authentication credentials for the API