The letsencrypt_certs variable defined here in the "static" group file
is overwritten by the host variable, so it has no effect (and we no
longer have a logs.openstack.org, as it is all in object storage).
Remove it.
This explicitly tests connections through the load-balancer to the
gitea backend to ensure correct operation.
Additionally, it adds a check of the haproxy output to make sure the
backends are active (that's the srv_op_state field, cf. )
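As a rough sketch of the second check, assuming the input is the output of HAProxy's "show servers state" runtime command (a version line, a "#"-prefixed header naming the fields, then one row per server), the Python below flags any server whose srv_op_state is not 2, the value HAProxy uses for a running server. The backend and server names in the sample are made up, and how the real test collects this output is not shown here.

```python
"""Sketch: check that every HAProxy backend server reports as running.

Assumes `text` is the output of HAProxy's "show servers state" runtime
command: a version line, a "# "-prefixed header of field names, then one
whitespace-separated row per server.
"""

SRV_ST_RUNNING = "2"  # srv_op_state value HAProxy uses for a running server


def parse_servers_state(text):
    """Return one dict per server row, keyed by the header's field names."""
    fields = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# "):
            fields = line[2:].split()
        elif fields is not None:
            # Lines before the header (the version line) are skipped.
            rows.append(dict(zip(fields, line.split())))
    return rows


def inactive_servers(text):
    """Return (backend, server) pairs whose srv_op_state is not running."""
    return [
        (row["be_name"], row["srv_name"])
        for row in parse_servers_state(text)
        if row.get("srv_op_state") != SRV_ST_RUNNING
    ]


# Illustrative output only; names and addresses are hypothetical and the
# real output has more trailing columns.
SAMPLE = """1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
3 balance_git_https 1 gitea99.opendev.org 203.0.113.10 2 0
3 balance_git_https 2 gitea98.opendev.org 203.0.113.11 0 0
"""

if __name__ == "__main__":
    print(inactive_servers(SAMPLE))
```

A test would then assert that inactive_servers() returns an empty list for the live output.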
Move the paste testing server to paste99 to distinguish it in testing
from the actual production paste service. Since we have certificates
set up now, we can directly test against "paste99.opendev.org",
removing the insecure flags from various calls.
To make testing more like production, copy the OpenDev CA into the
haproxy container configuration directory during Zuul runs. We then
update the testing configuration to use SSL checking like production
does with this cert.
Some of our testing makes use of secure communication between testing
nodes; e.g. testing a load-balancer pass-through. Other parts
"loop-back" but require flags like "curl --insecure" because the
self-signed certificates aren't trusted.
To make testing more realistic, create a CA that is distributed to and
trusted by all testing nodes early in the Zuul playbook. This then
allows us to sign local certificates created by the letsencrypt
playbooks with this trusted CA and have realistic peer-to-peer secure
communications.
The other thing this does is rework the letsencrypt self-signed cert
path to correctly set up SAN records for the host. This also improves
the "realism" of our testing environment. This is so realistic that
it requires fixing the gitea playbook :). The Apache service proxying
gitea currently has to be overridden in testing to proxy to "localhost",
because that is all the old certificate covered; we can now just proxy
to the hostname directly for testing and production.
We now proxy requests to gitea (port 3000) via Apache listening on
port 3081; this is useful for layer 7 filtering such as matching on
user-agents.
It seems like we missed some of this configuration in our
load-balancer testing. Update the https forward on the load-balancer
to port 3081 on the gitea test host.
Also, remove the explicit port opening in the testing group_vars; for
some reason this was not opening port 3080 (http). This will just use
the production settings when we don't override it.
For the past six months, all our mailing list sites have supported
HTTPS without incident. The main downside to the current
implementation is that Mailman itself writes some URLs with an
explicit scheme, causing people submitting forms from pages served
over HTTPS to get warnings because the forms are posting to plain
HTTP URLs for the same site. In order to correct this, we need to
tell Mailman to put https:// instead of http:// into these, but
doing so essentially eliminates any reason for us to continue
serving content over plain HTTP anyway.
Configure the default URL scheme of all our Mailman sites to use
HTTPS now, and set up permanent redirects from HTTP to HTTPS, per
the examples in the project's documentation:
Also update our testinfra functions to validate the blanket
redirects and perform all other testing over HTTPS.
Once this merges, the fix_url script will need to be run manually
against all lists for the current sites, as noted in that document.
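The real checks live in our testinfra functions; this standalone Python sketch only illustrates what a blanket permanent-redirect check might assert. It assumes the redirect maps each http:// URL to the identical https:// URL (same host, path and query), and the list URLs used with it are examples. fetch_redirect() uses the standard library's http.client, which does not follow redirects, so the 301 and its Location header are visible.

```python
"""Sketch: validate a blanket HTTP -> HTTPS permanent redirect.

Assumes the redirect keeps the host, path and query string unchanged
and only swaps the scheme.
"""
import http.client
from urllib.parse import urlsplit, urlunsplit

PERMANENT_STATUSES = {301, 308}


def https_equivalent(url):
    """Return the https:// form of a plain http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not a plain-HTTP URL: {url!r}")
    netloc = parts.netloc
    if parts.port == 80:
        netloc = netloc.rsplit(":", 1)[0]  # the default HTTP port is redundant
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


def is_blanket_https_redirect(url, status, location):
    """True if (status, location) permanently redirects url to its HTTPS twin."""
    return status in PERMANENT_STATUSES and location == https_equivalent(url)


def fetch_redirect(url, timeout=10):
    """Issue one GET without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Against a live site this would be used as `status, location = fetch_redirect(url)` followed by `assert is_blanket_https_redirect(url, status, location)`, with all other checks then made over HTTPS.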
When we migrated this to ansible I missed that we didn't bring across
the storage-aggregation.conf file.
This has had the unfortunate effect of regressing the xFilesFactor set
for every newly created graphite stat since the migration. This
setting is the fraction (a float from 0 to 1) of a "bucket" that must
be non-null for its value to be kept when data is rolled up. We want
this to be zero due to the sporadic nature of our data (see the original change
This only affected newly created statistics, as graphite doesn't
modify this setting once it creates the whisper file. This probably
helped us overlook this for so long, as longer-existing stats were
operating correctly, but newer ones were dropping data when zoomed out.
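To illustrate why sparse stats lost data, here is a small Python sketch modelled on the xFilesFactor check in whisper's rollup (propagation) step as we understand it: a lower-precision point is written only if the fraction of non-null points in the higher-precision bucket is at least xFilesFactor. The bucket contents are hypothetical, and averaging is just one possible aggregation method.

```python
"""Sketch of how whisper decides whether a rolled-up point is kept.

A lower-precision point is written only if the fraction of non-null
points in the higher-precision bucket is at least xFilesFactor.
"""


def average(values):
    return sum(values) / len(values)


def rollup(bucket, x_files_factor, aggregate=average):
    """Aggregate one bucket, or return None if too much of it is null."""
    known = [v for v in bucket if v is not None]
    if not known:
        return None  # nothing to aggregate, even with xFilesFactor 0
    if len(known) / len(bucket) >= x_files_factor:
        return aggregate(known)
    return None  # bucket too sparse: the rolled-up point is dropped


# A sporadic stat: one real sample in a bucket of six (hypothetical data).
SPARSE_BUCKET = [None, None, None, 4.0, None, None]

if __name__ == "__main__":
    print(rollup(SPARSE_BUCKET, 0.5))  # graphite's default of 0.5: dropped
    print(rollup(SPARSE_BUCKET, 0.0))  # xFilesFactor 0: kept as 4.0
```

With only 1 of 6 points known (about 0.17), the default threshold of 0.5 discards the bucket when zoomed out, while a threshold of zero keeps it.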
Restore this setting, and double-check it in testinfra for the future.
For simplicity and to get this back to the prior state I will manually
update the on-disk .wsp files to this when this change applies.
We were using /var/run/ansible/zuul_reboot.lock to flock around this
cron job. Unfortunately it seems /var/run/ansible does not exist, so the
flock command fails. Move the file to /var/run/zuul_reboot.lock to work
around this.
Note that we want to use /var/run since it is a tmpfs which means if the
server unexpectedly reboots we'll automatically clear the lock.
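For illustration, a Python sketch of an exclusive flock(2) lock like the one the flock(1) command takes, via the standard fcntl module (Linux/Unix only). It uses a non-blocking attempt (the equivalent of flock -n), which may not match the actual cron entry, and the demo runs in a temporary directory rather than /var/run. It also shows the failure mode above: opening a lock file whose parent directory is missing raises FileNotFoundError before any lock is taken.

```python
"""Sketch: an exclusive, non-blocking flock(2) lock around a cron job."""
import fcntl
import os
import tempfile

LOCK_PATH = "/var/run/zuul_reboot.lock"


def try_lock(path=LOCK_PATH):
    """Return an fd holding an exclusive lock on path, or None if it is held.

    os.open() raises FileNotFoundError if the parent directory is missing,
    as happened with /var/run/ansible.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)  # another holder (e.g. an earlier run) has the lock
        return None
    return fd


def demo():
    """Exercise the lock in a temporary directory instead of /var/run."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "zuul_reboot.lock")
        first = try_lock(path)
        second = try_lock(path)  # a separate open() conflicts with `first`
        os.close(first)          # closing the fd releases the lock
        third = try_lock(path)
        os.close(third)
        try:
            try_lock(os.path.join(tmp, "missing", "zuul_reboot.lock"))
            missing_dir_fails = False
        except FileNotFoundError:
            missing_dir_fails = True
    return {
        "first_acquired": first is not None,
        "second_blocked": second is None,
        "reacquired_after_close": third is not None,
        "missing_dir_fails": missing_dir_fails,
    }
```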
A few formatting fixes
* try to more consistently use shell-session formatting for shell
sessions (makes it easier to copy-paste).
* fix up and use more `` around verbatim/code things.
* Gerrit Configuration : there's no db to set the ICLA fields in now,
* Duplicate Accounts : add required arg "origin" to git fetch command
* Deactivating account : cannot delete comments via sql query,
This adds upgrade testing from our current Gerrit version (3.5) to the
likely future version of our next upgrade (3.6).
To do so we have to refactor the gerrit testing because the 3.5 to 3.6
upgrade requires we run a command against 3.5. The previous upgrade
system assumed the old version could be left alone, jumped straight
into the upgrade and finally tested the end state. Now we have split up
gerrit bootstrapping and gerrit testing so that normal gerrit testing
and upgrade testing can run these tasks at different points in the
gerrit deployment process.
Now the upgrade tests use the bootstrapping playbook to create users,
projects, and changes on the old version of gerrit before running the
copy-approvals command. Then after the upgrade we run the test assertion
portion of the job.
This adds Gerrit 3.6 image build jobs as well as CI testing for this
version of Gerrit. Once we've got images that build and function
generally we'll reenable the upgrade job and work through that.
This adds a weekly cronjob that will reboot and update our entire zuul
cluster gracefully. The time frame chosen for this should be after North
America begins its weekend and before Europe starts their week. The idea
is that we're doing this during the quiet time of our week.
We previously auto-updated nodepool builders but not launchers when new
container images were present. This created confusion over which versions
of nodepool opendev is running. Use the same behavior for both services
now and auto-restart them both.
There is a small chance that we can pull in an update that breaks things
so we run serially to avoid the most egregious instances of this
This is a new config option for Gerrit 3.5. While it defaults to true, we
set it explicitly to true to avoid any change in behavior should that
default change in newer Gerrit releases. They note this is expensive
to calculate, but our users rely on it and it hasn't caused us problems
yet. We can always explicitly disable it in the future if that becomes
Add the released Fedora 36 to the mirror. Traditionally we have kept
two releases (prior and current) around, but depending on what is
broken, we often drop the prior release early if it is not worth
fixing; this is what happened with F34. Ergo this adds 36 and leaves
35, for now.
As part of the Gerrit 3.5 upgrade we are also upgrading the reviewdb
to the latest mariadb LTS. This should be merged after the update
Previously the merger docker-compose restart value was set to always.
This caused the merger to immediately restart after asking it to
gracefully stop and our check for the merger stopping:
docker-compose ps -q | xargs docker wait
never saw it as being stopped.
Make the mergers match the executors and restart only on failure. This
should allow us to intentionally stop the mergers gracefully and detect
that they are stopped for maintenance purposes.
This serves two purposes. The first is that not all packages are updated
by unattended-upgrades because it may not be safe to upgrade packages
while services are running. We should be safe in this situation because
we've gracefully stopped services and can proceed with package updates.
The other is that unattended-upgrades runs daily, which means we could
end up almost 24 hours out of date prior to rebooting. This ensures we
have the latest and greatest packages installed just prior to rebooting.
This handles rolling the mergers and executors, but not yet
Also, it does the executors in complete batches of 6, but could be
improved to stop 6 and then do each of the next as the first ones
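A Python sketch contrasting the current fixed batches of 6 with the possible improvement of keeping 6 restarts in flight and starting the next host as soon as one finishes. Here `restart` is a hypothetical callable standing in for the stop/update/reboot of one host, and the host names are made up; ThreadPoolExecutor with max_workers=6 provides the sliding window because queued work starts whenever a worker frees up.

```python
"""Sketch: rolling restarts in fixed batches vs. a sliding window.

`restart` is a hypothetical callable that gracefully stops, updates and
restarts a single host; it is not part of our playbooks.
"""
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def roll_in_batches(hosts, restart, size=6):
    """Restart `size` hosts, wait for the whole batch, then move on."""
    for start in range(0, len(hosts), size):
        with ThreadPoolExecutor(max_workers=size) as pool:
            list(pool.map(restart, hosts[start:start + size]))


def roll_with_window(hosts, restart, size=6):
    """Keep up to `size` restarts in flight; start the next as one ends."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        list(pool.map(restart, hosts))  # list() re-raises any restart failure


def peak_in_flight(roller, hosts, size=6, delay=0.01):
    """Run `roller` with a fake restart and report the maximum concurrency."""
    lock = threading.Lock()
    state = {"now": 0, "peak": 0}

    def fake_restart(host):
        with lock:
            state["now"] += 1
            state["peak"] = max(state["peak"], state["now"])
        time.sleep(delay)
        with lock:
            state["now"] -= 1

    roller(hosts, fake_restart, size)
    return state["peak"]


HOSTS = [f"ze{n:02d}" for n in range(1, 13)]  # hypothetical executor names
```

Both approaches cap concurrency at 6; the difference is that with batches one slow host holds up the next five, while the window keeps capacity busy.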
3.4.5 is a fairly minor update. Some bugs are fixed and jgit is updated.
3.4.5 release notes:
3.5.2 is a bigger update and, importantly, adds support for being able
to upgrade to 3.6.0 later. There is a new copy-approvals command that
must be run offline on 3.5.2 before upgrading to 3.6.0. This copies
approvals in the notedb, apparently in a way that 3.6.0 can handle. The
release notes indicate this may take some time to run. We don't need to
run it now, though; instead we need to make note of it when we prepare
for the 3.6.0 upgrade.
3.5.2 release notes:
For now don't overthink things and instead just get up to date with our