The letsencrypt_certs variable defined here in the "static" group file
is overwritten by the host variable, so it has no effect (and we no
longer have a logs.openstack.org, as it is all in object storage).
Remove it.
Change-Id: I6910d6652c558c94d71b1609d1194b654bc5b42d
This explicitly tests connection through the load-balancer to the
gitea backend to ensure correct operation.
Additionally, it adds a check of the haproxy output to make sure the
back-ends are active (that's the srv_op_state field; cf. [1]).
[1] http://docs.haproxy.org/2.6/management.html#9.3-show%20servers%20state
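A minimal sketch of such a check, assuming the text of "show servers
state" has already been captured as a string. The function name, the
backend name, and the sample hosts and addresses are made up for
illustration; the sample also carries only a subset of the real columns.
Per [1], an srv_op_state of 2 means the server is running.

```python
# Sketch: find servers in "show servers state" output that are not running.
# Columns are looked up by name from the "#" header line that haproxy prints.
SRV_ST_RUNNING = 2  # srv_op_state value for a running server, per [1]


def servers_not_running(state_output):
    """Return (backend, server) pairs whose srv_op_state is not running."""
    columns = None
    down = []
    for line in state_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: "# be_id be_name srv_id srv_name ... srv_op_state ..."
            columns = line.lstrip("#").split()
            continue
        if columns is None:
            # The format version number comes before the header; skip it.
            continue
        fields = dict(zip(columns, line.split()))
        if int(fields["srv_op_state"]) != SRV_ST_RUNNING:
            down.append((fields["be_name"], fields["srv_name"]))
    return down


# Invented sample in the documented layout (hypothetical names and addresses).
SAMPLE = """1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
3 balance_git_https 1 gitea99.opendev.org 203.0.113.10 2 0
3 balance_git_https 2 gitea98.opendev.org 203.0.113.11 0 0
"""

print(servers_not_running(SAMPLE))
```

A test could then assert that the returned list is empty.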
Change-Id: Ia896134d6a9b6951acebfbf8b0b32a7ef8b87777
Move the paste testing server to paste99 to distinguish it in testing
from the actual production paste service. Since we have certificates
set up now, we can directly test against "paste99.opendev.org",
removing the insecure flags from various calls.
Change-Id: Ifd5e270604102806736dffa86dff2bf8b23799c5
To make testing more like production, copy the OpenDev CA into the
haproxy container configuration directory during Zuul runs. We then
update the testing configuration to use SSL checking like production
does with this cert.
Change-Id: I1292bc1aa4948c8120dada0f0fd7dfc7ca619afd
Some of our testing makes use of secure communication between testing
nodes; e.g. testing a load-balancer pass-through. Other parts
"loop-back" but require flags like "curl --insecure" because the
self-signed certificates aren't trusted.
To make testing more realistic, create a CA that is distributed and
trusted by all testing nodes early in the Zuul playbook. This then
allows us to sign local certificates created by the letsencrypt
playbooks with this trusted CA and have realistic peer-to-peer secure
communications.
The other thing this does is rework the letsencrypt self-signed cert
path to correctly set up SAN records for the host. This also improves
the "realism" of our testing environment. This is so realistic that
it requires fixing the gitea playbook :). The Apache service proxying
gitea currently has to override in testing to "localhost" because that
is all the old certificate covered; we can now just proxy to the
hostname directly for testing and production.
Change-Id: I3d49a7b683462a076263127018ec6a0f16735c94
We have moved to a situation where we proxy requests to gitea (3000)
via Apache listening on 3081 -- this is useful for layer 7 filtering
like matching on user-agents.
It seems like we missed some of this configuration in our
load-balancer testing. Update the https forward on the load-balancer
to port 3081 on the gitea test host.
Also, remove the explicit port opening in the testing group_vars; for
some reason this was not opening port 3080 (http). This will just use
the production settings when we don't override it.
Change-Id: Ic5690ed893b909a7e6b4074a1e5cd71ab0683ab4
I494a21911a2279228e57ff8d2b731b06a1573438 didn't promote the gerrit
images, so 3.6 remains untagged. Update the stamp to trigger this.
Change-Id: I48c5a5d69fc31bb81f220566bc4360b762a51d63
For the past six months, all our mailing list sites have supported
HTTPS without incident. The main downside to the current
implementation is that Mailman itself writes some URLs with an
explicit scheme, causing people submitting forms from pages served
over HTTPS to get warnings because the forms are posting to plain
HTTP URLs for the same site. In order to correct this, we need to
tell Mailman to put https:// instead of http:// into these, but
doing so essentially eliminates any reason for us to continue
serving content over plain HTTP anyway.
Configure the default URL scheme of all our Mailman sites to use
HTTPS now, and set up permanent redirects from HTTP to HTTPS, per
the examples in the project's documentation:
https://wiki.list.org/DOC/4.27%20Securing%20Mailman%27s%20web%20GUI%20by%20using%20Secure%20HTTP-SSL%20%28HTTPS%29
Also update our testinfra functions to validate the blanket
redirects and perform all other testing over HTTPS.
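As a rough illustration of what validating a blanket redirect involves,
the helper below decides whether a response is a permanent redirect to
the HTTPS version of the same site. It is a sketch, not the actual
testinfra code: the function name and the example hostname are
hypothetical, and it takes the status code and Location header as plain
arguments rather than making a request.

```python
from urllib.parse import urlsplit


def is_permanent_https_redirect(status, location, host):
    """True if (status, Location) is a permanent redirect to https://host/..."""
    if status not in (301, 308):
        # Only permanent redirects count; 302/307 are temporary.
        return False
    if not location:
        return False
    target = urlsplit(location)
    return target.scheme == "https" and target.hostname == host


# Hypothetical list server name, used only for the example.
print(is_permanent_https_redirect(
    301, "https://lists.example.org/cgi-bin/mailman/listinfo", "lists.example.org"))
```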
Once this merges, the fix_url script will need to be run manually
against all lists for the current sites, as noted in that document.
Change-Id: I366bc915685fb47ef723f29d16211a2550e02e34
When we migrated this to ansible I missed that we didn't bring across
the storage-aggregation.conf file.
This has had the unfortunate effect of regressing the xFilesFactor set
for every newly created graphite stat since the migration. This
setting is a percentage (0-1 float) of how much of a "bucket" needs to
be non-null to keep the value when rolling up changes. We want this
to be zero due to the sporadic nature of data (see the original change
I5f416e798e7abedfde776c9571b6fc8cea5f3a33).
This only affected newly created statistics, as graphite doesn't
modify this setting once it creates the whisper file. This probably
helped us overlook this for so long, as longer-existing stats were
operating correctly, but newer were dropping data when zoomed out.
Restore this setting, and double-check it in testinfra for the future.
For simplicity and to get this back to the prior state I will manually
update the on-disk .wsp files to this when this change applies.
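To illustrate why the value matters, here is a simplified model of a
single roll-up step. It is not whisper's implementation: the function
name is made up and averaging is assumed as the aggregation method. A
bucket is kept only when the fraction of non-null points reaches
xFilesFactor.

```python
def roll_up(points, x_files_factor):
    """Aggregate one bucket of points, or return None if too sparse.

    points: values for one bucket, with None for missing samples.
    x_files_factor: required fraction (0-1) of non-null points.
    """
    known = [p for p in points if p is not None]
    if not known:
        # Nothing to aggregate at all.
        return None
    if len(known) / len(points) < x_files_factor:
        # Too few real samples: the rolled-up value is dropped.
        return None
    # Assumption: "average" aggregation.
    return sum(known) / len(known)


sporadic = [4.0, None, None, None]   # one sample in a four-slot bucket
print(roll_up(sporadic, 0.5))        # dropped with a 0.5 threshold
print(roll_up(sporadic, 0.0))        # kept when the threshold is zero
```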
Change-Id: I57873403c4ca9783b1851ba83bfba038f4b90715
We were using /var/run/ansible/zuul_reboot.lock to flock around this
cron job. Unfortunately it seems /var/run/ansible does not exist so the
flock command fails. Move the file to /var/run/zuul_reboot.lock to work
around this.
Note that we want to use /var/run since it is a tmpfs which means if the
server unexpectedly reboots we'll automatically clear the lock.
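The failure mode can be reproduced with a small Python analogue of the
flock wrapper (Linux only). This is a sketch rather than what the cron
job runs: try_lock is a hypothetical helper, and a temporary directory
stands in for /var/run.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Take an exclusive, non-blocking flock on path.

    Returns the open file object (keep it open to hold the lock), or None
    if the lock file cannot be created or the lock is already held.
    """
    try:
        handle = open(path, "w")
    except FileNotFoundError:
        # Parent directory is missing, as with /var/run/ansible.
        return None
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


with tempfile.TemporaryDirectory() as run_dir:
    missing = os.path.join(run_dir, "ansible", "zuul_reboot.lock")
    present = os.path.join(run_dir, "zuul_reboot.lock")
    print(try_lock(missing) is None)    # directory does not exist
    held = try_lock(present)
    print(held is not None)             # lock acquired
    print(try_lock(present) is None)    # second attempt while held
    held.close()
```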
Change-Id: Ib0f4a434cbbf2152722493e80b5cc7a945c1f235
A few formatting fixes
* try to more consistently use shell-session formatting for shell
sessions (makes it easier to copy-paste).
* fix up and use more `` around verbatim/code things.
Fixes:
* Gerrit Configuration : there's no db to set the ICLA fields in now,
remove
* Duplicate Accounts : add required arg "origin" to git fetch command
* Deactivating account : can not delete comments via sql query,
remove
Change-Id: Ia481750aa59fc88bef5c00bb0fd9e6f9e23b2777
This adds upgrade testing from our current Gerrit version (3.5) to the
likely future version of our next upgrade (3.6).
To do so we have to refactor the gerrit testing because the 3.5 to 3.6
upgrade requires we run a command against 3.5. The previous upgrade
system assumed the old version could be left alone and jumped straight
into the upgrade, finally testing the end state. Now we have split up the
gerrit bootstrapping and gerrit testing so that normal gerrit testing
and upgrade testing can run these different tasks at different points in
the gerrit deployment process.
Now the upgrade tests use the bootstrapping playbook to create users,
projects, and changes on the old version of gerrit before running the
copy-approvals command. Then after the upgrade we run the test assertion
portion of the job.
Change-Id: Id58b27e6f717f794a8ef7a048eec7fbb3bc52af6
This adds Gerrit 3.6 image build jobs as well as CI testing for this
version of Gerrit. Once we've got images that build and function
generally we'll reenable the upgrade job and work through that.
Change-Id: I494a21911a2279228e57ff8d2b731b06a1573438
This removes our Gerrit 3.4 image builds as well as testing. We should
land this once enough time has passed since the 3.5 upgrade that we are
unlikely to revert it.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/847057
Change-Id: Iefa7cc1157311f0239794b15bea7c93f0c625a93
This adds a weekly cronjob that will reboot and update our entire zuul
cluster gracefully. The time frame chosen for this should be after North
America begins its weekend and before Europe starts their week. The idea
is that we're doing this during the quiet time of our week.
Change-Id: Ib9a54f273e11744fa1ddbf367c291289f86bddb7
We've upgraded to Gerrit 3.5 so now need to wait for the 3.5 image to
promote rather than the 3.4 image when deploying Gerrit.
Change-Id: Ic3a4d578aea955aeee51f4cac7f4c95de931a94b
We previously auto updated nodepool builders but not launchers when new
container images were present. This created confusion over what versions
of nodepool opendev is running. Use the same behavior for both services
now and auto restart them both.
There is a small chance that we can pull in an update that breaks things
so we run serially to avoid the most egregious instances of this
scenario.
Change-Id: Ifc3ca375553527f9a72e4bb1bdb617523a3f269e
OFTC's chanserv requires a channel description be provided when
registering it. Update the example in our documentation to reflect
that.
Change-Id: Iee61b8176b2b801b4843530e7570bad5000fe76e
This is a new config option for Gerrit 3.5. While it defaults to true we
set it explicitly to true to avoid any changes in behavior should that
default change eventually with newer Gerrit. They note this is expensive
to calculate, but our users rely on it and it hasn't caused us problems
yet. We can always explicitly disable it in the future if that becomes
necessary.
Change-Id: Idc002810de2d848af043978894ef9dc194ac5b6a
The zuul cli command is deprecated and emits a warning when it is
used; replace it with zuul-admin.
Change-Id: Ifcc891f5da6f16824a65dc8dbf560b5d4c6ee9fc
Add the released Fedora 36 to the mirror. Traditionally we have kept
two releases (prior and current) around, but we often drop the prior
release earlier when it is broken in a way that is not worth fixing;
this is what happened with F34. Ergo this adds 36 and leaves 35, for
now.
Change-Id: I9864666be0a6e32edc730b736f81d8883411bcb2
This updates the gerrit configuration to deploy 3.5 in production.
For details of the upgrade process see:
https://etherpad.opendev.org/p/gerrit-upgrade-3.5
Change-Id: I50c9c444ef9f798c97e5ba3dd426cc4d1f9446c1
As part of the Gerrit 3.5 upgrade we are also upgrading the reviewdb
to the latest mariadb LTS. This should be merged after the update
process; see
https://etherpad.opendev.org/p/gerrit-upgrade-3.5
Change-Id: Ie30c84eeb003ee86a7a66e0c1c5fd7f95ddf3f5f
Previously the merger docker-compose restart value was set to always.
This caused the merger to immediately restart after asking it to
gracefully stop and our check for the merger stopping:
docker-compose ps -q | xargs docker wait
never saw it as being stopped.
Make the mergers match executors and restart only on failure. This
should allow us to gracefully stop the mergers with intention and detect
they are stopped for maintenance purposes.
Change-Id: Ia8d12fbf6a45e4ca85174ccafd18b5d2351c26c1
This serves two purposes. The first is that not all packages are updated
by unattended-upgrades because it may not be safe to upgrade packages
while services are running. We should be safe in this situation because
we've gracefully stopped services and can proceed with package updates.
The other is that unattended-upgrades runs daily, which means we could
end up almost 24 hours out of date prior to rebooting. This ensures we
have the latest and greatest packages installed just prior to rebooting.
Change-Id: Id351b5478e925ed1b4fbb6b3e27f2c0b6af8b897
This handles rolling the mergers and executors, but not yet
the schedulers.
Also, it does the executors in complete batches of 6, but could be
improved to stop 6 and then do each of the next as the first ones
complete.
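The complete-batch behaviour amounts to slicing the host list into
groups of six and finishing each group before starting the next. A
small sketch of that grouping follows; the helper and the executor
names are hypothetical and not taken from the playbook.

```python
def batches(hosts, size):
    """Yield consecutive groups of at most `size` hosts."""
    hosts = list(hosts)
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


# Hypothetical executor names, for illustration only.
executors = [f"ze{n:02d}" for n in range(1, 13)]
for group in batches(executors, 6):
    # Each group would be stopped, updated and restarted as a unit
    # before the next group begins.
    print(group)
```

The improvement mentioned above would instead keep six hosts in flight,
starting the next one as soon as any of the current six completes.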
Change-Id: I2dca104194c2f129b68dcef7721d7d08cb987c46
3.4.5 is a fairly minor update. Some bugs are fixed and jgit is updated.
3.4.5 release notes:
https://www.gerritcodereview.com/3.4.html#345
3.5.2 is a bigger update and importantly adds support for being able to
upgrade to 3.6.0 later. There is a new copy-approvals command that must
be run offline on 3.5.2 before upgrading to 3.6.0. This copies approvals
in the notedb in a way that 3.6.0 can handle them apparently. The
release notes indicate this may take some time to run. We don't need to
run it now though and instead need to make note of it when we prepare
for the 3.6.0 upgrade.
3.5.2 release notes:
https://www.gerritcodereview.com/3.5.html#352
For now don't overthink things and instead just get up to date with our
images.
Change-Id: I837c2cbb09e9a4ff934973f6fc115142d459ae0f
This appears to be a straightforward bug fix release according to the
release notes:
https://github.com/go-gitea/gitea/blob/v1.16.8/CHANGELOG.md
No template change between v1.16.7 and v1.16.8 according to git.
Change-Id: I0b9bb2f15beb7d3b1541c02e6e96601d25449e33