This was introduced with I19db98fcec5715c33b62c9c9ba5234fd55700fd8
opendev-infra-prod-setup-src is the abstract parent job; we should be
The current opendev-infra-prod-base job sets up the executor to log
into bridge AND copies in Zuul's checkout of system-config to
This presents an issue for parallel operation, as every production job
is cloning system-config on top of each other.
Since they all operate in the same buildset, we only need to clone
system-config from Zuul once, and then all jobs can share that repo.
This adds a new job "infra-prod-setup-src" which does this. It is a
dependency of the base job, so it should run first.
All other jobs now inherit from opendev-infra-prod-setup-keys, which
only sets up the executor for logging into bridge.
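A rough Python sketch of the resulting ordering: it models the buildset as a dependency graph and shows that infra-prod-setup-src, which performs the single clone, has to finish before the base job and everything downstream of it. The base job name infra-prod-base and the two service job names are assumptions for illustration, not taken from the actual Zuul configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key maps a job to the jobs it depends on.
# "infra-prod-setup-src" comes from the text above; "infra-prod-base" and
# the service job names are assumed for illustration.
JOB_DEPENDENCIES = {
    "infra-prod-setup-src": set(),                  # clones system-config once
    "infra-prod-base": {"infra-prod-setup-src"},    # depends on the clone
    "infra-prod-service-a": {"infra-prod-base"},
    "infra-prod-service-b": {"infra-prod-base"},
}


def run_order(deps):
    """Return one valid execution order for the buildset's jobs."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(run_order(JOB_DEPENDENCIES))
```

Every later job reuses the repository prepared by the first one, so no two jobs race to clone system-config into the same place.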
Having two groups here was confusing. We seem to use the review group
for most Ansible stuff, so we prefer that one. We move the contents of the
gerrit group_vars into the review group_vars and then clean up the use
of the old group vars file.
Previously we had set up the test gerrit instance to use the same
hostname as production: review02.opendev.org. This causes some confusion
as we have to override settings specifically for testing, like a reduced
heap size, but then also copy settings from the prod host vars because we
override the host vars entirely. Using a new hostname allows us to use a
different set of host vars with unique values reducing confusion.
Previously we had a test-specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit,
and we relied on the review02.opendev.org production host vars file to
set the values that are public.
Unfortunately, this meant we were using the production heapLimit value,
which is far too large for our test instances, leading to the occasional
error:
    There is insufficient memory for the Java Runtime Environment to continue.
    Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group vars file because the host vars
file overrides those values. To fix this we need to replace the test-specific
group vars contents with a test-specific host vars file instead.
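The override problem can be modelled as merging variable layers from lowest to highest precedence; in Ansible, inventory host_vars take precedence over inventory group_vars. This minimal Python sketch uses hypothetical variable names (heap_limit, gerrit_secret, public_setting) and a placeholder for the production value, which this change does not state.

```python
# Placeholder: the actual production heapLimit is not stated in this change.
PROD_HEAP = "<production heapLimit>"


def effective_vars(*layers):
    """Merge variable layers given from lowest to highest precedence.

    Simplified model: a key in a later layer replaces the same key from an
    earlier one, as host_vars do for group_vars in Ansible.
    """
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Old layout: test-only review group_vars plus the production host_vars file.
old_layout = effective_vars(
    {"heap_limit": "6g", "gerrit_secret": "junk"},  # group_vars (test only)
    {"heap_limit": PROD_HEAP},                      # review02.opendev.org host_vars
)

# New layout: shared review.yaml group_vars plus a test-only host_vars file.
new_layout = effective_vars(
    {"public_setting": "shared value"},             # group_vars/review.yaml
    {"heap_limit": "6g", "gerrit_secret": "junk"},  # test host's host_vars
)

print(old_layout["heap_limit"])  # the production placeholder wins
print(new_layout["heap_limit"])  # 6g
```

Moving the test values into their own host vars file puts them in the highest-precedence layer, so they are no longer silently replaced.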
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
This bumps the gerrit image up to our 3.3 image. Followup changes will
shift upgrade testing to test 3.3 to 3.4 upgrades, clean up no longer
needed 3.2 images, and start building 3.4 images.
Avoid running the letsencrypt job when other roles add handlers for
their certificates. We don't need to run this job explicitly in that
Co-Authored-By: Jeremy Stanley <firstname.lastname@example.org>
Currently we connect to the LE staging environment with acme.sh during
CI to get the DNS-01 tokens (but we never follow-through and actually
generate the certificate, as we have nowhere to publish the tokens).
We've known for a while that LE staging isn't really meant to be used
by CI like this, and recent instability has made the issue more pronounced.
This modifies the driver script to generate fake tokens, which are enough
to ensure all the DNS processing, etc., is happening correctly.
I have put this behind a flag, however, so that the letsencrypt job itself
still contacts LE staging. I think it is worth having this job actually call
acme.sh to validate this path; this shouldn't be required too often.
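For reference, a real DNS-01 TXT value is the unpadded base64url encoding of a SHA-256 digest (RFC 8555, section 8.4), so it is 43 characters long and lives under the "_acme-challenge." label. The Python sketch below shows one way to produce a fake value with that shape; the actual driver script isn't shown here and may work differently, and the function names are hypothetical.

```python
import base64
import secrets


def fake_dns01_value() -> str:
    """Return a random string shaped like a DNS-01 TXT record value.

    A real value is base64url(SHA-256(key authorization)) without padding.
    Nothing validates these fake values, so 32 random bytes stand in for
    the digest; the encoded result is 43 characters long.
    """
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def fake_challenge(domain: str) -> tuple[str, str]:
    """Return a hypothetical (record name, TXT value) pair for a domain."""
    return f"_acme-challenge.{domain}", fake_dns01_value()


if __name__ == "__main__":
    name, value = fake_challenge("example.opendev.org")
    print(name, value)
```

Because the value has the same length and alphabet as a real one, the downstream DNS handling is exercised without any request to LE staging.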
As of https://github.com/ansible/ansible/commit/724800c (and now
2.12.0b1), Ansible started requiring Python 3.8 or later on
controllers. Switch our representative bridge.openstack.org test
nodes to the ubuntu-focal label which has 3.8.10 as its default
python3 so we can determine whether it's safe to upgrade production
To do this we also update jinja-init to bullseye. Gitea seems to be
the only user of this image, so the impact of this should be fairly
self-contained to gitea.
Note this update isn't urgent, but it is good hygiene. We should coordinate
this update with the 1.15.x gitea upgrade and do them in such a sequence
that we can identify problems easily if they pop up.
This switches testing of lists.openstack.org to Focal and makes a CGI
env var update to accommodate newer mailman.
Specifically, newer mailman's CGI scripts filter the env vars they will
pass through. We were setting MAILMAN_SITE_DIR to vhost our mailman
installs with apache2, but that doesn't pass the filter and is removed.
HOST is passed through so we update our scripts, apache vhost configs,
exim, and init scripts to use the HOST env var instead.
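A simplified Python model of the filtering: only variables on an allowlist survive, so the vhost has to be selected from HOST. Apart from the facts stated above (HOST passes, MAILMAN_SITE_DIR does not), the allowlist contents, the site directory layout under /srv/mailman, and the function names are hypothetical.

```python
# Hypothetical allowlist: this change only establishes that HOST passes
# through newer mailman's CGI filter and MAILMAN_SITE_DIR does not.
ALLOWED_CGI_VARS = {"HOST", "PATH_INFO", "QUERY_STRING", "REQUEST_METHOD"}


def filter_cgi_env(env: dict) -> dict:
    """Drop every variable that is not on the allowlist."""
    return {key: value for key, value in env.items() if key in ALLOWED_CGI_VARS}


def site_dir_for(env: dict, base: str = "/srv/mailman") -> str:
    """Pick the per-vhost site directory from HOST (hypothetical layout)."""
    return f"{base}/{env['HOST']}"


apache_env = {
    "MAILMAN_SITE_DIR": "/srv/mailman/lists.openstack.org",  # old approach
    "HOST": "lists.openstack.org",                           # new approach
    "REQUEST_METHOD": "GET",
}

cgi_env = filter_cgi_env(apache_env)
print(sorted(cgi_env))  # MAILMAN_SITE_DIR has been filtered out
print(site_dir_for(cgi_env))
```

Deriving the site directory from HOST means the vhost selection survives the filter, which is why the scripts, vhost configs, exim, and init scripts all switch to it.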
This uses the opendev assets bundle image created with
The mount options require using BuildKit, hence the Dockerfile update.
Otherwise conceptually it's fairly simple; copy in the files from the
opendevorg/assets image rather than the file-system.
Move some common assets into a top-level assets/ directory. Services
can reference these assets via
in <img> tags, etc.
Some services want to embed these into their images, but we wish to
only keep one canonical copy. For this, add a Dockerfile and jobs
that create a simple bundle of assets in opendevorg/assets. This can
be referenced in other builds; the new BuildKit bind-mount is
particularly useful for this.
The Open Infrastructure Foundation's developers who maintain the
OpenStackID software are taking over management of the site itself,
and have deployed it on new servers. DNS records have already been
updated to the new IP address, so it's time to clean up our end in
preparation for deleting the old servers we've been running.
OpenStackID is still used by some services we run, like RefStack and
Zanata, and we're still hosting the OpenStackID Git repository and
documentation, so this does not get rid of all references to it.
We have a subdir in inventory called base that includes the shared
files that we don't have a good way to attribute to individual services.
Limit the file matchers to inventory/base so that we don't trigger
all of the services anytime a single service's host_vars changes.
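A small Python sketch of why narrowing the matcher helps, assuming a job runs when any changed file matches any of its file regexes. The patterns carry an explicit ^ anchor so the sketch doesn't depend on match-versus-search semantics, and the host_vars path below is a hypothetical example.

```python
import re


def job_triggered(file_patterns, changed_files):
    """Return True if any changed file matches any of the job's file regexes."""
    regexes = [re.compile(pattern) for pattern in file_patterns]
    return any(regex.search(path) for regex in regexes for path in changed_files)


BROAD = [r"^inventory/"]        # old style: any inventory change
NARROW = [r"^inventory/base/"]  # new style: shared base files only

# Hypothetical change touching a single service's host_vars.
service_change = ["inventory/service/host_vars/example01.opendev.org.yaml"]
base_change = ["inventory/base/hosts.yaml"]

print(job_triggered(BROAD, service_change))   # True
print(job_triggered(NARROW, service_change))  # False
print(job_triggered(NARROW, base_change))     # True
```

With the narrow pattern, a change to one service's host_vars only triggers that service's own jobs, while edits to the shared base files still trigger everything.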
This will double-check that we can run our Ansible against focal without
trouble. Once the production server is updated we can land this change
to reflect the server state.
We create a (currently test-only) playbook that upgrades gerrit. This job
then runs through project creation and renaming and testinfra testing on
the upgraded gerrit version.
Future improvements should consider loading state into the old gerrit
install before we upgrade, so that it can be asserted as well.