This used to be called "bridge", but was then renamed with
Ia7c8dd0e32b2c4aaa674061037be5ab66d9a3581 to install-ansible to be
clearer.
It is true that this is installing Ansible, but as part of our
reworking for parallel jobs this is also the synchronisation point
where we should be deploying the system-config code to run for the
buildset.
Thus naming this "bootstrap-bridge" should hopefully make it clearer
what's going on.
I've added a note to the job calling out its difference from the
infra-prod-service-bridge job to hopefully also avoid some of the
initial confusion.
Change-Id: I4db1c883f237de5986edb4dc4c64860390cc8e22
This playbook was renamed "install-ansible.yaml" with
Ia7c8dd0e32b2c4aaa674061037be5ab66d9a3581
We want all jobs to match on this file; it will make them run if we
update the Ansible version on the bastion host, bridge.
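As a rough illustration, the kind of file matcher each job picks up
might look like the following (the job name and extra matcher shown
are illustrative, not the exact list in the repo):

  # Hypothetical Zuul job sketch: re-run the job whenever the Ansible
  # install playbook for the bastion host changes.
  - job:
      name: infra-prod-service-example    # illustrative job name
      files:
        - playbooks/install-ansible.yaml
        - inventory/.*                    # illustrative extra matcher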
Change-Id: Id38fc39f8f6b4d8f532eb9796259e8f4bf18d861
This adds a keycloak server so we can start experimenting with it.
It's based on the docker-compose file Matthieu made for Zuul
(see https://review.opendev.org/819745)
We should be able to configure a realm and federate with openstackid
and other providers as described in the opendev auth spec. However,
I am unable to test federation with openstackid due to its inability to
configure an OAuth app at "localhost". Therefore, we will need an
actual deployed system to test it. This should allow us to do so.
It will also allow us to connect realms to the newly available
Zuul admin API on opendev.
It should be possible to configure the realm the way we want, then
export its configuration into a JSON file and then have our playbooks
or the docker-compose file import it. That would allow us to drive
changes to the configuration of the system through code review. Because
of the above limitation with openstackid, I think we should regard the
current implementation as experimental. Once we have a realm
configuration that we like (which we will create using the GUI), we
can choose either to continue to maintain the config with the GUI and
appropriate file backups, or switch to a gitops model based on an
export.
My understanding is that all the data (realm configuration and sessions)
is kept in an H2 database. This is probably sufficient for now, and even
for production use with Zuul, but we should probably switch to mariadb
before any heavy (e.g. Gerrit) production use.
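For reference, a minimal docker-compose sketch along these lines (the
image tag, port mapping, credentials and paths are illustrative, not
the values used in the actual deployment) might be:

  # Hypothetical single-node Keycloak using the embedded H2 database.
  version: '2'
  services:
    keycloak:
      image: quay.io/keycloak/keycloak:15.0.2   # illustrative tag
      environment:
        KEYCLOAK_USER: admin                    # illustrative bootstrap admin
        KEYCLOAK_PASSWORD: insecure-change-me
      ports:
        - "443:8443"
      volumes:
        # Persist the embedded H2 data across container restarts.
        - /var/keycloak/data:/opt/jboss/keycloak/standalone/data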
This is a partial implementation of https://docs.opendev.org/opendev/infra-specs/latest/specs/central-auth.html
We can re-deploy with a new domain when it exists.
Change-Id: I2e069b1b220dbd3e0a5754ac094c2b296c141753
Co-Authored-By: Matthieu Huin <mhuin@redhat.com>
Having two groups here was confusing. We seem to use the review group
for most Ansible work, so we prefer that one. We move the contents of the
gerrit group_vars into the review group_vars and then clean up the use
of the old group vars file.
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Previously we had set up the test gerrit instance to use the same
hostname as production: review02.opendev.org. This causes some confusion
as we have to override settings specifically for testing, like a reduced
heap size, but then also copy settings from the prod host vars as we
override the host vars entirely. Using a new hostname allows us to use a
different set of host vars with unique values, reducing confusion.
Change-Id: I4b95bbe1bde29228164a66f2d3b648062423e294
Previously we had a test-specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit;
we then relied on the review02.opendev.org production host vars file to
set values that are public.
Unfortunately, this meant we were using the production heapLimit value,
which is far too large for our test instances, leading to the occasional
failure:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group var file because the host var
file overrides those values. To fix this, we need to replace the
test-specific group var contents with a test-specific host var file instead.
To avoid repeating ourselves, we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things, we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
the replication config.
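A rough sketch of the resulting test-specific host vars (the variable
names and values are illustrative, not the exact ones in the repo):

  # Hypothetical host_vars file for the test Gerrit host.
  gerrit_heap_limit: 6g                                  # fits on CI test nodes
  gerrit_serverid: 00000000-0000-0000-0000-000000000000  # distinct from prod
  gerrit_replication: []                                 # no replication in tests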
Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
This shifts our Gerrit upgrade testing ahead to cover 3.3 to 3.4
upgrades, as we have upgraded to 3.3 at this point.
Change-Id: Ibb45113dd50f294a2692c65f19f63f83c96a3c11
Avoid running the letsencrypt job when other roles add handlers for
their certificates. We don't need to run this job explicitly in that
case.
Change-Id: Ic2e9b7fc81b73ecf7af197b83496e3589bb28bb0
Co-Authored-By: Jeremy Stanley <fungi@yuggoth.org>
Currently we connect to the LE staging environment with acme.sh during
CI to get the DNS-01 tokens (but we never follow through and actually
generate the certificate, as we have nowhere to publish the tokens).
We've known for a while that LE staging isn't really meant to be used
by CI like this, and recent instability has made the issue pronounced.
This modifies the driver script to generate fake tokens, which are
enough to ensure all the DNS processing, etc. is happening correctly.
I have put this behind a flag, however, so the letsencrypt job still
calls acme.sh. I think it is worth that job actually exercising this
path; it shouldn't be required too often.
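One way to express the switch is a boolean that the test group vars
enable and the letsencrypt job turns off again; the variable name below
is purely hypothetical:

  # Hypothetical flag: generate fake DNS-01 tokens instead of calling
  # acme.sh against the LE staging environment.
  letsencrypt_use_fake_tokens: true
  # The letsencrypt job would override this to false so the real
  # acme.sh path is still exercised there.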
Change-Id: I7c0b471a0661aa311aaa861fd2a0d47b07e45a72
As of https://github.com/ansible/ansible/commit/724800c (and now
2.12.0b1), Ansible started requiring Python 3.8 or later on
controllers. Switch our representative bridge.openstack.org test
nodes to the ubuntu-focal label, which has 3.8.10 as its default
python3, so we can determine whether it's safe to upgrade production
similarly.
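In Zuul terms the switch is roughly the following (the nodeset name is
illustrative; ubuntu-focal is the generic label):

  # Hypothetical nodeset sketch pinning the bridge test node to Focal.
  - nodeset:
      name: bridge-focal                # illustrative name
      nodes:
        - name: bridge.openstack.org
          label: ubuntu-focal           # ships Python 3.8.10 as python3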
Change-Id: Ie1dc4dfaaf08ab74bf59717610231855926e9d19
This switches testing of lists.openstack.org to Focal, and we make a CGI
env var update to accommodate newer mailman.
Specifically, newer mailman's CGI scripts filter the env vars that they
will pass through. We were setting MAILMAN_SITE_DIR to vhost our mailman
installs with apache2, but that doesn't pass the filter and is removed.
HOST is passed through, so we update our scripts, apache vhost configs,
exim, and init scripts to use the HOST env var instead.
Change-Id: I5c8c70c219669e37b7b75a61001a2b7f7bb0bb6c
This will double-check that we can run our Ansible against Focal without
trouble. Once the production server is updated, we can land this change
to reflect the server state.
Change-Id: I1a572ee13ea4c3fae38f84e5cc300a610efa94ae
We create a (currently test-only) playbook that upgrades Gerrit. This job
then runs through project creation and renaming and testinfra testing on
the upgraded Gerrit version.
Future improvements should consider loading state on the old Gerrit
install before we upgrade, so that it can be asserted as well.
Change-Id: I364037232cf0e6f3fa150f4dbb736ef27d1be3f8
If the hound service is shut down uncleanly (for example, the server
stops on us) it can leave behind lock files that stop processing. Clear
old lock files on start before indexing begins.
Also fix the job matching.
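A sketch of the cleanup as Ansible tasks (the data path, and doing this
as tasks rather than in the container start script, are illustrative
assumptions):

  # Hypothetical: remove stale hound lock files before indexing starts.
  - name: Find stale hound lock files
    find:
      paths: /var/lib/hound             # illustrative data directory
      patterns: '*.lock'
    register: hound_lock_files

  - name: Remove stale hound lock files
    file:
      path: "{{ item.path }}"
      state: absent
    loop: "{{ hound_lock_files.files }}"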
Change-Id: I6e6c57d7121702eb61124c3d4c4cdc7afd3e75c3
This runs the new matrix-eavesdrop bot on the eavesdrop server.
It will write logs out to the limnoria logs directory, which is mounted
inside the container.
Change-Id: I867eec692f63099b295a37a028ee096c24109a2e
It would be useful to test our rename playbook against gitea and gerrit
when we make changes to these related playbooks, roles, and docker
images. To do this, we need to converge our test and production setups
for gerrit a bit more. We create an openstack-project-creator account in
the test gerrit to match prod, and we have rename_repos.yaml talk to
localhost for gerrit ssh commands.
With that done, we can run the rename_repos.yaml playbook from
test-gitea.yaml and test-gerrit.yaml to help ensure the playbook
functions as expected against these services.
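In the test playbooks this amounts to importing the rename playbook once
the services are up, roughly (how the list of renames is supplied is not
shown and would be an assumption):

  # Hypothetical tail of test-gerrit.yaml / test-gitea.yaml:
  # run the rename playbook against the freshly deployed test services.
  - import_playbook: rename_repos.yaml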
Co-Authored-By: Ian Wienand <iwienand@redhat.com>
Change-Id: I49ffaf86828e87705da303f40ad4a86be030c709
This moves review02 out of the review-staging group and into the main
review group. At this point, review01.openstack.org is inactive, so we
can remove all references to openstack.org from the groups. We update
the system-config job to run against a focal production server, and
remove the unneeded rsync setup used to move data.
This additionally enables replication; it should be a no-op when
applied, since part of the transition process is to apply this manually
so that the DNS setup can pull zone changes from opendev.org.
It also switches to the mysql connector; as noted inline, we found some
issues with mariadb.
Note backups follow in a separate step to avoid doing too much at
once, hence dropping the backup group from the testing list.
Change-Id: I7ee3e3051ea8f3237fd5f6bf1dcc3e5996c16d10
The paste service needs an upgrade; since others have created a
lodgeit container, it seems worth us keeping the service going, if only
to maintain the historical corpus of pastes.
This adds the Ansible to deploy lodgeit and a sibling mariadb
container. I have imported a dump of the old data as a test. The
dump is ~4GB, and once imported it takes up about double that; certainly
nothing we need to be too concerned over. The server will be more
than capable of running the db container alongside the lodgeit
instance.
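A rough docker-compose sketch of the pairing (the image names, versions,
credentials and the lodgeit database setting are illustrative
assumptions, not the deployed configuration):

  # Hypothetical lodgeit + mariadb sibling containers.
  version: '2'
  services:
    mariadb:
      image: mariadb:10.4                    # illustrative version
      environment:
        MYSQL_ROOT_PASSWORD: insecure-change-me
        MYSQL_DATABASE: lodgeit
      volumes:
        - /var/lib/lodgeit/mariadb:/var/lib/mysql
    lodgeit:
      image: opendevorg/lodgeit:latest       # illustrative image name
      environment:
        # Hypothetical setting pointing lodgeit at the sibling database.
        LODGEIT_DBURI: mysql+pymysql://root:insecure-change-me@mariadb/lodgeit
      ports:
        - "80:5000"
      depends_on:
        - mariadb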
This should have no effect on production until we decide to switch
DNS.
Change-Id: I284864217aa49d664ddc3ebdc800383b2d7e00e3
This installs statusbot on eavesdrop01.opendev.org.
Otherwise it's just config translation and bringing up the daemon.
Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72
This installs our Limnoria/meetbot container and configures it on
eavesdrop01.opendev.org. I have ported the configuration from the old
puppet as best I can (it is very verbose); my procedure was to use the
Limnoria wizard to start a new config file and then backport everything
from the old file. I felt this was the best way not to miss any new options.
This does channel logging (via the built-in ChannelLogger plugin, along
with a cron job for logs2html) and runs our fork of meetbot.
It exports the channel logs via HTTP to /irclogs and meeting logs to
/meetings. meetings.opendev.org will proxy to these two locations
when the server is active.
Note this has not ported the channel list, so the bot will not be
listening in our channels.
Change-Id: I9f9a466c271e1a706f9f98f816de0e84047519f1
This moves these services to eavesdrop01.opendev.org, a new
Focal-based server to host IRC services.
We have stopped running puppet on eavesdrop01.openstack.org, so there
is nothing left for it to do (note the server is still running
meetbot/ptgbot). Remove the commented out puppet run, and remove the
server from puppet groups. Update the host in the Zuul jobs to the
new node.
Change-Id: I809f9af3e78f566362142790f6c79654ef5b8959
When the testinfra tests for listservs change, we should run the
system-config-run-lists job. Make it so with file matchers.
Change-Id: I64402345ceaab2056d8ed1b3a579063de0b29c9b
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime, ARA upstream has moved to GitHub, so this updates the
references for the -devel job.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
Nodepool doesn't use any puppetry anymore. We don't need to test changes
to puppet against nodepool.
Change-Id: I0ec824e21c17b074dd82eb298de6d196802aac28
This reduces the scope of our puppet-related testing to things that
continue to use puppet. This is probably not strictly necessary, but it
helps keep us up to date with our TODO list.
Change-Id: I52bfff09ad0ddeabe7ad151bcf88c912f86a76ec
This zuul02 instance will replace zuul01. There are a few items to
coordinate when doing an actual switch, so we haven't removed zuul01 from
the inventory here. In particular, we need to update the gearman server
config values in the zuul cluster, and we need to save queues, shut down
zuul01, then start zuul02's scheduler and restore the queues there.
I believe landing this change is safe, as we don't appear to start zuul
on new instances by default. Reviewers should double-check this.
Depends-On: https://review.opendev.org/c/opendev/zone-opendev.org/+/791039
Change-Id: I524b456e494124d8293fbe8e1468de40f3800772
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
Followups will further cleanup the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
These files are involved in creating gerrit docker images; make sure
we trigger jobs when they are modified.
Change-Id: I7c4436e066cfb0c2d0b2ca7adf54c99b09dac95f
We will be rotating zk01-03.openstack.org out and replacing them with
zk04-06.opendev.org. This is the first change in that process which puts
zk04 into the rotation. This should only be landed when operators are
ready to manually stop zookeeper on zk03 (which is being replaced by
zk04 in this change).
Change-Id: Iea69130f6b3b2c8e54e3938c60e4a3295601c46f