According to the docs [0] this shouldn't be necessary as performance
logging only happens if a performance tracing plugin is installed.
However, according to this repo-discuss thread [1], there is always a
dummy performanceLogging instance installed. This same thread identifies
this as a likely source of the large increase in memory utilization by
Gerrit when upgrading to 3.5.
Let's explicitly disable this tracing due to the memory overhead in prep
for our 3.5 upgrade. We can always flip the setting if we install a
performance tracing plugin in our Gerrit.
[0] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#tracing
[1] https://groups.google.com/g/repo-discuss/c/QUD7_LsEVks/m/kBDEeam4AgAJ
Change-Id: Iff438695aa6488fb5886120121946494b1edf003
Because we proxy to Gerrit and set listenUrl with a proxy-http:// prefix,
httpd.requestLog is disabled by default. We choose to explicitly enable
it here to add more logging to the Gerrit system even if this logging is
slightly less useful when behind a proxy. In particular this logging
will track memory utilization per request which we can use to benchmark
change query memory cost between 3.4 and 3.5.
Change-Id: Ia3ccf820ee0e5ca7d68bcc37da7004dea2ad7128
As an overall clean-up effort to free space on our mirror volumes,
stop mirroring source packages for the debian package repository.
According to codesearch, the configure-mirrors role in
zuul/zuul-jobs adds deb-src lines for it, so we need to wait on
merging this until a change merges to switch its default behavior or
we override the default behavior in opendev/base-jobs. It also won't
take effect immediately, and will require a manual step to prune the
old packages from the filesystem once this configuration update has
deployed.
Change-Id: Ia5777770e178385b7326c97728c875f302c1e1f0
Depends-On: https://review.opendev.org/839593
As an overall clean-up effort to free space on our mirror volumes,
stop mirroring source packages for the ubuntu package repository.
According to codesearch, no jobs in our system add deb-src lines for
it so this should be entirely safe. It also won't take effect
immediately, and will require a manual step to prune the old
packages from the filesystem once this configuration update has
deployed.
Change-Id: Ie2aa6bdf43721c9ceacea54f7f5bc928883287a7
As an overall clean-up effort to free space on our mirror volumes,
stop mirroring source packages for the various debian-docker-*
repositories. Note that this should be a no-op as those repositories
don't appear to provide source packages anyway.
Change-Id: I51c723782c7aa08dc29c839b8e46c2dd6d8c8d41
This adds package mirroring for Jammy Docker packages. One of the (I'm
sure) many steps to spin up Jammy in our CI environment.
Change-Id: I2c3fe6fed07ac36aa65b6f060012705ff940f69a
Our CI jobs generally don't make use of source packages, and because
pulling them requires a separate line in sources.list anyway, any
jobs which want them can add an official mirror source easily.
Remove "source" from the configured architectures list in ubuntu-ports
initially, since ARM jobs are even less likely to be impacted by
this due to their scant number and limited frequency. This will
allow us to evaluate the fallout and determine whether manual
cleanup is needed afterward.
Change-Id: Ib647b6733529d965fa9caa92ad4d123639510083
We're trying to start mirroring Ubuntu 22.04 LTS packages, and their
indices are now signed with the 2018 Archive Signing Key which we
haven't yet imported. Add it.
Change-Id: I88cabf8a703ef0086e58b8f6cd65bf54321f7998
This update brings a number of bugfixes. The most visible to us is that it
should fix partial clones for older clients, which means we can re-enable
partial cloning. Note that we have tests for partial clones which
should detect whether this is working for us.
There are no template diffs for the three template files we override
between 1.16.5 and 1.16.6. The release notes can be found at:
https://github.com/go-gitea/gitea/blob/v1.16.6/CHANGELOG.md#1166---2022-04-20
Change-Id: Ie5b6a3bcd73f135fe8914b55896a3a428a41dccc
This started as an effort to clean up our mirroring of stretch packages.
It appears that the stretch content in our mirrors isn't actively
mirrored at this point and needs to be manually cleaned up. But while I
was looking at the config management I noticed that we still configure
jessie and stretch gpg keys which I don't think are necessary either.
Remove them.
Note the key fingerprint values were swapped around in our _keys dicts,
but the file content appears to have been correct for each of the keys.
We swap the key fingerprints around so that everything makes sense
again.
To confirm the key file content is correct:
$ gpg --show-keys playbooks/roles/reprepro/files/keys/debian-bullseye-security.asc
gpg: directory '/home/clark/.gnupg' created
gpg: keybox '/home/clark/.gnupg/pubring.kbx' created
pub rsa4096 2021-01-17 [SC] [expires: 2029-01-15]
AC530D520F2F3269F5E98313A48449044AAD5C5D
uid Debian Security Archive Automatic Signing Key (11/bullseye) <ftpmaster@debian.org>
sub rsa4096 2021-01-17 [S] [expires: 2029-01-15]
$ gpg --show-keys playbooks/roles/reprepro/files/keys/debian-bullseye.asc
pub rsa4096 2021-01-17 [SC] [expires: 2029-01-15]
1F89983E0081FDE018F3CC9673A4F27B8DD47936
uid Debian Archive Automatic Signing Key (11/bullseye) <ftpmaster@debian.org>
sub rsa4096 2021-01-17 [S] [expires: 2029-01-15]
And compare to the values published by debian:
https://ftp-master.debian.org/keys.html
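A minimal Python sketch of the comparison. It assumes the configuration can be viewed as a mapping from key file name to fingerprint; that dict layout is hypothetical, not the role's actual _keys structure. The expected values are the fingerprints printed by `gpg --show-keys` above. A swapped mapping like the one we had flags both files.

```python
# Fingerprints as printed by `gpg --show-keys` for each key file.
EXPECTED = {
    "debian-bullseye-security.asc": "AC530D520F2F3269F5E98313A48449044AAD5C5D",
    "debian-bullseye.asc": "1F89983E0081FDE018F3CC9673A4F27B8DD47936",
}


def normalize(fpr: str) -> str:
    """Drop whitespace and upper-case so spaced or lower-case forms compare equal."""
    return "".join(fpr.split()).upper()


def mismatches(configured: dict[str, str]) -> list[str]:
    """Return key files whose configured fingerprint differs from EXPECTED."""
    return sorted(
        name
        for name, fpr in configured.items()
        if normalize(fpr) != normalize(EXPECTED.get(name, ""))
    )
```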
Change-Id: If5d710051b03024512667a8cb9498320b88f5b33
We build our own images from scratch using yum/dnf and don't rely on the
isolinux/ or images/ content. Clean this up from our mirrors to reduce
their size and time to sync.
Change-Id: Ifa47c8ae1f3a080e69e4c00b0c2ef38f4094e1a2
This removes debug and non-oss packages from our opensuse mirrors. These
shouldn't be needed by any jobs so we don't need to mirror them. The
goal here is to reduce overall disk consumption where possible.
Change-Id: Ic48f421e8e7cf7bae039b68b8ed3bc2a5c8e278b
We don't have CentOS 7 arm64 test instances so we remove the aarch64
packages for CentOS 7. Additionally we remove unneeded source and
alternative arch packages from EPEL for CentOS 7 and 8.
Change-Id: I8b5c6906c2ce698a02b794e635e8f8ba613c0fef
We indicated to the OpenStack TC that this service would be going away
after the Yoga cycle if no one stepped up to start maintaining it. That
help didn't arrive in the form of OpenDev assistance (there is an effort
to use OpenSearch external to OpenDev) and Yoga has released. This means
we are now clear to retire and shut down this service.
This change attempts to remove our configuration management for these
services so that we can shut down the servers afterwards. It was a good
run. Sad to see it go but it wasn't sustainable anymore.
Note a follow-up will clean up elastic-recheck which runs on the status
server.
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/837619
Change-Id: I5f7f73affe7b97c74680d182e68eb4bfebbe23e1
Git was recently updated to fix a security issue that prevents git from
operating on a repo as userA if the repo is owned by userB. In our gitea
tests we use our local zuul repo clone of system-config to push back
into gitea to get some real content into gitea. We were operating as
root but zuul owns that repo. Update the commands to run as zuul to
work around the git error.
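To illustrate the failure mode, here is a minimal Python sketch of the kind of ownership comparison git now performs. It is an approximation, not git's actual logic: the function name is hypothetical, and real git also honours the safe.directory setting. Run as root against a clone owned by zuul, this check would return False, which is the situation git now refuses to operate in.

```python
import os
import tempfile


def owned_by_current_user(repo_path: str) -> bool:
    """Simplified stand-in for git's repository ownership check.

    Returns True when the directory is owned by our effective uid. Real git
    also consults safe.directory and handles more cases. POSIX-only, since
    it relies on os.geteuid().
    """
    return os.stat(repo_path).st_uid == os.geteuid()


if __name__ == "__main__":
    # A freshly created temporary directory is owned by us, so this passes.
    with tempfile.TemporaryDirectory() as repo:
        print(owned_by_current_user(repo))
```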
Change-Id: I87105bae4bdd69465cce4d5bc412241dc1c88623
Put the excludes in the order seen at a URL like
http://mirrors.us.kernel.org/fedora/releases/35/
Remove "Docker" and "CloudImages", which no longer exist, and add
"Kinoite" and "Silverblue", which we don't use.
Change-Id: I07711cb83f88844cd036b81a198908498b380733
This section runs as root, but the system-config repo is cloned as the
zuul user. This causes a problem for tox: when it installs, it calls out
to git, which now refuses to operate in that directory [1].
[1] 8959555cee
Change-Id: I5c67208c025d29435dcc40c5eeb3b3aa8e5c4d5d
Our current mirror is out of sync; this has happened twice now. It's
been several years since we switched away from kernel.org mirrors
(Ie6050fbf20259df39eab11f8e326d58351d98ea5) so it might be worth
trying again (they are currently in sync). Also we are using the
".us" version here, to keep things more local to the update server.
Change-Id: I29cf3895e7fa8e401a1d686119e56b4fddb0cc47
At the moment, the "ensure-docker" role is broken on arm64 because
we're not mirroring the Docker upstream repositories for that
architecture.
This patch adds "arm64" to the list of architectures for Focal, but
it purposely ignores Xenial and Bionic as those are older distros
that are likely not being used on arm64 (or someone would have noticed
it was broken!)
Change-Id: I33d62dc13fa786c15b352c012f9798348d09b8b0
The openstack/tap-as-a-service was evicted from the openstack/ Git
namespace along with all other unofficial projects during the great
schism. More recently, the project was adopted as part of OpenStack
Neutron and renamed back into the openstack/ namespace, but we didn't notice
there was a lingering redirect for their existing tarballs. Remove
that redirect now that it shadows the official location.
Change-Id: If4be90ddc246183a1843fc9bb480bdc6d00c18e3
Remove the (r) which is completely impossible to see when scaled down
to an icon, and just makes it look like the icon is off-centered with
a red squish. Make the background transparent too, so it looks better
in dark mode.
Change-Id: I29a0e320ec15656679f3706ad8b16aa908568528
The playbooks/roles/gitea-set-org-logos/files directory potentially
contains files which are not covered by any open source copyright
licenses, so add a document clarifying this.
Change-Id: I55cbc9c768d0c3c467a647aafbc82ece7cae989e
Over a few upgrades, we've managed to break some of the default avatar
logos you see when browsing code on opendev.org.
After investigating ways to fix this up, we established that there
isn't an exposed API for setting these, but we can do a simple query
to point to logo files on disk. This implements that.
One caveat is that the logos should be PNG files; particularly, we
note that SVG files don't work reliably because they don't get served
with the image/svg+xml mime-type.
Change-Id: Ie6799de2fb27e09f936c488258dc1bd1c638c370
Gitea 1.16 enabled clone filters by default. Unfortunately pip passes
--filter=blob:none when fetching git resources and the new gitea support
for filters breaks against that filter. We are working around this by
restoring the 1.15 behavior of not supporting filters, and this change
tests that the behavior is as expected.
Change-Id: I13d57e3cc7e135058ff320b3bd9bea76fb178064
Gitea 1.16 added partial clone support, but the clone filters pip
tries to apply (--filter=blob:none) don't work well when combined
with older cgit clients and lead to errors like "Server does not
allow request for unadvertised object" or "protocol error: bad pack
header".
Explicitly disable this feature server-side for now, so that clients
will fall back to making full clones.
Change-Id: Ia86394d5176c28567bf67b60578aadde6629c775
Depends-On: https://review.opendev.org/834196
None of the services we operate rely on openstackid.org any longer,
so we can drop our monitoring of its cert expiration safely (which
is currently complaining). We're already monitoring its successor,
id.openinfra.dev.
Change-Id: I059ef0492f05137fa542c819b64427bd9ef0eb0c
The openEuler yum mirror in Russia is down. This patch changes the
rsync URL to the official Hong Kong one.
This patch also fixes a nit in the openEuler mirror URL.
Change-Id: Ifb930e34fd7f16f77ba55bc489e5389c641139de
Noticed this randomly from cron mail and unattended-upgrades. These are
vmware guest utilities. We don't run inside of vmware. We do not need
this installed.
Change-Id: Ieb2c7601c59f56d78fa350af7e0484c1cb6b8e9b
We thought byobu was removed, but it is sneaky and its file was
eventually renamed for some reason. Make sure both versions of the file
are absent.
Change-Id: I0cef293732b02228433dca5b4aa648d550ae5254
Because "." is a field separator for graphite, we're incorrectly
nesting the results.
A better idea seems to be to store these stats under the job name.
That's going to be more helpful when looking up in Zuul build results
anyway.
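A minimal Python sketch of the nesting problem, assuming hypothetical metric names: the prefix "bridge.ansible" and the playbook-style name "service-zuul.yaml" are illustrative, not the actual keys we send. Graphite splits on every ".", so a dotted name spawns an extra level. The change avoids this by keying on the job name, which (assuming it contains no dots) stays a single node; replacing dots is shown only as a generic alternative for comparison.

```python
def graphite_nodes(path: str) -> list[str]:
    """Split a metric path the way Graphite does: every '.' starts a new level."""
    return path.split(".")


def safe_metric(prefix: str, name: str) -> str:
    """Append `name` as a single node by replacing any dots in it with '_'."""
    return f"{prefix}.{name.replace('.', '_')}"
```

For example, `graphite_nodes("bridge.ansible.service-zuul.yaml")` yields four levels, with "yaml" nested under "service-zuul", while a dot-free job name passes through `safe_metric` unchanged.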
Follow-on to I90dfb7a25cb5ab08403c89ef59ea21972cf2aae2
Change-Id: Icbb57fd23d8b90f52bc7a0ea5fa80f389ab3892e
We used to track the runtime with the old cron-based system
(I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2) and had a dashboard view,
which was often helpful to see at a glance what might be going wrong.
Restore this for Zuul CD by simply sending the nested-Ansible task
time-delta and status to graphite. bridge.openstack.org is still
allowed to send stats to graphite from this prior work, so no ports
need to be opened.
Change-Id: I90dfb7a25cb5ab08403c89ef59ea21972cf2aae2
As found in Ie5d55b2a2d96a78b34d23cc6fbac62900a23fc37, the default for
this is to issue "OPTIONS /" which is kind of a weird request. The
Zuul hosts currently seem to return the main page content in response
to an OPTIONS request, which probably isn't right.
Make this more robust by just using a "HEAD /" request.
Change-Id: Ibbd32ae744af9c33aedd087a8146195844814b3f
Apparently the check-ssl option only modifies check behavior, but
does not actually turn it on. The check option also needs to be set
in order to activate checks of the server. See §5.2 of the haproxy
docs for details:
https://git.haproxy.org/?p=haproxy-2.5.git;a=blob;f=doc/configuration.txt;h=e3949d1eebe171920c451b4cad1d5fcd07d0bfb5;hb=HEAD#l14396
Turn it on for all of our balance_zuul_https server entries.
Also set this on the gitea01 server entry in balance_git_https, so
we can make sure it's still seen as "up" once this change takes
effect. A follow-up change will turn it on for the other
balance_git_https servers out of an abundance of caution around that
service.
Change-Id: I4018507f6e0ee1b5c30139de301e09b3ec6fc494
Switch the port 80 and 443 endpoints over to doing http checks instead
of tcp checks. This ensures that both apache and the zuul-web backend
are functional before balancing to them.
The fingergw remains a tcp check.
Change-Id: Iabe2d7822c9ef7e4514b9a0eb627f15b93ad48e2
Change I5b9f9dd53eb896bb542652e8175c570877842584 introduced this tee
to capture and encrypt the logs. However, we should make sure to fail
if the ansible runs fail. Switch on pipefail, which will exit with an
error if the earlier parts of the pipeline fail. Also make sure we
run under bash.
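For reference, the exit status rule being relied on can be modelled in a few lines of Python. This is a sketch of bash's documented pipeline semantics, not code from the change, and the function name is hypothetical. With `ansible-playbook ... | tee`, a playbook failure (say, exit 2) is hidden unless pipefail is set, because `tee` itself exits 0.

```python
def pipeline_status(statuses: list[int], pipefail: bool) -> int:
    """Model the exit status bash reports for a pipeline.

    statuses: exit codes of each command in the pipeline, left to right.
    Without pipefail, bash returns the last command's status. With pipefail,
    it returns the status of the rightmost command that failed, or 0 if all
    commands succeeded.
    """
    if not pipefail:
        return statuses[-1]
    for status in reversed(statuses):
        if status != 0:
            return status
    return 0
```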
Change-Id: I2c4cb9aec3d4f8bb5bb93e2d2c20168dc64e78cb
We hold off on landing this until we're sure all the cleanup is done
with this cloud provider. Removing it with everything else would make
cleanup more difficult.
Change-Id: I905a37b430cb313b10a239d7d1b843404af06403
We've been told these resources are going away. We're trying to remove
them gracefully from nodepool. Once that is done we can remove our configs
here.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/831398
Change-Id: I396ca49ab33c09622dd398012528fe7172c39fe8
The dependent change enables the "detect-ref" option of hound, which
looks at the remote origin HEAD and indexes on that. That should
allow indexing of our repositories, which use a mix of "master" and
"main" default branches.
Add cirros to the test, which should exercise this path, and take some
screenshots because this is a JS/React app and a plain "curl" doesn't
help.
Change-Id: I1850577c63566b594f9730f5b8f0bc10b07ff7e4
Depends-On: https://review.opendev.org/c/opendev/jeepyb/+/830919
These were added when we faced significant memory pressure on the old
server. That is no longer a problem and there is an issue with the
specification that breaks file compression due to destination files
already existing. It seems the log specification is only able to
rotate once; after that it cannot keep moving files aside because they
already exist as e.g. jvm_gc.log.0.gz. This results in annoying errors in
the Gerrit error_log.
Note that it doesn't appear sufficient to remove this log specification;
we also need to move the existing jvm_gc.log* files aside or delete
them. This was tested on a held Zuul node: I stopped Gerrit, updated
the docker-compose file, moved the files aside, then started Gerrit, and
that got rid of the startup errors in error_log. Merely updating
docker-compose resulted in the same errors on startup.
Change-Id: Ied1464c57b2e8331b9bdf7cbc9ad74f92dea2dfd
We have validated that the log encryption/export path is working, so
turn it on for all prod jobs.
Change-Id: Ic04d5b6e716dffedc925cb799e3630027183d890
- The extra "/" in the URL makes the download fail; remove it
- The old download python script would output the root on the first
line, then relative urls -- hence the loop was starting from 1.
This should be 0 here, as we just output the raw urls.
- Fix typo in build uuid output
Change-Id: I8ff2a38b3117ddcb0d197fe39f2c168b35ab372b
I didn't consider permissions on the production machine; since we run
Ansible as root the extant path can't access the logs.
By copying the logfile to encrypt to a staging area we can leave
everything else alone for now. Upon reflection it seems like a better
idea to do this in an ephemeral location anyway and not leave anything
behind. We move the cleanup into an always block too to ensure this.
Bump the codesearch playbook to trigger the prod job with these
changes.
Change-Id: I47f63df04d58b7a87bce445da0c0bdcb80edc8f9
This is just for testing a production run for the log publishing from
I9bd4ed0880596968000b1f153c31df849cd7fa8d
Change-Id: Iabc17e0b61c2a756cce16ee0b06bea86d2cb1cde
This fails if the variable isn't defined; because we limited
I9bd4ed0880596968000b1f153c31df849cd7fa8d to just one job to start,
the others fail with a missing definition.
Change-Id: I74b31f51494e7264e2a68f333943b143842f9a99
Once restarted onto the parent change, our Gerrit deployment will no
longer link to Gitiles representations of changes or the Git tree.
Explicitly deny access to the Gitiles URL base path in the Apache
vhost config, since we can't effectively remove the plugin itself.
This should help prevent search engines from finding its copies of
our projects rather than the ones we want people to use in Gitea.
Change-Id: I3c96221256662443f7a43344afd12194dce82b9d
This is a reimplementation of earlier change
I8efefe365f3b9ebe97c8c2ce322fa8c6f3b70b3a to link out to Gitea
instead of Gerrit's local Gitiles plugin. This should reduce the
complexity of what we're hosting on the Gerrit server, while at the
same time being less confusing for search engines and users. Configure
the Gitiles plugin to no longer take over Gerrit weblinks, and a
followup change will block access to its URL base path entirely.
Change-Id: I7e194fe5c907b39d53fd0663e06cbfd33a3ae410
Gerrit 3.4.0 stopped generating the is:mergeable predicate by default [0]
(see change.mergeabilityComputationBehavior [1]), but it seems to be
rather helpful for some reviewers. The computational
load caused by this is O(N^2) where N depends on the number of changes
open against a branch and their respective size. Since most of the
changes we process are rather small and also we didn't see a significant
reduction in load when we moved to 3.4, this isn't expected to be an
issue in our installation.
[0] https://www.gerritcodereview.com/3.4.html
[1] https://gerrit-documentation.storage.googleapis.com/Documentation/3.4.0/config-gerrit.html#change.mergeabilityComputationBehavior
Signed-off-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: I9fce11b454255818e4a5817affed5b6e9c19f521
Without this our config changes are not applying to the running service
until something else reloads or restarts the service.
Change-Id: I4df229d1c42f06159a4b320d4b6a07c5239ca111
Based on the changes in I5b9f9dd53eb896bb542652e8175c570877842584,
enable returning encrypted log artifacts for the codesearch production
job, as an initial test.
Change-Id: I9bd4ed0880596968000b1f153c31df849cd7fa8d
Our production jobs currently only put their logging locally on the
bastion host. This means that to help maintain a production system,
you effectively need full access to the bastion host to debug any
misbehaviour.
We've long discussed publishing these Ansible runs as public logs, or
via a reporting system (ARA, etc.) but, despite our best efforts at
no_log and similar, we are not 100% sure that secret values will not
leak.
This is the infrastructure for an in-between solution, where we
publish the production run logs encrypted to specific GPG public keys.
Here we are capturing and encrypting the logs of the
system-config-run-* jobs, and providing a small download script to
automatically grab and decrypt the log files. Obviously this is
just to exercise the encryption/log-download path for these jobs, as
the logs are public.
Once this has landed, I will propose similar for the production jobs
(because these are post-pipeline this takes a bit more fiddling and
doesn't run in CI). The variables will be set up in such a way that if
someone wishes to help maintain a production system, they can add
their public key and then add themselves to the particular
infra-prod-* job they wish to view the logs for.
It is planned that the extant operators will be in the default list;
however this is still useful over the status quo -- instead of having
to search through the log history on the bastion host when debugging a
failed run, they can simply view the logs from the failing build in
Zuul directly.
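A minimal Python sketch of the encryption step, showing how one log file can be encrypted to several operators' public keys at once. The helper name and file names are hypothetical; the real implementation lives in the zuul-jobs role this change depends on. Passing `--trust-model always` is an assumption: it skips web-of-trust checks for keys imported only for this run.

```python
def gpg_encrypt_argv(logfile: str, recipients: list[str]) -> list[str]:
    """Build a gpg command line that encrypts `logfile` to every recipient.

    Any one of the recipients' private keys can decrypt the result.
    """
    if not recipients:
        raise ValueError("need at least one public key to encrypt to")
    argv = ["gpg", "--batch", "--yes", "--trust-model", "always", "--encrypt"]
    for key in recipients:
        argv += ["--recipient", key]
    # Write <logfile>.gpg next to the plaintext input.
    argv += ["--output", f"{logfile}.gpg", logfile]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)` so that a gpg failure fails the job instead of silently publishing nothing.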
Depends-On: https://review.opendev.org/c/zuul/zuul-jobs/+/828818/
Change-Id: I5b9f9dd53eb896bb542652e8175c570877842584
Previously we were only checking that Apache can open TCP connections to
determine if Gitea is up or down on a backend. This is insufficient
because Gitea itself may be down while Apache is up. In this situation
a TCP connection to Apache will succeed, but if we make an HTTP request
we should get back an error.
To check if both Apache and Gitea are working properly we switch to
using http checks instead. Then if Gitea is down Apache can return a 500
and the Gitea backend will be removed from the pool. Similarly, if Apache
is non-functional the check will fail to connect via TCP.
Note we don't verify SSL certs for simplicity, as checking these in
testing is not straightforward. We didn't have verification with the old
tcp checks so this isn't a regression, but it does represent something we
could try to improve in the future.
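A minimal Python sketch of what an http check decides, assuming haproxy's default rule that only 2xx and 3xx responses count as healthy. The function names are hypothetical, and the real check is configured in haproxy rather than written in Python. The request method is a parameter because haproxy's default is "OPTIONS /", while a later change moves the Zuul checks to "HEAD /". The sketch uses plain HTTP for simplicity.

```python
import http.client


def backend_is_up(status: int) -> bool:
    """Assumed default httpchk rule: 2xx and 3xx pass, anything else fails."""
    return 200 <= status < 400


def probe(host: str, port: int, method: str = "HEAD", timeout: float = 5.0) -> bool:
    """Send `<method> /` and classify the reply.

    A Gitea outage shows up as an Apache 5xx (down). An Apache outage shows
    up as a connection error, which is also treated as down.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(method, "/")
        return backend_is_up(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```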
Change-Id: Id47a1f9028c7575e8fbbd10fabfc9730095cb541
We have removed Fedora 34 from our CI system. Fedora has a short
lifetime for each release, and in this particular case Fedora 34 did not
boot reliably in all our clouds. Now that the images have been removed
we can remove the release from our mirrors too.
Change-Id: I07dfca0ef88dc9531e4cb2c67ebbca5e0503594e
Currently `gerrit show-queue -w -q` indicates we are only using 2
threads for service users and have 100 available for interactive users.
Unfortunately we really have three current classes of users: Humans,
Zuul, and everything else. We can't currently separate these into
different pools so instead we'll try using a single large pool and
sharing. To do that we set batchThreads to the special value of 0.
Change-Id: I08681a6b88683355ea5780ac452de903c8c8a7a3
Gerrit replicates to gitea via ssh, but our current testing only checks
that we can push over https. Test that pushing over ssh works properly
as this is what we actually need to work in production.
Change-Id: I0f7764a6d07e7d413a5b07a7f3ba8a9be7b4f0e3
The default for existing Gerrit installations is that usernames remain
case sensitive when upgrading to 3.5. However new 3.5 installations will
be case insensitive by default. Since we have a long history of
usernames where switching to insensitive is not possible, we force
usernames to be case sensitive regardless of whether or not this is a new
install or an existing one.
A major reason for this is it means our system-config-run-review-3.5
jobs which install a new Gerrit 3.5 installation will match the
production behavior which will be upgraded from 3.4. Without this we may
get disparate behavior.
Note Gerrit 3.4 should ignore the setting and it should be safe to set
this on 3.4 as well as 3.5.
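To see why switching an existing install to case-insensitive usernames can be impossible, here is a small Python sketch that finds accounts which would collide once case is ignored. The helper is hypothetical and only illustrative. It does not touch Gerrit or its configuration.

```python
from collections import defaultdict


def case_collisions(usernames: list[str]) -> dict[str, list[str]]:
    """Group usernames that become identical when compared case-insensitively."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in usernames:
        groups[name.casefold()].append(name)
    # Only groups with more than one member are actual collisions.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

Any non-empty result means two distinct accounts would map to the same case-insensitive username.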
Change-Id: Ie4880bf580496a763cf042570bf8b9ff852ffb0e
This reenables Gerrit upgrade testing but tests the 3.4 to 3.5 upgrade
now. Note this may need some work to get happy once we have 3.5 images
which is why we've split it out into a separate change.
Change-Id: Ibbbd3f98ac2df8d99d4ffda57df59f4a47da3cd3
In CS9 the layout of the repos has changed and the SIGs repos are in a
separate directory under centos-stream [1] so we need to add a new
rsync command. These repos include messaging for rabbitmq-server, nfv
for openvswitch and storage for ceph.
[1] http://mirror.stream.centos.org/
Change-Id: I90890aade7ad5f42e15c4c171ed2c2545f2310c4
As of pip 22.0, its HTML parser no longer accepts any page which
doesn't start with the string "<!DOCTYPE html>" and, unfortunately,
Apache's mod_autoindex declares a very specific HTML 3.2 doctype
instead, causing pip to break any time our wheel cache is added to
its indices. The main index we generate has been updated with
https://review.opendev.org/826969 but we need this change to address
Apache's dynamically generated file lists for that site.
Configure Apache to supply a custom header file for file indices
within the /wheel/ subtree of our mirror vhosts, and alias it from
outside the docroot in order to reduce clutter of the top-level
directory index. Also instruct mod_autoindex to omit its own
document preamble which would otherwise include the original doctype
declaration. Note that this omits the header title and H1 level
headings from the resulting pages, but as these are only meant for
machine parsing anyway and not humans, it's a compromise to keep the
solution as simple and straightforward as possible.
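The failure can be approximated with a short Python check. This is a sketch under assumptions: the function name is hypothetical, and the case-insensitive prefix match after leading whitespace may differ in detail from pip's own parser. The HTML 3.2 doctype in the test mirrors the kind mod_autoindex emits, which is why its pages were rejected until the preamble was replaced.

```python
def starts_with_html5_doctype(page: str) -> bool:
    """Approximate pip 22.0's requirement that a page begin with <!DOCTYPE html>.

    Assumption: leading whitespace is ignored and the match is case-insensitive.
    """
    return page.lstrip().lower().startswith("<!doctype html>")
```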
Change-Id: Id71174954b13b80483256d37f773b781f4956c21
Pip 22.0 doesn't support Python versions prior to 3.7, so the
unversioned get-pip.py script refuses to run under Ubuntu Bionic's
default python3 interpreter. Add a 3.6-specific URL instead to work
around this.
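A minimal Python sketch of choosing the script URL by interpreter version. It assumes bootstrap.pypa.io keeps version-pinned copies under pip/X.Y/ for interpreters the current pip no longer supports, as with the 3.6-specific URL this change adds. The function name is hypothetical.

```python
import sys


def get_pip_url(version_info=sys.version_info) -> str:
    """Pick a get-pip.py URL that the given interpreter can actually run."""
    major, minor = version_info[0], version_info[1]
    if (major, minor) < (3, 7):
        # Pip 22.0+ refuses these interpreters, so use a pinned copy.
        return f"https://bootstrap.pypa.io/pip/{major}.{minor}/get-pip.py"
    return "https://bootstrap.pypa.io/get-pip.py"
```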
Change-Id: Icab5f4dd45d8f290a2f52db083cdc564e5a08776
The sql connection is no longer supported, we need to use "database"
instead. The corresponding hostvars change has already been made
on bridge.
Change-Id: Ibcac56568f263bd50b2be43baa26c8c514c5272b
The gerrit.config template is a gitconfig-like file, so is expected
to use tab characters for indentation. Half the indented lines used
tabs already, so make the rest consistent.
Change-Id: I6e77f0278a25d688b9517c275614485518923bc9
These two apt.conf.d config files are installed by different packages
but have overlap in the configuration they set. Unfortunately if the
wrong one sets the flag to disable periodic updates it wins based on apt
conf's priority rules.
To ensure that we continue to auto update and handle different packages
supplying different config files we manage the entirety of the periodic
config in both of these files at the same time using a common source
file.
Change-Id: I5e408fd7c343adb1de9ec564fe430a6f31ecc360
Update the docs to reflect not having grafyaml in the container.
Also move the import into a separate helper script, which can be
manually run on the host if the container needs to be restarted
out-of-band for some reason.
Change-Id: Ib1f6aea7e16180d9b122552a2aa30ce223426941
We were using the zuul-scheduler group but there are two schedulers now
and we don't want the zk backups, secret rename, and secret delete to
run twice on two different schedulers. Address this by pinning the rename
playbook to zuul02.opendev.org.
Change-Id: Ic741e97bd4c930cc27db00c2e037dc724a460ef7
This file has been seen on a few servers with the Unattended-Upgrades
flag set to 0 disabling daily unattended upgrades. Most of our servers
have this set to 1 and are fine, but let's go ahead and manage this file
directly to ensure it is always 1 and auto upgrades are enabled.
Note that previously we had been setting this via apt.conf.d/10periodic
which seems to come from the update-notifier-common package on older
systems and is now no longer used. Since that file's prefix sorts before
20auto-upgrades, the 20auto-upgrades file installed by
unattended-upgrades overrides this value. A future update would be to
coalesce both 10periodic and 20auto-upgrades together into one config
file.
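The override behaviour can be sketched in Python. This is a simplified model that assumes apt reads /etc/apt/apt.conf.d/ fragments in ascending filename order and that a later scalar assignment replaces an earlier one. The function and the dict layout are hypothetical. APT::Periodic::Unattended-Upgrade is the setting in question.

```python
from typing import Optional


def effective_setting(fragments: dict[str, dict[str, str]], key: str) -> Optional[str]:
    """Return the value apt would end up with for `key`.

    Files are visited in lexical name order and each assignment overwrites
    the previous one, so the last file that sets `key` wins.
    """
    value = None
    for filename in sorted(fragments):
        if key in fragments[filename]:
            value = fragments[filename][key]
    return value
```

With 10periodic setting the flag to "1" and 20auto-upgrades setting it to "0", the result is "0", which is why the file that sorts later has to be managed.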
Change-Id: Ic0bdaaf881780072fda7e60ff89b60b3a07b5804
The actual upgrade will be performed manually, but this change will be
used to update the docker-compose.yaml file.
If we land this change prior to the upgrade then note the
manage-projects commands will be updated to use the 3.4 image possibly
while gerrit 3.3 is still running. I don't expect this to be a problem
as manage-projects operates via network protocols.
Change-Id: I5775f4518ec48ac984b70820ebd2e645213e702a
It seems the images have split into enterprise and oss releases. The
OSS release is the one we want.
Since If0d584f848f213aeea385885e3decfaee6303de5 we don't run anything
in the container. So we can switch to the upstream default of the
Alpine-based container, which is their recommendation.
Just use the :latest tag. Generally the API seems pretty stable. If
we do break, it's better to figure it out quickly rather than pin to
an old version and then have to make huge jumps.
Depends-On: https://review.opendev.org/c/opendev/grafyaml/+/825990
Change-Id: I312a141baf73d750591957197cb5ba829f503fcb
The current mirror http://mirror.dal10.us.leaseweb.net/centos/8-stream
is hitting issues and is impacting all CS8 jobs.
Switching to the Facebook mirror fixes the issue.
Related-Bug: #1958510
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Change-Id: Ibddbee60c4318fd7bd34f281af95a20bf172c572
Instead of building a local grafana image with grafyaml installed,
use the plain upstream grafana image along with the newly created
separate opendev grafyaml image to run the dashboards.
Depends-On: https://review.opendev.org/780119
Change-Id: If0d584f848f213aeea385885e3decfaee6303de5
We set the START_AUDIO_MUTE value in our docker compose .env file, but
we didn't pass that value through to the container via the environment
setting in docker-compose.yaml. Fix this so that the jitsi meet config
templating can write out the expected config with the new value.
Change-Id: I3dbebad3ce67a8787ffd31c0db8d9583fe988e50
When you connect to meetpad it says:
You have started the conversation muted
It does this because we have muted video by default. Unfortunately, this
statement is ambiguous because audio is unmuted by default. Address this
by muting audio by default on join as well. Then when you are told you
are muted you can go unmute audio and video if you wish.
Change-Id: Iba399c92e1f8c6fba5e21ad45a2f4c7e5286429c
Currently dfw.mirror.rackspace.com is used for
syncing CS9 content on the OpenDev mirror, but for some reason it is
rejecting rsync connections.
Switching to the Facebook mirror [1] might fix the issue.
[1] https://admin.fedoraproject.org/mirrormanager/mirrors/CentOS/9-stream/x86_64
Related-Bug: #1957950
Signed-off-by: Chandan Kumar (raukadah) <chkumar@redhat.com>
Change-Id: I119e468d6b38e4b3a0f73ab0e839f3bba85fd039
Tumbleweed images were an interesting idea to add forward looking
testing of a very new and up to date linux distro. Unfortunately, the
images don't receive the attention they deserve to remain in our CI
system, and nothing seems to use the images anyway. Clean up our
opensuse mirrors and stop mirroring tumbleweed as one step in this clean
up process.
This depends on the change that removes the CI label/images.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/824071
Change-Id: Ie1488b453463da750e1a08536116e44ec129828e
The OpenStackID project has been rebranded, and the old
openstackid.org deployment is being retained temporarily in order to
ease transition, but id.openinfra.dev is in place now and intended
as its successor.
Monitor its HTTPS cert like we do for its predecessor, so we'll be
aware of any impending expiration, as we host some services which
depend on it.
Change-Id: I7fee4d42db672bffa80fbca953979fad9896880e
The edge-computing discussion list is not OpenStack-specific. It was
originally included on the lists.openstack.org site when we didn't
yet have a more neutral list hosting location. While we're in the
process of moving other non-OpenStack mailing lists off the
lists.openstack.org site, rehome this one to lists.opendev.org by
setting up address forwarding and Web redirects, and moving the
existing mailman list entry for it in our configuration.
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: If5207f0237bee1571924855b769a22d653964af7
In keeping with its name change to the Open Infrastructure
Foundation, the summit sponsors mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and move the existing
mailman list entry for it in our configuration.
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: I29e1e94885fd16b0edd7001662f367caec591439
In keeping with its name change to the Open Infrastructure
Foundation, the foundation marketing mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and add a mailman list
entry for it (there's no old one to remove as it wasn't previously
included in our configuration).
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: Ibadc4bfc430656286774e25b4dce6d8e29b5acf7
In keeping with its name change to the Open Infrastructure
Foundation, the foundation gold member mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and add a mailman list
entry for it (there's no old one to remove as it wasn't previously
included in our configuration).
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: I6cd92e052b26705bd16a4b38b3725248cb5691fd
In keeping with its name change to the Open Infrastructure
Foundation, the confidential board mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and add a mailman list
entry for it (there's no old one to remove as it wasn't previously
included in our configuration).
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: I191676bcb7f878afab17ec3c1735219d91b4de4d
In keeping with its name change to the Open Infrastructure
Foundation, the foundation board mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and add a mailman list
entry for it (there's no old one to remove as it wasn't previously
included in our configuration).
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: Idcac72c067fab66b6322f08c027e9c451a488ca3
In keeping with its name change to the Open Infrastructure
Foundation, the foundation community mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and add a mailman list
entry for it (there's no old one to remove as it wasn't previously
included in our configuration).
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: I9fff3b920a7fd0f75a3cc7a704003eeb3aab4d8a
In keeping with its name change to the Open Infrastructure
Foundation, the general foundation mailing list is moving from
lists.openstack.org to lists.openinfra.dev. Set up address
forwarding and Web redirects to reflect this, and add a mailman list
entry for it (there's no old one to remove as it wasn't previously
included in our configuration).
Note that this should be a no-op when it merges, as the list move
will be handled manually while deployment is temporarily disabled
for the server.
Change-Id: I367dd2a3d9a1c70c14915efa729d643419375060
While the staff mailing list is hidden and private in production,
that configuration is set after creation, so in our deployment tests
we can absolutely verify that HTTP and HTTPS redirects for listinfo
and archives work anyway. This paves the way for any further
rewrites and associated testing we may need to do for other mailing
lists which move between domains, as well as testing redirects we
may set up as part of the v2 to v3 migration.
Change-Id: I68078554a72e3b59d8192ac4339e8654a8351f52
Add secondary vhosts for HTTPS to each mailman site, but don't
remove the plain HTTP ones for now. Before switching to Mailman 3
we'll replace the current HTTP vhosts with blanket redirects to
HTTPS.
Add tests to make sure this is working, and also add a command-line
test for the lists.openinfra.dev site now that it's got a first
non-default list of its own. Also collect Apache logs from the test
nodes so we can see for sure what might break.
Change-Id: I4d93d643381f17c9a968595587909f0ba3dd6f92
The apache2 package installs a "default" vhost automatically.
Disable it, since it interferes with vhost matching on the
multi-site lists.openstack.org server. These vhosts are not enabled
on our production servers, so this makes testing more like
production.
Change-Id: I32a3cea034ac0b198ec1f4610cc096a4502306e6
We're going to want Mailman 3 served over HTTPS for security
reasons, so start by generating certificates for each of the sites
we have in v2. Also collect the acme.sh logs for verification.
Change-Id: I261ae55c6bc0a414beb473abcb30f9a86c63db85
Vexxhost wants to change the routers for their IPv6 setup, which will
change their link-local addresses. Change our setup to use the global
addresses instead, which will remain stable.
Change-Id: I45c6a3b776645294a688329c60949c0c3c4529a5
We want to limit the time we remember possibly broken index responses
which we sometimes receive from the pypi CDN. We cannot set this per
location, so this is a compromise between reducing the impact of bad eggs
in the cache and trying not to throw out the good eggs too fast.
Change-Id: If88f10cb7e3cebfa9c37a71d284d513f25b8bb52
In order to be able to redirect list addresses which have moved from
one domain to another, we need a solution to alias the old addresses
to the new ones. We have simple aliases but they only match on the
local part. Add a new /etc/aliases.domain which matches full
local_part@domain addresses instead. Also collect this file in the
Mailman deployment test for ease of inspection.
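To illustrate why matching only the local part is not enough, here is a
minimal Python sketch. The table contents and the lookup order (full
address first, then local part) are hypothetical stand-ins for how an
MTA might consult /etc/aliases.domain and /etc/aliases; the real
matching is done by our Exim configuration, which isn't shown here.

```python
# Hypothetical alias tables, standing in for /etc/aliases (local part
# only) and /etc/aliases.domain (full local_part@domain keys).
LOCAL_ALIASES = {"postmaster": "root"}
DOMAIN_ALIASES = {
    "example-list@lists.example.org": "example-list@lists.example.net",
}


def resolve(address: str) -> str:
    """Return the redirect target for address, or address itself."""
    # A full-address key includes the domain, so two lists sharing a
    # local part on different domains can be told apart.
    if address in DOMAIN_ALIASES:
        return DOMAIN_ALIASES[address]
    local_part, _, domain = address.partition("@")
    # Simple aliases only ever see the local part; the domain is ignored.
    if local_part in LOCAL_ALIASES:
        return f"{LOCAL_ALIASES[local_part]}@{domain}"
    return address
```

Only the old-domain address is redirected; the same local part on the
new domain resolves to itself.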
Change-Id: I16f871e96792545e1a8cc8eb3834fa4eb82e31c8
It appears that simply setting stdin to an empty string is
insufficient to make newlist calls from Ansible correctly look like
they're coming from a non-interactive shell. As it turns out, newer
versions of the command include a -a (--automate) option which does
exactly what we want: sends list admin notifications on creation
without prompting for manual confirmation.
Drop the test-time addition of -q to quell listadmin notifications,
as we now block outbound 25/tcp from nodes in our deploy tests. This
has repeatedly exposed a testing gap, where the behavior in
production was broken because of newlist processes hanging awaiting
user input even though we never experienced it in testing due to the
-q addition there.
Change-Id: I550ea802929235d55750c4d99c7d9beec28260f0
Mailman uses on-disk queues to store its actions, so it doesn't act
unless its queue runners are operating. They're not started at
setup, so perform a service restart to make sure they're running in
our tests.
Change-Id: I4365f6111d4d394ed7f845660d9f342551c31e80
Zuul change I6d7e7e7a9e19d46a744f9ffac8d532fc6b4bba01 introduced a
multi-line formatter that makes exceptions and other multi-line output
much easier to follow in the logs. Use it here for the simple
formatter in the production Zuul deployment.
Change-Id: I9a8aad8a90f5f4080cdb872d0ed65697a180f57c
Mailman v2.1 is still a Python2-only application, and expects
/usr/bin/python to be present. On Ubuntu Focal, there is no such
symlink provided by the Python 2.7 packages, and an extra
python-is-python2 transitional package is used to explicitly create
it in cases where that's required.
Change-Id: I37ca2bd7011afdb3b97e34cdc24ff455b9fb0498
Our deployment tests don't need to send E-mail messages. More to the
point, they may perform actions which would like to send E-mail
messages. Make sure, at the network level, they'll be prevented from
doing so. Also allow all connections to egress from the loopback
interface, so that services like mailman can connect to the Exim MTA
on localhost.
Add new rolevars for egress rules to support this, and also fix up
some missing related vars in the iptables role's documentation.
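A rough Python sketch of the intended egress ordering, assuming the
rules are rendered as plain iptables arguments; egress_rules is a
hypothetical helper, not the iptables role's template or its actual
rolevar names. The loopback ACCEPT has to come before the port 25
REJECT so that mailman can still reach Exim on localhost.

```python
from typing import Iterable, List


def egress_rules(blocked_tcp_ports: Iterable[int]) -> List[List[str]]:
    """Build OUTPUT chain rules: allow loopback, then reject blocked ports."""
    rules = [
        # Evaluated first: anything leaving via lo (e.g. mailman -> Exim)
        # is accepted before the port rejections below are considered.
        ["-A", "OUTPUT", "-o", "lo", "-j", "ACCEPT"],
    ]
    for port in blocked_tcp_ports:
        # Refuse outbound connections to the given TCP port, e.g. 25 (SMTP).
        rules.append(
            ["-A", "OUTPUT", "-p", "tcp", "--dport", str(port), "-j", "REJECT"]
        )
    return rules
```

Each rule could then be applied as `["iptables", *rule]`.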
Change-Id: If4acd2d3d543933ed1e00156cc83fe3a270612bd
This adds a zuul-client config file to the schedulers, as well as a
convenience script there to execute the docker container.
Change-Id: Ief167c6b7f0407f5eaebecde552e8d91eb3d4ab9
This adds a keycloak realm to the Zuul auth config, so that we can
log into the zuul web ui with our test realm in keycloak.
Change-Id: Iec3777a6ea1cba0e108c7e44067d69b61cbb34a7
This used to be called "bridge", but was then renamed with
Ia7c8dd0e32b2c4aaa674061037be5ab66d9a3581 to install-ansible to be
clearer.
It is true that this is installing Ansible, but as part of our
reworking for parallel jobs, this is also the synchronisation point
where we should be deploying the system-config code to run for the
buildset.
Thus naming this "bootstrap-bridge" should hopefully be clearer again
about what's going on.
I've added a note to the job calling out its difference from the
infra-prod-service-bridge job to hopefully also avoid some of the
initial confusion.
Change-Id: I4db1c883f237de5986edb4dc4c64860390cc8e22
We thought the latest image would be happy based on updates that
tristanC made. Unfortunately ssh complains with:
2021-12-06 18:33:14.616 [ThreadId 19]: [ERROR] ssh process exited: 255
2021-12-06 18:33:15.618 [ThreadId 19]: Connecting to review.opendev.org:29418
No user exists for uid 11000
Undo only the uid:gid settings as the new image seems to work otherwise.
Change-Id: I9c0963ea5c78cecb99e0070b06f9ebd8876a3157
Some extra steps are needed to use keycloak with a reverse proxy.
This adjusts the apache config to send the required headers and
the keycloak server config to use them.
Since the openid configuration json page is constructed entirely
from these headers (and not from static configuration), this is
a good test that the entire system is working.
Change-Id: I662dc85836d640cb732f12f39e9a61607767fcf3
In reviews on https://review.opendev.org/819923 we discovered we
are inconsistent in how we create certs. Suggest a specific course
of action and record the reasoning.
Change-Id: I974a1717a74e759ca8805dcb707efc7fe29ba53f
This adds a keycloak server so we can start experimenting with it.
It's based on the docker-compose file Matthieu made for Zuul
(see https://review.opendev.org/819745 )
We should be able to configure a realm and federate with openstackid
and other providers as described in the opendev auth spec. However,
I am unable to test federation with openstackid due to its inability to
configure an oauth app at "localhost". Therefore, we will need an
actual deployed system to test it. This should allow us to do so.
It will also allow us to connect realms to the newly available
Zuul admin api on opendev.
It should be possible to configure the realm the way we want, then
export its configuration into a JSON file and then have our playbooks
or the docker-compose file import it. That would allow us to drive
change to the configuration of the system through code review. Because
of the above limitation with openstackid, I think we should regard the
current implementation as experimental. Once we have a realm
configuration that we like (which we will create using the GUI), we
can choose to either continue to maintain the config with the GUI and
appropriate file backups, or switch to a gitops model based on an
export.
My understanding is that all the data (realms configuration and session)
are kept in an H2 database. This is probably sufficient for now and even
production use with Zuul, but we should probably switch to mariadb before
any heavy (e.g. Gerrit) production use.
This is a partial implementation of https://docs.opendev.org/opendev/infra-specs/latest/specs/central-auth.html
We can re-deploy with a new domain when it exists.
Change-Id: I2e069b1b220dbd3e0a5754ac094c2b296c141753
Co-Authored-By: Matthieu Huin <mhuin@redhat.com>
This will allow us to issue internally generated auth tokens so
that we can use the zuul CLI to perform actions against the REST
API.
Change-Id: I09cafa2e820f5d0e7fa9ada00b9622de093242c7
This makes the haproxy role more generic so we can run another (or
potentially even more) haproxy instance(s) to manage other services.
The config file is moved to a variable for the haproxy role. The
gitea specific config is then installed for the gitea-lb service by a
new gitea-lb role.
statsd reporting is made optional with an argument. This
enables/disables the service in the docker compose.
Role documentation is updated.
Needed-By: https://review.opendev.org/678159
Change-Id: I3506ebbed9dda17d910001e71b17a865eba4225d
Ansible Galaxy indexes tarballs of Ansible roles and collections at
a central site, which in turn points to a dedicated Amazon S3
subdomain. The tools which consume it support overriding the default
Galaxy URL with any arbitrary one, so we should be able to take
advantage of this in CI jobs.
Change-Id: Ib5664e5588f7237a19a2cdb6eec3109452e8a107
It complains about not being able to get or create the default cache
directory (but doesn't tell us what that directory is). We'll have to
sort this out later.
Change-Id: I5ce7a875ede77c6203d1b5d06da97f8c52ee48e1
This updates the lodgeit paste service to run under a dedicated user. We
defer updating the image to do this, as we should coordinate with
vexxhost on how that will impact them. This should be fine though as
gerritbot updates proved we can run it this way.
Change-Id: I44d3c53a01be475db1bfa17200da0a4800f85628
The current opendev-infra-prod-base job sets up the executor to log
into bridge AND copies in Zuul's checkout of system-config to
/home/zuul/src.
This presents an issue for parallel operation, as every production job
clones system-config on top of the others.
Since they all operate in the same buildset, we only need to clone
system-config from Zuul once, and then all jobs can share that repo.
This adds a new job "infra-prod-setup-src" which does this. It is a
dependency of the base job so should run first.
All other jobs now inherit from opendev-infra-prod-setup-keys, which
only sets up the executor for logging into bridge.
Change-Id: I19db98fcec5715c33b62c9c9ba5234fd55700fd8
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/807807
The dependent change moves this into the common infra-prod-base job so
we don't have to do this in here.
Change-Id: I444d2844fe7c7560088c7ef9112893da1496ae62
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/818189
The known_host key is written out by the parent infra-prod-base job in
the run-production-playbook.yaml step [1]. We don't need to do this
here again.
[1] 2c194e5cbf/playbooks/zuul/run-production-playbook.yaml (L1)
Change-Id: I514132b2dbc20ac321a79ca2eb6d4c8b11c4296d
Missed this with I483c2982a6931e7d6fc97ab82f7750b72d2ef265; this
ensures the mirror webserver exports the directory.
Change-Id: I6e14cdace213a6af6df65b8ddb09bb3a167fbf9b
This is a re-implementation of
I195ebee548071b0b89bd5bf64b251595271178ca that puts 9-stream in a
separate AFS volume
(Note the automated volume name "mirror.centos-stream" comes just
short of the limit)
Change-Id: I483c2982a6931e7d6fc97ab82f7750b72d2ef265
This reverts commit 8591ce2b5c.
It did not click that this is written to use
/afs/.openstack.org/mirror/centos-stream as the base directory. The
mirror/ directory has volumes mounted in it -- i.e. centos-stream has
to be a new volume (and also has to be "vos released" separately, the
existing script won't do it).
The simplest way to do this is to treat this separately. I'll propose
this in a follow-on.
Change-Id: If7b8239adf7635da4f0c317287d23daf5ab0f4bf
It picks the rackspace mirror from this list
https://admin.fedoraproject.org/mirrormanager/mirrors/CentOS/9-stream/x86_64
which is located in the US.
It moves the base directory to centos-stream to be consistent with the
centos mirrors.
We will only synchronize x86_64 and aarch64 arches as those are the only
ones used in opendev CI. We also exclude source and debug directories to
optimize space usage as those are only required for debugging purposes.
Change-Id: I195ebee548071b0b89bd5bf64b251595271178ca
It looks like 6 hours is too infrequent and is enough time for the
disk to fill up when we're busy. Instead, purge old snapshots every
2 hours, which looks like it should give us plenty of headroom with
our current usage pattern.
Change-Id: Ieb92d052e633e9326c41367442f036cc333c40f2
Marking a file as "reviewed" will update the accountPatchDb database
and test the mariadb connection.
Change-Id: Ifaee5981e0977d7d1135275e7d8a0790075f670b
In order to avoid unfortunate collisions with statically assigned
container account UIDs and GIDs, cap normal users at 9999. That way
we can set our containers to use IDs 10000 and above.
Make sure adduser/addgroup's adduser.conf gets adjusted to match the
values we set in the login.defs referenced by the lower-level
useradd/groupadd tools too. We're not using non-Debian-derivative
servers these days, so don't bother to try making this work on other
distributions for the time being.
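For reference, the settings involved are UID_MAX/GID_MAX in
login.defs and LAST_UID/LAST_GID in adduser.conf. A minimal Python
sketch of capping them, assuming the stock "KEY value" and "KEY=value"
line formats; the real change edits these files through Ansible, and
the helper names here are hypothetical.

```python
import re

# Highest ID for normal users, leaving 10000 and above for containers.
CAP = 9999


def cap_login_defs(text: str) -> str:
    """Set UID_MAX/GID_MAX in login.defs-style 'KEY value' lines."""
    return re.sub(r"^(UID_MAX|GID_MAX)\s+\d+$",
                  lambda m: f"{m.group(1)}\t{CAP}", text, flags=re.M)


def cap_adduser_conf(text: str) -> str:
    """Set LAST_UID/LAST_GID in adduser.conf-style 'KEY=value' lines."""
    return re.sub(r"^(LAST_UID|LAST_GID)=\d+$",
                  lambda m: f"{m.group(1)}={CAP}", text, flags=re.M)
```

Both files need the same ceiling, otherwise adduser and useradd would
disagree about where the normal user range ends.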
Change-Id: I0068d5cea66e898c35b661cd559437dc4049e8f4
The mariadb container is overriding these and we can race ansible
setting them back to root and the mariadb container starting up
resulting in a sad database.
Change-Id: Ib88f6aec83e73baf95a660165d13839f7baeed3d
See I8d8ce5c62c660875d5c6eed54c686996576ec9df; mariadb containers
chown this to their internal user, we don't want to reset it.
Change-Id: If33a26438c6aa63d0ef0e02bdad6a643070be922
We are currently re-chowning the running db directories back to root,
causing havoc for the db. Drop the explicit permissions to avoid
this.
Change-Id: I8d8ce5c62c660875d5c6eed54c686996576ec9df
Gerrit 3.4 deprecates HTML-based plugins, so the old theme doesn't
work. I have reworked this into a javascript plugin.
This should look the same, although I've achieved things in different
ways.
This doesn't register light and dark variants; since
background-primary-color is white, by setting the
header-background-color to this we get white behind the header bar,
and it correctly switches to the default black(ish) when in dark mode
(currently it seems the header doesn't obey dark mode, so this is an
improvement).
I'm not sure what's going on with the extant header-border-image which
is a linear gradient all of the same color. I modified this down to
1px (same as default) and made it fade in-and-out of the logo colour,
just for fun.
Change-Id: Ia2e32731c1cfe97639de2ec0e7660c7ed583e045
We may see an archive with ".checkpoint" on the end, as described in
[1]; the short version is that borg writes a checkpoint archive every
30 minutes, which may be left behind if a long backup is interrupted.
Skip these when making the list of archives to prune.
We noticed this on wiki-test; for clarity the list of archives looks
like
...
wiki-upgrade-test-filesystem-2021-02-16T02:56:09.checkpoint Tue, 2021-02-16 02:56:11 [c444a0765e5791f3f68f08624d1efd80bf8a3ebc96bb225f08e4013befa2b460]
wiki-upgrade-test-filesystem-2021-02-16T17:45:04 Tue, 2021-02-16 17:45:06 [b901b55ac3bf9abecba024caebad5ba7cd1a966e3f00b366f6cff45feba7bdff]
wiki-upgrade-test-mysql-2021-02-16T18:35:09 Tue, 2021-02-16 18:35:11 [1d38cd3b4b1b3927b543e4ccc6c794cd3a513a70979ff025bbf303e1fe5e490f]
wiki-upgrade-test-filesystem-2021-02-17T17:45:05 Wed, 2021-02-17 17:45:07 [f665e275c0014a21b82efaece5d36525a4ce6cb423253d5bd0b1323b230fa53a]
...
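The filtering amounts to dropping any archive whose name ends in
".checkpoint". A minimal Python sketch over `borg list`-style lines
like the ones above, assuming the archive name is the first
whitespace-separated field; prunable_archives is a hypothetical
helper, not the prune script itself.

```python
from typing import List


def prunable_archives(listing: str) -> List[str]:
    """Return archive names from `borg list`-style output, minus checkpoints."""
    names = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name = line.split()[0]
        # Checkpoints are leftovers of interrupted backups; never prune them
        # as if they were regular dated archives.
        if name.endswith(".checkpoint"):
            continue
        names.append(name)
    return names
```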
[1] https://borgbackup.readthedocs.io/en/stable/faq.html#if-a-backup-stops-mid-way-does-the-already-backed-up-data-stay-there
Change-Id: Ia33f46305ef8f541efb7c7150d4bb2e977b01d46
We previously set the limit to 70200M on a ~98GB filesystem.
Unfortunately we are able to jump from the ~70GB limit to a full
filesystem before htcacheclean happens to run again. Reduce the limit to
60000M to give us more headroom and hopefully avoid filling the fs
between cache clean runs.
Change-Id: I8aa45eb0c396b54dbb3ec84e5ba8fd4ec7da9e27
Rather than restarting the whole scheduler group, just restart
zuul02, which is our only production scheduler. That will allow us
to boot zuul01 as a secondary scheduler and manually add/remove it
for testing.
Once we can reliably run two schedulers, we can revert this change.
Change-Id: I5518ea1d3a6a1d48460b0436d4d1eaf9d52b7ddb
There were questions around project-config syncing and the way it is
done in manage-projects during our last project renaming. I've since
read through the code to try and understand things better and have added
comments to manage-projects.yaml indicating the risks here and why we
approach it the way we do. There is no functional change, only an attempt
at better understanding for the future.
Change-Id: I60aa58a36108edce3e00ecf2ac10be3dee7e8ea0
Last week when we were attempting to only update the subset of projects
that were renamed in gitea we accidentally updated all projects. The
good news is this didn't take significant amounts of time (just a few
minutes).
We should be able to enforce the metadata for all projects given the
cost is now much lower than it was in the past. This will keep things up
to date after renames but also generally if projects update descriptions
or bug tracking locations.
Change-Id: Ief2bb1eb2b11a13fafbe52650317d54d6a0fc824
This reverts commit a39a939e03.
Turns out that ansible module args don't get typed the way we expect
them. This means having a Boolean or List type argument just ends up in
confusion and always_update being truthy every which way. Revert until
we can fix this properly.
Change-Id: I596fe6883098ba636b1cad5196d1fdd76ff19076
The static server in the ptgbot container is very simple; it will be
much better to have apache caching the files which essentially never
change.
Change-Id: I8056d8c529c60f4b95aaca549528b6aa8465fa78
Setting the gitea_always_update var for the gitea-git-repos role to
a list will filter metadata updates to only the project names
included in the supplied list. False and True still have their prior
meanings of do no metadata updates or force metadata updates for
every project we host.
Add testing for this, and also actually test that the rename
playbook renamed something.
Get rid of the git clone in the playbook since it's no longer
relevant to how we run things anyway; we'll instead want to rely on
the Zuul-supplied projects.yaml path.
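The intended semantics of gitea_always_update can be sketched in Python
as below; projects_to_update is hypothetical and not the module's code.
The isinstance check assumes real bool and list values: if module args
arrive as strings instead, a value such as "False" lands in the truthy
branch and every project gets updated.

```python
from typing import Iterable, List, Union


def projects_to_update(projects: Iterable[str],
                       always_update: Union[bool, List[str]]) -> List[str]:
    """Select which projects get a forced metadata update."""
    # A list limits forced updates to the named projects.
    if isinstance(always_update, list):
        wanted = set(always_update)
        return [p for p in projects if p in wanted]
    # True forces updates for every project; False forces none.
    return list(projects) if always_update else []
```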
Change-Id: Id8238b232caffc242c6bda9fe39eb7e65fe5e059
Sometimes we observe failures to clone acme.sh from GitHub. Retry it
up to three times with a two-second delay between each try, in hopes
of failing these jobs less often.
While we're here, update the URL to a more current one which doesn't
need redirecting.
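A generic Python sketch of the retry policy (up to three tries, two
seconds apart); the real change applies this to the acme.sh clone task
in Ansible, and the retry helper here is hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], tries: int = 3, delay: float = 2.0) -> T:
    """Call action up to `tries` times, sleeping `delay` seconds between failures."""
    if tries < 1:
        raise ValueError("tries must be at least 1")
    for attempt in range(1, tries + 1):
        try:
            return action()
        except Exception:
            # Out of tries: surface the last error to the caller.
            if attempt == tries:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

For example, `retry(lambda: subprocess.run(["git", "clone", url, dest], check=True))`,
where url and dest are placeholders.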
Change-Id: I5179c0482afcb407b7d28d4c3ce73d96d41c6493
This removes the old config to choose the old change screen by default
as everything is polygerrit now.
We remove the pre-plugin melody config as melody now ships as a plugin
and has separate configuration.
We remove old theming information as that is supplied via external files
now.
We remove anonymous git download config because we don't set
gerrit.canonicalGitUrl which is required for this to work. We don't set
that because we don't have a git:// server anymore.
Bump the lucene thread count from 4 to 8 as we have more cores on the
system we run on.
Finally add some comments to help make sense of config that is left in
place.
Change-Id: Ie0b48e544191839067e66647d2ea32f74ce19ed3
Having two groups here was confusing. We seem to use the review group
for most ansible stuff so we prefer that one. We move contents of the
gerrit group_vars into the review group_vars and then clean up the use
of the old group vars file.
Change-Id: I7fa7467f703f5cec075e8e60472868c60ac031f7
Previously we had set up the test gerrit instance to use the same
hostname as production: review02.opendev.org. This causes some confusion
as we have to override settings specifically for testing like a reduced
heap size, but then also copy settings from the prod host vars as we
override the host vars entirely. Using a new hostname allows us to use a
different set of host vars with unique values reducing confusion.
Change-Id: I4b95bbe1bde29228164a66f2d3b648062423e294
Previously we had a test specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit
then we relied on the review02.opendev.org production host vars file to
set values that are public.
Unfortunately, this meant we were using the production heapLimit value
which is far too large for our test instances, leading to the occasional
failure:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group var file because the hostvar
file overrides those values. To fix this we need to replace the test
specific group var contents with a test specific host var file instead.
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
replication config.
Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
The gerrit config diff after the 3.3 upgrade [1] seems to remove some
quotes. We also quote the bug URL, because it seems to think the
trailing # is a comment now.
[1] https://etherpad.opendev.org/p/gerrit-upgrade-3.3
Change-Id: I3ca0ec925a0e6da33a1cbe2333c118b1baa7257c
While under development, the subdomain for the PTG site was
originally written as ptgbot.opendev.org and this is what was
communicated to event organizers. Mass communications subsequently
went out including this for URLs to the service. In order to make
the content from those announcements viable, add the additional name
to our configuration so we can redirect from it to the name we
eventually settled on.
While we're adjusting vhost metadata, make the ServerAdmin
directives between the HTTP and HTTPS vhosts for the service
consistent.
Change-Id: I726069f83b792fa31d92b759adc5c1214ca087fa
In order to use Rewrite* directives, mod_rewrite must be activated
in the vhost via RewriteEngine.
Change-Id: I495ee5e9fd3b1d489122d6e282d3a91d1035c126
The default channel name in the ptgbot role defaults did not
include the leading hash which it requires. The test jobs also
seem to need it set in the eavesdrop group vars specific to
testing.
Change-Id: I16cdeac4f7af50e2cac36c80d78f3a87f482e4aa
This shifts our Gerrit upgrade testing ahead to testing 3.3 to 3.4
upgrades as we have upgraded to 3.3 at this point.
Change-Id: Ibb45113dd50f294a2692c65f19f63f83c96a3c11
This bumps the gerrit image up to our 3.3 image. Followup changes will
shift upgrade testing to test 3.3 to 3.4 upgrades, clean up no longer
needed 3.2 images, and start building 3.4 images.
Change-Id: Id0f544846946d4c50737a54ceb909a0a686a594e
Set the channel we want ptgbot joining in production with a group
var, like we do for statusbot's channel list. Correct the password
var name to match what's used in the template for production (and
matches the override set in our private hostvars on the bastion).
Clean up the unnecessary auth nicks list which was copied from the
statusbot config but is entirely unused. Also get rid of some
unnecessary empty lines in the defaults as they really don't make
the file any more readable.
Change-Id: Id026b89d642eae13feba374e4f3ec610b543e530
We set the letsencrypt_self_generate_tokens value to True in testing
which means the variable is valid and exists in testing. However, in
production this variable isn't set and doesn't have a default so we get:
The task includes an option with an undefined variable. The error was:
'letsencrypt_self_generate_tokens' is undefined
Fix this by setting the default value for this var to False. Also, add
it to the README of letsencrypt-request-certs as this is where it is
primarily used.
Change-Id: I862df6ea3ff7f3a1df2a088b04d230bb618aaa85
The dependent change exports the ptgbot website on port 8000 in the
container. Proxy this through apache.
Depends-On: https://review.opendev.org/c/openstack/ptgbot/+/812417
Change-Id: Idf9e9f5ffad981427d24a3476c0c1f244721d917
Currently we connect to the LE staging environment with acme.sh during
CI to get the DNS-01 tokens (but we never follow-through and actually
generate the certificate, as we have nowhere to publish the tokens).
We've known for a while that LE staging isn't really meant to be used
by CI like this, and recent instability has made the issue more pronounced.
This modifies the driver script to generate fake tokens which work to
ensure all the DNS processing, etc. is happening correctly.
I have put this behind a flag, however, so the letsencrypt job still
does this. I think it is worth this job actually calling acme.sh to
validate this path; this shouldn't be required too often.
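A hypothetical Python sketch of a fake token with the right shape.
Real DNS-01 TXT values are the unpadded base64url encoding of a
SHA-256 digest (RFC 8555, section 8.4), so 43 characters long; the
actual change does this in the shell driver script and may generate
its tokens differently.

```python
import base64
import hashlib
import secrets


def fake_dns01_token() -> str:
    """Return a random string shaped like an ACME DNS-01 TXT value."""
    # Random bytes stand in for the key authorization, so nothing is
    # ever requested from a Let's Encrypt environment.
    digest = hashlib.sha256(secrets.token_bytes(32)).digest()
    # base64url of 32 bytes is 44 chars with one "=" of padding; strip it.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```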
Change-Id: I7c0b471a0661aa311aaa861fd2a0d47b07e45a72
We have seen a case where the weekly verification run conflicted with
an in-progress backup run. Make the verification step wait for up to
an hour for the lock to allow backups to complete.
Change-Id: Id87dd090c7cd652695ab0c4aa73477cf0d72c28d
This file used to be managed by puppet-gerrit and it seems we missed
converting it to Ansible. Add it with the contents from the server.
Change-Id: I10a10166446941d2676ae9181fc74b5a1408c5ed
This reverts commit aa5623982f.
The MIT mirror seems to now be missing Fedora 34 indices, but the
one we were using before at UH looks just fine now.
Change-Id: I59649ea93cc1ce13715096dcd0b8f828ce6b6724
Nginx doesn't seem to support explicit intermediate cert chains [0] and we
need to supply all of the certs together in a single file. Thankfully
acme.sh does this and calls it the fullchain.cer file. Use that in the
nginx config for graphite to fix issues with ssl verification to this
service.
[0] http://nginx.org/en/docs/http/configuring_https_servers.html#chains
Change-Id: I318fb92a30c1593c2a2e4cb37496b16f17472f1d
We move robots.txt to custom/ instead of custom/public/ as
custom/public/ is now served at /assets/ via the gitea webserver and we
need robots.txt at the root. Related to this we update image urls to be
prefixed with AssetUrlPrefix in their paths so that if this path changes
again in the future we should automatically accommodate that.
Change-Id: I8ce5fe8ff342617ff156a401be8418d593fd35c4
In order to avoid unnecessary browser requests to other sites,
install a copy of the OpenDev logo on the Lodgeit server and serve
it from there rather than pointing at one served from Gitea.
Change-Id: I4c3678a1de8ca4a41cd0c64aab71b2e0e25373af
When generically rejecting connections, we'd prefer to signal to
users clearly that it's the firewall rejecting them. For IPv4 we
previously emitted generic ICMP "no route to host" responses, but
this tends to make it look incorrectly like a routing failure.
Switch to flagging our error responses as "administratively
prohibited" which is more accurate and less confusing. We're also
already using icmp6-adm-prohibited for the v6 rules, so this makes
our v4 ruleset more consistent.
Note that the iptables-extensions(8) manpage indicates "Using
icmp-admin-prohibited with kernels that do not support it will
result in a plain DROP instead of REJECT" but all our kernels should
have support for it these days so this isn't a concern.
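How the catch-all rules for the two address families now line up can
be sketched in Python. The reject types are the iptables and ip6tables
REJECT options named above; the chain name INPUT is an assumption, and
our actual rules files may be templated differently.

```python
# Reject type used for the catch-all rule in each address family.
REJECT_WITH = {
    4: "icmp-admin-prohibited",  # iptables (IPv4), new with this change
    6: "icmp6-adm-prohibited",   # ip6tables (IPv6), already in use
}


def catch_all_reject(family, chain="INPUT"):
    """Return the final REJECT rule for one address family (chain is assumed)."""
    return f"-A {chain} -j REJECT --reject-with {REJECT_WITH[family]}"


for family in (4, 6):
    print(f"IPv{family}:", catch_all_reject(family))
```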
Change-Id: Id423f3ec03d0c3c4e40ddef34c38f97167b173f6
Previously we were doing this weekly. Gerrit does this daily. "Split"
the difference and do gitea every other day.
We have noticed that replication to gitea can be slow at times. One
theory is that less well-packed repos on the gitea side make
negotiating the updates slower. Pack more often to see if this helps.
Change-Id: I8961007dce3e448bfdbf1c5f3e8dfc5ec8eb82fb
Instead of using the opendev.org/... logo file, host a copy from
gerrit's static location and use that. This isolates us from changes
to the way gitea serves its static assets.
Change-Id: I8ffb47e636a59e5ecc3919cc7a16d93de3eae08d
Copy static files directly into the container image instead of
managing them dynamically with Ansible.
Change-Id: I0ebe40ad2a97e87b00137af7c93a3ffa84929a2e
This currently uses a file served from gitea's static assets; to
isolate us from changes to gitea's file layout, switch this to use the
canonical file directly from system-config/assets.
Change-Id: Ibf67040af2b0a18261621a120ee26c78020e3ace
This does local backups of the nodepool zk image data to
/var/log/nodepool on the nodepool-builders. These hosts don't get
offsite backups, but we run multiple redundant servers. This data isn't
critical, as we can start from scratch, but a backup may be useful if
we don't want to go through all that trouble.
Change-Id: I7d150df9c0d9566ef2d32167cea535e29822cfa2
We are seeing that replication tasks occasionally sit around forever
and have required manual intervention. One theory is that this is
related to networking between the gerrit server and the gitea servers.
A timeout lets a stuck push fail so it can be retried: we don't set
maxRetries, which means replication should be retried indefinitely, so
if we hit the timeout we should simply try again. The 15 minute value
was chosen somewhat arbitrarily as roughly twice the time it takes to
clone a large repo like nova.
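The timeout-plus-unbounded-retry behavior described above can be
sketched in Python. The `push` callable is a hypothetical stand-in for
one replication push, and the parameter names and retry delay are
assumptions, not the replication plugin's actual options.

```python
import time


def replicate_with_retries(push, timeout=15 * 60, max_retries=0,
                           retry_delay=60.0, sleep=time.sleep):
    """Call push(timeout=...) until it succeeds.

    max_retries=0 means "retry forever", mirroring the unset maxRetries
    case. Returns the number of attempts that were needed.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            push(timeout=timeout)
            return attempt
        except TimeoutError:
            # A stuck push is cut off by the timeout instead of hanging forever.
            if max_retries and attempt >= max_retries:
                raise
            sleep(retry_delay)
```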
Change-Id: Iec2536ad149a2e625a1f0107b9fcee3079493607
This switches testing of lists.openstack.org to Focal, and we make a CGI
env var update to accommodate newer mailman.
Specifically, newer mailman's CGI scripts filter the env vars they will
pass through. We were setting MAILMAN_SITE_DIR to vhost our mailman
installs with apache2, but that doesn't pass the filter and is removed.
HOST is passed through, so we update our scripts, apache vhost configs,
exim, and init scripts to use the HOST env var instead.
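The effect of the filtering can be sketched in Python: anything not on
the allowlist, including MAILMAN_SITE_DIR, disappears, while HOST
survives and can select the per-vhost install. The allowlist, the
/srv/mailman base path and both helper names are hypothetical and
illustrative only, not mailman's actual list or our real layout.

```python
import os.path

# Illustrative allowlist only; not mailman's actual list.
ALLOWED_CGI_VARS = {"HOST", "HTTP_HOST", "PATH_INFO", "QUERY_STRING",
                    "REQUEST_METHOD", "SCRIPT_NAME"}


def filter_cgi_env(environ, allowed=ALLOWED_CGI_VARS):
    """Drop every environment variable that is not on the allowlist."""
    return {k: v for k, v in environ.items() if k in allowed}


def site_dir_for(environ, base="/srv/mailman"):
    """Pick the per-vhost mailman directory from HOST (hypothetical layout)."""
    return os.path.join(base, environ["HOST"])
```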
Change-Id: I5c8c70c219669e37b7b75a61001a2b7f7bb0bb6c
INAP mtl01 region is now owned by iWeb. This updates the cloud launcher
to use the new name and instructs the mirror in this cloud to provision
ssl certs for both the old inap and new iweb names. It also updates
clouds.yaml files.
Change-Id: I1256a2e24df1c79dea06716ae4dfbcfe119c13f8
The Open Infrastructure Foundation's developers who maintain the
OpenStackID software are taking over management of the site itself,
and have deployed it on new servers. DNS records have already been
updated to the new IP address, so it's time to clean up our end in
preparation for deleting the old servers we've been running.
OpenStackID is still used by some services we run, like RefStack and
Zanata, and we're still hosting the OpenStackID Git repository and
documentation, so this does not get rid of all references to it.
Change-Id: I1d625d5204f1e9e3a85ba9605465f6ebb9433021
The rsync mirror we were relying on ended up incomplete on a recent
sync, causing all OpenSUSE 15 jobs to fail updating the package
lists. Switch to an alternative mirror that seems to carry everything
we used from the previous one.
Change-Id: I661bdbfcbc766966793cd64d7f21201879d3dbaa
A change to zuul (see Depends-On) modifies how zuul executors handle
SIGTERM. Update our executor config to preserve the old behavior of
stopping the instance immediately rather than doing a graceful stop.
If we need to we can still request graceful stops directly using the
graceful stop command.
Depends-On: https://review.opendev.org/c/zuul/zuul/+/804464/
Change-Id: I76a2646a13a71d190be265354de18468bc93184c
This version fixes an issue introduced with the "behalf of" feature
where an extra space was added in "proposed :" where it should be
"proposed:".
Change-Id: I6c58622aa86a5234cc3e2dca957720be9f6549cd
We now depend on the reverse proxy not only for abuse mitigation but
also for serving .well-known files with specific CORS headers. To
reduce complexity and avoid traps in the future, make it non-optional.
Change-Id: I54760cb0907483eee6dd9707bfda88b205fa0fed
Zuul is moving to an unbridged Matrix room. Remove eavesdrop from
the OFTC room, and add the Matrix room to the two new Matrix bots.
Change-Id: I9bf34c1f67c6dac41c3761f8ccde4d7fa76bbf89
This change adds a missing step to accept the Matrix terms required
to use the identity lookup service.
Change-Id: I4f6ad60d983bfc82342ee7d69659074c91296dc1
We create a (currently test-only) playbook that upgrades gerrit. This
job then runs through project creation and renaming and testinfra
testing on the upgraded gerrit version.
Future improvements should consider loading state into the old gerrit
install before the upgrade so that it can be asserted afterwards as
well.
Change-Id: I364037232cf0e6f3fa150f4dbb736ef27d1be3f8
The pastebinit command-line tool hard-codes an allowed list of
pastebin URLs, one of which is "http://paste.openstack.org" so
redirecting to HTTPS and to other hostnames seems to break it.
It has a specific user-agent, so allow plain HTTP access for this
tool, but redirect others.
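The redirect decision can be sketched in Python. Matching on the
substring "pastebinit" is an assumption about the tool's User-Agent,
and paste.example.org is a hypothetical placeholder for the real HTTPS
hostname; in production this logic lives in the web server config, not
in Python.

```python
from urllib.parse import urlsplit

# Hypothetical canonical location; substitute the real HTTPS paste URL.
CANONICAL = "https://paste.example.org"


def redirect_target(url, user_agent, canonical=CANONICAL):
    """Return None to serve the request as-is, or the URL to redirect to."""
    parts = urlsplit(url)
    # pastebinit only knows the old plain-HTTP URL, so leave it alone.
    if parts.scheme == "http" and "pastebinit" in (user_agent or "").lower():
        return None
    target = canonical.rstrip("/") + (parts.path or "/")
    if parts.query:
        target += "?" + parts.query
    # Already at the canonical HTTPS location: nothing to do.
    if url == target:
        return None
    return target
```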
Change-Id: Ia7c983986e6e9c08299ded5282a83761448b35bb