Replace the optional src parameter with the required path parameter;
this seems to have been a simple typo.
Change-Id: Ib95a11b28eed138225b57062faeb7c344067c991
The letsencrypt_certs variable defined here in the "static" group file
is overwritten by the host variable, so it is not doing anything (and
we don't have a logs.openstack.org any more as it is all in object
storage); remove it.
Change-Id: I6910d6652c558c94d71b1609d1194b654bc5b42d
Jammy nodes appear to lack the /etc/apt/sources.list.d dir by default.
Ensure it exists in the install-docker role before we attempt to
install a deb repo config to that directory for docker packages.
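Something like the following at the top of the role should cover it (a
sketch; the exact task in the role may differ):

  - name: Ensure the apt sources directory exists
    file:
      path: /etc/apt/sources.list.d
      state: directory
      owner: root
      group: root
      mode: '0755'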
Change-Id: I859d31ed116607ffa3d8db5bfd0b805d72dd70c0
This is the first step in running our servers on jammy. This will help
us boot new servers on jammy and bionic replacements on jammy.
Change-Id: If2e8a683c32eca639c35768acecf4f72ce470d7d
The most recent version of the grafana-oss:latest container seems to be
a beta version with some issues, or maybe we need to adapt our
deployment. Until we sort that out, pin the container to the latest
known-working version.
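In docker-compose terms the pin is just moving off the floating tag; a
sketch (the known-good tag shown here is hypothetical):

  services:
    grafana:
      image: docker.io/grafana/grafana-oss:9.0.1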
Change-Id: Id50bf3121f3009f36f0f9961cf5211053410a576
How we got here - I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3 split the
log collection into a post-run job so we always collect logs, even if
the main run times out. We then realised in
Ic18c89ecaf144a69e82cbe9eeed2641894af71fb that the log timestamp fact
doesn't persist across playbook runs and it's not totally clear how
getting it from hostvars interacts with dynamic inventory.
Thus take an approach that doesn't rely on passing variables; this
simply pulls the time from the stamp we put on the first line of the
log file. We then use that to rename the stored file, which should
correspond more closely with the time the Zuul job actually started.
To further remove confusion when looking at a lot of logs, reset the
timestamps to this time as well.
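A sketch of the idea (file names hypothetical; the first field of the
first log line is the stamp we wrote there):

  - name: Read the start stamp from the first line of the log
    shell: head -1 /var/log/ansible/service.yaml.log | cut -d' ' -f1
    register: log_start

  - name: Rename the stored log to match the job start time
    command: >-
      mv /var/log/ansible/service.yaml.log
      /var/log/ansible/service.yaml.log.{{ log_start.stdout }}

  - name: Reset the file timestamps to the same stamp
    command: >-
      touch -d '{{ log_start.stdout }}'
      /var/log/ansible/service.yaml.log.{{ log_start.stdout }}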
Change-Id: I7a115c75286e03b09ac3b8982ff0bd01037d34dd
The earlier problems identified with using mod_substitute have been
narrowed down to the new PEP 691 JSON simple API responses from
Warehouse, which are returned as a single line of data. The
currently largest known project index response we've been diagnosing
this problem with is only 1524169 characters in length, but there
are undoubtedly others and they will only continue to grow with
time. The main index is also already over the new 5m limit we set
(nearly double it), and while we don't currently process it with
mod_substitute, we shouldn't make it harder to do so if we need to
later.
Change-Id: Ib32acd48e5166780841695784c55793d014b3580
Reflect changes to mirror vhost configs immediately in their running
Apache services by notifying a new reload handler.
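A sketch of the pattern (task and handler names hypothetical):

  - name: Install mirror vhost config
    template:
      src: mirror.vhost.j2
      dest: /etc/apache2/sites-available/mirror.conf
    notify: mirror apache2 reload

with a matching handler:

  - name: mirror apache2 reload
    service:
      name: apache2
      state: reloaded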
Change-Id: Ib3c9560781116f94b0fdfc56dfa5df3a1af74113
We've been getting the following error for some pages we're proxying
today:
AH01328: Line too long, URI /pypi/simple/grpcio/,
While we suspect PyPI or its Fastly CDN may have served some unusual
contents for the affected package indices, the content gets cached
and then mod_substitute trips over the result because it (as of
2.3.15) enforces a maximum line length of one megabyte:
https://bz.apache.org/bugzilla/show_bug.cgi?id=56176
Override that default to "5m" per the example in Apache's
documentation:
https://httpd.apache.org/docs/2.4/mod/mod_substitute.html
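The override is a one-line directive in the affected vhost config:

  SubstituteMaxLineLength 5m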
Change-Id: I5351f0465287f695fb2f1957062182fd3bf6c226
Update the Gerrit upgrade job to check for new on-disk h2 cache files.
We discovered well after the fact that Gerrit 3.5 added new (large)
cache files to disk that would've been good to be aware of prior to the
upgrade. This change will check for new files and produce a message if
they exist.
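Conceptually the check is a before/after comparison; a hedged sketch
(paths and variable names hypothetical):

  - name: List h2 cache files after the upgrade
    find:
      paths: /home/gerrit2/review_site/cache
      patterns: '*.h2.db'
    register: post_upgrade_caches

  - name: Report cache files that were not present before the upgrade
    debug:
      msg: "New cache file: {{ item }}"
    loop: "{{ post_upgrade_caches.files | map(attribute='path') | difference(pre_upgrade_cache_paths) }}"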
Change-Id: I4b52f95dd4b23636c0360c9960d84bbed1a5b2d4
kernel.org has been rejecting rsync attempts with an over-capacity
message for several days now. Switch to the Facebook mirror, which
seems to be working for 8-stream.
Change-Id: I98de9dd827a3c78a023b677da854089593d5a454
This reverts commit 21c6dc02b5.
Everything appears to be working with Ansible 2.9, which does seem to
suggest that reverting this will result in jobs timing out again. We will
monitor this, and I76ba278d1ffecbd00886531b4554d7aed21c43df is a
potential fix for this.
Change-Id: Id741d037040bde050abefa4ad7888ea508b484f6
When this moved with I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3 we lost
access to the variable set as a fact; regenerate it. In a future
change we can look at strategies to share this with the start
timestamp (not totally simple as it is across playbooks on a
dynamically added host).
Change-Id: Ic18c89ecaf144a69e82cbe9eeed2641894af71fb
We've been seeing ansible post-run playbook timeouts in our infra-prod
jobs. The only major thing that has changed recently is the default
update to ansible 5 for these jobs. Force them back to 2.9 to see if the
problem goes away.
Albin Vass has noted that there are possibly glibc + debian bullseye +
ansible 5 problems that may be causing this. If we determine 2.9 is
happy then this is the likely cause.
Change-Id: Ibd40e15756077d1c64dba933ec0dff6dc0aac374
I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3 added this to ensure that we
always collect logs. However, since this doesn't have bridge
dynamically defined in the playbook, it doesn't run any of the steps.
On the plus side, it doesn't error either.
Change-Id: I97beecbc48c83b9dea661a61e21e0d0d29ca4733
If the production playbook times out, we don't get any logs collected
with the run. By moving the log collection into a post-run step, we
should always get something copied to help us diagnose what is going
wrong.
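In Zuul terms the move looks something like this (names hypothetical);
post-run playbooks are executed even when the run phase times out:

  - job:
      name: infra-prod-service
      run: playbooks/zuul/run-production-playbook.yaml
      post-run: playbooks/zuul/run-production-playbook-post.yaml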
Change-Id: I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3
These files got moved around and refactored to better support testing of
the Gerrit 3.5 to 3.6 upgrade path. Make sure we trigger the test jobs
when these files are updated.
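That means extending the jobs' file matchers to follow the new paths; a
sketch (job name and paths hypothetical):

  - job:
      name: system-config-run-review-3.5
      files:
        - playbooks/roles/gerrit/.*
        - playbooks/zuul/gerrit/.*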
Change-Id: I5a520e8a8a7c794a761279d4fb98c23e5d25f0ad
ansible_date_time is actually the cached fact time that has little
bearing on the actual time this is running [1] -- which is what you
want to see when, for example, tracing backwards to see why some runs
are randomly timing out.
[1] https://docs.ansible.com/ansible/latest/user_guide/playbooks_vars_facts.html#ansible-facts
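A sketch of the fix, reading the wall clock at task time rather than
the cached fact:

  - name: Get the actual current time
    command: date -u '+%Y-%m-%dT%H:%M:%S'
    register: current_time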
Change-Id: I8b5559178e29f8604edf6a42507322fc928afb21
We had two patches that we were carrying locally via iwienands' fork:
https://github.com/ProgVal/Limnoria/pull/1464
https://github.com/ProgVal/Limnoria/pull/1473
Both appear to have made it into upstream. Let's go ahead and install
directly from the source. We check out the most recent tag of master,
which seems to be how they checkpoint things. Their most recent proper
release tags are more than a decade old. They have decent CI though so I
expect checking out the checkpoint tag will work fine.
Change-Id: I9fcf17a148a27c2bbdd119961e9df5b38bd6b396
This is a bugfix release that gitea suggests we update to for important
fixes.
Changelog can be found at:
https://github.com/go-gitea/gitea/blob/v1.16.9/CHANGELOG.md
One thing I note is the inclusion of support for git safe.directory in
newer git versions. Our bullseye git version is too old to support this,
but we also configure consistent users, so this should be a non-issue
for us.
Change-Id: I8c3e4e5eead13eeb72bee3ae6c8b89081cdc5cf0
haproxy only logs to /dev/log; this means all our access logs get
mixed into syslog. This makes it impossible to pick out anything in
syslog that might be interesting (and vice-versa, means you have to
filter out things if analysing just the haproxy logs).
It seems like the standard way to deal with this is to have rsyslogd
listen on a separate socket, and then point haproxy to that. So this
configures rsyslogd to create /var/run/dev/log and maps that into the
container as /dev/log (i.e. we don't have to reconfigure the container
at all).
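The container side is then just a bind mount; a compose sketch (other
service details elided):

  services:
    haproxy:
      volumes:
        - /var/run/dev/log:/dev/log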
We then capture this socket's logs to /var/log/haproxy.log, and install
rotation for it.
Additionally we collect this log from our tests.
Change-Id: I32948793df7fd9b990c948730349b24361a8f307
This explicitly tests connection through the load-balancer to the
gitea backend to ensure correct operation.
Additionally, it adds a check of the haproxy output to make sure the
back-ends are active (that's the srv_op_state field, cf. [1]).
[1] http://docs.haproxy.org/2.6/management.html#9.3-show%20servers%20state
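A hedged sketch of the query side (socket path hypothetical); the test
then asserts the srv_op_state column is 2 (SRV_ST_RUNNING) for each
backend line:

  - name: Query backend state over the haproxy admin socket
    shell: echo 'show servers state' | socat stdio /var/run/haproxy/haproxy.sock
    register: haproxy_state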
Change-Id: Ia896134d6a9b6951acebfbf8b0b32a7ef8b87777
Move the paste testing server to paste99 to distinguish it in testing
from the actual production paste service. Since we have certificates
set up now, we can directly test against "paste99.opendev.org",
removing the insecure flags from various calls.
Change-Id: Ifd5e270604102806736dffa86dff2bf8b23799c5
To make testing more like production, copy the OpenDev CA into the
haproxy container configuration directory during Zuul runs. We then
update the testing configuration to use SSL checking like production
does with this cert.
Change-Id: I1292bc1aa4948c8120dada0f0fd7dfc7ca619afd
Some of our testing makes use of secure communication between testing
nodes; e.g. testing a load-balancer pass-through. Other parts
"loop-back" but require flags like "curl --insecure" because the
self-signed certificates aren't trusted.
To make testing more realistic, create a CA that is distributed and
trusted by all testing nodes early in the Zuul playbook. This then
allows us to sign local certificates created by the letsencrypt
playbooks with this trusted CA and have realistic peer-to-peer secure
communications.
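Distribution is the usual system trust-store dance; a sketch for our
Debian/Ubuntu nodes (file names hypothetical):

  - name: Install the testing CA certificate
    copy:
      src: opendev-ca.crt
      dest: /usr/local/share/ca-certificates/opendev-ca.crt
      mode: '0644'

  - name: Refresh the system trust store
    command: update-ca-certificates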
The other thing this does is reworks the letsencrypt self-signed cert
path to correctly setup SAN records for the host. This also improves
the "realism" of our testing environment. This is so realistic that
it requires fixing the gitea playbook :). The Apache service proxying
gitea currently has to override in testing to "localhost" because that
is all the old certificate covered; we can now just proxy to the
hostname directly for testing and production.
Change-Id: I3d49a7b683462a076263127018ec6a0f16735c94
A missed detail of the HTTPS config migration,
/usr/lib/mailman/Mailman/Defaults.py explicitly sets this:
PUBLIC_ARCHIVE_URL = 'http://%(hostname)s/pipermail/%(listname)s/'
Override that setting to https:// so that the archive URL embedded
in E-mail headers will no longer unnecessarily rely on our Apache
redirect. Once merged and deployed, fix_url.py will need to be rerun
for all the lists on both servers in order for this update to take
effect.
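The override itself is one line in mm_cfg.py (which takes precedence
over Defaults.py):

  PUBLIC_ARCHIVE_URL = 'https://%(hostname)s/pipermail/%(listname)s/'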
Change-Id: Ie4a6e04a2ef0de1db7336a2607059a2ad42665c2
openEuler 20.03 LTS SP2 went out of maintenance in May 2022, and the
newest LTS version is 22.03 LTS, which will be maintained until March
2024. This patch adds the 22.03-LTS mirror.
Change-Id: I2eb72de4eee22a7a8739320ead8376c999993928