If this flag is set, the logs are copied into the published job
results. There's no need to also save an encrypted copy of the same
content.
Change-Id: I32ac5e0ac4d2307f2e1df88c5e2ccbe2fd381839
If infra_prod_playbook_collect_log is set, then we copy and publish
the playbook log in the job results.
Currently we skip renaming the log file on bridge in this case,
meaning that we don't keep logs of old runs on bridge. Also, the bit
that resets the timestamp on the logfile (so it is timestamped with
the time the run started, not ended) has a bug: it doesn't check this
flag, so we end up with a bunch of zero-length files in this case.
I guess the thinking here was that since the log is published, there's
no need to keep it on bridge as well.
The abstract case here is really only instantiated for
manage-projects, which is the only job we publish the log for. Today
we wanted an older log, but it had already been purged from object
storage.
It seems worth keeping this on-disk as well as publishing it. Remove
the checks around the rename/cleanup. This will also fix the bug of
zero-sized files being created, because the renamed file will be there
now.
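A minimal sketch of roughly what this implies (task names, paths and
variables are illustrative, not the actual role contents): the rename
and the timestamp reset happen unconditionally.

  # Sketch only; the real tasks live in the production playbook roles.
  - name: Rename playbook log with its start timestamp
    command: >-
      mv /var/log/ansible/{{ playbook_name }}.log
      /var/log/ansible/{{ playbook_name }}.log.{{ log_timestamp }}

  - name: Reset the file times to the playbook start time
    file:
      path: "/var/log/ansible/{{ playbook_name }}.log.{{ log_timestamp }}"
      state: touch
      modification_time: "{{ log_timestamp }}"
      modification_time_format: "%Y-%m-%dT%H:%M:%S"
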
Change-Id: Ic5ab52797fef880ae3ec3d92c071ef802e63b778
In thinking harder about the bootstrap process, it struck me that the
"bastion" group we have is two separate ideas that become a bit
confusing because they share a name.
We have the testing and production paths that need to find a single
bridge node so they can run their nested Ansible. We've recently
merged changes to the setup playbooks to not hard-code the bridge node
and they now use groups["bastion"][0] to find the bastion host -- but
this group is actually orthogonal to the group of the same name
defined in inventory/service/groups.yaml.
The testing and production paths are running on the executor, and, as
mentioned, need to know the bridge node to log into. For the testing
path this is happening via the group created in the job definition
from zuul.d/system-config-run.yaml. For the production jobs, this
group is populated via the add-bastion-host role which dynamically
adds the bridge host and group.
Only the *nested* Ansible running on the bastion host reads
s-c:inventory/service/groups.yaml. None of the nested-ansible
playbooks need to target only the currently active bastion host. For
example, we can define as many bridge nodes as we like in the
inventory and run service-bridge.yaml against them. It won't matter
because the production jobs know the host that is the currently active
bridge as described above.
So, instead of using the same group name in two contexts, rename the
testing/production group to "prod_bastion". groups["prod_bastion"][0]
will be the host that the testing/production jobs use as the bastion
host -- references are updated in this change (i.e. the two places
this group is defined -- the group name in the system-config-run jobs,
and add-bastion-host for production).
We then can return the "bastion" group match to bridge*.opendev.org in
inventory/service/groups.yaml.
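Schematically, the two places the names now live look something like
this (task and variable names are illustrative, and the groups.yaml
structure is approximate):

  # Executor side (system-config-run jobs and add-bastion-host): the
  # jobs find the single active bastion via the job-level group.
  - name: Add the bastion host for this run
    add_host:
      name: "{{ bastion_host }}"    # illustrative variable
      groups: prod_bastion

  # Nested-Ansible side (inventory/service/groups.yaml, approximate):
  # bastion:
  #   - bridge*.opendev.org
  #
  # Playbooks on the executor target groups["prod_bastion"][0], while
  # the nested inventory's "bastion" group is free to match any
  # bridge*.opendev.org host.
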
This fixes a bootstrapping problem -- if you launch, say,
bridge03.opendev.org the launch node script will now apply the
base.yaml playbook against it, and correctly apply all variables from
the "bastion" group which now matches this new host. This is what we
want to ensure, e.g. the zuul user and keys are correctly populated.
The other thing we can do here is change the testing path
"prod_bastion" hostname to "bridge99.opendev.org". By doing this we
ensure we're not hard-coding for the production bridge host in any way
(since if both testing and production were called bridge01.opendev.org
we could be hiding problems). This is a big advantage when we want to
rotate the production bridge host, as we can be certain there are no
hidden dependencies.
Change-Id: I137ab824b9a09ccb067b8d5f0bb2896192291883
Following-on from Iffb462371939989b03e5d6ac6c5df63aa7708513, instead
of directly referring to a hostname when adding the bastion host to
the inventory for the production playbooks, this finds it from the
first element of the "bastion" group.
As we do this twice for the run and post playbooks, abstract it into a
role.
The host value is currently "bridge.openstack.org" -- as is the
existing hard-coding -- thus this is intended to be a no-op change.
It is setting the foundation to make replacing the bastion host a
simpler process in the future.
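As a rough sketch (the real role contents may differ), the shared role
boils down to something like:

  # roles/add-bastion-host/tasks/main.yaml (sketch)
  - name: Add the bastion host to the running inventory
    add_host:
      name: "{{ groups['bastion'][0] }}"
      ansible_host: "{{ groups['bastion'][0] }}"

  # Both the run and the post playbook then include this role instead
  # of repeating an add_host against a hard-coded hostname.
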
Change-Id: I286796ebd71173019a627f8fe8d9a25d0bfc575a
This was introduced by Ifbb5b8acb1f231812905cf9643bfec6fbbd08324. The
flag is actually "disabled". Zuul documentation has been updated with
Ib45ec943d4b227ba254354d116440aa521fb6b9e.
Change-Id: Ie0a0d8f4ae137dc12f4c13f901096ee39d9a088e
By setting this variable (added in the dependent change) Zuul's
shell/command override will not write out streaming spool files in
/tmp. In our case, port 19885 is firewalled off to these hosts, so
they will never be used for streaming results.
Change-Id: Ifbb5b8acb1f231812905cf9643bfec6fbbd08324
Depends-On: https://review.opendev.org/855309
Our infra prod jobs use a strftime format string to update log file
modification times. This format string had a stray '%' in it leading to:
"Error while obtaining timestamp for time 2022-08-04T18:02:59 using
format %Y-%m%-%dT%H:%M:%S: '-' is a bad directive in format
'%Y-%m%-%dT%H:%M:%S'"
Fix that by removing the extra '%'.
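For illustration, the corrected format as it would appear with the
Ansible file module's modification_time_format (path and timestamp
value are placeholders):

  - name: Reset log modification time
    file:
      path: /var/log/ansible/manage-projects.yaml.log
      state: touch
      modification_time: "2022-08-04T18:02:59"
      modification_time_format: "%Y-%m-%dT%H:%M:%S"  # was %Y-%m%-%dT%H:%M:%S
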
Change-Id: I934ecb4b24244fdd00fa16de6e4c4ae67542e2fe
Replace the optional src parameter with the required path parameter;
this seems to have been a simple typo.
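Assuming this was an Ansible file task (the module is a guess from the
parameter names; the variable is a placeholder), the fix is along the
lines of:

  # Before (typo): src is only meaningful for link states
  - file:
      src: "{{ log_file }}"
      state: touch

  # After: path is the parameter the module actually requires
  - file:
      path: "{{ log_file }}"
      state: touch
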
Change-Id: Ib95a11b28eed138225b57062faeb7c344067c991
How we got here - I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3 split the
log collection into a post-run job so we always collect logs, even if
the main run times out. We then realised in
Ic18c89ecaf144a69e82cbe9eeed2641894af71fb that the log timestamp fact
doesn't persist across playbook runs and it's not totally clear how
getting it from hostvars interacts with dynamic inventory.
Thus take an approach that doesn't rely on passing variables; this
simply pulls the time from the stamp we put on the first line of the
log file. We then use that to rename the stored file, which should
correspond more closely with the time the Zuul job actually started.
To further remove confusion when looking at a lot of logs, reset the
timestamps to this time as well.
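A sketch of the extraction step, assuming the first log line starts
with an ISO-8601 stamp (paths and task layout are illustrative):

  - name: Read the first line of the log
    command: head -1 /var/log/ansible/{{ playbook_name }}.log
    register: log_first_line

  - name: Pull the start timestamp out of that line
    set_fact:
      log_start: '{{ log_first_line.stdout | regex_search("\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}") }}'

  # log_start then drives the rename of the stored copy and the touch
  # that resets the file times.
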
Change-Id: I7a115c75286e03b09ac3b8982ff0bd01037d34dd
When this moved with I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3 we lost
access to the variable set as a fact; regenerate it. In a future
change we can look at strategies to share this with the start
timestamp (not totally simple as it is across playbooks on a
dynamically added host).
Change-Id: Ic18c89ecaf144a69e82cbe9eeed2641894af71fb
I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3 added this to ensure that we
always collect logs. However, since this doesn't have bridge
dynamically defined in the playbook, it doesn't run any of the steps.
On the plus side, it doesn't error either.
Change-Id: I97beecbc48c83b9dea661a61e21e0d0d29ca4733
If the production playbook times out, we don't get any logs collected
with the run. By moving the log collection into a post-run step, we
should always get something copied to help us diagnose what is going
wrong.
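Schematically, in the Zuul job definition (job and playbook names are
illustrative of the pattern, not the exact files):

  - job:
      name: infra-prod-playbook
      run: playbooks/zuul/run-production-playbook.yaml
      post-run: playbooks/zuul/run-production-playbook-post.yaml
      # post-run playbooks execute even if the run phase fails or
      # times out, so the log copy still happens.
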
Change-Id: I3e99b80e442db0cc87f8e8c9728b7697a5e4d1d3