Our deployment tests don't need to send email messages. More to the
point, they may perform actions that would try to send email
messages. Make sure that, at the network level, they are prevented
from doing so. Also allow all connections to egress from the loopback
interface, so that services like mailman can connect to the Exim MTA
on localhost.
Add new rolevars for egress rules to support this, and also fix up
some missing related vars in the iptables role's documentation.
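For illustration only, a test group_vars entry using the new rolevars
might look roughly like this (the variable names and rule strings
below are examples of the idea, not the final role interface):

    iptables_egress_allow_loopback: true
    iptables_egress_rules:
      # Reject outbound SMTP/submission so tests can never deliver mail
      - '-p tcp --dport 25 -j REJECT'
      - '-p tcp --dport 465 -j REJECT'
      - '-p tcp --dport 587 -j REJECT'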
Change-Id: If4acd2d3d543933ed1e00156cc83fe3a270612bd
This adds a zuul-client config file, as well as a convenience script
to execute the docker container, to the schedulers.
Change-Id: Ief167c6b7f0407f5eaebecde552e8d91eb3d4ab9
This used to be called "bridge", but was then renamed with
Ia7c8dd0e32b2c4aaa674061037be5ab66d9a3581 to install-ansible to be
clearer.
It is true that this is installing Ansible, but as part of our
reworking for parallel jobs this is also the synchronisation point
where we should be deploying the system-config code to run for the
buildset.
Thus naming this "bootstrap-bridge" should hopefully be clearer again
about what's going on.
I've added a note to the job calling out its difference from the
infra-prod-service-bridge job to hopefully also avoid some of the
initial confusion.
Change-Id: I4db1c883f237de5986edb4dc4c64860390cc8e22
This adds a keycloak server so we can start experimenting with it.
It's based on the docker-compose file Matthieu made for Zuul
(see https://review.opendev.org/819745 )
We should be able to configure a realm and federate with openstackid
and other providers as described in the opendev auth spec. However,
I am unable to test federation with openstackid due to its inability to
configure an oauth app at "localhost". Therefore, we will need an
actual deployed system to test it. This should allow us to do so.
It will also allow us to connect realms to the newly available
Zuul admin api on opendev.
It should be possible to configure the realm the way we want, then
export its configuration into a JSON file and then have our playbooks
or the docker-compose file import it. That would allow us to drive
change to the configuration of the system through code review. Because
of the above limitation with openstackid, I think we should regard the
current implementation as experimental. Once we have a realm
configuration that we like (which we will create using the GUI), we
can choose to either continue to maintain the config with the GUI and
appropriate file backups, or switch to a gitops model based on an
export.
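As a sketch of that gitops model, the legacy jboss/keycloak image can
import a realm dump at startup; something along these lines (image,
paths and credentials here are purely illustrative, not what we
deploy):

    services:
      keycloak:
        image: jboss/keycloak
        environment:
          KEYCLOAK_USER: admin
          KEYCLOAK_PASSWORD: insecure-example-only
          # Realm exported from the GUI and kept under code review
          KEYCLOAK_IMPORT: /etc/keycloak/opendev-realm.json
        volumes:
          - /etc/keycloak:/etc/keycloak:ro
        ports:
          - 443:8443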
My understanding is that all the data (realm configuration and
sessions) are kept in an H2 database. This is probably sufficient for
now, and even for production use with Zuul, but we should probably
switch to mariadb before any heavy (e.g. gerrit) production use.
This is a partial implementation of https://docs.opendev.org/opendev/infra-specs/latest/specs/central-auth.html
We can re-deploy with a new domain when it exists.
Change-Id: I2e069b1b220dbd3e0a5754ac094c2b296c141753
Co-Authored-By: Matthieu Huin <mhuin@redhat.com>
This will allow us to issue internally generated auth tokens so
that we can use the zuul CLI to perform actions against the REST
API.
Change-Id: I09cafa2e820f5d0e7fa9ada00b9622de093242c7
This makes the haproxy role more generic so we can run another (or
potentially even more) haproxy instance(s) to manage other services.
The config file is moved to a variable for the haproxy role. The
gitea specific config is then installed for the gitea-lb service by a
new gitea-lb role.
statsd reporting is made optional with an argument. This
enables/disables the service in the docker compose.
Role documentation is updated.
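For example, the new gitea-lb role can consume the generic role with
something like the following (variable names are indicative of the
approach; see the role documentation for the exact interface):

    - name: Run haproxy for the gitea load balancer
      include_role:
        name: haproxy
      vars:
        haproxy_config_template: gitea-lb/templates/haproxy.cfg.j2
        haproxy_statsd_enabled: true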
Needed-By: https://review.opendev.org/678159
Change-Id: I3506ebbed9dda17d910001e71b17a865eba4225d
The current opendev-infra-prod-base job sets up the executor to log
into bridge AND copies in Zuul's checkout of system-config to
/home/zuul/src.
This presents an issue for parallel operation, as every production job
is cloning system-config on top of each other.
Since they all operate in the same buildset, we only need to clone
system-config from Zuul once, and then all jobs can share that repo.
This adds a new job "infra-prod-setup-src" which does this. It is a
dependency of the base job so should run first.
All other jobs now inherit from opendev-infra-prod-setup-keys, which
only sets up the executor for logging into bridge.
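Conceptually the deploy pipeline then looks something like this
(abbreviated; job names shown for illustration):

    - project:
        deploy:
          jobs:
            - infra-prod-setup-src
            - infra-prod-service-bridge:
                dependencies:
                  - infra-prod-setup-src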
Change-Id: I19db98fcec5715c33b62c9c9ba5234fd55700fd8
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/807807
The dependent change moves this into the common infra-prod-base job so
we don't have to do this in here.
Change-Id: I444d2844fe7c7560088c7ef9112893da1496ae62
Depends-On: https://review.opendev.org/c/opendev/base-jobs/+/818189
The known_hosts entry is written out by the parent infra-prod-base job in
the run-production-playbook.yaml step [1]. We don't need to do this
here again.
[1] 2c194e5cbf/playbooks/zuul/run-production-playbook.yaml (L1)
Change-Id: I514132b2dbc20ac321a79ca2eb6d4c8b11c4296d
This is a re-implementation of
I195ebee548071b0b89bd5bf64b251595271178ca that puts 9-stream in a
separate AFS volume.
(Note the automated volume name "mirror.centos-stream" comes just
short of the limit.)
Change-Id: I483c2982a6931e7d6fc97ab82f7750b72d2ef265
Gerrit 3.4 deprecates HTML-based plugins, so the old theme doesn't
work. I have reworked this into a javascript plugin.
This should look the same, although I've achieved things in different
ways.
This doesn't register light and dark variants; since
background-primary-color is white, by setting the
header-background-color to this we get white behind the header bar,
and it correctly switches to the default black(ish) when in dark mode
(currently it seems the header doesn't obey dark mode, so this is an
improvement).
I'm not sure what's going on with the extant header-border-image which
is a linear gradient all of the same color. I modified this down to
1px (same as default) and made it fade in-and-out of the logo colour,
just for fun.
Change-Id: Ia2e32731c1cfe97639de2ec0e7660c7ed583e045
Previously we had set up the test gerrit instance to use the same
hostname as production: review02.opendev.org. This causes some confusion
as we have to override settings specifically for testing like a reduced
heap size, but then also copy settings from the prod host vars as we
override the host vars entirely. Using a new hostname allows us to use a
different set of host vars with unique values reducing confusion.
Change-Id: I4b95bbe1bde29228164a66f2d3b648062423e294
Previously we had a test specific group vars file for the review Ansible
group. This provided junk secrets to our test installations of Gerrit;
we then relied on the review02.opendev.org production host vars file to
set values that are public.
Unfortunately, this meant we were using the production heapLimit value,
which is far too large for our test instances, leading to the
occasional failure:
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 9596567552 bytes for committing reserved memory.
We cannot set the heapLimit in the group var file because the hostvar
file overrides those values. To fix this we need to replace the test
specific group var contents with a test specific host var file instead.
To avoid repeating ourselves we also create a new review.yaml group_vars
file to capture common settings between testing and prod. Note we should
look at combining this new file with the gerrit.yaml group_vars.
On the testing side of things we set the heapLimit to 6GB, we change the
serverid value to prevent any unexpected notedb confusion, and we remove
replication config.
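The test host_vars end up with entries along these lines (values
illustrative; the real keys are whatever the gerrit role already
consumes):

    # Hypothetical variable names, for illustration only
    gerrit_heap_limit: 6g
    gerrit_serverid: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
    gerrit_replication: []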
Change-Id: Id8ec5cae967cc38acf79ecf18d3a0faac3a9c4b3
The default channel name in the ptgbot role did not correctly specify
the leading hash it requires; the test jobs also seem to need it set
in the eavesdrop group vars specific to testing.
Change-Id: I16cdeac4f7af50e2cac36c80d78f3a87f482e4aa
This shifts our Gerrit upgrade testing ahead to testing 3.3 to 3.4
upgrades as we have upgraded to 3.3 at this point.
Change-Id: Ibb45113dd50f294a2692c65f19f63f83c96a3c11
This bumps the gerrit image up to our 3.3 image. Followup changes will
shift upgrade testing to test 3.3 to 3.4 upgrades, clean up no longer
needed 3.2 images, and start building 3.4 images.
Change-Id: Id0f544846946d4c50737a54ceb909a0a686a594e
Currently we connect to the LE staging environment with acme.sh during
CI to get the DNS-01 tokens (but we never follow-through and actually
generate the certificate, as we have nowhere to publish the tokens).
We've known for a while that LE staging isn't really meant to be used
by CI like this, and recent instability has made the issue pronounced.
This modifies the driver script to generate fake tokens which work to
ensure all the DNS processing, etc. is happening correctly.
I have put this behind a flag, however, so the letsencrypt job still
does the real thing. I think it is worth that one job actually calling
acme.sh to validate the path; it shouldn't be required too often.
Change-Id: I7c0b471a0661aa311aaa861fd2a0d47b07e45a72
Instead of using the opendev.org/... logo file, host a copy from
gerrit's static location and use that. This isolates us from changes
to the way gitea serves its static assets.
Change-Id: I8ffb47e636a59e5ecc3919cc7a16d93de3eae08d
Copy static files directly into the container image instead of
managing them dynamically with Ansible.
Change-Id: I0ebe40ad2a97e87b00137af7c93a3ffa84929a2e
We now depend on the reverse proxy not only for abuse mitigation but
also for serving .well-known files with specific CORS headers. To
reduce complexity and avoid traps in the future, make it non-optional.
Change-Id: I54760cb0907483eee6dd9707bfda88b205fa0fed
We create a (currently test-only) playbook that upgrades gerrit. This job
then runs through project creation and renaming and testinfra testing on
the upgraded gerrit version.
Future improvements should consider loading state on the old gerrit
install before we upgrade that can be asserted as well.
Change-Id: I364037232cf0e6f3fa150f4dbb736ef27d1be3f8
Etherpad startup says:
[2021-08-12 16:08:55.872] [WARN] console - Declaring the sessionKey
in the settings.json is deprecated. This value is auto-generated
now. Please remove the setting from the file. -- If you are seeing
this error after restarting using the Admin User Interface then you
can ignore this message.
So I guess we can remove this.
Change-Id: I5a8da8afe8b128224fa1bc89d5ba06fff16ca29b
We are now using the mariadb jdbc connector in production and no longer
need to include the mysql legacy connector in our images. We also don't
need support for h2 or mysql as testing and prod are all using the
mariadb connector and local database.
Note this is a separate change to ensure everything is happy with the
mariadb connector before we remove the fallback mysql connector from our
images.
Change-Id: I982d3c3c026a5351bff567ce7fbb32798718ec1b
This tests that we can rename both the project and the org the project
lives in. Should just add a bit more robustness to our testing.
Change-Id: I0914e864c787b1dba175e0fabf6ab2648a554d16
Previously we were only managing root's known_hosts via ansible but even
then this wasn't happening because the gerrit_self_hostkey var wasn't
set anywhere. On top of that we need to manage multiple known_hosts
because gerrit must recognize itself and all of the gitea servers.
Update the code to take a dict of host key values and add each entry to
known_hosts for both the root and gerrit2 user.
We remove keyscans from tests to ensure that this update is actually
working.
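The interface is roughly a dict of host name to public key, looped
over with the known_hosts module for each user; a sketch (variable
name and keys below are illustrative):

    gerrit_known_hosts:
      review.opendev.org: "review.opendev.org ssh-rsa AAAA...example"
      gitea01.opendev.org: "gitea01.opendev.org ssh-ed25519 AAAA...example"

    - name: Add host keys for the gerrit2 user
      known_hosts:
        path: /home/gerrit2/.ssh/known_hosts
        name: "{{ item.key }}"
        key: "{{ item.value }}"
      loop: "{{ gerrit_known_hosts | dict2items }}"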
Change-Id: If64c34322f64c1fb63bf2ebdcc04355fff6ebba2
This runs the new matrix-eavesdrop bot on the eavesdrop server.
It will write logs out to the limnoria logs directory, which is mounted
inside the container.
Change-Id: I867eec692f63099b295a37a028ee096c24109a2e
It would be useful to test our rename playbook against gitea and gerrit
when we make changes to these related playbooks, roles, and docker
images. To do this we need to converge our test and production setups
for gerrit a bit more. We create an openstack-project-creator account in
the test gerrit to match prod and we have rename_repos.yaml talk to
localhost for gerrit ssh commands.
With that done we can run the rename_repos.yaml playbook from
test-gitea.yaml and test-gerrit.yaml to help ensure the playbook
functions as expected against these services.
Co-Authored-By: Ian Wienand <iwienand@redhat.com>
Change-Id: I49ffaf86828e87705da303f40ad4a86be030c709
The extant variable name is never set so this never writes anything
out. Move it to a dictionary value. Use stub values for testing;
this way we don't need the "when:".
Additionally remove an unused old template file.
Change-Id: Id96fde79e28f309aa13e16bdda29f004c3c69c4b
This moves review02 out of the review-staging group and into the main
review group. At this point, review01.openstack.org is inactive so we
can remove all references to openstack.org from the groups. We update
the system-config job to run against a focal production server, and
remove the unneeded rsync setup used to move data.
This additionally enables replication; this should be a no-op when
applied as part of the transition process is to manually apply this,
so that DNS setup can pull zone changes from opendev.org.
It also switches to the mysql connector, as noted inline we found some
issues with mariadb.
Note backups follow in a separate step to avoid doing too much at
once, hence dropping the backup group from the testing list.
Change-Id: I7ee3e3051ea8f3237fd5f6bf1dcc3e5996c16d10
The paste service needs an upgrade; since others have created a
lodgeit container it seems worth us keeping the service going if only
to maintain the historical corpus of pastes.
This adds the ansible to deploy lodgeit and a sibling mariadb
container. I have imported a dump of the old data as a test. The
dump is ~4GB and, imported, it takes up about double that; certainly
nothing we need to be too concerned over. The server will be more
than capable of running the db container alongside the lodgeit
instance.
This should have no effect on production until we decide to switch
DNS.
Change-Id: I284864217aa49d664ddc3ebdc800383b2d7e00e3
This adds a local mariadb container to the gerrit host to hold the
accountPatchReviewDb database. This is inspired by a few things
- since migration to NoteDB, there is only one table left where
Gerrit records what files have been reviewed for a change. This
logically scales with the number of reviews users are doing.
Pulling the stats on this, we can see since the NoteDB upgrade this
went from a very busy database (~300 queries/70 commits per second)
to barely registering one hit per second:
https://imgur.com/a/QGJV7Fw
Thus separating the db to an external host for performance reasons
is not a large concern any more.
- empirically we've done a bad job in keeping the existing hosted db
up-to-date; it's still running mysql 5.1 and we have been hit by
bugs such as the one referenced in-line which silently drops
backups.
- The other gerrit option is to use an on-disk H2 database. This is
certainly an option, however you need special tools to interact
with it for migration, etc. and it's not safe to back up from files
on disk (as opposed to mysqldump). Upstream advice is unclear, and
varies between H2 being a performance bottleneck to this being
ephemeral data that users don't care about. We know how to admin
mariadb/mysql and this allows us to migrate and back up data, so it
seems like the best choice.
- we have a pressing need to update the server to a new operating
system. Running the db alongside the gerrit instance minimises
fiddling we have to do managing connections to and migrating the
hosted db systems.
- related to that, we are tending towards more provider independence
for control-plane servers. A hosted database product is not always
provided, so this gives us more flexibility in moving things
around.
- the main concern here is memory usage. "docker stats" reports a
quiescent container, freshly started on an 8GB host:
gerrit-compose_mariadb_1 67.32MiB
After loading a copy of the production table, and then dumping it
back to a file, the same container reports:
gerrit-compose_mariadb_1 462.6MiB
The existing remote mysql configuration path remains mostly the same.
We move the gerrit startup into a script rather than a CMD so we can
call it after a "wait for db" script in the mariadb_container case
(this is the recommended way to enforce ordering [1]).
Backups of the local container need different dump commands; backups
are relocated to a new file and updated.
Testing is converted to use this rather than a local H2 database.
[1] https://docs.docker.com/compose/startup-order/
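Roughly, the compose arrangement becomes the following (heavily
trimmed; image tags, script and database names here are illustrative
of the approach, not the exact files):

    services:
      mariadb:
        image: mariadb:10.4
        environment:
          MYSQL_DATABASE: accountPatchReviewDb
      gerrit:
        image: opendevorg/gerrit:3.2
        depends_on:
          - mariadb
        # entrypoint waits for the db, then starts gerrit, per [1]
        entrypoint: /wait-for-db.sh mariadb:3306 -- /run-gerrit.sh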
Change-Id: Iec981ef3c2e38889f91e9759e66295dbfb499c2e
Currently when we run tests, this connects to OFTC and tries to use
the opendevstatus nick as it is the default. Replace this with a
random username. Also override the channels list, so it only joins
#opendev-sandbox.
Limnoria was already using a non-conflicting name, but switch it to a
random one for consistency and possible parallel running. This also
already only joins #opendev-sandbox.
Change-Id: I860b0f1ed4f99140dda0f4d41025f0b5fb844115
This installs statusbot on eavesdrop01.opendev.org.
Otherwise it's just config translation and bringing up the daemon.
Change-Id: I246b2723372594e65bcd1ba90215d6831d4c0c72
This enables the new eavesdrop01.opendev.org server in all current
channels. Puppet has been disabled on the old server and we will
manually stop supybot/meetbot and migrate logs before this applies.
Change-Id: I4a422bb9589c8a8761191313a656f8377e93422f
The ara-report role used to add this but it hasn't been updated for
the latest ARA (I008b35562994f1205a4f66e53f93b9885a6b8754). Add it
back here.
Change-Id: I2d56e7cde32cd7adabb359a35ecdaa9f0880f7d5
ARA's master branch now has static site generation, so we can move
away from the stable branch and get the new reports.
In the meantime ARA upstream has moved to github, so this updates the
references for the -devel job.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/793530
Change-Id: I008b35562994f1205a4f66e53f93b9885a6b8754
We're trying to phase out the ELK systems. While we have agreed to not
immediately turn anything off, we probably don't need to keep running
the system-config-legacy-logstash-filters job, as ELK should remain
fairly fixed unless someone rewrites config management for it and
modernizes it. And if that happens, they will want new modern testing
too.
Depends-On: https://review.opendev.org/c/openstack/project-config/+/792710
Change-Id: I9ac6f12ec3245e3c1be0471d5ed17caec976334f
Gerrit's bazel rules are looking for python which doesn't exist on our
images. Add a python symlink to python3 until
https://gerrit-review.googlesource.com/c/gerrit/+/298903 is in a release,
which seems likely to be 3.5.
Change-Id: I1c15cceac1c9bbf435ed23bed7c1e3fe868f05ff
This converts our existing puppeted mailman configuration into a set of
ansible roles and a new playbook. We don't try to do anything new and
instead do our best to map from puppet to ansible as closely as
possible. This helps reduce churn and will help us find problems more
quickly if they happen.
Followups will further cleanup the puppetry.
Change-Id: If8cdb1164c9000438d1977d8965a92ca8eebe4df
This adds the new inmotion cloud to clouds.yaml files and the cloud
launcher config. This cloud is running on an OpenStack-as-a-service
platform, so we have quite a bit of freedom to make changes here within
the resource limitations if necessary.
Change-Id: I2aed6dffde4a1d6e3044c4bd8df4ca60065ae1ea
Otherwise you get
BadRequest: Expecting to find domain in project - the server could
not comply with the request since it is either malformed or otherwise
incorrect. The client is assumed to be in error.
Change-Id: If8869fe888c9f1e9c0a487405574d59dd3001b65
This matches the proposal in https://review.opendev.org/785972
It's safe to merge now (secret storage on bridge is updated) and get
ahead of the curve. It's harmless to add unused items.
Change-Id: I942ef5f95f9f1afe39b7d9a044276bfb338d6760
The Oregon State University Open Source Lab (OSUOSL;
https://osuosl.org/) has kindly donated some ARM64 resources. Add
initial cloud config.
Change-Id: I43ed7f0cb0b193db52d9908e39c04e351b3887e3
The OpenEdge cloud has been offline for five months, initially
disabled in I4e46c782a63279d9c18ff4ba2944c15b3027114b, so go ahead
and clean up lingering references. If it is restored later, this can
be reverted fairly easily.
Depends-On: https://review.opendev.org/783989
Depends-On: https://review.opendev.org/783990
Change-Id: I544895003344bc8202363993b52f978e1c07d061
With our increased ability to test in the gate, there's not much use
for review-dev any more. Remove references.
Change-Id: I97e9865e0b655cd157acf9ffa7d067b150e6fc72
When we cleaned up the puppet in
I6b6dfd0f8ef89a5362f64cfbc8016ba5b1a346b3 we renamed the group
s/refstack-docker/refstack/ but didn't move the variables and some
other references too.
Change-Id: Ib07d1e9ede628c43b4d5d94b64ec35c101e11be8
This adds a role and related testing to manage our Kerberos KDC
servers, intended to replace the puppet modules currently performing
this task.
This role automates realm creation, initial setup, key material
distribution and replica host configuration. None of this is intended
to run on the production servers which are already set up with an
active database, and the role should be effectively idempotent in
production.
Note that this does not yet switch the production servers into the new
groups; this can be done in a separate step under controlled
conditions and with related upgrades of the host OS to Focal.
Change-Id: I60b40897486b29beafc76025790c501b5055313d
The production server is trying to send itself to
refstack01.openstack.org, causing cross-site scripting issues. In
production, use the CNAME, but use the FQDN for testing.
Fix up job file matchers while here.
Change-Id: I18a5067ee25c59c5eaa17b7c2d9bd5a942a9173d
We have seen some poor performance from gitea which may be related to
manage project updates. Start a dstat service which logs to a csv file
on our system-config-run job hosts in order to collect performance info
from our services in pre merge testing. This will include gitea and
should help us evaluate service upgrades and other changes from a
performance perspective before they hit production.
Change-Id: I7bdaab0a0aeb9e1c00fcfcca3d114ae13a76ccc9
All hosts are now running their backups via borg to servers in
vexxhost and rax.ord.
For reference, the servers being backed up at this time are:
borg-ask01
borg-ethercalc02
borg-etherpad01
borg-gitea01
borg-lists
borg-review-dev01
borg-review01
borg-storyboard01
borg-translate01
borg-wiki-update-test
borg-zuul01
This removes the old bup backup hosts, the no-longer used ansible
roles for the bup backup server and client roles, and any remaining
bup related configuration.
For simplicity, we will remove any remaining bup cron jobs on the
above servers manually after this merges.
Change-Id: I32554ca857a81ae8a250ce082421a7ede460ea3c
This change splits our existing system-config-run-review job into two
jobs, one for gerrit 3.2 and another for 3.3. The biggest change is that
we use a var called zuul_test_gerrit_version to select which version we
want and that ends up in the fake group file written out by Zuul for the
nested ansible run. The nested ansible run will then populate the
docker-compose file with the appropriate version for us.
Change-Id: I00b52c0f4aa8df3ecface964007fcf5724887e5e
It is buggy (throwing exceptions for undefined variables which are
actually defined via set_fact), and we frequently run into problems
using it in this repo. It was designed to lint roles for Galaxy,
not the way we write ansible. As of the 5.0.0 release it's
generating >4.5K lines of complaints about files in this repository.
Change-Id: If9d8c19b5e663bdd6b6f35ffed88db3cff3d79f8
This adds a dockerfile to build an opendevorg/refstack image as well as
the jobs to build and publish it.
Change-Id: Icade6c713fa9bf6ab508fd4d8d65debada2ddb30
We modify the x/ route to ensure we can serve git repos from x/.
Previously we had been using sed, which is likely to be much more fragile
than patch. Patch will detect conflicts and other errors which would be
good for us to find out about early.
Change-Id: Ic324c7777e7851a6150e4415338c4628ac710970
This installs the zuul-summary-results plugin into our gerrit
container. testinfra is updated to take a screenshot of the plugin in
action.
Change-Id: Ie0a165cc6ffc765c03457691901a1dd41ce99d5a
bazel likes to build everything in ~/.cache and then symlink bazel-*
"convience symlinks" in the workspace/build directory. This causes a
problem for building docker images where we run in the context of the
build directory; docker will not follow the symlinks out of build
directory.
Currently the bazelisk-build copies parts of the build to the
top-level; this means the bazelisk-build role is gerrit specific,
rather than generic as the name implies.
We modify the gerrit build step to break the build output symlink and
move it into the top level of the build tree, which is the context the
docker build runs in later. Since this is now just a normal
directory, we can copy from it at will.
This is useful in follow-on builds where we want to start copying more
than just the release.war file from the build tree, e.g. polygerrit
plugin output.
While we're here, remove the javamelody things that were only for 2.X
series gerrit, which we don't build any more.
[1] https://docs.bazel.build/versions/master/output_directories.html
Change-Id: I00abe437925d805bd88824d653eec38fa95e4fcd
Specify bazelisk_targets as a list, and join the targets as
space-separated in the build command. This is used in the follow-on
Ie0a165cc6ffc765c03457691901a1dd41ce99d5a.
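A sketch of the shape of this (the target and variable names below
are illustrative, not the exact values used):

    bazelisk_targets:
      - release.war
      - plugins/zuul-summary-results

    - name: Run bazelisk build
      command: "bazelisk build {{ bazelisk_targets | join(' ') }}"
      args:
        chdir: "{{ bazelisk_build_dir }}"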
While we are here, remove the build-gerrit.sh script that isn't used
any more, along with the step that installs it.
Also, refactor the tasks to use include_role (this is also used in the
follow on).
Change-Id: I4f3908e75cbbb7673135a2717f9e51f099a4860e
The "additional_plugins" variable is so different builds gerrit can
specify additional plugins specific to their version to install into
the base image.
Since we've moved to only building 3.2 and master images, a bunch of
plugins that used to be additional (because they weren't 2.XX era) are
now common. Move them into the common plugin code in the playbook,
and leave the only one different for master, the "checks" plugin, as
separate.
Change-Id: I8966ed7b5436fbe012486dccc1028bc8cb1cf9e4
By setting the auth type to DEVELOPMENT_BECOME_ANY_ACCOUNT and passing
--dev to the init process, gerrit will create an initial admin user
for us. We leverage this user to create a sample project, change,
Zuul user and sample CI result comment.
We also update testinfra to take some screenshots of gerrit and report
them back.
Change-Id: I56cda99790d3c172e10b664e57abeca10efc5566
This runs selenium from a container on a node, and exposes port 4444
so you can issue commands to it. This is used in the follow-on
I56cda99790d3c172e10b664e57abeca10efc5566 to take some screenshots of
gerrit.
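A minimal sketch of the idea, whether run via compose or a plain
docker run (image tag and settings illustrative):

    services:
      selenium:
        image: selenium/standalone-firefox
        ports:
          - 4444:4444
        shm_size: 2g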
Change-Id: Idcbcd9a8f33bd86b5f3e546dd563792212e0751b
We don't need to test two servers in this test; remove review-dev.
Consensus seems to be this was for testing plans that have now been
superseded.
Change-Id: Ia4db5e0748e1c82838000c9b655808c3d8b74461
This provides an HTML-only PolyGerrit plugin consistent with our
Gitea theming, generously provided by Paladox (many thanks!).
Since we have to split some roles in the build playbook, also name
the temporary patching role to make the build console a little
easier to read.
Change-Id: I3baf17d04b2dca34fc23dcab91c00544cedf0ca6
Gerrit 3.2 supports java 11 now and Gerrit 3.3 will be the last to
support java 8. Let's get ahead of things and switch to java 11.
Change-Id: I1b2f6b1bdadad10917ef5c56ce77f7d7cfc8625d
This should only land once we are on Gerrit 3.x and happy with it. But
at this point the mysql reviewdb will not be used anymore and config for
it can be removed. We keep general mysql things like tools and backups
in place as the accountPatchReviewDb continues to live in MySQL.
This also comments out calls to jeepyb's welcome-message,
update-blueprint and update-bug entrypoints from the patchset-created
event hook, since they rely on database connections for the moment.
Calls to update-bug in change-abandoned and change-merged event
hooks are retained as those code paths don't rely on database
interaction nor attempt to load the removed configuration.
Change-Id: I6e24dbb223fd3f76954db3dd74a03887cf2e2a8b
Gerrit seems to handle x/ for plugin extensions in polygerrit.
Unfortunately we've got projects called x/* and that breaks cloning of
these projects. Let's just avoid that for now until we can do a rename.
Change-Id: Id01739725c22af9d02ac30b1653743b49a35a332
The hound project has undergone a small re-birth and moved to
https://github.com/hound-search/hound
which has broken our deployment. We've talked about leaving
codesearch up to gitea, but it's not quite there yet. There seems to
be no point working on the puppet now.
This builds a container that runs houndd. It's an opendev specific
container; the config is pulled from project-config directly.
There's some custom scripts that drive things. Some points for
reviewers:
- update-hound-config.sh uses "create-hound-config" (which is in
jeepyb for historical reasons) to generate the config file. It
grabs the latest projects.yaml from project-config and exits with a
return code to indicate if things changed.
- when the container starts, it runs update-hound-config.sh to
populate the initial config. There is a testing environment flag
and small config so it doesn't have to clone the entire opendev
namespace for
functional testing.
- it runs under supervisord so we can restart the daemon when
projects are updated. Unlike earlier versions that didn't start
listening till indexing was done, this version now puts up a "Hound
is not ready yet" message when while it is working; so we can drop
all the magic we were doing to probe if hound is listening via
netstat and making Apache redirect to a status page.
- resync-hound.sh is run from an external cron job daily, and does
this update and restart check. Since it only reloads if changes
are made, this should be relatively rare anyway.
- There is a PR to monitor the config file
(https://github.com/hound-search/hound/pull/357) which would mean
the restart is unnecessary. This would be good in the near term, and
we could remove the cron job.
- playbooks/roles/codesearch is unexciting and deploys the container,
certificates and an apache proxy back to localhost:6080 where hound
is listening.
I've combined removal of the old puppet bits here as the "-codesearch"
namespace was already being used.
Change-Id: I8c773b5ea6b87e8f7dfd8db2556626f7b2500473
In converting this to ansible I forgot to install the reprepro keytab.
The encoded secret has been added for production.
Change-Id: I39d586e375ad96136cc151a7aed6f4cd5365f3c7
This will allow us to test further gerrit upgrades while we sort out
how far ahead in the gerrit releases we will go on our next upgrade.
Change-Id: Ic9d07b76e41ad4262cc0e2e1ff8a5d554f88239e
The Apache 3081 proxy allows us to do layer 7 filtering on incoming
requests. However, it was returning 502 errors because it proxies to
https://localhost and the certificate doesn't match (see
SSLProxyCheckPeerName directive). But we can't use the full hostname
in the gate either, because our self-signed certificate doesn't cover
that.
Add a variable and proxy to localhost in the gate, and the full
hostname in production. This avoids us having to turn off
SSLProxyCheckPeerName.
Change-Id: Ie12178a692f81781b848beb231f9035ececa3fd8
Collect the tox logs from the testinfra run on bridge.openstack.org.
The dependent change helps if we have errors installing things into
tox, and this change lets us see the results.
Depends-On: https://review.opendev.org/747325
Change-Id: Id3c39d4287d7dc9705890c73a230b1935d349b9f
In our beaker rspec testing we ssh into localhost pretending it is a
managed VM because that is how all the config management testing tools
want to work... This has run into problems with the new format ssh keys
which zuul provides. If such a key is present we convert it to PEM;
otherwise we generate our own.
Also add ensure-virtualenv to the job as we appear to need it to run
these tests properly.
Change-Id: Ibb6080b5a321a6955866ef9b847c4d00da17f427
Change restart mode to always instead of 'no' as testing shows we won't
restart in a loop in CI and we want production to restart automatically.
Also add ssh pubkey contents for completeness and simplicity if we need
to find those in the future.
Change-Id: I81573a1ad1574419194eb3088070dda95fb81fff
This new ansible role deploys gerritbot with docker-compose on
eavesdrop.openstack.org. This way we can run it where the other bots
live.
Testing is rudimentary for now as we don't really want to connect to a
production gerrit and freenode. We check things the best we can.
We will want to coordinate deployment of this change with disabling the
running service on the gerrit server.
Depends-On: https://review.opendev.org/745240
Change-Id: I008992978791ff0a38f92fb4bc529ff643f01dd6
We need to add_host (and possibly the ssh host key, so it's here too) in
this playbook because the add_host from the base-jobs side is only
applicable to the playbook running in base-jobs. When we start our
playbook here that state is lost. Simple fix, just add_host it again.
Change-Id: Iee60d04f0232500be745a7a8ca0eac4a6202063d
We can't run ARA on the executor because that involves running
arbitrary commands; instead, generate reports on bridge and put
them where the normal fetch-output will find them later.
Change-Id: I20d88a7f03872d19f6bd014bc687a1bf16e4e80e
This uses a new base job which handles pushing the git repos on to
bridge since that must now happen in a trusted playbook.
Depends-On: https://review.opendev.org/742934
Change-Id: Ie6d0668f83af801c0c0e920b676f2f49e19c59f6
This adds roles to implement backup with borg [1].
Our current tool "bup" has no Python 3 support and is not packaged for
Ubuntu Focal. This means it is effectively end-of-life. borg fits
our model of servers backing themselves up to a central location, is
well documented and seems well supported. It also has the clarkb seal
of approval :)
As mentioned, borg works in the same manner as bup by doing an
efficient back up over ssh to a remote server. The core of these
roles is the same as the bup based ones in terms of creating a
separate user for each host and deploying keys and ssh config.
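On the client side the end result is conceptually a cron entry like
the following (paths, server name and schedule are illustrative; the
real values come from the role variables):

    - name: Install borg backup cron job
      cron:
        name: borg-backup
        user: root
        hour: "5"
        minute: "0"
        job: >-
          /opt/borg/bin/borg create --stats
          ssh://borg-{{ inventory_hostname_short }}@backup01.example.org/./backup::{now}
          /etc /home /var/log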
This chooses to install borg in a virtualenv on /opt. This was chosen
for a number of reasons; firstly, reading the history of borg, there
have been incompatible updates (although they provide a tool to update
repository formats); it seems important that we both pin the version
we are using and keep clients and server in sync. Since we have a
heterogeneous distribution collection we don't want to rely on the
packaged tools which may differ. I don't feel like this is a great
application for a container; we actually don't want it that isolated
from the base system because its goal is to read and copy it offsite
with as little chance of things going wrong as possible.
Borg has a lot of support for encrypting the data at rest in various
ways. However, that introduces the possibility we could lose both the
key and the backup data. Really the only thing stopping this is key
management, and if we want to go down this path we can do it as a
follow-on.
The remote end server is configured via ssh command rules to run in
append-only mode. This means a misbehaving client can't delete its
old backups. In theory we can prune backups on the server side --
something we could not do with bup. The documentation has been
updated but is vague on this part; I think we should get some hosts in
operation, see how the de-duplication is working out and then decide
how we want to manage things long term.
Testing is added; a focal and bionic host both run a full backup of
themselves to the backup server. Pretty cool, the logs are in
/var/log/borg-backup-<host>.log.
No hosts are currently in the borg groups, so this can be applied
without affecting production. I'd suggest the next steps are to bring
up a borg-based backup server and put a few hosts into this. After
running for a while, we can add all hosts, and then deprecate the
current bup-based backup server in vexxhost and replace that with a
borg-based one; giving us dual offsite backups.
[1] https://borgbackup.readthedocs.io/en/stable/
Change-Id: I2a125f2fac11d8e3a3279eb7fa7adb33a3acaa4e
Specifying the family stops a deprecation warning being output.
Add an HTML report and report it as an artifact as well; this is easier
to read.
Change-Id: I2bd6505c19cee2d51e9af27e9344cfe2e1110572
Builds running on the new container-based executors started failing to
connect to remote hosts with
Load key "/root/.ssh/id_rsa": invalid format
It turns out the new executor is writing keys in OpenSSH format,
rather than the older PEM format. And it seems that the OpenSSH
format is more picky about having a trailing newline after the
-----END OPENSSH PRIVATE KEY-----
bit of the id_rsa file. By default, the file lookup runs an rstrip on
the incoming file, removing that trailing newline. Turn that off so we
generate a valid key.
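The fix amounts to passing rstrip=False to the lookup; for example
(task and variable names illustrative):

    - name: Write ssh private key without stripping the trailing newline
      copy:
        dest: /root/.ssh/id_rsa
        mode: "0600"
        content: "{{ lookup('file', ssh_private_key_file, rstrip=False) }}"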
Change-Id: I49bb255f359bd595e1b88eda890d04cb18205b6e
This uses the Grafana container created with
Iddfafe852166fe95b3e433420e2e2a4a6380fc64 to run the
grafana.opendev.org service.
We retain the old model of an Apache reverse-proxy; it's well tested
and understood, it's much easier than trying to map all the SSL
termination/renewal/etc. into the Grafana container and we don't have
to convince ourselves the container is safe to be directly web-facing.
Otherwise this is a fairly straightforward deployment of the
container. As before, it uses the graph configuration kept in
project-config which is loaded in with grafyaml, which is included in
the container.
One nice advantage is that it makes it quite easy to develop graphs
locally, using the container which can talk to the public graphite
instance. The documentation has been updated with a reference on how
to do this.
Change-Id: I0cc76d29b6911aecfebc71e5fdfe7cf4fcd071a4
This adds an option to have an Apache based reverse proxy on port 3081
forwarding to 3000. The idea is that we can use some of the Apache
filtering rules to reject certain traffic if/when required.
It is off by default, but tested in the gate.
Change-Id: Ie34772878d9fb239a5f69f2d7b993cc1f2142930
We use ansible's to_nice_yaml output filter when writing ansible
datastructures to yaml. This has a default indent of 4, but we humans
usually write yaml with an indent of 2. Make the generated yaml more
similar to what we humans write and set the indent to 2.
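i.e. tasks that write generated config now do roughly the following
(the variable and destination path are illustrative):

    - name: Write out generated config
      copy:
        dest: /etc/example/config.yaml
        content: "{{ example_config | to_nice_yaml(indent=2) }}"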
Change-Id: I3dc41b54e1b6480d7085261bc37c419009ef5ba7
In prep-apply we're assuming virtualenv is present, which it is not. Now
that the nodes don't have it by default, this breaks. Add it.
Change-Id: I07a392f5bcbf4d5f04d8812d5c712d2fcc60747b
We can't establish Gerrit or Github connections in the gate, so
Zuul fails to start. Reducing the set of connections in the gate
to just smtp should allow it to start (albeit with tenant loading
errors). But that should let us test basic system setup and
internal connectivity.
Change-Id: I39d648ac5dd6ee3e9bfbc026cd6d7142461c418c
This exports Rackspace DNS domains to bind format for backup and
migration purposes.
This installs a small tool to query and export all the domains we can
see via the Rackspace DNS API.
Because we don't want to publish the backups (it's the equivalent of a
zone xfer) it is run on, and logs output to, bridge.openstack.org from
cron once a day.
Change-Id: I50fd33f5f3d6440a8f20d6fec63507cb883f2d56
Tests that call host.backend.get_hostname() to switch on test
assertions are likely to fail open. Stop using this in zuul tests
and instead add new files for each of the types of zuul hosts
where we want to do additional verification.
Share the iptables related code between all the tests that perform
iptables checks.
Also, some extra merger test and some negative assertions are added.
Move multi-node-hosts-file to after set-hostname. multi-node-hosts-file
is designed to append, and set-hostname is designed to write.
When we write the gate version of the inventory, map the nodepool
private_ipv4 address as the public_v4 address of the inventory host
since that's what is written to /etc/hosts, and is therefore, in the
context of a gate job, the "public" address.
Change-Id: Id2dad08176865169272a8c135d232c2b58a7a2c1
Make inventory/service for service-specific things, including the
groups.yaml group definitions, and inventory/base for hostvars
related to the base system, including the list of hosts.
Move the existing host_vars into inventory/service, since most of
them are likely service-specific. Move group_vars/all.yaml into
base/group_vars as almost all of it is related to base things,
with the exception of the gerrit public key.
A followup patch will move host-specific values into equivalent
files in inventory/base.
This should let us override hostvars in gate jobs. It should also
allow us to do better file matchers - and to be able to organize
our playbooks more if we want to.
Depends-On: https://review.opendev.org/731583
Change-Id: Iddf57b5be47c2e9de16b83a1bc83bee25db995cf
The existing test gearman cert+key combos were mismatched and therefore
invalid. This replaces them with newly generated test data, and moves
them into the test private hostvar files where the production private
data are now housed.
This removes the public production data as well; those certs are now
in the private hostvar files.
Change-Id: I6d7e12e2548f4c777854b8738c98f621bd10ad00
The jitsi video bridge (jvb) appears to be the main component we'll need
to scale up to handle more users on meetpad. Start preliminary
ansiblification of scale out jvb hosts.
Note this requires each new jvb to run on a separate host as the jvb
docker images seem to rely on $HOSTNAME to uniquely identify each jvb.
Change-Id: If6d055b6ec163d4a9d912bee9a9912f5a7b58125
This adds a new variable for the iptables role that allows us to
indicate all members of an ansible inventory group should have
iptables rules added.
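Illustratively, a host or group can then say something like the
following (the exact variable name and entry format are documented in
the iptables role; this just shows the shape of the idea):

    iptables_extra_allowed_groups:
      - gitea
      - zuul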
It also removes the unused zuul-executor-opendev group, and some
unused variables related to the snmp rule.
Also, collect the generated iptables rules for debugging.
Change-Id: I48746a6527848a45a4debf62fd833527cc392398
Depends-On: https://review.opendev.org/728952
This autogenerates the list of ssl domains for the ssl-cert-check tool
directly from the letsencrypt list.
The first step is the install-certcheck role that replaces the
puppet-ssl_cert_check module that does the same. The reason for this
is so that during gate testing we can test this on the test
bridge.openstack.org server, and avoid adding another node as a
requirement for this test.
letsencrypt-request-certs is updated to set a fact
letsencrypt_certcheck_domains for each host that is generating a
certificate. As described in the comments, this defaults to the first
host specified for the certificate and the listening port can be
indicated (if set, this new port value is stripped when generating
certs as is not necessary for certificate generation).
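So a host_vars cert entry can indicate its real listening port
roughly like this (hostname and variable layout are illustrative; see
the role docs for the exact format):

    letsencrypt_certs:
      example01-opendev-org-main:
        # ":3000" tells the cert checker which port to probe; it is
        # stripped before the certificate is requested
        - example01.opendev.org:3000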
The new letsencrypt-config-certcheck role runs and iterates all
letsencrypt hosts to build the final list of domains that should be
checked. This is then extended with the
letsencrypt_certcheck_additional_domains value that covers any hosts
using certificates not provisioned by letsencrypt using this
mechanism.
These additional domains are pre-populated from the openstack.org
domains in the extant check file, minus those openstack.org domain
certificates we are generating via letsencrypt (see
letsencrypt-create-certs/handlers/main.yaml). Additionally, we
update some of the certificate variables in host_vars that are
listening on port !443.
As mentioned, bridge.openstack.org is placed in the new certcheck
group for gate testing, so the tool and config file will be deployed
to it. For production, cacti is added to the group, which is where
the tool currently runs. The extant puppet installation is disabled,
pending removal in a follow-on change.
Change-Id: Idbe084f13f3684021e8efd9ac69b63fe31484606
Create a zuul_data fixture for testinfra.
The fixture directly loads the inventory from the inventory YAML file
written out. This lets you get easy access to the IP addresses of the
hosts.
We pass in the "zuul" variable by writing it out to a YAML file on
disk, and then passing an environment variable to this. This is
useful for things like determining which job is running. Additional
arbitrary data could be added to this if required.
Change-Id: I8adb7601f7eec6d48509f8f1a42840beca70120c