citycloud is rolling out per-region keystone. The latest openstacksdk
contains a change with an error in it, so put the right auth_url into
the files directly while we update it and release it again.
Additionally, Sto2 and Lon1 each have different domain ids. The domain
names are the same though - and that's good, because logical names are
nicer in config files anyway.
Restore the config for those clouds.
Change-Id: If55d27defc164bd38af2ffd1e7739120389422af
This region does not show up in catalog listings anymore and is causing
inventory generation for ansible to fail. This change removes Sto2 from
the management side of things so that we can get ansible and puppet
running again.
This does not clean up nodepool, which we can do in a followup once
ansible and puppet are running again.
Change-Id: Ifeea238592b897aa4cea47b723513d7f38d6374b
The mailman verp router handles remote addresses like dnslookup.
It needs to run before dnslookup in order to be effective, so run
it first. It's only for outgoing messages, not incoming, so won't
affect the blackhole aliases we have for incoming fake bounce
messages.
Note that the verp router hasn't been used in about a year due to
this oversight, so we should merge this change with caution.
Change-Id: I7d2a0f05f82485a54c1e7048f09b4edf6e0f0612
This region does not show up in catalog listings anymore and is causing
inventory generation for ansible to fail. This change removes Lon1 from
the management side of things so that we can get ansible and puppet
running again.
This does not clean up nodepool, which we can do in a followup once
ansible and puppet are running again.
Change-Id: Icf3b19381ebba3498dfc204a48dc1ea52ae9d951
We don't use snappy to install software on our servers, but it started
being installed by default. We don't need it, so remove it.
Change-Id: I043d4335916276476350d9ac605fed1e67362e15
The options are deprecated and don't do anything - but they do put
warnings into the service logs.
Change-Id: If53bc8aecc7df75c99ae71e5adb8189790405795
This is going to require some work to port several puppet things
to Ansible. To test the execution mechanism, let's just stub it
out for now.
Change-Id: Ief09ca30b19afffd106c98018cb23a9715fc9a69
After adding iptables configuration to allow bridge.o.o to send stats
to graphite.o.o in I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2, I
encountered the weird failure that ipv6 rules seemed to be applied on
graphite.o.o, but not the ipv4 ones.
Eventually I realised that the dns_a filter as written is using
socket.getaddrinfo() on bridge.o.o and querying for itself. It thus
matches the loopback entry in /etc/hosts and passes along a rule
for 127.0.1.1 or similar. The ipv6 hostname is not in /etc/hosts so
this works there.
What we really want the dns_<a|aaaa> filters to do is lookup the
address in DNS, rather than the local resolver. Without wanting to
get involved in new libraries, etc. the simplest option seems to be to
use the well-known 'host' tool. We can easily parse the output of
this to ensure we're getting the actual DNS addresses for hostnames.
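A rough sketch of the idea as ad-hoc tasks (names here are
illustrative; the real change implements this inside the dns_a and
dns_aaaa filter plugins):

  - name: Resolve the A record via DNS, not the local resolver
    command: host -t A graphite.openstack.org
    register: host_a

  - name: Extract addresses from the host output
    set_fact:
      graphite_v4: "{{ host_a.stdout | regex_findall('has address (.*)') }}"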
An ipv6 match is added to the existing test. This is effectively
tested by the existing usage of the iptables role which sets up rules
for cacti.o.o access.
Change-Id: Ia7988626e9b1fba998fee796d4016fc66332ec03
We don't want to run ansible if we don't get a complete inventory from
our clouds. The reason for this is we cannot be sure that the ordering
of git servers, gerrit, and zuul or our serialized updates of afs
servers will work correctly if we have an incomplete inventory.
Instead we just want ansible to fail and try again in the future (we can
then debug why our clouds are not working).
From the ansible docs for any_unparsed_is_failed:
If 'true', it is a fatal error when any given inventory source
cannot be successfully parsed by any available inventory plugin;
otherwise, this situation only attracts a warning.
Additionally we tell the openstack inventory plugin to report failures
rather than an empty inventory, so that the unparsed-is-failed
handling is actually triggered.
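The inventory source then looks something like this (file layout is an
assumption; fail_on_errors is the plugin option in question, with
any_unparsed_is_failed set in the [inventory] section of ansible.cfg):

  # openstack.yaml inventory source (sketch)
  plugin: openstack
  fail_on_errors: true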
Change-Id: I9025776af4316fbdd2c910566883eb3a2530852a
Keystone auth and openstacksdk/openstackclient do not do the correct
thing without this setting. They try v2 even though the discovery
doc at the root url does not list that version as valid. Force version 3
so that things will work again.
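For reference, the clouds.yaml knob in question (cloud name and url
are illustrative):

  clouds:
    examplecloud:
      auth:
        auth_url: https://keystone.example.com:5000
      identity_api_version: '3'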
Change-Id: I7e1b0189c842bbf9640e2cd50873c9f7992dc8d3
This new job is a parent job allowing us to CD from Zuul via
bridge.openstack.org. Using Zuul project ssh keys we add_host bridge.o.o
to our running inventory on the executor, then run ansible on bridge.o.o
to execute an ansible playbook from
bridge.openstack.org:/opt/system-config/playbooks.
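On the executor that amounts to something like (group name is an
assumption):

  - hosts: localhost
    tasks:
      - name: Add bridge.o.o to the running inventory
        add_host:
          name: bridge.openstack.org
          groups: bastion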
Change-Id: I5cd2dcc53ac480459a22d9e19ef38af78a9e90f7
Allow post-review jobs running under system-config and project-config
to ssh into bridge in order to run Ansible.
Change-Id: I841f87425349722ee69e2f4265b99b5ee0b5a2c8
Add some coarse-grained statsd tracking for the global ansible runs.
Adds a timer for each step, along with an overall timer.
This adds a single argument so that we only try to run stats when
running from the cron job (so if we're debugging by hand or something,
this doesn't trigger). Graphite also needs to accept stats from
bridge.o.o. The plan is to present this via a simple grafana
dashboard.
Change-Id: I299c0ab5dc3dea4841e560d8fb95b8f3e7df89f2
Let's abandon the idea that we'll treat the backup server specially.
As long as we allow *any* automated remote access via ansible, we
have opened the door to potential compromise of the backup systems
if bridge is compromised. Rather than pretending that this separation
gives us any benefit, remove it.
Change-Id: I751060dc05918c440374e80ffb483d948f048f36
In run_all, we start a bunch of plays in sequence, but it's difficult
to tell what they're doing until you see the tasks. Name the plays
themselves to produce a better narrative structure.
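For example (play body elided):

  - name: Run puppet on the git servers
    hosts: git-servers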
Change-Id: I0597eab2c06c6963601dec689714c38101a4d470
We use the git-servers group in remote_puppet_git to positively select
the git nodes in that playbook but used a !git0* glob to exclude these
nodes in remote_puppet_else. Use !git-servers in remote_puppet_else so
that the two groups used line up with each other.
Change-Id: I023f8262a86117b2dec1ff5b762082e09e601e74
We were matching afs* as a glob to serialize puppet runs on afs servers.
This was fine until we added afs-client and afsadmin groups to our
inventory which matched afs*. These groups included many nodes including
our mirror nodes and zuul executors, all of which were running puppet
serially, which is slow.
Fix this by explicitly using the afs and afsdb groups instead of a glob.
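That is, something like (serial value illustrative):

  - hosts: afs:afsdb
    serial: 1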
Change-Id: If21bbc48b19806343617e18fb03416c642e00ed2
This account is an admin account and sees every project's default
security group. This leads to:
FAILED! => {"changed": false, "msg": "Multiple matches found for default"}
when attempting to set the properties of the default security group for
this account. There doesn't appear to be a good way to filter the other
default security groups out currently so avoid setting them for now.
Change-Id: I9a8cc7d59c0295caa71bf107b9b78745a4617981
Some of our summaries need to display more than 20 tasks to show
complete information. Up it to 50, which should be enough for anyone.
Change-Id: I3ae3bb714ea7f5fb094f85c33c19ea3c8a81f6c3
This formerly ran on puppetmaster.openstack.org but needs to be
transitioned to bridge.openstack.org so that we properly configure new
clouds.
Depends-On: https://review.openstack.org/#/c/598404
Change-Id: I2d1067ef5176ecabb52815752407fa70b64a001b
Deployment of the nodepool cloud.yaml file is currently failing with
FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'rackspace_username' is undefined"}
This is because the variables in the group_vars on bridge.o.o are all
prefixed with "nodepool_". Switch to the prefixed names.
Change-Id: I524cc628138d85e3a31c216d04e4f49bcfaaa4a8
It's definitely not a priori evident what all these configs that seem
to duplicate each other do; add some inline documentation to each to
hopefully explain what's going on a little more clearly for people
unfamiliar with them.
Change-Id: I0cc2e8773823b7d9b47d3dfd4c80827cd9929075
Add a logrotate role that allows basic configuration of a logrotate
configuration for a specific log-file.
Use this role in the ansible-cron and install-ansible roles to ensure
the log output they are generating is rotated.
This role is not intended to manage the logrotate package (mostly to
avoid the overhead of frequently checking package state when this is
expected to be called for multiple configuration files on a server).
We add it as a base package to our servers.
Tests are added for testinfra.
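Usage ends up looking roughly like this (the variable name is an
assumption):

  - include_role:
      name: logrotate
    vars:
      logrotate_file_name: /var/log/ansible/run_all.log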
Change-Id: I90f59c3e42c1135d6be120de38e942ece608b761
The package module is the generic way of using package managers in
Ansible. This change will be a noop.
Don't use loops for package managers, since we are able to pass lists of
packages. This will reduce the number of tasks ansible will do.
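That is, pass the list straight in rather than looping (variable name
illustrative):

  - name: Install base packages
    package:
      name: "{{ base_packages }}"
      state: present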
Change-Id: If7988ba81a6bf851d1b5ec9db6888ba9509ed788
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This manages the clouds.yaml files in ansible so that we can get them
updated automatically on bridge.openstack.org (which does not run
puppet).
Co-Authored-By: James E. Blair <jeblair@redhat.com>
Depends-On: https://review.openstack.org/598378
Change-Id: I2071f2593f57024bc985e18eaf1ffbf6f3d38140
In order to talk to limestone clouds we need to configure a custom CA.
Do this in ansible instead of puppet.
A followup should add writing out clouds.yaml files.
Change-Id: I355df1efb31feb31e039040da4ca6088ea632b7e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Add a job which runs testinfra for the eavesdrop server. When we
have a per-hostgroup playbook, we will add it to this job too.
The puppet group is removed from the run-base job because the
groups.yaml file is now used to construct groups (as it does
in production) and will construct the group correctly.
The testinfra iptables module may throw an error if it's run
multiple times simultaneously on the same host. To avoid this,
stop using parallel execution.
Change-Id: I1a7bab5c14b0da22393ab568000d0921c28675aa
This adds a group var which should normally be the empty list but
can be overridden by the test framework to inject additional iptables
rules. It's used to add the zuul console streaming port. To
accomplish this, the base+extras pattern is adopted for
iptables public tcp/udp ports. This means all host/group vars should
use the "extra" form of the variable rather than the actual variable
defined by the role.
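A sketch of the pattern (variable names are assumptions; 19885 is the
zuul console streaming port):

  # role defaults
  iptables_extra_public_tcp_ports: []
  iptables_public_tcp_ports: "{{ iptables_base_public_tcp_ports + iptables_extra_public_tcp_ports }}"

  # test group_vars then only override the extra form
  iptables_extra_public_tcp_ports:
    - 19885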
Change-Id: I33fe2b7de4a4ba79c25c0fb41a00e3437cee5463
And collect it in post; it is helpful to see the results.
Change-Id: I0dbecf57bf9182168eb6f99cdf88329fcdeb1bdc
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We can directly pass a list of packages to the package task in ansible,
which will save us some run time.
Change-Id: I9b26f4f4f9731dc7d32186584620f1cec04b7a81
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
The original version of this was wishful thinking: "is file" only
works locally, but this needs to run on the remote node.
Change-Id: Ib683809fdf580f41d213308331925c4765bb09d9
Ubuntu xenial does not come with python2 by default. In order to
accommodate a transition from trusty nodes to xenial nodes that are
managed by ansible we want to use python2 on trusty and xenial. Then
when a group of nodes are fully xenialed we can force ansible to use
python3 instead.
Eventually we will have no trusty nodes and can default to python3
instead, with just a small number of exceptions for centos.
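In group_vars this is just (interpreter path is an assumption):

  ansible_python_interpreter: /usr/bin/python2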
Change-Id: If1d97e25069d6ed5012c147024aad4d921febfc8
This variable name was translated incorrectly during the transition
from puppet to ansible for iptables.
Change-Id: I865ba7122b215a7f653aa5ed5770a05edbd655a0
This role manages puppet on the host and should be applied to all
puppet hosts.
The dependent change is required to get the "puppet" group into the
generated inventory for the system-config-run-base test.
Change-Id: I0e18c53836baca743d32abf1bb4b7a3f63c025bb
Depends-On: https://review.openstack.org/596994
Contains a handler to restart crond when tz is changed. Cron service
name differs across distros.
Removes the puppet-timezone usage.
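The handler is roughly (variable name is an assumption):

  - name: Restart cron
    service:
      name: "{{ crond_service_name }}"  # cron on Debian/Ubuntu, crond on CentOS
      state: restarted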
Change-Id: I4e45d0e0ed37214ac491f373ff2d37750e720718
Co-Authored-By: James E. Blair <corvus@inaugust.com>
Change-Id: Id8b347483affd710759f9b225bfadb3ce851333c
Depends-On: https://review.openstack.org/596503
According to the Ubuntu 12.04 release notes, up until Ubuntu 11.10
admin access was granted via the "admin" unix group, but was changed
to the "sudo" group to be more consistent with Debian et al.
Remove the now-unnecessary group.
Modify the install-ansible role to set some directory ownership to
root:root; there didn't seem to be any reason to use admin here.
This means the "users" role is no longer required in the bridge.yaml,
as it is run from the base playbook anyway.
Change-Id: I6a7fdd460fb472f0d3468eb080aebbb010931e11
This adds a job which creates a bridge-like node and bootstraps it,
and then runs the base playbook against all of the node types we
use in our control plane. It uses testinfra to validate the results.
Change-Id: Ibdbaf511bbdaee46e1335f2c83b95ba1553a1d94
Depends-On: https://review.openstack.org/595905
Normally the bridge playbook runs as root on bridge. In order to
allow zuul to bootstrap a bridge-like node in its tests while running
as the zuul user, add become: true to the playbook. This will have
no effect on bridge itself, but will cause the playbook to behave
in the same manner in tests.
Also add the "users" role to bridge. This is in the base playbook
and is therefore eventually run on bridge. However it needs to also
be in the bridge playbook in order to bootstrap bridge correctly, as
the install-ansible role references groups which are created in the
users role.
Change-Id: If311914e9e632d8be855fff0a62528dd191bf1d0
Move the exim role to be a "generic" role in the top-level roles/
directory, making it available for use as a Zuul role.
Update the linters jobs to look for roles in the top level.
Update the Role documentation to explain what the split in roles is
about.
Change-Id: I6b49d2a4b120141b3c99f5f1e28c410da12d9dc3
These role docs aren't exactly War and Peace, but I think longer term
as we fiddle about making things generic or not and moving them
around, we'll be better off having kept ourselves to writing
*something*.
Add terse README.rst files for all existing roles, and add simple
linter check to ensure new roles get them too.
Change-Id: Ibc836310fb8a45e12c2e31f112d92509ac350413
This filter is unused in the role; remove it.
This allows the role to be run under zuul, and it can be moved into
the top-level roles/ directory later.
Change-Id: Ice97f0c3c9f52b6bf9f48c7b16d577e555924034
Since we're building out roles in system-config now, generate
documentation. We look in roles/* and playbooks/roles/* (follow-on
changes will split things up between the two).
Correct the reference names in the exim documentation to avoid
warnings and failure.
This also revealed a single unicode character in the exim readme
(which caused prior versions of zuul-sphinx to barf). For fun, see if
you can find it!
Depends-On: https://review.openstack.org/#/c/579474/
Change-Id: I243a96bbd6d09560f8aa80b6345b90039422547a
Puppet cron is no longer being run on puppetmaster (yay!) so start
running it in cron from bridge.
Change-Id: Idc579a2660a5450092544c21a2e9e6cb9688e5f9
There is an issue with our nb0* hosts where they have zypper installed
for building suse images but that tricks ansible into thinking it
should use zypper for package management.
This has been submitted upstream as
https://github.com/ansible/ansible/pull/44413
Change-Id: I96f60501e43bfe9c6acb4ce80f8450b245943ca8
In zuul's ansible config we add retries=3 to deal with transient issues.
Do the same thing for our production runs.
Change-Id: Ide53bae34e5e622de1fd4741706752e8728da20e
We don't run a cloud anymore and don't use these. With the cfg
management update effort, it's unlikely we'd use them in the form they
are in even if we did get more hardware and decide to run a cloud again.
Remove them for clarity.
Change-Id: I88f58fc7f2768ad60c5387eb775a340cac2c822a
We copied this over from puppetmaster, but let's manage it in ansible.
The key has been renamed in host_vars on bridge.openstack.org already.
Change-Id: Ia102dbe2ae2836880092b8997cb99135f5197b00
The CentOS tasks run inside of a loop in tasks/main.yaml. That means
'item' is already defined by that loop. While it's currently working,
go ahead and add loop_control.loop_var to remove the clash.
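A sketch of the fix (list and variable names illustrative):

  - include_tasks: centos.yaml
    loop: "{{ centos_repos }}"
    loop_control:
      loop_var: centos_repo  # frees up "item" for the included tasks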
Change-Id: I0e8288c35645945aa9b43fb02c29576c1ad31d7e
puppet wants the code to be in /opt/system-config/production because of
the environment config. bridge just wants /opt/system-config because
it's an ansible server.
Rather than relying on inferring things, just be explicit about what we
want where.
Depends-On: https://review.openstack.org/593134
Change-Id: I9e749d2c50f7d8a7b0681fe48f38f4741c8a8d01
This is not a variable describing the system-under-management
bridge.openstack.org - it's a variable that is always true for all
systems in the puppet group.
As a result, update the puppet apply test to figure out which directory
we should be copying modules _from_ - since the puppet4 tests will be
unhappy otherwise.
Change-Id: Iddee83944bd85f69acf4fcfde83dc70304386baf
The first entry is where ansible galaxy will install roles. We want that
to be /etc/ansible/roles, not overlaid on the system-config repo.
Pass --roles-path to ansible-galaxy to make sure they go to the right
place.
Change-Id: I109dc004acad32a515c6a1caca50ab38edc62aed
file: state=touch returns changed every time. Instead, put the log files
into a /var/log/ansible directory.
Change-Id: I086d803f0e532b9da41cb01d4e7d2ed66245dfc1
restricted is supported software that is non-free.
multiverse is unsupported software that is non-free.
Use of software from either would be unacceptable on any Infra server,
so remove them from the sources.list files.
While we're in there, clean things up a little bit and add an arm file
for bionic.
Change-Id: I55a3b3d411e8a3496a4e6910baaf72f3c192e9d4
This was a setting added for infra cloud that had to do with bootstrap
order. It seems to have been cargo-culted elsewhere. Remove it. Let's be
specific with our sources.list files.
Change-Id: Iefbd59ad20e9fdc450d9a0c4e58b9cf4a89ff5a3
Rather than copying these out of system-config inside of
install-ansible, just point the ansible.cfg to them in the system-config
location. This way, as changes with group updates come in, we don't
have to apply them to the system first.
Change-Id: I1cefd7848b7f3f1adc8fbfa080eb9831124a297b
The puppet playbooks were some of the first we wrote, so they're
slightly wonky.
Remove '---' lines that are completely unnecessary.
Fix indentation.
Move some variables that are the same everywhere into
ansible variables.
Put puppet-related variables into the puppet group_vars.
Stop running puppet on localhost in the git playbook.
Change-Id: I2d2a4acccd3523f1931ebec5977771d5a310a0c7
The production directory is a relic from the puppet environment concept,
which we do not use. Remove it.
The puppet apply tests run puppet locally, where the production
environment is still needed, so don't update the paths in
tools/prep-apply.sh.
Depends-On: https://review.openstack.org/592946
Change-Id: I82572cc616e3c994eab38b0de8c3c72cb5ec5413
We do this for zuul jobs already, so let's do it for our production
runs.
Shift the inventory cache location down a directory so that launch-node
can invalidate the inventory cache.
Change-Id: I52b1c48d091c07e4205c1a7233448925ca26d8d3
Now that we've got base server stuff rewritten in ansible, remove the
old puppet versions.
Depends-On: https://review.openstack.org/588326
Change-Id: I5c82fe6fd25b9ddaa77747db377ffa7e8bf23c7b
The exim config chunk has a {{ in it, which makes the ansible jinja
very cranky. Add in a raw block so it doesn't try to understand the
exim.
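In the template that looks like (exim fragment illustrative):

  {% raw %}
  ${if or{{ ...exim condition... }} }
  {% endraw %}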
Change-Id: If49d976e503b6ebe236a2d2c6077cce96783e102
So that we can have complete control of the router order, always
template the full set of routers, including the "default" ones.
So that it's easy to use the defaults but put them in a different
order, define each router in its own variable which can be used
in host or group vars to "copy" that router in.
Apply this change to lists, firehose, and storyboard, all of which
have custom exim routers. Note that firehose intentionally has
its localuser router last.
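Host or group vars can then re-order the defaults like (router
variable names are assumptions):

  exim_routers:
    - "{{ exim_router_verp }}"
    - "{{ exim_router_dnslookup }}"
    - "{{ exim_router_localuser }}"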
Change-Id: I737942b8c15f7020b54e350db885e968a93f806a
We want to configure firehose logically as the firehose service, but the
host that is in the group is called firehose01.openstack.org. Make a
group and put the config variables for firehose into it.
Change-Id: I17c8e8a72f41c5e2730af81f70cef81dd3ed7bca
regex_match seems to either not work or not exist or something. match,
otoh, works. Additionally, we get this:
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead
of using `result|match` use `result is match`. This feature will
be removed in version 2.9.
when using the | syntax, so obey the warning and switch to is.
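That is (condition illustrative):

  - debug:
      msg: matched
    when: inventory_hostname is match('git0.*')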
Change-Id: Ie201241a11c08b9fed58c0e1790e8187ee4cf474
Now that we're running with ansible, we can set the futureparser variable
in the group_vars for the futureparser group and stop passing it as a
parameter explicitly.
Change-Id: I41fe283e96bb48a17f2acfe2ffd939223b5345e7
Bridge can run puppet on the remote hosts. Stop running on puppetmaster
so that we can run from bridge. Put it in the disabled group so that we
don't try to run puppet on it from bridge.
Change-Id: Ibcfa7e902c07c55e3a84f8232a11792c5f7d80e9
In order to get puppet out of the business of mucking with exim and
fighting ansible, finish moving the config to ansible.
This introduces a storyboard group that we can use to apply the exim
config across both servers. It also splits the base playbook so that we
can avoid running exim on the backup servers. And we set
purge_apt_sources the same as was set in puppet. We should probably
remove it though, since none of us have any clue why it's here.
Change-Id: I43ee891a9c1beead7f97808208829b01a0a7ced6
The mailing list servers have a more complex exim config. Put the
routers and transports into ansible variables.
While we're doing it, prefix role variables with exim_ - since
'routers' as a global variable might be a little broad.
iteritems isn't a thing in python3, only items.
We need to escape the exim config with ${if or{{ - because of the {{
which looks like jinja. Wrap it in a {% raw %} block.
Getting the yaml indentation right for things here is non-trivial. Make
them strings instead.
Add a README.rst file - and use the zuul:rolevar construct in it,
because it's nice.
Change-Id: Ieccfce99a1d278440c5baa207479a1887898298e
Now that we're running more than just "puppet apply", reconnecting
starts to add up. Turn on pipelining.
Change-Id: If629485a0e602f1a906fef0cabd73154243d7e3d
Instead of just having bridge be disabled, make a puppet group that it's
not a part of and switch the remote_puppet_else playbook to use that.
Change-Id: Ifb96ce483fc5675d095723bda70242a425bdc619
This is a setup for the next patch, to allow us to roll the change out.
Update the roles path to point to the system-config roles dir.
Change-Id: I6bcf36beba8e65c9dd8ddf9f4a99d0308f42c565
We want email to work.
Add a default value so that integration tests work - and update the
template so that if the value in the alias mapping is empty we don't
write out a half-formed alias.
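The template guard looks roughly like this (variable names are
assumptions):

  {% for alias, target in exim_aliases.items() %}
  {% if target %}
  {{ alias }}: {{ target }}
  {% endif %}
  {% endfor %}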
Enable the epel repo on CentOS nodes in base-repos. This is done in
install_puppet.sh, but install_puppet.sh doesn't get run on ansible-only
nodes.
Change-Id: I68ad9f66c3b8672d9642c7764e50adac9cafdaf9
ansible-role-puppet attempts to infer where it should copy hieradata
from based on puppet3 or puppet4. On bridge there is no puppet and thus
there is no puppet version. Set mgmt_hieradata to tell
ansible-role-puppet from where it should copy hiera secrets.
Change-Id: I0c518b8a5a8ee2155e2125d6bc7f4e0a3bf4faeb
We don't really need to keep these in here. We can put a user in the
remove group without them being in this list.
Change-Id: I321d489d4202272e36d25c5b8913ca7cdda25fdd
Split base playbook into two plays
The update apt-cache handler from base-repos needs to fire before we run
base-server. Split into two plays so that the handler will fire.
Fix use of first_found
For include_vars, using the lookup version of first_found requires being
explicit about the path to search in as well. We also need to use query
together with loop to get skip to work right.
Extract the list of file locations we search for distro- and
platform-specific variables into a variable so that we can reuse it
instead of copy-pasta.
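A sketch of the pattern (variable name is an assumption):

  - name: Include distro and platform specific variables
    include_vars: "{{ item }}"
    loop: "{{ query('first_found', {'files': distro_lookup_path, 'skip': true}) }}"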
The vim package is vim-nox on ubuntu and vim-minimal on debian.
ntpdate only needs to be enabled on boot; it does not need to be
immediately started. At least, that's what the old puppet was doing and
trying to start it immediately breaks centos integration tests.
emacs-nox is emacs23-nox on trusty.
Change-Id: If3db276a5f6a8f76d7ce8635da8d2cbc316af341
Depends-On: https://review.openstack.org/588326
The with_ directives are discouraged now in place of use of loop: and/or
lookup filters. In the case of with_first_found, it confuses people
because with_ directives are usually a loop, but in this case the
task is always executed once. Using the first_found lookup makes it
clearer that this is occurring.
While we're in there, remove uses of 'static: no'. Since 2.0, includes
are dynamic by default, so these are not necessary.
Change-Id: Ie429d7614b2f3322a646f46a8117d4b6ae29f737
The list of allowed hosts is comma separated, not colon separated.
Set exclusive: yes to ensure this is the *only* authorized key.
The zuul-executor group is the group for ze hosts. It's not a second
zuul-scheduler group.
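A sketch (key variable and host list are illustrative):

  - name: Authorize the Zuul key on bridge
    authorized_key:
      user: zuul
      key: "{{ zuul_ssh_public_key }}"
      key_options: 'from="zuul01.openstack.org,ze01.openstack.org"'
      exclusive: yes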
Change-Id: I214482ce8931e697ada497048fcf12fa492b98b7
The purpose of the playbook is to update the system-config checkout, as
well as installing puppet modules and ansible roles.
Rename it, so that it's clearer what it does. Also, clean it up a bit.
We've gotten better at playbooks since we originally wrote this.
Change-Id: I793914ca3fc7f89cf019cf4cdf52acb7e0c93e60
There is a shared caching infrastructure in ansible now for inventory
and fact plugins. It needs to be configured so that our inventory access
isn't slow as dirt.
Unfortunately the copy of openstack.py in 2.6 is busted WRT caching
because the internal API changed ... and we didn't have any test jobs
set up for it. This also includes a fixed copy of the plugin and
installs it into a plugin dir.
Change-Id: Ie92e5d7eac4b7e4060a4e07cb29c5a6f2a16ae18
We put in IP restrictions on logging in as root on our servers. Add
bridge.openstack.org's IPs so that we can ansible from it.
Change-Id: Id1cd81c41806cd028d834fb56e1686687d3fb65d
We want to launch a new bastion host to run ansible on. Because we're
working on the transition to ansible, it seems like being able to do
that without needing puppet would be nice. This gets user management,
base repo setup and whatnot installed. It doesn't remove them from the
existing puppet, nor does it change the way we're calling anything that
currently exists.
Add bridge.openstack.org to the disabled group so that we don't try to
run puppet on it.
Change-Id: I3165423753009c639d9d2e2ed7d9adbe70360932
Remove some old ones which were in the wrong place and out of date.
Change-Id: I4303e66edc7d3dc00c455a0990b0b3be0f5f91a6
Depends-On: https://review.openstack.org/586699
We need to expand-contract our keypairs. This is the first of three
patches. The next will use this new keypair from nodepool. Then we can
remove the old one.
The new keypair object updates the ssh key for Shrews and removes
inactive old rooters.
Change-Id: I610e51b58a8b69c8d70c8be260e3a91e86247389
Packet Host and Platform 9 have generously agreed to donate some
compute resources to our testing efforts. Add Nodepool and
Puppetmaster credentials for them.
Change-Id: I705c4204abca060c35a1a417791a67229b78cd02
If a host is a member of the 'futureparser' group, pass the
'futureparser' option to the puppet role, which will turn on parser =
future in puppet.conf when manage_config is true and when the node isn't
already using puppet 4. Nodes can be added one at a time by adding them
to modules/openstack_project/files/puppetmaster/groups.txt.
Depends-On: https://review.openstack.org/572856
Change-Id: I54e19ef6164658da8e0e5bff72a1964b88b81242
Add a playbook to rerun install_puppet.sh with PUPPET_VERSION=4. Also
make the install_modules.sh script smarter about figuring out the puppet
version so that the update_puppet.yaml playbook, which updates the
puppet config and puppet modules but not the puppet package, does not
need to be changed.
When we're ready to start upgrading nodes, we'll add them to the puppet4
group in `modules/openstack_project/files/puppetmaster/groups.txt`.
Change-Id: Ic41d277b2d70e7c25669e0c07e668fb9479b8abf
Because we changed out the hostname of review.o.o for review01.o.o, our
current playbooks will be broken. To fix this moving forward, we can
just switch to the group 'review' which includes the review01.o.o
host.
Change-Id: I149eacbc759f95087f2b0a0e44fcf0b49cae7ad6
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
When running the playbook, it's not immediately clear which task is
running without names. Add names. Also, update the whitespace to be more
in-line with how we write playbooks for zuul.
Change-Id: Ia189b8da6ded882aeb1fcff4932a1f9586027f80
We no longer have any jobs or need to manage VMs in
tripleo-test-cloud-rh(1|2). This hardware will still eventually be
removed, so let's also remove it from our configuration.
Change-Id: I588ae945df15beceaf7a60bf6a65b1615b2074f0
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We have puppet configured to write reports when it runs. We used to
collect these and inject them into puppetdb. Since we don't do this
anymore, they're just a giant pile of files we never see.
Enable managing the puppet.conf file from ansible and then also turn off
the reports.
Change-Id: I55bef052bddc9b9ff5de76a4f0b2ec07f93f158c
Following on from I166d9f669ea88663d4ffe70e25a6e908d11cf35f, add to
the cloud launcher. For now just add keys and security (no special
network setup).
Add a default image to the control plane account, as the cloud
currently doesn't have a xenial-based image. It needs a few special
properties to boot.
Change-Id: I846632219cbeb1f56eb0648861db0bfea3de7c3b
Now that zuulv3.openstack.org has been replaced by the larger
zuul01.openstack.org server, the former can be cleaned out of our
configuration in preparation for server deletion.
Change-Id: Icc1d545906e5615e2a205b98f364a084e1d22895
Since Ansible host inventory globs match against both host names and
host groups, use the zuul-scheduler group when referring to
zuul01.openstack.org and similarly-named hosts so as to avoid
inadvertently matching all members of the "zuul" host group with
zuul* (which includes the executors and mergers). Continue to match
zuulv3.openstack.org separately for now as it's not in the
zuul-scheduler group (and soon to be deleted anyway).
Change-Id: I3127d121ea344e1eb37c700d37f873e34edbb86e
To avoid the need for regular expression matching, switch to a
simple glob of zuul* covering zuulv3 and zuul01 servers. Now that
zuul-dev and zuulv3-dev are gone, this glob will only match the two
remaining hosts mentioned.
Change-Id: I2749ffa6c0e4d2ea6626d1ebde1d7b3ab49378bb
In preparation for replacing the zuulv3.openstack.org host with a
larger instance, set up the necessary support in
Puppet/Hiera/Ansible. While we're here, remove or replace old
references to the since-deleted zuul.openstack.org instance, and
where possible update documentation and configuration to refer to
the new zuul.openstack.org CNAME instead of the zuulv3.openstack.org
FQDN so as to smooth the future transition.
Change-Id: Ie51e133afb238dcfdbeff09747cbd2e53093ef84
We don't need a clean workspaces playbook, nor do we need to do anything
with that during renames. We don't need to reference machines that don't
exist in ansible groups. The launcher ssh config is not used by
anything.
Change-Id: Id3e9cddb06b6e47b6f07d9a39086f3b054b46bde
With the migration to zuulv3, there is no more zuul-launcher. This has
become zuul-executor, which has been moved into production.
Servers have already been deleted; let's also remove it from puppet.
Change-Id: Id2b53decdc63712460049f5fa9ed751e049d17ff
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
The set_hostname playbook, used by the launch-node script, needs
facts to determine which package manager it should use to uninstall
cloud-init. Remove the line which disabled fact gathering so that we
can build servers again.
Change-Id: Ic971d456f6d04273c9b981518614130e9b1c5898
This removes remaining references to internap (renamed to inap).
It also updates some items (cacti/nodepool logging) that were missed
in the rename.
Change-Id: Ibafd416e9e55aa458a50eb71922065a35e3d99f4
Bump ansible-playbook runs to 10% of our compute nodes, this is ~12
nodes at a time. We also max failures out to 100% because we actually
want to run puppet across all nodes, regardless of what fails.
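That is (hosts pattern illustrative):

  - hosts: compute
    serial: "10%"
    max_fail_percentage: 100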
Change-Id: I74b294820d8cd342fd7e5466ee63f198177412b4
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We are having bandwidth issues in infracloud, lets experiment with
serial 1. We can adjust upwards if needed.
Change-Id: I89f0a1b197354e2d25d4f17ba29dd3da7d6586d4
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In order to provide increased proxy cache capacity, increase the
mirror flavor's disk size in Infra-cloud to 250GiB. Other providers
will get Cinder volumes added as needed.
Change-Id: I56130167e94237b93b3bdbfd1334eb97c76836fa
This should give us connectivity to the outside world with NAT'd
internal IP addressing.
Note that we can't add the router to the template because the external
network name will be different across clouds and we have to pass in the
subnet lists which may vary as well.
Change-Id: Iea225c71d0d8e644cbaf709554d02d130ad21c18
Currently puppet fails to run on our baremetal servers for infracloud.
While this is an issue, it should not block puppet from running on our
controller or compute nodes.
Change-Id: I190af6cfc63006cb03686cd501998e4e06d350b1
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We need to ensure ovh is properly setup with our SSH keypairs for
nodepool.
Change-Id: I2a02dfb5da2ac0af087d502ae8143047e3d1b12c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Copy our current infra-root list from user.pp into cloud_layouts.yml.
Change-Id: Ic339f6879782a9f9d7d92a445160c5b0949a698b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Because rackspace doesn't support security groups, we need to create
openstackci-keypairs.
Change-Id: I549c5e99554eb876b872a08989dc0345a799ff00
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Since we are moving forward with removing our baked in SSH keys for
our images, we now need to move our public keys into our clouds. This
will allow nodepool to inject them into metadata for glean.
Change-Id: I0ff9db47a0845ed9d038792383624af4bd34d525
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We are in the process of shutting down puppetdb.o.o, so stop pushing
reports to it.
Change-Id: Ib27b21c3fb2cd149e57432fd511129a5c8ecc3e9
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This fixes the issues we have with our rename_repos.yaml file. We are
also skipping additional failures for now, which will be cleaned up in
a follow up patch.
Change-Id: I726535e195a292e3f2d457f0ed039d01bb96c66b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Currently, if review.o.o takes more than 30 minutes to run puppet, it
will be aborted. Up this to 60 minutes.
Change-Id: I98e384544d5104572ad252b5dab88e06762b87a9
Depends-On: Id42ba80a5118a9f93e45619ac6ecc5baa774549a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
When I919ba42b0d22126719daa7ad308f75ce021720b7 merged, it introduced
a few regressions into our process:
* Github renaming/transferring was dropped
* Switched to a very slow (for our environment) Zuul stopping
method
* It advocated for composing a rename parameters file very late
in the process
This change fixes the above issues. It also updates the
documentation to note that Puppet should be stopped well in advance
of the maintenance window, and updates the playbook to no longer run
an offline Gerrit reindex (since online reindexing is now
supported).
Change-Id: Ie249214c0d1b1df6c66d4910002e35d8c17c3b69
In the infracloud, the Member role is not created by default.
We created that with a previous change by adding it to the launcher.
Now we associate that role with the openstackci/openstackzuul
users/projects, so those users are members of their corresponding
projects.
Change-Id: I9147b253c7f747f435c773932dc4a8aad1189799
We need to create these roles, so we can associate users with projects.
Change-Id: I29af32c9b0f99c584b6ed76b346b1b117d05b277
Depends-On: I2df8503bb713827f0f04691c2f259dc9541c9c83
The servers are still currently created by launch-node; I'll revert
this commit when I put the pre/post create/delete actions per resource
on the launcher role.
Change-Id: I0a6401c9d783b9c3876ebb1f9c8b144f75d7abb2
It was discussed with other members of the Infra team that this
file would be better placed in the playbooks folder, since the
run_launcher is located there.
Change-Id: I752ee592d3ffd8be4fd4ad29dbf73df443f28674
Now that we've confirmed ansible-playbook works as expected, let's
enable the free strategy by default.
While playbooks with single hosts will not benefit from this, we add
it to be consistent with our playbooks.
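That is, at the play level:

  - hosts: all
    strategy: free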
Change-Id: Ia6abdfaf5c122f88ead2272c8700e2c1f33c5449
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In an effort to improve performance, switch out strategy[1] to free.
This will allow each ansible host to run until the end of the play as
fast as it can.
[1] http://docs.ansible.com/ansible/playbooks_strategies.html
Change-Id: I86588154b71e69399be930fc78be7c17f54fd9dd
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Running this playbook on the puppetmaster, we consistently run into ssh
failures due to async reconnecting periodically and network issues
between hosts. We can address this by starting a single connection
without async and polling on that which appears to be the default
wait_for behavior. Testing of this seems to indicate it is more
reliable.
Change-Id: Iec72e2c0d099c0e28bc4b4b48608a03b3e66b4c0
Add support so we can run the playbook as non-root user.
Change-Id: I05af471417ba58a985c24dc0ea2c43f1c7e24a4b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We no longer need it as we don't have jenkins masters anymore.
Change-Id: I8117a6f4afb9f65a1400fad090594efd260c3bec
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We'll wait up to 3hr 10mins for zuul-launchers to shut down.
Change-Id: I880748704b6cae5a25c21326d6374ac71f4c9e1a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This is the runner for the ansible cloud launcher role.
Change-Id: Iad9ce14905e89cb875c0cf92dfd8093c3a8d4e1c
Depends-On: Ia775598090471b80be75624a6a6a0649622799e8
We're already on the host, and this defaults to localhost, so this
is simpler and doesn't go through the apache proxy.
Change-Id: Iac1047dc0a482d21466ab062f3aa3b0ef9144f38
Running puppet remotely in an ad-hoc manner on disabled hosts is mildly
complex. To facilitate, have a wide open playbook that we always run
with --limit - and a shell script to help us type less.
Change-Id: I629072dcada38d0465d351b1b99828466405372f
It's fine right now with 5, but over time if we keep a flat namespace,
which is not necessary, it's just going to get ugly.
Change-Id: I07a143f45f2eb100c231ea1b7dd617b40f8f231c
We are only deploying West for now, so just doing West.
When we get East in production, we will update this playbook.
Unfortunately there is no Ansible module or Puppet resource to set
quotas per-project, thus we use the regular shell module in Ansible.
Change-Id: Ib884508bebedc9f88fac242711af98fc0c4d95ec
Turns out we have had many issues with random servers having
wrong hostname and /etc/hosts info.
This playbook/role allows configuring that by passing
-e "target=<hostname>" as an ansible-playbook parameter.
Change-Id: I73939ebc65211a840bb41370c22b111112389716
In a clean deploy of infra cloud, the puppet environment
is not configured from scratch. That will prevent puppet from running
because it won't find /opt/system-config/production/modules.
The config option of the ansible role will properly configure all
settings before trying to apply them, and things will work properly.
Change-Id: I736e10623fb3ba90b3320cc20758a18c70930be0
Depends-On: I6cb8dff569f2cca8bca7359412d01cc7ec009c54
Without this patch, we would run infracloud in its playbook, then again
in the 'everybody else' playbook.
Co-Authored-By: Colleen Murphy <colleen@gazlene.net>
Change-Id: I3de1de8f0f74e52a443c0b7a6ef6ae0a2cf7e801
Add a separate playbook for infracloud nodes to ensure they run in the
correct order - baremetal -> controller -> compute.
Baremetal is intentionally left out; it is not ready yet.
All 'disabled' flags on infracloud hosts are turned off. This patch
landing turns on management of the infracloud.
Co-Authored-By: Yolanda Robla <info@ysoft.biz>
Co-Authored-By: Spencer Krum <nibz@spencerkrum.com>
Change-Id: Ieeda072d45f7454d6412295c2c6a0cf7ce61d952
The puppet ansible module is growing a flag to be able to send stdout to
syslog. It's growing that because we want to use it. Let's.
Change-Id: I22b1d0e1fb635f2c626d75a11764725c8753bf24
At long last, the day of reckoning is here. Run puppet apply and then
copy the log files back and post them to puppetdb.
Change-Id: I919fea64df0fbb8681e91ac9425b4c43760bb3dd
We don't need to rsync to ourselves. Best case it's a no-op. Worst case
something weird happens and we overwrite ourselves while running.
Change-Id: I890ea487d7a6129b7477b6d17b6a7e3c1904bade
When we do it as a second playbook, the failure to copy updated code
cannot prevent puppet from running.
Change-Id: I94b06988a20da4c0c2cf492485997ec49c3dca13
Depends-On: I22b7a21778d514a0a1ab04a76f03fdc9c58a05b3
There are a few things that are run as part of run_all.sh that are
not logged into puppet_run_all.log - namely git cloning, module installation
and ansible role installation. Let's go ahead and do those in a playbook
so that we can see their output while we're watching the log file.
Change-Id: I6982452f1e572b7bc5a7b7d167c1ccc159c94e66
We're not ready to move from puppet inventory to openstack inventory
just yet, so don't actually swap the dynamic inventory plugin. But, add
it to the system so that running manual tests of all of the pieces is
possible.
Add the currently administratively disabled hosts to the disabled group
so that we can verify this works.
Change-Id: I73931332b2917b71a008f9213365f7594f69c41e
One step before flipping the switch, start copying hieradata, even
though we're still using agent, so that we can verify as much as we
want.
Change-Id: Iae63fd056cdb17aedd6526b9cbc1d83037ddcbb3
We use a symlink into /opt/system-config to make the hiera.yaml config
sane. Make sure it's there.
Change-Id: I5e9681ac8fca71ce2f439eed3ef1281ba228d5b2
If we're going to run puppet apply on all of our nodes, they need
the puppet modules installed on them first.
Change-Id: I84b80818fa54d1ddc4d46fead663ed4212bb6ff3
As we're using these roles, we'll want to pass potentially different
values to different of our hosts over time. For instance, we may want to
set the jenkins servers to start using puppet apply before we get all
the hosts there. Since we run most of the hosts in a big matching
mechanism, the way we can pass different input values to each host.
Change-Id: I5698355df0c13cd11fe5987787e65ee85a384256
/etc/ansible/playbooks isn't actually a thing, it was just a convenient
place to put things. However, to enable puppet apply, we're going to
want a group_vars directory adjacent to the playbooks, so having them be
a subdirectory of the puppet module and installed by it is just extra
complexity. Also, if we run out of system-config, then it'll be easier
to work with things like what we do with puppet environments for testing
things.
Change-Id: I947521a73051a44036e7f4c45ce74a79637f5a8b