initial commit

playbooks, scripts, etc. from the 2nd check_mk conference in munich,
germany.
Author: Marius Pana, 2015-10-28 16:37:02 +02:00
parent bf45e8886f
commit 0a123abbcb
22 changed files with 467 additions and 0 deletions

README.md (new file)
@@ -0,0 +1,84 @@
# INTRO

These are Ansible playbooks for deploying an OMD instance as well as a simple HAProxy load balancer and two web servers. They are the playbooks Marius Pana used at the 2nd Check_MK conference in Munich, Germany. The presentation will be made available online shortly for those who are interested.

Alert handlers (as defined by Check_MK) can be used from within Check_MK to trigger specific handlers (as defined by Ansible) in these playbooks, providing a simple feedback loop for self-healing.

*We are still looking for a good mapping of services between Check_MK and Ansible. One suggestion was to use service attributes (Nagios macros) that could be mapped one-to-one to Ansible tags. As soon as we have something functional we will update this; if anyone else has ideas we are interested in hearing them. A rough sketch of the idea follows.*
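
A hypothetical sketch of that mapping, assuming a custom macro such as _ANSIBLE_TAG is set on the service and that your Check_MK version exposes it to the alert handler as an environment variable (shown here as SERVICE__ANSIBLE_TAG, which you will need to adjust for your installation):

```
#!/bin/bash
# Sketch: map a per-service custom macro to an Ansible tag.
# Falls back to the httpd tag used by services.sh if the macro is unset.
TAG="${SERVICE__ANSIBLE_TAG:-httpd}"
ansible-playbook -i /omd/sites/prod/ansible/cmkconfinv \
    /omd/sites/prod/ansible/site.yml -l webservers -u root --tags "$TAG"
```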

These examples are fairly simple but can and should be expanded with more logic for repairing your specific systems and services. We intend them as a starting point.

## About these playbooks

We assume you are using a RedHat-based distro. These playbooks will deploy an OMD instance on a freshly installed system and configure HAProxy to load balance between two Apache web servers.

We do not do the initial provisioning via these playbooks, but this could be added in the future (e.g. deploying to Joyent, Cobbler or others). In other words, we expect the systems to be freshly installed and configured with a root user that is allowed SSH access, as defined in the cmkconfinv (inventory) file.

### ansible inventory file

The cmkconfinv file contains our inventory. In it we define three groups of hosts; for each host we set the IP address where it can be reached and a variable named folder, which is the OMD/WATO folder we create for it via the WATO API.

The hosts must be installed and reachable before running these playbooks. You will also need to know the root user's password.
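
For example, this is what one of the groups in the cmkconfinv file of this repository looks like:

```
[webservers]
web1 ansible_ssh_host=10.88.88.128 folder=webservers
web2 ansible_ssh_host=10.88.88.129 folder=webservers
```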

## Prerequisites

Make sure you change the users and SSH keys via the common role: upload your SSH public keys to roles/common/files/ssh_keys and edit roles/common/vars/usersandpsks.yml.
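
That file currently looks like this (replace the user name and key path with your own):

```
---
usersAdd:
  - mariusp
usersDel:
  - none
usersPSK:
  - name: mariusp
    psk: ["../files/ssh_keys/mariusp.pub"]
```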

### Ansible

You will need a working Ansible setup. Setting it up can be as easy as cloning the Ansible repo or installing it via your operating system's package manager. More information about installing Ansible can be found here: http://docs.ansible.com/ansible/intro_installation.html .

You will also need to clone this repository to play around with these playbooks.
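
On a RedHat-based system this can be as simple as the following (the repository URL is a placeholder for wherever you obtained these playbooks):

```
yum install epel-release
yum install ansible
git clone <url-of-this-repository>
```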

### Check_MK

We assume you are using the CEE (Check_MK Enterprise Edition). While this should work with any recent version of Check_MK, we are specifically targeting the current innovation branch (1.2.7i) because of the new Alert Handlers (Werk #8275).

If you would like to deploy your OMD instance via these playbooks, you will need to download the Check_MK CEE in RPM format and place it in the following directory:

> roles/omd/files
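
Note that the RPM file name is hard-coded in roles/omd/tasks/main.yml (check-mk-enterprise-1.2.7i3p1-el6-36.x86_64.rpm); if you downloaded a different version, adjust that file accordingly, for example:

> cp check-mk-enterprise-<version>.rpm roles/omd/files/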

## Deploying OMD via Ansible

This is a very simple way to deploy an OMD instance and create a site named "prod". The following command will deploy OMD to your preinstalled server. It will prompt for the root user's password (-k).

> ansible-playbook -i cmkconfinv site.yml -l omd -u root -k --skip-tags check_mk_agent,check_mk_discovery,check_mk_apply

Notice the use of the --skip-tags switch, which is necessary because on this first run there is no OMD instance yet from which to pull the agent, run the discovery, and so on.

You now need to create an automation user in your Check_MK site and put its credentials in the roles/omd/vars/main.yml file.
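
That file ships with the values used for the conference demo; replace at least automationuser and autosecret with the credentials of the automation user you just created (the placeholders below are illustrative):

```
---
automationuser: "<your automation user>"
autosecret: "<the automation secret from WATO>"
omdhost: "10.88.88.150"
omdsite: "prod"
rpmagent: "check-mk-agent-1.2.7i3p1-1.noarch.rpm"
```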

Now we can deploy the check_mk_agent to our monitoring instance as well. Notice we are running just the check_mk_agent, check_mk_discovery and check_mk_apply steps now. Also, after bootstrapping your systems you can use your own user if you created one and uploaded its SSH keys; in that case you can run Ansible with sudo (-u <username> -s instead of -u root).

> ansible-playbook -i cmkconfinv site.yml -l omd -u root --tags check_mk_agent,check_mk_discovery,check_mk_apply
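
For example, the same step run with your own user and sudo would look something like this (<username> being the user you configured in the common role):

> ansible-playbook -i cmkconfinv site.yml -l omd -u <username> -s --tags check_mk_agent,check_mk_discovery,check_mk_apply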

## Deploying the webserver and loadbalancer

The following will configure your web servers and loadbalancer. It will prompt for the root user's password (-k). Once it is done, your OMD instance should contain 4 hosts (1 OMD, 2 web servers and 1 loadbalancer) with their services monitored.

> ansible-playbook -i cmkconfinv site.yml -l loadbalancers,webservers -u root -k

## Check_MK Alert Handlers

We have created two alert handlers to showcase two different scenarios:

1. services.sh - restarting the Apache web service if it has failed
2. instantiate.sh - deploying a new loadbalancer if the existing one fails (state DOWN)

These are specific to the setup used for the presentation at the conference; however, they serve as a good starting point.

Add the following two alert handlers to your Check_MK site and place the scripts in ~/local/share/check_mk/alert_handlers, making sure they are executable.
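
From within the OMD site (as the site user) that means something along these lines:

> cp services.sh instantiate.sh ~/local/share/check_mk/alert_handlers/ && chmod +x ~/local/share/check_mk/alert_handlers/*.sh
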
services.sh
![image of services.sh ](http://i67.tinypic.com/jgqqzm.png)
```
#!/bin/bash
ansible-playbook -i /omd/sites/prod/ansible/cmkconfinv /omd/sites/prod/ansible/site.yml -l webservers -u root --tags httpd
```
instantiate.sh
![image of instantiate.sh ](http://i65.tinypic.com/14c9s8w.png)
```
#!/bin/bash
ssh root@10.88.88.145 vmadm create -f /etc/zones/loadbalancer.json
ansible-playbook -i /omd/sites/prod/ansible/cmkconfinv /omd/sites/prod/ansible/site.yml -l loadbalancers -u root
```

The first line of instantiate.sh is specific to my setup, which uses SmartOS reachable at 10.88.88.145. There I have already created a manifest file (loadbalancer.json) from which vmadm creates a new loadbalancer instance. You will want to change this for your particular setup.

## TODO

You may notice two extra check_mk checks named up_scale and down_scale on your loadbalancer instance. These are not finished yet, but they are an example of how you could use Check_MK and Ansible to do autoscaling: based on the feedback received via your monitoring, you can bring additional instances up or take them down. This is a work in progress and will be updated in the near future. The Ansible tags for this are add_backend and del_backend; they may be useful if you plan on extending these playbooks.
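
The corresponding handlers in roles/loadbalancers/handlers/main.yml expect the backend line to be passed in as an extra variable, so a hypothetical (and so far untested) invocation could look roughly like this, with web3/10.88.88.130 being an invented example host:

> ansible-playbook -i /omd/sites/prod/ansible/cmkconfinv /omd/sites/prod/ansible/site.yml -l loadbalancers -u root --tags add_backend -e "new_server='    server web3 10.88.88.130:80 check'"
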
There are certainly more things to be done here ...

bootstrap.yml (new file)
@@ -0,0 +1,7 @@
---
# file: bootstrap.yml
- hosts: all
  #vars:
  vars_files: [roles/common/vars/usersandpsks.yml, roles/omd/vars/main.yml]
  roles:
    - common

cmkconfinv (new file)
@@ -0,0 +1,9 @@
[loadbalancers]
lb01 ansible_ssh_host=10.88.88.127 folder=loadbalancers

[webservers]
web1 ansible_ssh_host=10.88.88.128 folder=webservers
web2 ansible_ssh_host=10.88.88.129 folder=webservers

[omd]
omd ansible_ssh_host=10.88.88.150 folder=omd

loadbalancers.yml (new file)
@@ -0,0 +1,7 @@
---
# file: loadbalancers.yml
- hosts: loadbalancers
  vars_files: [roles/common/vars/usersandpsks.yml, roles/omd/vars/main.yml]
  roles:
    - common
    - loadbalancers

omd.yml (new file)
@@ -0,0 +1,7 @@
---
# file: omd.yml
- hosts: omd
  #vars:
  #vars_files:
  roles:
    - omd

roles/common/files/ssh_keys/mariusp.pub (new file)
@@ -0,0 +1 @@
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/asZXkhLJVGIcPQGUxZDLl/yMwslgn6GyJd6QGKUmR+Snr1hMz01y7WEWPvfXXUqNym6rMU5fAMUr+alcyzMGZYKyymTLfjgp0SUuWG3TGpl3EPxnfGwNcXOvuJE9cnY0q3nhZgQjvn6EdEFDKAmLG1WXlKYjbQUUrHp0wFvEx3TNIXMVJqHxbKi8Uwyvn5EB1emdeJkaAaXJbk1TxALu400Ts0KYJUUyMn5njJjVELwtPVsnb0skmKSXd4dgBLN+wo94YQLpdfCnmho0uPhZfTHHi0+jtJNtUSycOSuOr/TxYGirxOYcb5FoOvzg9L0RyQAj6O+Hzs3RkHB+qast mariusp@marduk.local

roles/common/handlers/main.yml (new file)
@@ -0,0 +1,15 @@
---
# file: roles/common/handlers/main.yml
- name: restart ntp
  service: name=ntpd state=restarted
  tags:
    - ntpd
- name: restart xinetd
  service: name=xinetd state=restarted
  tags: xinetd
- name: restart sshd
  service: name=sshd state=restarted
  tags:
    - sshd

roles/common/tasks/main.yml (new file)
@@ -0,0 +1,73 @@
---
# file: roles/common/tasks/main.yml
- name: make sure ntp, epel, etc. are installed
  yum: pkg={{ item }} state=installed
  with_items:
    - ntp
    - xinetd
    - epel-release
    #- screen
    #- vim-enhanced
    #- mc
  tags: packages
- name: add sphs group
  action: group name=sphs state=present
- name: add our users
  action: user name={{ item }} groups=sphs state=present append=yes
  with_items: usersAdd
  when: item != 'none'
- name: Add SSH public key to user mariusp
  action: authorized_key user=mariusp key="{{ lookup('file', '../files/ssh_keys/mariusp.pub') }}"
- name: Remove users
  action: user name={{ item }} state=absent remove=yes
  with_items: usersDel
  when: item != 'none'
# Enable sudo for sphs group with no password
- name: Enable sudo without password for sudo group
  action: 'lineinfile "dest=/etc/sudoers" state=present regexp="^%sphs ALL" line="%sphs ALL=(ALL) NOPASSWD: ALL"'
- name: install check_mk agent
  yum: pkg=http://{{ omdhost }}/{{ omdsite }}/check_mk/agents/{{ rpmagent }} state=installed
  tags:
    - check_mk_agent
# change to get_url - do some error checking
- name: add host to omd
  uri:
    method: POST
    body_format: json
    url: http://{{omdhost}}/{{omdsite}}/check_mk/webapi.py?action=add_host&_username={{automationuser}}&_secret={{autosecret}}
    body: 'request={"attributes":{"alias":"{{inventory_hostname}}","ipaddress":"{{ansible_default_ipv4["address"]}}"},"hostname":"{{inventory_hostname}}","folder":"{{folder}}"}'
  delegate_to: 127.0.0.1
  tags:
    - check_mk_agent
  notify:
    - cmk_discovery
    - cmk_apply
- name: cmk_discovery
  uri:
    method: POST
    url: http://{{ omdhost }}/{{ omdsite }}/check_mk/webapi.py?action=discover_services&_username={{ automationuser }}&_secret={{ autosecret }}&mode=refresh
    body: 'request={"hostname":"{{ inventory_hostname }}"}'
    body_format: json
    status_code: 200
  tags:
    - check_mk_discovery
  delegate_to: 127.0.0.1
- name: cmk_apply
  uri:
    method: POST
    url: http://{{ omdhost }}/{{ omdsite }}/check_mk/webapi.py?action=activate_changes&_username={{ automationuser }}&_secret={{ autosecret }}&mode=specific
    body: request={"sites":["{{ omdsite }}"]}
    body_format: json
    status_code: 200
  tags:
    - check_mk_apply
  delegate_to: 127.0.0.1

roles/common/vars/usersandpsks.yml (new file)
@@ -0,0 +1,8 @@
---
usersAdd:
  - mariusp
usersDel:
  - none
usersPSK:
  - name: mariusp
    psk: ["../files/ssh_keys/mariusp.pub"]

roles/loadbalancers/files/check_ha.sh (new file)
@@ -0,0 +1,24 @@
#!/bin/bash
CONN=`echo "show info" | socat /var/lib/haproxy/stats stdio | grep CurrConns | cut -d' ' -f2`
SRVS=`cat /etc/haproxy/haproxy.cfg | grep check | grep server | wc -l`
if [ $CONN = 0 ]; then
    CONN=4
fi
if [ $SRVS = 0 ]; then
    echo "<<<up_scale>>>"
    echo "up_scale 1000"
    echo "<<<down_scale>>>"
    echo "down_scale 1000"
else
    let "CONNPERSRV=$CONN/$SRVS"
    echo "<<<up_scale>>>"
    echo "up_scale $CONNPERSRV"
    if [ $SRVS -le 2 ]; then
        echo "<<<down_scale>>>"
        echo "down_scale 16"
    else
        echo "<<<down_scale>>>"
        echo "down_scale $CONNPERSRV"
    fi
fi

roles/loadbalancers/files/haproxy.cfg.j2 (new file)
@@ -0,0 +1,57 @@
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats mode 644 level admin
    stats timeout 2m

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 3000

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend main *:5000
    acl url_static path_beg -i /static /images /javascript /stylesheets
    acl url_static path_end -i .jpg .gif .png .css .js
    #use_backend static if url_static
    #default_backend appname

##
listen appname 0.0.0.0:80
    mode http
    stats enable
    stats uri /haproxy?stats
    stats realm Strictly\ Private
    stats auth marius:marius
    balance roundrobin
    option httpclose
    option forwardfor
    # we are adding our hosts manually ..
    # we could populate this dynamically from our inventory
    server web1 10.88.88.128:80 check
    server web2 10.88.88.129:80 check

roles/loadbalancers/handlers/main.yml (new file)
@@ -0,0 +1,29 @@
---
- name: restart haproxy
  service: name=haproxy state=restarted
  tags: haproxy
# at execution, the calling script sets a variable via -e, e.g. new_server=' server web2 10.88.88.129:80 check'
- name: add_backend
  action: 'lineinfile "dest=/etc/haproxy/haproxy.cfg" state=present regexp="{{new_server}}" line="{{new_server}}"'
  tags:
    - add_backend
# at execution, the calling script sets a variable via -e, e.g. old_server=' server web2 10.88.88.129:80 check'
- name: del_backend
  action: 'lineinfile "dest=/etc/haproxy/haproxy.cfg" state=absent regexp="{{old_server}}" line="{{old_server}}"'
  tags:
    - del_backend
- name: cmk_discovery
  command: curl 'http://{{ omdhost }}/{{ omdsite }}/check_mk/webapi.py?action=discover_services&_username={{ automationuser }}&_secret={{ autosecret }}&mode=refresh' -d 'request={"hostname":"{{ inventory_hostname }}"}'
  tags:
    - check_mk_agent
    - check_mk_discovery
- name: cmk_apply
  command: curl 'http://{{ omdhost }}/{{ omdsite }}/check_mk/webapi.py?action=activate_changes&_username={{ automationuser }}&_secret={{ autosecret }}&mode=specific' -d 'request={"sites":["{{ omdsite }}"]}'
  tags:
    - check_mk_agent
    - check_mk_discovery

roles/loadbalancers/tasks/main.yml (new file)
@@ -0,0 +1,22 @@
---
- name: make sure haproxy and socat are installed
  yum: name={{ item }} state=latest
  with_items:
    - socat
    - haproxy
  tags: packages
- name: copy haproxy configuration files
  copy: src=../files/haproxy.cfg.j2 dest=/etc/haproxy/haproxy.cfg backup=yes mode=0644
  notify:
    - restart haproxy
- name: deploy check_ha.sh (autoscale)
  copy: src=../files/check_ha.sh dest=/usr/lib/check_mk_agent/plugins/check_sa.sh mode=755
  tags: check_sa
  notify:
    - cmk_discovery
    - cmk_apply
- name: enable haproxy
  service: name=haproxy enabled=yes state=started

roles/omd/tasks/main.yml (new file)
@@ -0,0 +1,27 @@
---
# file: roles/omd/tasks/main.yml
# we need omd host/site from omd role
- include_vars: roles/omd/vars/main.yml
- name: make sure epel is installed
  yum: pkg={{ item }} state=installed
  with_items:
    - epel-release
  tags: packages
- name: upload omd package
  copy: src=roles/omd/files/check-mk-enterprise-1.2.7i3p1-el6-36.x86_64.rpm dest=/tmp
- name: install omd server
  yum: name=/tmp/check-mk-enterprise-1.2.7i3p1-el6-36.x86_64.rpm state=present
# might be nice to create an ansible module for omd
- name: create prod instance
  command: /usr/bin/omd create prod
  tags:
    - omdcreate
- name: start our prod instance
  command: /usr/bin/omd start prod
  tags:
    - omdstart

roles/omd/vars/main.yml (new file)
@@ -0,0 +1,7 @@
---
automationuser: "automaton"
autosecret: "GUVKRNECLRGFBTQJCRFY"
omdhost: "10.88.88.150"
#omdhost: "192.168.217.129"
omdsite: "prod"
rpmagent: "check-mk-agent-1.2.7i3p1-1.noarch.rpm"

roles/webservers/files/konf.jpg (new binary file, 85 KiB, not shown)

roles/webservers/files/status.conf.j2 (new file)
@@ -0,0 +1,6 @@
<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 10.88.88.150 127.0.0.1 ::1
</Location>

roles/webservers/handlers/main.yml (new file)
@@ -0,0 +1,20 @@
---
- name: restart httpd
  service: name=httpd state=restarted
  tags:
    - httpd
  notify:
    - cmk_discovery
    - cmk_apply
- name: cmk_discovery
  command: curl 'http://{{ omdhost }}/{{ omdsite }}/check_mk/webapi.py?action=discover_services&_username={{ automationuser }}&_secret={{ autosecret }}&mode=refresh' -d 'request={"hostname":"{{ inventory_hostname }}"}'
  tags:
    - check_mk_agent
    - check_mk_discovery
- name: cmk_apply
  command: curl 'http://{{ omdhost }}/{{ omdsite }}/check_mk/webapi.py?action=activate_changes&_username={{ automationuser }}&_secret={{ autosecret }}&mode=specific' -d 'request={"sites":["{{ omdsite }}"]}'
  tags:
    - check_mk_agent
    - check_mk_discovery

roles/webservers/tasks/main.yml (new file)
@@ -0,0 +1,36 @@
---
- name: make sure httpd is installed
  yum: name=httpd state=latest
  tags: httpd
- name: enable httpd
  service: name=httpd enabled=yes state=started
  tags:
    - httpd
- name: enable http status
  copy: src=../files/status.conf.j2 dest=/etc/httpd/conf.d/status.conf backup=yes mode=0644
  notify:
    - restart httpd
  tags:
    - http_status
    - cmk_discovery
    - cmk_apply
- name: add apache_status plugin
  get_url: url=http://{{ omdhost }}/{{ omdsite }}/check_mk/agents/plugins/apache_status dest=/usr/lib/check_mk_agent/plugins/apache_status mode=0755
  tags:
    - apache_status
  notify:
    - cmk_discovery
    - cmk_apply
- name: copy images to sites
  copy: src=../files/konf.jpg dest=/var/www/html/ mode=0644
  tags:
    - webcontent
- name: copy index.html to sites
  template: src=../templates/index.html.j2 dest=/var/www/html/index.html mode=0644
  tags:
    - webcontent

roles/webservers/templates/index.html.j2 (new file)
@@ -0,0 +1,16 @@
<html>
  <head>
    <title>Welcome to the 2ND Check_MK Conference!</title>
  </head>
  <body>
    <img src="konf.jpg" />
    <br />
    <br />
    <strong>Welcome to the 2ND Check_MK Conference!</strong>
    <br />
    <p><h3>I'm running on {{ inventory_hostname }}.</h3></p>
    <p>Running on {{ ansible_os_family }} ;-}</p>
  </body>
</html>

site.yml (new file)
@@ -0,0 +1,5 @@
---
- include: bootstrap.yml
- include: webservers.yml
- include: loadbalancers.yml
- include: omd.yml

webservers.yml (new file)
@@ -0,0 +1,7 @@
---
# file: webservers.yml
- hosts: webservers
  vars_files: [roles/common/vars/usersandpsks.yml, roles/omd/vars/main.yml]
  roles:
    - common
    - webservers