Ansible and EC2
There are a few must-haves to switch from Chef to Ansible, and a few nice-to-haves:
- A replacement for Chef search.
- The ability to create and manage auto-scaling infrastructure on EC2.
- A way to bootstrap inventory without a central master.
A basic application pattern is a central server and a set of workers. To provision this on EC2, we need to do a few things:
- Create a security group
- Spin up a central server instance
- Spin up worker instances
- Set up autoscale groups
- Provision the workers brought up by the autoscale group
Setting up the EC2 resources is easier in Ansible than in Chef. A few things to note below:
- The source code for this post is at https://github.com/tjheeta/ansible_ec2_example. You’ll need to have boto and Ansible installed to run it.
- I’m just going to go over the gotchas I found, not the actual source, because that should be straightforward.
- The first playbook is executed locally; it’s essentially just running the equivalent boto commands. Make sure boto is set up with the proper keys:
```
➜ ~ cat ~/.boto
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```
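If the keys are missing or the file doesn’t parse, boto fails with a fairly cryptic error before any EC2 call is made. A quick stdlib-only sanity check (a hypothetical helper, not part of the example repo, shown with Python 3’s configparser) can confirm the file has what boto expects:

```python
# check_boto_creds.py - hypothetical sanity check for ~/.boto, not part of the repo.
import configparser

def check_boto_credentials(text):
    """Return True if the [Credentials] section contains both AWS keys."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    if not cfg.has_section("Credentials"):
        return False
    return (cfg.has_option("Credentials", "aws_access_key_id")
            and cfg.has_option("Credentials", "aws_secret_access_key"))

sample = """\
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
"""
print(check_boto_credentials(sample))  # True
```

To check the real file, read `~/.boto` with `open(os.path.expanduser("~/.boto")).read()` and pass it in.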
The following sample from the GitHub repo does a few things:
- Creates the security group
- Adds the main instance to the security group
- Creates a userdata script, containing the main instance’s IP address, for the workers.
- Creates the EC2 launch configuration, which passes the main instance’s IP address to workers via that userdata script.
The example is below:
```yaml
---
# Basic provisioning example
- name: Create AWS resources
  hosts: localhost
  connection: local
  gather_facts: False
  tasks:
    - name: Create rabbit security group
      local_action:
        module: ec2_group
        name: rabbit_fw
        description: "rabbit_fw"
        region: "{{aws_region}}"
        rules:
          - proto: tcp
            type: ssh
            from_port: 22
            to_port: 22
            cidr_ip: 0.0.0.0/0
        rules_egress:
          - proto: all
            type: all
            cidr_ip: 0.0.0.0/0
      register: rabbit_fw

    - name: Create worker security group
      local_action:
        module: ec2_group
        name: worker_fw
        description: "worker_fw"
        region: "{{aws_region}}"
        rules:
          - proto: tcp
            type: ssh
            from_port: 22
            to_port: 22
            cidr_ip: 0.0.0.0/0
        rules_egress:
          - proto: all
            type: all
            cidr_ip: 0.0.0.0/0
      register: worker_fw

    - debug: var=worker_fw

    - name: create rabbitmq instance
      local_action:
        module: ec2
        key_name: "{{key_name}}"
        region: "{{aws_region}}"
        #group_id: "{{rabbit_fw.group_id}}"
        group: [ "default", "rabbit_fw" ]
        instance_type: "{{instance_type}}"
        instance_tags:
          group: rabbit
        count_tag:
          group: rabbit
        exact_count: 1
        image: "{{ami_id}}"
        wait: yes
      register: ec2host_rabbit

    - name: setup the internal rabbit ip script for userdata
      local_action: template src=worker_init_script.j2 dest=/tmp/init_worker.sh
      with_items: ec2host_rabbit.tagged_instances

    - debug: var=ec2host_rabbit

    - add_host: hostname={{ item.public_ip }} groupname=rabbit
      with_items: ec2host_rabbit.tagged_instances

    # Can't combine these for some reason
    - name: wait for rabbit instances to listen on port 22
      wait_for: state=started host={{ item.public_ip }} port=22
      with_items: ec2host_rabbit.tagged_instances

    - name: setup the launch config for the worker autoscale group with the user_data script
      ec2_lc:
        name: worker_lc
        image_id: "{{ ami_id }}"
        key_name: "{{key_name}}"
        region: "{{aws_region}}"
        security_groups: "default,worker_fw"
        instance_type: "{{ instance_type }}"
        user_data: "{{ lookup('file', '/tmp/init_worker.sh') }}"
      tags: launch_config

    - name: setup the autoscaling group
      ec2_asg:
        name: worker_asg
        health_check_period: 60
        launch_config_name: worker_lc
        min_size: 1
        max_size: 1
        region: "{{aws_region}}"
      tags: autoscale_group

- name: configure the rabbit
  hosts: rabbit
  remote_user: ubuntu
  sudo: true
  roles:
    - rabbit

#- name: configure the workers
#  hosts: workers
#  remote_user: ubuntu
#  sudo: true
#  roles:
#    - worker
```
There are a few things to note:
- count_tag only works in the form given in the sample above: exact_count together with a matching instance_tags entry. Get this wrong and the instance counting will not behave as expected.
- The tag shows up in the EC2 console under Tags -> group : rabbit.
- The autoscale group’s instances show up in the EC2 console under Tags -> worker_asg.
- The inventory directory contains two files: hosts and ec2.py.
- The ec2.ini file in the GitHub repo is set up to query only us-east, which speeds up inventory generation slightly.
- If the inventory seems to be out of date, re-run the ec2.py script with:

```
./inventory/ec2.py --refresh-cache
```
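For reference, ec2.py groups hosts by their EC2 tags, so the group: rabbit tag applied in the playbook becomes an inventory group named tag_group_rabbit. A toy illustration of the JSON shape it emits (the hosts and IPs here are made up; only the grouping convention matters):

```python
# Illustrative shape of the JSON emitted by ./inventory/ec2.py --list.
# Sample data only: a tag of group=rabbit on an instance produces the
# inventory group "tag_group_rabbit"; region groups also appear.
sample_inventory = {
    "tag_group_rabbit": ["54.0.0.10"],
    "tag_group_worker": ["54.0.0.11", "54.0.0.12"],
    "us-east-1": ["54.0.0.10", "54.0.0.11", "54.0.0.12"],
    "_meta": {
        "hostvars": {
            "54.0.0.10": {"ec2_private_ip_address": "10.0.0.10"},
        }
    },
}

def hosts_in_group(inventory, group):
    """Return the hosts ec2.py placed in the given inventory group."""
    return inventory.get(group, [])

print(hosts_in_group(sample_inventory, "tag_group_rabbit"))  # ['54.0.0.10']
```

This is why the playbook above can target `hosts: rabbit` after the add_host task, and why a later run could target the tag group directly.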
The problem comes with configuring the workers via userdata scripts. With Chef, you can easily bootstrap from a userdata script as outlined in this post. Also, with a master server keeping track of IPs, we would know the IP of the rabbit server.
Ansible is masterless, which was initially annoying, then awesome, and is now back to being annoying. There is ansible-pull, which can pull down a git repo and then execute commands. There is also Ansible Tower, which we are going to firmly avoid because we want this to be easy and preferably free for those one-off projects that seem to happen. There’s an interesting post about using Ansible with knockd. Also, if we’re going to manage thousands of hosts, we aren’t going to do it from a laptop; essentially, we’ll need to set up a host internally that manages groups. There’s also a project on GitHub, semaphore, that tries to implement an alternative to Tower.
There are a few ways a host fired up by autoscale could learn the correct IP for rabbit:
- Set up the role-to-bucket relationship for the userdata script, same as Chef, and put the IP address of rabbit in S3.
- Set up GitHub with a read-only key to certain private projects and set up git-pull.1
- Set up an internal trusted server to pull down data.
- Instead of using autoscale groups, run an Ansible machine from cron every few minutes to make sure the correct number of hosts is up.
- A combination of the above.
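The first option could look something like the following: the provisioning machine pushes the rabbit IP to S3, and each worker’s userdata script reads it back at boot. A rough sketch with boto (the bucket and key names are made up, IAM/error handling is omitted, and none of this is from the example repo):

```python
# Sketch of the S3 hand-off for option one. "my-deploy-bucket" and
# "rabbit_ip" are hypothetical names. publish_rabbit_ip would run from the
# provisioning playbook's machine; fetch_rabbit_ip from each worker at boot.

def is_valid_ipv4(s):
    """Cheap sanity check before trusting what came back from S3."""
    parts = s.split(".")
    return len(parts) == 4 and all(
        p.isdigit() and 0 <= int(p) <= 255 for p in parts)

def publish_rabbit_ip(ip, bucket_name="my-deploy-bucket", key_name="rabbit_ip"):
    import boto  # imported lazily so is_valid_ipv4 works without boto installed
    bucket = boto.connect_s3().get_bucket(bucket_name)
    key = bucket.new_key(key_name)
    key.set_contents_from_string(ip)

def fetch_rabbit_ip(bucket_name="my-deploy-bucket", key_name="rabbit_ip"):
    import boto
    key = boto.connect_s3().get_bucket(bucket_name).get_key(key_name)
    data = key.get_contents_as_string()
    if isinstance(data, bytes):
        data = data.decode()
    if not is_valid_ipv4(data):
        raise ValueError("unexpected payload in s3://%s/%s" % (bucket_name, key_name))
    return data
```

The fetch side would be called from something like the worker_init_script.j2 userdata template before the worker process starts.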
I’m not sure which route is best yet, but one thing is for sure: Ansible Tower is too expensive for small, simple projects that require some auto-scaling.
tl;dr - Ansible works fine without search and sets up EC2 fine, but it needs a way to call back and configure workers created by the autoscale group.
1. Will still need to manage the IP address of the rabbit server somehow, through elastic IPs, DNS, etc.