Mar 15, 2014 · 2 minute read
I’ve been a big fan of Vagrant since its
initial release and still find myself using it for various tasks.
Recently I’ve been using it to test collections of Puppet modules. For a
single host
vagrant-serverspec is
excellent. Simply install the plugin, add a provisioner and write your
serverspec tests. The serverspec provisioner
looks like the following:
config.vm.provision :serverspec do |spec|
  spec.pattern = '*_spec.rb'
end
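As an illustrative sketch, a spec checking that the ntp package made it onto the guest might look like this (a hypothetical example, assuming the plugin has already set up the serverspec helpers for you):

# ntp_spec.rb - hypothetical example; picked up by the *_spec.rb pattern above
describe package('ntp') do
  it { should be_installed }
end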
But I also found myself wanting to test behaviour from the host
(serverspec tests are run on the guest), and also wanted to write tests
that checked the behaviour of a multi-box setup. I started by simply
writing some Cucumber tests which I ran locally,
but I decided I wanted this integrated with vagrant. Enter
vagrant-cucumber-host.
This implements a new vagrant provisioner which runs a set of cucumber
features locally.
config.vm.provision :cucumber do |cucumber|
  cucumber.features = []
end
Just drop your features in the features folder and run vagrant
provision. If you just want to run the cucumber features, without any
of the other provisioners running, you can use:
vagrant provision --provision-with cucumber
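For a flavour of what a host-side feature might look like, here is a hypothetical sketch (the IP address and steps are made up, and would be backed by your own step definitions under features/step_definitions):

Feature: Multi-box web service
  Scenario: The web box responds from the host
    When I make an HTTP request to http://192.168.33.10/
    Then the response status should be 200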
Another advantage of writing this as a vagrant plugin is that it uses
the Ruby bundled with vagrant, meaning you just install the plugin
rather than faff about with a local Ruby install.
A couple of other vagrant plugins that I’ve used to make the testing
setup easier are vagrant-hostsupdater
and vagrant-hosts. Both
help with managing hosts files, which makes writing tests without
knowing the IP addresses easier.
Feb 16, 2014 · 5 minute read
At the excellent London Devops meetup last week I asked what was
apparently a controversial question:
should you just use software as a service monitoring products rather than integrate lots of open source tools?
This got a few people worked up and I promised a blog post.
Note that I wrote a post listing lots of open source monitoring tools
not that long ago. And I’ve been to both the
Monitorama events about open source
monitoring. And have a bunch of Puppet modules for open source monitoring tools. I’m
a fan of both open source and of open source monitoring. Please don’t
read this as an attack on either, and particularly on the work of
awesome people working on great open source monitoring products.
Some assumptions
- No one product exists that does everything. I think this is true for
SaaS as much as for open source.
- Let's work with about 200 hosts. This is a somewhat arbitrary number, I
know; some people will have more and others less.
- If it saves money we’ll pay yearly, rather than monthly or hourly.
- We could probably get some volume discounts from some of the
suppliers, but we’ll use list prices for this post.
Show me the money
So what would it cost to get up and running with a state of the art
software as a service monitoring system? In order to do this we need to
choose our software. For this post that means I’m going to pick products
I’ve used (sometimes only a bit) and like. This isn’t a comprehensive
study of all the alternatives I’m afraid - though feel free to write
your own alternative blog posts.
New Relic provides a crazy amount of data about
the running of both your servers and your applications. This includes
application performance data, errors, low level metrics and even rolled
up method or database query performance. $149 per host per month for our
200 hosts gives us $29,800 per month.
Librato Metrics provides a fantastic way
of storing arbitrary time series data. We’re already storing lots of
data in New Relic but Metrics provides us with less opinionated software
so we can use it for anything, for instance number of logins or searches
or other business level metrics. We’ll go for a plan with 200 data sources, 100 metrics each and at 10 second resolution for a cost of $3,860 per month.
Pagerduty is all about the alerts side of
monitoring. Most of the other SaaS tools we’ve chosen integrate with it
so we can make sure we get actionable emails and SMS messages to the
right people at the right time. Our plan costs $18 per person per month,
so let's say we have 30 people, at a cost of $540 per month.
Papertrail is all about logs. Simply set up
your servers with syslog and Papertrail will collect, analyze and store
all your log messages. You get a browser based interface, search tools
and the ability to setup alerts. We like lots of logs so we’ll have a
plan for 2 weeks of search, 1 year archive and 100GB a month of log
traffic. That all costs $575 per month.
Sentry is all about exceptions. We could be
simply logging these and sending them to Papertrail but Sentry provides
tools for tracking and rolling up occurrences. We’ll go for a plan with
90 days of history and 200 events per minute at a cost of $199 a
month.
Pingdom used to provide a very simple
external check service, but now they have added more complex multistage
checks as well as real user monitoring to the basic ping. We’ll choose
the plan with 250 checks, 20 Real User Monitoring sites and 500 SMS
alerts for $107 a month.
How much!
In total that all comes to $35,080 (£20,922) per month, or
$420,960 (£251,062) per year.
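For reference, the monthly total breaks down as follows:

New Relic        $29,800
Librato Metrics   $3,860
PagerDuty           $540
Papertrail          $575
Sentry              $199
Pingdom             $107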
Now the first reaction of lots of people will be that that’s a lot of money,
and it is. But remember open source isn’t free either. We need to pay
for:
- The servers we run our monitoring software on
- The people to operate those servers
- The people to install and configure our monitoring software
- The office space and other costs of employing people (like management
and hiring)
I think people with the ability to build software tend to forget they
are expensive, whether as a contractor or as a full time member of
staff. And people without management experience tend to forget costs
like insurance, rent, management overhead, recruitment, etc.
And, probably more important than these for some people, we need to consider:
- The time taken to build a good open source monitoring system
The time needed to put together a good monitoring stack based on, for
instance, logstash, kibana, riemann, sensu, graphite and collectd isn’t
small. And don’t forget the number of other moving parts like redis,
rabbitmq and elasticsearch that need installing, configuring and
maintaining. That probably means compromising in the short term or
shipping later. In a small team how core is building your monitoring
stack to what you do as a business?
But I can’t use SaaS
For some people, using a software as a service product just isn’t going
to cut it. Here’s a list of reasons I can think of:
- Regulation constrains where your data can be stored, for instance it’s
not allowed out of the country
- Sheer size of infrastructure, although you may be able to get a volume
discount it might not be enough
I think everything else is a cost/benefit issue or personal preference
(or bias). Happy to add more to that list, but I don’t think it’s a very
long list.
Conclusions
I’ve purposefully not talked about the quality of the tools here, just
the cost. I’ve also not mentioned that it’s likely not an all or nothing
decision, lots of people will mix SaaS products and open source tools.
Whether taking a SaaS approach will be quicker, cheaper or better will
depend on your specific business context. But try and make that about
the organisation and not about the technology.
If you’ve never used the current crop of SaaS monitoring
tools (and not just the ones mentioned above) then I think you’re missing
out. Even if you stick with a mainly open source monitoring stack you
might look at your tools a bit differently after you’ve experimented
with some of the commercial competition.
Feb 5, 2014 · 3 minute read
A little while ago I published a template for writing your own Puppet modules. It’s
very opinionated but comes out of the box with lots of the tools you
eventually find and add to your tool box. I’m posting this as it came
up at the recent Configuration Management Camp
and after discussing it I realised I hadn’t actually written anything
about it anywhere.
What do you get?
- A simple install, config, service class pattern
- Unit tests with rspec-puppet
- Rake tasks for linting and syntax checking
- Integration tests using Beaker
- A Modulefile to provide Forge metadata
- Command line tools to upload to the Forge with blacksmith
- A README based on the Puppetlabs documentation standards
- Travis CI configuration based on the official
Puppetlabs support matrix
- A Guardfile which can run all the tests when you change manifests
Obviously you can choose not to use parts of this, or even delete
aspects, but I find that approach much quicker than starting from scratch
or copying files from previous modules and changing names.
How can I use it?
Simple. The following will install the module skeleton to
~/.puppet/var/puppet-module/skeleton. This turns out to be picked up
by the Puppet module tool.
git clone https://github.com/garethr/puppet-module-skeleton
cd puppet-module-skeleton
find skeleton -type f | git checkout-index --stdin --force --prefix="$HOME/.puppet/var/puppet-module/" --
With that in place you can then just run the following to create a new
module, where puppet-ntp is the name of our new module.
puppet module generate puppet-ntp
We use puppet module
like this rather than just copying the files
because otherwise you would have to rename everything from class names
to test assertions. The skeleton actually contains erb templates in
places, and running puppet module generate
results in the module name
being available to those templates.
Now what?
Assuming you have run the above commands you should have a folder called
puppet-ntp
in your current directory. cd
into that and then install
the dependencies:
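bundle install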
Bundler is a dependency manager for Ruby. If you
don’t already have it installed you should be able to do so with the
following:
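gem install bundler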
Now you have the dependencies why not run the full test suite? This
checks syntax, lints the Puppet code and runs the unit tests.
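bundle exec rake test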
Unit tests give fast feedback and help make sure the code you write is
going to do what you intend, but they aren’t actually applying the
manifests to a real machine. For that you want an integration test.
You’ll need Vagrant installed for this next
step. Let’s run those as well with:
bundle exec rspec spec/acceptance
This will take a while, especially the first time. This uses Beaker to
download a virtual machine from Puppetlabs (if you don’t already have
it) and then brings up a new machine, applies a simple manifest, runs
the acceptance tests and then destroys the machine.
The CONTRIBUTING.md
file has more information for running the test
suite.
What’s new?
I’ve recently added a Guardfile to
help with testing. You can run this with:
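bundle exec guard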
Now in a separate tab or pane make a change to any of the manifests. The
tests should run automatically in the tab or pane where guard is
running.
Can I help?
Probably. Although I started the repo, a few other people have
contributed code or made improvements already. Just send a pull request
or open an issue.
Jan 25, 2014 · 2 minute read
One of my favourite topics for a while now has been infrastructure as
code. Part of that involves introducing well understood programming
techniques to infrastructure - from test driven design, to refactoring
and version control. One tool I’m fond of (even with its potential to
be misused) is code coverage. I’d been meaning
to go code spelunking to see if this could be done for testing Puppet
modules.
The functionality is now in master for rspec-puppet
and so anyone feeling brave can use it now or, if you must, you can wait for the
2.0.0 release. The actual implementation is inspired by the same functionality in
ChefSpec
written by Seth Vargo. Lots of the how came
from there, and the usage is very similar.
How to use it?
First add (or hopefully change) your Gemfile line item for rspec-puppet
to the following:
gem "rspec-puppet", :git => 'https://github.com/rodjek/rspec-puppet.git'
Then all you need to do is include the following line anywhere in a
spec.rb file in your spec directory.
at_exit { RSpec::Puppet::Coverage.report! }
What do I get?
Here’s an example module,
including a file called
coverage_spec.rb.
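The file itself needs very little in it. A minimal sketch (the spec_helper require is an assumption about your setup):

require 'spec_helper'

# print the resource coverage report once the suite has finished
at_exit { RSpec::Puppet::Coverage.report! }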
When running the test suite with rake spec
you now get coverage
details like so:
Total resources: 24
Touched resources: 8
Resource coverage: 33.33%
Untouched resources:
Class[Nginx]
File[preferences.d]
Anchor[apt::update]
Class[Apt::Params]
File[sources.list]
Exec[Required packages: 'debian-keyring debian-archive-keyring' for nginx]
Anchor[apt::source::nginx]
Class[Apt::Update]
File[configure-apt-proxy]
Apt::Key[Add key: 7BD9BF62 from Apt::Source nginx]
Anchor[apt::key/Add key: 7BD9BF62 from Apt::Source nginx]
Anchor[apt::key 7BD9BF62 present]
File[nginx.list]
Exec[apt_update]
File[sources.list.d]
Exec[e407f76c6e349fc397947a4a49260a9320196cb1]
Here’s the output on Travis CI as
well for a recent build.
Why is this useful?
I’ve already found coverage useful when writing tests for a few of my
puppet modules. The information about the total number of resources is
interesting (and potentially an indicator of complexity) but the list of
untouched resources is the main useful part. These represent both
information about what your module is doing, and potential things you
might want to test.
I’m hoping to find some more time to make this even better, providing
more information about untouched resources, adding some configuration
options and hopefully to integrate with the Coveralls API.
Jan 12, 2014 · 4 minute read
As of a few weeks ago Test Kitchen has a shell provisioner as well as the original Chef provisioners. This opens up all sorts of interesting testing potential.
If you’ve not already seen Test Kitchen, probably because you’re not using Chef, it’s a tool for integration testing infrastructure code. Configured by a simple YAML file it will setup a matrix of virtual machines, using Virtualbox, AWS, OpenStack and more, run some setup code (normally applying Chef recipes) and then run a test suite (with support for Bats, ShUnit2, Rspec and Serverspec). It’s all very pluggable. With the addition of the shell provisioner it’s useful to just about anyone. To try and prove that here’s a hello world style example.
Dependencies
First we need to install Test Kitchen. We’ll use vagrant and virtualbox for our example too so we need a few extra dependencies. I’m going to assume you have bundler installed; if not, you may be able to install it with gem install bundler,
but as the number of ways of setting up a Ruby environment is greater than the number of people on the planet I’ll have to defer to instructions elsewhere for getting that far.
First create a file called Gemfile
with the following contents:
source "https://rubygems.org"
gem "test-kitchen", :git => "https://github.com/test-kitchen/test-kitchen.git"
gem "kitchen-vagrant"
gem "vagrant-wrapper"
Then run:
bundle install
This should install the above software. Note that the shell provisioner is not yet in an official release, so we’re installing directly from GitHub for the moment.
Configuration
Next we’ll tell Test Kitchen what we want to do. As much for demonstration purposes as anything, I’m going to grab one of the Puppetlabs boxes. This is just plain Vagrant so feel free to substitute the box
and box_url
for alternatives you already have installed locally. Otherwise the first run will take a little longer as it downloads a large file.
Put all of the following in a file called .kitchen.yml.
---
driver:
  name: vagrant
provisioner:
  name: shell
platforms:
  - name: puppet-precise64
    driver_config:
      box: puppet-precise64
      box_url: http://puppet-vagrant-boxes.puppetlabs.com/ubuntu-server-12042-x64-vbox4210.box
suites:
  - name: default
The shell provisioner is going to look for a file called bootstrap.sh
by default. You can override this but we’ll leave it for the moment. Our bootstrap script is going to do something very simple: install the ntp package. But the important part is it could do anything; run Salt, run Ansible, run Puppet, execute any arbitrary code we choose. In this case our script is completely self contained, but if it needed some additional files we could put them in a directory called data
and they would be copied to the newly created virtual machine under /tmp/kitchen.
#!/bin/bash
apt-get install ntp -y
Tests
The last step is to write a test. I’m suddenly finding lots of excuses to use Serverspec so we’ll use that, but if you prefer you can use pretty much anything. The following file should be saved as test/integration/default/serverspec/ntp_spec.rb. Note the default
in the path which matches our suite above in the .kitchen.yml
file. Test Kitchen allows for multiple suites all with separate tests based on a strong set of file path conventions.
require 'serverspec'

include Serverspec::Helper::Exec
include Serverspec::Helper::DetectOS

RSpec.configure do |c|
  c.before :all do
    c.path = '/sbin:/usr/sbin'
  end
end

describe package('ntp') do
  it { should be_installed }
end

describe service('ntp') do
  it { should be_enabled }
  it { should be_running }
end
Running the tests
With all of that in place we’re ready to run our tests.
bundle exec kitchen test
This should:
- download the virtual machine image if you don’t already have it locally
- create a new virtual machine based on the image
- run the bootstrap.sh script
- run our serverspec test suite
The real power comes from doing this iteratively as you work on code, probably code more complex than a simple one-line bash script. You can also test across multiple virtual machines at a time, for instance different operating systems or different machine roles. The kitchen
command line tool provides lots of help too, with the ability to log in to machines, verify that specific combinations of platform and suite are working and print lots of diagnostic information to aid development.
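For example, with the suite and platform above the instance is named default-puppet-precise64, so you can poke at it with something like:

bundle exec kitchen login default-puppet-precise64
bundle exec kitchen verify default-puppet-precise64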
Hopefully this will make it into a release soon, and we’ll see more involved examples using higher level tools and more documentation. But even now I’d be looking at Test Kitchen for any infrastructure testing you might be doing.
Jan 1, 2014 · 2 minute read
Packer provides a great way of describing the steps for creating a virtual machine image. But it doesn’t have a built-in way of verifying those images.
Serverspec provides a nice framework for writing tests against infrastructure, asserting the operation of services or the installation of packages.
I’m interested at the moment in building continuous delivery pipelines for infrastructure components and have a simple working example of testing Packer with Serverspec on
Github. The example uses the AWS builder and the Puppet provisioner but the approach should work with other combinations.
This doesn’t represent a complete infrastructure pipeline, but it does demonstrate an approach to automating one particular component - building base images.
Testing
In our example I’m using the Puppetlabs NTP module to install and configure NTP. Once the Puppet provisioner has run, but before we build the AMI (or other virtual machine image), we run a test suite. For our example the tests are pretty simple:
describe package('ntp') do
  it { should be_installed }
end

describe service('ntp') do
  it { should be_enabled }
  it { should be_running }
end
If the tests fail, Packer will stop and the AMI won’t be built. The combination of storing the code (Packer template) alongside a test suite (Serverspec) and building a new AMI whenever you change the code, makes this setup perfect for continuous integration.
Wercker builds
As an example of a continuous integration setup, the repository contains a wercker.yml configuration file for the excellent Wercker service. Wercker makes setting up multi-step build pipelines easy and nicely configurable via a simple text file in your repository.
The Wercker build for this project is public. Currently the build involves downloading Packer, running packer validate
to check the template and eventually running packer build
to boot an instance and run our serverspec tests.
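Locally the equivalent steps would look something like this (assuming the template is saved as packer.json):

packer validate packer.json
packer build packer.json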
Dec 29, 2013 · 6 minute read
Originally written as part of Sysadvent 2013.
Writing automated tests for your code is one of those things that,
once you have gotten into it, you never want to see code without tests
ever again. Why write pages and pages of documentation about how
something should work when you can write tests to show exactly how something does work? Looking at the number and quality of testing tools and frameworks (like cucumber,
rspec, Test Kitchen,
Server Spec,
Beaker,
Casper and
Jasmine to name a few) that have
popped up in the last year or so I’m obviously not the only person who
has a thing for testing utilities.
One of the other things I am interested in is web application
security, so this post is all about using the tools and techniques
from unit testing to avoid common web application security issues. I’m
using Ruby in the examples but you could quickly convert these to other languages if you desire.
Any port in a storm
Let’s start out with something simple. Accidentally exposing
applications on TCP ports can lead to data loss or introduce a vector
for attack. Maybe your main website is super secure, but you left the
port for your database open to the internet. It’s the server configuration equivalent of forgetting to lock the back door.
Nmap is a tool lots of people will be familiar with for scanning for
open ports. As well as a command line interface, Nmap also has good
library support in lots of languages, so let’s try to write a simple
test suite around it.
require "tempfile"
require "nmap/program"
require "nmap/xml"
describe "the scanme.nmap.org website" do
file = Tempfile.new("nmap.xml")
before(:all) do
Nmap::Program.scan do |nmap|
nmap.xml = file.path
nmap.targets = "scanme.nmap.org"
end
end
@open_ports = []
Nmap::XML.new("scan.xml") do |xml|
xml.each_host do |host|
host.each_port do |port|
@open_ports << port.number if port.state == :open
end
end
end
end
With the above code in place we can then write tests like:
it "should have two ports open" do
@open_ports.should have(2).items
end
it "should have port 80 open" do
@open_ports.should include(80)
end
it "should have port 22 closed" do
@open_ports.should_not include(22)
end
We can run these manually, but also potentially as part of a
continuous integration build or constantly as part of a monitoring
suite.
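These are plain RSpec examples, so running them locally is just the following (the spec filename is assumed):

rspec nmap_spec.rb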
Run the Gauntlt
We had to do quite a bit of work wrapping Nmap before we could write
the tests above. Wouldn’t it be nice if someone had already wrapped
lots of useful security minded tools for us? Gauntlt is pretty much just that: it’s a security testing framework based on cucumber which currently supports curl, nmap, sslyze, sqlmap, garmr and a bunch more tools in master. Let’s do something more advanced than our port scanning test above by testing a URL for a SQL injection vulnerability.
@slow
Feature: Run sqlmap against a target
  Scenario: Identify SQL injection vulnerabilities
    Given "sqlmap" is installed
    And the following profile:
      | name       | value                                      |
      | target_url | http://localhost/sql-injection?number_id=1 |
    When I launch a "sqlmap" attack with:
      """
      python <sqlmap_path> -u <target_url> --dbms sqlite --batch -v 0 --tables
      """
    Then the output should contain:
      """
      sqlmap identified the following injection points
      """
    And the output should contain:
      """
      [2 tables]
      +-----------------+
      | numbers         |
      | sqlite_sequence |
      +-----------------+
      """
The Gauntlt team publish lots of examples like this one alongside the source code, so getting started is easy. Gauntlt is very powerful, but as you’ll see from the example above you need to know quite a bit about the underlying tools it is using. In the case above you need to know the various arguments to sqlmap and also how to interpret the output.
Enter Prodder
Prodder is a tool I put together
to automate a few specific types of security testing. In many ways
it’s very similar to Gauntlt; it uses the cucumber testing framework
and uses some of the same tools (like nmap and sslyze) under the hood.
However rather than a general purpose security framework like Gauntlt,
Prodder is higher level and very opinionated. Here’s an example:
Feature: SSL
  In order to ensure secure connections
  I want to check the SSL configuration of my servers

  Background:
    Given "sslyze.py" is installed

  Scenario: Check SSLv2 is disabled
    When we test using the "sslv2" protocol
    Then the exit status should be 0
    And the output should contain "SSLv2 disabled"

  Scenario: Check certificate is trusted
    When we check the certificate
    Then the output should contain "Certificate is Trusted"
    And the output should match /OK — (Common|Subject Alternative) Name Matches/
    And the output should not contain "Signature Algorithm: md5"
    And the output should not contain "Signature Algorithm: md2"
    And the output should contain "Key Size: 2048"

  Scenario: Check certificate renegotiations
    When we test certificate renegotiation
    Then the output should contain "Client-initiated Renegotiations: Rejected"
    And the output should contain "Secure Renegotiation: Supported"

  Scenario: Check SSLv3 is not using weak ciphers
    When we test using the "sslv3" protocol
    Then the output should not contain "Anon"
    And the output should not contain "96bits"
    And the output should not contain "40bits"
    And the output should not contain " 0bits"
This is a little higher level than the Gauntlt example — it’s not
exposing the workings of sslyze that is doing the actual testing. All
you need is an understanding of SSL certificates. Even if you’re not an
expert on SSL you can accept the aforementioned opinions of Prodder
about what good looks like. Prodder currently contains steps and
examples for port scanning, SSL certificates and security minded HTTP
headers. If you already have a cucumber based test suite (including
one based on Gauntlt) you can reuse the step definitions in that too.
I’m hoping to build upon Prodder, adding more types of tests and
getting agreement on the included opinions from the wider systems
administration community. By having a default set of shared assertions
about the expected security of our system we can more easily move onto
new projects, safe in the knowledge that a test will fail if someone
messes up our once secure configuration.
I’m convinced, what should I do next?
As well as trying out some of the above tools and techniques for
yourself, I’d recommend encouraging more security conversations in your
development and operations teams.
Dec 28, 2013 · 2 minute read
Hyde is a brazen two-column Jekyll theme that pairs a prominent sidebar with uncomplicated content. It’s based on Poole, the Jekyll butler.
Built on Poole
Poole is the Jekyll Butler, serving as an upstanding and effective foundation for Jekyll themes by @mdo. Poole, and every theme built on it (like Hyde here) includes the following:
- Complete Jekyll setup included (layouts, config, 404, RSS feed, posts, and example page)
- Mobile friendly design and development
- Easily scalable text and component sizing with rem units in the CSS
- Support for a wide gamut of HTML elements
- Related posts (time-based, because Jekyll) below each post
- Syntax highlighting, courtesy Pygments (the Python-based code snippet highlighter)
Hyde features
In addition to the features of Poole, Hyde adds the following:
- Sidebar includes support for textual modules and a dynamically generated navigation with active link support
- Two orientations for content and sidebar, default (left sidebar) and reverse (right sidebar), available via <body> classes
- Eight optional color schemes, available via <body> classes
Head to the readme to learn more.
Browser support
Hyde is by preference a forward-thinking project. In addition to the latest versions of Chrome, Safari (mobile and desktop), and Firefox, it is only compatible with Internet Explorer 9 and above.
Download
Hyde is developed on and hosted with GitHub. Head to the GitHub repository for downloads, bug reports, and feature requests.
Thanks!
Oct 13, 2013 · 3 minute read
Originally published on Medium.
We have a bunch of internal mailing lists at work, and on one of them someone asked:
we’re looking into monitoring/logging tools…
I ended up writing a bit of a long reply which a few people found useful, so I thought I’d repost it here for posterity. I’m sure this will date but I think it’s a reasonable snapshot of the state of open source monitoring tools at the end of 2013.
Simply put, think about four elements and you won’t be far off on the
technical front. Miss one and you’re probably in trouble.
- logs
- metric storage
- metric collection
- monitoring checks
For logs, some combination of syslog at one end and elasticsearch and
Kibana at the other are probably the state of the open source art at
the moment. The shipping around is more interesting; Logstash is improving constantly, Heka is a similar alternative from Mozilla, and Fluentd looks nice too.
For pure metrics it’s all about Graphite, which is both awesome and
perilous. Not much else really competes in the open source world at
present. Maybe OpenTSDB (if you’re into a Hadoop stack).
For collecting metrics on boxes I’d probably look at collectd or diamond both of which have pros and cons but work well. Statsd is also useful here for different types of metric collection and aggregation. Ganglia is interesting too, it combines some aspects of the metrics collection tools with an integrated storage and visualisation tool similar to Graphite.
Monitoring checks are a bit more painful. I’ve been experimenting with Sensu in the hope of not installing Nagios. Nagios works but it’s just a bit ungainly. But you do need somewhere to write checks against metrics or other aspects of your system and to issue alerts.
At this point everyone loves dashboards, and Dashing is particularly lovely. Graphiti and Tasseo for Graphite are useful too.
For bonus points things like Flapjack and Riemann provide some interesting extra capabilities around alert control or real time monitoring respectively.
And for that elusive top of the class grade take a look at Kale, which provides anomaly detection on top of Graphite and Elasticsearch.
You might be thinking that’s a lot of moving parts and you’d be right. If you’re a small project running all of that is too much overhead, turning to something like Zabbix might be more sensible.
Depending on money/sensitivity/control issues lots of nice and not so
nice commercial products exist. Circonus, Splunk, New Relic, Boundary and Librato Metrics are all lovely in different ways and provide part of the puzzle.
And that’s just the boring matter of tools. Now you get into alert design and other gnarly people stuff.
If you got this far you should watch all the Monitorama videos too.
Aug 11, 2013 · 5 minute read
Originally published on Medium.
I’m a big fan of the Platform as a Service (PaaS) model of operating web
application infrastructure. But I’m a much bigger user and exponent of
Infrastructure as a Service (IaaS) products within my current role
working for the UK Government. This post describes why that is, and
hopefully helps anyone else inside other large enterprise organisations
reason about the advantages and disadvantages, and helps PaaS vendors
and developers understand what I personally think is a barrier to
adoption in that type of organisation.
A quick word of caution: I don’t know every product inside out. It’s
very possible a PaaS product exists that deals with the problems I will
describe. If you know of such a product do let me know.
A simple use case
PaaS products make for the very best demos. Have a working application?
Deployment is probably as simple as:
git push azure master
Your app has started to run slowly because visitors are flooding in?
Just scale out with something like:
heroku ps:scale web+2
The amount of complexity being hidden is astounding and the ability to
move incredibly quickly is obvious for anyone with experience of doing
this in a more traditional organisation.
A not so simple use case
Even small systems are often being built out of many small services
these days. Many large organisations have been up to this for a while
under the banner of Service Orientated Architecture. I’m a big fan of
this approach, in my view it moves operational and organisational
complexity back into the development team where its impact can often be
minimised by automation. But that’s a topic for another post.
In a PaaS world having many services is fine. We just have more
applications running on the Platform which can be independently scaled
out to meet our needs. But services need to communicate with each other
somehow, and this is where our problems start. We’ll keep things simple
here by assuming communication is over HTTPS (which should be pretty
typical) but I don’t think other protocols make the problem I have go
away. The same problem applies if you’re using a SaaS database for
example.
It’s the network, stupid
Over what network does my HTTPS internal service call travel? The
internet? The internal PaaS vendor’s network? If the latter, is my
traffic travelling over the same network as other clients on the
platform? Maybe I’m running my own PaaS in-house. But do I trust
everyone else in my very large organisation and want my traffic on the
same network as other things I don’t even know about? Even if it’s just
me do I want internal service traffic mixing with requests coming from
the internet? And are all my services created equal with regard to what
they can and cannot access?
Throw in questions like whether the PaaS supplier is running on
infrastructure provided by a public IaaS supplier you don’t have a
relationship with, and you start to question the suitability of the current public
PaaS products for building secure service based systems.
A journey into Enterprise Architectures
You might be thinking, pah, what’s the worst that can happen? If you
work for a small company or a shiny startup that might be completely
valid. If on the other hand you’re working in a regulated environment
(say PCI) or dealing with large volumes of highly sensitive information
you’re very likely to have to build systems that provide layers of
trust, and to be doing inspection, filtering and integrity checking as
requests flow between those layers.
Imagine that I have a service dealing with some sensitive data. If I
control the infrastructure (virtualised or not, IaaS provided or not)
I’ll make sure that service endpoint isn’t available to anything that
doesn’t need access to it via my network configuration. If I’m being
more thorough I’ll filter traffic through some sort of proxy that does
checking of the content: it should be JSON (or XML), it should meet this
schema, it shouldn’t exceed this rate, it shouldn’t exceed this payload
size or response size, etc. That is before anything even reaches the
service’s application. And that’s on top of SSL and maybe client
certificates.
If I don’t control the infrastructure, for example when running on a
PaaS, I lose some of the ability to have the network protect me. I can
probably get some of this back by running my own PaaS on my own
infrastructure, but without awareness and a nice interface to that
functionality at the PaaS layer I’m going to lose lots of the benefits
of running the PaaS in the first place. It’s nice that I can scale my
application out, but if new instances can’t connect to the required
backend services without some additional network configuration that’s
invisible to the PaaS what use is that?
The question becomes; how to implement security layers within existing
PaaS products (without changing them). And my answer is “I don’t know”.
Yet.
Why isn’t SSL enough?
SSL doesn’t help as much as you’d like to think here because if I’m an
attacker what I’m probably going to attack is your buggy code rather
than the transport mechanism. SSL doesn’t protect you from SQL injection
or unpatched software or zero-day exploits. If the only thing that my
backend service will talk to is my frontend application, an attacker has
to compromise two things rather than just ignore the frontend and go
after the data. Throw in a filter as described above and it’s really
three things that need to be overcome.
The PaaS/IaaS interface
I think part of the solution lies in exposing some of the underlying
infrastructure via the PaaS interface. IaaS is often characterised as
compute, storage and network. In my experience everyone forgets the
network part. In a PaaS world I don’t want to be exposed to storage
details (I just want it to appear infinite and pay for what I use) or
virtual machines (I just care about computing power, say RAM, not the
number of machines I’m running on) but I think I do, sometimes, want to
be exposed to the (virtual) network configuration.
Hopefully someone working on OpenShift or CloudFoundry or Azure or
Heroku or DotCloud or insert PaaS here is already working on this. If
not maybe this post will prompt someone to do so.