Counting bytes and measuring latencies

Infrastructure tooling patterns

The following is my grouping of the tools that I have learned and used as a sysadmin and DevOps person at ThoughtWorks while maintaining our distributed infrastructure, setting up our private cloud installations, and working on many different client gigs.

You can add these tools as and when required, as your infrastructure, deployments and apps grow.

  1. Provisioner: Abstracts your VM/environment provisioning mechanism. Mostly relevant if you are on a cloud. Examples are boto and fog. Very important if you plan to do something like auto scaling, since this is what gives you elastic infrastructure.
  2. Configuration management system (CMS): Lets you create reusable environments by expressing packages, services, files and other components via a DSL, and also addresses cross-platform issues. Puppet, Chef, CFEngine and Salt are examples. A mature CMS setup will give you context-aware infrastructure: your web server can automatically know which machine is the DB server, or the load balancer can automatically know which machines are the web servers. A mature CMS setup will also incorporate the notion of environments and give you versioned infrastructure. For example, UAT can have app deployment version 1.3 while prod is on 1.1 and staging is on 2.0.
  3. Application deployers: Let you deploy your application. A CMS can do this too, but there are dedicated tools for it: Vlad, Capistrano (cap), Func and Fabric fall into this category. They also help with ad hoc system automation. Most of these are ssh in a loop (or use GNU parallel).
  4. Orchestrator: Similar to app deployers, but incorporates middleware-like facilities to do async command dispatching. MCollective and Salt are examples. Both use middleware (Salt uses ZeroMQ, while MCollective can use any STOMP-compliant message broker) to broker 1->N, 1->1, N->M and N->N real-time, async dispatching. Can be used with platforms that don't have ssh. Massively scalable.
  5. Monitoring solution: You will mostly need three kinds of monitoring: system (disk, CPU load, memory), services (web server, DB server etc.) and app (for example, a Cucumber script that checks that the whole app is working). A good monitoring solution integrates easily with all your other infra services and lets you define metrics (app response time, free memory, cached memory etc.), customize notifications (email, Jabber, SMS etc.) and escalations, chart the metrics (to see trends), generate reports, and attach event handlers (use these in conjunction with a provisioner to get auto scaling). Examples are Nagios, Zabbix, Zenoss and many, many more. None of them is complete, but all of them can be complemented with other tools (like Graphite for charting) to make a decent solution. Nagios has text-based configs, does not use a database, is easy to install, scales well, and is one of the oldest, so integrating it with other apps is very easy.
  6. Log management and log analytics: There are three parts to this solution: a) Forwarding: a client that sits on every VM and forwards the logs to a central location. Options are rsyslog, syslog, syslog-ng, the Graylog agent, Logstash, Splunk forwarders etc. b) Gathering: a server that sits and accepts all the logs. Options are syslog-ng, rsyslog, Splunk and Graylog2. c) Analytics: in most cases you will be indexing your logs and searching them for particular patterns. Options are Graylog2 and Logstash (both use Elasticsearch) and Splunk (very powerful, very costly). A mature log management solution will let you set up alerts based on patterns (like failed transactions, 5xx and 4xx responses etc.).
  7. Supervisors: They observe a service and take appropriate action to bring it back whenever it goes down, bypassing the whole network monitoring > event handler loop. Very helpful for shaky services. Monit, Bluepill and God are some examples. A good supervisor has a small memory/CPU footprint, heals services fast, and provides a rich DSL for expressing a service's state (which port should be responsive, which process should be running, how to fix the process if it dies, whether to alert when it takes an action etc.).
  8. Security, hardening and auditing tools: These deserve a write-up of their own. Tools like Bastille ensure you have done the basic OS-level hardening; it can also assess your infrastructure and lock it down if needed. Tools like psad (which analyzes iptables logs) and Snort can be used to detect intruders and block them automatically. Some of the CMS tools, like Puppet or Chef, can also be used for auditing.
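
To make the "ssh in a loop" remark about application deployers concrete, here is a minimal sketch in Python. It is not how Fabric or Capistrano are implemented; it only shows the basic idea of fanning one command out to many hosts in parallel. The function names and the `runner` parameter are my own, and it assumes key-based ssh access to every host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, remote_command):
    """Return the argv list for running remote_command on host over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def run_on_host(host, remote_command):
    """Run one command on one host and return (host, exit_code, output)."""
    result = subprocess.run(
        build_ssh_command(host, remote_command),
        capture_output=True,
        text=True,
    )
    return host, result.returncode, result.stdout + result.stderr


def run_everywhere(hosts, remote_command, runner=run_on_host, max_workers=10):
    """Run remote_command on every host in parallel.

    Results come back in the same order as hosts. The runner argument is
    only there so the ssh call can be swapped out (for a dry run or a test).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda host: runner(host, remote_command), hosts))
```

Calling `run_everywhere(["web1.example.com", "web2.example.com"], "uptime")` (hypothetical host names) would return one `(host, exit_code, output)` tuple per host. An orchestrator replaces this push-over-ssh model with a message broker, which is why it keeps working at scales where a loop like this starts to hurt.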
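
The pattern-based alerting mentioned under log management can be sketched roughly like this. It is a toy, not how Graylog2 or Splunk do it: it assumes access log lines where the HTTP status code follows the quoted request (as in the common access log format), and the function names and the threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumption: the status code comes right after the quoted request, e.g.
#   ... "GET /path HTTP/1.1" 502 1234
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def count_status_classes(lines):
    """Count log lines per status class ("2xx", "4xx", "5xx", ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            # "502" -> "5xx"; lines without a status code are skipped.
            counts[match.group(1)[0] + "xx"] += 1
    return counts


def should_alert(lines, status_class="5xx", threshold=5):
    """Return True when at least threshold lines fall in status_class."""
    return count_status_classes(lines)[status_class] >= threshold
```

In a real setup the lines would come from the central log server for a recent time window, and a True result would be handed to whatever notification mechanism your monitoring solution already uses.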
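
Finally, a sketch of what a supervisor does on each cycle: check that a port is responsive, fix the service if it is not, and alert when it takes an action. Monit, Bluepill and God express this through their own DSLs; the Python below is only an illustration, and `port_is_open` and `supervise_once` are names I made up.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def supervise_once(check, restart, alert):
    """Run one supervision cycle; return True if the service was healthy.

    check   -- callable returning True when the service is up
    restart -- callable that tries to bring the service back
    alert   -- callable taking a message, called only when action is taken
    """
    if check():
        return True
    restart()
    alert("service was down, restart attempted")
    return False
```

A real supervisor would call something like `supervise_once` every few seconds, with `check` wrapping `port_is_open` for the service's port and `restart` running the service's init script. Because the check and the fix happen on the same box, the healing is much faster than going through the monitoring server and back.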