Podcast Review: Software Engineering Radio

I’ve been catching up on some podcasts while catching up on tasks around the home this past week. One of them is Software Engineering Radio, which unsurprisingly is about software engineering. The podcast invites a lot of well-known guests to talk about their experiences and their opinions.

Following are some recent episodes I have particularly enjoyed.

James Lewis on Microservices

If you have been closely following Martin Fowler and James Lewis’ microservice commentary, this won’t have anything you haven’t heard before. But if you’re just discovering microservices, this is an excellent summary of the architectural style.

Gang of Four – 20 Years Later

The surviving authors of the seminal Design Patterns discuss their book, and its relevance today. They share their favourite patterns and their least favourites. I’ll give you one guess which pattern they regret. Probably my favourite interview on se-radio to date.

James Turnbull on Docker

An excellent introduction to Docker’s history and why you should care about this paradigm-shifting technology. Another interview that probably won’t provide any new insights if you have been following the subject closely.

Josiah Carlson on Redis

Being a long-time memcached user, I’d never bothered getting deep into Redis. But after listening to this interview I am resolved to get more familiar with it. The interview goes through Redis’ features and design, advantages and drawbacks.

Stefan Tilkov on Architecture and Micro Services

A comparatively less bullish interview on microservices than the James Lewis one. That isn’t to say Stefan thinks the pattern is a bad idea, but he does come across as a little more cautious than James. Stefan is also somewhat less prescriptive with his definition than James and Martin.

Yet Another Reason I Cannot Use Docker

I would love to use Docker but keep finding edge cases that block its usage. Today the hurdle is with our private repositories and docker build.

The sites we build have external dependencies on private libraries we have written, both Python and JavaScript. These dependencies are currently fetched via Mercurial/Git over SSH, but at build time a Docker container doesn’t have access to private keys with permission to fetch them. The current solution I’ve seen is to copy in a password-less private key with access to the repositories you need. But then you’re in the horrible situation of either passing around a password-less private key or everyone generating one of their own. Both options are a security risk.

Someone raised a ticket six months ago addressing this, “Forward ssh key agent into container” (#6396), which would nicely solve the problem by giving access to SSH_AUTH_SOCK. But there has been little discussion on the ticket since, and it doesn’t look like it is even on the Docker road-map right now. So I am left with three options:

  • Ignoring the security risks.
  • Taking on the ticket myself.
  • Providing another way of adding dependencies.

The last option is the way I will likely go, as I am not happy ignoring security risks and I don’t know enough Go to contribute usefully to Docker. However, it isn’t clear at this stage what other implementation can be used. Python has devpi, but you still need a username/password to access private indexes. Bower can only pull things from Git repositories, so that one might be a real blocker. Though I at least know JavaScript, so adding a new way for Bower to retrieve packages might be an option.

Microservices: But What About Foreign Keys & Transactions?

One of the most impressively sized books on my bookshelf is Introduction to Database Systems, an artefact of my university years. It had an equally “impressive” price tag, or should that be “punishing” given my student wage? I regarded this expensive and weighty tome as containing unquestionable wisdom from on high, and in the very early 2000s it practically did. I had taught myself SQL in the past, but concepts such as normalisation, ACID and referential integrity were new, and they became, to me, immutable aspects of a good database.

From where we sit now, basking in the light of dozens of NoSQL database technologies (a terrible name, but we’re stuck with it), this is obviously not true. The hegemony of the RDBMS is over: document databases, graph databases and wide-column stores are all widely used and acceptable options. With this revolution came the discarding of many of these “immutable” aspects, the argument being that the trade-off confers an advantage. People make the same trade-offs in RDBMS schemas all the time; de-normalisation, for example, might be considered a cowboy move, but its read-performance advantage is indisputable.

So what does all this have to do with microservices? Well, trade-offs have to be made, and this becomes obvious fairly early on, often when a developer is first introduced to the concept. With a distributed, finely grained architecture spread across different databases and technologies, transactions won’t always be an option, and neither will foreign keys. This is the cost of the speed and agility that a microservice architecture provides.

This is a scary idea, especially for those of us weaned on SQL: what will become of our data? First, transactions. Is there a reason your whole system needs to be able to run in a single transaction? More often a web call will generate several data calls, but many of them are read-only, and most touch only a few tables or rows. The read-only calls can probably run outside the transaction, and the others likely centre around some domain concept. That domain concept in turn probably makes sense to collect into a single microservice, which can then run a transaction. This won’t always be the case, and hard decisions are inevitable somewhere along the line.
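To make that concrete, here is a minimal sketch (the order service, its tables and the use of sqlite3 are all hypothetical): the transaction only ever needs to span the tables the service itself owns.

import sqlite3

# Hypothetical order service: each service owns its own database, so a
# transaction only ever needs to cover that service's own tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

def place_order(customer_id, items):
    # One local transaction covering only the order domain's tables;
    # anything outside that domain would be a separate API call.
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute(
            "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in items])
    return order_id

print(place_order(42, [("wings", 2), ("sampler", 1)]))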

Giving up referential integrity is an easier task, as it comes with a big reward. Removing foreign keys and replacing them with API calls means the owner of the data is free to change their internals. As long as the contract with the consumer of the API is obeyed, the owner can change as fast as requirements change, without the consumer having to be updated too. Databases aren’t the only line of defence for referential integrity; most applications we write already deal with this, and often checks happen in a few layers as data travels through our systems. Without the database enforcing referential integrity we’re relying on our services and applications behaving correctly, something we already do to prevent errors in any case.
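As a sketch of what replacing a foreign key with an API call might look like (the URL and contract here are invented for illustration, and I’m assuming the requests library is available):

import requests  # assumed installed; any HTTP client would do

# Hypothetical endpoint; the whole contract is "200 means the customer exists".
CUSTOMER_URL = "http://customer-service.internal/customers/{}"

def customer_exists(customer_id):
    # Stands in for a cross-service foreign key: the customer service can
    # change its schema freely as long as this HTTP contract is honoured.
    response = requests.get(CUSTOMER_URL.format(customer_id))
    return response.status_code == 200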

Everything old is new again: we aren’t dealing with new concepts, and distributed systems have always faced these trade-offs. A microservice architecture makes the trade-offs more visible and explicit, creating a tension developers must address. Even if a team chooses a more coarse-grained approach, they’ve evaluated what is going to work best for their project, and this can only be a good thing.

Design and Implementation of Microservices Workshop

Today I attended a full-day workshop presented by Sam Newman and Scott Shaw on micro-services. Most people seemed pretty up on the why of micro-services; if you’re not, James Lewis and Martin Fowler have talked at length on the subject. The workshop covered a wide range of topics and gave an excellent overview of the how and what of micro-services, something which is still lacking in the literature (but is coming, see below).

I have seen Sam talk before; in fact, his talk at Yow! last year was my first introduction to micro-services, and I have read an early-release copy of his book Building Microservices. A lot of the concepts were very familiar to me, but being a workshop meant there was significant discussion around those ideas. This gave me a great deal of insight into the thinking around micro-services and how people have dealt with the trade-offs and choices they present.

It was especially nice to have confirmation that some of the directions I am heading with Biarri’s architecture are tried and tested paths. This goes doubly for the contract-testing library Pact (and Pact.rb), which a good friend of mine has been encouraging me to port to Python for Biarri’s use.

A few other books, videos and tools popped up during the workshop which I have noted for further investigation:

  • Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans
  • Implementing Domain-Driven Design by Vaughn Vernon
  • Apache ZooKeeper, useful for managing services; something like consul.io, which I discovered last week.
  • Postel’s law: be conservative in what you do, be liberal in what you accept from others (often reworded as “Be conservative in what you send, be liberal in what you accept”). A useful principle for micro-service communication; see the sketch after this list.
  • Hystrix, a “latency and fault tolerance library” by Netflix. Probably overkill for Biarri’s architecture but it could come in handy in the future.
  • 12factor.net, a set of SaaS app principles which I also discovered last week; the mention re-affirms its usefulness.
  • A video by Stuart Halloway (I think) dealing with real-time data and versioning. I did a Google search and couldn’t find a link, so I am going to chase Sam up about it.
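On the Postel’s law point above, here is a tiny sketch of the idea applied to service payloads (my own illustration, not from the workshop): a tolerant reader that depends only on the fields it actually uses, so a producing service can add new fields without breaking it.

import json

def read_customer(payload):
    # Tolerant reader: pick out only the fields we rely on and ignore
    # the rest, so the producer can evolve its payload freely.
    data = json.loads(payload)
    return {
        "id": data["id"],               # the one field we require
        "name": data.get("name", ""),   # optional; tolerate its absence
    }

# An unknown field added by the producer does not break this consumer.
print(read_customer('{"id": 1, "name": "Ada", "brand_new_field": true}'))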

All in all, well worth my time.

Infracoders June Meetup

I’ve set myself the goal of attending one software meetup per week. I enjoy attending conferences, and meetups are like micro-conferences. You meet people, you learn something, and often there is beer and food at the end, as there was this evening.

This evening I attended Infracoders (http://www.meetup.com/Infrastructure-Coders/), a devops group focused on tools that make devops easier and more fun. Given my current focus on Biarri’s overhaul of infrastructure and development workflows, these sorts of tools are on my mind a fair bit.

The presentations were interesting. First up were Alexey Kotlyarov and Ross Williamson from Infoxchange, talking about some Docker tools called Pallet and Forklift (https://github.com/infoxchange/docker-forklift). Their talk was rather information-dense and I found it a little hard to follow. From what I understood, with my limited knowledge of Docker, they automate some common development and deployment tasks in a platform- and language-agnostic manner. They look like they have a similar stack and similar problems to Biarri in many ways, with lots of projects and a fragmented environment, so I will be looking into them more deeply.

They also linked to some interesting things in their talk which are worth looking at:

  • Zato (https://zato.io/), a Python ESB and application server. I am not clear on what its value proposition is yet, but it has something to do with managing SOA, which is something I am spending a lot of time thinking about right now.
  • The twelve-factor app (http://12factor.net/), a set of guidelines or principles for building SaaS applications. I’d not seen this before and at first glance looks like good reading.
  • Serf (http://serfdom.io), cluster management.

The second presentation was by Colin Panisset from REA Group about Credulous (http://credulous.io/), an AWS credential management system. He was an excellent and amusing speaker, but Credulous solves a problem we don’t have, and are unlikely to have in the medium term, at Biarri. It does look like a good solution if you have a large team with access to your AWS infrastructure. His not using GPG to solve the key-sharing and encryption part of Credulous bothered me a little, but his criticisms of GPG’s installation and setup weren’t without merit. He did highlight some aspects of AWS security that I had not considered and will discuss with my co-workers.

The evening concluded with free beer and dumplings, which was nice. I will certainly consider attending again; there seems to be significant overlap with the Devops Melbourne meetup, though perhaps their focuses are different.

A CP Solution for XKCD NP Complete Restaurant Order

I’ve been messing around with Constraint Programming (CP) this past week. A few people at work have tried it out on some real-world problems lately, but it didn’t seem to stand up when given a lot of data and variables. This seemed sad: the declarative nature of CP attracts me, and it strikes me that there must be a set of problems it is well suited to, so it deserved a look.

The first CP model I wrote by myself is stupidly simple, but since my non-techie fiancée understood the code I figure it is a good example. The problem, as described by XKCD below, is to select some appetizers from the menu so that the total cost adds up to $15.05.

[XKCD comic: “NP-Complete”]

I modelled the problem in MiniZinc, a declarative language for modelling CP problems. I am just learning, so if you know MiniZinc and I’ve done something dumb, don’t judge me too harshly.

Firstly we declare a bunch of variables that the solver needs to find values for. We give the solver a domain in which each variable must lie, zero to ten for all variables in this case. Each of these variables represents the number of times we buy an item as part of a solution adding up to $15.05.

var 0..10: fruit;
var 0..10: fries;
var 0..10: salad;
var 0..10: wings;
var 0..10: sticks;
var 0..10: sampler;

Then we declare a constraint, something the solver must satisfy to solve the problem. In this case we say that the sum of each item’s cost (converted to cents) multiplied by the number of that item in the solution must equal the required 1505 cents (that English version could be taken a couple of ways; the maths below makes better sense).

constraint fruit*215 + fries*275 + salad*335 + wings*355 + sticks*420 + sampler*580 == 1505;

We tell the solver to solve to satisfy, that is, to find any assignment meeting the constraints rather than optimising an objective.

solve satisfy;

And provide a format in which to output the solution.

output ["fruit=", show(fruit), "\t fries=", show(fries), 
        "\t salad=", show(salad), "\t wings=", show(wings),
        "\t sticks=", show(sticks), "\t sampler=", show(sampler)];

Running the model gives us:

$ minizinc --all-solutions xkcd.mzn
fruit=7     fries=0     salad=0     wings=0     sticks=0     sampler=0
----------
fruit=1     fries=0     salad=0     wings=2     sticks=0     sampler=1
----------
==========

So there we go, all the possible solutions to the poor waiter’s problem! We know it is all of the solutions because of the “==========” MiniZinc cryptically places at the end of its output. Of course this problem is easy to brute force with a couple of for loops; there aren’t that many combinations.
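For comparison, here is that brute force in Python (itertools.product standing in for the nested for loops), using the same zero-to-ten domains and prices in cents as the model:

from itertools import product

# Prices in cents, matching the MiniZinc model.
prices = {"fruit": 215, "fries": 275, "salad": 335,
          "wings": 355, "sticks": 420, "sampler": 580}
names = sorted(prices)

# Try every combination of item counts in 0..10 and keep the ones
# that total exactly 1505 cents.
for counts in product(range(11), repeat=len(names)):
    if sum(count * prices[name] for count, name in zip(counts, names)) == 1505:
        print(dict(zip(names, counts)))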

But it is a start along what I hope will be a fruitful path.

Update: as people have noted on HN and Reddit, I originally screwed up transcribing the price for salad, which produced a couple of extra solutions. Fixed that now.

WTForms and CherryPy 3.1

I have been trialling WTForms, an HTML form input and validation library for Python, on a project I am working on. Much to my irritation, however, WTForms and CherryPy don’t play nicely in one small area: using wtforms.FileField with the validator wtforms.validators.Required will always fail.

CherryPy 3.1 (but not 3.2, interestingly) uses the Python built-in cgi.FieldStorage to handle file uploads. In Python 2.6 and 2.7 at least, this is because the code for cgi.FieldStorage.__nonzero__ [1] only checks self.list and ignores self.file, which is where the data is (at least for CherryPy 3.1). I have no idea why this is the case; Google gives no love on the why of this issue.

There has been an issue raised with the WTForms guys about the same problem with Pylons, but the long and the short of it is that the developers don’t want to add special cases for the various frameworks. Special cases are the bane of a coder’s existence: they bloat otherwise lean and understandable code and cause maintenance nightmares, so I do understand.

So what about me? Moving to CherryPy 3.2 is an option, but I don’t want to deal with the worry of that migration right now. An easy fix on my end is to write a custom validator and use it in place of the built-in one. But what about the next project, and the dozens of other people on my team who have to remember to use the custom validator for file uploads? I might need to continue looking at form libraries; at least there are a lot of them!
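For what it’s worth, a sketch of the custom-validator route (FileRequired is my own name for it, and it assumes CherryPy 3.1 hands the field a cgi.FieldStorage): check the upload’s attributes directly rather than relying on the FieldStorage’s truthiness.

from wtforms.validators import StopValidation

class FileRequired(object):
    """Require an actual upload on a FileField, sidestepping
    cgi.FieldStorage's misleading truthiness under CherryPy 3.1."""

    def __init__(self, message=u"A file is required."):
        self.message = message

    def __call__(self, form, field):
        data = field.data
        # cgi.FieldStorage is falsy when self.list is empty even though the
        # uploaded data lives in self.file, so inspect the attributes directly.
        if data is None or not getattr(data, "filename", None):
            raise StopValidation(self.message)

It would be used like any other validator, e.g. FileField(u"Upload", validators=[FileRequired()]).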

[1] http://hg.python.org/cpython/file/9f8771e09052/Lib/cgi.py line 602

GitHub, Rails and Interface Design

All the excitement over the Rails and GitHub hack reminded me of Scott Meyers (author of Effective C++ et al.), who has talked about interface design many times (including in the aforementioned book), and whose perspective puts the ball squarely in the Rails team’s court.

Let’s make the reasonable assumption that your clients—the people using your interfaces— are trying to do a good job. They’re smart, they’re motivated, they’re conscientious. They’re willing to read some documentation to help them understand the system they’re using. They want things to behave correctly.

That being the case, if they make a mistake when using your interface, it’s your fault. We’re assuming they’re doing their best—they want to succeed. If they fail, it’s because you let them. So, if somebody uses your interface incorrectly, either they’re working hard at it (less likely) or your interface allowed them to do something easy that was not correct (more likely). This puts the shoe on the foot not used to wearing it: it means that responsibility for interface usage errors belongs to the interface designer, not the interface user.

Source: Scott Meyers: The Most Important Design Guideline?

Slow CREATE DATABASE with PostgreSQL on Ubuntu

My new project (Relishment) is an opportunity for me to experiment with different technology. In this case I decided to switch to PostgreSQL from the RDBMS I am familiar with, MySQL.

So after struggling to learn the user management system in PostgreSQL (“Ident authentication failed” stumped me for a while), I finally got Django connected to a database. The next step was to run the test suite, and then wait.

And wait I did: what was a two-or-three-second run on MySQL was a fifteen-second run on PostgreSQL. Hardly desirable when I can run my tests several times a minute while debugging an issue. A quick poke around established that it was the “CREATE DATABASE” at the beginning of the test run that was eating all the time.

Googling around, the suggestions were that my shiny new Ubuntu 11.04 install, which uses the ext4 file system, is less than optimal for how PostgreSQL goes about things. It also turned up a solution: turning fsync off. This is dangerous on a live server, but in development it doesn’t matter if you lose data in a crash.

The fix works splendidly, with speeds at least matching MySQL, if not beating it.

DO NOT DO THIS ON A LIVE SERVER!!

So first you want to edit your PostgreSQL config file:

sudo vi /etc/postgresql/8.4/main/postgresql.conf

Find this line:

#fsync = on

And change it to this:

fsync = off

Save and close the config file and restart PostgreSQL:

sudo service postgresql restart

And you should get a nice speed improvement when running your test cases.

How I learned to stop worrying and love Apache

Okay, not quite. Those of you looking at my HTTP headers will see nginx, but who am I to stand in the way of a catchy title and the chance to rip off Stanley Kubrick?

The title does have a point, however: all too often we hackers worry about optimising our sites rather than addressing the real issues. There are endless discussions about which configuration better serves our site. We pore over the latest benchmarks, drilling down to the last decimal like a spice seller counting saffron threads. Pondering the pros and cons of various servers/proxies/CGIs/languages is enormously enjoyable to hackers, especially those of an entrepreneurial bent. It tickles our technical bone while allowing fantasies of wads of traffic and overheating servers. But it is all a distraction from the real issues, which we all too easily allow to slip from prominence.

If we are to accomplish our goals, site optimisation should be low on the priority list. The list of things we should be working on prior to optimisation is endless. I don’t need to spell them out; your own guilty conscience is doing that for me right now. It is always useful to maintain perspective on this, as more often than not the work we produce isn’t the most elegant or the fastest way of getting what we want. So if you are like me, your first impulse is to refactor, recode and tweak.

Don’t. There will always be time later to improve things once you have your first customer, your first investor, or your mum has had a look. (Seriously, if your mum doesn’t understand your site, what are you doing worrying about SQL vs NoSQL?)

There are of course caveats to all of this. This advice mainly applies early in the life of a project or feature. At some point traffic on that 500ms query is going to take your site down, but won’t that be a nice problem to have? The other obvious exception is bad design. Keep future optimisation in mind so you don’t paint yourself into a corner where nothing but a major rewrite will fix things.

With this in mind go and get something done!