Microservices: But What About Foreign Keys & Transactions?

One of the most impressively sized books on my bookshelf is Introduction to Database Systems, an artefact of my university years. It had an equally “impressive” price tag, or should that be “punishing” given my student wage? I regarded this expensive and weighty tome as containing unquestionable wisdom from on high, and in the very early 2000s it practically did. I had already taught myself SQL, but concepts such as normalisation, ACID and referential integrity were new to me, and in my mind they became immutable aspects of a good database.

From where we sit now, basking in the light of dozens of NoSQL database technologies (a terrible name, but we’re stuck with it), this is obviously not true. The hegemony of the RDBMS is over; document databases, graph databases and wide column stores are all widely used and acceptable options. With this revolution came the discarding of many of these “immutable” aspects, the argument being that making the trade-off confers an advantage. People make the same trade-offs in RDBMS schemas all the time: de-normalisation, for example, might be considered a cowboy move, but its read performance advantage is indisputable.

So what does all this have to do with microservices? Well, trade-offs have to be made, and this becomes obvious fairly early on, often when a developer is first introduced to the concept. With a distributed, finely grained architecture spread across different databases and technologies, transactions won’t always be an option, and neither will foreign keys. This is the cost of the speed and agility that a microservice architecture provides.

This is a scary idea, especially for those of us weaned on SQL: what will become of our data? First of all, transactions. Is there a reason your whole system needs to be able to run in a single transaction? More often a web call will generate several data calls, but many of them are read only, and most of them touch only a few tables or rows. The read-only calls can probably run outside the transaction, and the others likely centre around some domain concept. That domain concept in turn probably makes sense to collect into a microservice, which can then run a transaction of its own. This won’t always be the case, and hard decisions are inevitable somewhere along the line.
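
As a rough illustration of that last point, here is a minimal sketch in which the whole “order” domain concept lives inside one service, so its writes can share a single local transaction (sqlite3 and the table layout are my own assumptions, purely for illustration):

import sqlite3

def place_order(conn, customer_id, item_ids):
    # One local transaction inside the order service's own database; no
    # distributed transaction is needed because everything the order
    # concept touches lives in this one service.
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_items (order_id, item_id) VALUES (?, ?)",
            [(order_id, item_id) for item_id in item_ids])
        return order_id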

Giving up referential integrity is an easier task, as it comes with a big reward. Removing foreign keys and replacing them with API calls means the owner of the data is free to change its internals. As long as the contract with the consumer of the API is honoured, the owner can change as fast as requirements change without the consumer also having to be updated. Databases aren’t the only line of defence for referential integrity: most applications we write already deal with this, and checks often happen in a few layers as data travels through our systems. Without the database enforcing referential integrity we’re relying on our services and applications behaving correctly, something we already do to prevent errors in any case.
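
To make that concrete, a hedged sketch of replacing a foreign key join with an API call; the endpoint URL, field names and the use of the requests library are my assumptions, not from any particular system:

import requests  # assumes the requests library is available

CUSTOMER_API = "http://customers.internal/api/customers"  # hypothetical endpoint

def enrich_order(order):
    # Where a join on orders.customer_id once lived, the order service now
    # asks the customer service's API for the data it needs.
    response = requests.get("%s/%s" % (CUSTOMER_API, order["customer_id"]))
    response.raise_for_status()
    order["customer"] = response.json()  # the API contract is all we rely on
    return order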

Everything old is new again; we aren’t dealing with new concepts, as distributed systems have always had to face these trade-offs. A microservice architecture makes these trade-offs more visible and explicit, creating a tension developers must address. Even if a team chooses a more coarse-grained approach, they’ve evaluated what is going to work best for their project, and that can only be a good thing.

Design and Implementation of Microservices Workshop

Today I attended a full day workshop presented by Sam Newman and Scott Shaw on micro-services. Most people seemed pretty up with the why of micro-services; if you’re not, James Lewis and Martin Fowler have talked at length on the subject. The workshop covered a wide range of topics and gave an excellent overview of the how and what of micro-services, something which is still lacking in the literature (but is coming, see below).

I have seen Sam talk before; in fact his talk at Yow! last year was my first introduction to micro-services, and I have read an early release copy of his book Building Microservices. A lot of the concepts were very familiar to me, but the workshop format meant there was significant discussion around those ideas. This gave me a great deal of insight into the thinking around micro-services and how people have dealt with the trade-offs and choices they present.

It was especially nice to have confirmation that some of the ways I am heading with Biarri’s architecture are tried and tested paths. This goes doubly for the contract testing library Pact (and Pact.rb), which a good friend of mine has been encouraging me to port to Python for Biarri’s use.

A few other books, videos and tools popped up in the talk which I have noted for further investigation:

  • Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans
  • Implementing Domain-Driven Design by Vaughn Vernon
  • Apache Zookeeper, useful for managing services, something like consul.io which I discovered last week.
  • Postel’s law: “Be conservative in what you do, be liberal in what you accept from others” (often reworded as “Be conservative in what you send, be liberal in what you accept”). A useful principle for micro-service communication; see the sketch after this list.
  • Hystrix, a “latency and fault tolerance library” by Netflix. Probably overkill for Biarri’s architecture but it could come in handy in the future.
  • 12factor.net, a set of SaaS app principles which I also discovered last week; seeing it come up again reaffirms its usefulness.
  • A video by Stuart Halloway (I think) dealing with real time data and versioning. I did a Google search and couldn’t find a link, so I am going to chase up Sam about it.
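
On the Postel’s law point above, here is a minimal sketch of how that principle can look in micro-service communication, the “tolerant reader” style; the event shape and field names are made up for illustration.

import json

def parse_order_event(raw):
    data = json.loads(raw)
    # Be liberal in what you accept: tolerate unknown fields and
    # provide defaults for optional ones.
    return {
        "order_id": data["order_id"],             # the one field we require
        "status": data.get("status", "unknown"),  # optional, with a default
    }

def emit_order_event(order_id, status):
    # Be conservative in what you send: emit exactly the documented contract.
    return json.dumps({"order_id": order_id, "status": status})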

All in all, well worth my time.

Infracoders June Meetup

I’ve set myself the goal of attending one software meetup per week. I enjoy attending conferences, and meetups are like micro-conferences: you meet people, you learn something, and often there is beer and food at the end, like this evening.

This evening I attended Infracoders (http://www.meetup.com/Infrastructure-Coders/), a devops group focused on tools that make devops easier and more fun. Given my current focus on Biarri’s overhaul of infrastructure and development workflows these sorts of tools are on my mind a fair bit.

The presentations were interesting. First up were Alexey Kotlyarov and Ross Williamson from Infoxchange, talking about some Docker tools called Pallet and Forklift (https://github.com/infoxchange/docker-forklift). Their talk was rather information dense and I found it a little hard to follow. From what I understood with my limited knowledge of Docker, the tools automate some common development and deployment tasks in a platform/language agnostic manner. Infoxchange looks to have a similar stack and similar problems to Biarri in many ways, with lots of projects and a fragmented environment, so I will be looking into the tools more deeply.

They also linked to some interesting things in their talk which are worth looking at:

  • Zato (https://zato.io/), a Python ESB and application server. I am not clear on what its value proposition is yet, but it has something to do with managing SOA, which is something I am spending a lot of time thinking about right now.
  • The twelve-factor app (http://12factor.net/), a set of guidelines or principles for building SaaS applications. I’d not seen this before and at first glance it looks like good reading.
  • Serf (http://serfdom.io), cluster management.

The second presentation was by Colin Panisset from REA Group about Credulous (http://credulous.io/), an AWS credential management system. He was an excellent and amusing speaker, but Credulous solves a problem we don’t have at Biarri, and are unlikely to have in the medium term. It does look like a good solution if you have a large team with access to your AWS infrastructure. His not using GPG to solve the key sharing and encryption part of Credulous bothered me a little, but his criticisms of the installation and setup of GPG weren’t without merit. He did highlight some aspects of AWS security that I had not considered and will discuss with my co-workers.

The evening concluded with free beer and dumplings, which was nice. I will certainly consider attending again; there seems to be significant overlap with the Devops Melbourne meetup, though perhaps their focuses are different.

A CP Solution for the XKCD NP-Complete Restaurant Order

I’ve been messing around with Constraint Programming (CP) this past week. A few people at work have tried it out on some real world problems lately, but it didn’t seem to stand up when given a lot of data and variables. This seemed sad, as the declarative nature of CP attracts me, and it strikes me there must be a set of problems it is well suited to, so it deserved a look.

The first CP model I wrote by myself is stupidly simple, but since my non-techie fiance understood the code I figure it is a good example. The problem, as described by XKCD below, is to select some appetizers from the menu so that the total cost adds up to exactly $15.05.

[xkcd comic: NP-Complete]

I modeled the problem in Minizinc, a declarative language for modeling CP problems. I am just learning, so if you know Minizinc and I’ve done something dumb, don’t judge me too harshly.

Firstly we declare a bunch of variables that the solver needs to find values for. We give the solver a domain in which each variable must lie, zero to ten in this case. Each variable represents the number of times we buy an item as part of a solution adding up to $15.05.

var 0..10: fruit;
var 0..10: fries;
var 0..10: salad;
var 0..10: wings;
var 0..10: sticks;
var 0..10: sampler;

Then we declare a constraint, something the solver must satisfy to solve the problem. In this case we say that the sum over the items of each item’s cost (converted to cents) multiplied by the number of that item bought must equal the required 1505 cents (that English version could be taken a couple of ways; the maths below makes better sense).

constraint fruit*215 + fries*275 + salad*335 + wings*355 + sticks*420 + sampler*580 == 1505;

We tell the solver to solve to satisfy.

solve satisfy;

And provide a format in which to output the solution.

output ["fruit=", show(fruit), "\t fries=", show(fries), 
        "\t salad=", show(salad), "\t wings=", show(wings),
        "\t sticks=", show(sticks), "\t sampler=", show(sampler)];

Running the model gives us:

$ minizinc --all-solutions xkcd.mzn
fruit=7     fries=0     salad=0     wings=0     sticks=0     sampler=0
----------
fruit=1     fries=0     salad=0     wings=2     sticks=0     sampler=1
----------
==========

So there we go, all the possible solutions to the poor waiter’s problem! We know it is all of the solutions because of the “==========” Minizinc cryptically places at the end of its output. Of course this problem is easy to brute force with a couple of for loops, as there aren’t that many combinations; a sketch of that follows.
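
For comparison, here is a minimal brute-force sketch in Python (itertools.product stands in for the nested for loops; the cent values match the model above). It prints the same two solutions.

from itertools import product

# Menu items and prices in cents, matching the Minizinc model.
items = ["fruit", "fries", "salad", "wings", "sticks", "sampler"]
costs = [215, 275, 335, 355, 420, 580]

# Try every combination of 0..10 of each item and keep the ones that
# total exactly 1505 cents.
for counts in product(range(11), repeat=len(items)):
    if sum(n * c for n, c in zip(counts, costs)) == 1505:
        print(dict(zip(items, counts)))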

But it is a start along what I hope will be a fruitful path.

Update: as people have noted on HN and Reddit, I originally screwed up transcribing the price for the salad, which produced a couple of extra solutions. Fixed that now.

WTForms and Cherrypy 3.1

I have been trialling WTForms, an HTML form input and validation library for Python, on a project I am working on. Much to my irritation, however, WTForms and Cherrypy don’t play nicely in one small area: using wtforms.FileField with the validator wtforms.validators.Required will always fail.

Cherrypy 3.1 (but not 3.2, interestingly) uses the Python built-in cgi.FieldStorage to handle file uploads. In Python 2.6 and 2.7 at least, this is because the code for cgi.FieldStorage.__nonzero__ [1] only checks self.list and ignores self.file, which is where the data is (at least for Cherrypy 3.1). I have no idea why this is the case, and Google gives no love on the why of this issue.

There has been an issue raised with the WTForms guys about the same problem with Pylons, but the long and the short of it is that the developers don’t want to add special cases for the various frameworks. Special cases are the bane of a coder’s existence; they bloat otherwise lean and understandable code and cause maintenance nightmares, so I do understand.

So what about me? Moving to Cherrypy 3.2 is an option, but I don’t want to deal with the worry of that migration right now. An easy fix on my end is to write a custom validator and use it in place of the built-in one, as sketched below. But what about the next project, and the dozens of other people on my team who have to remember to use the custom validator for file uploads? I might need to continue looking at form libraries; at least there are a lot of them!
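
A minimal sketch of what that custom validator might look like (the class name and message are my own, and I haven’t battle tested this): it checks the FieldStorage’s file attribute, which is where Cherrypy 3.1 actually puts the upload, instead of relying on truthiness.

from wtforms.validators import ValidationError

class FileRequired(object):
    """A Required-style validator that copes with cgi.FieldStorage uploads."""

    def __init__(self, message=u"A file is required."):
        self.message = message

    def __call__(self, form, field):
        storage = field.data
        # cgi.FieldStorage.__nonzero__ only consults self.list, so check
        # self.file directly, which is where Cherrypy 3.1 puts the data.
        if storage is None or getattr(storage, "file", None) is None:
            raise ValidationError(self.message)

It would then go on the FileField in place of wtforms.validators.Required.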

[1] http://hg.python.org/cpython/file/9f8771e09052/Lib/cgi.py line 602

Github, Rails and Interface Design

All the excitement over the Rails and Github hack reminded me of Scott Meyers (author of Effective C++ et al.), who has talked about the subject of interface design many times (including in the aforementioned book), and whose perspective puts the ball squarely in the Rails team’s court.

Let’s make the reasonable assumption that your clients—the people using your interfaces— are trying to do a good job. They’re smart, they’re motivated, they’re conscientious. They’re willing to read some documentation to help them understand the system they’re using. They want things to behave correctly.

That being the case, if they make a mistake when using your interface, it’s your fault. We’re assuming they’re doing their best—they want to succeed. If they fail, it’s because you let them. So, if somebody uses your interface incorrectly, either they’re working hard at it (less likely) or your interface allowed them to do something easy that was not correct (more likely). This puts the shoe on the foot not used to wearing it: it means that responsibility for interface usage errors belongs to the interface designer, not the interface user.

Source: Scott Meyers: The Most Important Design Guideline?

Slow CREATE DATABASE with PostgreSQL on Ubuntu

My new project (Relishment) is an opportunity for me to experiment with different technology. In this case I decided to switch to PostgreSQL from the RDBMS I am familiar with, MySQL.

So after struggling to learn the user management system in PostgreSQL (“Ident authentication failed” stumped me for a while), I finally got Django connected to a database. The next step was to run the test suite, and then wait.
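
For anyone fighting the same ident error, a sketch of the sort of DATABASES setting involved (the database name and credentials are placeholders, not my real ones); setting HOST forces a TCP connection, which on Ubuntu’s default configuration uses password authentication rather than ident.

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myproject",       # placeholder database name
        "USER": "myproject_user",  # placeholder role
        "PASSWORD": "secret",      # placeholder password
        "HOST": "localhost",       # TCP connection: md5 auth, not ident
    }
}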

And wait I did: what was a two or three second run on MySQL was a fifteen second run on PostgreSQL. Hardly desirable when I can run my tests up to several times a minute while debugging an issue. A quick poke around established it was the “CREATE DATABASE” at the beginning of the test run that was eating all the time.

Googling around, the suggestions were that my shiny new Ubuntu 11.04 install, which uses the ext4 file system, is less than optimal for how PostgreSQL goes about things. It also turned up a solution: turning fsync off. This is dangerous on a live server, but for development it doesn’t matter if you lose data in a crash.

The fix works splendidly, with speeds at least matching MySQL, if not beating it.

DO NOT DO THIS ON A LIVE SERVER!!

So first you want to edit your PostgreSQL config file:

sudo vi /etc/postgresql/8.4/main/postgresql.conf

Find this line:

#fsync = on

And change it to this:

fsync = off

Save and close the config file and restart PostgreSQL:

sudo service postgresql restart

And you should get a nice speed improvement when running your test cases.

How I learned to stop worrying and love Apache

Okay, not quite. Those of you looking at my HTTP headers will see nginx, but who am I to stand in the way of a catchy title and the chance to rip off Stanley Kubrick?

The title does have a point, however: all too often we hackers worry about optimising our sites rather than addressing the real issues. There are endless discussions about which configuration better serves our site. We pore over the latest benchmarks, drilling down to the last decimal like a spice seller counting saffron threads. Pondering the pros and cons of various servers/proxies/CGIs/languages is enormously enjoyable to hackers, especially those of an entrepreneurial bent. It tickles our technical bone while allowing fantasies of wads of traffic and overheating servers. But it is all a distraction from the real issues, which we all too easily allow to slip from prominence.

If we are to accomplish our goals, site optimisation should be low on the priority list. What we should be working on prior to optimisation is endless. I don’t need to list them; your own guilty conscience is doing that for me right now. It is always useful to maintain a perspective on this as more often than not the work we produce isn’t the most elegant or the fastest way of getting what we want. So if you are like me, your first impulse is to refactor, recode and tweak.

Don’t. There will always be time later to improve things once you have your first customer, your first investor, or your mum has had a look. (Seriously, if your mum doesn’t understand your site, what are you doing worrying about SQL vs NoSQL?)

There are of course caveats to all of this. This advice mainly applies early in the life of a project/feature. At some point traffic on that 500ms query is going to take your site down, but won’t that be a nice problem to have? The other obvious exception is bad design: keep future optimisation in mind so you don’t paint yourself into a corner where nothing but a major rewrite will fix things.

With this in mind go and get something done!

Django and Dreampie

As would be obvious by now, I am a dedicated djangonaut (cheesy name, I know) and I’ve been using Django long enough that the ORM (Object Relational Mapping) is becoming second nature. Using SQL for anything but database gymnastics is somewhat annoying, as the data isn’t as accessible to Python as it is with an ORM. So when I need quick insight into some data, or want to prototype some code, I want to be able to use the ORM instead of SQL.

I’ve also fallen in love with dreampie, an interactive Python shell on steroids. It allows you to type and edit multi-line pieces of Python code which you can then run in the inbuilt Python shell. It has a fast, clean interface and autocompletes attributes from the interpreter, so you can get some hacking done quickly.

Of course I want to be able to do quick Django hacking in dreampie, and I worked out a simple way of doing this.

import sys
sys.path.append('/path/to/your/project/')
import settings
from django.core.management import setup_environ
setup_environ(settings)
from a_django_app.models import AModel

Pasting this into dreampie (or loading it from the nice history functionality dreampie provides) sets up the shell with everything it needs to access the ORM (or anything else you use in Django, for that matter). The first two lines add the project’s path to the list of paths Python uses to search for modules and packages. The next three import the project’s settings and activate them using the Django function setup_environ. After that you can import app models and query away.

You can also use the same snippet at the top of standalone scripts to give them access to the project’s Django environment, which is useful, though it is normally advisable to add a management command instead.
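
For completeness, a minimal sketch of the management command alternative (the command name, file location and body are made up for illustration; the file would live at a_django_app/management/commands/poke_models.py):

from django.core.management.base import BaseCommand
from a_django_app.models import AModel

class Command(BaseCommand):
    help = "Example task; run with: python manage.py poke_models"

    def handle(self, *args, **options):
        # manage.py has already activated the settings, so the ORM just works.
        self.stdout.write("%d AModel rows\n" % AModel.objects.count())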

Django Site Admin Emails

There is a neat feature in the default behaviour of Django: if an uncaught exception is thrown while rendering a view, it emails everyone listed in the ADMINS setting. This simple little feature is great on several levels.

  1. Repeatedly getting emails about an issue, often with enough information to diagnose and fix the problem, means you’re less likely to put up with broken windows.
  2. When something goes wrong you’re much more likely to find out about it, as monitoring is often one of the last things to be done on a project.

So it is a warm and fuzzy safety net that reassures you with an empty inbox. I really like coming in on a Monday morning and not finding a single email telling me something barfed over the weekend.

And as an added bonus, Django has a wonderful philosophy of exposing the pixie dust it uses to do its magic. All it takes is a call to mail_admins():

mail_admins("Made a sale","We just sold a pony for $%d" % pony_price)

And you’re alerting everyone when something good happens. This is great with new low volume projects where every sale is a momentous event. Of course you can use this for alerting the team about errors as well. There is also mail_managers(), so you can set up the non-technical folk to receive relevant alerts with the same amount of effort.
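
The recipients for both functions come straight from the settings file; a minimal sketch (the names and addresses are placeholders):

# In settings.py: mail_admins() emails ADMINS, mail_managers() emails MANAGERS.
ADMINS = (
    ("An Admin", "admin@example.com"),
)
MANAGERS = (
    ("The Non Technicals", "managers@example.com"),
)
# The EMAIL_HOST and related settings also need to be configured for
# anything to actually be sent.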