CouchDB – check your default settings

While running a pkg upgrade on my FreeBSD box, I noticed the following message scroll by while couchdb was being updated to the latest version:

And just like that my response times for simple fetch requests to couchdb went from ~200ms to ~4ms.  This was something that was seriously bothering me, and had me contemplating dropping couchdb in favor of something else.  If ‘most sites’ will want to do this, why isn’t this the default setting, I wonder?  Oh well.

How to corrupt your git repo with sed

So, some day you’re going to be doing some refactoring that involves changing a single line in multiple files in your code base.  Like, say, the license header in every file in your project (because you actually sat down and read the GPLv3 and decided to revert to GPLv2 before publishing your project on github).

It seems like doing something like this would be a quick and easy solution:

Do not do this.  As I discovered, sed is a powerful tool that also happens to be really adept at wrecking things.  Like all the binary files in your .git directory.  Good luck fixing that.

In my case, no amount of git fsck/git reset/etc. could bring my repo back. Since I had a lot of uncommitted changes (I know, bad), I ended up re-initializing my local repo and doing a big, ugly pull/merge/push. While that was a quick and dirty way of getting back up and running (and back to work), it’s only an acceptable solution when you’re working on a solo project (because it does ugly things to the repo history, which tends to irritate other people for some reason).

Next time, I’ll need to come up with a search/replace script that ignores binary files :)
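One way such a script could look, sketched in Go rather than sed. Everything here is an assumption for illustration: the names looksBinary and replaceInTree are made up, and "binary" is approximated as "has a NUL byte in the first 8000 bytes", which is only a rough heuristic. The important part is that any directory named .git is skipped entirely.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// looksBinary reports whether data has a NUL byte near its start.
// This is a rough heuristic, not a real file-type check.
func looksBinary(data []byte) bool {
	if len(data) > 8000 {
		data = data[:8000]
	}
	return bytes.IndexByte(data, 0) != -1
}

// replaceInTree rewrites old to repl in every regular text file under root,
// never descending into a directory named .git.
// It returns the number of files it changed.
func replaceInTree(root, old, repl string) (int, error) {
	changed := 0
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if d.Name() == ".git" {
				return filepath.SkipDir // leave the repository internals alone
			}
			return nil
		}
		if !d.Type().IsRegular() {
			return nil // skip symlinks, sockets, etc.
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		if looksBinary(data) || !bytes.Contains(data, []byte(old)) {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		out := bytes.ReplaceAll(data, []byte(old), []byte(repl))
		if err := os.WriteFile(path, out, info.Mode().Perm()); err != nil {
			return err
		}
		changed++
		return nil
	})
	return changed, err
}

func main() {
	// Demo in a throwaway directory: a text file, a binary file, and a fake
	// .git object. Only the text file should be rewritten.
	root, err := os.MkdirTemp("", "replace-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	os.MkdirAll(filepath.Join(root, ".git"), 0o755)
	os.WriteFile(filepath.Join(root, ".git", "obj"), []byte("GPLv3\x00blob"), 0o644)
	os.WriteFile(filepath.Join(root, "main.c"), []byte("/* GPLv3 */\n"), 0o644)
	os.WriteFile(filepath.Join(root, "logo.png"), []byte("\x89PNG\x00GPLv3"), 0o644)

	n, err := replaceInTree(root, "GPLv3", "GPLv2")
	fmt.Println("files changed:", n, "error:", err)
}
```

Committing (or at least stashing) before running anything like this is still the real safety net.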

Using C libraries with Go

On my current project, which involves wiki-esque collaborative editing of documents, I decided I wanted to use markdown.  And since I wanted to use markdown (or rather one of the many almost-sort-of-compatible implementations of it), I decided that I might as well use CommonMark, which is attempting to introduce some sanity (standardization).

I’m using Go, and a quick search on google told me there weren’t any good golang implementations of CommonMark.  Since the prospect of implementing the CommonMark spec from scratch in Go seemed rather daunting (or rather, a project in and of itself), I decided to look into using the CommonMark C-language implementation.  Turns out it’s much easier than I expected to call C code from Go programs.  So, I spent a weekend coding up a Go wrapper for the CommonMark C library (which happens to be the reference implementation).

Setup

Go provides a handy utility called cgo to help you deal with C code from within Go.  For a simple example, here’s a signature for a libcmark function that I want to call from Go (exported library functions are in cmark.h):

To call this from Go, I can do the following:

The lines in the comment block under the package declaration are not just comments :)  They’re actually used by cgo.  First, the line:

specifies options to pass to the C linker. In this case, I’m instructing it to link in libcmark. You can also specify any CFLAGS, etc. you want.

If you know C, these lines are self-explanatory :)  They’re preprocessor directives telling the compiler to include the stdlib.h and cmark.h files.  Sort of like an import statement… but that’s an oversimplification.

Last but not least, if you’re going to be working with C, you need to import the C package:

C Types

Now, there’s a lot going on in the code itself:

These lines are basically converting our Go data types into C types. Our C function, cmark_markdown_to_html(), takes two parameters, a string (in C, this is a char*), and an integer containing the length of the string. Since C doesn’t understand Go types (and vice-versa), we explicitly convert them using the C package’s handy C.CString function and a C.int type conversion.

The deferred call to C.free deallocates the CString and releases its memory. This is necessary because C is most definitely NOT a garbage-collected language (one of the reasons for C’s generally excellent performance is that it’s lean and mean), so when dealing with C types, you have to be mindful of what you’re doing or you’ll end up leaking memory. In C, strings are really just arrays of characters terminated by a NUL byte. Normally, you pass strings around in C by passing a pointer to the first character in the string (char*). When you call C.CString, it allocates enough memory to hold the characters from your Go string, copies them over, and then gives you a pointer. This newly allocated memory is NOT handled by Go’s GC for you. So, when you’re done with a CString, you need to call C.free (incidentally, the free() C function is declared in stdlib.h, part of the C standard library, which is why we included that header). An easy way to handle this is to put C.free in a defer statement right after you allocate the string (assuming you want it gone when the function finishes). C.free() takes a raw C pointer, which we provide here by converting the CString with the unsafe package’s unsafe.Pointer type.

Now, to finish up:

We call our libcmark function via C.cmark_markdown_to_html and feed it our CString and C.int. It returns a C char*, which we must convert to a Go string in our return statement (the C package also provides functions for converting types from C to Go). We defer the call to free the htmlString pointer, since we won’t need it after our nice Go string is returned :)

C Enums

Another little gotcha when dealing with C code is the fact that C has enums while Go does not :(

For example, in cmark.h, there are several enums, one of which is defined thusly:

So how do we deal with this in Go?  One thing to note is that C enums are represented as integer values, so in the above example CMARK_NO_LIST has a value of 0, CMARK_BULLET_LIST has a value of 1, and so on.  So, if you need to pass an enum type to a C function, you could just give it an integer.  I don’t particularly like that solution, since my memory tends to suck and I don’t want to have to flip between my Go and C code looking up what type of list a ‘1’ represents.

Fortunately, in Go we can approximate an enumerated type by doing this:

Then, when I need a C cmark_list_type, I can do this:

Not too hard.
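A minimal sketch of the Go side of that idiom, using a typed constant block with iota. The Go identifiers (ListType, NoList, and so on) are placeholders I picked for the example; only the CMARK_* names in the comments come from cmark.h. The cgo conversion is left as a comment so the snippet runs without libcmark.

```go
package main

import "fmt"

// ListType mirrors the cmark_list_type enum on the Go side.
type ListType int

// The order must match the C enum: iota counts up from 0, just as C assigns
// 0, 1, 2, ... to enumerators that have no explicit value.
const (
	NoList      ListType = iota // CMARK_NO_LIST
	BulletList                  // CMARK_BULLET_LIST
	OrderedList                 // CMARK_ORDERED_LIST
)

// String lets the values print as names instead of bare integers.
func (lt ListType) String() string {
	switch lt {
	case NoList:
		return "NoList"
	case BulletList:
		return "BulletList"
	case OrderedList:
		return "OrderedList"
	}
	return fmt.Sprintf("ListType(%d)", int(lt))
}

func main() {
	lt := BulletList
	// Under cgo, the value would be handed to C with a conversion along the
	// lines of C.cmark_list_type(lt); that call is omitted here on purpose.
	fmt.Println(lt, int(lt))
}
```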

You can find my commonmark go wrapper here: https://github.com/rhinoman/go-commonmark

CouchDB Authorization

For the last month or so I’ve been working on a little project using CouchDB and Go.  As this is my first time using a NoSQL database, I’ve had to learn to think a little bit differently about how I interact with my data.  One aspect of Couch that had me a little confused at first was how it deals with authentication and authorization.

When creating an application with a traditional RDBMS (with PostgreSQL being my favorite), I was used to handling authentication / authorization myself (usually via a third-party library) at the application level.  While you can of course still do this with CouchDB, when I first came across CouchDB’s auth support, I thought “Great! I can just use Couch for everything!”.  Turns out, I sorta can, but it’s not as simple as I had first assumed.  Oh well :)

Users are stored in a special database called “_users” and creating a new one is as simple as a PUT request (Admin credentials are generally required for creating or deleting users).  For authentication, you can either use Couch’s Session API, or in the case of HTTP Basic, just pass the client’s Basic Authentication headers through to Couch from your application. This was easy enough to implement and seems to work well enough for what I need.
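As a rough illustration of that PUT, here is how the request might be built in Go. This is a sketch, not the driver’s actual code: newUserRequest and userDoc are names made up for the example, the user name is assumed to be URL-safe, and nothing is sent over the network. The document ID prefix (org.couchdb.user:) and the "type": "user" field are what CouchDB expects for entries in _users.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// userDoc is the shape of a document in the _users database.
type userDoc struct {
	Name     string   `json:"name"`
	Password string   `json:"password"` // CouchDB hashes this on save
	Roles    []string `json:"roles"`
	Type     string   `json:"type"` // must be "user"
}

// newUserRequest builds (but does not send) the PUT that creates a user.
// adminUser and adminPass are sent as HTTP Basic credentials.
func newUserRequest(baseURL, adminUser, adminPass, name, password string, roles []string) (*http.Request, error) {
	if roles == nil {
		roles = []string{} // encode as [] rather than null
	}
	body, err := json.Marshal(userDoc{Name: name, Password: password, Roles: roles, Type: "user"})
	if err != nil {
		return nil, err
	}
	target := baseURL + "/_users/org.couchdb.user:" + name
	req, err := http.NewRequest(http.MethodPut, target, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.SetBasicAuth(adminUser, adminPass)
	return req, nil
}

func main() {
	req, err := newUserRequest("http://localhost:5984", "admin", "secret", "bill", "hunter2", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```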

Authorization, however, is where things can get tricky.  CouchDB handles permissions on a per database basis.  It is not possible to restrict access to individual documents.  “Well, that’s useless, guess I’ll have to handle authorization some other way” I thought to myself.  But the answer, if your use-case allows for it, is just to create multiple databases at the level of granularity you need to control access.  I resisted doing this at first, as being used to RDBMS systems, I kept thinking “a database is a big deal and creating dozens, hundreds, or even thousands of them is insane.”  But CouchDB is not Postgres :) and a little research told me that creating a database per user account is actually a very common use case and doesn’t cause any serious issues (if you start getting into thousands of databases you may need to tweak your config to allow a larger number of databases to be open simultaneously).

CouchDB allows you to grant access to databases either on a per-user basis or based on user roles. I would highly recommend doing role-based authorization as I think that’s more flexible. To specify database users/roles, you edit a database’s security document (called ‘_security’). The structure of the security object looks like this:

CouchDB allows two different levels of user: ‘members’ and ‘admins.’ Members can read and write documents in the database while admins can edit design documents as well as the database security document. One thing to note here is that if no users/roles are specified, CouchDB will default a database to ‘public’ access, so… definitely put something here. If you’re using roles, in order to grant/revoke access to a user, simply add or remove the desired role to/from the roles array in the user’s document in the _users database.

Now, CouchDB allows you to create and assign any roles you want, but doesn’t enforce any permissions beyond the aforementioned ‘member’ and ‘admin’ types.  So, since members can both read and write documents, how can you create roles for read-only users?  Unfortunately, CouchDB doesn’t make this easy or obvious, but it does provide a way to make it work.

I’ll illustrate how to do this with an example:  Let’s say you have an application that’s a photo sharing service.  Each user has various ‘albums’ that they can either keep private or share with others.  Most of the time, users are going to want to give others ‘read-only’ access to an album.  So, every time a user creates a new album, we create a database for it.  When a database is created, we also make sure to configure it with three roles: one admin role and two member roles, like so:
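For a concrete picture, this sketch builds such a security document in Go and prints the JSON that would be PUT to the database’s _security endpoint. The admins/members layout with names and roles arrays is CouchDB’s; the role naming scheme (database name plus :admin, :read, :write) is an assumption for this example, and albumSecurity and securityJSON are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roleSet lists the user names and roles granted one level of access.
type roleSet struct {
	Names []string `json:"names"`
	Roles []string `json:"roles"`
}

// securityDoc is the body of a database's _security document.
type securityDoc struct {
	Admins  roleSet `json:"admins"`
	Members roleSet `json:"members"`
}

// albumSecurity returns a security document with one admin role and two
// member roles, all derived from the database name.
func albumSecurity(db string) securityDoc {
	return securityDoc{
		Admins:  roleSet{Names: []string{}, Roles: []string{db + ":admin"}},
		Members: roleSet{Names: []string{}, Roles: []string{db + ":read", db + ":write"}},
	}
}

// securityJSON renders the document as the JSON text to send to CouchDB.
func securityJSON(db string) (string, error) {
	b, err := json.Marshal(albumSecurity(db))
	return string(b), err
}

func main() {
	s, err := securityJSON("stan_vacation_album")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```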

So, Stan has a vacation album. Let’s say he wants to grant read access to his friend Bill. He tells your app to grant read access to Bill and your app turns around and edits Bill’s user document by adding “stan_vacation_album:read” to his list of roles. Great! Bill has access, but how do we enforce read-only access and prevent Bill from, say, using Photoshop to edit himself into Stan’s vacation photos (Bill is not a good friend) and saving them over the originals in Stan’s database?

You can create a special design document in each database called ‘_design/_auth’. Within that design document, you can specify a function called validate_doc_update, which will be triggered each time someone tries to create, update, or delete a document. So, all we need to do is check the user’s roles and not allow the update to go through if the user doesn’t have the right role. For example:

is a quick and dirty function that would do the trick. Anyone who is neither an admin nor holds the :write role won’t be able to write to the database. Of course you can also do any data validation you like here as well.
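CouchDB itself runs validate_doc_update as JavaScript inside the design document, so the following is not something you can install in a database. It is just the same decision logic written out in Go, which can be handy for unit-testing the rules or for pre-checking a request in the application. The function name canWrite and the :admin role suffix are assumptions for the example; _admin is CouchDB’s built-in server admin role.

```go
package main

import "fmt"

// canWrite reports whether a user holding userRoles may write to the album
// database db: server admins, album admins, and album writers are allowed.
func canWrite(userRoles []string, db string) bool {
	allowed := map[string]bool{
		"_admin":      true, // CouchDB's built-in server admin role
		db + ":admin": true, // assumed per-album admin role
		db + ":write": true, // per-album write role
	}
	for _, r := range userRoles {
		if allowed[r] {
			return true
		}
	}
	return false
}

func main() {
	db := "stan_vacation_album"
	bill := []string{db + ":read"}
	stan := []string{db + ":admin"}
	fmt.Println("bill may write:", canWrite(bill, db))
	fmt.Println("stan may write:", canWrite(stan, db))
}
```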

Hope this helps :)

I’m also developing a CouchDB driver for Go as I work on my project. Feel free to use it however you like.

Two weeks with Go

[Image: the Go gopher mascot]
You can tell this is a gopher because… teeth. [1]

It’s been a while since my last post, as I’m sure my legions of loyal readers have surely noticed… surely.

Anyway, since my last post I’ve moved my family from Northern Virginia to the Florida coast to take a new job (I’ve discovered that the DC area does not agree with me), so I’ve been a bit distracted.  However, now that I’m (mostly) settled in and not constantly unpacking, setting up house, fixing things, etc., I have some time to pursue my side projects again.  I had been working on a training tracker tool in Scala, but decided to mothball that one for the time being, as interest from potential customers just wasn’t there.

For now, I’ve decided to explore a few new technologies.  I’ve always wanted to learn about NoSQL databases, but I just never had a product idea that seemed to call for a non-relational database solution.  So, I’ve been messing around with CouchDB… and while I’m just messing around, I decided to play with this new Go programming language I’ve been hearing so much about lately.

First Impressions

I remember hearing about Go several years ago when it was first introduced.  At the time I was spending my spare time playing around with Ruby on Rails, and at first blush Go looked very, very similar to C:

I mean, static typing?! Web development was going to be moving to dynamic languages like Ruby and Python anyway, why waste my time with this thing?

[Image: rabbit]
I also thought the Go gopher mascot was reminiscent of that Plan9 rabbit thing, but that’s not important. [1]

Well, I’ve since soured on Ruby on Rails. Actually, mostly Rails; the Ruby language itself is actually pretty fun to use (though I’ve also soured on dynamic type systems a bit as well), and I could see myself using a lightweight framework like Sinatra to write a REST API. Problem is, the entire Ruby community seems to revolve around Rails and the veneration of DHH (I have yet to experience a recruiter calling me to talk about “Ruby” jobs, just “Rails” jobs), so I decided to move on to other technologies.

Second, er, impressions

Fast forward a couple of years.  My day job involves writing web services and applications mostly in Java (sometimes Scala and Javascript).  While Scala is a fine language, and I used it as the primary language on my last side project, you will never catch me coding in Java during my free time. Writing Java is tedious, frustrating, and always feels like work.  But it pays the mortgage, it does.

Since I was having trouble generating ideas for a new project, I decided to spend some time learning a new language.  Go was/is gaining in popularity (and version 1.3 of the language was released just a few months ago), so I decided to give it a second look.  I figured devoting a few weeks to Go wouldn’t kill me (probably).  I stepped through the excellent Tour of Go and decided my initial opinion of Go was misplaced.  Here was a delightfully small, concise language with actually decent concurrency support (It’s no Akka, but goroutines and channels took me minutes to grasp as opposed to days/weeks with Akka).

Here’s an excerpt from the concurrency section on Tour of Go:

To briefly explain the fun bits involving concurrency, placing the keyword ‘go’ before a function call (as in go sum() ) fires off a goroutine, which is executed concurrently with the main thread of execution.  Go hides the complexities of managing threads, pools, etc., from you in a way that just works.

Go also allows you to send data between goroutines using channels.  The line:

writes the value of sum to channel c.  You can see that the main function reads from that channel (actually, it reads from it twice) with the line:

incidentally, this line of code blocks main() until two values from channel c are received, so you do have to be mindful of what you’re doing when using channels :)
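A small self-contained program in the spirit of that Tour example shows the pieces together. This is my own sketch rather than the Tour’s exact listing; the parallelSum wrapper is added here only to make the result easy to check.

```go
package main

import "fmt"

// sum adds up s and sends the total on c.
func sum(s []int, c chan<- int) {
	total := 0
	for _, v := range s {
		total += v
	}
	c <- total // send the partial result to whoever is receiving on c
}

// parallelSum sums each half of s in its own goroutine and combines the
// two partial results.
func parallelSum(s []int) int {
	c := make(chan int)
	go sum(s[:len(s)/2], c)
	go sum(s[len(s)/2:], c)
	x, y := <-c, <-c // blocks until both goroutines have sent a value
	return x + y
}

func main() {
	fmt.Println(parallelSum([]int{7, 2, 8, -9, 4, 0}))
}
```

Which goroutine finishes first is not defined, so x and y may arrive in either order; the total is the same either way.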

If you’ve ever written concurrent code (or had to debug a multi-threaded application), this probably seems too easy. I’ll admit it’s not as powerful as Akka actors. Then again, to explain Akka actors to somebody, along with their concomitant messaging scheme, dispatching, etc., I’d probably need several hours and a whiteboard (or a chalkboard, or a sidewalk).

This kind of simplicity and well thought out language design is why I think Go is going to become very popular in the near future.  It really does strike me as C, modernized.  Unlike Scala, Erlang, etc., here’s a language designed for modern network programming, with pretty darn good concurrency support, that isn’t a bear (or ocelot) to learn.  The most common (admittedly legitimate) concern expressed to me by my bosses whenever I suggested using, say, Scala is the lack of readily available Scala devs out there: “If I need a new Java developer, I can find one pretty much on demand.  Scala developers are hard to find, AND more expensive.”  And even though a smart developer can learn Scala, it’s such an advanced (and feature-rich) language that it’s going to take a considerable amount of time before a new Scala dev can be productive (especially if said dev doesn’t have experience with functional programming).  But with Go — you really ought to be able to pick it up in a few days.  And it has the advantage of not being Java, which is a good thing, always.

Conclusion

I do believe I’ll stick with getting better at programming in Go for a little while and will probably make use of it whenever I can :)

Oh, and I figured a good project to learn more about Go and CouchDB would be… a Go driver for CouchDB :)  I know there are already several out there (some good, some … not), but I figured this would be a good vehicle for learning Go (and couch) and so far it has been — see my repo here.

[1] The Gopher mascot and logo were designed by Renée French, who also designed Glenda, the Plan 9 bunny. The logo and mascot are covered by the Creative Commons Attribution 3.0 license.

Testing Stripe transactions with Scala

I’m nearing completion on my latest project, Training Sleuth, and I’ve once again decided to use Stripe as my payment processor.  I used Stripe on my previous project, Rhino SchoolTracker, and have absolutely nothing negative to say about it :).

While with Rhino SchoolTracker I tested my stripe payments manually, I decided I wanted to write some automated integration tests this time around (particularly since I’m using Scala for this project which has a decidedly slower and more unwieldy edit-compile-test loop than Ruby does).

The Stripe API is actually split into two parts, a client side javascript library (stripe.js), and a server-side API available in several of the popular server-side web languages (php, java, ruby, python, and javascript (for node.js) last time I checked).  Anyway, the basic concept goes like this:

  1. You serve an HTML payment form (with fields for CC number, etc.) from your server to the client.
  2. When the client submits the form, instead of sending it to your server, you use stripe.js to grab the form data and send it to Stripe’s servers, which will validate the card and return a unique token (or an error message in case of invalid/expired credit cards, etc.) via an ajax request.
  3. Once you have the stripe card token, you send it up to your server, do whatever processing you need to do on your end (grant access to your site, record an order, etc.), and then submit a charge to Stripe using the Stripe API.

The key feature of all of this is that the user’s credit card information never touches your server, so you don’t need to worry about PCI compliance and all the headaches that go with it (yes, stripe does require you to use SSL, and despite their best efforts it is possible to mis-configure your server in such a way as to expose user payment info if you don’t know what you’re doing).

Now Stripe offers a test mode, which is what we’ll be using here, with a variety of test card numbers to simulate various conditions (successful charge, declined card, expired card, etc.).  The main problem I ran into writing automated tests in Scala was that I needed to use stripe.js to generate a card token before I could interact with the server-side (Java) API.

Enter Rhino, a Javascript interpreter for Java.  Using Rhino, I was able to whip up some quick-and-dirty javascript to generate a stripe token and call it from Scala.  Of course, Rhino alone wasn’t enough — I also needed to bring in Envjs and create some basic HTML to simulate a browser environment for stripe.js.

First, here’s my stripetest.js:

You also need to provide some basic HTML. I created a file called ‘stripetest.html’ which merely contained this:

Simple, but this was enough to get things working.

I dropped these files (along with env.rhino.js which I obtained from the Envjs website) into my test/resources folder.

With all of that in place, I was able to write some specs2 tests:

There you go, kind of painful to get set up, but definitely nice to have.

Backbone and Full Calendar

My current project, Training Sleuth, involves scheduling and keeping track of various training events.  While not strictly necessary, I’m fairly certain potential users are going to want to view all of the events they’ve painstakingly entered into the app in a nice familiar calendar format.

I’m using Backbone for my application’s front-end (which interacts with a REST backend written in Scala), and unfortunately I couldn’t find a javascript calendar library that was designed to work with backbone out of the box.  I briefly toyed with the idea of creating my own, but quickly rejected the idea when I realized such a thing would be a project unto itself, and would just serve as a distraction to getting my current app finished.  The next best thing to Backbone-Calendar, of course, would be a well-designed, highly configurable calendar that’s flexible enough to be used with just about anything.  Full Calendar is just such a library, and also happens to be widely used (and surprisingly well documented for an open source javascript library, perhaps that’s why it’s widely used).

FC is based on ‘event’ objects, and provides several different mechanisms for feeding events to the calendar. For integrating FC’s event objects with my Backbone app, I used FC’s custom events function to translate between Backbone models and FC event objects. The events method is called whenever FC needs more event data (for example, when the user is paging through months).

Here’s one of my view methods that renders a calendar, and defines my ‘events’ method (warning for Javascript purists: I use Coffeescript, and yes, I realize this means we can’t be friends):

EventList is a fairly vanilla Backbone Collection that fetches event data (training session date/time, etc.) from my back-end REST service (the data object in the fetch statement defines the date/time range for which I’m requesting event data).  In all, it was surprisingly… easy.  I overrode the default CSS a bit to match the rest of my UI (I’m using Bootstrap 3), and the result doesn’t look half bad (IMHO):

[Screenshot: Full Calendar rendered in the app]

Graham’s Scan in Scala

Sometimes my job throws an interesting problem my way.  This week I was presented with a very odd geometry problem :)

I needed to generate KML files from geographic data and one of my requirements was to represent certain geographic areas as polygons, the vertices of which would be supplied (along with the rest of the data) by another OSGi service running elsewhere.  Seems fairly straightforward — In KML, the vertices of a polygon are usually specified as follows:

The coordinates tag requires at least four longitude/latitude/altitude triples to be specified, with the last coordinate being the same as the first.   Here is where the problem comes in — The order in which these coordinates are specified matters (they must be specified in counter-clockwise order).  To mix up the order of the coordinates would have unpredictable results (e.g. crazy geometric shapes) when the data is later displayed via Google Earth (or some other application that supports KML files).  However, the area vertices are indeed fed to my KML generator in no particular order (and the services providing the data cannot be changed to guarantee a particular ordering).

So… how do I put the points in order? “Surely this is a solved problem,” I thought, turning to the all-knowing internet. A bit of searching turned up an algorithm called Graham’s Scan. Basically, this algorithm takes a bag of random coordinates and generates a convex hull with vertices defined in counter-clockwise order (Note: this may not be suitable if you’re trying to faithfully recreate complex geometries; fortunately, I’m mostly concerned with rectangular areas). Roughly, the algorithm works as follows:

  1. Find the coordinate with the lowest y-value.
  2. Sort the remaining points by the polar angle that the line from the point found in step 1 to each of them makes with the x-axis.
  3. Go through the list, evaluating 3 points at a time (you’re concerned with the angle of the turn made by the two resulting line segments).  Eliminate any non counter-clockwise turns (i.e., non-convex corners).
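The three steps above can be sketched compactly in Go. To be clear, this is not the Scala implementation from this post, just an illustration of the same algorithm; Point, cross, and convexHull are names chosen for the example. Instead of computing angles directly, it sorts with a cross product, which gives the same ordering because every point lies on or above the pivot.

```go
package main

import (
	"fmt"
	"sort"
)

// Point is a 2-D coordinate (for KML data, X would be longitude and Y latitude).
type Point struct{ X, Y float64 }

// cross is positive when o->a->b is a counter-clockwise turn, negative when
// it is clockwise, and zero when the three points are collinear.
func cross(o, a, b Point) float64 {
	return (a.X-o.X)*(b.Y-o.Y) - (a.Y-o.Y)*(b.X-o.X)
}

// dist2 is the squared distance between a and b.
func dist2(a, b Point) float64 {
	dx, dy := a.X-b.X, a.Y-b.Y
	return dx*dx + dy*dy
}

// convexHull returns the hull of pts in counter-clockwise order, starting
// from the lowest point. Interior and collinear points are dropped.
func convexHull(pts []Point) []Point {
	p := append([]Point(nil), pts...) // work on a copy
	if len(p) < 3 {
		return p
	}

	// Step 1: find the lowest point (lowest x breaks ties) and move it to p[0].
	lo := 0
	for i, q := range p {
		if q.Y < p[lo].Y || (q.Y == p[lo].Y && q.X < p[lo].X) {
			lo = i
		}
	}
	p[0], p[lo] = p[lo], p[0]
	pivot := p[0]

	// Step 2: sort the rest by polar angle around the pivot, nearer first on ties.
	rest := p[1:]
	sort.Slice(rest, func(i, j int) bool {
		if c := cross(pivot, rest[i], rest[j]); c != 0 {
			return c > 0
		}
		return dist2(pivot, rest[i]) < dist2(pivot, rest[j])
	})

	// Step 3: walk the sorted points, discarding any that would make a
	// clockwise (or straight) turn.
	hull := []Point{pivot}
	for _, q := range rest {
		for len(hull) >= 2 && cross(hull[len(hull)-2], hull[len(hull)-1], q) <= 0 {
			hull = hull[:len(hull)-1]
		}
		hull = append(hull, q)
	}
	return hull
}

func main() {
	// A unit square given out of order, plus one interior point.
	pts := []Point{{1, 1}, {0, 0}, {0.5, 0.5}, {1, 0}, {0, 1}}
	fmt.Println(convexHull(pts))
}
```

For KML output, the first vertex would then be repeated at the end of the list to close the ring.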

I found several example implementations for this algorithm in various languages: C++, Java, etc. Since I’m coding this up in Scala, I wasn’t too happy with any of those, and I couldn’t find an example in Scala to rip off (er, draw inspiration from). However, I did manage to find an implementation in Haskell which I used as a rough guide. Anyway, here’s my attempt at Graham’s Scan in Scala:

I think you’ll find that’s a bit shorter than some of the imperative language implementations out there :)

Akka and Scalatra

On my current project, I’ve been using Akka to handle the Service Layer of my application while using Scalatra for my REST controllers. This combination works quite well, though it took me a little bit of time to figure out how to integrate Scalatra and Akka. The examples presented on the Scalatra website didn’t exactly work for me (it’s possible they’ve since been fixed). But after some studying of the Akka and Scalatra API documentation and some good ol’ fashioned trial-and-error, I got to something that worked. First, Akka actors are set up and initialized in ScalatraBootstrap.scala thusly:

I’m initializing each actor with a router (in this case, a SmallestMailboxRouter, though others, such as a RoundRobinRouter, are also available). The router will create up to 10 child actors and route incoming messages to the actor with the fewest messages in its mailbox.

The Scalatra controller responds to a request for a resource by sending a message to the appropriate actor (I’m using one Actor type per resource) using a Future and returning the result.  Scalatra provides an AsyncResult construct that helps here:

My actor here happens to return an ‘Either’ type in response to a request.  By convention, a ‘Left’ response indicates an error condition (in this case a tuple containing the HTTP error code to return and a message), and a ‘Right’ response indicates success and contains the requested data (a ‘User’ object). The actor itself looks like this:

The message types are implemented as case classes, and enter the actor in the ‘receive’ method, which passes each message to a handler and returns the result to the message’s ‘sender’ (the controller).

Spring, OSGi, and dropped services

My current job has had me working with a number of technologies with which I was completely unfamiliar when I started. The Spring Framework and OSGi are two such technologies that I believe I’ve become fairly comfortable with, yet still manage to throw something new and/or bizarre at me every once in a while.

My latest issue involved convincing Spring to register an OSGi service. Seems simple enough: Spring offers pretty good OSGi support, and one can usually register a service by doing something like this:

Pretty straightforward.  It creates a bean called theService, then uses that bean to publish a service.  Spring automagically takes care of registering the new service in your container’s OSGi service registry (in my case, I’m running Virgo).  I’ve done something similar to this many times with no problem.  However, this time I needed to do something slightly more complex:

Now for the problem… my “beanB” service was NOT being published.  I tried everything I could think of, switched my beans to constructor injection (no idea why that should work), verified that both my “beanA” and “beanB” beans were being created, etc., and everything looked fine… but the service wasn’t showing up!  Checking the logs… I found nothing.  No exceptions, no warnings, just the lack of the usual log message Spring generates when it publishes a new service.

So what was going on? By using the good ol’ “comment things out until it works, then reverse the process until it breaks” method of debugging, I identified the first line of my services.xml file, the osgi:reference tag, as the source of the problem.

The osgi:reference tag denotes a service that is consumed, rather than published. BeanA uses that service, and this is where the problem comes in. That someOtherService was something I had yet to implement. It wasn’t strictly necessary for “beanA” to work, and beanA was being initialized just fine, but the lack of “someOtherService” in the OSGi service registry was triggering an obscure feature of Spring (or at least it was obscure to me; it’s entirely possible I’m the last Spring/OSGi user on Earth to learn about this).

I found a description of the problem, and its solution, in the spring documentation:

“7.5. Relationship Between The Service Exporter And Service Importer

An exported service may depend, either directly or indirectly, on other services in order to perform its function. If one of these services is considered a mandatory dependency (has cardinality 1..x) and the dependency can no longer be satisfied (because the backing service has gone away and there is no suitable replacement available) then the exported service that depends on it will be automatically unregistered from the service registry – meaning that it is no longer available to clients. If the mandatory dependency becomes satisfied once more (by registration of a suitable service), then the exported service will be re-registered in the service registry.”

In other words, it doesn’t matter that my beanB service didn’t depend directly on someOtherService, the fact that an unpublished service was somewhere in beanB’s chain of dependencies meant that Spring was simply going to refuse to publish beanB as an OSGi service. It’s easy to imagine a rather hilarious cascade of dropping services depending on how you have beans and OSGi services wired together.

The solution lies in an option to the osgi:reference tag, namely “cardinality.” From the Spring docs:

“The cardinality attribute is used to specify whether or not a matching service is required at all times. A cardinality value of 1..1 (the default) indicates that a matching service must always be available. A cardinality value of 0..1 indicates that a matching service is not required at all times (see section 4.2.1.6 for more details).”

So, changing the first line in my xml to this:

fixed the issue.  In my opinion the default cardinality should be “0..1”, rather than “1..1”.  At the very least, if a mandatory service reference cannot be satisfied, some sort of conspicuous error log message ought to be generated.  But anyway, if you were having a similar problem and found your way here, hopefully I just saved you several hours of pain, anguish, and misery.