Using C libraries with Go

On my current project, which involves wiki-esque collaborative editing of documents, I decided I wanted to use markdown.  And since I wanted to use markdown (or rather one of the many almost-sort-of-compatible implementations of it), I decided that I might as well use CommonMark, which is attempting to introduce some sanity (standardization).

I’m using Go, and a quick search on google told me there weren’t any good golang implementations of CommonMark.  Since the prospect of implementing the CommonMark spec from scratch in Go seemed rather daunting (or rather, a project in and of itself), I decided to look into using the CommonMark C-language implementation.  Turns out it’s much easier than I expected to call C code from Go programs.  So, I spent a weekend coding up a Go wrapper for the CommonMark C library (which happens to be the reference implementation).

Setup

Go provides a handy utility called cgo to help you deal with C code from within Go.  For a simple example, here’s a signature for a libcmark function that I want to call from Go (exported library functions are in cmark.h):

CMARK_EXPORT
char *cmark_markdown_to_html(const char *text, int len);

To call this from Go, I can do the following:

package commonmark

/*
#cgo LDFLAGS: -lcmark
#include <stdlib.h>
#include "cmark.h"
*/
import "C"
import "unsafe"

func Md2Html(mdtext string) string {
	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))
	htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
	defer C.free(unsafe.Pointer(htmlString))
	return C.GoString(htmlString)
}

The lines in the comment block under the package declaration are not just comments 🙂  They’re actually used by cgo.  First, the line:

#cgo LDFLAGS: -lcmark

specifies options to pass to the c linker.  In this case, I’m instructing it to link in libcmark.  You can also specify any CFLAGS, etc. you want.

#include <stdlib.h>
#include "cmark.h"

If you know C, these lines are self-explanatory 🙂  They’re preprocessor directives telling the compiler to include the stdlib.h and cmark.h files.  Sort of like an import statement… but that’s an oversimplification.

Last but not least, if you’re going to be working with C, you need to import the C package:

import "C"

C Types

Now, there’s a lot going on in the code itself:

	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))

These lines are basically converting our Go data types into C types.  Our C function, cmark_markdown_to_html(), takes two parameters, a string (in C, this is a char*), and an integer containing the length of the string.  Since C doesn’t understand Go types (and vice-versa), we explicitly convert them using the C package’s handy CString and int functions.

The deferred call to C.free deallocates the CString and releases its memory.  This is necessary because C is most definitely NOT a garbage-collected language (one of the reasons for C’s generally excellent performance is that it’s lean and mean), so when dealing with C types, you have to be mindful of what you’re doing or you’ll end up leaking memory.  In C, strings are really just arrays of characters.  Normally, you pass strings around in C by passing a pointer to the first character in the string (char*).  When you call C.CString, it allocates enough memory to hold the characters from your Go string, copies them over, and then gives you a pointer.  This newly allocated memory is NOT handled by Go’s GC for you.  So, when you’re done with a CString, you need to call C.free (incidentally, the free() C function is in the stdlib library, which is why we included stdlib.h).  An easy way to handle this is to put C.free in a defer statement right after you allocate it (assuming you want it gone when the function finishes).  We provide C.free() the raw C pointer using the unsafe package’s unsafe.Pointer type here.

Now, to finish up:

htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
defer C.free(unsafe.Pointer(htmlString))
return C.GoString(htmlString)

We call our libcmark function via C.cmark_markdown_to_html and feed it our CString and Cint.  It returns a C char*, which we must convert to a Go string in our return statement (the C package also provides functions for converting types from C to Go).  We defer the call to free the htmlString pointer, we won’t need it after our nice Go string is returned 🙂

C Enums

Another little gotcha when dealing with C code is the fact that C has enums while Go does not 😦

For example, in cmark.h, there are several enums, one of which is defined thusly:

typedef enum {
	CMARK_NO_LIST,
	CMARK_BULLET_LIST,
	CMARK_ORDERED_LIST
}  cmark_list_type;

So how do we deal with this in Go?  One thing to note is that C enums are represented as integer values, so in the above example CMARK_NO_LIST has a value of 0, CMARK_BULLET_LIST has a value of 1, and so on.  So, if you need to pass an enum type to a C function, you could just give it an integer.  I don’t particularly like that solution, since my memory tends to suck and I don’t want to have to flip between my Go and C code looking up what type of list a ‘1’ represents.

Fortunately, in Go we can approximate an enumerated type by doing this:

type ListType int

const (
	CMARK_NO_LIST ListType = iota
	CMARK_BULLET_LIST
	CMARK_ORDERED_LIST
)

Then, when I need a C cmark_list_type, I can do this:

lt := CMARK_BULLET_LIST
C.cmark_list_type(lt)

Not too hard.

You can find my commonmark go wrapper here: https://github.com/rhinoman/go-commonmark

CouchDB Authorization

For the last month or so I’ve been working on a little project using CouchDB and Go.  As this is my first time using a NoSQL database, I’ve had to learn to think a little bit differently about how I interact with my data.  One aspect of Couch that had me a little confused at first was how it deals with authentication and authorization.

When creating an application with a traditional RDBMS (with PostgreSQL being my favorite), I was used to handling authentication / authorization myself (usually via a third-party library) at the application level.  While you can of course still do this with CouchDB, when I first came across CouchDB’s auth support, I thought “Great! I can just use Couch for everything!”.  Turns out, I sorta can, but it’s not as simple as I had first assumed.  Oh well 🙂

Users are stored in a special database called “_users” and creating a new one is as simple as a PUT request (Admin credentials are generally required for creating or deleting users).  For authentication, you can either use Couch’s Session API, or in the case of HTTP Basic, just pass the client’s Basic Authentication headers through to Couch from your application. This was easy enough to implement and seems to work well enough for what I need.

Authorization, however, is where things can get tricky.  CouchDB handles permissions on a per database basis.  It is not possible to restrict access to individual documents.  “Well, that’s useless, guess I’ll have to handle authorization some other way” I thought to myself.  But the answer, if your use-case allows for it, is just to create multiple databases at the level of granularity you need to control access.  I resisted doing this at first, as being used to RDBMS systems, I kept thinking “a database is a big deal and creating dozens, hundreds, or even thousands of them is insane.”  But CouchDB is not Postgres 🙂 and a little research told me that creating a database per user account is actually a very common use case and doesn’t cause any serious issues (if you start getting into thousands of databases you may need to tweak your config to allow a larger number of databases to be open simultaneously).

CouchDB allows you to grant access to databases either on a per-user basis or based on user roles.  I would highly recommend doing role-based authentication as I think that’s more flexible.  To specify database users/roles, you edit a database’s security document (called ‘_security’).  The structure of the security object looks like this:

{
    "admins": {
        "names": [
            "superuser"
        ],
        "roles": [
            "admins"
        ]
    },
    "members": {
        "names": [
            "user1",
            "user2"
        ],
        "roles": [
            "developers"
        ]
    }
}

CouchDB allows two different levels of user: ‘members’ and ‘admins.’  Members can read and write documents in the database while admins can edit design documents as well as the database security document.  One thing to note here is that if no users/roles are specified, CouchDB will default a database to ‘public’ access, so… definitely put something here.  If you’re using roles, in order to grant/revoke access to a user, simply add or remove the desired role to/from the roles array from the user’s document in the _users database.

Now, CouchDB allows you to create and assign any roles you want, but doesn’t enforce any permissions beyond the aforementioned ‘member’ and ‘admin’ types.  So, since members can both read and write documents, how can you create roles for read-only users?  Unfortunately, CouchDB doesn’t make this easy or obvious, but it does provide a way to make it work.

I’ll illustrate how to do this with an example:  Let’s say you have an application that’s a photo sharing service.  Each user has various ‘albums’ that they can either keep private or share with others.  Most of the time, users are going to want to give others ‘read-only’ access to an album.  So, every time a user creates a new album, we create a database for it.  When a database is created, we also make sure to configure it with three roles: one admin role and two member roles, like so:

{
    "admins": {
        "names": [],
        "roles": [
            "stan_vacation_album:admin"
        ]
    },
    "members": {
        "names": [],
        "roles": [
            "stan_vacation_album:read",
            "stan_vacation_album:write"
        ]
    }
}

So, Stan has a vacation album.  Let’s say he wants to grant read access to his friend Bill.  He tells your app to grant read access to Bill and your app turns around and edit’s Bill’s user document by adding “stan_vacation_album:read” to his list of roles.  Great!  Bill has access, but how do we enforce read-only access and prevent Bill from, say, using Photoshop to edit himself into Stan’s vacation photos (Bill is not a good friend) and saving them over the originals in Stan’s database?

You can create a special design document in each database called ‘_auth’.  Within that design document, you can specify a function called validate_doc_update, which will be triggered each time someone tries to save a new revision of a document.  So, all we need to do is check the user’s role and not allow the update to go through if the user doesn’t have the right ‘role.’  For example:

function(newDoc, oldDoc, userCtx){ 
    if((userCtx.roles.indexOf('stan_vacation_album:write') == -1)&& 
       (userCtx.roles.indexOf('stan_vacation_album:admin') == -1)&& 
       (userCtx.roles.indexOf('_admin') == -1)){
           throw({forbidden: "Not Authorized"});
    }
}

is a quick and dirty function that would do the trick.  Anyone that isn’t an admin or doesn’t have the :write role, won’t be able to write to the database.  Of course you can also do any data validation you like here as well.

Hope this helps 🙂

I’m also developing a CouchDB driver for Go as I work on my project, feel free to use it however you like.