Using C libraries with Go

On my current project, which involves wiki-esque collaborative editing of documents, I decided I wanted to use markdown.  And since I wanted to use markdown (or rather one of the many almost-sort-of-compatible implementations of it), I decided that I might as well use CommonMark, which is attempting to introduce some sanity (standardization).

I’m using Go, and a quick search on google told me there weren’t any good golang implementations of CommonMark.  Since the prospect of implementing the CommonMark spec from scratch in Go seemed rather daunting (or rather, a project in and of itself), I decided to look into using the CommonMark C-language implementation.  Turns out it’s much easier than I expected to call C code from Go programs.  So, I spent a weekend coding up a Go wrapper for the CommonMark C library (which happens to be the reference implementation).

Setup

Go provides a handy utility called cgo to help you deal with C code from within Go.  For a simple example, here’s a signature for a libcmark function that I want to call from Go (exported library functions are in cmark.h):

CMARK_EXPORT
char *cmark_markdown_to_html(const char *text, int len);

To call this from Go, I can do the following:

package commonmark

/*
#cgo LDFLAGS: -lcmark
#include <stdlib.h>
#include "cmark.h"
*/
import "C"
import "unsafe"

func Md2Html(mdtext string) string {
	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))
	htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
	defer C.free(unsafe.Pointer(htmlString))
	return C.GoString(htmlString)
}

The lines in the comment block under the package declaration are not just comments 🙂  They’re actually used by cgo.  First, the line:

#cgo LDFLAGS: -lcmark

specifies options to pass to the c linker.  In this case, I’m instructing it to link in libcmark.  You can also specify any CFLAGS, etc. you want.

#include <stdlib.h>
#include "cmark.h"

If you know C, these lines are self-explanatory 🙂  They’re preprocessor directives telling the compiler to include the stdlib.h and cmark.h files.  Sort of like an import statement… but that’s an oversimplification.

Last but not least, if you’re going to be working with C, you need to import the C package:

import "C"

C Types

Now, there’s a lot going on in the code itself:

	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))

These lines are basically converting our Go data types into C types.  Our C function, cmark_markdown_to_html(), takes two parameters, a string (in C, this is a char*), and an integer containing the length of the string.  Since C doesn’t understand Go types (and vice-versa), we explicitly convert them using the C package’s handy CString and int functions.

The deferred call to C.free deallocates the CString and releases its memory.  This is necessary because C is most definitely NOT a garbage-collected language (one of the reasons for C’s generally excellent performance is that it’s lean and mean), so when dealing with C types, you have to be mindful of what you’re doing or you’ll end up leaking memory.  In C, strings are really just arrays of characters.  Normally, you pass strings around in C by passing a pointer to the first character in the string (char*).  When you call C.CString, it allocates enough memory to hold the characters from your Go string, copies them over, and then gives you a pointer.  This newly allocated memory is NOT handled by Go’s GC for you.  So, when you’re done with a CString, you need to call C.free (incidentally, the free() C function is in the stdlib library, which is why we included stdlib.h).  An easy way to handle this is to put C.free in a defer statement right after you allocate it (assuming you want it gone when the function finishes).  We provide C.free() the raw C pointer using the unsafe package’s unsafe.Pointer type here.

Now, to finish up:

htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
defer C.free(unsafe.Pointer(htmlString))
return C.GoString(htmlString)

We call our libcmark function via C.cmark_markdown_to_html and feed it our CString and Cint.  It returns a C char*, which we must convert to a Go string in our return statement (the C package also provides functions for converting types from C to Go).  We defer the call to free the htmlString pointer, we won’t need it after our nice Go string is returned 🙂

C Enums

Another little gotcha when dealing with C code is the fact that C has enums while Go does not 😦

For example, in cmark.h, there are several enums, one of which is defined thusly:

typedef enum {
	CMARK_NO_LIST,
	CMARK_BULLET_LIST,
	CMARK_ORDERED_LIST
}  cmark_list_type;

So how do we deal with this in Go?  One thing to note is that C enums are represented as integer values, so in the above example CMARK_NO_LIST has a value of 0, CMARK_BULLET_LIST has a value of 1, and so on.  So, if you need to pass an enum type to a C function, you could just give it an integer.  I don’t particularly like that solution, since my memory tends to suck and I don’t want to have to flip between my Go and C code looking up what type of list a ‘1’ represents.

Fortunately, in Go we can approximate an enumerated type by doing this:

type ListType int

const (
	CMARK_NO_LIST ListType = iota
	CMARK_BULLET_LIST
	CMARK_ORDERED_LIST
)

Then, when I need a C cmark_list_type, I can do this:

lt := CMARK_BULLET_LIST
C.cmark_list_type(lt)

Not too hard.

You can find my commonmark go wrapper here: https://github.com/rhinoman/go-commonmark

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s