C and Go – Dealing with void* parameters in cgo

Wrapping C libraries with cgo is usually a pretty straightforward process.  However, one problematic situation I’ve come across recently is dealing with C functions which take a void* type as a parameter.  In C, a void* is a pointer to an arbitrary data type.

I’ve been using one C library which allows you to store arbitrary data in a custom Tree-like data structure.  So naturally, you want to be able to feed a raw interface{} to your wrapper function.  Suppose your C library has two functions:

void set_data(void *the_data);
void *get_data();

What should our Go wrapper pass to set_data?  The easy solution is to do this:

func Wrapper SetData(theData interface{}) {
    C.set_data(unsafe.Pointer(&theData));
}

And this would work, prior to Go 1.6 (and is in fact the solution I had been using in my wrapper — though I don’t really use this feature of the library in question myself).  This, however, is dangerous (and also has the obvious problem that calling C.get_data is going to give you an unsafe.Pointer — hope you remember what kind of data you stored).  You’re passing a Go pointer to a C function which then stores said pointer, which means the Go garbage collector can no longer manage it.  This was considered so inadvisable that it is completely disallowed in Go 1.6 (try it and you’ll be rewarded with a nice runtime error: panic: runtime error: cgo argument has Go pointer to Go pointer )
Since I was busy building stuff using Go instead of following all of the discussions about this issue on the golang mailing list, I missed this little gem and had to spend some time figuring out why my builds started failing :/

So, is there a solution to this?  Not really.  Well, nothing good.  You could do something like this, I suppose:

func SetData(theData interface{}) error{
	//Get bytes from raw interface
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(theData); err != nil {
		return false
	}
	data := buf.Bytes()
	C.set_data(unsafe.Pointer(&data[0]))
	return nil
}

What’s going on here?  Well, we’re making use of the encoding/gob package to convert the go interface{} into a Go byte array.  Then we’re passing a pointer to the first element in our byte array to C.set_data.

I don’t like it much.  I mean, sure – the data is there, but in order to extract the data later and gob-decode it, you’re going to need to remember the size of the byte array you fed into C.set_data.

So, let’s just disallow passing a raw interface{} to our wrapper function.  We could force the user to give us a byte array.  This simplifies things a bit:

func SetData(data []byte) {
	set_data(unsafe.Pointer(&data[0]))
}

But we still have the problem of remembering the length of our byte array so we can extract it later using C.GoBytes.

So, unfortunately passing arbitrary data from a Go interface{} into a C void* is just not practical.  For my particular C library wrapper, I just decided to require the user to supply a string rather than an interface{}.  This has the benefit of being easier to convert back and forth and you also don’t have to stash things like array sizes or references to C.free later.  Strings are reasonably versatile in that you could, say, convert a struct to JSON or even b64encode some binary data and shove it into a string if you really need to.  You could implement this solution thusly:

func SetData(theData string) {
	cstr := C.CString(userData)
	res := C.set_data(unsafe.Pointer(&cstr))
}

func GetData() string {
	data := C.get_data()
	return C.GoString((*C.char)(*(*unsafe.Pointer)(data)))
}

As you can see, converting the data back from a void* (unsafe.Pointer) to a String isn’t very straightforward, but it works well enough.

Using C libraries with Go

On my current project, which involves wiki-esque collaborative editing of documents, I decided I wanted to use markdown.  And since I wanted to use markdown (or rather one of the many almost-sort-of-compatible implementations of it), I decided that I might as well use CommonMark, which is attempting to introduce some sanity (standardization).

I’m using Go, and a quick search on google told me there weren’t any good golang implementations of CommonMark.  Since the prospect of implementing the CommonMark spec from scratch in Go seemed rather daunting (or rather, a project in and of itself), I decided to look into using the CommonMark C-language implementation.  Turns out it’s much easier than I expected to call C code from Go programs.  So, I spent a weekend coding up a Go wrapper for the CommonMark C library (which happens to be the reference implementation).

Setup

Go provides a handy utility called cgo to help you deal with C code from within Go.  For a simple example, here’s a signature for a libcmark function that I want to call from Go (exported library functions are in cmark.h):

CMARK_EXPORT
char *cmark_markdown_to_html(const char *text, int len);

To call this from Go, I can do the following:

package commonmark

/*
#cgo LDFLAGS: -lcmark
#include <stdlib.h>
#include "cmark.h"
*/
import "C"
import "unsafe"

func Md2Html(mdtext string) string {
	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))
	htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
	defer C.free(unsafe.Pointer(htmlString))
	return C.GoString(htmlString)
}

The lines in the comment block under the package declaration are not just comments 🙂  They’re actually used by cgo.  First, the line:

#cgo LDFLAGS: -lcmark

specifies options to pass to the c linker.  In this case, I’m instructing it to link in libcmark.  You can also specify any CFLAGS, etc. you want.

#include <stdlib.h>
#include "cmark.h"

If you know C, these lines are self-explanatory 🙂  They’re preprocessor directives telling the compiler to include the stdlib.h and cmark.h files.  Sort of like an import statement… but that’s an oversimplification.

Last but not least, if you’re going to be working with C, you need to import the C package:

import "C"

C Types

Now, there’s a lot going on in the code itself:

	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))

These lines are basically converting our Go data types into C types.  Our C function, cmark_markdown_to_html(), takes two parameters, a string (in C, this is a char*), and an integer containing the length of the string.  Since C doesn’t understand Go types (and vice-versa), we explicitly convert them using the C package’s handy CString and int functions.

The deferred call to C.free deallocates the CString and releases its memory.  This is necessary because C is most definitely NOT a garbage-collected language (one of the reasons for C’s generally excellent performance is that it’s lean and mean), so when dealing with C types, you have to be mindful of what you’re doing or you’ll end up leaking memory.  In C, strings are really just arrays of characters.  Normally, you pass strings around in C by passing a pointer to the first character in the string (char*).  When you call C.CString, it allocates enough memory to hold the characters from your Go string, copies them over, and then gives you a pointer.  This newly allocated memory is NOT handled by Go’s GC for you.  So, when you’re done with a CString, you need to call C.free (incidentally, the free() C function is in the stdlib library, which is why we included stdlib.h).  An easy way to handle this is to put C.free in a defer statement right after you allocate it (assuming you want it gone when the function finishes).  We provide C.free() the raw C pointer using the unsafe package’s unsafe.Pointer type here.

Now, to finish up:

htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
defer C.free(unsafe.Pointer(htmlString))
return C.GoString(htmlString)

We call our libcmark function via C.cmark_markdown_to_html and feed it our CString and Cint.  It returns a C char*, which we must convert to a Go string in our return statement (the C package also provides functions for converting types from C to Go).  We defer the call to free the htmlString pointer, we won’t need it after our nice Go string is returned 🙂

C Enums

Another little gotcha when dealing with C code is the fact that C has enums while Go does not 😦

For example, in cmark.h, there are several enums, one of which is defined thusly:

typedef enum {
	CMARK_NO_LIST,
	CMARK_BULLET_LIST,
	CMARK_ORDERED_LIST
}  cmark_list_type;

So how do we deal with this in Go?  One thing to note is that C enums are represented as integer values, so in the above example CMARK_NO_LIST has a value of 0, CMARK_BULLET_LIST has a value of 1, and so on.  So, if you need to pass an enum type to a C function, you could just give it an integer.  I don’t particularly like that solution, since my memory tends to suck and I don’t want to have to flip between my Go and C code looking up what type of list a ‘1’ represents.

Fortunately, in Go we can approximate an enumerated type by doing this:

type ListType int

const (
	CMARK_NO_LIST ListType = iota
	CMARK_BULLET_LIST
	CMARK_ORDERED_LIST
)

Then, when I need a C cmark_list_type, I can do this:

lt := CMARK_BULLET_LIST
C.cmark_list_type(lt)

Not too hard.

You can find my commonmark go wrapper here: https://github.com/rhinoman/go-commonmark