C and Go – Dealing with void* parameters in cgo

Wrapping C libraries with cgo is usually a pretty straightforward process.  However, one problematic situation I’ve come across recently is dealing with C functions which take a void* type as a parameter.  In C, a void* is a pointer to an arbitrary data type.

I’ve been using one C library which allows you to store arbitrary data in a custom Tree-like data structure.  So naturally, you want to be able to feed a raw interface{} to your wrapper function.  Suppose your C library has two functions:

void set_data(void *the_data);
void *get_data();

What should our Go wrapper pass to set_data?  The easy solution is to do this:

func SetData(theData interface{}) {
    C.set_data(unsafe.Pointer(&theData))
}

And this would work, prior to Go 1.6 (and is in fact the solution I had been using in my wrapper, though I don’t really use this feature of the library in question myself).  This, however, is dangerous.  You’re passing a Go pointer to a C function which then stores said pointer, but the Go garbage collector has no idea that C is holding on to it and is free to reclaim that memory whenever it likes.  It also has the obvious problem that calling C.get_data is going to give you an unsafe.Pointer (hope you remember what kind of data you stored).  Storing Go pointers in C was considered so inadvisable that it is disallowed as of Go 1.6; try it and you’ll be rewarded with a nice runtime error: panic: runtime error: cgo argument has Go pointer to Go pointer

Since I was busy building stuff using Go instead of following all of the discussions about this issue on the golang mailing list, I missed this little gem and had to spend some time figuring out why my builds started failing :/

So, is there a solution to this?  Not really.  Well, nothing good.  You could do something like this, I suppose:

func SetData(theData interface{}) error {
	// Gob-encode the raw interface into a byte buffer
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(theData); err != nil {
		return err
	}
	data := buf.Bytes()
	// Pass a pointer to the first encoded byte
	C.set_data(unsafe.Pointer(&data[0]))
	return nil
}

What’s going on here?  Well, we’re making use of the encoding/gob package to convert the Go interface{} into a Go byte slice.  Then we’re passing a pointer to the first element of that byte slice to C.set_data.

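I don’t like it much.  I mean, sure, the data is there, but in order to extract the data later and gob-decode it, you’re going to need to remember the size of the byte slice you fed into C.set_data.  On top of that, those bytes still live in Go-managed memory, and the cgo rules don’t allow C to keep a Go pointer around after the call returns, so this only avoids the panic rather than the underlying problem.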
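To make the bookkeeping problem concrete, here is a minimal pure-Go sketch of the gob round trip, with no C involved.  The helper names encodeValue and decodeString are made up for illustration.  Note that decoding needs both the byte count and the concrete type, neither of which a bare void* carries:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeValue gob-encodes v and returns the encoded bytes.
// len(result) is the length a wrapper would have to remember.
func encodeValue(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeString decodes the first n bytes of data back into a string.
// The caller must already know both n and the concrete type.
func decodeString(data []byte, n int) (string, error) {
	var s string
	err := gob.NewDecoder(bytes.NewReader(data[:n])).Decode(&s)
	return s, err
}

func main() {
	encoded, err := encodeValue("hello, tree node")
	if err != nil {
		panic(err)
	}
	n := len(encoded) // the bookkeeping that set_data cannot do for us

	decoded, err := decodeString(encoded, n)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded)
}
```

In a real wrapper, n and the type information would have to be stashed somewhere on the Go side (or prepended to the buffer), which is exactly the extra state the text complains about.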

So, let’s just disallow passing a raw interface{} to our wrapper function.  We could force the user to give us a byte array.  This simplifies things a bit:

func SetData(data []byte) {
	C.set_data(unsafe.Pointer(&data[0]))
}

But we still have the problem of remembering the length of our byte array so we can extract it later using C.GoBytes.

So, unfortunately passing arbitrary data from a Go interface{} into a C void* is just not practical.  For my particular C library wrapper, I just decided to require the user to supply a string rather than an interface{}.  This has the benefit of being easier to convert back and forth and you also don’t have to stash things like array sizes or references to C.free later.  Strings are reasonably versatile in that you could, say, convert a struct to JSON or even b64encode some binary data and shove it into a string if you really need to.  You could implement this solution thusly:

func SetData(theData string) {
	// C.CString copies the string into C-allocated memory,
	// so it is safe for the C library to keep this pointer.
	cstr := C.CString(theData)
	C.set_data(unsafe.Pointer(cstr))
}

func GetData() string {
	data := C.get_data()
	// Cast the void* (unsafe.Pointer) back to a char* before converting.
	return C.GoString((*C.char)(data))
}

As you can see, converting the data back from a void* (unsafe.Pointer) to a string takes an extra cast, but it works well enough.

Making an extensible wiki system with Go

My little side project for roughly the last year has been a wiki system intended for enterprise use.

This certainly isn’t a new idea — off the top of my head, I can think of several of these “enterprise” wiki systems, with Confluence and Liferay being the most obvious (and widely used) examples.  The problem with the current solutions, at least in my mind, is that they have become so bloated and overloaded with features that they are difficult to use, difficult to administer, and difficult to extend.  I’ve been forced to use these (and other) collaboration/wiki systems and while I see the value in them, the sort of system I want to use just doesn’t seem to exist.

My goals were/are thus:

  1. Provide basic wiki functionality in a simple and clean UI as part of the core system
  2. Use markdown as the markup language (never liked wikicode) for editing
  3. Be horizontally scalable (I’ve suffered through overburdened enterprise software)
  4. Be extensible, both in the frontend UI (plugins) and the backend services
  5. Don’t run on the JVM (because reasons) 😉

Ok, that last one is kind of a joke (but not really). At least the core system ought to be native, while of course additional services can be written in just about any programming language.

After a year of working a few hours a week on this, I’ve come up with something I call Wikifeat (hey, the .com was available). It’s not finished, of course, and likely won’t be for a while yet.  Building an enterprise wiki/collaboration platform is proving a daunting task (especially goal #4 above), but the basics are done and the system is at least usable:

Screenshot of the Wikifeat interface, showing a map plugin.

Technology

The Wikifeat backend consists of one or more CouchDB servers and several small (you might almost call them ‘micro’) services written in Go.  These services have names like ‘users’, ‘wikis’, etc. denoting their function.  Multiple instances of these services can be run across multiple machines in a network, in order to provide scalability.  The ‘frontend’ service acts as a router for user requests from the Wikifeat web application, directing them to the appropriate service to fulfil each request.  Backend services also communicate directly with one another, when needed, via RESTful APIs.

The service ‘instances’ find each other via a service registry.  I decided on the excellent etcd to serve as my registry.  Each service instance maintains a cache of all of the other services registered with etcd that it can pull from when it needs to send a request to another service.
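As a rough illustration of the caching idea (this is not Wikifeat’s actual code), a service instance could keep something like the following in memory.  The ServiceCache type, its method names, and the round-robin selection are all assumptions of this sketch, and the part that would actually talk to etcd is left out entirely:

```go
package main

import (
	"fmt"
	"sync"
)

// ServiceCache is a hypothetical in-memory copy of the service registry.
type ServiceCache struct {
	mu        sync.Mutex
	instances map[string][]string // service name -> instance URLs
	next      map[string]int      // service name -> next round-robin index
}

func NewServiceCache() *ServiceCache {
	return &ServiceCache{
		instances: make(map[string][]string),
		next:      make(map[string]int),
	}
}

// Refresh replaces the known instances of a service. In a real system
// this would be driven by whatever the registry currently reports.
func (c *ServiceCache) Refresh(name string, urls []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.instances[name] = append([]string(nil), urls...)
}

// Pick returns one instance of the named service, rotating through the
// cached list, or false if no instance is known.
func (c *ServiceCache) Pick(name string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	urls := c.instances[name]
	if len(urls) == 0 {
		return "", false
	}
	i := c.next[name] % len(urls)
	c.next[name] = i + 1
	return urls[i], true
}

func main() {
	cache := NewServiceCache()
	cache.Refresh("users", []string{"http://10.0.0.1:4100", "http://10.0.0.2:4100"})
	url, ok := cache.Pick("users")
	fmt.Println(url, ok)
}
```

The point of the cache is simply that a service does not have to ask the registry on every request; it only needs to refresh its copy when the registry changes.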

Extending the backend is a simple matter of developing a new service, registering it with etcd, and making use of the other services’ REST APIs.  The frontend service also has a facility for routing requests to these custom services (in the anticipated common use case of pairing a front-end JavaScript ‘plugin’ with a custom backend service).  Frontend plugins are written in JavaScript and placed in a directory in the frontend service area.

The webapp itself is written as a single-page application with Backbone and Marionette.  The Wiki system is mostly complete.  Pages can be edited using CommonMark, a rather new implementation of Markdown.  I personally like Markdown for its simplicity, and always hated the various WYSIWYG HTML editors commonly included with CMS / Collaboration software.  Most developers already know markdown, and most non-techies should be able to pick it up quickly (being that it *is* meant to be a human-readable markup language):

Wikifeat markdown editor

If that text editor looks familiar, it’s basically a tweaked version of the wmd editor used on stackexchange 🙂

Future Plans

I’m looking forward to continuing to evolve this thing.  The frontend probably needs the most help right now.  I actually hope to replace the wmd markdown editor with something more ‘custom’ that can make inserting plugins easier.  I’d also like to allow plugins to add custom ‘insertion’ helper buttons to the editor, along with a plugin ‘editor’ view, rather than requiring the user to enter a custom div block.

My mind is also overflowing with ideas for future plugins.  Calendars, blogs, integration with third party services/applications, etc.  Hopefully I can get to those eventually.  It would be *really* swell if someone else took on some of that work as well.  The project is open source (BSD, originally GPLv2), so that’s certainly possible…

UPDATE: After some feedback and reflection, I’ve decided to change the license from GPL to BSD 🙂

Using C libraries with Go

On my current project, which involves wiki-esque collaborative editing of documents, I decided I wanted to use markdown.  And since I wanted to use markdown (or rather one of the many almost-sort-of-compatible implementations of it), I decided that I might as well use CommonMark, which is attempting to introduce some sanity (standardization).

I’m using Go, and a quick search on google told me there weren’t any good golang implementations of CommonMark.  Since the prospect of implementing the CommonMark spec from scratch in Go seemed rather daunting (or rather, a project in and of itself), I decided to look into using the CommonMark C-language implementation.  Turns out it’s much easier than I expected to call C code from Go programs.  So, I spent a weekend coding up a Go wrapper for the CommonMark C library (which happens to be the reference implementation).

Setup

Go provides a handy utility called cgo to help you deal with C code from within Go.  For a simple example, here’s a signature for a libcmark function that I want to call from Go (exported library functions are in cmark.h):

CMARK_EXPORT
char *cmark_markdown_to_html(const char *text, int len);

To call this from Go, I can do the following:

package commonmark

/*
#cgo LDFLAGS: -lcmark
#include <stdlib.h>
#include "cmark.h"
*/
import "C"
import "unsafe"

func Md2Html(mdtext string) string {
	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))
	htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
	defer C.free(unsafe.Pointer(htmlString))
	return C.GoString(htmlString)
}

The lines in the comment block under the package declaration are not just comments 🙂  They’re actually used by cgo.  First, the line:

#cgo LDFLAGS: -lcmark

specifies options to pass to the C linker.  In this case, I’m instructing it to link in libcmark.  You can also specify any CFLAGS, etc. you want.

#include <stdlib.h>
#include "cmark.h"

If you know C, these lines are self-explanatory 🙂  They’re preprocessor directives telling the compiler to include the stdlib.h and cmark.h files.  Sort of like an import statement… but that’s an oversimplification.

Last but not least, if you’re going to be working with C, you need to import the C package:

import "C"

C Types

Now, there’s a lot going on in the code itself:

	mdCstr := C.CString(mdtext)
	strLen := C.int(len(mdtext))
	defer C.free(unsafe.Pointer(mdCstr))

These lines are basically converting our Go data types into C types.  Our C function, cmark_markdown_to_html(), takes two parameters, a string (in C, this is a char*), and an integer containing the length of the string.  Since C doesn’t understand Go types (and vice-versa), we explicitly convert them using the C package’s handy C.CString function and a C.int conversion.

The deferred call to C.free deallocates the CString and releases its memory.  This is necessary because C is most definitely NOT a garbage-collected language (one of the reasons for C’s generally excellent performance is that it’s lean and mean), so when dealing with C types, you have to be mindful of what you’re doing or you’ll end up leaking memory.

In C, strings are really just arrays of characters terminated by a NUL byte.  Normally, you pass strings around in C by passing a pointer to the first character in the string (char*).  When you call C.CString, it allocates enough memory to hold the characters from your Go string, copies them over, and then gives you a pointer.  This newly allocated memory is NOT handled by Go’s GC for you.  So, when you’re done with a CString, you need to call C.free (incidentally, free() is part of the C standard library and is declared in stdlib.h, which is why we included it).  An easy way to handle this is to put C.free in a defer statement right after you allocate the string (assuming you want it gone when the function finishes).  We hand C.free() the raw C pointer by converting it to the unsafe package’s unsafe.Pointer type.

Now, to finish up:

htmlString := C.cmark_markdown_to_html(mdCstr, strLen)
defer C.free(unsafe.Pointer(htmlString))
return C.GoString(htmlString)

We call our libcmark function via C.cmark_markdown_to_html and feed it our CString and C.int.  It returns a C char*, which we must convert to a Go string in our return statement (the C package also provides functions for converting types from C to Go).  C.GoString makes a copy of the characters, so we can safely defer the call to free the htmlString pointer; we won’t need it after our nice Go string is returned 🙂

C Enums

Another little gotcha when dealing with C code is the fact that C has enums while Go does not 😦

For example, in cmark.h, there are several enums, one of which is defined thusly:

typedef enum {
	CMARK_NO_LIST,
	CMARK_BULLET_LIST,
	CMARK_ORDERED_LIST
}  cmark_list_type;

So how do we deal with this in Go?  One thing to note is that C enums are represented as integer values, so in the above example CMARK_NO_LIST has a value of 0, CMARK_BULLET_LIST has a value of 1, and so on.  So, if you need to pass an enum type to a C function, you could just give it an integer.  I don’t particularly like that solution, since my memory tends to suck and I don’t want to have to flip between my Go and C code looking up what type of list a ‘1’ represents.

Fortunately, in Go we can approximate an enumerated type by doing this:

type ListType int

const (
	CMARK_NO_LIST ListType = iota
	CMARK_BULLET_LIST
	CMARK_ORDERED_LIST
)

Then, when I need a C cmark_list_type, I can do this:

lt := CMARK_BULLET_LIST
C.cmark_list_type(lt)

Not too hard.
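Here is the iota trick as a small runnable program, without the cgo part, so you can see the values line up with the C enum.  The String method is an optional extra I’ve added for nicer printing; it is not needed for the conversion to C.cmark_list_type:

```go
package main

import "fmt"

// ListType mirrors the C cmark_list_type enum.
type ListType int

const (
	CMARK_NO_LIST      ListType = iota // 0, same as the first C enumerator
	CMARK_BULLET_LIST                  // 1
	CMARK_ORDERED_LIST                 // 2
)

// String makes the constants print by name instead of by number.
func (lt ListType) String() string {
	switch lt {
	case CMARK_NO_LIST:
		return "CMARK_NO_LIST"
	case CMARK_BULLET_LIST:
		return "CMARK_BULLET_LIST"
	case CMARK_ORDERED_LIST:
		return "CMARK_ORDERED_LIST"
	default:
		return fmt.Sprintf("ListType(%d)", int(lt))
	}
}

func main() {
	lt := CMARK_BULLET_LIST
	fmt.Println(lt, int(lt))
}
```

This only stays correct as long as the Go constants are declared in the same order as the C enumerators, so it is worth re-checking whenever the header changes.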

You can find my commonmark go wrapper here: https://github.com/rhinoman/go-commonmark

Two weeks with Go

You can tell this is a gopher because… teeth. [1]

It’s been a while since my last post, as I’m sure my legions of loyal readers have surely noticed….. surely.

Anyway, since my last post I’ve moved my family from Northern Virginia to the Florida coast to take a new job (I’ve discovered that the DC area does not agree with me), so I’ve been a bit distracted.  However, now that I’m (mostly) settled in and not constantly unpacking, setting up house, fixing things, etc., I have some time to pursue my side projects again.  I had been working on a training tracker tool in Scala, but decided to mothball that one for the time being, as interest from potential customers just wasn’t there.

For now, I’ve decided to explore a few new technologies.  I’ve always wanted to learn about NoSQL databases, but I just never had a product idea that seemed to call for a non-relational database solution.  So, I’ve been messing around with CouchDB… and while I’m just messing around, I decided to play with this new Go programming language I’ve been hearing so much about lately.

First Impressions

I remember hearing about Go several years ago when it was first introduced.  At the time I was spending my spare time playing around with Ruby on Rails, and at first blush Go looked very, very similar to C:


package main

import "fmt"

func main() {
    fmt.Println("Hello, 世界")
}

I mean, static typing?! Web development was going to be moving to dynamic languages like Ruby and Python anyway, why waste my time with this thing?

I also thought the Go gopher mascot was reminiscent of that Plan9 rabbit thing, but that’s not important. [1]

Well, I’ve since soured on Ruby on Rails.  Actually, mostly Rails — the Ruby language itself is actually pretty fun to use (though I’ve also soured on dynamic type systems a bit as well), and I could see myself using a lightweight framework like Sinatra to write a REST API.  Problem is, the entire Ruby community seems to revolve around Rails and the veneration of DHH (I have yet to experience a recruiter calling me to talk about “Ruby” jobs, just “Rails” jobs), so I decided to move on to other technologies.

Second, er, impressions

Fast forward a couple of years.  My day job involves writing web services and applications mostly in Java (sometimes Scala and Javascript).  While Scala is a fine language, and I used it as the primary language on my last side project, you will never catch me coding in Java during my free time. Writing Java is tedious, frustrating, and always feels like work.  But it pays the mortgage, it does.

Since I was having trouble generating ideas for a new project, I decided to spend some time learning a new language.  Go was/is gaining in popularity (and version 1.3 of the language was released just a few months ago), so I decided to give it a second look.  I figured devoting a few weeks to Go wouldn’t kill me (probably).  I stepped through the excellent Tour of Go and decided my initial opinion of Go was misplaced.  Here was a delightfully small, concise language with actually decent concurrency support (It’s no Akka, but goroutines and channels took me minutes to grasp as opposed to days/weeks with Akka).

Here’s an excerpt from the concurrency section on Tour of Go:

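package main

import "fmt"

func sum(a []int, c chan int) {
    sum := 0
    for _, v := range a {
        sum += v
    }
    c <- sum // send sum to c
}

func main() {
    a := []int{7, 2, 8, -9, 4, 0}
    c := make(chan int)
    go sum(a[:len(a)/2], c)
    go sum(a[len(a)/2:], c)
    x, y := <-c, <-c // receive from c
    fmt.Println(x, y, x+y)
}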

To briefly explain the fun bits involving concurrency, placing the keyword ‘go’ before a function call (as in go sum() ) fires off a goroutine, which is executed concurrently with the main thread of execution.  Go hides the complexities of managing threads, pools, etc., from you in a way that just works.

Go also allows you to send data between goroutines using channels.  The line:

c <- sum

writes the value of sum to channel c.  You can see that the main function reads from that channel (actually, it reads from it twice) with the line:

x, y := <-c, <-c

Incidentally, this line of code blocks main() until two values from channel c are received, so you do have to be mindful of what you’re doing when using channels 🙂

If you’ve ever written concurrent code (or had to debug a multi-threaded application), this probably seems too easy.  I’ll admit it’s not as powerful as Akka actors.  Though – to explain Akka actors to somebody, its concomitant messaging scheme, dispatching, etc., I’d probably need several hours and a whiteboard (or a chalkboard, or a sidewalk).

This kind of simplicity and well thought out language design is why I think Go is going to become very popular in the near future.  It really does strike me as C, modernized.  Unlike Scala, Erlang, etc., here’s a language designed for modern network programming, with pretty darn good concurrency support, that isn’t a bear (or ocelot) to learn.  The most common (admittedly legitimate) concern expressed to me by my bosses whenever I suggested using, say, Scala is the lack of readily available Scala devs out there: “If I need a new Java developer, I can find one pretty much on demand.  Scala developers are hard to find, AND more expensive.”  And even though a smart developer can learn Scala, it’s such an advanced (and feature-rich) language that it’s going to take a considerable amount of time before a new Scala dev can be productive (especially if said dev doesn’t have experience with functional programming).  But with Go — you really ought to be able to pick it up in a few days.  And it has the advantage of not being Java, which is a good thing, always.

Conclusion

I do believe I’ll stick with getting better at programming in Go for a little while and will probably make use of it whenever I can 🙂

Oh, and I figured a good project to learn more about Go and CouchDB would be… a Go driver for CouchDB 🙂  I know there are already several out there (some good, some … not), but I figured this would be a good vehicle for learning Go (and couch) and so far it has been — see my repo here.

[1] The Gopher mascot and logo were designed by Renée French, who also designed Glenda, the Plan 9 bunny. The logo and mascot are covered by the Creative Commons Attribution 3.0 license.