Making an extensible wiki system with Go

My little side project for roughly the last year has been a wiki system intended for enterprise use.

This certainly isn’t a new idea — off the top of my head, I can think of several of these “enterprise” wiki systems, with Confluence and Liferay being the most obvious (and widely used) examples.  The problem with the current solutions, at least in my mind, is that they have become so bloated and overloaded with features that they are difficult to use, difficult to administer, and difficult to extend.  I’ve been forced to use these (and other) collaboration/wiki systems and while I see the value in them, the sort of system I want to use just doesn’t seem to exist.

My goals were/are thus:

  1. Provide basic wiki functionality in a simple and clean UI as part of the core system
  2. Use markdown as the markup language (never liked wikicode) for editing
  3. Be horizontally scalable (I’ve suffered through overburdened enterprise software)
  4. Be extensible, both in the frontend UI (plugins) and the backend services
  5. Don’t run on the JVM (because reasons) 😉

Ok, that last one is kind of a joke (but not really). At least the core system ought to be native, while of course additional services can be written in just about any programming language.

After a year of working a few hours a week on this, I’ve come up with something I call Wikifeat (hey, the .com was available). It’s not finished, of course, and likely won’t be for a while yet.  Building an enterprise wiki/collaboration platform is proving a daunting task (especially goal #4 above), but the basics are done and the system is at least usable:

wikifeat_screenshot

Screenshot of the Wikifeat interface, showing a map plugin.

Technology

The Wikifeat backend consists of a CouchDB server(s) and several small (you might almost call them ‘micro’) services written in Go.  These services have names like ‘users’, ‘wikis’, etc. denoting their function.  Multiple instances of these services can be run across multiple machines in a network, in order to provide scalability.  The ‘frontend’ service acts a router for user requests from the Wikifeat web application, directing them to the appropriate service to fulfil each request.  Backend services  communicate directly with one another when needed as well via RESTful APIs.

The service ‘instances’ find each other via a service registry.  I decided on the excellent etcd to serve as my registry.  Each service instance maintains a cache of all of the other services registered with etcd that it can pull from when it needs to send a request to another service.

Extending the backend is a simple matter of developing a new service, registering it with etcd, and making use of the other services’ REST APIs.  The frontend service also has a facility for routing requests to these custom services (in the anticipated common use case of pairing a front-end javascript ‘plugin’ with a custom backend service).   Frontend plugins are written in Javascript and placed in a directory in the frontend service area.

The webapp itself is written as a single-page application with Backbone and Marionette.  The Wiki system is mostly complete.  Pages can be edited using CommonMark, a rather new implementation of Markdown.  I personally like Markdown for its simplicity, and always hated the various WYSIWYG HTML editors commonly included with CMS / Collaboration software.  Most developers already know markdown, and most non-techies should be able to pick it up quickly (being that it *is* meant to be a human-readable markup language):

wikifeat_edit

Wikifeat markdown editor

If that text editor looks familiar, it’s basically a tweaked version of the wmd editor used on stackexchange 🙂

Future Plans

I’m looking forward to continuing to evolve this thing.  The frontend probably needs the most help right now.  I actually hope to replace the wmd markdown editor with something more ‘custom’ that can make inserting plugins easier.  I’d also like to allow plugins to add custom ‘insertion’ helper buttons to the editor, along with a plugin ‘editor’ view, rather than requiring the user to enter a custom div block.

My mind is also overflowing with ideas for future plugins.  Calendars, blogs, integration with third party services/applications, etc.  Hopefully I can get to those eventually.  It would be *really* swell if someone else took on some of that work as well.  The project is open source (GPLv2 BSD), so that’s certainly possible…

UPDATE: After some feedback and reflection, I’ve decided to change the license from GPL to BSD 🙂

CouchDB Authorization

For the last month or so I’ve been working on a little project using CouchDB and Go.  As this is my first time using a NoSQL database, I’ve had to learn to think a little bit differently about how I interact with my data.  One aspect of Couch that had me a little confused at first was how it deals with authentication and authorization.

When creating an application with a traditional RDBMS (with PostgreSQL being my favorite), I was used to handling authentication / authorization myself (usually via a third-party library) at the application level.  While you can of course still do this with CouchDB, when I first came across CouchDB’s auth support, I thought “Great! I can just use Couch for everything!”.  Turns out, I sorta can, but it’s not as simple as I had first assumed.  Oh well 🙂

Users are stored in a special database called “_users” and creating a new one is as simple as a PUT request (Admin credentials are generally required for creating or deleting users).  For authentication, you can either use Couch’s Session API, or in the case of HTTP Basic, just pass the client’s Basic Authentication headers through to Couch from your application. This was easy enough to implement and seems to work well enough for what I need.

Authorization, however, is where things can get tricky.  CouchDB handles permissions on a per database basis.  It is not possible to restrict access to individual documents.  “Well, that’s useless, guess I’ll have to handle authorization some other way” I thought to myself.  But the answer, if your use-case allows for it, is just to create multiple databases at the level of granularity you need to control access.  I resisted doing this at first, as being used to RDBMS systems, I kept thinking “a database is a big deal and creating dozens, hundreds, or even thousands of them is insane.”  But CouchDB is not Postgres 🙂 and a little research told me that creating a database per user account is actually a very common use case and doesn’t cause any serious issues (if you start getting into thousands of databases you may need to tweak your config to allow a larger number of databases to be open simultaneously).

CouchDB allows you to grant access to databases either on a per-user basis or based on user roles.  I would highly recommend doing role-based authentication as I think that’s more flexible.  To specify database users/roles, you edit a database’s security document (called ‘_security’).  The structure of the security object looks like this:

{
    "admins": {
        "names": [
            "superuser"
        ],
        "roles": [
            "admins"
        ]
    },
    "members": {
        "names": [
            "user1",
            "user2"
        ],
        "roles": [
            "developers"
        ]
    }
}

CouchDB allows two different levels of user: ‘members’ and ‘admins.’  Members can read and write documents in the database while admins can edit design documents as well as the database security document.  One thing to note here is that if no users/roles are specified, CouchDB will default a database to ‘public’ access, so… definitely put something here.  If you’re using roles, in order to grant/revoke access to a user, simply add or remove the desired role to/from the roles array from the user’s document in the _users database.

Now, CouchDB allows you to create and assign any roles you want, but doesn’t enforce any permissions beyond the aforementioned ‘member’ and ‘admin’ types.  So, since members can both read and write documents, how can you create roles for read-only users?  Unfortunately, CouchDB doesn’t make this easy or obvious, but it does provide a way to make it work.

I’ll illustrate how to do this with an example:  Let’s say you have an application that’s a photo sharing service.  Each user has various ‘albums’ that they can either keep private or share with others.  Most of the time, users are going to want to give others ‘read-only’ access to an album.  So, every time a user creates a new album, we create a database for it.  When a database is created, we also make sure to configure it with three roles: one admin role and two member roles, like so:

{
    "admins": {
        "names": [],
        "roles": [
            "stan_vacation_album:admin"
        ]
    },
    "members": {
        "names": [],
        "roles": [
            "stan_vacation_album:read",
            "stan_vacation_album:write"
        ]
    }
}

So, Stan has a vacation album.  Let’s say he wants to grant read access to his friend Bill.  He tells your app to grant read access to Bill and your app turns around and edit’s Bill’s user document by adding “stan_vacation_album:read” to his list of roles.  Great!  Bill has access, but how do we enforce read-only access and prevent Bill from, say, using Photoshop to edit himself into Stan’s vacation photos (Bill is not a good friend) and saving them over the originals in Stan’s database?

You can create a special design document in each database called ‘_auth’.  Within that design document, you can specify a function called validate_doc_update, which will be triggered each time someone tries to save a new revision of a document.  So, all we need to do is check the user’s role and not allow the update to go through if the user doesn’t have the right ‘role.’  For example:

function(newDoc, oldDoc, userCtx){ 
    if((userCtx.roles.indexOf('stan_vacation_album:write') == -1)&& 
       (userCtx.roles.indexOf('stan_vacation_album:admin') == -1)&& 
       (userCtx.roles.indexOf('_admin') == -1)){
           throw({forbidden: "Not Authorized"});
    }
}

is a quick and dirty function that would do the trick.  Anyone that isn’t an admin or doesn’t have the :write role, won’t be able to write to the database.  Of course you can also do any data validation you like here as well.

Hope this helps 🙂

I’m also developing a CouchDB driver for Go as I work on my project, feel free to use it however you like.

Two weeks with Go

gophercolor

You can tell this is a gopher because… teeth. [1]

It’s been awhile since my last post, as I’m sure my legions of loyal readers have surely noticed….. surely.

Anyway, since my last post I’ve moved my family from Northern Virginia to the Florida coast to take a new job (I’ve discovered that the DC area does not agree with me), so I’ve been a bit distracted.  However, now that I’m (mostly) settled in and not constantly unpacking, setting up house, fixing things, etc., I have some time to pursue my side projects again.  I had been working on a training tracker tool in Scala, but decided to mothball that one for the time being, as interest from potential customers just wasn’t there.

For now, I’ve decided to explore a few new technologies.  I’ve always wanted to learn about NoSQL databases, but I just never had a product idea that seemed to call for a non-relational database solution.  So, I’ve been messing around with CouchDB… and while I’m just messing around, I decided to play with this new Go programming language I’ve been hearing so much about lately.

First Impressions

I remember hearing about Go several years ago when it was first introduced.  At the time I was spending my spare time playing around with Ruby on Rails, and at first blush Go looked very, very similar to C:


package main

import "fmt"

func main() {
    fmt.Println("Hello, 世界")
}

I mean, static typing?! Web development was going to be moving to dynamic languages like Ruby and Python anyway, why waste my time with this thing?

Rabbit

I also thought the Go gopher mascot was reminiscent of that Plan9 rabbit thing, but that’s not important. [1]

Well, I’ve since soured on Ruby on Rails.  Actually, mostly Rails — the Ruby language itself is actually pretty fun to use (though I’ve also soured on dynamic type systems a bit as well), and I could see myself using a lightweight framework like Sinatra to write a REST API.  Problem is, the entire Ruby community seems to revolve around Rails and the veneration of DHH (I have yet to experience a recruiter calling me to talk about “Ruby” jobs, just “Rails” jobs), so I decided to move on to other technologies.

Second, er, impressions

Fast forward a couple of years.  My day job involves writing web services and applications mostly in Java (sometimes Scala and Javascript).  While Scala is a fine language, and I used it as the primary language on my last side project, you will never catch me coding in Java during my free time. Writing Java is tedious, frustrating, and always feels like work.  But it pays the mortgage, it does.

Since I was having trouble generating ideas for a new project, I decided to spend some time learning a new language.  Go was/is gaining in popularity (and version 1.3 of the language was released just a few months ago), so I decided to give it a second look.  I figured devoting a few weeks to Go wouldn’t kill me (probably).  I stepped through the excellent Tour of Go and decided my initial opinion of Go was misplaced.  Here was a delightfully small, concise language with actually decent concurrency support (It’s no Akka, but goroutines and channels took me minutes to grasp as opposed to days/weeks with Akka).

Here’s an excerpt from the concurrency section on Tour of Go:

package main

import "fmt"

func sum(a []int, c chan int) {
    sum := 0
    for _, v := range a {
        sum += v
    }
    c

To briefly explain the fun bits involving concurrency, placing the keyword ‘go’ before a function call (as in go sum() ) fires off a goroutine, which is executed concurrently with the main thread of execution.  Go hides the complexities of managing threads, pools, etc., from you in a way that just works.

Go also allows you to send data between goroutines using channels.  The line:

c

writes the value of sum to channel c.  You can see that the main function reads from that channel (actually, it reads from it twice) with the line:

x, y := <-c,

incidentally, this line of code blocks main() until two values from channel c are received, so you do have to be mindful of what you’re doing when using channels 🙂

If you’ve ever written concurrent code (or had to debug a multi-threaded application), this probably seems too easy.  I’ll admit it’s not as powerful as Akka actors.  Though – to explain Akka actors to somebody, its concomitant messaging scheme, dispatching, etc., I’d probably need several hours and a whiteboard (or a chalkboard, or a sidewalk).

This kind of simplicity and well thought out language design is why I think Go is going to become very popular in the near future.  It really does strike me as C, modernized.  Unlike Scala, Erlang, etc., here’s a language designed for modern network programming, with pretty darn good concurrency support, that isn’t a bear (or ocelot) to learn.  The most common (admittedly legitimate) concern expressed to me by my bosses whenever I suggested using, say, Scala is the lack of readily available Scala devs out there: “If I need a new Java developer, I can find one pretty much on demand.  Scala developers are hard to find, AND more expensive.”  And even though a smart developer can learn Scala, it’s such an advanced (and feature-rich) language that it’s going to take a considerable amount of time before a new Scala dev can be productive (especially if said dev doesn’t have experience with functional programming).  But with Go — you really ought to be able to pick it up in a few days.  And it has the advantage of not being Java, which is a good thing, always.

Conclusion

I do believe I’ll stick with getting better at programming in Go for a little while and will probably make use of it whenever I can 🙂

Oh, and I figured a good project to learn more about Go and CouchDB would be… a Go driver for CouchDB 🙂  I know there are already several out there (some good, some … not), but I figured this would be a good vehicle for learning Go (and couch) and so far it has been — see my repo here.

The Gopher mascot and logo were designed by Renée French, who also designed Glenda, the Plan 9 bunny. The logo and mascot are covered by the Creative Commons Attribution 3.0 license.[1]; permission