My Lock-down Project: A Data Management Thing

Visit the GitHub repo for this project

I tend to write code when I get bored.

I figure it’s been two years since my last blog post so maybe I ought to actually finish one of the several half-finished drafts I have sitting around and publish it. Or maybe I’ll just write a new blog post talking about the little personal project I’ve been working on for the past few months.

Yep, that sounds more fun, I’ll do that.

So, like most of us I’ve had some extra time on my hands since this COVID situation started, and I happened to get an idea for a new project. I’m terrible at naming things (see: Contabulo), so I just gave up on trying to be clever and went with a boring, generic, somewhat descriptive name for this project: Open Data Management Platform (OpenDMP). Open, because it’s open source, and “Data Management Platform” because it’s a platform that… manages data.

Motivation

The motivation for this little project was to create a generic data management system that can serve as a common starting point for new data projects and help curb the proliferation of bespoke solutions being implemented for every single organization, project, and/or program. In short: to help engineers stop reinventing the wheel for every project involving data processing and management. I think there's a lot of common functionality, at least for small to medium-scale needs. (Obviously, there's no getting around the need for custom solutions for organizations dealing with highly specialized data/use cases or truly massive volumes of data, but such organizations probably have the time, inclination, and in-house talent necessary to build whatever they need.)

I’d like to think I’m targeting users that have been manually processing data with scripts on their own workstations and are in need of something a little bit more automated and robust.

How Does it Work?

The idea is to allow users to design simple data flows that ingest data, perform transformations and/or run data through different scripts and external tools, and export the data elsewhere (could be the file system, object storage, a database, etc).

A basic dataflow

A user constructs a flow made out of processors organized into "phases." Processors can be of various types, for example: Ingest, for bringing data into the system, Script, for executing basic transformations on data, Collect, for collecting results and exporting them to external data stores, and more. Phases are meant to help the user lay out the dependencies between processors (i.e., a flow should run in phase order), although it should be noted the system doesn't necessarily obey this order if a processor lacks a hard dependency on a previous phase.
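To make the structure concrete, a flow of phased processors might be modeled roughly like this. This is a hedged sketch in Java; the record names and fields are my own invention for illustration, not OpenDMP's actual data model:

```java
import java.util.List;

public class FlowModel {
    // Illustrative processor types; OpenDMP's actual set differs.
    enum ProcessorType { INGEST, SCRIPT, COLLECT }

    // A processor lives in a phase and may depend on earlier processors.
    record Processor(String id, ProcessorType type, int phase, List<String> dependsOn) {}

    record Dataflow(String name, List<Processor> processors) {}

    public static void main(String[] args) {
        Dataflow flow = new Dataflow("example", List.of(
            new Processor("ingest-1", ProcessorType.INGEST, 1, List.of()),
            new Processor("upcase", ProcessorType.SCRIPT, 2, List.of("ingest-1")),
            new Processor("collect-1", ProcessorType.COLLECT, 3, List.of("upcase"))));
        System.out.println(flow.processors().size()); // prints 3
    }
}
```

The key idea is that the phase number is a layout hint, while `dependsOn` carries the hard edges the scheduler actually honors.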

Here’s a basic example of a Script Processor:

Script processor using Python

This is a super-simple example that just takes in a string and converts it to upper case. As you can see, this processor allows you to write an arbitrary script in Python that modifies your data and passes it along to the next processor in the flow (As an aside, the Script Processor supports Clojure as well – I couldn’t resist adding support for my favorite language). The flow interface is also nice enough to let you know when something goes wrong:

Something has gone wrong in a Script Processor

When data is collected, it is exported to an external data source, and an entry is made into a “Collection” of the user’s choice:

A basic Collection
An individual dataset

Currently only a few rudimentary types of processors are available, but I definitely plan to expand this. It’s a design goal that OpenDMP be expandable with Plugin processors, for example.

Technology

OpenDMP architecture (Note: not 100% implemented yet)

If I haven’t lost you yet, read on as I delve into the architecture and technical details of OpenDMP:

First, OpenDMP is a system made up of various services running in a containerized environment. It does not make any assumptions about your container orchestration solution (there are sample docker-compose files in the repo), since that choice should be dictated by your workload, infrastructure, and ops expertise. Also, beyond the core services required to run a basic system, users will probably want to run additional services (databases, external tools, etc.) that interact with OpenDMP within their environment.
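To make that concrete, a compose file for the core services might look roughly like this. This is a hypothetical sketch with assumed image names and versions; see the repo's actual sample docker-compose files for the real configuration:

```yaml
# Hypothetical sketch only; the repo's sample compose files are authoritative.
version: "3.8"
services:
  frontend:
    image: opendmp/frontend            # re-frame UI
  dataflow-service:
    image: opendmp/dataflow-service    # API + owns the MongoDB data
    depends_on: [mongo, pulsar]
  processor-service:
    image: opendmp/processor-service   # runs dataflows via Camel
    depends_on: [redis, pulsar]
  mongo:
    image: mongo:4
  redis:
    image: redis:6
  pulsar:
    image: apachepulsar/pulsar:2.6.0
    command: bin/pulsar standalone
  keycloak:
    image: jboss/keycloak              # SSO / identity management
  haproxy:
    image: haproxy:2.2                 # reverse proxy + SSL termination
    ports: ["443:443"]
```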

Core Services

The frontend consists of a re-frame application written in ClojureScript. This was my first project using re-frame and ClojureScript, and I found it to be a huge win over JavaScript. State management in particular was much less of a problem than I'd experienced on previous projects. The astute observer will probably notice the use of Material-UI as well. I found integrating Material-UI components with re-frame to be completely painless.

The backend services are written in Kotlin using Spring Boot. The Dataflow Service exposes an API for the frontend and owns the MongoDB database that stores dataflow, processor, and collection information. The Processor Service (also Kotlin + Spring Boot) is where the data flows are actually “run.” It relies heavily on Apache Camel to control the flow of data through the system. It also uses a Redis cache to keep track of things as it executes dataflows. If you’re wondering how the scripting support works, Clojure can be evaluated by the Clojure interpreter right on the JVM, whilst Python support is achieved using Jep, which embeds CPython in the JVM via JNI (I thought this a better solution than Jython).

Communication between the various backend services utilizes Apache Pulsar, which is starting to get a lot of attention as a Kafka alternative. Before I came across Pulsar in my research, I was torn between Kafka and using a traditional message broker such as RabbitMQ. Pulsar makes an attempt at combining the functionality of both, and at least so far appears to do a fairly decent job of it.

OpenDMP uses OpenID Connect for auth, with Keycloak, a widely used SSO and identity management solution, handling identity and access management. In fact, user management is completely absent from the rest of the OpenDMP system as well as the frontend. From a development perspective, this simplified things tremendously. The use of an industry-standard auth mechanism should also help in situations where OpenDMP needs to be integrated into an existing enterprise architecture.

Everything sits behind HAProxy, which handles reverse proxying and SSL termination.

How a Dataflow Runs

So, what happens when you build a dataflow in the UI and click the “enable” button? Well, after the request is sent to the Dataflow API, the following sequence is executed:

  1. The Dataflow Service loads the Dataflow and creates a "Run Plan." It does this by analyzing the dataflow and mapping out the dependencies between processors to determine the order in which different processors need to be executed, and which can be run in parallel. Note: "Phases" are ignored here in favor of finding the most efficient way to execute the dataflow – phases are a device of the UI/UX to help users lay out their flows.
  2. The Run Plan is sent to the Processor Service. It contains everything the Processor Service needs to run the dataflow.
  3. The Processor Service goes through the Run Plan and generates Camel routes and begins executing them immediately.
  4. Errors and/or Collection results are sent back to the Dataflow Service via Pulsar.
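Step 1 amounts to a topological sort of the processor dependency graph. Here is a minimal sketch of the idea in plain Java (my own illustration, not OpenDMP's actual code): processors whose dependencies are all satisfied get grouped into "waves" that can run in parallel.

```java
import java.util.*;

public class RunPlanner {
    // Given processor -> set of processors it depends on, emit "waves":
    // every processor in a wave has all of its dependencies satisfied by
    // earlier waves, so the members of one wave can run in parallel.
    static List<List<String>> plan(Map<String, Set<String>> deps) {
        Map<String, Set<String>> remaining = new HashMap<>();
        deps.forEach((k, v) -> remaining.put(k, new HashSet<>(v)));
        List<List<String>> waves = new ArrayList<>();
        while (!remaining.isEmpty()) {
            List<String> wave = new ArrayList<>();
            for (var e : remaining.entrySet()) {
                if (e.getValue().isEmpty()) wave.add(e.getKey());
            }
            if (wave.isEmpty()) throw new IllegalStateException("cycle in dataflow");
            for (String p : wave) remaining.remove(p);
            for (Set<String> s : remaining.values()) wave.forEach(s::remove);
            Collections.sort(wave); // stable output for the example
            waves.add(wave);
        }
        return waves;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
            "ingest-a", Set.of(),
            "ingest-b", Set.of(),
            "script-1", Set.of("ingest-a", "ingest-b"),
            "collect-1", Set.of("script-1"));
        System.out.println(plan(deps));
        // [[ingest-a, ingest-b], [script-1], [collect-1]]
    }
}
```

The real Run Plan carries much more (processor configs, routing info), but the dependency-analysis core is essentially Kahn's algorithm like this.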

It sounds simpler than it is, I promise 🙂

Future Plans

Quite a bit of the functionality I've got planned hasn't been done yet, of course, and OpenDMP is definitely not production-ready at the time of this writing. There are still critical features to be added and much proving out to do on various workloads. Big features still to be added include:

  1. Aggregation Processors – bringing divergent flow paths together again.
  2. Plugin processors – I’d like to see common Data Science/ML tools wrapped in OpenDMP API-aware services for clean integration in dataflows.
  3. UI improvements – In particular, better searching, sorting, etc. of Data sets
  4. Notification support – Send notifications to people when things go wrong (email, et al).
  5. Better scaling – While multiple instances of the processor services can be run to better support running many dataflows, within a dataflow it would be helpful to create worker nodes for the Processor service to distribute individual processor tasks using Camel and Pulsar.
  6. And of course, more ingest and export data stores need to be supported.

Final Thoughts

This has been a challenging and fun project to build so far; I hope others will eventually find it useful. Also, it's open source, so contributions are definitely welcome!

Golang – The Okayest Programming Language

Note: This is a repost (and expansion) of my Quora answer here.

Someone on Quora recently posted a question asking for developers to discuss their negative experiences with Golang.  That got me thinking about my past (and current) experiences with the language, and I ended up replying with a list of pros and cons:

It’s hard to hate Golang. It’s also hard to love Golang. It has got to be the okayest programming language I’ve ever used.

Things I like about Go:

  • The development environment: Simple to get set up and use. I usually work in Javaland and have had to suffer through Ant/Ivy, Maven, sbt, et al., so Go's environment feels much more thought out and coherent by comparison.
  • No need for an IDE – Java development practically requires the use of an IDE to be productive (as much as I like IntelliJ, I resent having to keep a heavyweight IDE running all the time). I routinely edit Go source files in vim and compile/run/test all from the command line. Which is nice.
  • It isn't Object Oriented (OOP has its uses, but I hate, and I mean hate, the everything-is-an-object paradigm in Java). Object-Oriented Design (the raison d'être of Java) in particular needs to die.
  • Channels and goroutines. Not a terrible way to do concurrency. Nicer than managing threads in Java. Not as powerful as Clojure or Akka (but definitely simpler, at least in the case of Akka).

Things I don’t like about Go:

  • I'm a big fan of functional programming (I use Clojure or Scala on the JVM whenever I can get away with it). Golang isn't a functional programming language. This is more a problem with me than Golang, I suppose 🙂
  • Lack of some advanced programming constructs. Generics, in particular, are something I miss, especially given my next point:
  • The type system. It's clunky, limiting, and just feels so antiquated. Easily the worst type system of any language I've used in the last 20 years. Ada, for example, has the best static type system I've ever seen; Scala's is pretty decent as well. This is the main reason I only use Go for small utilities and can't stand programming large projects with it.

I don't have a lot of optimism anymore about the future of Golang as a general-purpose language. It seems to be carving out a niche for itself in network services (it's become especially popular with fans of 'microservices'), but I can't see it expanding much beyond that. It's not really suitable for use as a systems programming language (better to use something like Rust or even good ol' C), and I certainly don't see it displacing Java in enterprise environments. As for myself, I'll continue to use it here and there when it suits the problem I'm working on, but it's not going to be my "go to" language for general use (Clojure fills that role for me at the moment).

Building a SaaS product with Clojure

The Idea

Like many software developers, I spend a lot of my spare time on side projects. Some side projects are purely for my own edification: learning a new language (such as my current favorite, Clojure), or exploring a new technology or problem domain. However, a great many of the project ideas that pop into my head at random intervals tend more toward the entrepreneurial. These sorts of ideas either originate internally (usually the most insane or impractical kind, since I tend to live inside my own head a little too much) or result from observing (or experiencing) problems for which a good solution is not immediately evident. The former can usually be disposed of quickly by subjecting them to a few thought experiments, while the latter tend to be more persistent, begging to be investigated further.

So, enter my current obsession with creating an enterprise wiki that isn't painful to use. I've used many expensive "Knowledge Management" and corporate wiki tools over my career. Some are better than others, but I have yet to find one that I actually like. Either they have serious problems in the information search and discoverability department (e.g., SharePoint), or they seem to be designed around the idea that you want to spend your time interacting with the application itself rather than finding the information you need before going back to your actual job-related tasks (e.g., various tools with overwrought interfaces sporting hundreds of knobs and buttons that consume half of the available screen real estate).

I want a KM tool in which content is front-and-center and the interface stays out of the way. I also want a functional search capability and a way to browse information and content when I don't already know exactly what I'm looking for. My first attempt at this ended up being basically a traditional wiki with perhaps a nicer interface. The voices in my head laughed at me, saying "So, what's different about this thing? You've created nothing new. Forget it and try again." Eventually a new concept presented itself to me, which I've named Contabulo: a wiki with a much more visual interface, inspired by card-based PM tools such as Trello, but geared toward being a knowledge management and collaboration tool. The basic idea for the interface came to me when I observed my wife using Pinterest: "There should be a productivity tool that looks kind of like that." Earlier versions of Contabulo even used a masonry-like layout, but I quickly realized that was a poor fit and switched to a grid layout based on CSS Grid (which happened to gain majority browser support just in time to be usable).

Given that I was tackling this project alone, in my spare time, I needed to select tools and technologies that would help me maximize my personal productivity. While I'm paid good money to develop in Java at my day job, I have to say Java is absolutely atrocious for individual productivity. I could write at length about my problems with Java, but suffice it to say that a high-ceremony language that forces you into designing your software architecture in terms of class hierarchies (i.e., OOD) does a pretty good job of sucking the joy (and speed) out of software development. When I'm beginning a new personal project where I have total freedom of choice, I generally gravitate toward a functional programming language. Being both a huge Lisp fan and also very (albeit somewhat unwillingly) familiar with the Java ecosystem, I chose Clojure to implement the backend services.

Architecture

Contabulo System Architecture

My background in enterprise software development is clearly evident in the system architecture 🙂 My favorite relational database, PostgreSQL, serves as the primary data store. Most of the CRUD (Create, Read, Update, Delete) functionality is taken care of by the "API Service", which is built around the excellent compojure-api. Now, when it comes to working with a relational database, I'm generally used to being forced to do things the Java way and use an ORM like Hibernate or EclipseLink. With a functional programming language, however, an ORM doesn't make sense (not that an ORM makes sense in any case, but I digress), and I found HugSQL to be an awesome way to work with a SQL database. I'll take writing SQL queries any day over composing JPA/Hibernate criteria queries. Additionally, rows are written to and read from the database as simple Clojure maps and vectors, rather than having to define domain objects for every table in the database.
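To give a flavor of the HugSQL approach (a hypothetical query for illustration, not from Contabulo's actual source): queries live in plain .sql files annotated with special comments, from which HugSQL generates ordinary Clojure functions.

```sql
-- :name get-board-by-id :? :1
-- :doc Fetch a single board as a Clojure map
SELECT * FROM boards
WHERE id = :id
```

A `(hugsql.core/def-db-fns "boards.sql")` call then defines `get-board-by-id` as a regular Clojure function taking a db spec and a params map such as `{:id 42}`, returning the row as a plain map.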

The contabulo-auth service is responsible for verifying user login credentials and issuing tokens. Tokens are generated and verified using the buddy-sign library.

Elasticsearch powers the search functionality, which is provided by the "Search Service" and kept in sync with the main database via an "Index Service." I actually opted to use Elastic's own Java SDK for interacting with Elasticsearch, writing Clojure wrappers as necessary. (One of the many benefits of running on the JVM is the wealth of pre-existing Java libraries for just about anything you can think of; the only downside is that they're written in Java and you'll have to wrap them in your language of choice.)

Synchronous communication between services is accomplished using RESTful HTTP interfaces, while asynchronous tasks are dispatched via RabbitMQ. The contabulo-mail service, for example, is responsible for sending email notifications to users and receives all of its tasks via a RabbitMQ queue. For communicating with RabbitMQ from Clojure services, the Langohr library is used.

The frontend is a rich JavaScript web application (I decided to stick with plain JavaScript/ES6 rather than go with ClojureScript) written using the mithril.js framework. Mithril is a great little client-side framework that encourages a very functional style of JavaScript programming (truth be told, you can be quite functional with JavaScript, but it requires a lot of discipline).

Clojure (and FP) Benefits

Thanks to Clojure, I have backend services composed of side-effect-free functions and immutable data structures. Whole classes of runtime errors I'm used to seeing in large Java applications are simply absent from Contabulo. This makes iterating on the product and adding new features *so* much easier. Further, 'statelessness' is considerably easier to achieve in Clojure, partly due to its non-OO-ness and partly due to its immutability-by-default design. This means that adding additional instances of each service will be much easier.

I've also found the unit testing experience in Clojure to be vastly superior to Java's. Though I doubt this is due to any inherent superiority of clojure.test over JUnit; rather, it's simply a result of the fact that side-effect-free, deterministic functions are easier to unit test. In any case, it's still a win. As an aside, I've also found Scala to have a much more pleasant testing experience than Java (probably tied with Clojure in that regard).

While the frontend web application was written in ES6, the mithril.js framework's functional style has resulted in a rich web application constructed from modular, de-coupled components that can be reused easily. This makes regressions on the frontend much less likely as well (which is good in my case, because I'm really much more of a backend developer).

I will have to admit to writing one small service in a non-FP language. A small program written in Golang is triggered periodically by a systemd timer to perform housekeeping tasks (moving files around, doing some basic database ops, etc.). This was one case where I didn't want to suffer the overhead of spinning up a JVM every ten minutes, and since the housekeeping service does nothing but I/O operations (which tend to be inherently procedural anyway), I decided to go (pun sort of intended) with something simple.

Conclusion

I've heard some critics say that Clojure is unsuitable for big projects with large teams, usually due to some combination of dynamic typing and difficulty in finding Clojure developers. I'm not sure I'd agree with that assessment. I think liberal use of type hints and tools like schema (and more recently, clojure.spec) can largely mitigate most of the problems caused by dynamic typing, and I'm not convinced the latter issue even exists. Contabulo is the largest project I've done using Clojure (or any Lisp, for that matter), and now that all of the basic architecture and core functionality is in place, I'm quite pleased with how quickly I was able to build it and how easy it is to add new features or make modifications. I think that's a win. Now it remains to be seen how Contabulo succeeds as a product.

On Boards and Cards

Unless you've been living under a rock, you've doubtless noticed the current trend in project management tools involving "boards full of cards." Trello is probably the most well known (and cheapest, being free, which might have something to do with its popularity) of this archetype. Other, more elaborate (and expensive) enterprise tools have implemented "agile" boards as part of their offerings.

Personally, I think most of these tools are well suited for their intended use case: Agile project management. However, as someone who's tried to use Trello as a more, erm, general purpose tool, I've found it lacking (shopping lists? No. Notes? No). Which is fine; it's a specialized tool designed for a particular set of use cases, which just so happen to not align with my needs. Most of these PM tools assume the following:

1.) You are managing projects (duh), which are temporary efforts with a definite start and end date. Thus, boards are meant to be used temporarily.

2.) You're using some sort of Agile (or perhaps Kanban) project management methodology. You may not like Agile, or it may not suit your particular effort.

3.) Long-term knowledge management is out of scope; you'll need a completely separate tool (such as a wiki) for that.

Issues 1 and 3 are related. You want to store lessons learned, institutional knowledge, and other "knowledge" artifacts created during the course of a project. The traditional answer is to make your engineers enter this knowledge into a wiki, as the project board that was so nice for organizing and tracking tasks turns out not to be so great when it comes to organizing and archiving information for long-term use. This, of course, introduces yet another 'chore' for engineers who would rather be building things. It boils down to the fact that taking the time to enter information into a wiki takes people out of their normal workflow. Thus, it is frequently forgotten and/or skipped when deadlines loom. I, for one, have never seen a corporate wiki that was well maintained and kept up-to-date 🙂

So, I think a worthy goal should be to keep information together as much as possible. Place notes, design artifacts, project planning documents, meeting minutes, miscellaneous documentation, etc. in the same system you also place your tasking "cards," thus folding "updating the wiki" into your normal workflow. The trick, of course, is to keep this vast pile of information from turning into an unmanageable, unsearchable, and unusable mess.

That's the idea I've been pursuing with my latest project, which I've named Contabulo.

Half wiki, half board

An example Contabulo project board

From the screenshot, at first blush, Contabulo looks similar to an agile board. Unlike Trello and friends, however, these cards aren't organized into lists. In fact, there's no set ordering or arrangement at all, as the interface assumes (at least on a well-populated board) that you're going to be making use of the search bar to find the cards you seek. The search feature is powered by Elasticsearch on the backend, so it does a fairly decent job (the 'main' database is PostgreSQL). As you can see from the graphic, cards can be assigned different colors, and you can assign a board background image, which I know is pure vanity.

Cards

A Contabulo card with a header image

Contabulo cards are designed to be flexible and can contain anything from a quick note to an entire wiki-esque article. The editing format is Markdown (specifically, CommonMark with a few extensions), and I don't know about you, but I much prefer Markdown to the Wikitext format you'll find in MediaWiki-based wikis 🙂 As you can see from the image above, you can also attach files (including images) and specify a card image to appear in the header.

A card being edited using Markdown – at some point I’ll add a rich editor

In addition to the card body and attachments, you can add special 'content' blocks to a card. As of this writing, checklists are supported, but it is easy to imagine adding many other types of useful content: maps (think directions to events), discussion/comment sections, integrations with third-party apps, etc. You can also add 'tags' to cards; these are optional, but the search index does use them to improve results.

Groups

Contabulo has the concept of 'groups' in order to control access to different boards. Each user can have 'personal' boards, which are either totally private or can be given public read-only access for outsiders. "Group" boards can likewise be visible only to the group, or additionally be read-only to non-group members (one notional use of a public read-only board could be a job board). Each group member can be assigned read-only, editor, or admin-level access.

Managing groups in Contabulo

Tech

For the curious, I'll give a brief overview of the Contabulo "tech stack" (I think that's what the kids are calling it these days). I've been building Contabulo in my spare time (evenings and weekends), since right now I have to keep a day job in order to eat and pay the mortgage. Thus, the need to maximize my personal productivity was definitely a major factor in selecting tools and technologies. The backend is written primarily in Clojure, which is in the Lisp family of languages and runs on the Java Virtual Machine. I chose Clojure because:

  1. I like Lisp and think it's an amazingly productive language
  2. I have a large amount of familiarity with the JVM and the Java ecosystem in general (and there are a *huge* number of helpful Java libraries which you can easily call from Clojure), even if I may not care much for the Java language itself 🙂

I briefly flirted with the idea of writing the backend services in Haskell after working through Learn You a Haskell (great book, btw), but then the fever broke and I got better.

So, I've split up my backend into several Clojure services (not quite as fine-grained as 'microservices' would be, but I don't care), and one little database maintenance/housekeeping utility I wrote in Golang that runs once an hour or so. Interprocess communication largely uses RabbitMQ, though some services call each other over HTTP when synchronous communication is needed. I'm using my favorite webserver, Nginx, as a load balancer (and also for SSL termination), and all of my backend services are stateless, so running multiple instances of each shouldn't be a problem. I've tried to lay the foundation for a scalable architecture, which will hopefully prove necessary. I suppose I could have probably just spun up a Ruby on Rails app much more quickly, but where is the fun in that? 🙂

As I mentioned earlier, the main database is PostgreSQL, and search is powered by Elasticsearch (though a Clojure ‘search’ service sits in front of the Elasticsearch service, since ES, as with any database, shouldn’t be directly exposed to the Internet).

For the web application, I used ES6 with the Mithril microframework and Bootstrap, along with a plethora of smaller JavaScript libraries, of course. I'm not the most experienced or skilled frontend developer in the world, but I think I do OK (when properly incentivized, like when there's no one else to do it).

Conclusion

Contabulo isn't my first attempt at a Software-as-a-Service (SaaS) offering, but I feel pretty good about this one. I find it useful, in any case, and will continue to use it (and keep it updated) for my own needs. My focus right now is to complete testing and squash any show-stopping bugs so I can launch this thing and hopefully start getting some paying users. I have a long list of features I want to implement, but I think Contabulo is far enough along that it's time to start getting user feedback in order to guide its future direction. If you'd like to receive updates (i.e., to be notified when Contabulo is live), please sign up for the mailing list!

Implementing Search-as-you-type with Mithril.js

I've been working on a new project, the front-end of which I'm coding up in ES6 with Mithril.js (using 1.0.x now, after spending the better part of a day migrating from 0.2.x). I wanted to implement "search as you type" functionality, since I'm using Elasticsearch on the back-end for its full-text search capability.

search_box
A simple user search box

It took me a bit of trial-and-error, but I came up with this Mithril component that provides a simple text field and fires a callback once the user stops typing (the delay is configurable):

import m from "mithril";

export let SearchInputComponent = {

    /**
     * attributes:
     * helpText - small text to display under search box
     * callback - callback to be fired when timeout/minchar conditions are met
     * minchars - min number of characters that must be present to begin searching
     * timeout  - idle time in milliseconds before the callback fires (default 1000)
     * inputId  - DOM ID for input field
     * @param vnode
     */
    oninit: function(vnode) {
        vnode.attrs = vnode.attrs || {};
        // Search box caption
        vnode.attrs.helpText = vnode.attrs.helpText || "Enter search text here";
        //Callback to fire when user is done typing
        vnode.attrs.callback = vnode.attrs.callback || function(){};
        //Minimum number of characters that must be present to fire callback
        this.minchars = vnode.attrs.minchars || 3;
        this.timeout = vnode.attrs.timeout || 1e3; //default 1 second for input timeout
        this.inputId = vnode.attrs.inputId || "searchTextInput";
        this.timeref = null;

        this.doneTyping = function(){
            // Read the input value directly from the DOM (no jQuery needed)
            let searchString = document.getElementById(this.inputId).value;
            if (this.timeref && searchString.length >= this.minchars){
                this.timeref = null;
                vnode.attrs.callback(searchString);
            }
        }.bind(this);

        this.oninput = function(event){
            if (this.timeref){
                clearTimeout(this.timeref);
            }
            this.timeref = setTimeout(this.doneTyping, this.timeout);
        }.bind(this);

    },

    view: function(vnode) {
        return m("fieldset", {class: "search-input-box"}, [
            m("input", {type: "text", class: "form-control", id: vnode.state.inputId,
                autofocus: true, "aria-describedby": "searchHelp", oninput: vnode.state.oninput,
                onblur: vnode.state.doneTyping}),
            m("small", {id:"searchHelp", class: "form-text text-muted"}, vnode.attrs.helpText)
        ]);
    }

};

Pretty basic.  In my case, I wanted to avoid firing my search callback (which makes a request to my back-end search service) until a certain number of characters had been entered.

Thoughts On Java 8 Functional Programming (and also Clojure)

After working with Java 8 for the better part of a year, I have to say I find its new "functional" features both useful and aggravating at the same time. If you've used true functional programming languages (or you use them on your own personal projects while being forced to use Java at work), you'll find yourself comparing each of Java 8's new functional constructs with those in, say, Clojure, Scala, or Haskell, and you'll inevitably be disappointed.

Case in point: Java 8 includes an "Optional" container which wants you to treat it like a monad. Now, I find it useful, but of course it's nowhere near as nice as Scala's Option or Haskell's Maybe.

For example, at work I found myself having to refactor an older part of the code base. Attempting to think “functionally”, I decomposed this block of fairly typical Java (simplified, with class and variable names changed, of course):

Map<Truck, TimeAndLocation> arrivals = new HashMap<>();

for (Delivery delivery : route.getDeliveries()) {
    Truck deliveryTruck;
    try {
        deliveryTruck = delivery.getPackage().getTruck();
    } catch (NullPointerException ex) {
        // Log the error
        continue;
    }
    if (deliveryTruck == null || arrivals.containsKey(deliveryTruck)) {
        continue;
    }

    Location truckLoc;
    try {
        truckLoc = deliveryTruck.getLocation();
    } catch (NullPointerException ex) {
        //Log error
        continue;
    }
    arrivals.put(deliveryTruck, new TimeAndLocation(delivery.getDeliveryTime(), truckLoc));
}

Into this:

public Optional<TimeAndLocation> getArrival(Delivery delivery) {
    return delivery.getPackageOpt()
            .flatMap(Package::getTruckOpt)
            .map((t) -> new TimeAndLocation(delivery.getDeliveryTime(), t.getLocation()));
}


public List<TimeAndLocation> getArrivalsForRoute(Route route) {
    Map<Truck, TimeAndLocation> arrivals = new HashMap<>();
    for (Delivery delivery : route.getDeliveries()) {
        Optional<Truck> truck = delivery.getPackageOpt().flatMap(Package::getTruckOpt);
        truck.flatMap((t) -> getArrival(delivery)).ifPresent((a) -> arrivals.put(truck.get(), a));
    }
    return new ArrayList<>(arrivals.values());
}

These two functions have the advantage of being easier to reason about and trivial to unit test individually. Java 8’s Optional monad eliminates the possibility of NullPointerExceptions, so there is no longer an excuse for the evil `return null;` when you can just as easily do `return Optional.empty();` 😉

Now, the bad parts: Optional inherits from Object. That’s it. It’s really just a Java container class. Much nicer (and more useful) would have been a hierarchy of monadic types (Either, an “Exception” monad, etc.) all implementing a common Monad interface. But alas, no.

That said, this code is fairly functional, if ugly.  True functional programming languages make things like this easier to express (and easier to express elegantly).

Off the top of my head, in Clojure you could do something like this:

(defn arrival [delivery] {:datetime (get-dtime-M delivery) :location (get-truck-M delivery)}) 

(defn arrivals [route] (cat-maybes (map #(arrival %) (get-deliveries route))))

Which I think everyone would agree is easier to read. Now, I’m making use of monads here, which aren’t really “built-in” to the language, but Clojure is extensible enough (macros, etc.) that adding them is easy. Personally, I rather like this library: http://funcool.github.io/cats/latest/. So, if we assume our get-dtime-M and get-truck-M functions return Maybe monads (which contain either a ‘Just x’ or a ‘Nothing’), we get the same advantage as Java 8’s Optionals (no null checks littering our code). We can also wait to evaluate the value of our Maybe monads till the end of our processing (the cat-maybes function does that, pulling only the ‘Just’ values from the array of Maybe monads generated by the map function).

In addition to Maybe, The cats library exposes many other useful monadic types (and you can create your own as well) which have no Java 8 counterpart.

Of course, if you really want to explore the full power of monads, I’d suggest learning some Haskell — then come back to Clojure (or Scala, I suppose) when you need to write some real-world software 🙂

C and Go – Dealing with void* parameters in cgo

Wrapping C libraries with cgo is usually a pretty straightforward process. However, one problematic situation I’ve come across recently is dealing with C functions which take a void* type as a parameter. In C, a void* is a pointer to an arbitrary data type.

I’ve been using one C library which allows you to store arbitrary data in a custom tree-like data structure. So naturally, you want to be able to feed a raw interface{} to your wrapper function. Suppose your C library has two functions:

void set_data(void *the_data);
void *get_data();

What should our Go wrapper pass to set_data?  The easy solution is to do this:

func SetData(theData interface{}) {
    C.set_data(unsafe.Pointer(&theData))
}

And this would work, prior to Go 1.6 (and is in fact the solution I had been using in my wrapper — though I don’t really use this feature of the library in question myself). This, however, is dangerous (and also has the obvious problem that calling C.get_data is going to give you an unsafe.Pointer — hope you remember what kind of data you stored). You’re passing a Go pointer to a C function which then stores said pointer, which means the Go garbage collector can no longer manage it. This was considered so inadvisable that it is completely disallowed in Go 1.6 (try it and you’ll be rewarded with a nice runtime error: panic: runtime error: cgo argument has Go pointer to Go pointer).
Since I was busy building stuff using Go instead of following all of the discussions about this issue on the golang mailing list, I missed this little gem and had to spend some time figuring out why my builds started failing :/

So, is there a solution to this?  Not really.  Well, nothing good.  You could do something like this, I suppose:

func SetData(theData interface{}) error {
	// Get bytes from the raw interface
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(theData); err != nil {
		return err
	}
	data := buf.Bytes()
	C.set_data(unsafe.Pointer(&data[0]))
	return nil
}

What’s going on here? Well, we’re making use of the encoding/gob package to convert the Go interface{} into a byte slice. Then we’re passing a pointer to the first element of that slice to C.set_data.

I don’t like it much. I mean, sure – the data is there, but in order to extract the data later and gob-decode it, you’re going to need to remember the size of the byte array you fed into C.set_data.

So, let’s just disallow passing a raw interface{} to our wrapper function. We could force the user to give us a byte array. This simplifies things a bit:

func SetData(data []byte) {
	C.set_data(unsafe.Pointer(&data[0]))
}

But we still have the problem of remembering the length of our byte array so we can extract it later using C.GoBytes.

So, unfortunately, passing arbitrary data from a Go interface{} into a C void* is just not practical. For my particular C library wrapper, I just decided to require the user to supply a string rather than an interface{}. This has the benefit of being easier to convert back and forth, and you also don’t have to stash things like array sizes or references to C.free later. Strings are reasonably versatile in that you could, say, convert a struct to JSON or even base64-encode some binary data and shove it into a string if you really need to. You could implement this solution thusly:

func SetData(theData string) {
	cstr := C.CString(theData) // allocated in C memory
	C.set_data(unsafe.Pointer(cstr))
}

func GetData() string {
	return C.GoString((*C.char)(C.get_data()))
}

Converting the data back from a void* (unsafe.Pointer) to a string still requires an explicit cast, but it works well enough.

Making an extensible wiki system with Go

My little side project for roughly the last year has been a wiki system intended for enterprise use.

This certainly isn’t a new idea — off the top of my head, I can think of several of these “enterprise” wiki systems, with Confluence and Liferay being the most obvious (and widely used) examples. The problem with the current solutions, at least in my mind, is that they have become so bloated and overloaded with features that they are difficult to use, difficult to administer, and difficult to extend. I’ve been forced to use these (and other) collaboration/wiki systems, and while I see the value in them, the sort of system I want to use just doesn’t seem to exist.

My goals were/are thus:

  1. Provide basic wiki functionality in a simple and clean UI as part of the core system
  2. Use markdown as the markup language (never liked wikicode) for editing
  3. Be horizontally scalable (I’ve suffered through overburdened enterprise software)
  4. Be extensible, both in the frontend UI (plugins) and the backend services
  5. Don’t run on the JVM (because reasons) 😉

Ok, that last one is kind of a joke (but not really). At least the core system ought to be native, while of course additional services can be written in just about any programming language.

After a year of working a few hours a week on this, I’ve come up with something I call Wikifeat (hey, the .com was available). It’s not finished, of course, and likely won’t be for a while yet. Building an enterprise wiki/collaboration platform is proving a daunting task (especially goal #4 above), but the basics are done and the system is at least usable:

Screenshot of the Wikifeat interface, showing a map plugin.

Technology

The Wikifeat backend consists of one or more CouchDB servers and several small (you might almost call them ‘micro’) services written in Go. These services have names like ‘users’, ‘wikis’, etc., denoting their function. Multiple instances of these services can be run across multiple machines in a network, in order to provide scalability. The ‘frontend’ service acts as a router for user requests from the Wikifeat web application, directing them to the appropriate service to fulfil each request. Backend services also communicate directly with one another via RESTful APIs when needed.

The service ‘instances’ find each other via a service registry. I decided on the excellent etcd to serve as my registry. Each service instance maintains a cache of all of the other services registered with etcd that it can pull from when it needs to send a request to another service.

Extending the backend is a simple matter of developing a new service, registering it with etcd, and making use of the other services’ REST APIs. The frontend service also has a facility for routing requests to these custom services (in the anticipated common use case of pairing a front-end JavaScript ‘plugin’ with a custom backend service). Frontend plugins are written in JavaScript and placed in a directory in the frontend service area.

The web app itself is written as a single-page application with Backbone and Marionette. The wiki system is mostly complete. Pages can be edited using CommonMark, a fairly new effort to standardize Markdown. I personally like Markdown for its simplicity, and I always hated the various WYSIWYG HTML editors commonly included with CMS/collaboration software. Most developers already know Markdown, and most non-techies should be able to pick it up quickly (being that it *is* meant to be a human-readable markup language):

Wikifeat markdown editor

If that text editor looks familiar, it’s basically a tweaked version of the WMD editor used on Stack Exchange 🙂

Future Plans

I’m looking forward to continuing to evolve this thing. The frontend probably needs the most help right now. I actually hope to replace the WMD markdown editor with something more ‘custom’ that can make inserting plugins easier. I’d also like to allow plugins to add custom ‘insertion’ helper buttons to the editor, along with a plugin ‘editor’ view, rather than requiring the user to enter a custom div block.

My mind is also overflowing with ideas for future plugins. Calendars, blogs, integration with third-party services/applications, etc. Hopefully I can get to those eventually. It would be *really* swell if someone else took on some of that work as well. The project is open source (GPLv2), so that’s certainly possible…

UPDATE: After some feedback and reflection, I’ve decided to change the license from GPL to BSD 🙂

Creating RPMs from python packages

While I can use pip to install additional Python packages on my development box, sometimes I need to deploy an application into an environment where this isn’t possible. The best solution, if the target box is an RPM-based Linux distro, is to install any necessary Python dependencies as RPMs. However, not all Python packages are available as RPMs.

To build them yourself, you’ll need a package called py2pack. Install it thusly:

pip install py2pack

Let’s say you need to RPM-ify the fastkml package. On CentOS/Fedora/RHEL, do the following:

mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS} # If you don't already have this

cd ~/rpmbuild/SOURCES
py2pack fetch fastkml 0.9  # Version number is optional

This will download the fastkml tarball into ~/rpmbuild/SOURCES. Next, you’ll need to create the RPM spec file. py2pack has a few templates; we’ll use the ‘fedora’ one:

cd ~/rpmbuild/SPECS
py2pack generate fastkml 0.9 -t fedora.spec -f python-fastkml.spec

This will generate a spec file, which you may then feed into rpmbuild:

rpmbuild -bb python-fastkml.spec #Use -bs to build a source rpm

This should hopefully work, and will dump an RPM file into the ~/rpmbuild/RPMS directory. Note: this isn’t perfect; I’ve already encountered a few Python packages for which this procedure doesn’t work cleanly.

CouchDB – check your default settings

While running a pkg upgrade on my FreeBSD box, I noticed the following message scroll by while couchdb was being updated to the latest version:

CONFIGURATION NOTES:

PERFORMANCE
For best response (minimal delay) most sites will wish to uncomment this line from /usr/local/etc/couchdb/local.ini:

socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}]

Otherwise you'll see a large delay when establishing connections to the DB.

And just like that, my response times for simple fetch requests to CouchDB went from ~200ms to ~4ms. This was something that had been seriously bothering me, and it had me contemplating dropping CouchDB in favor of something else. If ‘most sites’ will want to do this, why isn’t it the default setting, I wonder? Oh well.