Implementing Search-as-you-type with Mithril.js

I’ve been working on a new project, the front-end of which I’m coding up in ES6 with Mithril.js (using 1.0.x now after spending the better part of a day migrating from 0.2.x).  I wanted to implement “search as you type” functionality, since I’m using Elasticsearch on the back-end for its full-text search capability.

[Image: A simple user search box]

It took me a bit of trial and error, but I came up with this Mithril component, which provides a simple text field that fires a callback once the user stops typing (the delay and minimum input length are configurable):

import m from "mithril";
import stream from "mithril/stream";

export let SearchInputComponent = {

    /**
     * attributes:
     * helpText - small text to display under search box
     * callback - callback to be fired when timeout/minchar conditions are met
     * minchars - min number of characters that must be present to begin searching
     * inputId  - DOM ID for input field
     * @param vnode
     */
    oninit: function(vnode) {
        vnode.attrs = vnode.attrs || {};
        // Search box caption
        this.helpText = vnode.attrs.helpText || "Enter search text here";
        // Callback to fire when the user is done typing
        this.callback = vnode.attrs.callback || function(){};
        // Minimum number of characters that must be present to fire the callback
        this.minchars = vnode.attrs.minchars || 3;
        // Default 1 second input timeout
        this.timeout = vnode.attrs.timeout || 1000;
        this.inputId = vnode.attrs.inputId || "searchTextInput";
        this.timeref = null;
        this.err = stream(false);

        // Fire the callback only if the timer is still pending and enough characters were entered
        this.doneTyping = function(){
            let searchString = document.getElementById(this.inputId).value;
            if (this.timeref && searchString.length >= this.minchars){
                this.timeref = null;
                this.callback(searchString);
            }
        }.bind(this);

        // Reset the timer on every keystroke, so doneTyping only runs once the user pauses
        this.oninput = function(event){
            if (this.timeref){
                clearTimeout(this.timeref);
            }
            this.timeref = setTimeout(this.doneTyping, this.timeout);
        }.bind(this);

    },

    view: function(vnode) {
        return m("fieldset", {class: "search-input-box"}, [
            m("input", {type: "text", class: "form-control", id: vnode.state.inputId,
                autofocus: true, "aria-describedby": "searchHelp", oninput: vnode.state.oninput,
                onblur: vnode.state.doneTyping}),
            m("small", {id: "searchHelp", class: "form-text text-muted"}, vnode.state.helpText)
        ]);
    }

};

Pretty basic.  In my case, I wanted to avoid firing my search callback (which makes a request to my back-end search service) until a certain number of characters had been entered.

Thoughts On Java 8 Functional Programming (and also Clojure)

After working with Java 8 for the better part of a year, I have to say I find its new “functional” features both useful and aggravating at the same time.  If you’ve used true functional programming languages (or you use them on your own personal projects while being forced to use Java at work), you’ll find yourself comparing each of Java 8’s new functional constructs with those in, say, Clojure, Scala, or Haskell, and you’ll inevitably be disappointed.

Case in point: Java 8 includes an “Optional” container which wants you to treat it like a monad.  Now, I find it useful, but of course it’s nowhere near as nice as Scala’s Option or Haskell’s Maybe.

For example, at work I found myself having to refactor an older part of the code base.  Attempting to think “functionally”, I decomposed this block of fairly typical Java (simplified, with class and variable names changed, of course):

Map<Truck, TimeAndLocation> arrivals = new HashMap<>();

for (Delivery delivery : route.getDeliveries()) {
    Truck deliveryTruck;
    try {
        deliveryTruck = delivery.getPackage().getTruck();
    } catch (NullPointerException ex) {
        //Log the error
        continue;
    }
    if (deliveryTruck == null || arrivals.containsKey(deliveryTruck)) {
        continue;
    }

    Location truckLoc;
    try {
        truckLoc = deliveryTruck.getLocation();
    } catch (NullPointerException ex) {
        //Log the error
        continue;
    }
    arrivals.put(deliveryTruck, new TimeAndLocation(delivery.getDeliveryTime(), truckLoc));
}

Into this:

public Optional<TimeAndLocation> getArrival(Delivery delivery) {
    return delivery.getPackageOpt()
            .flatMap(Package::getTruckOpt)
            .map((t) -> new TimeAndLocation(delivery.getDeliveryTime(), t.getLocation()));
}


public List<TimeAndLocation> getArrivalsForRoute(Route route) {
    Map<Truck, TimeAndLocation> arrivals = new HashMap<>();
    for (Delivery delivery : route.getDeliveries()) {
        Optional<Truck> truck = delivery.getPackageOpt().flatMap(Package::getTruckOpt);
        truck.flatMap((t) -> getArrival(delivery)).ifPresent((a) -> arrivals.put(truck.get(), a));
    }
    return new ArrayList<>(arrivals.values());
}

These two functions have the advantage of being easier to reason about and trivial to unit test individually.  Java 8’s Optional monad all but eliminates NullPointerExceptions, so there is no longer an excuse for the evil `return null;` when you can just as easily `return Optional.empty();` 😉

Now, the bad parts: Optional inherits from Object.  That’s it.  It’s really just a Java container class.  Much nicer (and more useful) would have been a hierarchy of monadic types (Either, an “Exception” monad, etc.) all implementing a common Monad interface.  But alas, no.

That said, this code is fairly functional, if ugly.  True functional programming languages make things like this easier to express (and easier to express elegantly).

Off the top of my head, in Clojure you could do something like this:

(defn arrival [delivery] {:datetime (get-dtime-M delivery) :location (get-truck-M delivery)}) 

(defn arrivals [route] (cat-maybes (map #(arrival %) (get-deliveries route))))

Which I think everyone would agree is easier to read.  Now, I’m making use of monads here, which aren’t really “built in” to the language, but Clojure is extensible enough (macros, etc.) that adding them is easy.  Personally, I rather like this library: http://funcool.github.io/cats/latest/.  So, if we assume our get-dtime-M and get-truck-M functions return Maybe monads (which contain either a ‘Just x’ or a ‘Nothing’), we get the same advantage as Java 8’s Optionals (no null checks littering our code).  We can also wait to evaluate our Maybe values until the end of our processing (the cat-maybes function does that, pulling only the ‘Just’ values from the sequence of Maybe monads generated by map).

In addition to Maybe, the cats library exposes many other useful monadic types (and you can create your own as well) which have no Java 8 counterpart.

Of course, if you really want to explore the full power of monads, I’d suggest learning some Haskell — then come back to Clojure (or Scala, I suppose) when you need to write some real-world software 🙂

C and Go – Dealing with void* parameters in cgo

Wrapping C libraries with cgo is usually a pretty straightforward process.  However, one problematic situation I’ve come across recently is dealing with C functions which take a void* type as a parameter.  In C, a void* is a pointer to an arbitrary data type.

I’ve been using one C library which allows you to store arbitrary data in a custom Tree-like data structure.  So naturally, you want to be able to feed a raw interface{} to your wrapper function.  Suppose your C library has two functions:

void set_data(void *the_data);
void *get_data();
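
As a quick aside, a cgo wrapper file pulls declarations like these in through the cgo preamble.  A minimal sketch might look like this (the header name and linker flag are placeholders for whatever the real library actually ships):

package mylib

/*
#cgo LDFLAGS: -lmylib
#include "mylib.h"  // declares set_data() and get_data()
*/
import "C"
import "unsafe"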

What should our Go wrapper pass to set_data?  The easy solution is to do this:

func SetData(theData interface{}) {
    C.set_data(unsafe.Pointer(&theData))
}

And this would work, prior to Go 1.6 (it is in fact the solution I had been using in my wrapper, though I don’t really use this feature of the library in question myself).  It is, however, dangerous, and it has the obvious problem that calling C.get_data is going to give you a bare unsafe.Pointer (hope you remember what kind of data you stored).  You’re passing a Go pointer to a C function which then stores that pointer, which means the Go garbage collector can no longer manage it.  This was considered so inadvisable that it is completely disallowed in Go 1.6; try it and you’ll be rewarded with a nice runtime error: panic: runtime error: cgo argument has Go pointer to Go pointer.
Since I was busy building stuff using Go instead of following all of the discussions about this issue on the golang mailing list, I missed this little gem and had to spend some time figuring out why my builds started failing :/

So, is there a solution to this?  Not really.  Well, nothing good.  You could do something like this, I suppose:

func SetData(theData interface{}) error {
	// Gob-encode the raw interface{} into a byte slice
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(theData); err != nil {
		return err
	}
	data := buf.Bytes()
	// Pass a pointer to the first element of the encoded bytes
	C.set_data(unsafe.Pointer(&data[0]))
	return nil
}

What’s going on here?  Well, we’re making use of the encoding/gob package to convert the Go interface{} into a byte array.  Then we’re passing a pointer to the first element in our byte array to C.set_data.

I don’t like it much.  I mean, sure – the data is there, but in order to extract the data later and gob-decode it, you’re going to need to remember the size of the byte array you fed into C.set_data.

So, let’s just disallow passing a raw interface{} to our wrapper function.  We could force the user to give us a byte array.  This simplifies things a bit:

func SetData(data []byte) {
	C.set_data(unsafe.Pointer(&data[0]))
}

But we still have the problem of remembering the length of our byte array so we can extract it later using C.GoBytes.
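
You could keep that bookkeeping yourself.  Here’s a rough sketch (storedLen is my own package-level variable, not something the C library provides) that records the length at store time and feeds it to C.GoBytes on the way back:

// storedLen is our own bookkeeping so C.GoBytes knows how much to copy back
var storedLen C.int

func SetData(data []byte) {
	storedLen = C.int(len(data))
	// Note: the caller must keep the Go slice alive for as long as C holds this pointer
	C.set_data(unsafe.Pointer(&data[0]))
}

func GetData() []byte {
	return C.GoBytes(C.get_data(), storedLen)
}

It works, but now there’s extra global state to drag around.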

So, unfortunately, passing arbitrary data from a Go interface{} into a C void* is just not practical.  For my particular C library wrapper, I decided to require the user to supply a string rather than an interface{}.  This has the benefit of being easier to convert back and forth, and you don’t have to stash things like array sizes or references to C.free later.  Strings are also reasonably versatile: you could, say, convert a struct to JSON, or even base64-encode some binary data and shove it into a string if you really need to.  You could implement this solution thusly:

func SetData(theData string) {
	// C.CString allocates C memory, so the C side can safely hold on to this pointer
	cstr := C.CString(theData)
	C.set_data(unsafe.Pointer(cstr))
}

func GetData() string {
	return C.GoString((*C.char)(C.get_data()))
}

Converting the data back from a void* (an unsafe.Pointer on the Go side) to a string still takes an explicit cast to *C.char, but it works well enough.
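
As a quick usage sketch (the Settings type here is a made-up example, not part of the wrapper), the string-based API pairs nicely with encoding/json for structured data:

import "encoding/json"

// Settings is a hypothetical example type; any JSON-serializable struct works
type Settings struct {
	Theme    string `json:"theme"`
	PageSize int    `json:"page_size"`
}

// StoreSettings marshals the struct to JSON and hands it to the string wrapper
func StoreSettings(s Settings) error {
	b, err := json.Marshal(s)
	if err != nil {
		return err
	}
	SetData(string(b))
	return nil
}

// LoadSettings reads the stored string back and unmarshals it
func LoadSettings() (Settings, error) {
	var s Settings
	err := json.Unmarshal([]byte(GetData()), &s)
	return s, err
}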

Making an extensible wiki system with Go

My little side project for roughly the last year has been a wiki system intended for enterprise use.

This certainly isn’t a new idea — off the top of my head, I can think of several of these “enterprise” wiki systems, with Confluence and Liferay being the most obvious (and widely used) examples.  The problem with the current solutions, at least in my mind, is that they have become so bloated and overloaded with features that they are difficult to use, difficult to administer, and difficult to extend.  I’ve been forced to use these (and other) collaboration/wiki systems and while I see the value in them, the sort of system I want to use just doesn’t seem to exist.

My goals were/are thus:

  1. Provide basic wiki functionality in a simple and clean UI as part of the core system
  2. Use markdown as the markup language (never liked wikicode) for editing
  3. Be horizontally scalable (I’ve suffered through overburdened enterprise software)
  4. Be extensible, both in the frontend UI (plugins) and the backend services
  5. Don’t run on the JVM (because reasons) 😉

Ok, that last one is kind of a joke (but not really). At least the core system ought to be native, while of course additional services can be written in just about any programming language.

After a year of working a few hours a week on this, I’ve come up with something I call Wikifeat (hey, the .com was available). It’s not finished, of course, and likely won’t be for a while yet.  Building an enterprise wiki/collaboration platform is proving a daunting task (especially goal #4 above), but the basics are done and the system is at least usable:

[Image: Screenshot of the Wikifeat interface, showing a map plugin]

Technology

The Wikifeat backend consists of one or more CouchDB servers and several small (you might almost call them ‘micro’) services written in Go.  These services have names like ‘users’, ‘wikis’, etc. denoting their function.  Multiple instances of these services can be run across multiple machines in a network in order to provide scalability.  The ‘frontend’ service acts as a router for user requests from the Wikifeat web application, directing them to the appropriate service to fulfill each request.  Backend services also communicate directly with one another via RESTful APIs when needed.
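
To give a flavor of how that routing works, here’s an illustrative sketch using only the standard library (this isn’t Wikifeat’s actual router; the backend address and URL path are invented for the example):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Pretend this address came from a lookup in the service registry
	usersURL, err := url.Parse("http://10.0.0.5:4100")
	if err != nil {
		log.Fatal(err)
	}

	// Forward anything under /api/v1/users/ to the 'users' service instance
	http.Handle("/api/v1/users/", httputil.NewSingleHostReverseProxy(usersURL))
	log.Fatal(http.ListenAndServe(":8080", nil))
}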

The service ‘instances’ find each other via a service registry.  I decided on the excellent etcd to serve as my registry.  Each service instance maintains a cache of all of the other services registered with etcd that it can pull from when it needs to send a request to another service.
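
To illustrate the registration side, here’s a rough sketch using the etcd v3 Go client (the key layout and TTL are invented for the example, and this isn’t Wikifeat’s actual registration code):

package registry

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// Register announces a service instance under a well-known prefix and keeps the
// registration alive with a lease, so instances that die simply expire from etcd.
func Register(endpoints []string, name, addr string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}

	// 30-second lease; KeepAlive refreshes it in the background
	lease, err := cli.Grant(context.Background(), 30)
	if err != nil {
		return err
	}
	_, err = cli.Put(context.Background(),
		"/wikifeat/services/"+name+"/"+addr, addr,
		clientv3.WithLease(lease.ID))
	if err != nil {
		return err
	}
	// In real code you would also drain the keep-alive channel and handle errors
	_, err = cli.KeepAlive(context.Background(), lease.ID)
	return err
}

Other instances can then discover their peers with a prefix query (cli.Get with clientv3.WithPrefix()) and cache the results locally.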

Extending the backend is a simple matter of developing a new service, registering it with etcd, and making use of the other services’ REST APIs.  The frontend service also has a facility for routing requests to these custom services (the anticipated common use case being a front-end JavaScript ‘plugin’ paired with a custom backend service).  Frontend plugins are written in JavaScript and placed in a directory in the frontend service area.

The webapp itself is written as a single-page application with Backbone and Marionette.  The wiki system is mostly complete.  Pages are edited in CommonMark, a fairly recent effort to standardize Markdown.  I personally like Markdown for its simplicity, and have always hated the various WYSIWYG HTML editors commonly included with CMS/collaboration software.  Most developers already know Markdown, and most non-techies should be able to pick it up quickly (being that it *is* meant to be a human-readable markup language):

[Image: The Wikifeat Markdown editor]

If that text editor looks familiar, it’s basically a tweaked version of the wmd editor used on Stack Exchange 🙂

Future Plans

I’m looking forward to continuing to evolve this thing.  The frontend probably needs the most help right now.  I actually hope to replace the wmd markdown editor with something more ‘custom’ that can make inserting plugins easier.  I’d also like to allow plugins to add custom ‘insertion’ helper buttons to the editor, along with a plugin ‘editor’ view, rather than requiring the user to enter a custom div block.

My mind is also overflowing with ideas for future plugins: calendars, blogs, integration with third-party services/applications, etc.  Hopefully I can get to those eventually.  It would be *really* swell if someone else took on some of that work as well.  The project is open source (originally GPLv2, now BSD; see the update below), so that’s certainly possible…

UPDATE: After some feedback and reflection, I’ve decided to change the license from GPL to BSD 🙂

Creating RPMs from Python packages

While I can use pip to install additional Python packages on my development box, sometimes I need to deploy an application into an environment where this isn’t possible.  The best solution, if the target box is an RPM-based Linux distro, is to install any necessary Python dependencies as RPMs.  However, not all Python packages are available as RPMs.

To build them yourself, you’ll need a package called py2pack.  Install it thusly:

pip install py2pack

Let’s say you need to RPM-ify the fastkml package.  On CentOS/Fedora/RHEL, do the following:

mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS} # If you don't already have this

cd ~/rpmbuild/SOURCES
py2pack fetch fastkml 0.9  # Version number is optional

This will download the fastkml tarball into ~/rpmbuild/SOURCES.  Next, you’ll need to create the RPM spec file.  Py2pack has a few templates; we’ll use the ‘Fedora’ one:

cd ~/rpmbuild/SPECS
py2pack generate fastkml 0.9 -t fedora.spec -f python-fastkml.spec

This will generate a spec file, which you may then feed into rpmbuild:

rpmbuild -bb python-fastkml.spec #Use -bs to build a source rpm

This should hopefully work, and will dump an RPM file into the ~/rpmbuild/RPMS directory.  Note: this isn’t perfect; I’ve already encountered a few Python packages for which this procedure doesn’t work cleanly.

CouchDB – check your default settings

While running a pkg upgrade on my FreeBSD box, I noticed the following message scroll by while CouchDB was being updated to the latest version:

CONFIGURATION NOTES:

PERFORMANCE
For best response (minimal delay) most sites will wish to uncomment this line from /usr/local/etc/couchdb/local.ini:

socket_options = [{recbuf, 262144}, {sndbuf, 262144}, {nodelay, true}]

Otherwise you'll see a large delay when establishing connections to the DB.

And just like that, my response times for simple fetch requests to CouchDB went from ~200ms to ~4ms.  This was something that had been seriously bothering me, and had me contemplating dropping CouchDB in favor of something else.  If ‘most sites’ will want to do this, why isn’t it the default setting, I wonder?  Oh well.

How to corrupt your git repo with sed

So, someday you’re going to be doing some refactoring that involves changing a single line in multiple files across your code base.  Like, say, the license header in every file in your project (because you actually sat down and read the GPLv3 and decided to revert to GPLv2 before publishing your project on GitHub).

It seems like doing something like this would be a quick and easy solution:

$ find ./ -type f -exec sed -i -e 's/either version 3 of the License/either version 2 of the License/g' {} \;

Do not do this.  As I discovered, sed is a powerful tool that also happens to be really adept at wrecking things.  Like all the binary files in your .git directory.  Good luck fixing that.

In my case, no amount of git fsck, git reset, etc. could bring my repo back.  Since I had a lot of uncommitted changes (I know, bad), I ended up re-initializing my local repo and doing a big, ugly pull/merge/push.  While that’s a quick and dirty way of getting back up and running (and back to work), it’s only an acceptable solution when you’re working on a solo project, because it does ugly things to the repo history, which tends to irritate other people for some reason.

Next time, I’ll need to come up with a search/replace script that ignores binary files 🙂
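
If I ever get around to writing it, it might look something like this (just a sketch, not battle-tested; it skips the .git directory and anything containing a NUL byte):

package main

import (
	"bytes"
	"io/ioutil"
	"log"
	"os"
	"path/filepath"
)

func main() {
	oldText := []byte("either version 3 of the License")
	newText := []byte("either version 2 of the License")

	err := filepath.Walk(".", func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.IsDir() {
			// Never touch the repository metadata
			if info.Name() == ".git" {
				return filepath.SkipDir
			}
			return nil
		}
		data, err := ioutil.ReadFile(path)
		if err != nil {
			return err
		}
		// Crude binary check: a NUL byte almost never appears in a text file
		if bytes.IndexByte(data, 0) != -1 {
			return nil
		}
		if bytes.Contains(data, oldText) {
			return ioutil.WriteFile(path, bytes.Replace(data, oldText, newText, -1), info.Mode())
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}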