Sep 15

Better Scala pattern matching with erasure

Scala has many beautiful elements and some rather repulsive ones, and pattern matching is definitely among the former. Unfortunately, because Scala runs on the JVM, it suffers from [type erasure].

Recent versions of Scala have an improved reflection facility that lets you retain type information and then use it at runtime.

Below is an example of using TypeTag to pattern match types at various depths. [I’m actually pattern matching Java’s collection types here].

Jan 14

Express.js dynamic route definitions

Express provides the bare necessities to bootstrap a web application. Its non-opinionated nature makes it very flexible, but getting an application going requires a bit of work that isn’t done for you out of the box, as it is with Rails or another full-stack web framework.

I’d like to have all my routes defined in a directory of resources and allow the URL mappings to be defined in those files as well. There is some debate about whether centralizing routing is beneficial; many frameworks (e.g. Rails, Play) do it. I think there is a benefit to just writing a resource without having to also route it in a completely different file. Some Java frameworks allow you to define URL routes with annotations, so you write a resource, annotate it and you’re done. I personally like that style.

In order to make Express infer routes from the directory structure, while still allowing you to define routings within the actual resource files, we came up with the following convention:

  1. The ‘routes’ directory holds .js files, which are basically resources. In those files, you can define your handlers and then provide a definition of how these handlers are routed.
  2. The directory can be arbitrarily nested.
  3. By convention, ‘routes/some/path/handler.js’ is mapped to ‘/some/path/handler’, but this can be overridden by doing your own route bindings.
  4. Methods within the resource files can be mapped to the usual RESTful resource routing, so ‘some/path/handler’ is dispatched on the HTTP methods. You can also follow a non-RESTful convention by modifying the path as needed; for example, you are more than welcome to do this: ‘some/path/handler/someMethod’.
    File-per-resource is nice, but sometimes you have resources that you want to group together.

Below is the code you need to bootstrap your application routing as described above. You can customize it as you wish, but this works for us right now. There is also a resource definition below that shows how the routing inference works and how you can define and customize routes.

Routes can be defined by using exports.routes in your resource file. The value can be one of the three below:

  1. A function with one argument will be called with the app object as that argument, and you can do whatever route bindings you want yourself.
  2. A function with no arguments will be called to provide a data structure of bindings. The format of the data structure can be seen in the resource files below. Each path is bound to an object of verb/handler pairs.
    a. A path can be relative or absolute. Absolute paths start with ‘/’, relative paths do not.
    b. A path can be an underscore ‘_’, which is basically a way to not specify any path for the resource, so it’ll be ‘path/to/resource’ with the proper verb dispatch.
  3. You can also provide the data structure directly instead of a function that returns one. The only issue is that you must then define your exports.routes after all the handlers; otherwise, because JavaScript evaluates the data structure eagerly, it’ll bind an undefined handler.

Jan 14

Cyclical dependency detection in the database

We recently had a need to find cyclical dependencies in our database. This happens to be a rather straightforward graph problem. The database foreign key constraints form a directed graph. Finding a cycle in a directed graph mostly comes down to detecting a back-edge during a depth-first search (DFS). We mark nodes as we visit them, and if an edge leads to a node that is an ancestor of the current node in the DFS tree (that is, a node we have started visiting but not yet finished), then a back-edge, and therefore a cycle, exists.
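The back-edge check described above can be sketched in plain Python. This is only an illustration, not the code from this post: the `find_cycle` function, the dictionary representation of the graph (table name mapped to the tables it references) and the table names in the usage note are all assumptions.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    `graph` maps each node to an iterable of the nodes it points to
    (hypothetical shape: table name -> tables referenced by its foreign keys).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    path = []  # nodes on the current DFS path, in visiting order

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Edge to an ancestor on the current path: a back-edge.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, with hypothetical tables where `customers` and `addresses` reference each other, `find_cycle({"orders": ["customers"], "customers": ["addresses"], "addresses": ["customers"]})` returns `["customers", "addresses", "customers"]`. The sketch is recursive, so a very deep dependency chain could hit Python's recursion limit.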

In order to do this on our own, we’d have to read the metadata from the database for each table, construct a directed graph using the foreign keys and then run the algorithm discussed above. That part is rather straightforward; most of the complexity comes from the cross-cutting concerns of database metadata munging. We can easily accomplish all of the above using sqlalchemy and its ability to perform a topological sort on the reflected tables. The topological sort fails if a cycle is detected, and the exception thrown includes the nodes that produce the back-edge. Using this simple trick, we let sqlalchemy detect the cycles for us.

You’ll need to install sqlalchemy (and your db driver), networkx and graphviz (for visualization).

Sep 13

Flurry – our 64-bit id generation service

Flurry was inspired by Twitter Snowflake. We had a need for generating unique distributed 64-bit ids to use within our applications, which are backed by an RDBMS. There are numerous approaches to this. A simple (and in some cases my favorite) approach, if you only use these ids for storage within an RDBMS, is Instagram’s. They basically use a stored procedure within Postgres to generate ids that consist of time, logical shard id, and auto-increment bit components. Postgres has pretty advanced facilities for writing stored procedures and triggers, making this job rather simple. We tried this approach, but because we use MySQL, with its poor stored procedure support, and because MySQL versions before 5.6 don’t seem to have any way to generate a millisecond timestamp, we quickly discarded that idea.
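To make the idea of an id built from time, shard and sequence components concrete, here is a minimal sketch of the bit packing. It is not Flurry’s, Snowflake’s or Instagram’s actual code: the function names, the default bit widths (10 shard bits, 12 sequence bits, the rest for milliseconds) and the `epoch_ms` parameter are assumptions chosen for illustration.

```python
def compose_id(timestamp_ms, shard_id, sequence, *,
               epoch_ms=0, shard_bits=10, sequence_bits=12):
    """Pack time, shard id and sequence into one non-negative 63-bit integer.

    The bit widths are illustrative defaults, not a documented scheme.
    """
    time_bits = 63 - shard_bits - sequence_bits  # keep the sign bit clear
    elapsed = timestamp_ms - epoch_ms
    if not 0 <= elapsed < (1 << time_bits):
        raise ValueError("timestamp out of range")
    if not 0 <= shard_id < (1 << shard_bits):
        raise ValueError("shard id out of range")
    if not 0 <= sequence < (1 << sequence_bits):
        raise ValueError("sequence out of range")
    # Time goes in the high bits so ids sort roughly by creation time.
    return (elapsed << (shard_bits + sequence_bits)) | (shard_id << sequence_bits) | sequence


def decompose_id(value, *, epoch_ms=0, shard_bits=10, sequence_bits=12):
    """Split an id produced by compose_id back into its three components."""
    sequence = value & ((1 << sequence_bits) - 1)
    shard_id = (value >> sequence_bits) & ((1 << shard_bits) - 1)
    timestamp_ms = (value >> (shard_bits + sequence_bits)) + epoch_ms
    return timestamp_ms, shard_id, sequence
```

A real generator also has to keep the sequence counter per millisecond and handle the clock moving backwards; the sketch only shows how the components share the 64 bits.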

Our next approach was to try Twitter Snowflake, and after a day of ripping hair out of our heads for various reasons, we decided to write our own. Snowflake is overly complex for someone outside of Twitter to use. Besides not being polished or distributed in binary form, it suffers from a dependency nightmare. The current head depends on older versions of Scala and on various other dependencies that suffer from the same issues. Upgrading these dependencies isn’t very easy. Snowflake also uses an overabundance of Twitter libraries, and these libraries suffer from the same dependency issues, which made it pretty easy to decide to write our own.

This isn’t meant as a criticism of Twitter. We’ve used other Twitter open source projects and love them. This software is open sourced, and although they are nice enough to do that, the priority is to support their internal infrastructure, so it seems changes are only made when they are needed internally or when there are bugs. The last update was a year ago. No viable forks exist to fix the issues I outlined, and we didn’t want to fork it ourselves, as we figured we could start from scratch and make it leaner by forgoing some functionality in order to achieve a clean code base that’s easy to use and extend. We also wanted to make it configurable, so you don’t have to change code and recompile in order to change the bit schemes or use a different strategy for naming worker hosts.

Flurry was born, and after extensive internal testing we’re confident enough in its stability and functionality to release it to the world. It performs on par with Snowflake and is very configurable. There are features missing from the current release that we plan on adding in the near future, but we’re confident that it will benefit others with similar needs.

You can see the project source and documentation here.

Download the latest release v0.1.0-beta here.


Oct 12

Handling GSM 03.38 encoding

We recently internationalized our application, and our requirement was to send SMS messages in different languages. SMS supports the GSM 03.38 7-bit encoding, and you can also send messages using UTF-16 for characters that you can’t represent using the pseudo-ASCII.

Our messages come in and have to be dispatched on the fly, so although a message is sometimes meant to be in ASCII, it may contain enough text in, say, Japanese that it has to be encoded in UTF-16 to make any sense.

The solution is pretty straightforward. Below is a Java code snippet that first checks whether the message is encodable as ISO-8859-1 and, if so, transliterates the message to GSM 03.38 and strips out any characters that could not be transliterated. If it is not, the message is encoded as a UTF-16 hex string.

Of course, there are other things that need to happen that I didn’t include, like trimming the string to 140 characters. The UTF-16 hex string can be up to 280 hex characters, since each of the 140 bytes of the encoded message is written as two hex digits.
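The decision described above can be sketched roughly as follows. This is written in Python rather than Java and is not the original snippet: the `prepare_sms` and `to_gsm` names are hypothetical, and the transliteration step (falling back to the base letter of an accented character via Unicode decomposition) is an assumption about what "transliterate" could mean here.

```python
import unicodedata

# GSM 03.38 basic character set plus the extension-table characters.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM_EXTENDED = "^{}\\[~]|€"
GSM_CHARS = set(GSM_BASIC + GSM_EXTENDED)


def is_latin1(message):
    """True if every character can be encoded as ISO-8859-1."""
    try:
        message.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def to_gsm(message):
    """Keep GSM characters, reduce accented letters to their base letter
    where possible, and drop whatever is left (assumed transliteration)."""
    out = []
    for ch in message:
        if ch in GSM_CHARS:
            out.append(ch)
        else:
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append("".join(c for c in decomposed if c in GSM_CHARS))
    return "".join(out)


def prepare_sms(message):
    """Return (encoding name, payload) for a message."""
    if is_latin1(message):
        return "gsm", to_gsm(message)
    # Anything outside ISO-8859-1 is sent as a UTF-16 hex string.
    return "utf-16", message.encode("utf-16-be").hex().upper()
```

With this sketch, `prepare_sms("Olá")` gives `("gsm", "Ola")`, while a message containing Japanese text falls through to the UTF-16 branch. Length trimming is left out here as well.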
