Posts Tagged: clojure


25
Oct 10

State machine with Clojure macros and runtime argument inference

A few years ago, before I delved into functional programming, I had a small stint with Flex/ActionScript. ActionScript is an imperative language very similar to Java. At the time, I needed a very simple state machine with a single path of execution (basically a chain of commands). The design included a chain object, which joined command objects and executed them sequentially as long as no exceptions were thrown. Because these chains were also used for transformations and had dependencies (one command might compute something needed by another), the commands had to keep state, though a global context object was used to store and retrieve it. I’m sure there are other ways of designing such a system, but it turned out to be pretty maintainable and rather clean.

One thing that bothered me at the time was the implicit dependencies among the command objects. Each relied on certain context information being present in the form of map keys, which meant that if a command changed how it stored a particular result, its dependents had to be modified as well. Because of the lack of static typing and runtime inference (unless done at each command object level), there was no way to ensure that something wasn’t silently failing. The root of the problem was the use of map structures for context storage: besides offering no static typing, they also kept the chain from matching arguments at runtime when it invoked each command. The implementation was very functional and wasn’t too badly designed, but it was definitely not very pretty.

I don’t have access to the exact code at this time, but below is a simple example that demonstrates a similar issue in Java.
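The original listing is missing from this page, so what follows is only a minimal sketch of the design described, not the exact code. The names (`ChainSketch`, `Command`, `Chain`, `runExample`) and the use of lambdas are assumptions made for illustration. Note how the second command silently depends on the first one having stored its result under the key `"result1"`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChainSketch {

    // A command reads from and writes to a shared, untyped context map.
    interface Command {
        void execute(Map<String, Object> context);
    }

    // The chain runs its commands in order; any exception aborts the run.
    static class Chain {
        private final List<Command> commands = new ArrayList<>();

        Chain add(Command command) {
            commands.add(command);
            return this;
        }

        Map<String, Object> run() {
            Map<String, Object> context = new LinkedHashMap<>();
            for (Command command : commands) {
                command.execute(context);
            }
            return context;
        }
    }

    static Map<String, Object> runExample() {
        return new Chain()
            // First command stores its result under a key only it knows about.
            .add(ctx -> ctx.put("result1", 1234))
            // Second command relies on that key by convention. Rename the key
            // above and this fails only at runtime, with a NullPointerException.
            .add(ctx -> ctx.put("result2", (Integer) ctx.get("result1") * 2))
            .run();
    }

    public static void main(String[] args) {
        System.out.println("Results:");
        runExample().forEach((key, value) -> System.out.println("    " + key + ": " + value));
    }
}
```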

Running the above yields:

Results: 
    result1: 1234
    result2: 2468

Besides the mandatory Java ceremony, it’s also not apparent to me that this can be done any better without the use of reflection, which of course would add yet more boilerplate.

Macros to the rescue. If any of you aren’t familiar with what makes Lisp so powerful (besides the allure of its simple syntax), you should familiarize yourself with macros. The example I give below doesn’t even make a dent in the possibilities of macros.

With a simple macro

The above can now be utilized with the following api…

Yielding (in SLIME REPL):

Executing first
Executing second: first arg
Executing third: second arg random-arg
nil

Running a similar script with an added parameter in the third command that doesn’t exist in the chain

Throws an exception (in SLIME REPL):

'does-not-exist' argument is required!
  [Thrown class java.lang.Exception]

Of course the above can be much improved. One improvement off the top of my head is to allow default values for arguments that don’t exist, and possibly to let you specify which missing arguments should throw an exception and which should simply be bound to nil or a default value. But the concise example demonstrates abilities similar to my Java program, with added runtime argument name matching. I’d love to see if the same can be accomplished in a statically typed language, with compile-time rather than runtime argument checking. Going to investigate this with Haskell and Scala this week.


21
Oct 10

On JVM, languages, platforms, and frameworks

Today Apple announced through their Java update that their support for the JVM is now deprecated and will possibly be removed from future OS releases. The blogosphere is flaming, mostly with Java supporters who are either pissed off at Apple, worried about the future of their investment in the Java platform, or both. I don’t think the future of the Java platform should be in question due to any Apple decision. At the end of the day, there aren’t many production Java deployments on OS X, but it is a fact that a large portion of Java developers use OS X as their primary development platform. These developers, without proper support for their environments, will move to Linux and maybe even Windows. This move alone will probably not hurt Apple in the short run, but their habit of alienating different developer groups will eventually come back on them. Developers from any environment will approach cautiously, as the tamed Leopard and eventually Lion might bite them in the ass at the least expected moment.

It might be that Apple wants Oracle to take charge of maintaining the OS X port, and deprecating support might be a way to negotiate this without face-to-face negotiations. I’m fine with that; frankly, I couldn’t care less who provides the JVM, as long as one is provided and is relatively actively supported. Until that announcement happens, this is yet another bump in the road for the JVM. First the Oracle purchase of Sun, then the lawsuit, and now the decision by Apple: all of it creates unneeded distractions for this platform’s developers all over the globe.

I didn’t start this post to gripe any more about Apple’s decision; I’m sure there are enough posts out there flooding your RSS streams to keep you busy. Rather, I wanted to ask what the language/platform of the future will be.

Last week I attended the StrangeLoop conference in St. Louis. It was a gem of a conference, definitely the best I’ve been to in a long time. Alex, besides seeming like an overall awesome guy, has some extraordinary “brilliant people herding” abilities. How he managed to bring together a group of brilliant speakers and then convince another group of awesome developers to attend is beyond me. The conference had some great talks and panels about the latest/greatest and bleeding edge tech stuff. One of the best panels was about the future of programming languages. The panel consisted of Guy Steele, Alex Payne, Josh Bloch, Bruce Tate, and Douglas Crockford, all of whom I have great respect for. One prevailing theme in most discussions in this panel, as well as throughout the conference, was concurrency. In the multi-core/CPU world, what language/platform will allow this paradigm transition to happen seamlessly? The fact is that although there are some awesome innovations/implementations going on in this area (STM, Actors, fork/join, and various others), none have yet abstracted the concurrency model away from the developer as seamlessly as memory management and garbage collection are handled in today’s runtime environments. But this is an exciting time to be in; many ideas are flowing around and something will appear on the horizon sooner or later. This something, as Guy Steele pointed out, will most likely be a model that allows divide/conquer and map/reduce operations to happen through language idioms and possibly seamless abstractions. Accumulators are evil 🙂
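To make the accumulator point concrete, here is a small sketch contrasting the two styles. It uses the streams API that later Java versions (8 and up) ship with, which is my choice for illustration and not something shown at the panel; the class and method names are made up. The accumulator loop forces a sequential order, while the map/reduce form only says what to compute and leaves the splitting to the runtime.

```java
import java.util.stream.LongStream;

class SumOfSquares {

    // Accumulator style: every iteration updates the same mutable variable,
    // so the work is tied to one sequential order.
    static long withAccumulator(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i * i;
        }
        return total;
    }

    // Map/reduce style: square each element, then combine with an associative
    // sum. The runtime is free to split the range across cores.
    static long withMapReduce(long n) {
        return LongStream.rangeClosed(1, n)
                         .parallel()
                         .map(i -> i * i)
                         .sum();
    }

    public static void main(String[] args) {
        System.out.println(withAccumulator(1000));
        System.out.println(withMapReduce(1000));
    }
}
```

Both methods return the same value; the difference is only in how much freedom the runtime has to parallelize the work.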

There are many languages/platforms out there today, but none have been as predominant and as polished overall as Java and the JVM. From the language perspective Java’s getting stagnant and, to some, boring, but the fact that it has an ecosystem of wonderful libraries and products is hard to ignore. The fact that all of these are bytecode compatible is even more to rave about: with the advent of numerous great languages built on top of the JVM, it makes the transition to a different language and programming paradigm much easier. It is truly hard to think of any current platform/VM that’s more prevalent and better suited for large scale enterprise development than the JVM. .NET comes to mind, but I doubt anyone from the non-Microsoft camp will be switching :-). There are other platforms, most notably Python and Ruby, but although both are credible, the presence of a GIL in both (at least in their reference implementations) makes choosing them for a concurrent system very difficult. You can architect and deploy your system as multi-process vs. multi-thread, and arguably that model has its benefits, mostly by getting rid of the concurrency issues of shared state, but we (at least I do) like to have a choice. This decision shouldn’t be shoved down our throats because the language development camp doesn’t want to, doesn’t think it’s necessary to, or [add your own excuse here] produce a thread-safe non-GIL thread model.

The other issue with most of these languages/platforms, as well as the other ones I like, is the deployment options. They suck! From providing modular builds to deploying production applications, they just aren’t as polished and in most cases as stable/supported as the JVM ones. Common Lisp, one of my favorites as of late, for example, is an awesome language with numerous compilers/interpreters. Lisp doesn’t have a good packaging, dependency resolution, and build story, but even if you can get past that with some of the available half-baked solutions, then when it’s time to build/compile/deploy your app, you’re fucked, unless you want to build one yourself. I enjoy such challenges on Friday/Saturday nights, but not when time is limited and milestones are due (which is most of the time).

Ruby and Python, for example, have decent package managers (gem and easy_install/pip respectively), but two problems lurk. First, lots of modules are written in C and in many cases, in my experience, are a big pain in the ass to compile, especially with today’s heterogeneous architectures (i386, x86_64, etc…). Lots of incompatibilities arise, forcing more time away from doing what I should be doing. Somehow my milestones never include the 2+ day derailments due to such issues. Maybe that’s what’s left of my optimism. The second problem only applies if you’re writing a web app, and if you are, then you know the issues. Where are those stable/supported app servers? WSGI and Rack should provide answers soon, but for now there are many options and none are without major issues. Some are a pain to install/deploy, some aren’t actively maintained. I mean, am I just being anal and asking for way too much, or am I eternally spoiled by the JVM? Is it too much to ask to bundle the application into some archive or directory structure and just drop it in somewhere or point your server config towards it? Either way, even if they ease the pain of deploying webapps, the fact that [Python/Ruby] are not suitable in multi-core/CPU environments where threads are needed is a show stopper for lots of apps I write. I know I can architect around the issues, but again, why should I have to program to the platform vs. the other way around? Give me the choices and trust me to make the best decision.

The next thing is native GUI development. It is true that lots of interesting apps today are developed and deployed as web apps, but that doesn’t discount the fact that there is still a need for a native GUI in lots of use cases. Swing provides a good, and in some instances really good, cross platform GUI library which allows you to deploy your GUI across most popular platforms with 95% or more cross platform consistency. That sounds pretty good to me.

There are other toolkits (wxWidgets, Qt, etc…) which also have bindings to Python and Ruby, but again, with today’s multi-core machines, it would be a shame to not be able to use these cores simultaneously due to the GIL. The bindings in languages that do provide a better concurrency story work great, but these languages still suffer from the other pain points I mentioned before (i.e. deployment, build, package management, etc…). It’s a Catch-22.

So maybe I’m missing something here, but I think the JVM is the best option we have at this time that allows for multiple platforms, languages, and paradigms, and comes with a great success story in the enterprise (build tools, deployment/modularity, enterprise grade servers, etc…). Languages implemented on top of the JVM benefit from this quite successful ecosystem. Ah, and might I mention that great libraries exist for just about anything you’re trying to do. This is also true of Python, but I can’t say the same for Ruby. Ruby has numerous gems for most tasks, but they all seem half-baked at best. There are frameworks like Rails and Sinatra, which are great and fully supported with active communities, though only as long as you don’t venture too far off the traditional path.

The JVM has its own set of issues: it was written with static languages in mind and lacks support for dynamic bindings, tail call optimization, and other things whose absence makes writing languages on top of it more difficult. Its future is now also in question due to the new Oracle stewardship and the legal battles Oracle chose to pursue rather than spend that time and money on the platform. Nevertheless, the ecosystem is still flourishing, kept afloat by tons of great developers and supporting companies who care about the platform and greatly benefit from it. The JVM allows us to program in different languages while concentrating on the task at hand, not on peripheral issues like compiling for different architectures, battling deployment inadequacies, not being able to use cores efficiently, and a variety of other issues. The JVM ecosystem might not have the most ideal solutions to these problems, but they are far better than anything else out there right now. If people that spend their time bashing the JVM platform would spend as much time making their platform better, maybe we’d have other choices.

I’d love to hear others’ thoughts on this topic. What do you think about the JVM and what’s your language/platform of choice? How do you build, deploy, and distribute your applications? What concurrency options are available on that platform and how do they compare to others? I’m familiar with most JVM options, especially Clojure and Scala, so I’m mostly asking about anything outside of the JVM ecosystem. I hope to someday compile a list of these and present them in an objective manner; for now, all I have is opinions drawn from my own experience.


3
Feb 10

The start of the Scala journey (concurrency and idiomatic Scala rant)

I’ve been following Scala off and on for about 2 years now, mostly in spurts. I liked the language, but due to workload and other priorities I never had the time to take it for a full ride. Well, over the last 2 weeks, I decided to take the full plunge. Full meaning, I’m taking a highly concurrent production application, which powers a very critical component of our application, and rewriting it in Scala. I’m doing this for more than just fun. This application has grown from a very cleanly architected one to one that is still rather nicely designed, but has accumulated a lot of technical debt. With everything I’ve learned about Scala, I think I can redesign it to be cleaner, more concise, and probably more scalable.

The other big reason I’m looking to give Scala a shot is its Actor based concurrency. I’ve worked with Java’s threading primitives for many years and have developed a love/hate relationship with them. The Java SE 5 concurrency package brought some nice gems to my world, but it didn’t eliminate the fact that you’re still programming to the imperative model of shared state synchronization. Scala actors hide some of the ugliness of thread synchronization, though they don’t eliminate the issue completely. Because Scala is a mix between an imperative and a functional language, and because actors are implemented as a library, nothing stops one from running into the same issues as with more primitive thread state-sharing operations (i.e. race conditions, lock contention, deadlocks/livelocks). Basically, if you’re using actors as just an abstraction layer over old practices, you’ll be in the same boat you started in with Java. With all of that said, unlike Java, Scala gives you the facilities for designing cleaner and more thread safe systems thanks to its functional programming features. Mutable shared state is the leading cause of non-determinism in concurrent Java applications, so immutability and message passing are a way into a more deterministic world.

I’ve also looked at other concurrent programming models, like STM/MVCC. STM is the basis of concurrent programming in Clojure and it’s a different paradigm than Actors. STM is a simpler model if you’re used to the old imperative threading, as it abstracts you from concurrency primitives by forcing state modifications to occur in a transactional context. When this occurs, the STM system takes care of ensuring the state modifications occur atomically and in isolation. In my opinion this system suits the multi-core paradigm very well and allows a smoother transition. The problem with it, at least in the context of MVCC, is that for each transaction and data structure being modified, a copy is made for the purposes of isolation (the implementation of copying is system dependent, and some might be more efficient than others), so you can already see an issue. For a system that has to handle numerous concurrent transactions involving many objects, this can become a bottleneck, and the creation of copies can overburden the system’s memory and performance. There are some debates about that in the STM world, mostly involving finding the sweet spot for such systems, where the cost of MVCC is less relevant than the cost of constant synchronization through locking.
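The copy-then-commit idea can be sketched in plain Java. To be clear about what this is: it is not Clojure’s STM and not a full MVCC implementation, just a single `AtomicReference` over an immutable snapshot, closer in spirit to a Clojure atom. The class and method names are made up for illustration. Each update builds a new copy of the state and commits it only if nobody else committed in the meantime; otherwise it retries. It also shows the cost mentioned above, since every attempt allocates a fresh copy.

```java
import java.util.concurrent.atomic.AtomicReference;

class OptimisticAccounts {

    // Immutable snapshot of both balances; updates never mutate it in place.
    static final class Balances {
        final long from;
        final long to;

        Balances(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    private final AtomicReference<Balances> state;

    OptimisticAccounts(long from, long to) {
        this.state = new AtomicReference<>(new Balances(from, to));
    }

    // Read a snapshot, build a modified copy, commit only if the snapshot is
    // still current. On conflict, throw the copy away and try again.
    void transfer(long amount) {
        while (true) {
            Balances seen = state.get();
            Balances next = new Balances(seen.from - amount, seen.to + amount);
            if (state.compareAndSet(seen, next)) {
                return;
            }
        }
    }

    Balances snapshot() {
        return state.get();
    }

    // Four threads each move 250 units; no locks are taken anywhere.
    static Balances runExample() {
        OptimisticAccounts accounts = new OptimisticAccounts(1000, 0);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 250; j++) {
                    accounts.transfer(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return accounts.snapshot();
    }
}
```

Because both balances live in one snapshot, no reader can ever observe money that has left one account but not arrived in the other.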

The Actor model is different. It works in terms of isolated objects (actors) that communicate only by message passing. None can modify or query the state of another, short of requesting such an operation by sending a message to that particular actor. In Scala you can break that model, as you can send around mutable objects, but if you want to really benefit from the Actor model, you should probably avoid doing that. Actors lend themselves better to concurrent applications that not only span multiple cores, but can also be scaled easily to multiple physical nodes. Because the messages being passed are immutable data structures that can be safely shared, the underlying Actor system can pass these messages between physically dispersed actors just as it can between actors within the same physical memory space.
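Here is a rough sketch of that isolation in Java, using a single-threaded executor as the mailbox. This is not how Scala’s actor library is implemented; the `CounterActor` class and its methods are hypothetical and only meant to show the shape of the model. The state is private, callers can only send messages, and even reading the state is a message answered through a future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CounterActor {

    // The mailbox: messages are queued and processed one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Only the mailbox thread ever touches this field, so it needs no locking.
    private long count = 0;

    // "Increment" message: fire and forget.
    void increment(long by) {
        mailbox.execute(() -> count += by);
    }

    // "Query" message: the reply arrives asynchronously through a future.
    CompletableFuture<Long> current() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    void stop() {
        mailbox.shutdown();
    }

    // Four sender threads post 250 increments each, then we ask for the total.
    static long runExample() {
        CounterActor actor = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int j = 0; j < 250; j++) {
                    actor.increment(1);
                }
            });
            senders[i].start();
        }
        try {
            for (Thread sender : senders) {
                sender.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // All increments were queued before this query, so it sees all of them.
        long total = actor.current().join();
        actor.stop();
        return total;
    }
}
```

The senders never synchronize with each other; ordering and safety come entirely from the mailbox processing one message at a time.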

So the world of concurrency is getting more exciting with these awesome paradigms. One thing to remember is that there is no one size fits all concurrency model and I don’t see any one of the above becoming the de-facto standard any time soon. There is a sweet spot for each, so one should learn the ins and outs of each model.

Now that I’ve got concurrency out of the way, let’s get back to the actual syntax of Scala. Scala is very powerful (at least compared to Java). This power comes with responsibility. You can use Scala to write beautiful/concise programs, or you can use it to write obscure/illegible programs that no one, including the original author, will be able to comprehend. Personally, I prefer having that power and can handle the responsibility. I’m a long time Perl programmer (way before I started programming Java), and I’ve seen (and even written at times) programs that Larry Wall himself wouldn’t be able to comprehend.

Scala comes with operator overloading, but when not used judiciously, that power alone can be responsible for the illegibility of any system. This is one of the major reasons why languages like Java decided not to include it. Personally, I think operator overloading can be a beautiful addition to any API. It can make writing DSLs easier and using them more natural. Again, this power is great in the hands of experienced and responsible programmers.

After having experienced great power (Perl) and great restraint (Java), I’m leaning more towards power (who wouldn’t :-). On one hand, it’s nice to be able to read and comprehend anyone’s Java program, even when it’s not nicely written; on the other hand, it’s a pain trying to write a program while jumping through all the hoops and limitations imposed by the various constraints. In a perfect AI world, the compiler would infer the capabilities of the programmer and restrict its facilities based on those, in some way as to not offend anyone :-) So if a novice is inferred, ah, there goes the operator overloading and implicit conversions, etc… But for now, I’d rather have a powerful tool to use when I write software, and Scala seems to push the right buttons for me at this point.

I’m going to start a series of posts, beginning with this one, about my experiences with Scala.

Here is a little something I came up with a few hours ago. Our software has some limited interoperability with a SQL database and requires a light abstraction. We chose not to use any 3rd party ORM or SQL abstraction, mostly because a dependency on these abstractions doesn’t really provide any benefit for our limited use of SQL. So I developed a simple SQL variant abstraction layer, which allows us to execute SQL queries that are defined in the SQLVariant implementation. Moving from one database to another just requires one to implement the SQLVariant interface to provide the proper abstraction. I initially wrote this in Java, and although it was decent, it required quite a bit more code and didn’t look as concise as I wanted it to. One issue was PreparedStatement and its interface for placeholder bindings. How would one bind Java’s primitive and wrapper types as placeholders, and how would the SQLVariant know which PreparedStatement.set* method to call? I resorted to using an enumeration which defines these operations and reflection for the purpose of invoking them. I’m basically sidestepping static typing in a place where I’m not sure I really want or have to. Here is the Java implementation.

I got rid of a few methods, specifically dealing with resultset, statement, and connection cleanup, as they don’t really emphasize my point here.

  import java.lang.reflect.Method;
  import java.sql.*;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  public abstract class SqlVariant {

    public abstract SqlSelectStatement getResultsNotYetNotifiedForStatement(NotificationType... types);

    public abstract SqlSelectStatement getResultsNotYetNotifiedForStatement(int limit, NotificationType... types);

    public abstract SqlUpdateStatement getUpdateWithNotificationsForStatement(Result result);

    private abstract static class SqlStatement<T> {

      protected String sql;
      protected List<BindParameter> bindParams = new ArrayList<BindParameter>();
      protected PreparedStatement stmt;

      public SqlStatement(String sql) {
        this.sql = sql;
      }

      public SqlStatement addBindParam(BindParameter param) {
        bindParams.add(param);
        return this;
      }

      public String getSql() {
        return sql;
      }

      public List<BindParameter> getBindParams() {
        return Collections.unmodifiableList(bindParams);
      }

      protected PreparedStatement prepareStatement(Connection conn) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sql);
        for (int bindIdx = 0; bindIdx < bindParams.size(); bindIdx++) {
          BindParameter p = bindParams.get(bindIdx);
          try {
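            // Look the setter up on the PreparedStatement interface rather than on the
            // driver's implementation class, which may not be publicly accessible.
            Method m = PreparedStatement.class.getMethod(p.type.method, Integer.TYPE, p.type.clazz);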
            m.invoke(stmt, bindIdx + 1, p.value);
          }
          catch (Exception e) {
            throw new RuntimeException("Couldn't execute method: " + p.type.method + " on " + stmt.getClass(), e);
          }
        }
        return stmt;
      }

      public abstract T execute(Connection conn) throws SQLException;
    }

    public static final class SqlSelectStatement extends SqlStatement<ResultSet> {

      public SqlSelectStatement(String sql) {
        super(sql);
      }

      @Override
      public ResultSet execute(Connection conn) throws SQLException {
        return prepareStatement(conn).executeQuery();
      }
    }

    public static final class SqlUpdateStatement extends SqlStatement<Boolean> {
      public SqlUpdateStatement(String sql) {
        super(sql);
      }

      @Override
      public Boolean execute(Connection conn) throws SQLException {
        stmt = prepareStatement(conn);
        return stmt.execute();
      }
    }


    public static final class BindParameter<T> {
      private final BindParameterType type;
      private final T value;

      public BindParameter(Class<T> type, T value) {
        this.type = BindParameterType.getTypeFor(type);
        this.value = value;
      }

      public BindParameter(BindParameterType type, T value) {
        this.type = type;
        this.value = value;
      }
    }

    private static enum BindParameterType {
      STRING(String.class, "setString"),
      INT(Integer.TYPE, "setInt"),
      LONG(Long.TYPE, "setLong");

      private Class clazz;
      private String method;

      private BindParameterType(Class clazz, String method) {
        this.clazz = clazz;
        this.method = method;
      }

      private static BindParameterType getTypeFor(Class clazz) {
        for (BindParameterType t : BindParameterType.values()) {
          if (t.clazz.equals(clazz)) {
            return t;
          }
        }
        // Report the unsupported type itself; clazz.getClass() would always be java.lang.Class.
        throw new IllegalArgumentException("Type: " + clazz.getName() + " is not defined as a BindParameterType enum.");
      }
    }
  }

Now, here is how one would implement the SQLVariant interface. The implementation below is in Groovy. I choose Groovy when I have to do lots of string interpolation, which somehow Java and Scala refuse to support. The code was shortened to demonstrate just the bare minimum.

  class MySqlVariant extends SqlVariant {

    @Override
    public SqlVariant.SqlSelectStatement getResultsNotYetNotifiedForStatement(int limit, NotificationType[] types) {
      SqlVariant.SqlSelectStatement stmt = new SqlVariant.SqlSelectStatement("SELECT ...")
      for (NotificationType t : types)
        stmt.addBindParam(new SqlVariant.BindParameter(String.class, t.name().toUpperCase()))
      return stmt;
    }

    @Override
    public SqlVariant.SqlUpdateStatement getUpdateWithNotificationsForStatement(Result result) {
      SqlVariant.SqlUpdateStatement stmt = new SqlVariant.SqlUpdateStatement("INSERT INTO ....")
      result.notifications?.each { Notification n ->
        stmt.addBindParam(new SqlVariant.BindParameter(SqlVariant.BindParameterType.LONG, n.id))
        stmt.addBindParam(new SqlVariant.BindParameter(SqlVariant.BindParameterType.LONG, result.intervalId))
      }
      return stmt
    }

    ......
  }

I started reimplementing the above in Scala and I ran across a very powerful and beautiful Scala implicit conversion feature. This allowed me to truly abstract the SQLVariant implementations from any bindings specific knowledge, through an implicit casting facility that normally only dynamically typed languages provide. Scala gives us this ability, but also ensures static type safety of implicit conversions during compilation.

Another wonderful feature is lazy vals, which allow us to cleanly implement the lazy evaluation that we (Java programmers) are so used to doing by hand: declaring a member field as null and then checking it before initializing on the first accessor call. If you’ve seen a lot of code similar to the below, you’ll rejoice to find out that you no longer have to do this in Scala.

public class SomeClass {
  private SomeType type;

  public SomeType getSomeType() {
    if (type == null) type = new SomeType(); // Often more complex than that
    return type;
  }
}

The above, besides not being ideal, is also error prone if, say, type is used anywhere else in SomeClass and you don’t use the accessor method to retrieve it. You must ensure the use of the accessor through convention or deal with the fact that the field might not be instantiated yet. This is no longer the case in Scala, as the language handles lazy instantiation for you. See the Scala code further below.
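For contrast, and staying in Java for a moment, about the closest you can get is to hide the null check inside a small holder so that the field can’t be reached without going through it. This is only a sketch; the `Lazy` and `LazyDemo` classes are names I made up, not part of any library. It is thread safe through a plain synchronized accessor, and it is still noticeably more ceremony than a one-word modifier.

```java
import java.util.function.Supplier;

// Wraps an initializer and runs it at most once, on the first call to get().
class Lazy<T> implements Supplier<T> {
    private final Supplier<T> initializer;
    private T value;
    private boolean initialized;

    Lazy(Supplier<T> initializer) {
        this.initializer = initializer;
    }

    @Override
    public synchronized T get() {
        if (!initialized) {
            value = initializer.get();
            initialized = true;
        }
        return value;
    }
}

class LazyDemo {
    // Returns how many times the initializer ran after two reads of the value.
    static int runExample() {
        int[] calls = {0};
        Lazy<String> config = new Lazy<>(() -> {
            calls[0]++;
            return "loaded";
        });
        config.get();
        config.get();
        return calls[0];
    }
}
```

Since the only way to reach the value is `get()`, the "forgot to use the accessor" mistake can no longer happen, which is the same guarantee a lazy val gives you for free.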

Note: I still allow the client data access abstractions to work with a raw jdbc ResultSet returned from the SQLVariant. I don’t see this as an issue at this point, first since these abstractions are SQL specific and also because ResultSet is a standard interface for any JDBC SQL interaction. Here is my concise Scala implementation. I’m still learning, so this might change as I get more familiar with Scala idioms and start writing more idiomatic Scala code.

  import javax.sql.DataSource
  import java.sql.{ResultSet, Connection, PreparedStatement}
  import com.bazusports.chipreader.sql.SqlVariant.{SqlSelectStatement, BindingValue}

  abstract class SqlVariant(private val ds: DataSource) {

    def retrieveConfigurationStatementFor(eventTag: String): SqlSelectStatement;

    protected final def connection: Connection = ds.getConnection
  }

  object SqlVariant {

    trait BindingValue {def >>(stmt: PreparedStatement, idx: Int): Unit}

    // This is how implicit bindings happen.  This is beauty, we can now
    // bind standard types and have the compiler perform implicit conversions
    implicit final def bindingIntWrapper(v: Int): BindingValue = new BindingValue {
      def >>(stmt: PreparedStatement, idx: Int): Unit = stmt.setInt(idx, v)
    }

    implicit final def bindingLongWrapper(v: Long): BindingValue = new BindingValue {
      def >>(stmt: PreparedStatement, idx: Int): Unit = stmt.setLong(idx, v)
    }

    implicit final def bindingStringWrapper(v: String): BindingValue = new BindingValue {
      def >>(stmt: PreparedStatement, idx: Int): Unit = stmt.setString(idx, v)
    }

    abstract class SqlStatement[T](conn: Connection, sql: String, params: BindingValue*) {

      // Ah, another beautiful feature, lazy vals.  Basically, it's
      // evaluated on initial call.  This is great for the
      // so common lazy memoization technique, of checking for null.
      protected lazy val statement: PreparedStatement = {
        val stmt:PreparedStatement = conn.prepareStatement(sql)
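        // JDBC placeholders are 1-based, so bind each value at its own position
        // (binding everything at index 1 would overwrite the first placeholder).
        params.zipWithIndex.foreach { case (v, i) => v >> (stmt, i + 1) }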
        stmt
      }

      def execute(): T
    }

    class SqlUpdateStatement(conn: Connection, sql: String, params: BindingValue*)
            extends SqlStatement[Boolean](conn, sql, params: _*) {
      def execute() = statement.execute()
    }

    class SqlSelectStatement(conn: Connection, sql: String, params: BindingValue*)
            extends SqlStatement[ResultSet](conn, sql, params: _*) {
      def execute() = statement.executeQuery()
    }
  }

  /* Implementation of the SQLVariant */

  class MySqlVariant(private val dataSource:DataSource) extends SqlVariant(dataSource) {

    def retrieveConfigurationStatementFor(eventTag: String) =
      new SqlSelectStatement(connection,  "SELECT reader_config FROM event WHERE tag = ?", eventTag)

  }

And the obligatory unit test using the oh-so-awesome Scala Specs framework.

  object MySqlVariantSpec extends Specification {
    val ds = getDataSource();

    "Requesting a configuration statement for a specific event" should {
      "return a SqlSelectStatement with properly bound parameters" in {
        val sqlVariant:SqlVariant = new MySqlVariant(ds)
        val stmt:SqlSelectStatement = sqlVariant.retrieveConfigurationStatementFor("abc")
        stmt must notBeNull
        // .... Other assertions go here
      }
    }
  }

Although I barely scraped the tip of the iceberg, I hope this helps you see some of what Scala has to offer. More to come as I progress.