18
Jul 09

Groovy SQL Builder

I’m currently working on a project which requires some pretty heavy SQL to OO stuff. JPA of course can’t handle the full load, so my Repository pattern abstractions abstract away JPA as well as raw JDBC stuff. This architectures actually works out pretty well, except when it comes time to build queries. Java’s string interpolation/comprehension is not present and even if it was there, who wants to build queries using strings. I used to do it years ago, now that I had to come back to doing it (for non-trivial large queries), I definitely think it’s a big design smell. Even abstracting it away behind some sort of a sql variant, it’s just not as nice as I’m used to programming with DDD.

I also had to question simple sql variant implementation which I had. SQLVariant is nice when you know that you’ll always have only one query per operation (in each db you have to support) and building queries is pretty straightforward, not requiring a lot of logic.

So I added this concept of a SQLExecutor, which returned anonymous inner classes that represented query or update operations interface. The client can then execute this (psudo-closure) to perform the operation. And all query stuff is complete abstracted away in the returned closure’s execution logic. Worked great, but I’m still stuck with building queries. Java just sucks when it comes to this, ah, but what about groovy?

So I realized that I can use java and groovy interchangeably in the absolute same code base. Both are compiled into JVM bytecode and IntelliJ supports both file types rather nicely, allowing for cross compilation and all the other IDE goodies.

So why not build a sql builder in groovy? Groovy’s metaprogramming facilities allow for that level of dynamic logic way more than java. So in the last 2 hours I put together something very simple, which currently allows me to at least support my query building tasks in a cleaner fashion. Groovy’s FactoryBuilderSupport is a rather nice abstraction that allows to build builder DSLs on top of. Mind, this is 2 hours of work, many features I need (besides inserts and updates as well) are not yet there. I need to add column validation (to not allow to select or have included in the predicates columns and/or tables that were not defined as a part of the query or joins). There are numerous other things I’ll add with time, but thought someone might be interested in at least how easy it is to build a simple groovy DSL.

If I ever get it to the point of where I feel it will be helpful to anyone other than me, I’ll release it in some near future. Right now, I’m just glad that I can write simple DSLs and get rid of the SQL building crud.

Here is a small use case for a select statement (that’s all I can build at this point).

  @Test
  void buildSelectSql() {
    def sql = new SqlBuilder();
    Select select = sql.select ("table1") {
      join("table2", type: "INNER") {
        using(table1: "col1", table2: "col1")
      }
      join("table3", type: "OUTER") {
        using(table1: "col2", table2: "col2")
        using(table1: "col3", table2: "col3")
      }
      where("table1.col1 = 'test'")
      groupBy(table1: "col1", table2: "col1")
      orderBy(table1: "col1", table2: "col1")
    }
    assertNotNull select, "Select object should not be null"
    println "SELECT: " + select.getSql()
    assertNotNull select.getSql(), "Generated SQL SELECT expression should not be null"
    assertFalse select.getSql().trim() == "", "Generated SQL SELECT expression should not be empty"
  }

Here is the code:

/**
 * SqlBuilder is a DSL to build portable SQL using groovy internal DSL facilities.  It allows you to specify
 * SQL queries using groovy expressions vs. string concatenation/interpolation as it's usually done.
 *
 *
 * @author Ilya Sterin
 */
public class SqlBuilder extends FactoryBuilderSupport {

  // Register expression factories
  {
    registerFactory('select', new SelectFactory())
    registerFactory('join', new JoinFactory())
    registerFactory('using', new UsingFactory())
    registerFactory('where', new WhereFactory())
    registerFactory('groupBy', new GroupAndOrderFactory())
    registerFactory('orderBy', new GroupAndOrderFactory())
  }
}

/**
 * Select class encapsulates the select clause node of the SQL query
 * and acts as a parent node to all select queries.
 */
private class Select {

  def table,        // Name of the main select table
      tables  = [], // All tables join tables and select table
      joins   = [], // Collection of all joins
      where,        // Where clause
      groupBy = [], // Collection of group by columns
      orderBy = []  // Collection of order by columns

  /**
   * Builds and returns the SQL query based on the DSL expressions
   */
  String getSql() {
    def sql = "SELECT * FROM $table"
    joins.each {Join join ->
      sql += " ${join.type} JOIN ${join.table} ON"
      join.using.eachWithIndex {Using using, idx ->
        if (idx > 0) sql += " AND"
        sql += " ${using.lhs} ${using.op} ${using.rhs}"
      }
    }
    where.eachWithIndex {Where where, i ->
      if (i == 0) sql += " WHERE ";
      sql += "${where.clause}"
    }
    if (groupBy) sql += " GROUP BY ${groupBy.join(', ')}";
    if (orderBy) sql += " ORDER BY ${orderBy.join(', ')}";
    return sql
  }

}

///////// Private classes to support the builder functionality /////////////

private class Join {
  def table
  def type = "INNER"
  def using = []
}

private class Using {
  def lhs, rhs, op = "="

  def methodMissing(String name, args) {
    println "Looking for ${name}"
  }
}

private class Where {
  def clause
}

///////// Factories that construct the appropriate objects based on the DSL expressions //////////////

private class SelectFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    return new Select(table: value, tables: [value])
  }

  public void setChild(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    println "Child ${child}"
    if (child instanceof Join) {
      println "Adding a join"
      parent.joins << child
    }
    if (child instanceof Where) {
      println "Adding a where"
      parent.where = child
    }
  }
}

private class JoinFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    Join join = new Join(table: value)
    return join
  }

  public void setChild(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    super.setChild(factoryBuilderSupport, parent, child)
    if (child instanceof Using) {
      parent.using << child
    }
  }

  public void setParent(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    super.setParent(factoryBuilderSupport, parent, child);
    parent.tables << child.table
  }
}

private class UsingFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, op, Map map) {
    Using using = new Using()
    if (op) using.op = op
    map.each {k, v ->
      if (!using.lhs) {
        using.lhs = "${k}.${v}"
      }
      else if (!using.rhs) {
        using.rhs = "${k}.${v}"
      }
    }
    map.clear()
    return using
  }

  public boolean onHandleNodeAttributes(FactoryBuilderSupport factoryBuilderSupport, Object o, Map map) {
    return true;
  }

  public boolean isLeaf() {
    return true;
  }
}

private class WhereFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    return new Where(clause: value)
  }

  public boolean isLeaf() {
    return true;
  }
}


private class GroupAndOrderFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    def fqCols = [];
    map.each {k, v ->
      fqCols << "${k}.${v}"
    }
    return [name: name, cols: fqCols];
  }

  public void setParent(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    super.setParent(factoryBuilderSupport, parent, child);
    if (child.name == "groupBy") {
      parent.groupBy += child.cols
    }
    if (child.name == "orderBy") {
      parent.orderBy += child.cols
    }
  }

  public boolean onHandleNodeAttributes(FactoryBuilderSupport factoryBuilderSupport, Object o, Map map) {
    return true;
  }

  public boolean isLeaf() {
    return true;
  }
}


12
May 09

Postgres prepared statements gotcha

We’ve been profiling a slow groovy application for the last week and all fingers were pointing towards the socket reading in Postgres JDBC Driver. Of course, once the data is set over a socket and the thread is blocked awaiting return packet, not much else a java profiler can do.

The biggest bummer was that we had two relatively similar applications, except that one performed more business logic, the amount of queries that each executed was the same, but the times that it took to execute each query were drastically different. Different on a magnitude of 20-30 thousand times slower (1-5 millis to 20 seconds). This was pretty baffling, but a coworker of mine was able to stumble upon a Postgres caveat. I’m not sure if Postgres interface is the only one that has a similar issue (I haven’t tested this theory with any other RDBMS), but it was definitely not the JDBC driver, since the same behavior was observed with Perl DBI.

The issue is that when binding variables to query placeholders in prepared statements, one doesn’t think about types as much as he/she should. Although it’s natural (at least in a statically typed language like Java) to use setInt, setString, etc…, in some dynamically typed languages, this might not be so straightforward. Using Groovy SQL library, binding methods are not inferred based on the type, so groovy uses java’s setObject(int, Object) for help binding all the variables. This is fine, except that because of the nature of the dynamic typing, it’s easy to bind a wrong type, especially when not having access to setType methods. Behind the scenes, the Postgres driver gives the wrong type hint to Postgres, which I can only imagine at this point, has an issue binding the wrong type. Somehow (I’m not sure about the internals of binding and query preparation in Postgres engine), this causes an insurmountable amount of overhead. If anyone knows how Postgres binds variables internally and why the issue demonstrated below happens, please let me know. The slowdown is so drastic that it’s actually pretty unbelievable. Take a look at this test.


import java.sql.*;

public class Test {

  private static final String SQL =
    "UPDATE some_table SET " +
    "column1 = column1 + 0 " +
    "WHERE some_table_id = ?";

  public static void main(String[] args) throws Exception {

    Class.forName("org.postgresql.Driver"); //load the driver
    Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test", "...", "...");

    PreparedStatement ps = conn.prepareStatement(SQL);
    ps.setObject(1, "8987459118");
    long start = System.currentTimeMillis();
    ps.executeUpdate();
    long finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setObject as String: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setObject(1, new Long(8987459118L));
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setObject as Long: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setString(1, "8987459118");
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setString: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setLong(1, 8987459118L);
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setLong: " + finish + " ms");
    ps.close();

    conn.close();

  }

}

This output is somewhat surprising.

Test with setObject as String: 11348 ms
Test with setObject as Long: 1 ms
Test with setString: 10619 ms
Test with setLong: 1 ms

So as you can see, binding a string type when a numeric type is required causes drastic slowdowns on the postgres engine side. I haven’t experienced with other types yet, but might in the upcoming week if I get some time.

There also seems to be an option now in the postgres JDBC documentation, that allows you to add a connection option stringtype. The option is set in the connection url as jdbc:postgresql://localhost/test?stringtype=unspecified. The postgres documentation says…

stringtype = String

Specify the type to use when binding PreparedStatement parameters set via setString(). If stringtype is set to varchar (the default), such parameters will be sent to the server as varchar parameters. If stringtype is set to unspecified, parameters will be sent to the server as untyped values, and the server will attempt to infer an appropriate type. This is useful if you have an existing application that uses setString() to set parameters that are actually some other type, such as integers, and you are unable to change the application to use an appropriate method such as setInt().

The stringtype directive basically tells the Postgres JDBC driver that you might be binding variables using wrong set*** method, so it doesn’t statically type the bind value, though delegating the type inference to Postgres. Running the above test with this parameter set yields…

Test with setObject as String: 7 ms
Test with setObject as Long: 1 ms
Test with setString: 1 ms
Test with setLong: 1 ms

As you can see, there is still some negligible (in my case) latency associated with using setObject(int, String) binding, that might be the latency due to type inference on Postgres side.

It seems that most languages database libraries allow some way to bind typed and untyped values. In perl for exemple, default is to send untyped values, but you can statically type them using SQL_* types.

$sth->bind_param(1, “12345″, SQL_INTEGER);

So what I learned from this whole ordeal, is basically use the right bindings, which was almost a given using java, but with us making a move towards using a dynamically typed language at times, we have to really what these types.


13
Apr 09

Beware of scoping

So I was bit by ECMA’s scoping rules the other day working with ActionScript. Not because they are hard to understand, but rather because they are not common in today’s programming languages. Most languages have C-like block level scoping, AS3′s lowest scope is function level. This presents some challenges, especially using closures. For the life of me I can’t understand why a language that supports such a beautiful construct as closures, would limit variable scopes to such a high level. Closures are made to capture scope, but in AS3 and Javascript you can only do it at the function level.

What’s more disturbing to me, is that due to the single threaded nature of AS3 and Javascript, the bugs that creep up by not following the correct scoping rules are sometimes hard to detect using tests and/or to a naked eye. Test have to be deterministic, but thread timing really isn’t, so with some unlucky timing or in a different environment, it can blow up in your face.

Here is an example that really emphasizes the issue. We introduce some of this “unlucky” timing explicitly using a scheduler.

for each (var item in [1,2,3,4]) {
   var t:Timer = new Timer(delay, 1);
   t.addEventListener(TimerEvent.TIMER, function(event:TimerEvent):void {
    trace("Item: " + item);
   });
   t.start();
}

So what do you think the above will print? If you think it’ll print

Item: 1
Item: 2
Item: 3
Item: 4

then you’re wrong. It prints

Item: 4
Item: 4
Item: 4
Item: 4

As you can see, this is because the variable item is not scoped to the for each loop, but rather function scope (if it’s defined within a function, otherwise the closes outer scope). This is a major issue, as someone reading this will have a hard time figuring this out, until you realize how closures capture scopes. To fix this issue, you must define the scheduling code inside its own function like below.

function run():void {
  for each (var item in [1,2,3,4]) {
   scheduleMe(item);
  }
}

function scheduleMe(item):void {
  var t:Timer = new Timer(delay, 1);
  t.addEventListener(TimerEvent.TIMER, function(event:TimerEvent):void {
   trace("Item: " + item);
  });
  t.start();
}

The above will work just fine, considering that a new item scoped for each invocation of scheduleMe.

I recommend that one define their variables at their scope level. This might make the code a bit hard to read at first for those so used to more granular scoping, but it will also prevent some oversights as above, considering the programmer knows the scoping rules and doesn’t just try to redefine the variable in the block scope.


12
Feb 09

iPhone SDK impressions and Objective C rumblings

I’ve been doing some work with iPhone SDK lately and of course that introduced me to the world of Objective C. So here are some of my opinions, likes, dislikes, etc…

iPhone SDK

I think the SDK is great. Very similar to Cocoa (which I have no experience in using, outside of some readings). I’m really liking XCode and some of it’s features, but it’s hard to “love” it, after developing with IntelliJ IDEA over the last few years. IntelliJ is the king of all IDEs and any future IDE project should at the least take notice. And if you can’t produce anything that even closely resembles IntelliJ in its functionality and code editing abilities, then don’t call it and IDE. Ok, enough of the IntelliJ biases here, XCode is good enough at least for the language at hand, Objective C, more on that later.

Building iPhone apps is a breeze, especially after a few days of playing with the SDK, reading a good book (there are a few out there already), and beating your head against the wall trying to figure out how UINavigationController works. After you figure it out, you’ll realize that it’s not that it tooks so long because you lack brain power, it’s because we’ve been conditioned to think that something like this should be more difficult than this. No, really, UINavigationControllers are very powerful and very very simple. So throw away all your past baggage and open up your mind. You should be proficient in the SDK in a day or two.

Objective C

Well, what can I say. I like some of it’s features, but I hate many more. I guess after programming in Perl, Java, Python, AS3, Ruby, etc…, one can’t really get excited about a language build on top of C preprocessor and inherit all of its limitations. But one would also think that if they already build so many powerful features on top of the C preprocessor, why not do more?

Here is one example. Interfaces are great. I use them all the time in Java, AS3. I defined them as abstract classes in Perl, Python, Ruby, etc… that don’t have an explicit interface concept, but you are not forced to. There are classes that don’t have and/or need an interface accessible outside of the internal package where it will be used. Sometimes I just want to define a class. I know I can have @interface and @implementation sections in one file, but WTF is the @interface section for, just so I can type the definition twice? Ok, I think what I’m really pissed about is something else, and I’m taking it out on @interfaces. I can live with interfaces if…

Where are private methods? Please help me? I’ve searched and searched and searched. There are numerous techniques.

  • Convention
    Define your private method with an underscore, but then it’s still visible to XCode completions, which is annoying. I know that at runtime, dynamic dispatch doesn’t differentiate between private/public, but convention would be OK I guess (it is in perl), but apple discourages it. Because they use the underscore convention, they don’t want you to. Ok, that doesn’t make much sense. So what do we have to do, write 500 line files and repeat our code in every public method? Or just define 40 different private helper methods as public? WTF????

  • Categories
    This is a better solution. I love the way Objective C has a concept we call mixin in other languages. You can basically augment behavior of objects by defining categories. That’s cool, but I just want to define a simple #$@%ing private method. So I have to define a category, but I don’t want to define it in the public header file, so I have to also have yet another header file ClassNamePrivate.h, where I will define @interface ClassName (Private). So now, I need to have 3 files for a simple class that never needed a public and private interfaces, just a simple helper class that I want to reuse. Would that would more than double the # of files in a project as compared to some other language that doesn’t have that constraint. I don’t say tripple, because interfaces are important and there are many cases where one would want to define a public interface. Dynamic languages don’t require that, static languages like Java do, but I can live with an extra file in Java, but not extra 2 files.
    You can also define your category @interface and @implementation inside your .m implementation file, but to avoid compiler warnings, you have to define your @implementation for the category as the first construct in the .m file, otherwise using these methods will cause a compiler warning. Well, that is fine, except I usually define private methods towards the end of the file and cluttering up my code with 10+ definitions for private methods right when someone first glances at the file is not really a clean coding practice.

There are many other small quirks that bother me and other things that I’m beginning to like. After all, I can live with it, being that I’m not really given another choice by Apple.

So my final words are: Apple, please listen to developers, take a look at the language innovations and get out of the stone age. Objective C 2.0 has some very cool innovative features, but some are so restrictive and ridiculous that one might wonder why you’d actually get someone to use it if you didn’t force them. Also, if you choose not to improve the language at least open your platform for a more modern language that folks will have a pleasure working with.


12
Jan 09

CodeMash 2009

Last week I attended my first CodeMash conference in Sandusky, OH. I must say, I was very impressed. Besides that fact that it’s a regional conferences with a limit of 500 participants, it had great tracks and a very cozy atmosphere that made one feel relevant. Bigger conferences usually attract many more participants to a track, which makes it hard to grab the attention of the speaker if one has a question. Not so at CodeMash, where tracks had an average of 20-30 participants.

My first track was on Thursday and I decided to attend Scala presentation by Venkat Subramaniam. Content aside, Venkat is an awesome speaker. Actually, awesome is an understatement. He communicates very well, has a great sense of humor, and is very thorough (he really knows his stuff). Throughout the sessions and conference in general, I took notes in the form of a Mind Map. This was my first shot at it, but I loved it. Instead of formal note taking I’m so used to, this was a refreshing experience. The mind map technique is described in many literatures, but I really learned about it from the Pragmatic Thinking and Learning book. Here is Venkat’s Scala talk as a Mind Map.

Scala Presentation MindMap (click on image to enlarge)

Scala


I used XMind to draw it, though at the conference I just drew on paper. I did an exhaustive search for Mind Mapping software a week ago and couldn’t find anything that was worth while, until a friend of mine David McKinnon recommended XMind. This software is awesome.

Page 8 of 12« First...678910...Last »