Posts Tagged: sql


18 Jul 09

Groovy SQL Builder

I’m currently working on a project that requires some pretty heavy SQL-to-OO work. JPA of course can’t handle the full load, so my Repository pattern abstractions abstract away JPA as well as the raw JDBC stuff. This architecture actually works out pretty well, except when it comes time to build queries. Java has no string interpolation/comprehension, and even if it did, who wants to build queries out of strings? I used to do it years ago; now that I’ve had to come back to it (for non-trivial, large queries), I definitely think it’s a big design smell. Even abstracted away behind some sort of SQL variant, it’s just not as nice as what I’m used to when programming with DDD.

I also had to question the simple SQL variant implementation I had. A SQLVariant is nice when you know that you’ll always have only one query per operation (in each database you have to support) and building queries is pretty straightforward, not requiring a lot of logic.

So I added the concept of a SQLExecutor, which returns anonymous inner classes implementing a query or update operation interface. The client can then execute this (pseudo-closure) to perform the operation, and all the query details are completely abstracted away in the returned closure’s execution logic. This worked great, but I was still stuck building queries. Java just sucks when it comes to this. Ah, but what about Groovy?
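
To make that a bit more concrete, here is a minimal sketch of the pattern (the QueryOperation and CustomerSqlExecutor names below are made up for illustration; the real interfaces look a bit different):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// The pseudo-closure: the client just executes it, all SQL details live inside.
interface QueryOperation<T> {
  T execute(Connection connection) throws SQLException;
}

class CustomerSqlExecutor {

  // Returns an anonymous inner class that encapsulates the query text, binding and mapping.
  QueryOperation<String> findCustomerNameById(final long id) {
    return new QueryOperation<String>() {
      public String execute(Connection connection) throws SQLException {
        PreparedStatement ps = connection.prepareStatement(
            "SELECT name FROM customer WHERE customer_id = ?");
        try {
          ps.setLong(1, id);
          ResultSet rs = ps.executeQuery();
          return rs.next() ? rs.getString("name") : null;
        } finally {
          ps.close();
        }
      }
    };
  }
}

The repository simply hands such an operation to whatever owns the connection, so the calling code never touches SQL strings directly.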

So I realized that I can use Java and Groovy interchangeably in the very same code base. Both are compiled into JVM bytecode, and IntelliJ supports both file types rather nicely, allowing for cross compilation and all the other IDE goodies.

So why not build a SQL builder in Groovy? Groovy’s metaprogramming facilities allow for that level of dynamic logic far better than Java’s. So in the last two hours I put together something very simple that currently lets me handle my query-building tasks in a cleaner fashion. Groovy’s FactoryBuilderSupport is a rather nice abstraction to build builder DSLs on top of. Mind you, this is two hours of work; many features I need (inserts and updates, among others) are not there yet. I need to add column validation (to disallow selecting, or including in predicates, columns and/or tables that were not defined as part of the query or its joins). There are numerous other things I’ll add with time, but I thought someone might be interested in at least how easy it is to build a simple Groovy DSL.

If I ever get it to the point where I feel it will be helpful to anyone other than me, I’ll release it in the near future. Right now, I’m just glad that I can write simple DSLs and get rid of the SQL-building crud.

Here is a small use case for a select statement (that’s all I can build at this point).

  @Test
  void buildSelectSql() {
    def sql = new SqlBuilder();
    Select select = sql.select ("table1") {
      join("table2", type: "INNER") {
        using(table1: "col1", table2: "col1")
      }
      join("table3", type: "OUTER") {
        using(table1: "col2", table2: "col2")
        using(table1: "col3", table2: "col3")
      }
      where("table1.col1 = 'test'")
      groupBy(table1: "col1", table2: "col1")
      orderBy(table1: "col1", table2: "col1")
    }
    assertNotNull("Select object should not be null", select)
    println "SELECT: " + select.getSql()
    assertNotNull("Generated SQL SELECT expression should not be null", select.getSql())
    assertFalse("Generated SQL SELECT expression should not be empty", select.getSql().trim() == "")
  }

Here is the code:

/**
 * SqlBuilder is a DSL for building portable SQL using Groovy's internal DSL facilities.  It allows you to specify
 * SQL queries using Groovy expressions instead of the usual string concatenation/interpolation.
 *
 *
 * @author Ilya Sterin
 */
public class SqlBuilder extends FactoryBuilderSupport {

  // Register expression factories
  {
    registerFactory('select', new SelectFactory())
    registerFactory('join', new JoinFactory())
    registerFactory('using', new UsingFactory())
    registerFactory('where', new WhereFactory())
    registerFactory('groupBy', new GroupAndOrderFactory())
    registerFactory('orderBy', new GroupAndOrderFactory())
  }
}

/**
 * Select class encapsulates the select clause node of the SQL query
 * and acts as a parent node to all select queries.
 */
private class Select {

  def table,        // Name of the main select table
      tables  = [], // All tables: the select table plus any join tables
      joins   = [], // Collection of all joins
      where,        // Where clause
      groupBy = [], // Collection of group by columns
      orderBy = []  // Collection of order by columns

  /**
   * Builds and returns the SQL query based on the DSL expressions
   */
  String getSql() {
    def sql = "SELECT * FROM $table"
    joins.each {Join join ->
      sql += " ${join.type} JOIN ${join.table} ON"
      join.using.eachWithIndex {Using using, idx ->
        if (idx > 0) sql += " AND"
        sql += " ${using.lhs} ${using.op} ${using.rhs}"
      }
    }
    where.eachWithIndex {Where where, i ->
      if (i == 0) sql += " WHERE ";
      sql += "${where.clause}"
    }
    if (groupBy) sql += " GROUP BY ${groupBy.join(', ')}";
    if (orderBy) sql += " ORDER BY ${orderBy.join(', ')}";
    return sql
  }

}

///////// Private classes to support the builder functionality /////////////

private class Join {
  def table
  def type = "INNER"
  def using = []
}

private class Using {
  def lhs, rhs, op = "="

  def methodMissing(String name, args) {
    println "Looking for ${name}"
  }
}

private class Where {
  def clause
}

///////// Factories that construct the appropriate objects based on the DSL expressions //////////////

private class SelectFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    return new Select(table: value, tables: [value])
  }

  public void setChild(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    println "Child ${child}"
    if (child instanceof Join) {
      println "Adding a join"
      parent.joins << child
    }
    if (child instanceof Where) {
      println "Adding a where"
      parent.where = child
    }
  }
}

private class JoinFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    Join join = new Join(table: value)
    return join
  }

  public void setChild(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    super.setChild(factoryBuilderSupport, parent, child)
    if (child instanceof Using) {
      parent.using << child
    }
  }

  public void setParent(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    super.setParent(factoryBuilderSupport, parent, child);
    parent.tables << child.table
  }
}

private class UsingFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, op, Map map) {
    Using using = new Using()
    if (op) using.op = op
    map.each {k, v ->
      if (!using.lhs) {
        using.lhs = "${k}.${v}"
      }
      else if (!using.rhs) {
        using.rhs = "${k}.${v}"
      }
    }
    map.clear()
    return using
  }

  public boolean onHandleNodeAttributes(FactoryBuilderSupport factoryBuilderSupport, Object o, Map map) {
    return true;
  }

  public boolean isLeaf() {
    return true;
  }
}

private class WhereFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    return new Where(clause: value)
  }

  public boolean isLeaf() {
    return true;
  }
}


private class GroupAndOrderFactory extends AbstractFactory {

  public Object newInstance(FactoryBuilderSupport factoryBuilderSupport, name, value, Map map) {
    def fqCols = [];
    map.each {k, v ->
      fqCols << "${k}.${v}"
    }
    return [name: name, cols: fqCols];
  }

  public void setParent(FactoryBuilderSupport factoryBuilderSupport, Object parent, Object child) {
    super.setParent(factoryBuilderSupport, parent, child);
    if (child.name == "groupBy") {
      parent.groupBy += child.cols
    }
    if (child.name == "orderBy") {
      parent.orderBy += child.cols
    }
  }

  public boolean onHandleNodeAttributes(FactoryBuilderSupport factoryBuilderSupport, Object o, Map map) {
    return true;
  }

  public boolean isLeaf() {
    return true;
  }
}


12 May 09

Postgres prepared statements gotcha

We’ve been profiling a slow Groovy application for the last week, and all fingers were pointing towards the socket reads in the Postgres JDBC driver. Of course, once the data is sent over the socket and the thread is blocked awaiting the return packet, there isn’t much else a Java profiler can do.

The biggest bummer was that we had two relatively similar applications; one performed more business logic, and the number of queries each executed was the same, but the times it took to execute each query were drastically different. The difference was on the order of 20-30 thousand times (1-5 milliseconds vs. 20 seconds). This was pretty baffling, but a coworker of mine stumbled upon a Postgres caveat. I’m not sure if Postgres is the only database that has a similar issue (I haven’t tested this theory with any other RDBMS), but it was definitely not the JDBC driver, since the same behavior was observed with Perl DBI.

The issue is that when binding variables to query placeholders in prepared statements, one doesn’t think about types as much as one should. Although it’s natural (at least in a statically typed language like Java) to use setInt, setString, etc., in some dynamically typed languages this might not be so straightforward. With the Groovy SQL library, binding methods are not inferred based on type, so Groovy uses Java’s setObject(int, Object) to bind all the variables. This is fine, except that because of the nature of dynamic typing it’s easy to bind the wrong type, especially when you don’t have access to the typed set* methods. Behind the scenes, the Postgres driver gives the wrong type hint to Postgres, which, I can only imagine at this point, has trouble binding the wrong type. Somehow (I’m not sure about the internals of binding and query preparation in the Postgres engine), this causes an enormous amount of overhead. If anyone knows how Postgres binds variables internally and why the issue demonstrated below happens, please let me know. The slowdown is so drastic that it’s actually pretty unbelievable. Take a look at this test.


import java.sql.*;

public class Test {

  private static final String SQL =
    "UPDATE some_table SET " +
    "column1 = column1 + 0 " +
    "WHERE some_table_id = ?";

  public static void main(String[] args) throws Exception {

    Class.forName("org.postgresql.Driver"); //load the driver
    Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test", "...", "...");

    PreparedStatement ps = conn.prepareStatement(SQL);
    ps.setObject(1, "8987459118");
    long start = System.currentTimeMillis();
    ps.executeUpdate();
    long finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setObject as String: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setObject(1, new Long(8987459118L));
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setObject as Long: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setString(1, "8987459118");
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setString: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setLong(1, 8987459118L);
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setLong: " + finish + " ms");
    ps.close();

    conn.close();

  }

}

This output is somewhat surprising.

Test with setObject as String: 11348 ms
Test with setObject as Long: 1 ms
Test with setString: 10619 ms
Test with setLong: 1 ms

So as you can see, binding a string type when a numeric type is required causes drastic slowdowns on the Postgres engine side. I haven’t experimented with other types yet, but I might in the upcoming week if I get some time.

There also now seems to be an option, documented in the Postgres JDBC documentation, that allows you to add a connection parameter, stringtype. The option is set in the connection URL as jdbc:postgresql://localhost/test?stringtype=unspecified. The Postgres documentation says…

stringtype = String

Specify the type to use when binding PreparedStatement parameters set via setString(). If stringtype is set to varchar (the default), such parameters will be sent to the server as varchar parameters. If stringtype is set to unspecified, parameters will be sent to the server as untyped values, and the server will attempt to infer an appropriate type. This is useful if you have an existing application that uses setString() to set parameters that are actually some other type, such as integers, and you are unable to change the application to use an appropriate method such as setInt().

The stringtype directive basically tells the Postgres JDBC driver that you might be binding variables using the wrong set*** method, so it doesn’t statically type the bind value and instead delegates the type inference to Postgres. Running the above test with this parameter set yields…

Test with setObject as String: 7 ms
Test with setObject as Long: 1 ms
Test with setString: 1 ms
Test with setLong: 1 ms

As you can see, there is still some negligible (in my case) latency associated with the setObject(int, String) binding, which might be due to the type inference on the Postgres side.
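
If you don’t want to embed the option in the connection URL, the same thing can be passed as a driver property when opening the connection; here is a minimal sketch (assuming the stringtype property name from the documentation quoted above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class StringTypeConnect {

  public static void main(String[] args) throws Exception {
    Class.forName("org.postgresql.Driver"); // load the driver

    Properties props = new Properties();
    props.setProperty("user", "...");
    props.setProperty("password", "...");
    // Send setString()/setObject(String) parameters as untyped values and let the server infer the type
    props.setProperty("stringtype", "unspecified");

    Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test", props);
    // ... run the same prepared statements as in the test above ...
    conn.close();
  }
}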

It seems that most languages’ database libraries allow some way to bind both typed and untyped values. In Perl, for example, the default is to send untyped values, but you can statically type them using the SQL_* types.

$sth->bind_param(1, "12345", SQL_INTEGER);

So what I learned from this whole ordeal is basically: use the right bindings. That was almost a given in Java, but with us making a move towards a dynamically typed language at times, we really have to watch these types.