Posts Tagged: groovy


12
May 09

Postgres prepared statements gotcha

We’ve been profiling a slow groovy application for the last week and all fingers were pointing towards the socket reading in Postgres JDBC Driver. Of course, once the data is set over a socket and the thread is blocked awaiting return packet, not much else a java profiler can do.

The biggest bummer was that we had two relatively similar applications, except that one performed more business logic, the amount of queries that each executed was the same, but the times that it took to execute each query were drastically different. Different on a magnitude of 20-30 thousand times slower (1-5 millis to 20 seconds). This was pretty baffling, but a coworker of mine was able to stumble upon a Postgres caveat. I’m not sure if Postgres interface is the only one that has a similar issue (I haven’t tested this theory with any other RDBMS), but it was definitely not the JDBC driver, since the same behavior was observed with Perl DBI.

The issue is that when binding variables to query placeholders in prepared statements, one doesn’t think about types as much as he/she should. Although it’s natural (at least in a statically typed language like Java) to use setInt, setString, etc…, in some dynamically typed languages, this might not be so straightforward. Using Groovy SQL library, binding methods are not inferred based on the type, so groovy uses java’s setObject(int, Object) for help binding all the variables. This is fine, except that because of the nature of the dynamic typing, it’s easy to bind a wrong type, especially when not having access to setType methods. Behind the scenes, the Postgres driver gives the wrong type hint to Postgres, which I can only imagine at this point, has an issue binding the wrong type. Somehow (I’m not sure about the internals of binding and query preparation in Postgres engine), this causes an insurmountable amount of overhead. If anyone knows how Postgres binds variables internally and why the issue demonstrated below happens, please let me know. The slowdown is so drastic that it’s actually pretty unbelievable. Take a look at this test.


import java.sql.*;

public class Test {

  private static final String SQL =
    "UPDATE some_table SET " +
    "column1 = column1 + 0 " +
    "WHERE some_table_id = ?";

  public static void main(String[] args) throws Exception {

    Class.forName("org.postgresql.Driver"); //load the driver
    Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/test", "...", "...");

    PreparedStatement ps = conn.prepareStatement(SQL);
    ps.setObject(1, "8987459118");
    long start = System.currentTimeMillis();
    ps.executeUpdate();
    long finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setObject as String: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setObject(1, new Long(8987459118L));
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setObject as Long: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setString(1, "8987459118");
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setString: " + finish + " ms");
    ps.close();

    ps = conn.prepareStatement(SQL);
    ps.setLong(1, 8987459118L);
    start = System.currentTimeMillis();
    ps.executeUpdate();
    finish = (System.currentTimeMillis() - start);
    System.err.println("Test with setLong: " + finish + " ms");
    ps.close();

    conn.close();

  }

}

This output is somewhat surprising.

Test with setObject as String: 11348 ms
Test with setObject as Long: 1 ms
Test with setString: 10619 ms
Test with setLong: 1 ms

So as you can see, binding a string type when a numeric type is required causes drastic slowdowns on the postgres engine side. I haven’t experienced with other types yet, but might in the upcoming week if I get some time.

There also seems to be an option now in the postgres JDBC documentation, that allows you to add a connection option stringtype. The option is set in the connection url as jdbc:postgresql://localhost/test?stringtype=unspecified. The postgres documentation says…

stringtype = String

Specify the type to use when binding PreparedStatement parameters set via setString(). If stringtype is set to varchar (the default), such parameters will be sent to the server as varchar parameters. If stringtype is set to unspecified, parameters will be sent to the server as untyped values, and the server will attempt to infer an appropriate type. This is useful if you have an existing application that uses setString() to set parameters that are actually some other type, such as integers, and you are unable to change the application to use an appropriate method such as setInt().

The stringtype directive basically tells the Postgres JDBC driver that you might be binding variables using wrong set*** method, so it doesn’t statically type the bind value, though delegating the type inference to Postgres. Running the above test with this parameter set yields…

Test with setObject as String: 7 ms
Test with setObject as Long: 1 ms
Test with setString: 1 ms
Test with setLong: 1 ms

As you can see, there is still some negligible (in my case) latency associated with using setObject(int, String) binding, that might be the latency due to type inference on Postgres side.

It seems that most languages database libraries allow some way to bind typed and untyped values. In perl for exemple, default is to send untyped values, but you can statically type them using SQL_* types.

$sth->bind_param(1, “12345”, SQL_INTEGER);

So what I learned from this whole ordeal, is basically use the right bindings, which was almost a given using java, but with us making a move towards using a dynamically typed language at times, we have to really what these types.