31
Aug 06

XML Persistence Part 1

So currently we’re looking into native XML persistence. I’ve been doing a lot of reading/research over the last few months. Although I’m pretty well versed in XML technologies, XML persistence is a completely new beast. I’ll explain…

I think most enterprise developers today look at XML as an intermediary transport format, a configuration file format, and a fit for other applications where readability of data and interoperability are a must. It’s been embraced more as an integration technology, and for good reasons of course. There has always been a feeling that application data can be persisted in XML, provided that your domain model stands to benefit from XML Schema-based storage. Just like object-oriented databases, native XML databases have not caught up with, or even made a dent in, the relational database market, though relational vendors are adding XML features on top of their relational schemas. Why is that the case? I think it’s because most developers are comfortable with relational databases, and there is a huge number of technologies and frameworks to support their application endeavors.

Unlike OO databases, which I think have missed their calling and now only have a place in a niche market, XML persistence mechanisms have a chance to change the way we persist data.   I don’t think relational models are going anywhere, but I think that many application domain models do not fit very well into the rectangular view of the relational schema.

I think the biggest benefit of persisting data in XML comes when your schema is dynamic and requires constant and, at times, unpredictable change. Yes, there are ways around it with relational schemas, like having generic tables that store field name/value pairs, etc… But that’s an ugly way of covering up a true limitation of relational schemas. An XML schema is a dynamic hierarchical model that fits that particular use case perfectly.
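To make the point about changing schemas concrete, here is a minimal sketch using only the XPath support that ships with the JDK. The two customer documents, the class name and the `query` helper are made up for illustration; a native XML database would expose its own API and would typically be queried with XQuery rather than plain XPath.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class FlexibleSchemaSketch {
    // Hypothetical documents: the newer one adds a nested <contact> element
    // that the older one never declared.
    static final String OLD_DOC =
        "<customer><name>Ann</name></customer>";
    static final String NEW_DOC =
        "<customer><name>Bob</name>"
        + "<contact><email>bob@example.com</email></contact></customer>";

    // Evaluates an XPath expression against an XML string and returns
    // the string value of the result (empty if nothing matches).
    static String query(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        // The same query works on both document shapes.
        System.out.println(query(OLD_DOC, "/customer/name"));
        System.out.println(query(NEW_DOC, "/customer/name"));
        // The new field is reachable without any migration of older documents.
        System.out.println(query(NEW_DOC, "/customer/contact/email"));
        // Older documents simply yield an empty result for the new field.
        System.out.println(query(OLD_DOC, "/customer/contact/email").isEmpty());
    }
}
```

In a relational schema the nested contact data would usually mean a new column, a new table, or the generic name/value table mentioned above.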

Since our storage model is a perfect candidate for native XML persistence, we’re experimenting with it heavily, in the hope of using it in production.

Because most enterprise applications today are written in some OO language/framework (e.g. Java, C#, etc…), developers are always looking to abstract away and eliminate various concerns from their applications. The bottom line is, application developers are hired to write business logic, not framework code. In the current environment, where OO/relational technologies dominate application development, there are many popular frameworks that ease the impedance mismatch between the two. ORM (Object Relational Mapping) frameworks are the most talked about topic in the J2EE and .NET communities (followed closely by, if not tied with, SOA and Spring :-). There are many proven, robust solutions out there to ease the OO/relational integration pains. Some offer more features/flexibility, some closely follow proposed standards, etc…, but either way, developers have a choice, and they feel comfortable going with such a framework due to its popularity and proven successes. This is yet another reason why relational models dominate the persistence world.

There seems to be a lack of a full-fledged XML persistence framework. There are many very robust Object/XML mapping technologies (e.g. JAXB, JiBX, Castor, etc…), but they don’t truly abstract the actual persistence concerns that come with storing your data in a native XML database. Most of the mapping technologies were developed to marshal/unmarshal XML to and from objects. Most are used as part of a bigger picture, namely the web services stack, where they facilitate the mapping of objects for SOAP and other remote invocation proxies that use XML as their transport format.
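For readers who haven’t used a mapping library, the following is a rough, hand-rolled sketch of what marshal/unmarshal means, written against the JDK’s own DOM and transformer APIs rather than JAXB, JiBX or Castor (whose APIs are not shown here). The `Customer` class and the element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class MappingSketch {
    // Hypothetical domain class, used only for illustration.
    static final class Customer {
        final String name;
        final int age;
        Customer(String name, int age) { this.name = name; this.age = age; }
    }

    // Object -> XML: build a small DOM tree and serialize it to a string.
    static String marshal(Customer c) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("customer");
        doc.appendChild(root);
        Element name = doc.createElement("name");
        name.setTextContent(c.name);
        root.appendChild(name);
        Element age = doc.createElement("age");
        age.setTextContent(Integer.toString(c.age));
        root.appendChild(age);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // XML -> object: parse the string and read the fields back out.
    static Customer unmarshal(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        String name = root.getElementsByTagName("name").item(0).getTextContent();
        String age = root.getElementsByTagName("age").item(0).getTextContent();
        return new Customer(name, Integer.parseInt(age.trim()));
    }

    public static void main(String[] args) throws Exception {
        String xml = marshal(new Customer("Ann", 41));
        Customer back = unmarshal(xml);
        System.out.println(xml);
        System.out.println(back.name + " " + back.age);
    }
}
```

Notice what is missing: nothing here knows where the XML lives, how it is queried, or how changes are written back. That gap is exactly the persistence concern the mapping tools leave to you.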

There is a need for a true XML persistence framework that is tightly coupled with XQuery. Why XQuery? Well, ORM frameworks are tightly coupled to SQL, and one can’t really talk about one without the other; XQuery is the equivalent standard (although a relatively new one) for querying XML. The day is coming when XQuery is going to be the true standard for querying XML. Another question is what will happen to XPath, XSLT, etc…, we’ll see.

The XQuery/OO framework needs to fill the space of transparently persisting your objects that are mapped to an XML schema. There is the question of how much the framework should cover, and how much flexibility it should allow. Due to the dynamic nature of XML, and the fact that XQuery result sets are not structured the way SQL result sets are, it becomes a lot harder to make assumptions about the structure of the result sets, to generate XQuery dynamically, etc… Still, I think a fine balance lies in allowing the flexibility of writing XQuery in the most optimized way and mapping its results to an object graph. I don’t think that taking XQuery out of the equation, as some ORM frameworks take SQL out of the equation, would be a practical short-term approach. A lot of this is due to the nature of XQuery, as well as the fact that the standard is relatively new and its implementations are still unproven. True behind-the-scenes optimization of XQueries might also not be as practical at this time as it is in the SQL world. I think the word iBatis comes to mind somewhere here. No, I’m not talking about iBatis for XML, but conceptually the idea of having the flexibility to write XQuery expressions and unmarshal their result sets into an object graph. There is of course a lot more to it, as with any other robust framework: enterprise features like caching, thread safety, transparent access to backend XML sources through a common interface (maybe XQJ, if only they’d add it to JSE already), etc…
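Here is a rough sketch of that iBatis-like idea: the developer writes the query by hand and supplies a mapper that turns each result node into an object. Since XQJ is not part of the JDK, the sketch uses the built-in XPath API purely as a stand-in for XQuery. The `select` helper, the `Order` class and the sample data are all assumptions made for illustration, not the API of any existing framework.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QueryMappingSketch {
    // Hypothetical domain class.
    static final class Order {
        final String id;
        final double total;
        Order(String id, double total) { this.id = id; this.total = total; }
    }

    // Hypothetical sample data standing in for a collection in an XML store.
    static final String ORDERS_XML =
        "<orders>"
        + "<order id=\"a1\"><total>250.0</total></order>"
        + "<order id=\"a2\"><total>40.0</total></order>"
        + "<order id=\"a3\"><total>120.5</total></order>"
        + "</orders>";

    // Runs a hand-written query and maps every matching element with the
    // supplied mapper. The query must select element nodes.
    static <T> List<T> select(String xml, String query, Function<Element, T> mapper)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<T> results = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            results.add(mapper.apply((Element) nodes.item(i)));
        }
        return results;
    }

    // Mapper from an <order> element to an Order object.
    static Order toOrder(Element e) {
        String total = e.getElementsByTagName("total").item(0).getTextContent();
        return new Order(e.getAttribute("id"), Double.parseDouble(total));
    }

    public static void main(String[] args) throws XPathExpressionException {
        List<Order> large = select(
            ORDERS_XML, "/orders/order[total > 100]", QueryMappingSketch::toOrder);
        for (Order o : large) {
            System.out.println(o.id + " " + o.total);
        }
    }
}
```

The query stays fully under the developer’s control, and the framework’s job shrinks to executing it and applying the mapping. Caching, thread safety and a common backend interface would sit around `select` in a real implementation.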

In the next few posts I will explain in more detail the functionality and benefits of native XML persistence stores, their limitations, the roadblocks we encountered, my new endeavor of creating an XML persistence framework, etc… I hope that these experiences will shorten the time others spend evaluating similar technologies, and actually spread the word about how to use native XML effectively in your applications.


 

24 comments

  1. XML Databases

    Ilya, have a look at eXist. They have a nice web-based query engine. …

  3. Interesting perspective. In a computing world increasingly dominated by markup languages it only seems natural that native XML storage should also become a topic of great interest. It’s such a common pattern in development today that we perform the following transforms: 1) Get the posted HTML (or XHTML) 2) Map that into POJOs 3) Perform some business logic (often fairly straightforward data validation) 4) Map those POJOs to SQL for storage. Then what do we do?… Basically the reverse. I’m in basic agreement with the premise that we need a transparent POJO to XQuery capability. But, I have to admit I’ve been having heretical thoughts… With XQuery being extensible using Java, maybe the role of transforms to and from POJOs is reduced in importance. I mean, in the end (at least with the current state of web-oriented development) it seems the goal is to consume or produce markup in some form or fashion. Ok, I’ll be the first to admit that creating applications whose implementation looks more like a bag of globally accessible functions and a bunch of XQuery is reminiscent of the pre-OO development of the late 80’s. But let’s at least consider that if a big part of the development equation is consuming and generating angle-bracket markup… maybe XQuery + Java functions make more sense. I’m anxious to read more on the direction you’ve been heading in. Thanks, Ron

  5. Ron, I agree. I’ve found that there are times when unmarshalling the data into an OO model is overkill for simple operations that include things like validation and are basically CRUD operations without a lot of business logic. I think these should be evaluated in the initial application design and, as you said, direct XQuery operations on such data, with maybe embedded Java callbacks, would do the trick. XQuery also has stored procedures that might be sufficient for some business logic operations, without the use of Java callbacks. I think in that scenario, a service interface that simply delegates to some helper class responsible for constructing and running XQuery operations on the data is a great design decision. There are many cases when applying business logic is better served in the OO domain model, and hence the need for a true XQuery/Object persistence framework. You can take a look at XQOM within the next few weeks. It’s something we’ve been working on for a while now and I plan on releasing it as open source shortly (www.xquerynow.com/XQOM). There is not much there now, but check back in a few weeks if you’re interested. Also, my next few blog posts will focus on the features of such a framework and its use cases. Thanks. Ilya Sterin

  7. I’ve also read your post on TheServerSide.com, and I have to say, you’ve got some very, very good points. What intrigues me is the role of some other new technologies/specifications like MOF and OCL in the scenario you propose. I mean, MOF/UML defines structures and basic integrity rules. OCL defines advanced/complex integrity rules. OCL also defines queries for retrieving objects. These sound like really the same purpose as XML Schema and XQuery. (Of course I am ignoring other uses and features of both the OMG stuff and the W3C stuff.) Basically, for the sole purpose of defining data structures, wouldn’t MOF/UML + OCL be the same as XML Schema + XQuery? I would like to hear your opinion on that. Also, if you know of any discussion on that specific topic, it would be nice to be pointed there. Again, very nice work. Pedro

  9. Pedro, these specs are not too new :-) and they never really materialized as they maybe should have. I really am not familiar with these specs, other than their definitions. Conceptually, they should allow achieving the same in the OO world. I don’t know their limitations though, sorry. I mean, in reality, we could use a generic OO model spec, like MOF, combined with OCL, thus allowing for a portable domain model that can be persisted. With OCL (again, I’m not familiar with the details of the spec), you can query the objects in a generic way. I’d be interested to see if there are some OO database implementations based on these standards. I mean, it would be nice if we could just persist and communicate in the same generic specification format, which is where XML shines these days. If you saw some of my going back and forth with Constantin on theserverside.com, you’ll see that although he strongly stands by his opinions, we don’t really disagree. XML is simply great due to its adoption by the industry. Of course it’s a nicely designed spec in itself, though not very efficient in many scenarios. But your requirements would dictate whether you are willing to trade efficiency for portability. I can imagine that a persistence model designed around MOF/OCL would have the same efficiency issues, though it would let us bypass a level of impedance mismatch. But again, wide industry adoption would be needed, so that we can not only persist OO models using these specifications, but also allow interoperability between different platforms and languages, with support from all major vendors. Just my .02 cents.

  11. Just curious to know. Have you considered Fast Infoset as an alternate transport for plain XML?

  13. Pedro, we haven’t yet looked at it, as currently our requirements are fulfilled with standard XML transport. The only network latency that we currently encounter is in streaming XML data from server to browser. But because browsers currently have no support for Fast Infoset, that’s out of the question. Plus, browsers support data compression either way, so we could compress the XML. I’m not positive about the performance differences between Fast Infoset and compression, since I think the biggest plus would be network latency reduction, which can probably be handled just as efficiently by compression. The other performance enhancement of Fast Infoset is the fact that it transports the actual serialized infoset, encoded per the Fast Infoset specification, so I can see where the parser would benefit from not having to actually form the infoset while parsing, but rather deserializing the one that’s streamed to it. Either way, in our scenario, I think this small performance enhancement is currently not very significant. I will investigate it further when we get some time and all other major requirements are worked out. Thanks for the posts; you really got me thinking here :-)

  15. In one of your future installments, I’d like to hear your opinions about Sleepycat BDB XML. It supports XQuery 1.0 and XPath 2.0, XML Namespaces, schema validation, naming and cross-container operations and document streaming… (copied from the website). I’ve been writing a framework for rapid prototyping of RESTful projects, and I’m using BDB XML for my persistence engine. XSL transformations are the primary view and provide a simple and clean architecture. For applications with little business logic, marshalling and unmarshalling to objects that are then mapped to a relational schema just seems like too much work. I want the data returned from the server in XML, so why not just store it that way? So far, the framework is basically working, and is just a handful of classes to process RESTful requests and handle persistence, plus a Filter for XSL transformations, if desired.

  17. Don, Part 2 has a pretty thorough evaluation of XML databases. I will have it done sometime next week. Your framework sounds interesting. I fully support RESTful services and possibly a thin transaction script service layer that implements the small amounts of business logic. In your case, a full-blown domain model implementation probably doesn’t make sense. You have a great case for storing XML natively: your view layer is XSLT, and there is no need for relational persistence that I can see. I’d like to hear more about your framework. Is this something you plan on releasing as open source? Take a look at my other entry about Ajax/XSLT views and let me know your thoughts… http://www.ilyasterin.com/enteprise_software/2006/09/ajaxxslt_views.html

  19. Hi, ilyasterin. I read your article on TSS, and I found it curious that other people have the “same” ideas about persistence in NXDs. I am finishing an OO/XML framework in Java called JNXD (Java Persistence Framework for NXDs), which will be published on java.net next month. What’s different about my project’s approach is that there is no XML mapping to Java objects. I use the XML:DB API to handle XML query requests to an NXD (the eXist database has already been tested), and the XStream API to serialize/deserialize objects to/from XML documents directly, without any mapping. The performance was incredibly high. So, I hope this project turns on a lamp in many brains in the Java world community. So let it be written, so let it be done.

  21. I’d be interested to see what you’re doing… Why did you go with the XML:DB API? This API is going to die in favor of XQuery, and even now it’s not supported by all XML DB vendors. XQuery is, and it’s a lot richer in capabilities, etc… Also, XQJ will become the JDBC-like API of the XQuery world. I’m not sure how you serialize/deserialize your result sets if you don’t have any way of mapping your results to an object model. Do you just create some generic object model that abstracts XML behind OO? XQOM actually supports any POJO domain model you might have, so all you have to do is map the result set to the OO model you define and the result will be unmarshalled into it. Ilya

  23. The XML:DB API was a nice initiative, but you’re right, it’s dying. A refactoring to use XQJ is very important. Result sets retrieved by XQuery, in my project, are wrapped in a general Value Object (or GoF Composite) to abstract the domain POJO objects. But if the XQuery is executed against a single domain in the hierarchy of XML documents in the NXD, so that only one type of XML is returned in the result set, the XML is deserialized directly to POJOs. In the XStream API, the class of a serialized object is stored in the XML content, and it’s through this mechanism that the deserialization is possible. So, only an XQuery that extracts pieces of documents from more than one hierarchy results in new XML that cannot be deserialized to a list of a specific POJO. In this case, a Value Object or Composite design pattern is used to encapsulate the result set. Another approach could be XML-POJO mapping like in your project. I’m sticking with “no XML mapping” to POJOs.
