Thursday, February 09, 2006

Hibernate duplicates with join fetch

It's a well-documented behavior (here and here) of Hibernate that if you run a "join fetch" HQL query and then call query.list() to get the results, there will be "duplicate" objects in the result. The recommended way to get rid of these duplicates is to put them in a HashSet. Note that, as mentioned in the second link above, adding the distinct keyword doesn't help here, presumably because the result set in this case is already returning distinct rows.
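As a minimal sketch of that de-duplication step: the list below stands in for what query.list() returns, and the class and method names are hypothetical, not part of Hibernate. I've used a LinkedHashSet rather than a plain HashSet so that the order of the results is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Hibernate.
class ResultDeduper {
    // Returns the distinct elements of results, in first-seen order.
    // Relies on equals()/hashCode(); with the default identity-based
    // implementations it drops repeated references to the same object.
    static <T> List<T> distinct(List<T> results) {
        return new ArrayList<>(new LinkedHashSet<>(results));
    }

    public static void main(String[] args) {
        Object item = new Object();            // stands in for one Item
        List<Object> results = new ArrayList<>();
        for (int row = 0; row < 5; row++) {    // one entry per joined bid row
            results.add(item);
        }
        System.out.println(results.size());            // 5
        System.out.println(distinct(results).size());  // 1
    }
}
```

Note that this only runs after every row has already been read and turned into a list entry, so it does nothing about the cost of building the list in the first place.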

A couple of comments:

(1) Fine, I can put my objects in a HashSet, and that works. The problem is performance. If the number of results that come back from query.list() is large (which is likely to happen if I'm doing a join fetch to many tables), then populating all the objects and then culling duplicates by putting them in a HashSet takes too long. I'm not sure what the solution (other than writing my own SQL and building the objects myself) would be.

(2) It's confusing that the Hibernate documentation and comments on forums (as well as the Hibernate in Action book) imply that this issue is specific to outer join fetching. It's exactly the same with inner joins.

(3) I'm still not seeing why Hibernate has this behavior. To take the canonical Hibernate example of Items and Bids, assume I do this:

session.createQuery("from Item i join fetch i.bids where i.itemId = 1").list()

and assume that the item I'm getting has 5 bids. The result from list() is 5 items, each with 5 bids. Each item is identical (which is why putting the results in a HashSet works). My (naive?) reaction to this is: if Hibernate is being clever enough to take all 5 bids and put them in each item, why can't it just return that one item? Or why isn't there a way to tell Hibernate to do this as it's iterating the result set, before it creates all the objects?
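One way to picture how you end up with 5 entries that are all the same item is the simplified model below. It is only a sketch under my own assumptions, not Hibernate's actual code: the class names, the int[][] rows and the cache map are all made up for illustration. The idea is that the join returns one row per bid, each row is resolved to the same cached Item instance, and each row still contributes one entry to the result list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of reading a joined result set; NOT Hibernate's real code.
class JoinFetchModel {
    static class Item {
        final int itemId;
        final List<Integer> bidIds = new ArrayList<>();
        Item(int itemId) { this.itemId = itemId; }
    }

    // Each row is {itemId, bidId}, as an item-to-bid join would return it.
    static List<Item> list(int[][] rows) {
        Map<Integer, Item> cache = new HashMap<>(); // one instance per item id
        List<Item> results = new ArrayList<>();
        for (int[] row : rows) {
            Item item = cache.computeIfAbsent(row[0], id -> new Item(id));
            item.bidIds.add(row[1]);  // every row adds its bid to the shared item
            results.add(item);        // and also contributes one result entry
        }
        return results;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 10}, {1, 11}, {1, 12}, {1, 13}, {1, 14} };
        List<Item> results = list(rows);
        System.out.println(results.size());                    // 5
        System.out.println(results.get(0) == results.get(4));  // true
        System.out.println(results.get(0).bidIds.size());      // 5
    }
}
```

In this model, returning just the one item would only take a check of whether the item was already in the list before adding it, which is really the question I'm asking above.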