Thursday, February 09, 2006

Hibernate duplicates with join fetch

It's a well-documented behavior (here and here) of Hibernate that if you do a "join fetch" HQL query, then use query.list() to get the results, there will be "duplicate" objects in the result. The recommended way to get rid of these duplicates is to put them in a HashSet. Note as mentioned in the second link above, adding the distinct keyword doesn't help here - presumably because the result set in this case is already returning distinct rows.

A couple comments:

(1) Fine, I can put my objects in a HashSet, and that works. The problem is performance. If the number of results that come back from query.list() is large (which is likely to happen if I'm doing a join fetch to many tables), then populating all the objects and then culling duplicates by putting them in a HashSet takes too long. I'm not sure what the solution (other than writing my own SQL and building the objects myself) would be.

(2) It's confusing that the Hibernate documentation and comments (as well as the Hibernate in Action book) on forums imply that this issue is specific to outer join fetching. It's exactly the same with inner joins.

(3) I'm still not seeing why Hibernate has this behavior. To take the canonical Hibernate example of Items and Bids, assume I do this:

session.createQuery("from item i join fetch i.bid where i.itemId = 1").list()

and assume that the item I'm getting has 5 bids. The result from list() is 5 items, each with 5 bids. Each item is identical (which is why putting the results in a HashSet works). My (naive?) reaction to this is: if Hibernate is being clever enough to take all 5 bids and put them in each item, why can't it just return that one item? Or why isn't there a way to tell Hibernate to do this as it's iterating the result set, before it creates all the objects?

9 Comments:

Anonymous Anonymous said...

Another example. If you try to execute something like
"SELECT h, o FROM House h JOIN h.owner AS o JOIN FETCH o.children",
and if any owner has 12 children, using of HashSet doesen't help. You will get a set of 12 arrays of Object.

November 8, 2007 at 3:24 AM  
Anonymous Anonymous said...

Did you override ... you know ... hashCode and equals? Just asking :)

Any idea what can help for your example?

July 1, 2008 at 12:09 PM  
Blogger Kevin said...

Since my original post I learned about this method: setResultTransformer on Hibernate objects including Criteria and Query. For example:

criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

This FAQ on hibernate.org is relevant:

http://www.hibernate.org/117.html#A12

July 1, 2008 at 12:55 PM  
Anonymous fredv said...

Just another one of those hibernate WTF's, and writing my own SQL the old way was worse because?

June 15, 2010 at 4:55 AM  
Anonymous fredv said...

Aha guys, I found it, in your JOIN FETCH query you have to say DISTINCT.

So, eg.

SELECT DISTINCT t FROM Tree t JOIN FETCH t.leaves

The distinct will avoid the duplicates problem without using a resulttransformer.

June 15, 2010 at 5:14 AM  
Anonymous John said...

Putting DISTINCT in my query worked for me too. Just like FredV's example.

April 11, 2011 at 11:07 AM  
Blogger Fabio Araujo said...

Just great!
Thanks!

July 12, 2011 at 8:48 AM  
Anonymous Anonymous said...

If I use Distinct I get an exception that it is not comparable.

June 14, 2012 at 2:42 PM  
Anonymous Anonymous said...

from hibernate FAQ: https://community.jboss.org/wiki/HibernateFAQ-AdvancedProblems#Hibernate_does_not_return_distinct_results_for_a_query_with_outer_join_fetching_enabled_for_a_collection_even_if_I_use_the_distinct_keyword

List result = session.createQuery("select distinct o from Order o left join fetch o.lineItems").list();



The last one is special. It looks like you are using the SQL DISTINCT keyword here. Of course, this is not SQL, this is HQL. This distinct is just a shortcut for the result transformer, in this case. Yes, in other cases an HQL distinct will translate straight into a SQL DISTINCT. Not in this case: you can not filter out duplicates at the SQL level, the very nature of a product/join forbids this - you want the duplicates or you don't get all the data you need.

October 2, 2012 at 6:56 AM  

Post a Comment

<< Home