In many (JPA) applications, numerical ids are chosen as the surrogate keys of the entities. But how do we make sure that no id is used twice? When our application needs to scale horizontally, we need a solution for that. Most developers conclude that the database should take care of it. But this is a fragile solution, and in this article I want to discuss why.
Additional Database Calls
Let’s have a look at what happens in the database when we use a numerical id. In this example I create a simple entity with a generated numeric primary key and save two instances of it.
When we look at the log output of org.hibernate.SQL, we see the following:
2021-07-14 16:36:40.954 DEBUG 88572 --- [ main] org.hibernate.SQL : call next value for hibernate_sequence
2021-07-14 16:36:40.973 DEBUG 88572 --- [ main] org.hibernate.SQL : insert into flat (id) values (?)
2021-07-14 16:36:40.979 DEBUG 88572 --- [ main] org.hibernate.SQL : call next value for hibernate_sequence
2021-07-14 16:36:40.980 DEBUG 88572 --- [ main] org.hibernate.SQL : insert into flat (id) values (?)
So every save of the entity caused two round-trips to the database: one to obtain the next free id from the sequence and one for the actual insert. In the past I ran into performance issues because of that behavior.
The optimization we applied was a Hi/Lo algorithm.
The idea is that we request multiple ids from the sequence at once, even before we actually need them. Another instance of our application simply receives the next chunk of free ids, and that way we reduce the number of calls to the sequence. But this only reduces the problem; it does not eliminate it.
Also, an outside observer might expect that the order of the ids reflects the order of creation, which is not the case when we apply such a strategy.
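The block-reservation idea can be sketched in plain Java. This is a simplified illustration, not Hibernate's actual optimizer: `FakeSequence`, `HiLoGenerator`, and the block size of 3 are hypothetical, and an in-memory counter stands in for the database sequence.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a database sequence; each next() models one round-trip. */
final class FakeSequence {
    private final AtomicLong value = new AtomicLong(0);
    private final AtomicLong calls = new AtomicLong(0);

    long next() {
        calls.incrementAndGet();
        return value.incrementAndGet();
    }

    long calls() {
        return calls.get();
    }
}

/** Hands out ids from a reserved block and only calls the sequence when the block is used up. */
final class HiLoGenerator {
    private final FakeSequence sequence;
    private final int blockSize;
    private long hi; // offset of the current block
    private int lo;  // number of ids already handed out from the current block

    HiLoGenerator(FakeSequence sequence, int blockSize) {
        this.sequence = sequence;
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a sequence call on first use
    }

    synchronized long nextId() {
        if (lo == blockSize) {
            // Sequence value n reserves the ids (n-1)*blockSize+1 .. n*blockSize.
            hi = (sequence.next() - 1) * blockSize;
            lo = 0;
        }
        lo++;
        return hi + lo;
    }
}

final class HiLoDemo {
    public static void main(String[] args) {
        FakeSequence sequence = new FakeSequence();
        // Two generators sharing one sequence model two application instances.
        HiLoGenerator instanceA = new HiLoGenerator(sequence, 3);
        HiLoGenerator instanceB = new HiLoGenerator(sequence, 3);

        System.out.println(instanceA.nextId()); // 1 (first block: 1..3)
        System.out.println(instanceB.nextId()); // 4 (second block: 4..6)
        System.out.println(instanceA.nextId()); // 2, handed out after id 4
        System.out.println("sequence calls: " + sequence.calls()); // 2
    }
}
```

In this sketch three ids cost two sequence calls instead of three, and id 4 is created before id 2, so the id order no longer matches the order of creation.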
The Collections Problem
Implementing equals() and hashCode() for JPA entities is tricky. When you load the same entity twice (for example, in two different sessions), you have two instances representing the same row of data. The default implementations of equals() and hashCode() are based on object identity. If we put these objects into a Set, both of them end up in there. To avoid that, we usually implement equals() and hashCode() based on the id of the object.
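A minimal sketch of the difference, using plain Java objects rather than real JPA entities. The class names `IdentityFlat` and `IdBasedFlat` are hypothetical; two separately constructed objects with the same id stand in for the same row loaded twice.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Keeps the identity-based equals()/hashCode() inherited from Object. */
final class IdentityFlat {
    final Long id;

    IdentityFlat(Long id) {
        this.id = id;
    }
}

/** Compares instances by id instead. */
final class IdBasedFlat {
    final Long id;

    IdBasedFlat(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof IdBasedFlat
                && Objects.equals(id, ((IdBasedFlat) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

final class DuplicateInSetDemo {
    static int identityCount() {
        Set<IdentityFlat> flats = new HashSet<>();
        flats.add(new IdentityFlat(1L));
        flats.add(new IdentityFlat(1L)); // same row, different object
        return flats.size();
    }

    static int idBasedCount() {
        Set<IdBasedFlat> flats = new HashSet<>();
        flats.add(new IdBasedFlat(1L));
        flats.add(new IdBasedFlat(1L)); // recognized as a duplicate
        return flats.size();
    }

    public static void main(String[] args) {
        System.out.println(identityCount()); // 2
        System.out.println(idBasedCount());  // 1
    }
}
```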
Info: I recommend reading the discussion about equals() and hashCode() in Hibernate’s documentation.
But here it becomes tricky with ids generated in the database. We don’t have an id until we have saved the object in the database. In Java’s collections API, the implementation of hashCode() must be based on fields that do not change while the object is in the collection. Otherwise, we get strange behavior. In this example you can see that the set “lost” the object because we changed the id:
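A minimal sketch of this effect in plain Java, assuming a hypothetical class `MutableIdFlat` with id-based equals() and hashCode(). Calling `setId` after the object is already in the set stands in for Hibernate assigning the id on save.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical entity whose id is null until it is "saved". */
final class MutableIdFlat {
    private Long id;

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableIdFlat
                && Objects.equals(id, ((MutableIdFlat) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // changes as soon as the id is assigned
    }
}

final class LostInSetDemo {
    static boolean containsAfterIdChange() {
        Set<MutableIdFlat> flats = new HashSet<>();
        MutableIdFlat flat = new MutableIdFlat();
        flats.add(flat);   // stored under the hash of a null id
        flat.setId(42L);   // simulates the id being assigned on save
        return flats.contains(flat); // looked up under the new hash
    }

    public static void main(String[] args) {
        System.out.println(containsAfterIdChange()); // false
    }
}
```

The object is still physically in the set (its size stays 1), but the lookup uses the new hash and no longer finds it.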
The problem above derives from the fact that we let Hibernate create the id of our entity. Don’t do that. Of course, we could introduce an IdGenerator class that takes care of the id generation when an instance is constructed, but we’d also need to synchronize this generation across all instances of our application somehow. The better approach is to use a UUID. Why, and what consequences that has, I’ll discuss in another post.
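As a rough sketch of the direction, assuming a hypothetical class `UuidFlat` without any JPA mapping: the id is a random UUID assigned at construction and never changes, so the hash stays stable for the lifetime of the object. How such an entity is mapped and reloaded is left out here.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

/** Hypothetical entity that owns its id from the moment it is constructed. */
final class UuidFlat {
    // Assigned once, without asking the database or another application instance.
    private final UUID id = UUID.randomUUID();

    UUID getId() {
        return id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof UuidFlat
                && Objects.equals(id, ((UuidFlat) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

final class StableIdDemo {
    static boolean stillContained() {
        Set<UuidFlat> flats = new HashSet<>();
        UuidFlat flat = new UuidFlat();
        flats.add(flat);
        // A later save would not touch the id, so the hash does not change.
        return flats.contains(flat);
    }

    public static void main(String[] args) {
        System.out.println(stillContained()); // true
    }
}
```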