
The Inevitable Consequence of a Numerical Id

JPA
July 15, 2021

In many JPA applications, numerical ids are chosen as surrogate keys for the entities. But how do we make sure that an id is never used twice? As soon as our application needs to scale horizontally, we need a solution for that. Most developers conclude that the database should take care of it. But this is a fragile solution, and in this article I want to discuss why.

Additional Database Calls

Let’s have a look at what happens in the database when we use a numerical id. In this example, I create a simple entity with a generated numeric primary key and save two instances of it.

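import javax.annotation.PostConstruct
import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.Id
import org.springframework.data.jpa.repository.JpaRepository
import org.springframework.stereotype.Service

@Entity
class Flat {
	// The id is assigned by Hibernate when the entity is saved
	@Id @GeneratedValue var id: Long? = null
}

interface FlatRepository : JpaRepository<Flat, Long>

@Service
class FlatService(
	private val repository: FlatRepository
) {

	// Runs once at startup and saves two new entities
	@PostConstruct
	fun createFlat() {
		repository.save(Flat())
		repository.save(Flat())
	}
}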

When we enable debug logging for org.hibernate.SQL, we see the following output:

2021-07-14 16:36:40.954 DEBUG 88572 --- [main] org.hibernate.SQL : call next value for hibernate_sequence
2021-07-14 16:36:40.973 DEBUG 88572 --- [main] org.hibernate.SQL : insert into flat (id) values (?)
2021-07-14 16:36:40.979 DEBUG 88572 --- [main] org.hibernate.SQL : call next value for hibernate_sequence
2021-07-14 16:36:40.980 DEBUG 88572 --- [main] org.hibernate.SQL  : insert into flat (id) values (?)

So every save of an entity caused two round trips to the database: one to fetch the next free id from the sequence and one for the actual insert. In the past, I ran into performance issues because of this behavior.

The optimization we applied was the Hi/Lo algorithm:

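import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.GenerationType
import javax.persistence.Id
import org.hibernate.annotations.GenericGenerator
import org.hibernate.annotations.Parameter

@Entity
class Flat {
	@Id
	@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo_sequence_generator")
	@GenericGenerator(
		name = "hilo_sequence_generator",
		strategy = "sequence",
		parameters = [
			Parameter(name = "sequence_name", value = "hilo_sequence"),
			Parameter(name = "initial_value", value = "1"),
			// Each call to the sequence reserves a block of 3 ids
			Parameter(name = "increment_size", value = "3"),
			Parameter(name = "optimizer", value = "hilo")
		]
	)
	var id: Long? = null
}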

The idea is that a single call to the sequence reserves a whole block of ids (three in this example), even before we actually need them. Another instance of our application would simply receive the next block of free ids, so we reduce the number of calls to the sequence. But this only reduces the problem; it does not eliminate it.
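To make the idea concrete, here is a minimal sketch in plain Kotlin, independent of Hibernate. FakeSequence and HiLoIdAllocator are hypothetical names; the sequence is simulated in memory, and every call to it stands for one database round trip. The block arithmetic follows the general Hi/Lo scheme and is not meant to reproduce Hibernate's optimizer exactly.

```kotlin
// Hypothetical in-memory stand-in for a database sequence.
// Every call to nextValue() represents one round trip to the database.
class FakeSequence(private var next: Long = 1) {
    var calls = 0
        private set

    @Synchronized
    fun nextValue(): Long {
        calls++
        return next++
    }
}

// Sketch of a Hi/Lo allocator: each "hi" value fetched from the sequence
// reserves a block of blockSize ids, which are then handed out locally ("lo")
// without another database call.
class HiLoIdAllocator(private val sequence: FakeSequence, private val blockSize: Int) {
    private var current = 0L
    private var upperLimit = 0L // exclusive end of the current block

    @Synchronized
    fun nextId(): Long {
        if (current >= upperLimit) {
            val hi = sequence.nextValue()
            // The block for hi covers (hi - 1) * blockSize + 1 .. hi * blockSize
            current = (hi - 1) * blockSize + 1
            upperLimit = hi * blockSize + 1
        }
        return current++
    }
}
```

With a block size of 3, allocating seven ids (1 to 7) costs three calls to the sequence instead of seven. The saving grows with the block size, but every block still requires a database call.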

Also, an external observer might expect the order of the ids to reflect the order of creation, which is not the case with such a strategy: each instance hands out ids from its own block, so an entity created later can get a smaller id than one created earlier.
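A small worked example shows the effect. Assume, hypothetically, two application instances sharing one sequence with a block size of 3: instance A fetched the first block, instance B the second. The helper hiLoBlock and the Created class are made up for this sketch.

```kotlin
// Ids covered by one Hi/Lo block with the given "hi" value (hypothetical helper).
fun hiLoBlock(hi: Long, blockSize: Int): LongRange =
    ((hi - 1) * blockSize + 1)..(hi * blockSize)

data class Created(val instance: String, val id: Long)

// Instance A fetched hi = 1 from the shared sequence, instance B fetched hi = 2.
val blockA = hiLoBlock(hi = 1L, blockSize = 3).iterator() // ids 1..3
val blockB = hiLoBlock(hi = 2L, blockSize = 3).iterator() // ids 4..6

// Entities in the order they were actually created: A, then B, then A again.
val creationOrder = listOf(
    Created("A", blockA.nextLong()), // id 1
    Created("B", blockB.nextLong()), // id 4
    Created("A", blockA.nextLong()), // id 2
)

// Sorting by id yields A(1), A(2), B(4): the third entity now appears second.
val idOrder = creationOrder.sortedBy { it.id }
```

The entity created last (id 2) sorts before the one created second (id 4), so ordering by id does not reproduce the order of creation.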

The Collections Problem

Implementing equals() and hashCode() for JPA entities is tricky. When you load the same entity in two different sessions, you get two instances representing the same row of data. The default implementations of equals() and hashCode() compare object identity, so if we put both objects into a Set, it would contain both of them. To avoid that, we usually implement equals() and hashCode() by comparing the ids of the objects.
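A minimal sketch of such an id-based implementation, using a plain Kotlin class in place of the JPA entity (the annotations are left out so it runs on its own). Treating two instances without an id as unequal is one common choice, not the only one.

```kotlin
// Plain Kotlin stand-in for the Flat entity; JPA annotations are omitted
// so that the sketch runs on its own.
class Flat(var id: Long? = null) {
    // Two instances are equal if they represent the same row, i.e. share a non-null id.
    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is Flat) return false
        return id != null && id == other.id
    }

    // Must agree with equals(): equal ids produce equal hash codes.
    override fun hashCode(): Int = id?.hashCode() ?: 0
}
```

Two separate instances created with `Flat(42L)`, standing in for the same row loaded in two sessions, now count as one element in a Set. Note that `id` is still mutable, which is exactly what causes trouble next.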

But with ids generated by the database, this becomes tricky: we don’t have an id until the object has been saved. Java’s collection API requires that the fields equals() and hashCode() are based on do not change while the object is stored in a collection such as a HashSet. Otherwise, we get strange behavior. In this example you can see that the set “lost” the object because we changed its id:

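import org.assertj.core.api.Assertions
import org.junit.jupiter.api.Test

// Assumes Flat implements equals() and hashCode() based on its id
class FlatTest {

	@Test
	fun `Changing the id fails in a Set`() {
		val flat = Flat()
		val set = HashSet<Flat>().apply { add(flat) }

		Assertions.assertThat(set.contains(flat)).isTrue

		flat.id = 1L // We set the id, like Hibernate would do it when saving the entity

		Assertions.assertThat(set.contains(flat)).isFalse
	}
}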

The problem above stems from letting Hibernate assign the id of our entity. Don’t do that. Of course, we could introduce an IdGenerator class that assigns the id when an instance is constructed, but then we would also need to synchronize id generation across all instances of our application somehow. The better approach is to use a UUID. I discuss why, and which consequences that has, in this post.
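A minimal sketch of the UUID approach, again as a plain Kotlin class without JPA annotations: the id is a java.util.UUID created in the constructor, so it exists before the object is ever saved and never changes afterwards. Because random (version 4) UUIDs are generated locally with a negligible collision probability, no coordination between application instances is needed. How the UUID is mapped to a database column is left out here.

```kotlin
import java.util.UUID

// Plain Kotlin stand-in for the entity: the id is generated at construction
// and is immutable, so equals() and hashCode() never change afterwards.
class Flat(val id: UUID = UUID.randomUUID()) {
    override fun equals(other: Any?): Boolean =
        this === other || (other is Flat && id == other.id)

    override fun hashCode(): Int = id.hashCode()
}
```

Since the id can no longer change, a HashSet keeps finding the object for its whole lifetime, unlike in the test above.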

Have you heard of Marcus' Backend Newsletter?

New ideas. Twice a week!