When NOT to use Django's JSONField
Like many other engineers I do a lot of code review every week. At my current place of work we require at least 1 approval before work is allowed to be merged. Last week I received a code review request which involved adding a JSONField to an API. The interaction was memorable and ended in the merge request being closed in favour of a different approach. I wanted to record it here to refer back to it in the future.
The project is a straight forward django rest framework backend and a VueJS frontend. We've created a REST-ful API that uses JSON to do simple CRUD operations on a few django models. The new feature I was reviewing aims to add a field to track modifications to several fields.
Simply speaking the feature can be described as;
Given I am a user
And the form has value '10'
When I save the form with value '20'
And I reload the page
And I press the 'Reset' button
Then the form has value '10'
In a previous iteration we already made it so the form update uses the API to update the model in the backend and thus when the form is reloaded the new value is displayed. However when the 'Reset' button is clicked the original value of '10' is not known anymore, it's been overwritten with value '20'.
The opened merge request was proposing the following addition to the model:
original_values = models.JSONField(blank=True, null=True)
The idea being that when one of the resetable fields would be updated
we'd save the original value into original_values
and be able to
retrieve it from there when the reset button is clicked. At first
glance it seemed like a good idea because the frontend could manage the
values, however after thinking and talking about it I identified
several problems.
Firstly, its contents would be implicit and not strictly defined. Every
time you're reading the field you need conditionals to check if the key
you're looking for exists, if it doesn't the field was not updated, if
it does you can reset it with that value. Further there's an implicit
link between the field name and the JSON key, and if that field name
would change then you'd better also update all of the original_values
that have that key or otherwise your original value is still lost.
Secondly, the JSONField implementation does not have validation out of the box, because it is freeform. You'd have to manually write validation logic to make sure fields that are numerical are not updated with non-numerical types. The same applies to the key names, only a restricted set is allowed to be used lest the key/values never be used to actually reset a field.
Lasty, it makes reasoning about the history of a field fragmented and
surprising. When a user saves the form 3 times do you 'reset' to the
first value or the previous value of the field? How do you know that
original_values
was not touched or updated along the way?
In the end we decided that since we're only resetting 3 fields to their
original values (the very first value of the field) we'd be better off
adding non-editable mirror fields instead (prefixed with original_
).
Those are explicit, come with validation and are easy to reason about.
And honestly 3 extra columns in the database is a good trade-off
against implicitness and surprise.