Avocado defines an abstract class named `Lexicon`. It is a common practice when normalizing a data model to break out repeated finite sets of terms within a column into their own table. This is quite obvious for entities such as books and authors, but less so for commonly used or enumerable terms.
```
 id | name | birth_month
----+------+-------------
  1 | Sue  | May
  2 | Joe  | Jun
  3 | Bo   | Jan
  4 | Jane | Apr
 ...
```
The above shows a table with three columns: `id`, `name`, and `birth_month`. There are some inherent issues with this design:
- Months have an arbitrary order, which makes it very difficult to order the rows by `birth_month` since they are ordered lexicographically by default
- As the table grows (think millions of rows), the few bytes of disk space each repeated string takes up start to have a significant impact
- The cost of querying for the distinct months within the population becomes increasingly expensive as the table grows
- As the table grows, the cost of table scans increases since queries are acting on strings rather than integers (e.g. a foreign key)
Although the above example is somewhat contrived, the reasons behind this type of normalization are apparent.
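The ordering problem is easy to reproduce outside a database. The following is a minimal sketch in plain Python, using the four sample months from the table above; the `calendar` list and `rank` mapping are illustrative names, not part of Avocado.

```python
# Birth months as they might be stored in a plain text column.
birth_months = ["May", "Jun", "Jan", "Apr"]

# Default string sorting is lexicographic, not chronological.
lexicographic = sorted(birth_months)

# An explicit rank per month restores calendar order. A lookup table can
# store this rank once instead of recomputing it for every query.
calendar = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
rank = {name: index for index, name in enumerate(calendar)}
chronological = sorted(birth_months, key=rank.__getitem__)

print(lexicographic)   # ['Apr', 'Jan', 'Jun', 'May']
print(chronological)   # ['Jan', 'Apr', 'May', 'Jun']
```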
To implement, subclass `Lexicon` and define the `label` and `value` fields:

```python
from django.db import models

from avocado.lexicon.models import Lexicon


class Month(Lexicon):
    label = models.CharField(max_length=20)
    value = models.CharField(max_length=20)
```
A few of the advantages include:
- Define an arbitrary `order` of the items in the lexicon
- Define an integer `code`, which is useful for downstream clients that prefer working with an enumerable set of values, such as SAS or R
- Define a verbose/more readable label for each item
  - For example, map `Jan` to `January`
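To make these advantages concrete, here is a small plain-Python sketch of what the rows of a month lexicon might hold. `MonthItem` is a hypothetical stand-in for illustration only; it is not a Django model and not Avocado's `Lexicon`, and the particular `code` and `order` values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class MonthItem:
    """Plain-Python stand-in for one row of a month lexicon."""
    value: str  # raw term as stored, e.g. "Jan"
    label: str  # more readable form, e.g. "January"
    code: int   # integer for clients such as SAS or R
    order: int  # explicit sort position


items = [
    MonthItem(value="May", label="May", code=5, order=4),
    MonthItem(value="Jan", label="January", code=1, order=0),
    MonthItem(value="Apr", label="April", code=4, order=3),
    MonthItem(value="Jun", label="June", code=6, order=5),
]

# Sorting on the explicit order gives calendar order regardless of spelling.
ordered = sorted(items, key=lambda item: item.order)
labels = [item.label for item in ordered]
codes = [item.code for item in ordered]

print(labels)  # ['January', 'April', 'May', 'June']
print(codes)   # [1, 4, 5, 6]
```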
In addition, Avocado treats Lexicon subclasses specially since it is such a common practice to use them. They are used in the following ways:
- Performing an `init` will create a `DataField` instance for the primary key of the Lexicon
- The `order` field will be used whenever appropriate for ordering the lexicon items
- The `label` field will be used when accessing `f.labels()` and for free-text searches
- The `code` field will be used when accessing the items' codes
The `Lexicon` class also comes with an extra method on its manager called `reorder`, which reorders the items in the lexicon and updates the `order` value of each item with the new sort index. This is generally only necessary if items are added to the set and the ordering needs to be updated. Among the arguments the method accepts is `key`, which can also be a string corresponding to a built-in key function.
Performance Note: The entire lexicon is loaded into memory, sorted, and each item is saved. This should rarely ever be an issue, assuming the lexicon is not millions of items in size.
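The load, sort, and save cycle described above can be sketched in plain Python. This is an illustration of the described behavior under stated assumptions, not Avocado's implementation: `Item`, its `save()` stub, this standalone `reorder` function, and the `coerce_float` key are all hypothetical names.

```python
class Item:
    """Stand-in for a lexicon item; save() only records that it was called."""

    def __init__(self, value, order=0):
        self.value = value
        self.order = order
        self.saved = False

    def save(self):
        self.saved = True  # a real model would write to the database here


def coerce_float(item):
    """Hypothetical key: sort by the value as a float when it parses as one."""
    try:
        return float(item.value)
    except (TypeError, ValueError):
        return item.value


def reorder(items, key=None, reverse=False):
    """Sort all items in memory, then store each item's new sort index."""
    if key is None:
        key = lambda item: item.value  # assumed default for this sketch
    ordered = sorted(items, key=key, reverse=reverse)
    for index, item in enumerate(ordered):
        item.order = index
        item.save()
    return ordered


items = [Item("10"), Item("9"), Item("2.5")]
ordered = reorder(items, key=coerce_float)
print([item.value for item in ordered])  # ['2.5', '9', '10']
```

Every item is saved once per call, which is why the cost grows with the size of the lexicon.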
Built-in Key Functions
- This relies on the `value` field of each object and attempts to coerce it to a float (in case numbers are represented as strings), falling back to the value itself if the coercion fails
- This relies on the