Working with the Django admin and legacy databases, pt.3

Django    2009-05-01

Part 1 | Part 2

After you've run inspectdb and done all your syncing and basic admin setup, take a peek at the models you generated - depending on the state of your legacy db, you've probably wound up with something that looks like this (column names have been changed to protect the innocent, although I'm not sure that the so-called architects who created these tables deserve anyone's protection):

    class MyUsers(models.Model):
        username = models.CharField(max_length=90, db_column='Username') # Field name made lowercase.
        password = models.CharField(max_length=450, db_column='Password') # Field name made lowercase.
        email = models.CharField(max_length=150, db_column='Email') # Field name made lowercase.
        first_name = models.CharField(max_length=300, db_column='First_Name') # Field name made lowercase.
        last_name = models.CharField(max_length=300, db_column='Last_Name') # Field name made lowercase.
        birthday = models.DateField(db_column='U_Birthday') # Field name made lowercase.
        company = models.CharField(max_length=450, db_column='Company') # Field name made lowercase.
        title = models.CharField(max_length=450, db_column='Title', blank=True) # Field name made lowercase.
        address = models.CharField(max_length=450, db_column='Address') # Field name made lowercase.
        address2 = models.CharField(max_length=450, db_column='Address2') # Field name made lowercase.
        city = models.CharField(max_length=225, db_column='City') # Field name made lowercase.
        state = models.CharField(max_length=225, db_column='State') # Field name made lowercase.
        zip = models.CharField(max_length=75, db_column='Zip') # Field name made lowercase.
        member_number = models.IntegerField(db_column='Member_Number') # Field name made lowercase.
        member_status = models.CharField(max_length=24, db_column='Member_status') # Field name made lowercase.
        display_name = models.CharField(max_length=765)
        active = models.CharField(max_length=30)
        couponcode = models.CharField(max_length=150, db_column='couponCode') # Field name made lowercase.
        last_modified = models.DateTimeField()
        subscribed_on = models.DateTimeField()
        created = models.DateTimeField()
        email_verify = models.CharField(max_length=30, blank=True)
        priority_group = models.CharField(max_length=3, blank=True)
        class Meta:
            db_table = u'My_Users'

For the record, my traversal produced upwards of 80 model classes, all in similar states of disarray - I had a lot of cleanup to do just to get the admin to stop barfing every time I fired it up. Here's a sort of informal checklist of the things you should look over:

Primary Keys

First things first - wherever you have a model with a primary key named "id", just delete it, e.g.:

    class NewsContent(models.Model):
        id = models.IntegerField()
        title = models.TextField(blank=True)
        ...

Django already assumes that your table has an "id" column and defaults to an automatic primary key field - you don't need to define it in the class.

    class NewsContent(models.Model):
        title = models.TextField(blank=True)
        ...

On the other hand, if you have a primary key field that's not named "id", you need to pass "primary_key=True" as a field option:

        member_number = models.IntegerField("user id", primary_key=True, db_column='Member_Number')

What if there's no primary key?

Here's something I hope none of you run into - a few tables without primary keys:

    class NewsPriority(models.Model):
        tags = models.ForeignKey(NewsTags)
        content = models.ForeignKey(NewsContent)
        rank = models.IntegerField()

I need to complete this project with as few db changes as possible - any change to one database means a change across several dozen more.

But in this case I don't have a choice:

    "Each model requires exactly one field to have primary_key=True."

In an ideal world, my primary key field for the model above would be the 'tags' and 'content' concatenated. Last week, I re-opened a discussion about the possibility of creating concatenated primary keys on the Django users group - if you have any ideas, that'd be a great place for feedback.

In the meantime, we're just going to add an "id" column to this table - and add a 'unique_together' to the model's Meta class:

        class Meta:
            db_table = u'news_priority'
            unique_together = (("tags", "content"),)

max_length

Here's something to be on the lookout for: I found that I had to adjust the max_length values on every CharField - they were set to values 3x the length set for the varchar columns in my tables. I don't know if that's a bug in the inspectdb script - I haven't looked - or if the Django developers are just trying to keep us on our toes. Just something to be aware of.

Changing field types

In a few cases, fields that could have been booleans output as other field types:

    # type = SMALLINT, inserting '0' or '1' programmatically
    approved = models.IntegerField(null=True, blank=True)
    active = models.IntegerField(null=True, blank=True)

    # type = enum, values of '0' or '1'
    approved = models.CharField(max_length=3, blank=True)
    active = models.CharField(max_length=3, blank=True)

    approved = models.BooleanField(default=False)
    active = models.BooleanField(default=False)

I also found a few places where the script gave me TextFields when what I really wanted were CharFields:

    title = models.TextField(blank=True)
    title = models.CharField(max_length=255)

The lesson here: Go through your model classes with a fine-toothed comb and make sure they're in line with what's actually in your table definitions.

ForeignKey relationships

ForeignKey (and other) relationships have to be defined - inspectdb won't do that for you. You might need to shuffle the order of your model classes - the model you're creating a relationship with needs to have been defined already.

    class MediaType(models.Model):
        name = models.CharField(max_length=135)
        class Meta:
            db_table = u'media_type'

    class MediaContent(models.Model):
        media_name = models.CharField(max_length=255)
        type = models.ForeignKey(MediaType)
        class Meta:
            db_table = u'media_content'

Stripping "_id" off of some of the field names:

    class NewsContentType(models.Model):
        name = models.CharField(max_length=50)
        class Meta:
            db_table = u'news_content_type'

    class NewsContent(models.Model):
        title = models.CharField(max_length=255)
        source = models.CharField(max_length=255)
        body = models.TextField()
        content_type_id = models.ForeignKey(NewsContentType)

becomes:

        content_type = models.ForeignKey(CmsContentType)

And in some cases you might want to have a field default to a relationship with Django's User model:

    from django.contrib.auth.models import User

from:

        author_id = models.IntegerField(null=True, blank=True)
        approver_id = models.IntegerField(null=True, blank=True)

to:

        author = models.ForeignKey(User, editable=False)
        approver = models.ForeignKey(User, editable=False)

Neatening things up for admin presentation

Add __unicode__() methods and, in some cases, verbose names to the Meta classes:

    class NewsContent(models.Model):
        title = models.CharField(max_length=255)
        source = models.CharField(max_length=255)
        body = models.TextField()

        def __unicode__(self):
            return u'%s | %s' % (self.title, self.source)

        class Meta:
            db_table = u'news_content'
            verbose_name = 'Content'
            verbose_name_plural = 'Content'

Other field options

Verbose field names (optional, but if you use them they have to be the first argument passed in - see the documentation here):

    birthday = models.DateField("date of birth", db_column='Birthday') 

Other db changes

As I mentioned, we're trying to keep db changes to a minimum. At the same time, we are looking at this as an opportunity to tighten things up wherever the impact will be negligible.

We have a lot of tables that include user identifiers - things like author_id and approver_id. We're removing those - maintaining those columns with their existing user id values would have meant a data migration headache (either creating new admin users and reassigning their id's to match old data, or changing the old data to match the new user ids). Besides, transaction records with user_id's are saved to django_admin_log - on the rare occasion that we need to look for the source of a content change, we can find it there.

On the database side, I did find some tables with id columns that were not set as primary keys and didn't autoincrement, so that had to change.

One additional thing, and this is just a personal preference: As I've found changes I know I'll need to make to tables, I've been leaving detailed notes in the docstrings for each model. I'm working on a set of test databases now, but eventually we will need to do some migration work before we move this into production - those notes'll come in handy then.

Next up: the really fiddly stuff

At this point, I've got a lovely demo working on top of just one database. I've created a few test groups and user accounts with specific permissions - the idea is to allow clients limited access to their site content.

But we have more than one client, and the idea is to be able to run a single instance of the admin for all of them. So now we have to solve the multitenancy problem. In a perfect world, we'd have one set of model classes (maybe not even abstracts) and their corresponding admin classes, one master db handling auth and site data for all the rest. We need multi db. Or we need to be able to change db connections at runtime (yeah, I know, no).

It's starting to sound like Django might not be the right solution for a project of this scope, as much as I'd like it to be. But I'm pretty happy with what we've got so far - it'll be useful for other projects going forward.

So I'll let you know how the multitenancy thing goes - and if you have any ideas for how to approach this, by all means, feel free to chime in.