Customizing a Django RSS Feed

Mar 09, 2015 Django Django1.3

The Framework: Django 1.3 (not my choice!)

The Mission: Create an RSS feed that includes an extra field for full story content, plus a few additional fields for images

I know. It's 2015 and we're still on 1.3. But the changes to Django syndication since 1.3 haven't been that dramatic, so you might find this useful if you're in the same boat.

ADDING CUSTOM FIELDS

Let's start with the methods needed to add some custom content to a <content:encoded> element.

The Django feed library comes with a set of standard elements for which you must define the content: <title>, <link>, and <description> for the feed, and then of course <title>, <link>, and <description> for individual feed items.

In our use case, we have a feed containing a list of news stories. We're already sending a truncated version of each story's content to <description>, but we want to add an additional field - <content:encoded> - to return the story's full content.

To add an additional element (or two or three), there are a few places you'll need to update - two (possibly three) standard feed methods and whatever custom method(s) you need to populate the new elements.

In this code sample, follow the trail from item_extra_kwargs() to item_your_custom_field() to add_item_elements().

from django.contrib.syndication.views import Feed
from django.utils.feedgenerator import Rss201rev2Feed

class ExtendedRSSFeed(Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        # Because I'm adding a <content:encoded> field, I first need to declare
        # the content namespace. For more information on how this works, check
        # out: http://validator.w3.org/feed/docs/howto/declare_namespaces.html
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)

        # 'content_encoded' is added to the item below, in item_extra_kwargs()
        # It's populated in item_your_custom_field(). Here we're creating
        # the <content:encoded> element and adding it to our feed xml
        if item['content_encoded'] is not None:
            handler.addQuickElement(u'content:encoded', item['content_encoded'])

    ...

class YourFeed(Feed):
    feed_type = ExtendedRSSFeed

    ....

    def item_extra_kwargs(self, item):
        # This is probably the first place you'll add a reference to the new
        # content. Start by calling the parent class's method, then add your
        # extra field and call the method you'll use to populate it.
        extra = super(YourFeed, self).item_extra_kwargs(item)
        extra.update({'content_encoded': self.item_your_custom_field(item)})
        return extra
    
    def item_your_custom_field(self, item):
        # This is your custom method for populating the field.
        # Name it whatever you want, so long as it matches what
        # you're calling from item_extra_kwargs().
        # What you do here is entirely dependent on what your
        # system looks like. I'm using a simple queryset example,
        # but this is not to be taken literally.
        obj_id = item['my_item_id']
        query_obj = MyStoryModel.objects.get(pk=obj_id)
        # Model instances expose their fields as attributes, not dict keys
        full_text = query_obj.full_story_content
        return full_text

This generates a feed that looks something like this:

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>News List</title>
    <link>http://www.example.com/</link>
    <description>List Description</description>
    <item>
      <title>News List Story 1</title>
      <link>http://www.example.com/news/story-one/</link>
      <description>This is the story description.</description>
      <content:encoded>This is the story description. This is the full content of the story. (Not a very exciting story, I know.)</content:encoded>
    </item>
  </channel>
</rss>
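If you want a quick sanity check that the namespace declaration and the new element line up, here's a minimal sketch using only the standard library. It's written for Python 3 rather than the Python 2 that Django 1.3 ran on, and the feed string is a trimmed-down, made-up sample, not real output from my feed.

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical feed in the same shape as the sample above.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>News List</title>
    <item>
      <title>News List Story 1</title>
      <content:encoded>Full story goes here.</content:encoded>
    </item>
  </channel>
</rss>"""

# Map the prefix to the same URI declared in root_attributes().
NAMESPACES = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Parse from bytes so the encoding declaration is honored.
root = ET.fromstring(SAMPLE_FEED.encode("utf-8"))
element = root.find("channel/item/content:encoded", NAMESPACES)
full_story = element.text

print(full_story)
```

If the xmlns:content attribute were missing from the root element, the parse itself would fail with an unbound prefix error, which is a handy way to catch a forgotten root_attributes() override.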

My actual use case called for me to extend from a feed that already existed, leaving that original feed intact and only including the new element in the new feed. Here's how you'd do that:

class YourFeed(Feed):
    feed_type = ExtendedRSSFeed

    ....

    def item_extra_kwargs(self, item):
        extra = super(YourFeed, self).item_extra_kwargs(item)
        extra.update({'content_encoded': self.item_your_custom_field(item)})
        return extra
    
    def item_your_custom_field(self, item):
        return None

class YourNewFeed(YourFeed):

    def item_your_custom_field(self, item):
        ...
        return full_text

So in the original feed, 'content_encoded' comes back as None and <content:encoded> never appears. It is only generated for the new feed.

CDATA?

The customer requesting this new feed actually asked for HTML wrapped in a CDATA section. I never did figure out how to do that with the Django syndicator alone. The CDATA tag always came out encoded, and there didn't seem to be any way around that. Every blog post I found led me back to this old bug ticket - https://code.djangoproject.com/ticket/15936 - which suggests just ditching the CDATA section and letting Django handle the encoding. I tried that, but it didn't pass the W3C feed validator - more on that later.
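To see the CDATA problem without spinning up Django, here's a minimal sketch using only the standard library. It assumes that xml.sax.saxutils.XMLGenerator, which Django's feed handler is built on, treats text the same way the handler in add_item_elements() does. It's written for Python 3, and render_content_encoded() is a made-up helper name, not something from Django.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def render_content_encoded(text):
    """Write text inside a <content:encoded> element and return the XML."""
    out = StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    handler.startElement("content:encoded", {})
    # characters() escapes &, < and > before writing, so markup passed
    # in as text never reaches the output as real markup.
    handler.characters(text)
    handler.endElement("content:encoded")
    return out.getvalue()


cdata_attempt = render_content_encoded("<![CDATA[<p>Full story</p>]]>")
print(cdata_attempt)
```

The opening of the CDATA marker comes out as &lt;![CDATA[ rather than as a real CDATA section, which matches what I kept running into.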

CUSTOM TEMPLATES

One of the things we tried along the way was a custom template for the CDATA content. That didn't work for creating a CDATA section, as ultimately there was no way to prevent the tag from being encoded. But I didn't find many clear posts about how to do this so I thought I'd share an outline of the attempt here:

from django.template import loader, Context, TemplateDoesNotExist

...

    def item_extra_kwargs(self, item):
        extra = super(ListDetailRSS, self).item_extra_kwargs(item)

        # Define a template - give it any name, the one below is just an example.
        # The path will obviously depend on your settings.
        content_encoded_template = 'feeds/list_detail_content_encoded.html'
        try:
            # Use the Django template loader to get the template
            content_encoded_tmp = loader.get_template(content_encoded_template)
            # Set the field value as template context
            content_encoded = content_encoded_tmp.render(
                        Context({'myobj': self.item_your_custom_field(item)}))
            # Then update your extra kwargs with the rendered template
            # instead of the original value returned from your custom method
            extra.update({'content_encoded': content_encoded})
        except TemplateDoesNotExist:
            # And if you don't have a template, just use the content as
            # returned from your custom method
            extra.update({'content_encoded': self.item_your_custom_field(item)})

        return extra

Your template can be as simple as this:

    {{ myobj }}

This can be useful if you want to customize your value by wrapping some text around it or maybe apply template filters before it's rendered.

WRAPPING ELEMENTS INSIDE OTHER ELEMENTS

Our new <content:encoded> element is supposed to have a few other fields inside it. What I'm showing you here ultimately didn't work for us (see the encoding section below), but I did learn a thing or two about how to wrap elements inside other elements in ways that aren't covered in the Django documentation (I should get on adding that, right?).

    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)

        if item['content_encoded'] is not None:
            # <content:encoded> is going to wrap around some other elements,
            # so instead of using handler.addQuickElement() we're going to
            # use startElement() (and then end it later)
            handler.startElement(u"content:encoded", {})

            # handler.characters() fills in content between the tags, e.g.:
            # <content:encoded>This is where the content goes.</content:encoded>
            handler.characters(item['content_encoded'])

            # And close the element, ba-bam.
            handler.endElement(u"content:encoded")

If you wanted to apply attributes to the element itself, that empty dict you set at startElement() would look like this instead:

    if item['content_encoded'] is not None:
        handler.startElement(u"content:encoded", {'my-attribute': 'my-value'})

And here's the wrapping around other elements bit:

    if item['content_encoded'] is not None:
        handler.startElement(u"content:encoded", {})
        handler.characters(item['content_encoded'])

        # Suppose we have a photo to go along with this story
        if item['media'] is not None:

            handler.startElement(u'figure', {'type': 'image/jpeg'})
            handler.startElement(u'image', {
                'src': item['media']['src'],
                'caption': item['media']['caption']
            })
            handler.endElement(u'image')
            handler.endElement(u'figure')

        handler.endElement(u"content:encoded")
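If you'd like to see what that nesting produces without a Django project handy, here's a rough standalone sketch. It swaps in the standard library's XMLGenerator for the handler Django passes to add_item_elements() (an assumption on my part that the two behave alike for these calls), runs on Python 3, and uses a made-up render_nested() helper and sample values.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def render_nested(text, media=None):
    """Write <content:encoded> with an optional nested <figure><image>."""
    out = StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    handler.startElement("content:encoded", {})
    handler.characters(text)
    if media is not None:
        # Attributes are passed as a plain dict on startElement().
        handler.startElement("figure", {"type": "image/jpeg"})
        handler.startElement("image", {
            "src": media["src"],
            "caption": media["caption"],
        })
        # Close elements in the reverse order they were opened.
        handler.endElement("image")
        handler.endElement("figure")
    handler.endElement("content:encoded")
    return out.getvalue()


nested_xml = render_nested(
    "Story text.",
    {"src": "/media/photo.jpg", "caption": "Test Photo"},
)
plain_xml = render_nested("Story text.")
print(nested_xml)
```

Here the <figure> and <image> tags are real child elements, not escaped text, which is exactly the difference from the string approach in the next section.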

ENCODING - OR, DOUBLE ENCODING

Back to that old bug ticket. We ultimately decided to follow the sage advice to forget about CDATA, even though the suggested code didn't work exactly as described (whether that's because of our old version of Django, or our version customizations, I don't know, but I never had time to research it).

Instead, we had to ... double encode? Or rather, escape, then let Django encoding do its thing.

After all that work to wrap elements one inside the other, our feed still wasn't validating. So instead of creating them as elements, we just converted the tags to strings:

    if item['content_encoded'] is not None:
        handler.startElement(u"content:encoded", {})
        handler.characters(item['content_encoded'])

        if item['media'] is not None:
            figure = '<figure type="image/jpeg">'
            figure += '<image src="%s" caption="%s"></image>' % \
                    (item['media']['src'], item['media']['caption'])
            figure += '</figure>'
            # Don't forget to stick that string in the middle of
            # the <content:encoded> element:
            handler.characters(figure)

        handler.endElement(u"content:encoded")

<content:encoded> &lt;p&gt;&amp;lt;p&amp;gt;This is the story description.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;This is the full content of the story.&amp;lt;/p&amp;gt; &amp;lt;p&amp;gt;(Not a very exciting story, I know.)&amp;lt;/p&amp;gt;&lt;/p&gt; &amp;lt;figure type="image/jpeg"&amp;gt; &amp;lt;image src="/media/photos/2014/05/07/050714.jpg" caption="Test Photo"&amp;gt; &amp;lt;/figure&amp;gt; </content:encoded>

Ugly, yes, but it almost worked. At least it failed in a different way.

After some trial and error, I found that I had to do some XML-specific escaping. I wound up using the escape() function from xml.sax.saxutils, applying it to the story content as it was returned from my custom method and to the string for that <figure> tag inside <content:encoded>.

from xml.sax.saxutils import escape

...

    def add_item_elements(self, handler, item):

        ...

        if item['content_encoded'] is not None:
            handler.startElement(u"content:encoded", {})
            handler.characters(item['content_encoded'])

            # Suppose we have a photo to go along with this story
            if item['media'] is not None:

                figure = '<figure type="image/jpeg">'
                ...
                handler.characters(escape(figure))

            handler.endElement(u"content:encoded")

    ...

class YourNewFeed(YourFeed):

    def item_your_custom_field(self, item):
        ...
        return escape(full_text)

What that returns looks slightly uglier. But guess what? It validates.

<content:encoded> &amp;lt;p&amp;gt;&amp;amp;lt;p&amp;amp;gt;This is the story description.&amp;amp;lt;/p&amp;amp;gt; &amp;amp;lt;p&amp;amp;gt;This is the full content of the story.&amp;amp;lt;/p&amp;amp;gt; &amp;amp;lt;p&amp;amp;gt;(Not a very exciting story, I know.)&amp;amp;lt;/p&amp;amp;gt;&amp;lt;/p&amp;gt; &amp;lt;figure type="image/jpeg"&amp;gt; &amp;lt;image src="/media/photos/2014/05/07/050714.jpg" caption="Test Photo"&amp;gt; &amp;lt;/figure&amp;gt; </content:encoded>
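The difference between the two outputs is easier to see in isolation. This is a small sketch, assuming the standard library's XMLGenerator escapes text the same way Django's handler does. It's Python 3, and write_text() is a hypothetical helper, not part of the feed code.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator, escape


def write_text(text):
    """Return text wrapped in <content:encoded>, escaped by the handler."""
    out = StringIO()
    handler = XMLGenerator(out, encoding="utf-8")
    handler.startElement("content:encoded", {})
    handler.characters(text)
    handler.endElement("content:encoded")
    return out.getvalue()


figure = '<figure type="image/jpeg"></figure>'

# Escaped once: only the handler's own escaping is applied.
single = write_text(figure)

# Escaped twice: escape() first, then the handler escapes the
# ampersands that escape() introduced.
double = write_text(escape(figure))

print(single)
print(double)
```

In the first result the tag starts with &lt;figure, and in the second it starts with &amp;lt;figure, which is the pattern in the validating output above.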

THE COMPLETE PICTURE

from xml.sax.saxutils import escape

from django.contrib.syndication.views import Feed
from django.utils.feedgenerator import Rss201rev2Feed

class ExtendedRSSFeed(Rss201rev2Feed):

    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)

        if item['content_encoded'] is not None:

            handler.startElement(u"content:encoded", {})
            handler.characters(item['content_encoded'])

            if item['media'] is not None:
                figure = '<figure type="image/jpeg">'
                figure += '<image src="%s" caption="%s"></image>' % \
                        (item['media']['src'], item['media']['caption'])
                figure += '</figure>'
                handler.characters(escape(figure))

            handler.endElement(u"content:encoded")

    ...

class YourFeed(Feed):
    feed_type = ExtendedRSSFeed

    ....

    def item_extra_kwargs(self, item):
        extra = super(YourFeed, self).item_extra_kwargs(item)
        extra.update({'content_encoded': self.item_your_custom_field(item)})
        extra.update({'media': self.item_your_custom_media_field(item)})
        return extra
    
    def item_your_custom_field(self, item):
        return None

    def item_your_custom_media_field(self, item):
        return None

class YourNewFeed(YourFeed):

    def item_your_custom_field(self, item):
        obj_id = item['my_item_id']
        query_obj = MyStoryModel.objects.get(pk=obj_id)
        full_text = query_obj.full_story_content
        return escape(full_text)

    def item_your_custom_media_field(self, item):
        obj_id = item['my_item_id']
        query_obj = MyStoryModel.objects.get(pk=obj_id)
        photo = query_obj.photo.url
        caption = query_obj.photo.caption
        return {'src': photo, 'caption': caption}

<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>News List</title>
    <link>http://www.example.com/</link>
    <description>List Description</description>
    <atom:link href="http://www.example.com/" rel="self"></atom:link>
    <item>
      <guid isPermaLink="false">http://www.example.com/guid/</guid>
      <title>News List Story 1</title>
      <link>http://www.example.com/news/story-one/</link>
      <description>This is the story description.</description>
      <content:encoded> &amp;lt;p&amp;gt;&amp;amp;lt;p&amp;amp;gt;This is the story description.&amp;amp;lt;/p&amp;amp;gt; &amp;amp;lt;p&amp;amp;gt;This is the full content of the story.&amp;amp;lt;/p&amp;amp;gt; &amp;amp;lt;p&amp;amp;gt;(Not a very exciting story, I know.)&amp;amp;lt;/p&amp;amp;gt;&amp;lt;/p&amp;gt; &amp;lt;figure type="image/jpeg"&amp;gt; &amp;lt;image src="/media/photos/2014/05/07/050714.jpg" caption="Test Photo"&amp;gt; &amp;lt;/figure&amp;gt; </content:encoded>
    </item>
  </channel>
</rss>