Posted: May 25th, 2008

Blog updates for App Engine, Caching data and non-normalized database work.

I made a number of improvements to this blog engine over the last couple of weeks. The most notable was a performance improvement: pre-rendering and caching HTML. There has been a lot of discussion and movement in the web architecture community toward non-traditional database designs: not normalizing data, storing calculated values, and even moving to non-relational databases. The motivation is that Internet-scale databases built on transactional systems seem fraught with problems and force you to build around the database instead of around the needs of the web app. The Google App Engine article on High Scalability gives a lot of great insight into App Engine, much of which I had been discovering separately.

To take advantage of the improvements the Google App Engine Datastore makes possible, you have to start doing things differently. Here is what I did to build a better App Engine based app:

  • OnChange Handler: Since we are often pre-rendering and caching HTML, we need to hook into the events that signal when a change is needed. I dug into the Google Datastore API and did some reverse engineering to figure out how to detect when something is changing. What I came up with is a base model to inherit from, which lets me write OnChange methods per attribute.
    from google.appengine.ext import db
    from google.appengine.ext.db import Model as DBModel
    
    class BaseModel(db.Model):
        def __init__(self, parent=None, key_name=None, _app=None, **kwds):
            self.__isdirty = False
            # pass the caller's arguments through instead of overwriting them with None
            DBModel.__init__(self, parent=parent, key_name=key_name, _app=_app, **kwds)
        
        def __setattr__(self,attrname,value):
            """
            The Datastore API stores each property value under a leading
            underscore (for example "email" is stored in "_email"), so we
            intercept attribute sets, check whether the value has changed, and
            call the matching <property>_onchange method if one exists.
            """
            if not attrname.startswith('_'):
                if hasattr(self,'_' + attrname):
                    curval = getattr(self,'_' + attrname)
                    if curval != value:
                        self.__isdirty = True
                        if hasattr(self,attrname + '_onchange'):
                            getattr(self,attrname + '_onchange')(curval,value)
            
            DBModel.__setattr__(self,attrname,value)
      
    
    Then, on my Entry class (blog entry), I implemented a method that runs whenever an entry changes from published to draft or back, and updates the archive.
    class Entry(BaseModel):
        author = db.UserProperty()
        blog = db.ReferenceProperty(Blog)
        published = db.BooleanProperty(default=False)
        content = db.TextProperty(default='')
        # more cut out
        
        def published_onchange(self,curval,newval):
            """
            Gets called every time published status changes
            """
            if self.entrytype == 'post':
                my = self.date.strftime('%b-%Y') # May-2008
                archive = Archive.all().filter('monthyear',my).fetch(10)
                if curval == False and newval == True:
                    # add to archive
                    if archive == []: # new month
                        archive = Archive(blog=self.blog,monthyear=my)
                    else: 
                        archive = archive[0]
                    archive.entrycount += 1
                    archive.put()
                    self.blog.entrycount += 1
                else:
                    # remove from archive
                    if archive and archive[0]:
                        archive = archive[0]
                        archive.entrycount -= 1
                        if archive.entrycount == 0:
                            archive.delete()
                        else:
                            archive.put()
                    self.blog.entrycount -= 1
                
                self.blog.save()
      
    
  • Pre-Render HTML: In the previous example we are storing some tables (Archive) that exist purely for convenience. They are never queried at request time; in fact the Archive table is itself rendered into HTML and stored again. Here is an admin entry "POST" handler that updates the entry and then rebuilds the cache. The cache is actually a Blog entity that is used on every single request. It sits halfway between the cache, configuration, and context of a normal web app.
    import os
    from google.appengine.ext.webapp import template
    
    class AdminEntry(BaseController):
        def post(self,entrytype='post',key=None):
            #update entry from form post
            entry.save()
            rebuild_cache(self.blog)
    
    def rebuild_cache(blog):
        """
        Pre-renders and caches HTML in the blog object. Nothing in here
        changes very often, so we can update it at the point of change.
        """
        pages = Entry.all().filter("entrytype =", "page").filter("published =", True).fetch(20)
        archives = Archive.all().order('-date').fetch(10)
        recententries = Entry.all().filter('entrytype =','post').filter("published =", True).order('-date').fetch(10)
        links = Link.all().filter('linktype =','blogroll')
        template_vals = {'recententries':recententries,'pages':pages,
                'links':links,'archives':archives}
    
        path = os.path.join(os.path.dirname(__file__), 'views/sidebar.html')
        blog.sidebar = template.render(path, template_vals)
        path = os.path.join(os.path.dirname(__file__), 'views/topmenu.html')
        blog.topmenu = template.render(path, template_vals) 
        blog.save()
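
To see the two ideas above working together without the Datastore, here is a minimal plain-Python sketch. It is not the blog engine's code: ChangeTracked, Post, render_sidebar, and the sidebar markup are names made up for illustration, and a plain object stands in for the Blog entity. It only assumes the same convention as BaseModel, namely that setting an attribute to a new value calls a matching <attribute>_onchange method if one exists.

```python
class ChangeTracked(object):
    """Hypothetical stand-in for BaseModel: calls <attr>_onchange hooks."""

    def __init__(self, **values):
        # Set initial values directly so no hooks fire during construction.
        for name, value in values.items():
            object.__setattr__(self, name, value)
        object.__setattr__(self, "_dirty", False)

    def __setattr__(self, name, value):
        # Only public attributes that already exist are watched for changes.
        if not name.startswith("_") and name in self.__dict__:
            old = self.__dict__[name]
            if old != value:
                object.__setattr__(self, "_dirty", True)
                hook = getattr(self, name + "_onchange", None)
                if callable(hook):
                    hook(old, value)  # called before the new value is stored
        object.__setattr__(self, name, value)


class Blog(ChangeTracked):
    """Holds the denormalized count and the pre-rendered HTML fragment."""


class Post(ChangeTracked):
    def published_onchange(self, old, new):
        # Keep the stored count in step, then re-render the cached HTML.
        self.blog.entrycount += 1 if new else -1
        render_sidebar(self.blog)


def render_sidebar(blog):
    # Rendering happens at write time; requests would just read blog.sidebar.
    blog.sidebar = "<p>%d published posts</p>" % blog.entrycount


blog = Blog(entrycount=0, sidebar="")
post = Post(blog=blog, published=False)
post.published = True   # fires published_onchange(False, True)
print(blog.sidebar)
```

The trade-off is the same one as in the real app: writes get a little more expensive because they do the rendering, and reads become a single attribute lookup. Assigning the same value again does nothing, since the hook only fires when the old and new values differ.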
        
    

These tricks made writing for App Engine even easier by moving toward an event-driven model that updates the cache at the point of change. You can find all the code for this app in the GitHub code repository.