
Oct 16, 2009

Fast content import

For one of our customers we implemented a prototype for importing data into Plone quickly. Instead of creating thousands of Archetypes-based objects, we create only catalog brains; the real objects are created later, on demand, by users. The difference is huge.

The task was simple: import 36,000 documents into Plone (120,000 objects in total, counting all their children). Creating them all with invokeFactory would take far too long (approx. 14 hours). Why not create just the brains and let users decide which documents to migrate fully? That is what we did.

We started with an import script that creates brains in portal_catalog by calling its catalog_object method directly:

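from Products.CMFCore.utils import getToolByName
pc = getToolByName(self.context, 'portal_catalog')
# the brain is stored in the catalog under this path (its uid)
pc.catalog_object(dummy_object, dummy_object.path)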

dummy_object is a simple Python object carrying all the metadata we need later for portal_catalog queries:

(Pdb) pp dummy_object.__dict__
{'title': u'dummy title',
 'id': u'simple_id',
 'review_state': 'private',
 'path': '/plone/importfolder/ToBeMigrated_simple_id'}
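The post does not show how the dummy objects are built. Below is a minimal sketch, assuming each source record is a dict with hypothetical `id` and `title` keys; `DummyObject`, `make_dummy`, `import_records` and the in-memory `FakeCatalog` are illustrative names, not Plone APIs. Inside Plone, `catalog` would be the `portal_catalog` tool, and its `catalog_object(obj, uid)` method, as used above, would do the real indexing.

```python
# Sketch only: DummyObject, make_dummy, import_records and FakeCatalog are
# hypothetical names. In Plone, `catalog` would be the portal_catalog tool.

IMPORT_FOLDER = '/plone/importfolder'
PREFIX = 'ToBeMigrated_'


class DummyObject(object):
    """Plain attribute holder; the catalog reads its attributes when indexing."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def make_dummy(record):
    """Build one dummy object from a source record (assumed to be a dict)."""
    return DummyObject(
        title=record['title'],
        id=record['id'],
        review_state='private',
        # The prefixed id in the path places the brain inside the import folder.
        path='%s/%s%s' % (IMPORT_FOLDER, PREFIX, record['id']),
    )


def import_records(catalog, records):
    """Catalog one dummy per record, using its path as the catalog uid."""
    count = 0
    for record in records:
        dummy = make_dummy(record)
        catalog.catalog_object(dummy, dummy.path)
        count += 1
    return count


class FakeCatalog(object):
    """In-memory stand-in for portal_catalog, to try the sketch outside Plone."""

    def __init__(self):
        self.entries = {}

    def catalog_object(self, obj, uid):
        # Keep a snapshot of the object's attributes under its uid.
        self.entries[uid] = dict(obj.__dict__)


catalog = FakeCatalog()
import_records(catalog, [{'id': u'simple_id', 'title': u'dummy title'}])
print(catalog.entries['/plone/importfolder/ToBeMigrated_simple_id']['title'])
# dummy title
```

Since the dummy is only an attribute holder, every field needed later for catalog queries has to be set on it at this point.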

After running the script (it took 19 minutes instead of the estimated 14 hours) we had all 36,000 brains in portal_catalog, with the proper indexes and metadata updated. Next, users needed to be able to see them, so we created a traverser for our importfolder:

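from zope.component import getMultiAdapter

def __bobo_traverse__(self, REQUEST, name):
    # ids of documents that are not migrated yet carry the 'ToBeMigrated' prefix
    if name.startswith('ToBeMigrated'):
        view = getMultiAdapter((self, REQUEST), name='to_be_migrated')
        view.setBrainId(name)
        return view
    # everything else is traversed normally
    return super(ImportFolder, self).__bobo_traverse__(REQUEST, name)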

which returns a simple BrowserView that knows the id of the requested brain. The view's template tells the end user that this document still needs to be migrated; if the user decides to import it, the system starts the import process for that brain id.
The rest is the standard Plone folder_contents view, which lists our 'dummy' brains in importfolder because the path stored in each brain points into that folder. Simple and much faster ;-)
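The post leaves the view and the migration step abstract. The plain-Python model below sketches the whole flow under stated assumptions: stub names use the `ToBeMigrated_` prefix seen in the paths above, dicts stand in for `portal_catalog` and the folder, and `ImportFolderModel`, `ToBeMigratedView.migrate` and the `load_source` callable are hypothetical. In Plone the real object would presumably be created with `invokeFactory` and the stub removed with the catalog's `uncatalog_object`, but that is an assumption, not the authors' code.

```python
# Plain-Python model of traverse -> placeholder view -> migrate.
# All names here are hypothetical stand-ins, not Zope/Plone APIs.

PREFIX = 'ToBeMigrated_'
FOLDER = '/plone/importfolder'


class ToBeMigratedView(object):
    """Stand-in for the 'to_be_migrated' browser view."""

    def __init__(self, folder):
        self.folder = folder
        self.brain_id = None

    def setBrainId(self, name):
        self.brain_id = name

    def migrate(self, load_source):
        """Create the real object for this stub and drop the stub brain."""
        real_id = self.brain_id[len(PREFIX):]
        stub_uid = '%s/%s' % (FOLDER, self.brain_id)
        stub = self.folder.catalog[stub_uid]  # KeyError if there is no stub
        # Full source data (hypothetical loader) plus the stub's metadata.
        obj = dict(load_source(real_id), id=real_id, title=stub['title'])
        self.folder.children[real_id] = obj
        # Index the real object under its own path, then remove the stub so
        # the folder listing does not show the document twice.
        self.folder.catalog['%s/%s' % (FOLDER, real_id)] = {
            'id': real_id, 'title': obj['title']}
        del self.folder.catalog[stub_uid]
        return obj


class ImportFolderModel(object):
    """Models only the traversal hook of the import folder."""

    def __init__(self, catalog):
        self.catalog = catalog  # dict uid -> metadata (portal_catalog stand-in)
        self.children = {}      # real, already migrated objects by id

    def traverse(self, name):
        # Stub names have no real object behind them: return the view.
        if name.startswith(PREFIX):
            view = ToBeMigratedView(self)
            view.setBrainId(name)
            return view
        # Everything else falls through to normal lookup.
        return self.children[name]


folder = ImportFolderModel({
    '/plone/importfolder/ToBeMigrated_simple_id':
        {'id': 'simple_id', 'title': 'dummy title', 'review_state': 'private'},
})
view = folder.traverse('ToBeMigrated_simple_id')
view.migrate(lambda doc_id: {'text': 'full text of ' + doc_id})
print(sorted(folder.catalog))                  # ['/plone/importfolder/simple_id']
print(folder.traverse('simple_id')['title'])   # dummy title
```

Only the name prefix decides which branch is taken, so the prefix should be one that real content ids never use; otherwise a real document would be shadowed by the placeholder view.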