Aug 28, 2009

In this first part of what I expect to be a few posts about Alfresco System Receiver (ASR) optimizations, I’ll talk specifically about XML Metadata Extractors for Alfresco Web Content Management (WCM).  So what is an XML metadata extractor, and why should you care about it?  Let’s put it in the context of a diagram:


In the diagram above, the WCM authoring environment is configured with web forms. These allow business users to enter content into the system, such as an article to be published. The user does not need to be skilled in web-related technologies such as HTML; they simply fill out a form with the content to be published. Once entered, the content is saved as XML, submitted to the staging sandbox, and ultimately deployed, in this case to an ASR. The ASR is configured with an XML metadata extractor and a DM content model, which defines the aspects that will be applied to the deployed content. So for an article content type on the authoring side, there would be a corresponding article aspect defined in the content model on the ASR. The XML metadata extractor pulls field values out of the deployed XML (the article) and stores them as properties of the aspect defined in the DM content model. As a result, the content entered via the web form can be indexed by Lucene, so it can be searched efficiently when it is retrieved.
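To make the extraction step more concrete, here is a minimal Java sketch of what an XML metadata extractor does conceptually: evaluate one XPath expression per field of the web form XML and map each result to a property of the aspect. It uses only the JDK's standard `javax.xml` APIs and is not Alfresco's own extracter API; the `art:` property names, the `/article/...` element paths, and the `ArticleMetadataExtractor` class are hypothetical placeholders for whatever your web form and DM content model actually define.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ArticleMetadataExtractor {

    // Hypothetical mapping: aspect property name -> XPath into the web form XML.
    // The "art:" prefix and element names are illustrative, not from a real model.
    private static final Map<String, String> PROPERTY_XPATHS = new LinkedHashMap<>();
    static {
        PROPERTY_XPATHS.put("art:title", "/article/title");
        PROPERTY_XPATHS.put("art:author", "/article/author");
        PROPERTY_XPATHS.put("art:summary", "/article/summary");
    }

    // Parses the XML once, then evaluates each XPath and collects non-empty values.
    public static Map<String, String> extract(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so deployed XML cannot pull in external entities.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : PROPERTY_XPATHS.entrySet()) {
            // evaluate() returns "" when the element is missing.
            String value = xpath.evaluate(entry.getValue(), doc).trim();
            if (!value.isEmpty()) {
                properties.put(entry.getKey(), value);
            }
        }
        return properties;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<article><title>Hello ASR</title><author>jdoe</author>"
                + "<body>Some text</body></article>";
        System.out.println(extract(xml));
    }
}
```

Running `main` prints `{art:title=Hello ASR, art:author=jdoe}`; the missing `art:summary` field is skipped rather than stored as an empty string. Note that every deployed document has to be parsed and queried like this, which is exactly the work the rest of this post is concerned with.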

The problem with this approach is that the ASR is likely serving a live production web site that may have thousands (or more) of visitors:


As such, it is less than ideal for the ASR itself to perform the processing required to extract metadata from the XML content. Wouldn’t it be better if that were done on the authoring server? You bet it would. Hence the first ASR optimization: perform XML metadata extraction in the authoring environment.

By configuring XML metadata extraction to run in the authoring environment, we save processing cycles on the ASR, which is certainly a good thing if the ASR is backing a high-traffic web site.
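The shift in where the work happens can be sketched in a few lines of Java. This is an illustration of the idea under stated assumptions, not how Alfresco's deployment service is actually implemented: `DeploymentPayload`, `prepareForDeployment`, `receive`, and the in-memory stores are all hypothetical, and the toy `extract` method (a simple substring lookup for `<title>`) stands in for a real XPath-based extractor.

```java
import java.util.HashMap;
import java.util.Map;

public class AuthoringSideExtraction {

    // Hypothetical payload sent from authoring to the ASR: the raw XML plus
    // properties that were already extracted on the authoring server.
    record DeploymentPayload(String path, String xml, Map<String, String> properties) {}

    // Counts extraction runs, purely to show where the work happens.
    static int extractionCount = 0;

    // Toy stand-in for a real extractor: pulls the text between <title> tags.
    static Map<String, String> extract(String xml) {
        extractionCount++;
        Map<String, String> props = new HashMap<>();
        int start = xml.indexOf("<title>");
        int end = xml.indexOf("</title>");
        if (start >= 0 && end > start) {
            props.put("art:title", xml.substring(start + "<title>".length(), end));
        }
        return props;
    }

    // Authoring server: extract once, before the content is deployed.
    static DeploymentPayload prepareForDeployment(String path, String xml) {
        return new DeploymentPayload(path, xml, Map.copyOf(extract(xml)));
    }

    // Hypothetical ASR-side stores for content and aspect properties.
    static final Map<String, String> contentStore = new HashMap<>();
    static final Map<String, Map<String, String>> propertyStore = new HashMap<>();

    // Receiver (ASR): store content and apply precomputed properties; no XML parsing.
    static void receive(DeploymentPayload payload) {
        contentStore.put(payload.path(), payload.xml());
        propertyStore.put(payload.path(), payload.properties());
    }

    public static void main(String[] args) {
        DeploymentPayload payload = prepareForDeployment(
                "/www/article1.xml", "<article><title>Hello ASR</title></article>");
        receive(payload);
        System.out.println(propertyStore.get("/www/article1.xml"));
        System.out.println(extractionCount);
    }
}
```

Running `main` prints `{art:title=Hello ASR}` followed by `1`: the single extraction happened on the authoring side, and `receive` only stores what it is given. The trade-off is that the authoring server now pays the parsing cost, which is usually acceptable because it serves a handful of content authors rather than the public.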

One last note: for assistance setting up XML metadata extraction for WCM, see this page on the Alfresco wiki.
