Aug 31, 2009
 

In parts 1 and 2 of this series, I discussed how to optimize XML Metadata Extraction in your Alfresco WCM environment, and how (and why) to disable permissions checking, respectively.  Besides those two optimizations, there are several other things that you can do to tweak an Alfresco System Receiver (ASR) for better performance, which I will discuss now.

First, optimize your JVM settings!  By default, Alfresco ships with a maximum heap size of 512MB.  Under high load, an ASR can fill a heap of that size, at which point the JVM spends more of its time collecting garbage and serves content more slowly.  So you definitely want to raise the maximum heap size as far as the machine's physical memory allows.  Also, depending on how you set your minimum and maximum heap sizes, you may want to set the size of the young generation (the part of the heap where new objects are allocated) using the -XX:NewSize command line option.  There are a few other JVM settings you can adjust as well, such as HotSpot pre-compilation and the thread stack size, amongst others.  See the JVM Tuning page on the Alfresco wiki for more details.  For my testing on my MacBook Pro running Alfresco on Windows XP via VMware Fusion, I used the following settings (in alfresco.bat):

set JAVA_OPTS=-Xms768m -Xmx1536m -Xss1m -XX:MaxPermSize=128m -Xcomp -Xbatch -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:NewSize=384m -XX:CMSInitiatingOccupancyFraction=80 -server
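
One way to confirm that heap settings like these actually reached the JVM is to ask the runtime itself.  The following is a minimal sketch and not part of Alfresco; the class name HeapCheck is made up for illustration, and it only reports on the JVM it runs in, so it would need to be started with the same options you pass to Alfresco.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of Alfresco): prints the heap limits and
// startup flags that the running JVM actually received.
class HeapCheck {
    // Converts a byte count to whole megabytes.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() roughly reflects -Xmx; totalMemory() is the heap reserved so far.
        System.out.println("Max heap (MB):     " + toMegabytes(rt.maxMemory()));
        System.out.println("Current heap (MB): " + toMegabytes(rt.totalMemory()));
        // The input arguments include options such as -Xms, -Xmx and -XX:NewSize.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + flags);
    }
}
```

If the reported maximum is still close to 512MB, the options were probably set in a script that is not the one starting your server.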

Speaking of virtualization, that reminds me.  Don’t virtualize!  For best performance, use the actual hardware!  Virtualization adds overhead.  I have not yet tested ASR performance without virtualization to be able to prove the impact that it has, but I will, and when I do, you can be sure that I’ll update this post, so come back sometime soon!

Anyway, now that you’ve tuned the JVM, what about the database?  Increasing the size of the database connection pool will enable your ASR to handle more concurrent users, provided the database itself is configured to accept at least that many connections.  Therefore, in custom-repository.properties in your extension directory (Alfresco 3.1 and earlier) or in alfresco-global.properties (Alfresco 3.2+), you can set the following:

db.pool.initial=10
db.pool.max=350

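In addition to increasing the database connection pool, you should increase the Tomcat thread pool as well.  This can be done by modifying the following element in alfresco/tomcat/conf/server.xml.  Note that an Executor only takes effect when a Connector references it by name through its executor attribute, so check that yours does: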

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
maxThreads="350" minSpareThreads="4"/>

Now that we’ve tuned some parts of the server, let’s turn off some things we don’t need.  For example, you should remove the Share web application (share.war) that is installed with Alfresco by default, as well as Web Studio (studio.war) and the mobile web application (mobile.war) if you’re running Alfresco Community.  Be sure to remove (or just move elsewhere) all of the unneeded .war files as well as their exploded directories.  The only application you need on an ASR is alfresco.war.  Additionally, you may want to download a clean copy of alfresco.war (making sure you have the correct version for your installation), and be sure NOT to apply the SharePoint Protocol AMP file to it.  No sense running another listener that no one will ever use!

Also, you typically want to lock down an ASR so that content is only accessed by your web application via the web scripts you’ve authored to expose your web content.  Therefore, you don’t need to be running file servers such as CIFS (for shared drives) and FTP.  You can disable both by setting the following in custom-repository.properties (3.1) or alfresco-global.properties (3.2):

ftp.enabled=false
cifs.enabled=false

Another very useful exercise is to optimize your web scripts.  For example, if you have a page in your web application that has three content areas managed by Alfresco, don’t have the web application call three separate web scripts, each of which returns only content related to that area.  Network hops are expensive, so they should be minimized to just one hop per page type.  In this scenario, performance would likely be improved by having the web application call a single web script per page type which returns all of the content to be displayed by that page at that time.
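
To make the difference concrete, here is a small sketch that counts round trips for the two approaches.  Everything in it is hypothetical: ContentRepository stands in for the ASR, each call to fetch represents one network hop to a web script, and the area names are invented.  It is an illustration of the idea rather than Alfresco code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: none of these names are Alfresco APIs.
class PageAssemblySketch {

    interface ContentRepository {
        // One call = one round trip; returns content keyed by area name.
        Map<String, String> fetch(List<String> areas);
    }

    // Fake repository that counts round trips instead of calling a web script.
    static class CountingRepository implements ContentRepository {
        final AtomicInteger hops = new AtomicInteger();

        @Override
        public Map<String, String> fetch(List<String> areas) {
            hops.incrementAndGet();
            Map<String, String> result = new LinkedHashMap<>();
            for (String area : areas) {
                result.put(area, "content for " + area);
            }
            return result;
        }
    }

    // What to avoid: one request per content area.
    static Map<String, String> loadPerArea(ContentRepository repo, List<String> areas) {
        Map<String, String> page = new LinkedHashMap<>();
        for (String area : areas) {
            page.putAll(repo.fetch(List.of(area)));
        }
        return page;
    }

    // Preferred: one request per page type that returns every area at once.
    static Map<String, String> loadPerPage(ContentRepository repo, List<String> areas) {
        return repo.fetch(areas);
    }

    public static void main(String[] args) {
        List<String> areas = List.of("header", "article", "sidebar");
        CountingRepository perArea = new CountingRepository();
        CountingRepository perPage = new CountingRepository();
        loadPerArea(perArea, areas);
        loadPerPage(perPage, areas);
        System.out.println("per-area hops: " + perArea.hops.get());
        System.out.println("per-page hops: " + perPage.hops.get());
    }
}
```

Both methods return the same content for the page; the only difference is how many times the web application has to cross the network to get it.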

That leads to the next point: build caching into your web application, or as a layer between your web application and Alfresco, and be sure to determine and implement a cache update strategy.  If the web application can simply hit a cache that is local to it while the cache is updated asynchronously via some other process, the web application will fly as compared to making calls to the ASR every single time a user views a page.
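
As a rough sketch of what such a cache could look like, the class below keeps content in a local map and reloads it on a background thread.  The names are hypothetical, the loader function stands in for a call to one of your ASR web scripts, and a fixed refresh interval is just one possible update strategy (invalidating on deployment would be another).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Illustrative sketch only: a local cache refreshed in the background.
// The loader function stands in for a call to an ASR web script.
class RefreshingCache implements AutoCloseable {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final ScheduledExecutorService refresher =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-refresher");
                t.setDaemon(true); // do not keep the JVM alive just for refreshes
                return t;
            });

    // refreshSeconds must be greater than zero.
    RefreshingCache(Function<String, String> loader, long refreshSeconds) {
        this.loader = loader;
        // Periodically reload every key that has been requested so far.
        refresher.scheduleWithFixedDelay(this::refreshAll,
                refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
    }

    // Page requests read from the local map; only a miss goes to the loader.
    String get(String key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Reloads all known keys; readers keep seeing the old value until it is replaced.
    void refreshAll() {
        for (String key : entries.keySet()) {
            try {
                entries.put(key, loader.apply(key));
            } catch (RuntimeException e) {
                // Keep serving the stale entry if the backend is unavailable.
            }
        }
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

With this shape, only the first request for a page pays for a call to the ASR; later requests are served locally, and visitors see content that is at most one refresh interval old.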

To conclude, here’s a handy list of steps you can take to optimize ASR performance, in what I estimate to be decreasing order of impact:

  1. Cache
  2. Turn permissions checking off
  3. Optimize your web scripts per page type on your web site
  4. Don’t virtualize
  5. Tune your JVM for performance
  6. Increase the database connection pool
  7. Increase the tomcat thread pool
  8. Remove/disable things you don’t need
    • CIFS Server
    • FTP Server
    • Share
    • Web Studio
    • Mobile
    • SharePoint Protocol (SPP) support
  9. Configure XML metadata extraction so that it occurs on your authoring server, not the ASR

If you have additional suggestions, please share them.  I’d love to hear them!

Aug 31, 2009
 

In my first post in this series, I discussed how and where to set up XML Metadata extraction in an Alfresco WCM environment to optimize Alfresco System Receiver (ASR) performance.  Though this is a useful optimization, it’s probably not the #1 most important thing you can optimize.  ASRs are designed to enable the retrieval of content, so optimizing that process is going to provide the most impactful results.

One of the major features Alfresco brings to the table as an ECM system is the ability to secure content by user, group, or role (or some combination thereof).  As such, when searches are carried out on behalf of a user, results are pruned by default by the Alfresco PermissionService, thus only returning results that the authenticated user has read access to.

In the case of an ASR though, content is typically retrieved by a single calling application using a single login.  This very common scenario does not require “pruning” of the content that the calling application is trying to retrieve.  Avoiding the unnecessary (and expensive) database calls for permission checking yields significant savings in response time, particularly as the result set grows in size.  Fortunately, Alfresco makes it very simple to turn off permission checking repository-wide via configuration.  Note that this should be done with extreme care – not all use cases are created the same, so be sure to fully evaluate your specific requirements before taking this action.

In my tests, the response time with permission checking turned off was 33% faster on average across comparative tests for 1, 2, 3, 4, 5, 10, 20, and 110 concurrent users.  The chart below reflects these numbers:

picture-2

To disable permission checking for your entire repository, simply rename “alfresco/tomcat/shared/classes/alfresco/extension/unsecured-public-services-security-context.xml.sample” to “alfresco/tomcat/shared/classes/alfresco/extension/unsecured-public-services-security-context.xml” in your installation.  Note that I found a defect in 3.1.1 Enterprise and 3.2 Community regarding this issue.  You will need to use the “unsecured-public-services-security-context.xml.sample” file attached to this issue: https://issues.alfresco.com/jira/browse/ETHREEOH-2604 instead of the one that the installer lays down for you.

Aug 28, 2009
 

In this first part of what I expect to be a few posts about Alfresco System Receiver (ASR) optimizations, I’ll talk specifically about XML Metadata Extractors for Alfresco Web Content Management (WCM).  So what is an XML metadata extractor, and why should you care about it?  Let’s put it in the context of a diagram:

xmlmetadataextractor

In the diagram above we see that the WCM authoring environment is configured with web forms.  This allows business users to enter content into the system, for example an article to be published.  To do so, the user does not have to be skilled in web-related technologies such as HTML; they simply fill out a form with the content to be published.  Once their content is entered, it is saved as XML, submitted to the staging sandbox, and ultimately deployed, in this case to an ASR.  The ASR is shown configured with an XML Metadata Extractor and a DM content model, which defines aspects that will be applied to the deployed content.  So for an article content type on the authoring side, there would be an article aspect defined in the content model on the ASR.  The XML metadata extractor is used to extract metadata from the deployed content (the article) and store it according to the aspect defined in the DM content model.  As such, the content delivered via the web form can be indexed by Lucene, enabling optimized search performance on retrieval.

The problem with this approach is that the ASR is likely serving a live production web site that may have thousands of visitors (or more):

xmlmetadataextractor-site

As such, it is less than ideal to have the ASR itself execute the processing required to extract metadata from the XML content.  Wouldn’t it be better if that were done on the authoring server?  You bet it would.  Hence the first ASR optimization: perform XML metadata extraction in the authoring environment:

xmlmetadataextractor-optimized

By configuring the XML metadata extraction to occur on the authoring environment, we save some cycles on the ASR, which is certainly a good thing if we’re using it to back a high traffic web site.

One last note: for assistance setting up XML metadata extraction for WCM, see this page on the Alfresco wiki.