svnpubsub for Confluence sites

Anyone on a PMC at Apache likely saw the message sent out about a month ago from infrastructure@ announcing that all project websites (and dist areas) must change over to svnpubsub by the end of the year. If you missed the email, it’s also nicely mentioned on the project-site page. For many of the newer projects, using the Apache CMS is definitely the easiest way to achieve that. However, for some of the older projects, this was a bit of an issue. Projects at Apache have adopted all sorts of technologies over the years, from Forrest to Anakia to maven-site-plugin to Confluence to, well, lots of things. Migrating all that content to the CMS could be a lot of work that the projects just don’t have the time or resources to pursue.

Luckily, Apache does have some excellent people on the infrastructure team who have worked pretty hard to make migrating to just svnpubsub a bit easier. Joe Schaefer, in particular, has worked very hard to update the various buildbot and CMS scripts so that they support external builders (see their blog) to build the actual site content. As long as the technology that builds the site can accept an “output directory” flag telling it where to write the site, it’s not a hard migration. Thus, projects can remain on their current technology choice as long as they have a build script to build it.

However, that still left out the sites that are using Confluence, such as Apache CXF and Apache Camel. Those projects don’t have a build script. They relied on a proprietary (and pretty buggy and annoying) Confluence plugin to render the pages to a local directory on the Confluence host, followed by a series of rsyncs from various personal crontabs to get the pages to the live site. It worked, but it was very slow: it could take several hours for a change to appear on the live site. In any case, the Confluence-based sites needed a solution to migrate to svnpubsub.

A while ago, I noticed Confluence has a SOAP interface. It’s a crappy, ancient, rpc/encoded interface, but it’s at least a usable interface. The interface provides methods to get the page information, render content to HTML, etc. Basically, everything we need to render the site externally. Thus, I used Apache CXF to create a simple program that acts as an external builder to render a site: grabbing all the attachments, applying a template, etc. With that, plugging it into the svnpubsub infrastructure at Apache is easy. The program also uses Velocity for the template, just like the autoexport plugin, so migrating an existing template is relatively trivial.
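To make the shape of such an external builder concrete, here is a rough sketch in Python rather than the Java/CXF of the real program. Everything in it is an assumption made for illustration: `FakeWiki` is a hypothetical in-memory stand-in for the remote client (it does not model the actual Confluence SOAP operations), `string.Template` stands in for Velocity, and the `file_name` mapping is invented. The only idea taken from the text above is the loop itself: list the pages, get rendered HTML for each, apply a template, and write the result into an output directory.

```python
import tempfile
from pathlib import Path
from string import Template

# Stand-in for the site's Velocity template; string.Template is used only
# to keep this sketch dependency-free.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body>$body</body></html>\n"
)


class FakeWiki:
    """Hypothetical in-memory replacement for the remote wiki client."""

    def __init__(self, pages):
        self._pages = pages  # {title: already-rendered HTML fragment}

    def list_pages(self):
        return sorted(self._pages)

    def render_html(self, title):
        return self._pages[title]


def file_name(title):
    """Assumed naming scheme: lower-case the title and join words with '-'."""
    return "-".join(title.lower().split()) + ".html"


def build_site(wiki, output_dir):
    """Render every page through the template and write it to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title in wiki.list_pages():
        html = PAGE_TEMPLATE.substitute(title=title, body=wiki.render_html(title))
        target = out / file_name(title)
        target.write_text(html, encoding="utf-8")
        written.append(target.name)
    return written


if __name__ == "__main__":
    wiki = FakeWiki({"Index": "<p>Welcome</p>", "Getting Started": "<p>Step 1</p>"})
    with tempfile.TemporaryDirectory() as tmp:
        print(build_site(wiki, tmp))
```

Because the builder only needs an output directory, a build system can call it the same way it would call any other site generator.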

However, by using an external program, I was able to make it MUCH MUCH better than the crappy autoexport plugin that Confluence currently uses. This includes:

  • Caching information and rendering changes. The program keeps a cache of page information, detects which pages have changed, and renders only those. This helps a lot with performance.
  • It ALSO keeps track of which pages use {include} and {children} tags and can properly re-render those if the included page changes or the children structure changes. This is a big step above the autoexport plugin, which would require a complete site regeneration for these things.
  • HTML cleanup: the generated HTML is run through TagSoup as well as a custom listener that makes an attempt to clean up the generated HTML. Confluence generates very poor HTML with all kinds of validation errors and such. The cleanup allows many pages to actually pass the W3C validator for HTML 4.01 Transitional.
  • Link fixups: along with the HTML cleanup, it also fixes various links. Confluence generates HTML that kind of assumes it’s living on the Confluence host. When copied to the live sites, those links break. This is handled, as is adding nofollow attributes to links outside Apache.
  • Faster publishing: with all the rsyncs removed, making a change in Confluence and getting it “live” is much faster. At worst, it’s less than one hour to the next buildbot build. However, if you need to, any developer can check out the site and run the mvn script to force it immediately.
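The first two points, the page cache and the {include}/{children} tracking, can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the program’s actual code: the `Page` fields, the `snapshot` cache format, and the `plan_render` function are all hypothetical names, and it assumes the builder already knows each page’s includes and child list.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    body: str                    # raw wiki markup
    includes: tuple = ()         # titles pulled in via {include}
    children: tuple = ()         # titles of direct child pages
    uses_children: bool = False  # True if the body contains {children}


def snapshot(page):
    """Cache entry: enough to notice body edits and child-list changes."""
    digest = hashlib.sha256(page.body.encode("utf-8")).hexdigest()
    return {"body": digest, "children": list(page.children)}


def plan_render(pages, cache):
    """Return the titles that must be re-rendered given the previous cache."""
    dirty = set()
    for page in pages:
        old = cache.get(page.title)
        new = snapshot(page)
        if old is None or old["body"] != new["body"]:
            dirty.add(page.title)   # new or edited page
        elif page.uses_children and old["children"] != new["children"]:
            dirty.add(page.title)   # its {children} listing is stale

    # A page that includes a dirty page is dirty too; repeat until stable
    # so that chains of includes are followed all the way up.
    changed = True
    while changed:
        changed = False
        for page in pages:
            if page.title not in dirty and dirty.intersection(page.includes):
                dirty.add(page.title)
                changed = True
    return dirty


if __name__ == "__main__":
    pages = [
        Page("Home", "{include:Banner} Welcome", includes=("Banner",)),
        Page("Banner", "v1"),
        Page("Docs", "{children}", children=("Guide",), uses_children=True),
        Page("Guide", "How to"),
    ]
    cache = {p.title: snapshot(p) for p in pages}  # state after a full build
    pages[1] = Page("Banner", "v2")                # edit only the banner
    print(sorted(plan_render(pages, cache)))       # ['Banner', 'Home']
```

Only the edited page and the page that includes it are scheduled; Docs and Guide are left alone, which is where the performance win comes from.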

In any case, both Apache CXF and Apache Camel now have their main websites migrated over to using this new process to generate their sites from Confluence. If other projects would like to try it out, I’d suggest doing an svn checkout of the Camel website area from http://svn.apache.org/repos/asf/camel/website/ and taking a look.
