One of the new features in Apache CXF 2.7.x that I worked hard on was the introduction of support for WS-Discovery. WS-Discovery is basically a standard way for a service to announce when it’s available as well as standard way to probe the network for services that meet certain criteria and have the services that meet that criteria provide a response. Most ESB’s now have some sort of registry component or locator component or similar that provide a similar need. However, they are generally more proprietary in nature and, in many cases, will only work with services deployed in or managed by that ESB. WS-Discovery is completely standards based (OASIS) and is completely independent of any ESB, application server, etc…

So, how does it work? If the CXF WS-Discovery jars/bundles are available when a service starts, CXF will automatically register a ServerLifecycleListener onto the Bus. When the service starts, that listener will send a WS-Discovery “HELLO” message out on the network using the SOAP over UDP spec. When the service stops, it will send a “BYE” message out. Most users don’t need those messages, but if you do have an application that needs to keep track of services that are available, you could listen for them. The CXF WS-Discovery listener will also start an internal WS-Discovery service that will listen for SOAP/UDP “PROBE” requests on the network, process those requests to see if the service matches it, and respond with information (such as the address URL) if it does. This is all automatic. All that is needed is to add the WS-Discovery jars.

CXF also provides an API for probing the network for services. It’s only slightly documented right now, but you can easily look at the source for WSDiscoveryClient. Basically, some simple code like:

WSDiscoveryClient client = new WSDiscoveryClient();
List references 
    = client.probe(new QName("http://cxf.apache.org/hello_world/discovery",
                             "Greeter"));
client.close();
        
//loop through all of them and have them greet me.
GreeterService service = new GreeterService();
for (EndpointReference ref : references) {
     Greeter g = service.getPort(ref, Greeter.class);
     System.out.println(g.greetMe("World"));
}

would use the WSDiscoveryClient to probe the network for all the services that can provide the “Greeter” service and then calls off to each one. It’s very simple.

The main problem with the WS-Discovery implementation in CXF 2.7.0 through 2.7.4 was that it only implemented WS-Discovery 1.1 as that is the actual OASIS standard that I looked at. However, there are many devices out there that only will respond to WS-Discovery 1.0 probes. In particular, any of the IP cameras that implement the ONVIF specification will only respond to 1.0. Thus, in 2.7.5, I updated the code to also handle WS-Discovery 1.0. The WSDiscoveryClient object has a setVersion10() method on it to change the probes over to WS-Discovery 1.0. With support for WS-Discovery 1.0, you can now use CXF to probe for any devices on the network that meet the ONVIF standard. No proprietary registry or anything required.

That’s pretty cool.

Now that the WS-Discovery stuff in CXF is fairly well tested and is known to work, I expect more of the downstream consumers of CXF to start integrating it into product offerings. I’m hoping to work on getting the Talend locator updated to use it. However, with the next (5.3.1) version of Talend ESB (due next month), you’ll be able to just “feature:install” the cxf-ws-discovery feature into the ESB and have the above all work. I also see that JBoss has already started integrating it into their application server.

I’ve spent quite a bit of time the last several weeks doing some performance tuning and profiling and such on the Talend ESB and decided to share some things I’ve learned.

How this all started: Asankha Perera contacted me in early July as they started preparing for round 6 of their ESB Performance benchmarks as they ran into a security related issue: Talend’s ESB was now rejecting the WS-Security messages due to the nonce cache that was added to prevent replay attacks. Since the benchmark is essentially a replay attack (sends the same message over and over again), Talend’s ESB was throwing an exception and blocking the benchmark from running. This is on top of the strict timestamp checking that Talend’s ESB has always done (and caused them issues last time as they had to regenerate the secure messages). From what I can tell, Talend’s ESB was the only one they needed to turn OFF various security things like this which likely means it’s the most secure of the ESB’s for WS-Security “out of the box”. Not too surprising given Talend’s excellent security folks. Anyway, as part of the report, he sent along some preliminary numbers for the other tests to look at. My initial look showed that the numbers were actually lower than round 5 which concerned me, which is why I dug in.

The initial investigation showed a configuration change was needed. The configs in the bitbucket repo were really for Talend ESB 5.0 and needed some minor updates for 5.1 to get the thread pool back up from 25 threads to the 300 threads the test calls for. That pretty much got the test results back up to round 5 levels, so I really could have left it at that, but I decided to take a little time and play some more.

Some things I discovered:

  1. System.getProperty(..) is synchronized – DON’T put this on a critical path. My very first “kill -3″ during processing of the various small messages had over 180 threads (of the 300) blocked in here. I did some more investigation and found two major causes in the Talend ESB:
    • Bug/Regression in Woodstox – Woodstox was calling this for every Reader and Writer that was created. That’s 4 times per request for the proxy cases. I logged a bug with them (now resolved in their latest release) and downgraded Woodstox for the time being.
    • DocumentImpl constructor – for some reason, Sun/Oracle added a call to System.getProperty into the DocumentImpl constructor for the DOM implementation built into the JDK. CXF caches SOAP headers in a DOM so this added 2 more calls per request. Grabbing the latest xercesImpl and forcing that to be used solved this issue.

    Getting those fixed definitely helped reduce a bit of the contention.

  2. The next choke point I found was in CXF’s handling of the thread default Bus. We were getting and setting the thread default bus several times per request, but each of those calls was in a synchronized block. Re-engineering how that is handled eliminated that.
  3. Next up was the JMX metrics in Camel. Updating the JMX stats in Camel is in a big synchronized block (which then calls a bunch of other synchronized methods on other objects). This is on my TODO to re-look at, but for the purpose of this test, I just turned off the JMX metrics. Most likely, just using Atomic values would allow removing the sync block. Not really sure though.
  4. The final major “synchronized” block I hit is in the HTTP keep-alive cache in the JDK. Nothing I can do about this one short term. Longer term, I’ve started working on a new Apache HTTP Components based HTTP transport (with the help of Oleg Kalnichevski) that may help, but not something for right now.

The above updates helped a little, but not really much, maybe a couple %. The main reason is that with 300 threads, even if 150 of them are blocked, there are still plenty of threads left able to do work. Removing the blocks just saved some time on context switches and cache hits and such.

That then got me looking into other things, primarily into Camel. I quickly discovered the XML handling in Camel is pretty poor. Eventually I’ll need to really look into to that, but for the short term, I was able to bypass much of it. The first thing I saw was that in SOME cases, the requests coming from CXF (which was passed from CXF to Camel as a streaming StaxSource) were being parsed into a DOM. What’s worse, due to the poor XML handling in Camel, the DocumentFactory and parser factories and such were being created to do so. That involves the SPI stuff in the JDK which involves a System property (see above) and a classpath search for stuff in /META-INF/services which is fairly slow in OSGi. Adding some extra type converters into Camel avoided all of that and provided a big boost.

I then started looking at some of the specific tests. For the content based routing, the original configs we had used XPath. However, the XPath stuff in Camel is again plagued by the poor XML handling which forced a DOM again. Ended up changing the test to use XQuery which performed much better.

The other thing I ended up doing was switch from PAYLOAD mode to MESSAGE mode for the CXF component for the tests that could be handled that way. This is a huge benefit. For the direct proxy and transport header cases, this allowed complete bypassing of all XML parsing. That’s huge.

Finally, I did some testing with using Apache Tomcat for the backend service instead of the toolbox thing that the esbperformance.org folks originally specified. For Talend, using Tomcat helped significantly. We had a bunch of timeouts and other errors with the toolbox thing, and performance was a lot better with Tomcat. Since using Tomcat is likely more “realistic”, I argued to change the tests to use that (and provided some optimized Tomcat configs). This likely helped all the ESB’s results. You’re welcome!

For the most part, things ended up pretty good. If you look at the results graph at the bottom of http://esbperformance.org/display/comparison/ESB+Performance, Talend’s ESB came out fairly good. One important measurement is the number of failures/timeouts: 0. Talend ESB and the UltraESB’s were the only ones to accomplish that. That alone is pretty cool. But the performance also ended up fairly good. I certainly have a lot more things to look at going forward, but for just a small amount of work, the results ended up quite good.

I finally received the video of my presentation that I gave at CamelOne 2012 in Boston about the new features in Apache CXF. A little editing later (note for speakers: making sure the speaker agreement provides you with copies of any videos and complete rights to do anything you want with them is a good idea. Major thanks to the organizers for agreeing to that), a not-so-quick upload to YouTube, and voila:

If you are having trouble viewing the video, click here to go directly to YouTube.

With the release of Apache CXF 2.6.1 last week, the CXF community decided to create the 2.6.x-fixes branch and move trunk to target 2.7. One of the first things we’ve started working on for 2.7 is dropping support for Java5. This is a topic that has come up many times over the last year or so, but it’s something we’ve tried to delay as long as possible as we do have users that still use Java 5. However, we’ve finally reached a tipping point where the community feels we need to move forward with Java6/Java7 only.

There are a few reasons:

  • External deps – CXF uses a bunch of party deps like Jetty, Spring, WSS4J, GWT, ActiveMQ, etc… Some of these have already gone Java6 only (Jetty 8, ActiveMQ and GWT for example) so CXF has been locked down to some of the older versions. To continue getting enhancements and fixes for some of those libraries, we need to update.
  • WSS4J 2.0 – WSS4J 2.0 which will introduce much faster and scalable ws-security processing has already decided to go Java6. It’s unknown yet whether WSS4J 2.0 will be ready for CXF 2.7, but to start working on the integration would require Java6.
  • Build system – Java 6+ contains a bunch of the dependencies that CXF needs “built in”. Things like SAAJ, Activation, jaxb-api, etc… We had some profiles in the poms to detect the JDK and add those deps if running on Java5. Removing those profiles removed hundreds of lines from the poms.
  • Simplify some code. There are a couple places in CXF where we had to resort to using some reflection to call off to Java6 API’s if running on Java6 and provide an alternative path if not. We are now able to cleanup some of that.
  • Deployment environments going Java6 – the popular environments that CXF is deployed into have also gone Java6 only or will be shortly. That include Jetty 8, Tomcat 7, Karaf 2.3/3.0 (coming soon), JBoss, etc… Even Karaf 2.2 which does support Java5 has a few features that only work with Java6.
  • Quite honestly, none of the CXF developers use Java5 on a daily basis anymore. We’ve all moved onto using Java6 (or even Java 7). Thus, the primary Java5 testing is now done by Jenkins and the developers only fix things if Jenkins finds an issue. Dropping Java5 simplifies life for the CXF developer community.

In anycase, this is something that’s been a long time coming, but it’s not something that will affect people today. In 5 months or so when CXF 2.7. is released, people will need to decide what to do then. However, keep in mind CXF 2.6.x will continue to support Java5 and will likely be supported for at least another year. The clock is now ticking though. Tick…. Tick…. Tick…. :-)

I just uploaded my slide deck from my presentation about Apache CXF‘s new features that I presented at CamelOne in Boston to Slideshare.

I will hopefully be getting a full video of it sometime soon. When I do, I’ll post it here.

I haven’t looked at the Apache Jenkins instance leader board in a few months. It’s kind of a silly and meaningless stat, but I broke the CXF build a couple times this week and decided to click on it to see if I had dropped by much. I was VERY surprise, but very impressed with what I saw:

Jenkins Leaderboard

ALL 10 of the top 10 on the leader board are CXF committers. Every one of them. There is definitely something to be said for having a community that encourages frequent commits as well as a build system that is easy to use and promotes testing, but also a general mantra instilled in the community of “Don’t break the build”. (and fix it quickly if you do)

Another interesting note is that 7 of the top 14 are Talend committers. Unfortunately, the other 2 Talend committers are near the bottom of the list (negative even). I guess I need to start pushing them harder to move up the list… ;-)

Anyone on a PMC at Apache likely saw a message sent out about a month ago from infrastructure@ that mentioned the mandate about project websites (and dist areas) change over to using svnpubsub by the end of the year. If you missed the email, it’s also nicely mentioned on the project-site page. For many of the newer projects, using the Apache CMS is definitely the easiest way to achieve that. However, for some of the older projects, this was a bit of an issue. Various projects at Apache have adopted various technologies from forest to anika to maven-site-plugin to Confluence to, well, lots of things. Migrating all that content to the CMS could be a lot of work that the projects just don’t have the time or resources to pursue.

Luckily, Apache does have some excellent people on the infrastructure team that have worked pretty hard to make migrating to just svnpubsub a bit easier. Joe Schaefer, in particular, has worked very hard to update the various buildbot scripts and cms scripts and such to allow it to support external builders (see their blog) to build the actual site content. As long as the technology to build the site can accept an “output directory” flag to output the site, it’s not a hard migration. Thus, projects can remain on their current technology choice as long as they have a build script to build it.

However, that still left out the sites that are using Confluence, such as Apache CXF and Apache Camel. Those projects don’t have a build script. They relied on a proprietary (and pretty buggy and annoying) plugin to Confluence to to render the pages to a local directory on the confluence host, then a series of multiple rsyncs from various personal crontabs to get the pages to the live site. It worked, but it was very slow causing several hours of delays between changes and it appearing on the live site. In any case, the Confuence based sites needed a solution to migrate to svnpubsub.

A while ago, I noticed Confluence has a soap interface. It’s a crappy, ancient, rpc/encoded interface, but it’s at least a usable interface. The interface provides methods to get the page information, render content to HTML, etc… Basically, everything we need to render the site externally. Thus, I used Apache CXF to create a simple program that would act as an external builder to render a site, grabbing all the attachments, applying a template, etc…. With that, plugging it into the svnpubsub infrastructure at Apache is easy. The program also uses Velocity as the template just like the autoexport plugin so migrating an existing template is relatively trivial.

However, by using an external program, I was able to make it MUCH MUCH better than the crappy autoexport plugin that Confluence currently uses. This includes:

  • Caching information and rendering changes. The program keeps a cache of page information and can detect just the pages that change and only renders those. Helps with performance.
  • It ALSO keeps track of which pages use {include} and {children} tags and can proper re-render those if the included page changes or the children structure changes. This is a big step above the autoexport plugin that would require a complete site regenerate for these things
  • HTML cleanup – the generated HTML is run through tagsoup as well as a custom listener that make an attempt to clean up the generated HTML. Confluence generates very poor HTML with all kinds of validation errors and such. The cleanup allows many pages to actually pass the w3c validator for HTML 4.1 transitional.
  • Link fixups – along with the HTML cleanup, it also will fix various links. Confluence generates HTML that kind of assumes it’s living on the confluence host. When copied to the live sites, those links break. This is handles as well as is adding nofollow attributes to links outside Apache.
  • Faster publishing – with all the rsyncs removed, making a change in confluence and getting it “live” is much faster. At worst, it’s less than one hour to the next buildbot build. However, if you need to, any developer can checkout the site and run the mvn script to force it immediately.

In any case, both Apache CXF and Apache Camel now have their main websites migrated over to using this new process to generate their sites from Confluence. If other projects would like to try it out, I’d suggest doing an svn checkout of the Camel website area from http://svn.apache.org/repos/asf/camel/website/ and taking a look.

A long time ago (ok, only 5.5 years ago), when Apache CXF was fist being spun up, we made a decision to try and separate the “API” and the “implementation” of the various CXF components into separate module. This was a very good idea for the most part. However, we did this prior to really investigating OSGi requirements (OSGi was just beginning to become well known). Thus, we didn’t take into account the OSGi split-package issue. The common pattern we used was to put the interfaces in cxf-api, and the “impl” in the exact same package in cxf-rt-core. In OSGi, that’s a big problem (without using fragments).

A few years ago, when users started really pushing the idea of using CXF in OSGi, we had to come up with a solution. I’ll admit we kind of took the “easy road” at the time. As part of the CXF build, we already were creating an “uber bundle” which contained all of the individual CXF modules squashed together into a single jar. We primarily did that to simplify classpath issues, dependencies, etc… for users (single jar to deal with, single pom dependency, etc…). When looking at options for OSGi, we saw that bundle as an “easy” path to OSGi. By adding OSGi manifest entries, CXF can be brought up in OSGi. Since ALL the packages are in that jar, no split-package issues.

For the most part, that setup has worked “OK”, but it’s really not ideal. CXF has a wide range of functionality, much of it may not be required by every user. The bundle ended up importing a LOT of packages. The current bundle imports about 240 packages. Thus, we had to go through and mark many of the imports “optional”. Over 160 of those imports are marked optional. (and likely some of the others could as well) This type of setup, while it “works”, is really less than ideal. It doesn’t work well with things like the OBR resolvers. It requires extra refreshes to add functionality that may be required by new bundles. It requires a lot of extra dependencies to be loaded upfront to reduce refreshes. Etc…

Over the past few weeks, Christian Schneider and I have been re-evaluating the split-package issue in CXF to try and come up with a better solution. An initial evaluation back in December found 25 packages that were split across jars. I have spent quite a bit of time the last couple weeks looking at each of those 25 split packages to figure out what can be done. We did hit some challenges and I ended up talking quite a bit with Christian about possible solutions, but I’m really happy to say that as of my commits this morning, all 25 are now resolved. All of the CXF individual modules are loadable as individual OSGi bundles. A couple of very simple test cases are working.

What kind of problems did we hit:

  • One of the main issues as mentioned above was an interface in API and then some sort of implementation in rt-core. For the most part, users always looked up the implementation via the bus.getExtension(..) call so the impl is really “private”. In those cases, the impl was moved to a new package in rt-core.
  • Abstract base classes in rt-core – similar to above. We had the interface in api, but an abstract base class in rt-core. There were then subclasses of that base class in the other modules. This really showed that the abstract base classes should have been part of the API, which is exactly what we did. Moved the class to API.
  • Promote some stuff to API – we did have a bunch of utility things in rt-core that was easily promoted to api. Things like our CachedOutputStream, mime attachment support, etc… that were easy to just promote up. They are heavily used in several submodules so it made sense to promote them anyway.
  • Pushed some optional stuff out of API – particular WS-Policy support. We did have some ws-policy related stuff in API that we could not resolve the split-package issue by promoting up. Thus, we pushed down into the cxf-rt-ws-policy. This may have an end user impact as they may need to have the cxf-rt-ws-policy module on their classpath to compile their code. However, they would have needed it on their classpath anyway to run the code so impact should be very minimal. The good thing about this is that this then reduces the imports in cxf-api by not requiring things like neethi for the api module. For JAX-RS users, this is good.

There were other issues caused by promoting stuff into API, but for the most part, they ended up being resolvable by loading an implementation via a bus extension. That seems to have cleaned things up quite a bit.

Anyway, it’s been a very challenging few weeks as we worked through all the split package issues. To be honest, the end result isn’t “ideal”. There is definitely more stuff in API than I’d really like to have there, but we were able to maintain a high degree of compatibility with existing applications. That’s important. Maybe when we do a “3.0″ we can re-look at that, but for now, compatibility is very important. With the split package issue resolved, Christian and I are embarking on a new mission to examine the OSGi manifests of all the individual jars to adjust import version ranges, check for optionals, update the karaf features file, etc… One advantage of the big bundle was we only needed to do that in one place. Now we have many more touch points. However, the end result should definitely be a better experience for end users. I’m very excited about that.

Apache Camel in OSGi

November 3rd, 2011 1 Comment

After writing my post about setting up Apache CXF in OSGi, I’ve had a couple people ask about how to do the same with Apache Camel. Camel is a bit more complex due to all the libraries that it may pull in depending on which components you use, but it’s not really that much more complex.

As with CXF, the easiest way to get it up and running would be to grab a distribution that has it ready to go. Again, I’d recommend either the Talend Integration Factory or Talend ESB. Both have Apache Camel, Apache CXF and Apache ActiveMQ pre setup to run and everything well configured. The more popular Camel components are pre-installed with their dependencies making it fairly easy to get started.

That said, setting up your own instance is really not that hard.

  1. Follow steps 1-3 from my post about setting up Apache CXF in OSGi. This gets Karaf setup and ready.
  2. From the Karaf command line, run:
    features:addurl mvn:org.apache.activemq/activemq-karaf/5.5.0/xml/features
    features:addurl mvn:org.apache.cxf.karaf/apache-cxf/2.5.0/xml/features
    features:addurl mvn:org.apache.camel.karaf/apache-camel/2.8.2/xml/features
    

    That will add the ActiveMQ, CXF, and Camel features into Karafs feature resolver.

  3. If you plan on using JMS, it’s recommended to install the ActiveMQ features first. From the Karaf command line, run:
    features:install activemq
    features:install activemq-spring    (if you use spring)
    
  4. Next up would be CXF if you plan on using the CXF components. Again, from the Karaf command line, run:
    features:install cxf
    
  5. Finally, install the Camel components you are interested in using:
    features:install camel-core
    features:install camel-spring
    features:install camel-jms
    features:install camel-cxf
    etc...
    

    You can get a list of the Camel features by doing a “features:list | grep camel” from the Karaf command line.

That’s pretty much all there is to it. It’s not hard. :-) It’s now all setup! To try it out, I’d once again grabbing the samples package from Talend Integration Factory. There are several examples in there on how to do various Camel things, all running in OSGi.

This morning, I sent out an announcement that Apache CXF 2.5.0 is now released! Over 6 months of work went into this release to expand the feature set, stabilize it, and get it ready for release.

There are two major categories of new “stuff” in this release:

  1. Normal “framework” features. These features expand the capabilities of CXF as a Services Framework and are more in line with the traditional CXF enhancements. In 2.5.0, these include things like:
    • Enhanced WS-RM interopability
    • WS-MetadataExchange support
    • OAuth support
    • Enhanced security options for REST
    • Enhanced JMX instrumentation
    • Enhanced OSGi and Karaf integration
  2. New “Services”. This is a brand new area for CXF. With 2.5.0, the scope of CXF now expands to include some enterprise level services that are related to CXF. At this point, they include:
    • Security Token Service
    • WS-Notification Service

I’m very excited about the second category of enhancements. Traditionally, CXF has always been “just a framework” that can be used to build your services. CXF never really provided any services “out of the box”. I’m hoping that by providing some enterprise level services out of the box, it will become even easier for people to develop and deploy CXF based applications in their enterprise. Both of the new services, while “new” for CXF, are really enterprise ready. The WS-Notification Service is a port (mostly done by Guillaume Nodet) of the ServiceMix WS-Notification service which has been deployed and used in production in many places. By removing the JBI dependencies and re-writing the communications to use pure JAX-WS API’s, the WS-Notification service is now much more usable by a wider audience. The STS was developed by Talend for some of our customers and has been deployed into production. Colm has done an amazing job with the STS to get it to meet a wide range of requirements. Definitely check out his blog for more information about the capabilities of the STS.

Anyway, this is another big release for Apache CXF. Definitely give it a spin and let us know how we’re doing!