10 Years of Apache CXF

On April 16th 2008, Apache CXF graduated from the Apache incubator and officially became a top level project. At that time, CXF was mostly just a “soap stack”, although it did have a few additional bindings, such as a CORBA binding thanks to the contributions from IONA/Progress. Since then, the scope has expanded quite a bit. Top notch REST/JAX-RS support was added. Several new security specs were implemented. New services that use the base framework were added to CXF: WS-Discovery, a full STS, WS-Notification, and an XKMS implementation. The REST/JAX-RS implementation was enhanced to support several different API specs (WADL, Swagger 2, OpenAPI, etc…), various search providers, several types of “reactive” integrations, etc… We’ve recently received contributions to implement the MicroProfile client APIs.

Two sub-projects are now part of CXF as well. DOSGi is an implementation of the Distributed OSGi spec. Fediz is a security framework to implement WS-Federation Passive Requestor Profile and other specs to help secure your web application.

Over the last 10 years, we’ve done over 200 releases. We have 43 committers, 25 of whom are PMC members. Many “mature” projects such as CXF have had to start thinking about entering the attic. Keeping a high level of involvement and forward progress for 10 years is not easy. CXF continues to receive new ideas, new features, and fixes very often, and constantly provides new releases for users. Part of this is due to the high level of adoption of CXF. Many companies rely on CXF for their WebService and REST needs. Companies like Talend, Red Hat, WSO2, IBM, Savoir Technologies, Tomitribe, etc… rely heavily on CXF. Their users need CXF to “just work” as well as keep up with new technologies, specs, and features. Several of these companies pay some of their employees to contribute to and support CXF. Tons of thanks go out to those companies for helping make CXF the success that it has become.

It’s been an exciting 10 years and I look forward to seeing what the future reveals.

Oracle and JAVA EE

Those of you in the “Java EE” world may have already seen the announcement from Oracle that was posted yesterday concerning the future of Java EE. This is potentially very exciting news, particularly for the various Apache projects that implement some of the Java EE specs. Since Apache CXF implements a couple of the specs (JAX-WS and JAX-RS), I’m looking forward to seeing where Oracle goes with this.

For those who don’t know, several years ago, I spent a LOT of time and effort reviewing contracts and the TCK licenses, and sending emails and proposals back and forth with Oracle’s VPs and Legal folks in an attempt to allow Apache to license some of the TCKs (Technology Compatibility Kits) that the Apache projects needed. In order to claim 100% compliance with the spec, the projects need to have access to the TCK to run the tests. Unfortunately, Apache and Oracle were never able to agree on terms that would allow the projects to have access AND be able to act as an Apache project. Thus, we were not able to get the TCKs. Most of the projects were able to move on and continue doing what they needed to do, but without the TCKs, that “claim of compliance” that they would like is missing.

I’m hoping that with the effort to open up the Java EE spec process, they will also start providing access to the TCKs with an Open Source license that is compatible with the Apache License and Apache projects.

Apache CXF and WS-Discovery

One of the new features in Apache CXF 2.7.x that I worked hard on was the introduction of support for WS-Discovery. WS-Discovery is basically a standard way for a service to announce when it’s available, as well as a standard way to probe the network for services that meet certain criteria and have the matching services provide a response. Most ESBs now have some sort of registry component or locator component or similar that serves a similar need. However, they are generally more proprietary in nature and, in many cases, will only work with services deployed in or managed by that ESB. WS-Discovery is completely standards based (OASIS) and is completely independent of any ESB, application server, etc…

So, how does it work? If the CXF WS-Discovery jars/bundles are available when a service starts, CXF will automatically register a ServerLifecycleListener onto the Bus. When the service starts, that listener will send a WS-Discovery “HELLO” message out on the network using the SOAP over UDP spec. When the service stops, it will send a “BYE” message out. Most users don’t need those messages, but if you do have an application that needs to keep track of services that are available, you could listen for them. The CXF WS-Discovery listener will also start an internal WS-Discovery service that will listen for SOAP/UDP “PROBE” requests on the network, process those requests to see if the service matches it, and respond with information (such as the address URL) if it does. This is all automatic. All that is needed is to add the WS-Discovery jars.
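If you do want to watch those announcements yourself, here is a rough sketch of a listener in plain JDK code, with no CXF involved. It assumes the usual WS-Discovery IPv4 multicast group and port (239.255.255.250:3702) and simply takes the last path segment of the WS-Addressing Action as the message kind. The class and method names are made up for illustration, and the regex is a shortcut rather than real SOAP parsing.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CXF.
final class DiscoveryAnnouncements {
    // Multicast group and port assumed for WS-Discovery SOAP over UDP on IPv4.
    static final String GROUP = "239.255.255.250";
    static final int PORT = 3702;

    // Captures the text of the first <prefix:Action> element; compiled once, reused per packet.
    private static final Pattern ACTION =
        Pattern.compile("<(?:[\\w.-]+:)?Action\\b[^>]*>\\s*([^<\\s]+)\\s*<");

    /** Returns the last path segment of the Action URI, e.g. "Hello" or "Bye". */
    static String kindOf(String soap) {
        Matcher m = ACTION.matcher(soap);
        if (!m.find()) {
            return "Unknown";
        }
        String action = m.group(1);
        return action.substring(action.lastIndexOf('/') + 1);
    }

    /** Blocks forever, printing the kind of every discovery message seen on the network. */
    static void listen() throws IOException {
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            // A null interface lets the system pick the default one.
            socket.joinGroup(new InetSocketAddress(InetAddress.getByName(GROUP), PORT), null);
            byte[] buf = new byte[65535];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                String soap = new String(packet.getData(), 0, packet.getLength(),
                                         StandardCharsets.UTF_8);
                System.out.println(kindOf(soap) + " from " + packet.getAddress());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0 && args[0].equals("listen")) {
            listen();
            return;
        }
        // Offline demo with a trimmed-down header instead of a real packet.
        String sample = "<a:Action>"
            + "http://docs.oasis-open.org/ws-dd/ns/discovery/2009/01/Hello</a:Action>";
        System.out.println(kindOf(sample));
    }
}
```

Run it with the `listen` argument on a network where a CXF service with the WS-Discovery jars starts and stops, and it should print a line for each message it sees. An application that tracks services would update its own table at that point instead of printing.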

CXF also provides an API for probing the network for services. It’s only slightly documented right now, but you can easily look at the source for WSDiscoveryClient. Basically, some simple code like:

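import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.ws.EndpointReference;
import org.apache.cxf.ws.discovery.WSDiscoveryClient;

WSDiscoveryClient client = new WSDiscoveryClient();
// The local part "Greeter" is assumed here; it matches the service name used in this post.
List<EndpointReference> references
    = client.probe(new QName("http://cxf.apache.org/hello_world/discovery", "Greeter"));
//loop through all of them and have them greet me.
GreeterService service = new GreeterService();
for (EndpointReference ref : references) {
     Greeter g = service.getPort(ref, Greeter.class);
     // call an operation on g here
}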

would use the WSDiscoveryClient to probe the network for all the services that can provide the “Greeter” service and then call each one. It’s very simple.

The main problem with the WS-Discovery implementation in CXF 2.7.0 through 2.7.4 was that it only implemented WS-Discovery 1.1, as that is the actual OASIS standard that I looked at. However, there are many devices out there that will only respond to WS-Discovery 1.0 probes. In particular, any of the IP cameras that implement the ONVIF specification will only respond to 1.0. Thus, in 2.7.5, I updated the code to also handle WS-Discovery 1.0. The WSDiscoveryClient object has a setVersion10() method on it to change the probes over to WS-Discovery 1.0. With support for WS-Discovery 1.0, you can now use CXF to probe for any devices on the network that meet the ONVIF standard. No proprietary registry or anything required.
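To make the version difference concrete, here is a small sketch that builds a bare Probe envelope for each version by hand. CXF does all of this for you; the point is only to show that the two versions use different namespaces and addresses on the wire, which is why a 1.0-only device ignores a 1.1 probe. The namespace and address strings are written from my reading of the two specs, and the class is hypothetical, so treat it as an illustration rather than a reference.

```java
import java.util.UUID;

// Hypothetical illustration, not CXF code.
final class ProbeEnvelopes {
    /** The values that differ between the two spec versions (as I understand them). */
    enum Version {
        V1_0("http://schemas.xmlsoap.org/ws/2005/04/discovery",
             "http://schemas.xmlsoap.org/ws/2004/08/addressing",
             "urn:schemas-xmlsoap-org:ws:2005:04:discovery"),
        V1_1("http://docs.oasis-open.org/ws-dd/ns/discovery/2009/01",
             "http://www.w3.org/2005/08/addressing",
             "urn:docs-oasis-open-org:ws-dd:ns:discovery:2009:01");

        final String discoveryNs;
        final String addressingNs;
        final String to;

        Version(String discoveryNs, String addressingNs, String to) {
            this.discoveryNs = discoveryNs;
            this.addressingNs = addressingNs;
            this.to = to;
        }
    }

    /** Builds a Probe with no type or scope filter, i.e. "who is out there?". */
    static String probe(Version v) {
        return "<s:Envelope xmlns:s=\"http://www.w3.org/2003/05/soap-envelope\""
            + " xmlns:a=\"" + v.addressingNs + "\""
            + " xmlns:d=\"" + v.discoveryNs + "\">"
            + "<s:Header>"
            + "<a:Action>" + v.discoveryNs + "/Probe</a:Action>"
            + "<a:MessageID>urn:uuid:" + UUID.randomUUID() + "</a:MessageID>"
            + "<a:To>" + v.to + "</a:To>"
            + "</s:Header>"
            + "<s:Body><d:Probe/></s:Body>"
            + "</s:Envelope>";
    }

    public static void main(String[] args) {
        System.out.println(probe(Version.V1_0));
        System.out.println(probe(Version.V1_1));
    }
}
```

A device that only knows the 2005/04 namespace has no reason to recognize the 2009/01 Probe element or action, so it stays silent.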

That’s pretty cool.

Now that the WS-Discovery stuff in CXF is fairly well tested and is known to work, I expect more of the downstream consumers of CXF to start integrating it into product offerings. I’m hoping to work on getting the Talend locator updated to use it. However, with the next (5.3.1) version of Talend ESB (due next month), you’ll be able to just “feature:install” the cxf-ws-discovery feature into the ESB and have the above all work. I also see that JBoss has already started integrating it into their application server.

Talend ESB Performance Tuning

I’ve spent quite a bit of time the last several weeks doing some performance tuning and profiling and such on the Talend ESB and decided to share some things I’ve learned.

How this all started: Asankha Perera contacted me in early July as they started preparing for round 6 of their ESB Performance benchmarks because they had run into a security related issue: Talend’s ESB was now rejecting the WS-Security messages due to the nonce cache that was added to prevent replay attacks. Since the benchmark is essentially a replay attack (it sends the same message over and over again), Talend’s ESB was throwing an exception and blocking the benchmark from running. This is on top of the strict timestamp checking that Talend’s ESB has always done (which caused them issues last time, as they had to regenerate the secure messages). From what I can tell, Talend’s ESB was the only one for which they needed to turn OFF various security things like this, which likely means it’s the most secure of the ESBs for WS-Security “out of the box”. Not too surprising given Talend’s excellent security folks. Anyway, as part of the report, he sent along some preliminary numbers for the other tests to look at. My initial look showed that the numbers were actually lower than round 5, which concerned me, so I dug in.

The initial investigation showed a configuration change was needed. The configs in the bitbucket repo were really for Talend ESB 5.0 and needed some minor updates for 5.1 to get the thread pool back up from 25 threads to the 300 threads the test calls for. That pretty much got the test results back up to round 5 levels, so I really could have left it at that, but I decided to take a little time and play some more.

Some things I discovered:

  1. System.getProperty(..) is synchronized – DON’T put this on a critical path. My very first “kill -3” during processing of the various small messages had over 180 threads (of the 300) blocked in here. I did some more investigation and found two major causes in the Talend ESB:
    • Bug/Regression in Woodstox – Woodstox was calling this for every Reader and Writer that was created. That’s 4 times per request for the proxy cases. I logged a bug with them (now resolved in their latest release) and downgraded Woodstox for the time being.
    • DocumentImpl constructor – for some reason, Sun/Oracle added a call to System.getProperty into the DocumentImpl constructor for the DOM implementation built into the JDK. CXF caches SOAP headers in a DOM so this added 2 more calls per request. Grabbing the latest xercesImpl and forcing that to be used solved this issue.

    Getting those fixed definitely helped reduce a bit of the contention.

  2. The next choke point I found was in CXF’s handling of the thread default Bus. We were getting and setting the thread default bus several times per request, but each of those calls was in a synchronized block. Re-engineering how that is handled eliminated that.
  3. Next up was the JMX metrics in Camel. Updating the JMX stats in Camel is in a big synchronized block (which then calls a bunch of other synchronized methods on other objects). This is on my TODO to re-look at, but for the purpose of this test, I just turned off the JMX metrics. Most likely, just using Atomic values would allow removing the sync block. Not really sure though.
  4. The final major “synchronized” block I hit is in the HTTP keep-alive cache in the JDK. Nothing I can do about this one short term. Longer term, I’ve started working on a new Apache HTTP Components based HTTP transport (with the help of Oleg Kalnichevski) that may help, but not something for right now.
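Two of the items above have a simple general shape, so here is a small sketch of both. It is not the actual CXF or Camel code. The first part reads a system property once into a static field instead of on every request, and the second keeps statistics in LongAdder counters instead of inside a synchronized block. The class name and the "example.buffer.size" property are made up for the example.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not taken from CXF or Camel.
final class HotPathSketch {
    // Read once when the class loads, not once per request.
    // "example.buffer.size" is a made-up property name.
    private static final int BUFFER_SIZE = Integer.getInteger("example.buffer.size", 8192);

    // Lock-free counters: worker threads update them without sharing a monitor.
    private final LongAdder exchanges = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    static int bufferSize() {
        return BUFFER_SIZE;
    }

    /** Called on the critical path after each exchange completes. */
    void record(long elapsedNanos) {
        exchanges.increment();
        totalNanos.add(elapsedNanos);
    }

    long count() {
        return exchanges.sum();
    }

    double meanNanos() {
        long n = exchanges.sum();
        return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
    }

    public static void main(String[] args) throws InterruptedException {
        HotPathSketch stats = new HotPathSketch();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    stats.record(100);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(stats.count() + " exchanges, mean " + stats.meanNanos() + " ns");
    }
}
```

The trade-off is that the two counters are no longer updated together, so a reader that samples them while requests are in flight can see a mean that is slightly off. For monitoring numbers that is usually acceptable, but it is exactly the kind of thing that would need checking before dropping the sync block for real.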

The above updates helped a little, but not really much, maybe a couple of percent. The main reason is that with 300 threads, even if 150 of them are blocked, there are still plenty of threads left able to do work. Removing the blocks just saved some time on context switches and cache hits and such.

That then got me looking into other things, primarily into Camel. I quickly discovered the XML handling in Camel is pretty poor. Eventually I’ll need to really look into that, but for the short term, I was able to bypass much of it. The first thing I saw was that in SOME cases, the requests coming from CXF (which were passed from CXF to Camel as a streaming StaxSource) were being parsed into a DOM. What’s worse, due to the poor XML handling in Camel, the DocumentFactory and parser factories and such were being created each time to do so. That involves the SPI stuff in the JDK, which involves a System property (see above) and a classpath search for stuff in /META-INF/services, which is fairly slow in OSGi. Adding some extra type converters into Camel avoided all of that and provided a big boost.
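The general idea can be shown with plain JDK code. This is not the Camel type converter I added, just a sketch of the two habits involved: look the factory up once per thread instead of once per message, and stream through the XML to pull out a value instead of building a DOM. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not Camel or CXF code.
final class StreamingLookup {
    // One factory per worker thread: the SPI lookup happens once per thread,
    // not once per request.
    private static final ThreadLocal<XMLInputFactory> FACTORY =
        ThreadLocal.withInitial(XMLInputFactory::newFactory);

    /**
     * Returns the text of the first element with the given local name, or null
     * if there is none. The element is expected to contain text only.
     */
    static String firstElementText(String xml, String localName) {
        try {
            XMLStreamReader reader =
                FACTORY.get().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String order = "<order><symbol>IBM</symbol><qty>10</qty></order>";
        System.out.println(firstElementText(order, "symbol"));
    }
}
```

The reader stops as soon as it finds the element, so the rest of the message is never parsed and no tree is ever built. A ThreadLocal is used here so the sketch does not depend on any particular factory implementation being safe to share across threads.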

I then started looking at some of the specific tests. For the content based routing, the original configs we had used XPath. However, the XPath stuff in Camel is again plagued by the poor XML handling which forced a DOM again. Ended up changing the test to use XQuery which performed much better.

The other thing I ended up doing was switching from PAYLOAD mode to MESSAGE mode for the CXF component for the tests that could be handled that way. This is a huge benefit. For the direct proxy and transport header cases, this allowed complete bypassing of all XML parsing. That’s huge.

Finally, I did some testing with using Apache Tomcat for the backend service instead of the toolbox thing that the esbperformance.org folks originally specified. For Talend, using Tomcat helped significantly. We had a bunch of timeouts and other errors with the toolbox thing, and performance was a lot better with Tomcat. Since using Tomcat is likely more “realistic”, I argued to change the tests to use that (and provided some optimized Tomcat configs). This likely helped all the ESB’s results. You’re welcome!

For the most part, things ended up pretty good. If you look at the results graph at the bottom of http://esbperformance.org/display/comparison/ESB+Performance, Talend’s ESB came out fairly well. One important measurement is the number of failures/timeouts: 0. Talend ESB and the UltraESBs were the only ones to accomplish that. That alone is pretty cool. But the performance also ended up fairly good. I certainly have a lot more things to look at going forward, but for just a small amount of work, the results ended up quite good.

Video of my Apache CXF Presentation from CamelOne

I finally received the video of my presentation that I gave at CamelOne 2012 in Boston about the new features in Apache CXF. A little editing later (note for speakers: making sure the speaker agreement provides you with copies of any videos and complete rights to do anything you want with them is a good idea. Major thanks to the organizers for agreeing to that), a not-so-quick upload to YouTube, and voila:

If you are having trouble viewing the video, click here to go directly to YouTube.