Today I was checking the scheduled task activity on the standalone vRealize Orchestrator (vRO) appliance in my 24×7 lab environment, and something didn’t seem right. All scheduled tasks appeared to be running without problems, but there was no workflow output.
So what kind of scheduled tasks is this Orchestrator running, you might ask? This Orchestrator is responsible for maintaining my environment. Think of tasks like cleaning up snapshots, removing mounted CD-ROM media and adding new virtual machines to my backup schedule.
Let’s analyze the problem:
To check what was going on I created a Workflow with a Scriptable Task that outputs text (the Workflow is displayed in the screenshot below). After confirming that vRO was not able to output any information to the screen, it was time to start troubleshooting.
I connected to the vRealize Orchestrator appliance with SSH and started looking at the log files. Because vRO consists of many smaller subsystems, it’s important to find the right log file.
- First I ran a command to list all log files sorted by timestamp. The “catalina.out” file turned out to be very busy: every time I ran the command, the file had a new timestamp.
ls -ltur /var/log/vmware/vco/app-server/
- So let’s look in the log file to figure out what is writing to it:
tail -f /var/log/vmware/vco/app-server/catalina.out
- It appeared that Apache Lucene was generating a continuous stream of messages/errors.
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene60' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene62]
... 103 more
19:30:49,588 WARN StubbornRetrier:38 - Retry 1/6 failed because of: Could not load codec 'Lucene60'. Did you forget to add lucene-backward-codecs.jar?
19:30:49,609 WARN StubbornRetrier:38 - Retry 5/6 failed because of: Could not load codec 'Lucene60'. Did you forget to add lucene-backward-codecs.jar?
19:30:49,611 WARN StubbornRetrier:38 - Retry 3/6 failed because of: Could not load codec 'Lucene60'. Did you forget to add lucene-backward-codecs.jar?
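Instead of watching the stream scroll by, the failures can also be summarised. The function below is a minimal sketch, not part of vRO or the VMware KB: the name `codec_errors` is hypothetical, and it assumes the log lines have the same format as the excerpt above and that `grep` supports the `-o` option (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: summarise Lucene codec load failures in a log file.
# Assumes messages of the form: Could not load codec '<name>'.
codec_errors() {
    log_file="$1"
    # Count the lines that report a codec load failure.
    count=$(grep -c "Could not load codec" "$log_file")
    # Collect the distinct codec names mentioned in those lines.
    codecs=$(grep -o "Could not load codec '[^']*'" "$log_file" \
        | sed "s/.*codec '\(.*\)'/\1/" \
        | sort -u \
        | tr '\n' ' ')
    # Strip the trailing space left behind by tr.
    echo "failures=$count codecs=${codecs% }"
}
```

On the appliance it could be called as `codec_errors /var/log/vmware/vco/app-server/catalina.out`. A single codec name in the summary suggests one incompatible index rather than a general classpath problem.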
As I explained before, vRO consists of multiple smaller systems, and Apache Lucene is one of them. So what is Apache Lucene exactly?
Apache Lucene™ is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Searching for the error from the log file led me straight to a VMware KB article that resolves this issue.
- Start an SSH session with the vRO appliance.
- Log in with the root account.
- Stop the vRO server service.
service vco-server stop
- Run the following command to remove the old Lucene logs. This might be a single file or multiple files.
rm -rf /var/log/vco/app-server/scripting.log_lucene*
- Start the vRO server service.
service vco-server start
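The steps above can be sketched as a small script with a dry-run mode, so the files that would be deleted can be reviewed first. This is my own wrapper, not something from the KB: the function names and the `DRY_RUN` variable are hypothetical, and the path and service name are simply taken from the commands above.

```shell
#!/bin/sh
# Sketch of the procedure above. Function names and DRY_RUN are assumptions
# made for illustration; only the commands themselves come from the steps.

# Print every entry that the rm step would delete.
list_lucene_entries() {
    dir="$1"
    for entry in "$dir"/scripting.log_lucene*; do
        # An unmatched glob stays literal, so skip entries that do not exist.
        [ -e "$entry" ] && echo "$entry"
    done
    return 0
}

# Stop vRO, remove the old Lucene entries, start vRO again.
# With DRY_RUN=1 (the default) the commands are only printed.
reset_lucene_index() {
    dir="${1:-/var/log/vco/app-server}"
    run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"
    $run service vco-server stop
    list_lucene_entries "$dir" | while IFS= read -r entry; do
        $run rm -rf "$entry"
    done
    $run service vco-server start
}
```

Calling `reset_lucene_index` prints the commands it would run. Calling `DRY_RUN=0 reset_lucene_index` as root executes them.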
To confirm the fix I ran the test workflow again. The screenshot below shows output on the screen. This is the expected behavior, so we can conclude that the fix solved the problem.
Based on the information already available on the internet, it seems that the upgrade from vRealize Orchestrator 7.3.1 to 7.4 is the root cause of the problem. The positive side is that the vRealize Orchestrator engine itself keeps doing its work. In my lab environment I discovered the problem three weeks after the upgrade. To clarify: all the workflows that ran during those three weeks completed without any problems; only their output was missing.