Monday, February 23, 2009

OC4J Containers getting restarted in OAS

Over the past few weeks, a number of customers have called me about problems with their newly installed Oracle Application Server, or rather Oracle SOA Suite.
More than once, the cause turned out to be the timeouts on the response times of the OC4J containers.
Every case I saw was a SOA Suite that had just been taken into production. The behaviour differed a little from case to case, but the same fix worked for each of them.
OPMN manages the OC4J containers in Oracle Application Server (10g, that is). As part of this, it pings each container to check whether it is still alive. If the ping is not answered quickly enough, OPMN restarts the container and writes an error to the opmn.log file located under $ORACLE_HOME/opmn/logs. This causes trouble in two ways: if OPMN restarts the HTTP Server, the website is temporarily unavailable, which can be rather disturbing; if other OC4J containers get restarted, users can run into errors such as HTTP 500 (Internal Server Error), among other problems.
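The ping check described above can be pictured with a toy model. This is only a sketch and not OPMN's actual implementation: the function name `first_restart` and the rule "restart after a given number of consecutive late pings" are assumptions made for illustration, loosely based on the timeout and failed-ping-limit settings that appear in opmn.xml.

```python
from typing import Optional, Sequence


def first_restart(
    response_times: Sequence[Optional[float]],
    timeout: float,
    failed_ping_limit: int,
) -> Optional[int]:
    """Return the index of the ping that would trigger a restart, or None.

    response_times holds one entry per ping, in seconds; None means the
    ping was never answered. A ping counts as failed when it is unanswered
    or slower than `timeout`. (Hypothetical model, not OPMN code.)
    """
    consecutive_failures = 0
    for index, seconds in enumerate(response_times):
        if seconds is None or seconds > timeout:
            consecutive_failures += 1
            if consecutive_failures >= failed_ping_limit:
                return index
        else:
            # A timely answer resets the failure counter.
            consecutive_failures = 0
    return None
```

In this model, a container that answers slowly under load but is still alive survives when the timeout is longer or the failure limit is higher, which is the idea behind the tuning below.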
The solution is to tune the OC4J container ping parameters in the opmn.xml file found under $ORACLE_HOME/opmn/conf:

For the HTTP Server, look for the tag <process-type id="HTTP_Server" module-id="OHS">. Add the following after this tag:

<process-set id="HTTP_Server" restart-on-death="true" numprocs="1">
  <start timeout="300" retry="3"/>
  <stop timeout="300"/>
  <restart timeout="300" retry="3"/>
  <ping timeout="60" interval="600"/>
</process-set>

Next, look for the tag within the HTTP Server definition and add the following lines:

<category id="ping-parameters">
  <data id="ping-url" value="/"/>
</category>
<category id="restart-parameters">
  <data id="reverseping-timeout" value="345"/>
  <data id="no-reverseping-failed-ping-limit" value="3"/>
  <data id="reverseping-failed-ping-limit" value="6"/>
</category>

For all other OC4J containers in the file, insert the same lines you added between the tags for the HTTP Server.
This increases the time OPMN allows an OC4J container to respond, giving a busy container a little more time before it is restarted.
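After editing, it can help to check what each process type in opmn.xml now contains. The following is a minimal sketch in Python (not part of OPMN or its tooling): the function name `ping_settings` is made up for this example, and it assumes only that the file is well-formed XML with `process-type`, `ping`, `category` and `data` elements as shown above. Element names are matched without their namespace, so it does not depend on which namespace the file declares.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def ping_settings(xml_text: str) -> dict:
    """Collect <ping> attributes and restart-parameters per process-type id."""
    root = ET.fromstring(xml_text)
    report = {}
    for elem in root.iter():
        if local(elem.tag) != "process-type":
            continue
        entry = {"ping": [], "restart-parameters": {}}
        for child in elem.iter():
            name = local(child.tag)
            if name == "ping":
                entry["ping"].append(dict(child.attrib))
            elif name == "category" and child.get("id") == "restart-parameters":
                for data in child:
                    if local(data.tag) == "data":
                        entry["restart-parameters"][data.get("id")] = data.get("value")
        report[elem.get("id")] = entry
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_opmn.py /path/to/opmn.xml
    with open(sys.argv[1], encoding="utf-8") as handle:
        for type_id, entry in ping_settings(handle.read()).items():
            print(type_id, entry)
```

A process type that prints an empty `ping` list or an empty `restart-parameters` dictionary has not received the lines from this post yet.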

2 comments:

  1. Hi,

    Appreciate your post on this particular issue. We are having the same issue in our prod environment and would like to know more about it. I have a few questions:

    1) Do we need to use all the parameters mentioned in OPMN.xml?

    2) Can these parameters also be used for other containers?

    3) Can you please provide us with the default values for all the parameters you have mentioned with the recommended values for those parameters?

    Regards

  2. Thanks for your comment, Anonymous. To answer your questions (as far as I can):
    1. I wouldn't recommend removing parameters from opmn.xml. They are there for specific reasons. They pretty much all can be tuned, however.
    2. The parameters that I specified in my post are usable for all OC4J containers in opmn.xml. Not all parameters can be used for all containers, I guess.
    3. I don't know all of the default values. As far as I know:
    reverseping-timeout defaults to 300 (I think)
    no-reverseping-failed-ping-limit defaults to 1
    reverseping-failed-ping-limit defaults to 3

    Regards,
    Arnoud
    PS, you can contact me through LinkedIn if you want to have more info.
