Oracle® Application Server 10g Best Practices
10g (9.0.4) Part No. B12223-01
This chapter describes management and monitoring best practices for Oracle Application Server. It includes the following topics:
Section 2.2, "Oracle Process Manager and Notification Server Best Practices"
Section 2.3, "Distributed Configuration Management Best Practices"
This section describes best practices for Oracle Enterprise Manager. It features the following topics:
Section 2.1.1, "Select the Framework Options That Best Suit Your Needs"
Section 2.1.2, "Monitor and Diagnose Performance Bottlenecks and Availability Problems"
Section 2.1.3, "Monitor Application Performance During Application Development or Test Cycles"
Section 2.1.4, "Monitor Rate and Aggregated Performance Metrics"
Section 2.1.6, "Monitor End-User Response Times of Web Pages"
Section 2.1.7, "Monitor the Availability of a Web Application"
Section 2.1.8, "Proactively Monitor Web Application Transactions"
Section 2.1.10, "Use the Host Home Page to Help Diagnose Performance Issues"
Section 2.1.11, "Use Alerts and Notifications to Proactively Monitor System Availability"
Section 2.1.13, "Use Clusters for Application Deployment and Configuration Management"
Section 2.1.14, "Use the Deployment Wizard to Deploy Applications"
Section 2.1.16, "Use Job System to Periodically Back Up Your Configuration"
Section 2.1.17, "Managing Both Oracle Application Server and the Oracle Database"
Oracle Enterprise Manager can be deployed in more than one way, which gives you the flexibility to select the configuration that best suits your needs. If you are working in a simple development or test environment, or if you have a single Oracle Application Server 10g instance to manage, you can use Oracle Enterprise Manager Application Server Control (Application Server Control), which is available with any Oracle Application Server middle-tier installation. Application Server Control allows you to directly access all the pages for managing and monitoring the instance.
In a production environment, you typically manage a wider variety of software and hardware components. For example, you need to manage the databases and host computers that support your Web applications. For your production environment, you should use Oracle Enterprise Manager Grid Control. The Grid Control Console provides you with a central location from which you can manage your Oracle Application Server instances, your databases, and your entire Oracle environment. Oracle Enterprise Manager Grid Control also supports sharing of information between administrators.
Once you have set up Oracle Enterprise Manager Grid Control to monitor for availability and performance issues, you will be alerted when a problem is detected. If Oracle Enterprise Manager detects that an application server component is unavailable, you can use Application Server Control to check the status of the component and restart it if desired. If a performance issue is detected with a component or application, you can drill down to the component home page and view detailed performance and diagnostic information. You can also drill down from the Oracle Application Server Containers for J2EE (OC4J) home page to find applications, modules, and methods. Using these drill downs, you can diagnose and resolve performance issues.
During application development and testing, you can use the Application Server Control to monitor the application's resource usage and identify bottlenecks. For example, during a performance or load test you can view memory and CPU use for the Oracle Application Server instance overall and for the application. You can also drill down to find sessions, modules, EJBs, and methods that may be bottlenecks in the application.
Application Server Control home pages and drill downs include rate and aggregated performance data that are not available via command line or other tools. For example, you can use Oracle Enterprise Manager to view the average processing time for an HTTP request, allowing you to zero in on specific requests that may be slow.
Oracle Enterprise Manager also displays performance information, such as average processing time for a servlet for the most recent 5 minutes, in addition to averages since startup. This allows you to more easily diagnose problems in real-time.
Oracle Enterprise Manager provides comprehensive diagnostics that enable you to quickly pinpoint J2EE performance problems within the middle tier. To determine performance bottlenecks within your application, use the Web Application Page Performance page to identify the slowest URLs by OC4J processing time. Each URL is broken down by JSP, servlet, EJB, and JDBC processing times. By traversing the invocation paths of the URL call stack down to the SQL statement level, you can quickly identify the source of the bottlenecks causing application slowdowns. Use the application correlation feature to determine whether other system-level problems have contributed to performance bottlenecks.
In addition, you can trace all invocation paths of a Web application starting at the transaction level on an on-demand basis, and diagnose performance problems across all tiers: from the network, through the middle tier (including JSP, servlet, EJB, and JDBC times), down to the SQL statement level.
The features described above are enabled by configuring Oracle Application Server through Oracle Enterprise Manager. Refer to the Oracle Enterprise Manager Advanced Configuration Guide for information on how to configure and enable them.
To monitor the actual performance of your Web application as experienced by your end-users, use the End-User Performance Monitoring feature for Web applications. The Application Service Level Management feature of Oracle Enterprise Manager allows you to view and analyze the actual page response times for all URLs accessed by all your end-users. You can assess the impact of a performance problem on your end user base, or view page performance data by visitor, domain, region, or Web server, or by a combination of these axes. Also, you can highlight the monitoring of the most critical pages of your Web application by setting up a Watch List.
The End-User Performance Monitoring option requires configuration of the OracleAS Web Cache to instrument end-user performance data. Refer to the Oracle Enterprise Manager Advanced Configuration Guide for information on how to configure and enable this option.
The definition of Web application availability varies depending on the application itself. With Oracle Enterprise Manager, you have the flexibility to define what constitutes the availability of your Web application. Web application availability is defined by a designated availability transaction and the key representative user communities of the application. Availability of a Web application is determined by the monitoring of the availability transaction from various user communities (also known as beacons) at specified intervals. Alerts are generated to immediately inform you of when your Web application is considered down. For more information, refer to the Oracle Enterprise Manager Advanced Configuration Guide.
Oracle Enterprise Manager provides a proactive approach to monitoring Web applications through transaction performance monitoring. Synthetic business transactions are created using the transaction recorder, and are then replayed and monitored at specified intervals from key representative user communities called beacons. Measure the response times of key business transactions from various geographical user communities using this feature. Use transaction performance monitoring to:
isolate server-side problems from network delays
profile how much time is spent connecting to the server
document the server's first-byte time
measure the time spent serving HTML and non-HTML content
Alerts notify you when transaction response time thresholds have been exceeded. Refer to the Oracle Enterprise Manager Advanced Configuration Guide for information on how to configure and enable this option.
Applications that access the database using SQL can be tuned using the Oracle Tuning Pack. Oracle Tuning Pack offers a cost-effective and easy-to-use solution that automates the entire application tuning process. Automatic SQL tuning is exposed through two new Oracle 10g Database components: SQL Tuning Advisor and SQL Access Advisor. Both components are seamlessly integrated with Oracle Enterprise Manager Grid Control and Database Control.
The SQL Tuning Advisor takes one or more SQL statements as input and applies the automatic SQL tuning process to them. The output of the SQL Tuning Advisor is a set of recommendations, along with a rationale for each recommendation and its expected benefit.
The SQL Access Advisor provides comprehensive advice on how to optimize schema design in order to maximize application performance.
These two SQL advisors automate all manual tuning techniques currently practiced and form the core of the automatic tuning solution.
The Application Server Control home page not only displays critical performance data and resource usage for the application server instance, it also includes a link to information for the host. For example, if your application server is performing poorly you can first drill down to the related Host home page to determine if the underlying problem is due to resource problems with the host and other processes, or to services running on the computer.
Oracle Enterprise Manager Grid Control allows you to monitor your systems for specific conditions, such as loss of service or poor performance. When such a condition exists, Oracle Enterprise Manager generates an alert, which is displayed automatically on the appropriate Oracle Enterprise Manager home pages. You can also be notified by email or Web page. At a minimum, you should set up Oracle Enterprise Manager Grid Control to alert you when your critical or production application servers are unavailable.
You can also configure Grid Control to notify specific administrators when an event condition occurs. This simplifies cooperation between administrators who share responsibility for the same systems.
When you edit the configuration of Oracle Application Server components including Oracle HTTP Server, OC4J, or OPMN, you should do so using Application Server Control. Oracle Enterprise Manager ensures that your configuration changes are updated in the repository. If you edit these configuration files manually, you must use the DCM command-line utility, dcmctl, to notify the DCM repository of the changes.
Using Oracle Application Server clusters simplifies management and maintenance of your application servers. Clustering enforces consistent configurations across all members of the cluster. If you want to make a configuration change in every instance, you only need to make the change once. The clustering mechanism ensures that the new configuration is propagated to all members.
Similarly, clustering also enforces consistency of deployed applications across all application server instances. If you wish to deploy a new application or update an existing deployment on every application server instance in the cluster, you only need to deploy or update the application once. The clustering mechanism ensures that the application is properly deployed to all members.
A simple way to deploy an application is to use the Oracle Enterprise Manager deployment wizard, which can be accessed from the Application Server Control. The wizard walks you systematically through all the essential deployment options to ensure that your application is deployed correctly.
In some cases, you may want to deploy an application during off-hours or at a certain scheduled time. You can use the Oracle Enterprise Manager job system to schedule a deployment to occur at a selected time. Simply create a script containing the DCM command-line dcmctl deployApplication command and schedule the script via the Oracle Enterprise Manager job system.
Periodically you should back up your application server configuration. By saving your configurations, you can restore the backed up settings if you ever need to undo configuration changes made. You can use the DCM command-line utility's dcmctl createArchive instance command in a script to save the configuration and application information for an application server instance. You can then schedule the backup script to run periodically using the Oracle Enterprise Manager job system. This ensures that backups of your configurations are taken on a regular basis.
If you plan to manage both your Oracle Application Server instances and your Oracle database from the same management console, install the latest version of Oracle Enterprise Manager 10g Grid Control. This will ensure that you have the most up-to-date functionality for managing both types of targets.
This section describes Oracle Process Manager and Notification Server (OPMN) best practices. It includes the following topics:
Section 2.2.2, "Never Start or Stop OPMN Managed Components Manually"
Section 2.2.3, "Review stdout and stderr Logs If A Component Does Not Start"
Section 2.2.4, "Increase Timeout For Components That Take A Long Time To Start or Stop"
Section 2.2.5, "Set Retry to High Values For Components Running on an Overloaded System"
Section 2.2.6, "Leverage Additional Logging to Aid in Debugging"
The OPMN server should be started as soon as possible after turning on the host. OPMN must be running whenever OPMN-managed components are turned on or off. OPMN must be the last service turned off whenever you reboot or turn off your computer.
Oracle Application Server components managed by OPMN should never be started or stopped manually. Do not use command line scripts or utilities from previous versions of Oracle Application Server for starting and stopping Oracle Application Server components. Use the Application Server Control or the opmnctl command line utility to start or stop Oracle Application Server components.
The standard output (stdout) and standard error (stderr) of OPMN-managed processes are written to log files in the ORACLE_HOME/opmn/logs directory. OPMN creates a log file for each component and names it with a unique concatenation of the Oracle Application Server component name and a number. For example, the standard output log for OracleAS Web Cache may be WebCache~WebCacheAdmin~1. These process-specific console logs are the first and best resource for investigating problems related to starting and stopping components.
The stdout and stderr log files are reused and appended to when a component is restarted, so these files can contain output from multiple invocations of a component.
The time it takes to execute an opmnctl command depends on the type of Oracle Application Server process and the available computer hardware. Because of this, how long an opmnctl command will take may not be readily apparent. For example, the default start timeout for OC4J is approximately five minutes. If an OC4J process does not start up after an opmnctl command, OPMN will wait approximately an hour before timing out and aborting the request.
Increase the start element timeout attribute for the component that takes a long time to start. Set the timeout in the opmn.xml file at a level that will allow OPMN to wait for the process to come up. This functionality is also available with the startproc command. Similarly, increase the stop element timeout attribute in opmn.xml for the component that takes a long time to stop.
Pings occur periodically between OPMN and the components that it manages to ensure that each component is responsive and capable of servicing requests. A ping failure results in a certain number of retry attempts, and multiple failures in a row result in a restart of the component. On overloaded systems, it may be necessary to increase the number of retry attempts made before restarting the component.
Both <start> and <restart> elements, for each OPMN managed component in the ORACLE_HOME/opmn/conf/opmn.xml configuration file, accept an attribute named retry, which is the number of times to retry a ping attempt before a component is considered hung. Reasonable default retry values have been chosen, but when necessary, explicitly set this attribute in the appropriate element to a value greater than what is needed for the component to be pinged successfully.
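As a rough illustration of the timeout and retry attributes discussed above, the following Python sketch raises those values for one process type in an opmn.xml fragment. The fragment, the element ids, and the numeric values are assumptions made for the example. A real opmn.xml is larger and may declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical opmn.xml fragment used only for this example.
SAMPLE = """\
<ias-component id="OC4J">
  <process-type id="home" module-id="OC4J">
    <start timeout="600" retry="2"/>
    <stop timeout="120"/>
    <restart timeout="720" retry="2"/>
    <process-set id="default_island" numprocs="1"/>
  </process-type>
</ias-component>
"""


def raise_limits(xml_text, process_type_id, start_timeout, stop_timeout, retry):
    """Return xml_text with new timeout and retry values for one process type."""
    root = ET.fromstring(xml_text)
    for ptype in root.iter("process-type"):
        if ptype.get("id") != process_type_id:
            continue
        start = ptype.find("start")
        stop = ptype.find("stop")
        restart = ptype.find("restart")
        if start is not None:
            start.set("timeout", str(start_timeout))
            start.set("retry", str(retry))
        if stop is not None:
            stop.set("timeout", str(stop_timeout))
        if restart is not None:
            # Only the retry count is changed here; the restart timeout is kept.
            restart.set("retry", str(retry))
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; choose values that suit the component and host.
print(raise_limits(SAMPLE, "home", 1200, 300, 5))
```

Always work on a copy of the configuration file, and remember that manual edits to DCM-managed files must be registered with dcmctl as described earlier in this chapter.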
OPMN provides different levels of logging. In a typical production mode, the log level should be set to a minimum. However, perform the following steps before contacting technical support about a problem related to OPMN:
Set both log levels, the level attribute of <log-file> element for both <notification-server> and <process-manager> elements in the opmn.xml file, to 8 or 9.
Execute the $ORACLE_HOME/opmn/bin/opmnctl debug command and save the output to a file.
Save a copy of all logs in the $ORACLE_HOME/opmn/logs directory.
The log files produced at this level contain valuable information to assist in debugging.
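The second and third steps above can be scripted. The following Python sketch is one possible way to do it, assuming Python 3.8 or later is available on the host. The function name, the output file name, and the destination directory are hypothetical. Raising the log levels in opmn.xml (the first step) is left as a manual edit.

```python
import shutil
import subprocess
from pathlib import Path


def collect_opmn_diagnostics(oracle_home, dest_dir, runner=subprocess.run):
    """Save `opmnctl debug` output and a copy of the OPMN logs under dest_dir."""
    oracle_home = Path(oracle_home)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Run "opmnctl debug" and save whatever it prints to a file.
    opmnctl = oracle_home / "opmn" / "bin" / "opmnctl"
    result = runner([str(opmnctl), "debug"], capture_output=True, text=True)
    (dest / "opmnctl_debug.txt").write_text(result.stdout)

    # Copy everything in ORACLE_HOME/opmn/logs next to the debug output.
    shutil.copytree(oracle_home / "opmn" / "logs", dest / "logs", dirs_exist_ok=True)
    return dest
```

A call such as `collect_opmn_diagnostics(os.environ["ORACLE_HOME"], "/tmp/opmn_diag")` (after `import os`) would gather the files in one directory for sending to support. The `runner` parameter exists only so that the command can be replaced during testing.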
OPMN is configured at installation with default start order dependencies, which allows you to start all of the components in an instance in a specific order with a single command. But if a specific component requires that other components and services are up and running before it starts, you can configure additional dependencies according to the environment.
You can configure OPMN to execute your own custom event scripts whenever a particular component starts, stops, or crashes. You will find it useful to use one or more of the following event types:
pre-start: OPMN runs the pre-start script after any configured dependency checks have been performed and passed, and before the Oracle Application Server component starts. For example, the pre-start script can be used for site-specific initialization of external components.
pre-stop: OPMN runs the pre-stop script before stopping a designated Oracle Application Server component. For example, the pre-stop script can be used for collecting Java Virtual Machine stack traces prior to stopping OC4J processes.
post-crash: OPMN runs the post-crash script after the Oracle Application Server component has terminated unexpectedly. For example, a user could learn of component crashes by supplying a script or program to be executed at post-crash events that sends a notification to the administrator's pager.
Refer to the Oracle Process Manager and Notification Server Administrator's Guide for a sample pre-start event script.
OPMN can manage arbitrary daemon processes that are not part of your Oracle Application Server installation. Even more sophisticated process management services can be created by supplying, in the opmn.xml file, the optional paths to scripts for stopping, restarting, and pinging the daemon process.
Here is a simple example of an opmn.xml configuration for a custom component. The following lines load and identify the custom process module:
<module path="%ORACLE_HOME%/opmn/lib/libopmncustom.so">
  <module-id id="CUSTOM" />
</module>
The following lines represent the minimum configuration for a custom process:
<ias-component id="Custom">
  <process-type id="Custom" module-id="CUSTOM">
    <process-set id="Custom" numprocs="1">
      <module-data>
        <category id="start-parameters">
          <data id="start-executable" value="Your start executable here" />
        </category>
      </module-data>
    </process-set>
  </process-type>
</ias-component>
For complete configuration details, refer to the Oracle Process Manager and Notification Server Administrator's Guide.
This section describes best practices for Distributed Configuration Management (DCM). It contains the following topics:
Section 2.3.1, "Use Distributed Configuration Management Archiving"
Section 2.3.2, "Specify a Single Instance in a Cluster as the Management Point"
Section 2.3.3, "Do not Perform Concurrent Administration Operations"
Section 2.3.4, "Do not Run updateConfig Concurrently with any Other Configuration Operation"
Section 2.3.6, "Use High Availability Features for Infrastructure Repository"
The DCM archive feature provides a convenient and easy means of managing snapshots of the DCM-managed portions of the Oracle Application Server system configuration. Archives are useful for staging changes, recovering from errors, and provisioning the DCM-managed configuration information associated with one Oracle Application Server instance to another.
DCM managed system configuration includes configuration for a farm, clusters, Oracle HTTP Server, OPMN, OC4J, and JAZN. For OC4J, in addition to configuration information related to the container itself, DCM manages all deployed J2EE applications.
If you use DCM clusters, DCM assures that any change to the configuration is automatically distributed to all members of the cluster. As an alternative to using clusters, an archive of a staged configuration can be applied manually to non-clustered instances in a farm.
A hybrid staging solution is to first stage and test changes to a non-clustered instance, archive the changes, and finally apply the archive to a DCM cluster. These changes are then automatically propagated to all members of the cluster.
For example, to create an archive prior to deploying a new J2EE application named "foo" use the command:
dcmctl createArchive -arch PriorToDeployingFoo -comment "prior to foo deploy V1"
When using createArchive, it is a good practice to use an archive name and a corresponding comment that identifies the version of configuration that the archive is associated with.
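One way to keep archive names and comments consistent is to generate them. The Python sketch below builds a createArchive command line using only the -arch and -comment options shown above. The naming pattern (application, version, date) is an illustrative convention, not an Oracle requirement, and whether every character used here is accepted in an archive name should be checked against the DCM documentation.

```python
import shlex
from datetime import date


def create_archive_command(app_name, version, on_date):
    """Build a dcmctl createArchive command with a descriptive name and comment."""
    # The name and comment encode which configuration version the archive captures.
    arch = f"PriorToDeploying_{app_name}_{version}_{on_date:%Y%m%d}"
    comment = f"prior to {app_name} deploy {version}"
    return ["dcmctl", "createArchive", "-arch", arch, "-comment", comment]


cmd = create_archive_command("foo", "V1", date(2004, 1, 15))
# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` directly, which avoids shell quoting problems with the comment text.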
Oracle Application Server instances grouped in a cluster can be managed, as a single point of administration, using Application Server Control or dcmctl on any instance in the cluster. One instance should be used as the administrative point for the entire cluster.
Specifying a single instance in a cluster as the management point ensures that operations are executed in the correct order and are properly serialized.
When changing an instance-specific configuration (for example, port numbers, host names, or virtual hosts) on a particular instance in the cluster, you must ensure that no other administrative changes are occurring concurrently in the cluster. Conflicting changes to configuration settings can result in an unusable configuration.
Concurrent administration within a cluster is strongly discouraged. If multiple administrative operations are issued at the same time in a cluster, this can lead to errors and associated error messages.
Do not run the dcmctl updateConfig command concurrently with any other dcmctl commands or Application Server Control configuration operations from multiple Oracle Application Server instances in a farm or cluster. If updateConfig is executed concurrently with another configuration operation, there is a risk of conflicting changes being placed in the metadata repository. This could leave the configuration stored in the metadata repository in a non-functional state and could require a restore from the archive.
When using a File-Based Repository, you should stop and then start Application Server Control after issuing the following dcmctl commands:
joinCluster
joinFarm
leaveCluster
leaveFarm
Use the following commands to restart Application Server Control:
% emctl stop iasconsole
% emctl start iasconsole
The infrastructure repository houses all the configuration information for the Oracle Application Server instances in a farm. This information is critical during startup, since DCM ensures that the local configuration of any node is synchronized with the configuration in this central repository. Therefore, it is a good idea to employ the high availability features for the infrastructure instance.
However, it is also important to understand that the database-based repository (in the case of a J2EE and Web Cache installation) is used for management operations and OracleAS Single Sign-On. Thus, if a site is not using single sign-on capabilities, the repository primarily needs to be up when performing configuration management operations, such as deploying new applications or joining or leaving a cluster.
The following are best practices when using dcmctl:
Always use -d and -v options with dcmctl commands.
By default, the dcmctl script is configured for programmatic usage. Instead of displaying lengthy messages that can differ across releases and languages, error codes are displayed, such as ADMN-906005. Scripting tools can use these error codes to perform different activities based upon the result of commands.
Unfortunately, a message like ADMN-906005 does not mean much by itself. To see an explanation of the error code, use the -d and -v switches whenever possible.
Always use dcmctl getReturnStatus to determine whether a command failed after a timeout.
Long-running operations will often time out but continue to execute asynchronously. dcmctl indicates this with an ADMN-906005 error code.
Using the dcmctl deployApplication command with the -v option as an example, the following message is displayed:
"The specified command "deployApplication", is being executed asynchronously. The maximum wait time of n seconds has been reached. This operation will continue to execute to completion. Use the "getReturnStatus" command to determine if/when the operation completes successfully."
Once this timeout message is received, you can invoke the dcmctl getReturnStatus command periodically until the operation has completed.
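The periodic check can be automated with a small polling loop. The following Python sketch is an assumption-laden outline: this chapter does not specify what getReturnStatus prints, so the caller must supply the `is_finished` test, and the polling interval and attempt limit are arbitrary defaults.

```python
import subprocess
import time


def wait_for_completion(is_finished, check=None, interval=30, max_polls=20,
                        sleep=time.sleep):
    """Poll `dcmctl getReturnStatus` until is_finished(output) returns True.

    Returns the last output, or raises TimeoutError after max_polls attempts.
    """
    if check is None:
        # Default check: run the real command and return everything it printed.
        def check():
            done = subprocess.run(["dcmctl", "getReturnStatus"],
                                  capture_output=True, text=True)
            return done.stdout + done.stderr

    for _ in range(max_polls):
        output = check()
        if is_finished(output):
            return output
        sleep(interval)
    raise TimeoutError("operation still running after %d polls" % max_polls)
```

The `check` and `sleep` parameters are there so that the loop can be exercised without a live DCM installation. In normal use only `is_finished` needs to be provided.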
Use dcmctl shell mode for multiple commands.
When you need to perform a number of dcmctl commands, the dcmctl shell or the dcmctl command file options should be used. See the Distributed Configuration Management Reference Guide for complete documentation on dcmctl.
Each initialization of dcmctl requires creation of a Java Virtual Machine and the parsing of a number of XML documents. This initialization only has to occur once if using a dcmctl shell versus multiple times if executing a set of dcmctl commands individually.
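To illustrate the command file approach, the Python sketch below writes several commands to a file and builds a single dcmctl invocation that would run them in one session. The `shell -f` form of the invocation, the convention of omitting the `dcmctl` prefix inside the file, and the file name are all assumptions here and should be confirmed in the Distributed Configuration Management Reference Guide before use.

```python
import tempfile
from pathlib import Path


def write_command_file(path, commands):
    """Write one dcmctl command per line and return the single invocation to run."""
    Path(path).write_text("\n".join(commands) + "\n")
    # Assumed invocation form; verify the option name in the reference guide.
    return ["dcmctl", "shell", "-f", str(path)]


# Hypothetical file location and command list, for illustration only.
batch_file = Path(tempfile.gettempdir()) / "dcm_batch.txt"
invocation = write_command_file(batch_file, [
    'createArchive -arch PriorToDeployingFoo -comment "prior to foo deploy V1"',
    "updateConfig",
])
print(" ".join(invocation))
```

Because the JVM and XML parsing cost is paid once per dcmctl start, batching commands this way should take noticeably less time than issuing each command separately.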
This section describes Dynamic Monitoring Services (DMS) best practices.
It is a good practice to monitor Oracle Application Server regularly. Monitoring Oracle Application Server and obtaining performance data can assist you in tuning the system and debugging applications with performance problems.
Refer to Oracle Application Server 10g Performance Guide for available monitoring tools.
Run the dmstool command with the -dump option periodically, such as every 15 to 20 minutes, to capture and save a record of performance data for your Oracle Application Server installation. If you save performance data over time, it can assist you if you need to analyze system behavior to improve performance or if problems occur. Using dmstool -dump reports all the available metrics on the standard output. The -dump option also supports the format=xml query. Using this query at the end of the command line supplies the metric output in XML format.
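A periodic capture could be scripted along the following lines. This Python sketch runs dmstool -dump once and stores the output in a timestamped file. The function name, file naming scheme, and output directory are hypothetical, and dmstool is assumed to be on the PATH. Scheduling the script every 15 to 20 minutes is left to cron or the Oracle Enterprise Manager job system.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def dump_metrics(out_dir, now=None, as_xml=False, runner=subprocess.run):
    """Run `dmstool -dump` once and save its standard output to a timestamped file."""
    now = now or datetime.now()
    cmd = ["dmstool", "-dump"]
    if as_xml:
        # The format=xml query goes at the end of the command line.
        cmd.append("format=xml")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    suffix = "xml" if as_xml else "txt"
    target = out_dir / f"dms_{now:%Y%m%d_%H%M%S}.{suffix}"

    result = runner(cmd, capture_output=True, text=True)
    target.write_text(result.stdout)
    return target
```

Timestamped file names keep successive dumps from overwriting one another, which makes it easier to compare system behavior before and after a problem appears.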
Consider instrumenting applications with DMS metrics. Adding performance instrumentation to Java applications will help developers, system administrators and support analysts understand system performance and monitor system status. DMS instrumentation refers to the process of inserting DMS calls into application code. Using the DMS API is a simple and efficient way to enable your application to measure, collect, and save performance information.
To create DMS metrics, developers add calls that notify DMS when events occur, when important intervals begin and end, or when pre-computed values change their state. At runtime, DMS stores performance information, called DMS metrics, in memory and allows you to save or view the metrics.
Refer to Oracle Application Server 10g Performance Guide for available monitoring tools.
Carefully consider the requirements for new metrics when you add DMS instrumentation. It is important to add a sufficient number of metrics to validate that your code is behaving as desired but not so much that the useful statistics become buried in too much detail. As a guide, try to observe the following rules when you add DMS metrics:
Add metrics only to provide an overview of the time the system spends in your block of code or module. You do not need to collect performance data for every method call, or for every distinct phase of your code or module.
When your code calls external code that you do not control, and you expect that this could take a significant amount of time, add a PhaseEvent Sensor to track the start and the completion of the external code.
The DMS metrics are organized in a tree, with leaf nodes acting as Sensor metrics and branching nodes acting as Nouns. Define DMS Nouns to organize Sensors and their associated metrics. Use Noun types for Nouns that directly contain Sensors. When a Noun contains only Nouns, and does not directly contain Sensors, AggreSpy displays the Noun type as a metric table, with no metrics.
Follow the naming guidelines when defining DMS names, for the benefit of users viewing DMS metric reports. Consistent names allow users to easily understand metrics across applications and across Oracle Application Server components. In applying the naming convention rules, try to be as clear as possible; if there is a conflict, you might need to make an exception.
In general, try to use only alphanumeric and underscore characters for naming and avoid using the "/" character.
Refer to Oracle Application Server 10g Performance Guide for different naming conventions.
Use the following coding recommendations for working with DMS:
When you create a new Noun or Sensor (PhaseEvent, Event, or State), its full name must not conflict with names in use by Oracle built-in metrics, or by other applications.
Avoid frequently creating and destroying Nouns and Sensors.
The DMS API calls are thread safe; they provide sufficient synchronization to prevent races and access bugs.
Create PhaseEvents only for sections of code that are expensive under some set of conditions.
Be sure all PhaseEvents are stopped. Put the PhaseEvent start() call in a try block and the stop() call in the finally block.
Avoid creating any DMS Sensor or Noun more than once. You should define Sensors and Nouns during static initialization or, in the case of a servlet, in the init() method.
Assign a type for each Noun. Nouns with no type specified are not shown in the Spy or AggreSpy display.
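The start-in-try, stop-in-finally recommendation above can be illustrated outside of Java. The Python sketch below uses a hypothetical stand-in class, not the DMS Java API, purely to show the control flow: the sensor is created once, and every started phase is stopped even when the measured code raises an exception.

```python
import time


class StandInPhaseEvent:
    """Hypothetical stand-in for a DMS PhaseEvent; this is not the DMS API."""

    def __init__(self, name):
        self.name = name
        self.active = 0          # phases started but not yet stopped
        self.completed = 0       # phases that were stopped
        self.total_seconds = 0.0

    def start(self):
        self.active += 1
        return time.perf_counter()

    def stop(self, token):
        self.active -= 1
        self.completed += 1
        self.total_seconds += time.perf_counter() - token


# Created once at module load, mirroring creation during static initialization.
LOOKUP_PHASE = StandInPhaseEvent("lookup")


def lookup(key, table):
    """Look up key in table while timing the call with the stand-in sensor."""
    try:
        token = LOOKUP_PHASE.start()
        return table[key]            # may raise KeyError
    finally:
        LOOKUP_PHASE.stop(token)     # runs on success and on exception
```

After any mix of successful and failing calls, `LOOKUP_PHASE.active` returns to zero, which is the property the recommendation is meant to guarantee.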
You should test and verify the accuracy of the metrics that you add to Java applications. Use the dmstool and the other available DMS monitoring tools to verify and test new metrics. Try the following to validate new metrics:
Do expected metrics appear in the display?
Do unexpected metrics appear in the display?
Verify that you have only added the metrics that you planned to add.
Are the metric values you see within reasonable ranges?
For example, a size of pool metric should never report a negative value.
Are metric values accurate?
This can be difficult to test; however, if an alternate means of measuring a particular metric is available then use it to verify metric values. For example, you can verify an Event Sensor count metric by examining records that you write to a log file or to the console.
When integrating DMS instrumentation with an existing package or when implementing a new feature, consider insulating a previously working system.
For example, you could include an option to enable and disable new DMS metrics.