Team LiB   Previous Section   Next Section

10.3 Killing Sniped Sessions

A sniped session occurs when a user has exceeded his idle time. The situation has been noted in the database, and the user's actual database session has been suspended. However, the user is still consuming a dedicated server resource that has not yet been allocated to someone else. This situation can have an adverse effect on overall database performance.

10.3.1 Limiting Resource Consumption

Oracle provides the ability to limit resource consumption via the user PROFILE, a collection of limits holding resource hogs in check. We've found that some of these limits are very useful, particularly IDLE_TIME. The IDLE_TIME limit disconnects user sessions if they remain unused for too long. When a session is disconnected in this manner, Oracle changes the status of the session to SNIPED in the V$SESSION system view. We've used this limit effectively — especially in data warehouse applications where a session may be consuming large swathes of memory even it is when idling.

Here's how you create a PROFILE called IDLE_LIMIT with an IDLE_TIME of 1 minute.[4] We'll assign it to scott:

[4] You can create an idle limit of 1 minute, but Oracle interprets it as 2 or 3 minutes, the lowest IDLE_TIME value it recognizes. The actual value depends upon the OS platform.

SQL> create profile idle_limit limit idle_time 1;
SQL> alter user scott profile idle_limit;

This following displays all non-default profile parameters:

SELECT *
  FROM DBA_PROFILES
 WHERE PROFILE != 'DEFAULT'
   AND LIMIT != 'DEFAULT';
  
PROFILE    RESOURCE_NAME                    RESOURCE LIMIT
---------- -------------------------------- -------- ----------
IDLE_LIMIT IDLE_TIME                        KERNEL   1
  
1 row selected.

This next example shows all users with nondefault profile assignments:

SELECT B.USERNAME, A.RESOURCE_NAME, A.LIMIT
  FROM DBA_PROFILES A, DBA_USERS B
 WHERE A.PROFILE = B.PROFILE
   AND A.PROFILE != 'DEFAULT'
   AND A.LIMIT != 'DEFAULT';
  
USERNAME   RESOURCE_NAME   LIMIT
---------- --------------- ---------------
SCOTT      IDLE_TIME       1
SCOTT_DUP  IDLE_TIME       1
  
2 rows selected.

When a session has been idle for longer than IDLE_TIME, Oracle changes the session status to SNIPED. The user typically notices this session suspension in the following way when he tries to run some more SQL commands, perhaps after a very long lunch break:

SQL> select USER from dual;
select USER from dual
*
ERROR at line 1:
ORA-02396: exceeded maximum idle time, please connect again

Even though the session has timed out, memory resources are still being consumed, as you can see in Example 10-12. The ps -fp18471 command shows that the Oracle dedicated session server is still in place. The SQL used to select this information from the V$SESSION view is shown in Example 10-13.

Example 10-12. Status of timed-out session in V$SESSION
13:24:31 SQL> /
                                SRVR
USERNAME SID SERIAL# STATUS      PID LOGON TIME        IDLE TIME
-------- --- ------- -------- ------ ----------------- -----------
SCOTT     16    1321 INACTIVE  18471 02/17/02 13:23:53 00:00:00:41
  
1 row selected.
  
13:24:33 SQL> /
                                SRVR
USERNAME SID SERIAL# STATUS      PID LOGON TIME        IDLE TIME
-------- --- ------- -------- ------ ----------------- -----------
SCOTT     16    1321 SNIPED    18471 02/17/02 13:23:53 00:00:02:10
  
1 row selected.
  
13:26:02 SQL> !ps -fp18471
UID      PID  PPID C STIME TTY    TIME CMD
oracle 18471 18470 0 13:23 ?  00:00:00 oraclets01 (DESCRIPTION=(LOCAL=YES) 
  
13:26:11 SQL>
Example 10-13. Displaying status of sessions
SELECT S.USERNAME, S.SID, S.SERIAL#, S.STATUS,
       TO_CHAR(LOGON_TIME, 'MM/DD/YY HH24:MI:SS') LOGON_TIME,
       SUBSTR('0'||TRUNC(LAST_CALL_ET/86400),-2,2)  || ':'  ||
       SUBSTR('0'||TRUNC(MOD(LAST_CALL_ET,86400)/3600),-2,2) || ':' ||
       SUBSTR('0'||TRUNC(MOD(MOD(LAST_CALL_ET,86400),3600)/60),-2,2)||
       ':' ||
       SUBSTR('0'||MOD(MOD(MOD(LAST_CALL_ET,86400),3600),60),-2,2)  
       IDLE_TIME
  FROM V$SESSION S, V$PROCESS P
 WHERE S.USERNAME = 'SCOTT' AND P.ADDR(+) = S.PADDR
 ORDER BY USERNAME, SID;

Notice in Example 10-12 that this is a database using dedicated server processes. Avoid killing sniped sessions with the kss.pl script in a database using Oracle's Multi-Threaded Server (MTS), because in such situations the script will disconnect all sessions that are using the same MTS dispatcher.

On a busy database with frequently created sessions, lapsed memory resource consumption may be tolerable. Sniped sessions are reused by newly logged-in sessions, minimizing resource wastage. However, on databases with infrequently created sessions, snipes can remain with us for quite some time. We've seen this happen in data warehouses with plenty of memory wastage, so we decided to go snipe hunting[5] — and lo, the kss.pl (Kill Sniped Sessions) script was born.

[5] For information on snipe hunting, please see http://www.snipehunter.com.

To make use of Oracle PROFILE allocations, your database must have the following parameter set in the INIT.ORA file: RESOURCE_LIMIT = TRUE. Using a PROFILE will have no effect otherwise. This parameter can also be set at runtime with the command:

SQL> ALTER SYSTEM SET RESOURCE_LIMIT = TRUE;

When removing a session from Oracle, the ALTER SYSTEM KILL SESSION command is often sufficient. In the toolkit, we've taken it one step further, though, and we actually kill[6] the session's dedicated server process. We then use ALTER SYSTEM KILL SESSION if the session still exists with a status of KILLED.

[6] On Unix, we kill the Oracle thread using kill -9. On Win32 we use orakill.

So why the literal overkill? On numerous occasions over the years — and through several versions of Oracle — we've run into serious problems when killing sessions. Nearly every time, the standard method works just fine, but every once in a while it fails. The sessions may possess a status of KILLED, but they're never actually removed. When this happens, and the session is holding a vital table lock, it seems that nothing less than a plasma cannon[7] is sufficient to remove these poltergeist sessions (or a database bounce, but we'd rather avoid going there). We've seen this happen on every version of Oracle from 7.0.16 through to 8.1.6 and on both Unix and Win32. It's sporadic enough that we've never learned how to reproduce it, but regular enough so we've learned how to cope with it. We cope by killing the dedicated server process on Unix (or the thread on Win32). We've never experienced a hanging session using this method. Remember, though, that this method is inappropriate for Multi-Threaded Servers (MTS), where you would end up disconnecting a great many other sessions by killing the MTS dispatcher.

[7] For definitive information on plasma cannons, the following web site may be helpful: http://www.schlockmercenary.com/d/20000829.html

Enough rationale. Now it's time to set up kss.pl and run it. Because this script runs as a daemon on Unix and a service on Win32, we'll show you how to set it up for both.

10.3.2 Installing kss.pl on Unix

Installing this script on Unix systems is easy. Simply make sure the kss.conf file is in the right place. If it is still uninstalled, copy it in like this from the toolkit installation directory:

$ cp /u01/build/PDBA-1.00/routine_tasks/kss.conf $PDBA_HOME

That's it — we're done! Running the kss.pl daemon is equally simple. Here's the command we used on our Linux server; Table 10-6 summarizes the command-line options:

$ kss.pl -machine sherlock -database ts01 -username system

Without the password server (see Chapter 9), add the password argument:

$ kss.pl -machine sherlock -database ts01 -username system \
     -password manager

Table 10-6. Command-line options — kss.pl

Option

Description

-machine

Server where the target database resides

-database

Target database

-username

DBA account

-password

DBA password (optional if password server in use)

The kss.pl script must run as the owner of the Oracle processes. This is necessary to enable the use of the kill command on dedicated Oracle server processes.

10.3.3 Installing kss_NT.pl on Win32

Follow these steps on Win32 systems:

  1. Make sure that the kss.conf configuration file is installed in PDBA_HOME, as with the other configuration files described previously in this chapter.

  2. Install kss_NT.pl as a Win32 service using kss_service.pl.

  3. If you are using the password server (see Chapter 9), install kss_NT.pl like this:

    DOS> C:\Perl\site\lib\PDBA\util\kss_service.pl \
            -machine database_server -database database_name \
            -username DBA account
  4. Without the password server, you need to include the relevant password:

    DOS> C:\Perl\site\lib\PDBA\util\kss_service.pl \
            -machine database_server -database database_name \
            -username DBA_account -password DBA_password
  5. We used the following to install kss_NT.pl on our Windows 2000 server:

    DOS> C:\Perl\site\lib\PDBA\util\kss_service.pl \
            -machine mycroft -database ts20 -username system
  6. To start the server you need to navigate to the Services administration application. The specifics will vary, depending on your Win32 platform:

    • Windows 2000: Click through Start figs/U2192.gif Settings figs/U2192.gif Control Panel, double-click on Administrative Tools, and double-click on Services. Scroll down to the Oracle_SID_kss_monitor service and highlight it. Click on Action figs/U2192.gif Start from the menu to start the service.

    • Windows NT: Click through Start figs/U2192.gif Setting figs/U2192.gif Control Panel, double-click on Services. Scroll down to Oracle_SID_kss_monitor, highlight it with the mouse, and click the Start button.

    The service appears on Windows 2000, as shown in Figure 10-2.

Figure 10-2. The kss service on Windows 2000
figs/pdba_1002.gif

10.3.4 Configuring kss.pl

The configuration file for the kss.pl script requires little editing. The contents of kss.conf are shown in Example 10-14. Note the following about this example:

  • The only parameter that should be edited is sleepTime. The default is 180, which is the number of seconds between each snipe check. Avoid setting it too low; if you do, the monitor will consume unnecessary resources, just as snipes do.

  • The remaining parameters (killSql, snipeSql and killCmd) should stay as they are for the foreseeable future. They work fine for both Unix and Win32. (The killCmd parameter uses PDBA->osname to determine whether the Unix kill or Win32 orakill commands should be used. The others will only need changing if Oracle itself changes significantly.)

Example 10-14. kss.conf
package kss;
  
use PDBA;
use File::Spec;
use vars qw(%config);
  
%config = (
   sleepTime => 180,
   killSql => q(ALTER SYSTEM KILL SESSION '<<SID>>,<<SERIAL>>'),
   snipeSql => Q( SELECT S.USERNAME USERNAME, S.SID SID, S.STATUS STATUS,
                         S.SERIAL# SERIAL, P.SPID SPID
                    FROM V$SESSION S, V$PROCESS P
                   WHERE S.USERNAME IS NOT NULL
                     AND P.ADDR(+) = S.PADDR
                     AND S.STATUS = 'SNIPED'
                   ORDER BY USERNAME, SID 
   ),
   killCmd => PDBA->osname(  ) eq 'unix' 
      ? q(/bin/kill -9 <<PID>> >/dev/null 2>&1)
      : File::Spec->catfile( PDBA->oracleHome, 'bin','orakill') 
        . q{ <<ORACLE_SID>> <<PID>> },
);
1;

After starting the Oracle_ts20_kss_monitor service, on Win32, we created an IDLE_LIMIT profile with a one-minute threshold for IDLE_TIME. The profile was then assigned to scott. Example 10-15 shows the action taken by kss_NT.pl after scott's session timed out. At marker 20020217184600 the thread for scott's session is first killed with Oracle's orakill utility, then terminated with ALTER SYSTEM KILL SESSION.

Example 10-15. scott session cleaned up by kss_NT.pl
20020217184355:attempting to load Win32::Daemon
20020217184356:password retrieved for user system
20020217184356:Service running
20020217184359:SCANNING
20020217184359:SLEEP: 30
20020217184429:Service running
20020217184430:SCANNING
20020217184430:SLEEP: 30
20020217184500:Service running
20020217184500:SCANNING
20020217184500:SLEEP: 30
20020217184530:Service running
20020217184530:SCANNING
20020217184530:SLEEP: 30
20020217184600:Service running
20020217184600:SCANNING
20020217184600:STATUS:SCOTT:8:9:1384
20020217184600:OSKILL:SCOTT:8:9:1384:D:\oracle\ora81\bin\orakill ts20 1384 
20020217184600:DBKILL:SCOTT:8:9:1384:alter system kill session '8,9'
20020217184600:SLEEP: 30
20020217184630:Service running
20020217184630:SCANNING
20020217184630:SLEEP: 30
20020217184700:Service running
    Team LiB   Previous Section   Next Section