To kill a mocki..., er, errant PROCESS...
Let's say for instance that you have a login id of "goofus".
You've been working merrily along in a poorly written program, or your favorite design tool,
and find yourself wedged uncomfortably between a funny looking screen
and the back of your chair.
Panic sets in. Then frustration. Then you reach for the phone to call
your trusty and faithful sysadmins... BUT WAIT, no need to track down the
dynamic duo... YES, YOU CAN DO IT YOURSELF!!! And you can even do it
without hurting yourself or your workstation. (You may lose
any unsaved work, but you would lose it if you called the admins too... :-( )
step one: IDENTIFY the PROBLEM and look for the culprit PID
In the following example we will first look to see who is logged into
the machine, and for how long, and what processes they are running.
You'll also find out what a PID is.
-----------------------------------------------------
using the "w" command
goofus@smart99 [/home/goofus]
>--> w
12:26pm up 139 days, 3:24, 5 users, load average: 2.08, 2.00, 1.72
User tty login@ idle JCPU PCPU what
goofus console 8:59am195:20 1 1 /usr/sbin/getty console console
goofus pts/0 8:59am 29 -csh
goofus pts/1 8:59am 75:27 /bin/csh
goofus ttyp1 9:32am671:30 77:15 77:15 /applications/Kludge-soft/SuchADisaster/bin/Such
goofus ttyp2 12:26pm w
-----------------------------------------------------
The "w" command indicates that this workstation has been up and running
for 139 days. Don't try this with NT, though; it won't happen.
Looking at the user activity, we see we're logged into the console (at the
machine vs. logged in over the network), that we have a C shell open,
and an application running called Such Designer. We can also see that
we just ran the "w" command.
A load average over 1 indicates that something is taking up CPU
time and loading the machine down more than expected. This might indicate
a runaway process spinning on the CPU, or simply that a CPU intensive
application is running at this time. You may also find a defunct process,
or "zombie": one that has already exited but that its parent has not yet
cleaned up. A zombie uses no CPU itself, but it is often a sign that its
parent is wedged. (A zombie cannot be killed, not even with kill -9; it goes
away when its parent collects it or exits. If it remains after you log out
and log back in, the machine may have to be rebooted.) We will need to
look at the specific processes to determine more.
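To check the load average without reading the whole "w" display, a small filter can pull the one-minute figure out of the header line. This is only a sketch: the function name load1 is made up for illustration, and it assumes the header contains the text "load average:" followed by the number, as in the example above.

```shell
# load1: hypothetical helper, not a standard command.
# Reads "w" or "uptime" output on standard input and prints the first
# (one-minute) load average found in the header line.
load1() {
    sed -n 's/.*load average[s]*: *\([0-9.]*\).*/\1/p'
}

# Typical use: uptime | load1
```

Fed the header line from the example above, it would print 2.08.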
Typically a poorly written application or unusual user actions may result
in a process "losing its marbles" and going off on its own. In most cases
UNIX will clean up this errant process and continue, but sometimes it requires
the user to log completely out and back in. If that doesn't terminate the process,
admin intervention in the form of a reboot is required.
-----------------------------------------------------
using the "ps" command
goofus@smart99 [/home/goofus]
>--> ps -ef | grep goofus
goofus 1753 1678 0 Jun 22 ? 0:03 /usr/dt/bin/dtsession
goofus 1796 1753 0 Jun 22 ? 0:00 /usr/dt/bin/dtterm -session dta07960
goofus 1795 1753 0 Jun 22 ? 0:02 /usr/dt/bin/dtterm -session dta07959 -C -ls -name Console -t
goofus 1785 1 0 Jun 22 ? 0:00 /usr/dt/bin/ttsession -s
goofus 2372 1 0 Jun 24 ? 0:00 /usr/bin/X11/hpterm -sb -sl 500 -iconic -title >SuchADisaste
goofus 2414 2373 1 Jun 24 ? 0:00 <defunct>
goofus 2603 1753 0 12:08:14 ? 0:00 /usr/dt/bin/dtexec -open 0 -ttprocid 1.tRxBV 01 1785 1342177
goofus 1799 1796 0 Jun 22 pts/1 0:00 /bin/csh
goofus 2373 2372 206 Jun 24 ttyp1 77:21 /applications/Kludge-soft/SuchADisaster/bin/Such
goofus 1800 1795 0 Jun 22 pts/0 0:00 -csh
goofus 2201 1 0 Jun 23 ? 0:00 /bin/sh -c hlpsrv
goofus 2413 2410 0 Jun 24 ? 0:00 Licensex 2373 smart99:0.0 62914574
goofus 2202 2201 0 Jun 23 ? 0:00 hlpsrv
goofus 2604 2603 47 12:08:14 ? 1:35 /usr/dt/bin/dtscreen -mode rotor
goofus 2410 2373 0 Jun 24 ? 0:00 /bin/sh -c Licensex 2373 smart99:0.0 62914574
goofus 2632 2608 0 12:26:37 ttyp2 0:00 grep goofus
-----------------------------------------------------
TO READ the output from ps -ef:
UID PID PPID other stuff... COMMAND
-------------------------------------------------------
goofus 2372 1 0 Jun 24 ? 0:00 /usr/bin/X11/hpterm -sb -sl 500 -iconic -title >SuchADisaste
goofus 2414 2373 1 Jun 24 ? 0:00 <defunct>
goofus 2603 1753 0 12:08:14 ? 0:00 /usr/dt/bin/dtexec -open 0 -ttprocid 1.tRxBV 01 1785 1342177
goofus 1799 1796 0 Jun 22 pts/1 0:00 /bin/csh
goofus 2373 2372 206 Jun 24 ttyp1 77:21 /applications/Kludge-soft/SuchADisaster/bin/Such
Look at the COMMAND column and see if you can spot the offending process or a defunct process.
Then look for the PID.
note:
PID = Process ID
PPID = Parent Process ID
UID = User ID
In the example above PID 2414 is the defunct process; its PPID is 2373, and 2373 is SuchADisaster
(note the 77:21 of CPU time it has run up).
Therefore, if your workstation is "wedged" and SuchADisaster is frozen, it is the offending
process. You will need to kill it.
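Picking parents and children out of the listing by eye gets tedious. The sketch below filters "ps -ef" output on the PPID column instead. The name children_of is made up for illustration, and it assumes the column order shown above (UID, PID, PPID).

```shell
# children_of: hypothetical helper, not a standard command.
# Reads "ps -ef" output on standard input and prints the PID of every
# process whose parent (third column) is the PID given as $1.
children_of() {
    awk -v parent="$1" '$3 == parent { print $2 }'
}

# Typical use: ps -ef | children_of 2373
```

Against the example listing, children_of 2373 would print 2414 and 2410, and children_of 2372 would print 2373.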
step two: surgically extract and terminate the offending PID
It should only be necessary to kill the master process for SuchADisaster, in this case PID 2372 (the hpterm that launched it, as the PPID of 2373 shows).
>--> kill 2372
Verify that it has been removed from the process table by repeating the ps command used earlier.
If you are using the Korn or POSIX shell with vi-style command editing turned on (set -o vi), you only
need to hit the escape key and the letter "k" to scroll back through your history file until you see the
"ps -ef | grep goofus" command, then hit return. If you're using C shell, shame on you, you have to retype the
command all over, or use your secret decoder ring to extract the appropriate numbered history command.
(In fairness, typing !ps reruns the most recent command that began with "ps".)
>--> ps -ef | grep goofus
At this point PIDs 2372, 2414 and 2373 should not be displayed. If they are, you can repeat
the kill command using the "dash nine" option to force the kill.
>--> kill -9 2372 2414 2373
Repeat the ps command. If the PIDs remain, try logging out and logging back in. If the
PIDs still remain, contact your sys admins. They may attempt to kill the PIDs as root; otherwise
they will need to reboot the workstation.
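The escalation described above (plain kill first, "dash nine" only if that fails) can be wrapped in a small shell function. This is a sketch, not a system tool: the name kill_gently and its grace period argument are made up for illustration.

```shell
# kill_gently: hypothetical helper, not a standard command.
# Usage: kill_gently PID [SECONDS]
# Sends the default TERM signal, waits up to SECONDS (default 5) for the
# process to go away, and only then falls back to kill -9.
kill_gently() {
    pid=$1
    grace=${2:-5}

    # Polite request first, so the program gets a chance to clean up.
    if ! kill "$pid" 2>/dev/null; then
        echo "cannot signal PID $pid (already gone, or not yours)" >&2
        return 1
    fi

    # kill -0 sends no signal; it only tests whether the PID still exists.
    # Note that it also succeeds for a defunct process.
    i=0
    while [ "$i" -lt "$grace" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Still there after the grace period: force the kill.
    kill -9 "$pid" 2>/dev/null
    return 0
}
```

For the example above that would be kill_gently 2372, followed by the usual ps -ef | grep goofus to confirm the PIDs are gone.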