> Hmm. Maybe adjust the code to panic the machine when this
> situation occurs, then see if you can get a kernel dump out
> of it.
Looks like I'll be doing that next. Is anyone available to help look at the
dump once I have one? I'm not big into reading kernel dumps :-)
> As to the load issue... that sounds like a classic priority
> inversion problem. Check the 'nice' of all the processes in
> the system and see if some nice'd-down processes are hogging
> the cpu. 'ps axlww' in a big window.
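If you'd rather script the check than eyeball it, here is a minimal sketch that flags processes with a non-zero NICE value in a top(1)-style listing. It assumes the column order of the top output quoted further down (PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND), not the `ps axlww` layout. The helper name `niced_processes` is made up for illustration, and the sample rows are copied from that output.

```python
# Hypothetical helper: list processes with a non-zero NICE value from a
# top(1)-style process table. ASSUMES the column order used by the top
# output in this thread:
#   PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
def niced_processes(lines):
    flagged = []
    for line in lines:
        fields = line.split()
        # Skip the header and any line that does not start with a numeric PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        pid, user = int(fields[0]), fields[1]
        pri, nice = int(fields[2]), int(fields[3])
        state = fields[6]
        wcpu = float(fields[9].rstrip("%"))
        command = " ".join(fields[11:])
        if nice != 0:
            flagged.append((pid, user, pri, nice, state, wcpu, command))
    # Busiest first, so a nice'd process that is still eating CPU stands out.
    # sorted() is stable, so ties keep their original order.
    return sorted(flagged, key=lambda row: row[5], reverse=True)


# Sample rows taken from the top output in this message.
sample = """\
  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
99230 nobody    37  52 16556K 16424K RUN    1 182.4H  0.00%  0.00% setiathome
 2777 root       2 -20  1992K  1152K select 1   0:01  0.00%  0.00% top
78128 useruser  63   0 34696K 33896K RUN    0  83:26 31.30% 31.30% nit
""".splitlines()

for row in niced_processes(sample):
    print(row)
```

Only the two rows with a non-zero NICE are reported. The `nit` row at NICE 0 is skipped even though it is using the most CPU.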
Hmmm. I did just notice something. I run setiathome everywhere using a
little daemon that punts it down to idprio etc. I just tried to kill them
and they didn't die. Looking again, I saw it's because they're getting
0.0% of the CPU, so I idprio -t -<pid>'d them. When I did that to the
first one, my login session froze for the better part of a minute: the
machine remained pingable, but the session was unresponsive. Then it
recovered. The second one went as expected.
> Also look at the user cpu versus system cpu percentage to see
> where the cpu is going.
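A minimal sketch of that user-versus-system comparison. It assumes the "CPU states:" line format that top prints in the output below. `cpu_states` is a hypothetical helper, not part of any tool.

```python
import re


# Hypothetical helper: parse a top(1) "CPU states:" line into
# {category: percent}. ASSUMES the "N.N% name, ..." layout shown in this thread.
def cpu_states(line):
    _, _, body = line.partition(":")
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%\s+(\w+)", body)}


line = "CPU states:  4.5% user,  0.0% nice, 94.8% system,  0.6% interrupt,  0.0% idle"
s = cpu_states(line)
# Guard against dividing by zero when user time rounds to 0.0%.
ratio = s["system"] / max(s["user"], 0.1)
print(f"user {s['user']}%, system {s['system']}%, system/user ratio {ratio:.1f}")
```

For the snapshot below, this shows system time at roughly 21 times user time. That points the investigation at the kernel rather than at the user processes.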
Here's top, any hints? (note: the names have been changed to protect the
innocent)
last pid: 3145; load averages: 13.60, 13.97, 14.05 up 18+14:27:19 13:26:35
63 processes: 15 running, 47 sleeping, 1 stopped
CPU states: 4.5% user, 0.0% nice, 94.8% system, 0.6% interrupt, 0.0% idle
Mem: 142M Active, 656M Inact, 145M Wired, 47M Cache, 112M Buf, 14M Free
Swap: 2048M Total, 56K Used, 2048M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
78128 useruser 63 0 34696K 33896K RUN 0 83:26 31.30% 31.30% nit
78596 useruser 64 0 18716K 17896K RUN 0 79:59 31.10% 31.10% nit
78959 useruser 64 0 15872K 14728K RUN 0 79:30 29.93% 29.93% nit
57493 use 63 0 6412K 5804K RUN 1 601:36 13.43% 13.43% perl
99887 useruser 63 0 14200K 10420K CPU1 1 3:26 13.09% 13.09% perl
99918 use 64 0 1060K 656K RUN 1 2:26 11.33% 11.33% funny
2059 useruser 63 0 2220K 1424K RUN 1 0:59 11.18% 11.18% grep
507 use 63 0 1060K 656K RUN 1 1:47 9.52% 9.52% funny
1363 use 61 0 1060K 632K RUN 0 0:57 8.98% 8.98% funny
2555 use 63 0 1060K 632K RUN 1 0:34 8.30% 8.30% funny
3127 use 62 0 1060K 596K RUN 0 0:02 9.38% 6.10% funny
1028 root 2 0 964K 572K select 1 42:59 2.73% 2.73% syslogd
3104 use 2 0 1060K 576K sbwait 0 0:01 1.55% 1.22% funny
2945 root 35 0 1996K 1148K CPU0 1 0:02 0.93% 0.93% top
3106 use 2 0 1060K 656K sbwait 1 0:01 1.12% 0.88% funny
3145 use 2 0 1060K 596K sbwait 0 0:00 9.00% 0.44% funny
99230 nobody 37 52 16556K 16424K RUN 1 182.4H 0.00% 0.00% setiathome
21867 nobody 37 52 16556K 16428K RUN 0 171.6H 0.00% 0.00% setiathome
966 root 2 0 1648K 744K select 0 4:27 0.00% 0.00% ntpd
945 bind 2 0 3300K 2608K select 0 2:53 0.00% 0.00% named-dns
1047 root 10 0 1228K 848K nanslp 0 1:22 0.00% 0.00% mon
893 root 10 0 1004K 652K nanslp 0 1:19 0.00% 0.00% cron
57483 use 2 0 896K 400K sbwait 0 1:09 0.00% 0.00% wont
895 root 2 0 2224K 1172K select 0 0:50 0.00% 0.00% sshd
57488 use 2 0 4600K 4096K sbwait 1 0:48 0.00% 0.00% perl
57496 use 2 0 896K 504K accept 1 0:42 0.00% 0.00% mrdata
5828 root 2 0 2308K 1688K select 1 0:24 0.00% 0.00% sshd
950 nobody 2 0 928K 380K select 0 0:10 0.00% 0.00% identd
887 daemon 2 0 904K 540K sbwait 0 0:07 0.00% 0.00% rwhod
72401 userus 3 0 2628K 2244K ttyin 1 0:05 0.00% 0.00% zsh
25432 root 2 0 2348K 1728K select 0 0:04 0.00% 0.00% sshd
72398 root 2 0 2308K 1412K select 0 0:03 0.00% 0.00% sshd
2014 userxx 3 0 2792K 2304K ttyin 0 0:03 0.00% 0.00% lynx
1676 root 2 0 2316K 1636K select 0 0:03 0.00% 0.00% sshd
25551 useruser 3 0 1484K 1068K ttyin 0 0:02 0.00% 0.00% tcsh
2778 root 36 0 1996K 1144K STOP 0 0:02 0.00% 0.00% top
1206 root 28 0 2308K 1632K RUN 0 0:02 0.00% 0.00% sshd
98311 useruser 10 0 640K 280K wait 0 0:01 0.00% 0.00% sh
2777 root 2 -20 1992K 1152K select 1 0:01 0.00% 0.00% top
1248 root 18 0 1384K 992K pause 1 0:00 0.00% 0.00% tcsh
1679 userxx 18 0 2400K 2064K pause 1 0:00 0.00% 0.00% zsh
1207 jgreco 18 0 1380K 992K pause 1 0:00 0.00% 0.00% tcsh
5904 useruser 3 0 1452K 1040K ttyin 1 0:00 0.00% 0.00% tcsh
1008 root 2 0 3320K 2156K select 1 0:00 0.00% 0.00% snmpd
99852 useruser 10 0 1028K 600K wait 0 0:00 0.00% 0.00% bash
98623 mailnull -6 0 2524K 1780K piperd 0 0:00 0.00% 0.00% sendmail
998 nobody 10 52 896K 492K wait 1 0:00 0.00% 0.00% setidaemon
2768 root 10 0 628K 268K wait 1 0:00 0.00% 0.00% sh
98280 useruser 10 0 628K 268K wait 1 0:00 0.00% 0.00% sh
2762 root 10 0 636K 276K wait 1 0:00 0.00% 0.00% sh
98293 useruser 10 0 640K 280K wait 0 0:00 0.00% 0.00% sh
99850 useruser 10 0 628K 268K wait 1 0:00 0.00% 0.00% sh
1036 root 3 0 948K 456K ttyin 1 0:00 0.00% 0.00% getty
1038 root 3 0 948K 456K ttyin 1 0:00 0.00% 0.00% getty
1039 root 10 0 636K 232K wait 0 0:00 0.00% 0.00% sh
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam (CNN)
With 24 million small businesses in the US alone, that's way too many apples.