Author: DRLai (蘇打)
Board: Linux
Title: [Question] Please help interpret these ECC memory errors
Time: Tue Dec 2 16:18:21 2008
Recently the server has been printing messages like the ones below while in use:
Message from syslogd@ at Sat Nov 29 20:19:28 2008 ...
luna kernel: EDAC MC0: UE row 7, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=2 RDWR=Write RAS=8 CAS=0 FATAL Err=0x4)
Message from syslogd@ at Sat Nov 29 20:19:31 2008 ...
luna kernel: EDAC MC0: UE row 3, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=8180 CAS=0 FATAL Err=0x4)
Message from syslogd@ at Sat Nov 29 20:19:32 2008 ...
luna kernel: EDAC MC0: UE row 1, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=3 RDWR=Read RAS=5916 CAS=0 FATAL Err=0x4)
Message from syslogd@ at Sat Nov 29 20:19:33 2008 ...
luna kernel: EDAC MC0: UE row 7, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=2 RDWR=Write RAS=8 CAS=0 FATAL Err=0x4)
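The DRAM-Bank value shows up as everything from 0 to 3.
The motherboard has 16 memory modules installed in total.
They are laid out in this order: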
DIMM 4D
DIMM 4C
DIMM 4B
...
DIMM 1A
The number is the BANK (1 to 4).
Branch is either 0 or 1 (BANK 1 and 2 belong to Branch 0; BANK 3 and 4 belong to Branch 1).
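Taking the layout described above at face value, Branch=1 would narrow the suspects to the modules in BANK 3 and 4. The following is a minimal Python sketch of that mapping. The `BRANCH_TO_BANKS` table and the A to D slot letters are assumptions read off this post's description of the board, not something the EDAC driver reports.

```python
# Assumed layout from the post: banks 1-4, slots A-D in each bank,
# BANK 1,2 on Branch 0 and BANK 3,4 on Branch 1.
BRANCH_TO_BANKS = {0: (1, 2), 1: (3, 4)}
SLOT_LETTERS = "ABCD"  # assumption: 16 modules = 4 banks x 4 slots


def candidate_dimms(branch):
    """Return the DIMM labels that sit on the given branch."""
    return [f"DIMM {bank}{slot}"
            for bank in BRANCH_TO_BANKS[branch]
            for slot in SLOT_LETTERS]


print(candidate_dimms(1))
```

This only halves the search space. The DRAM-Bank field in the log is probably the bank address inside the DRAM devices rather than the board's BANK numbering, so the sketch deliberately does not use it.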
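I've been staring at this for a long time and can't tell whether the error messages map to a DIMM position.
Could anyone here help?
Basically, every error message says Branch=1, and the DRAM-Bank that follows varies.
The rest of each message is much the same.
I can't figure out which module is actually the bad one orz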
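To see which fields stay constant across the errors, the messages can be tallied with a small script. This is only a sketch: the regular expression is written against the four sample lines in this post, and other kernels or EDAC drivers may format the message differently.

```python
import re
from collections import Counter

# Sample lines copied from the post (each message joined onto one line).
LOG = """\
luna kernel: EDAC MC0: UE row 7, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=2 RDWR=Write RAS=8 CAS=0 FATAL Err=0x4)
luna kernel: EDAC MC0: UE row 3, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=1 RDWR=Read RAS=8180 CAS=0 FATAL Err=0x4)
luna kernel: EDAC MC0: UE row 1, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=3 RDWR=Read RAS=5916 CAS=0 FATAL Err=0x4)
luna kernel: EDAC MC0: UE row 7, channel-a= 2 channel-b= 3 labels "-": (Branch=1 DRAM-Bank=2 RDWR=Write RAS=8 CAS=0 FATAL Err=0x4)
"""

# The \s* after "(" tolerates the message being wrapped across lines.
PATTERN = re.compile(
    r"EDAC MC(?P<mc>\d+): UE row (?P<row>\d+), "
    r"channel-a=\s*(?P<cha>\d+) channel-b=\s*(?P<chb>\d+).*?"
    r"\(\s*Branch=(?P<branch>\d+) DRAM-Bank=(?P<bank>\d+)",
    re.DOTALL,
)


def tally(text):
    """Count errors per (row, channel-a, channel-b, branch) tuple."""
    counts = Counter()
    for m in PATTERN.finditer(text):
        key = tuple(int(m[k]) for k in ("row", "cha", "chb", "branch"))
        counts[key] += 1
    return counts


for key, n in sorted(tally(LOG).items()):
    print(key, n)
```

On the sample above, every error shares channel-a=2, channel-b=3 and Branch=1, while the row varies (1, 3 and 7). That points at something common to that branch and channel pair rather than at a single row.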
--
thePainter.
http://www.wretch.cc/blog/myelf
Wretch@BBS -> P_myelf
--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 140.138.145.197
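→ psboy: The brute-force way is to just use memtest: pull half the memory and test. If it passes, test 12/02 20:41
→ psboy: the other half. If it fails, pull half of that half and test again. You'll find it eventually xD 12/02 20:42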
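The halving procedure psboy describes is a bisection search. It can be sketched in Python as below, under two loudly stated assumptions: exactly one module is faulty, and `passes_memtest` is a hypothetical callback standing in for the manual step of installing only the given modules and running memtest. The slot names are likewise illustrative.

```python
def find_bad_dimm(dimms, passes_memtest):
    """Bisect `dimms` down to a single faulty module.

    `passes_memtest(installed)` is a hypothetical callback: it stands for
    physically installing only `installed`, running memtest, and
    returning True if no errors appear. Assumes exactly one bad module.
    """
    suspects = list(dimms)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        half, other = suspects[:mid], suspects[mid:]
        # If the installed half is clean, the fault is in the other half.
        suspects = other if passes_memtest(half) else half
    return suspects[0]


# Simulated run: pretend "DIMM 3C" is the faulty module.
slots = [f"DIMM {b}{s}" for b in range(1, 5) for s in "ABCD"]
print(find_bad_dimm(slots, lambda installed: "DIMM 3C" not in installed))
```

With 16 modules this takes 4 memtest runs. On real hardware the board may have population rules (for example, modules installed in matched sets), in which case each "half" would have to respect those rules.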