Monday, August 16, 2010

IOS: %SYS-SP-3-CPUHOG: RFSS_server_action

Scenario:
A Cat6K throws the following syslog messages:
Jul 18 01:48:12.362 EDT: %SYS-SP-3-CPUHOG: Task is running for (4000)msecs, more than (2000)msecs (0/0),process = RFSS_server_action.
-Traceback= 4045D2CC 4045F5F8 4045F504 4047F45C 4047ED38 4047F31C 40481F5C 40489F04 4048A3CC 4048AF5C 40485DE4 4048B1AC 404816A8 402E41D8 40451534 4029A764

Jul 18 01:48:14.366 EDT: %SYS-SP-3-CPUHOG: Task is running for (2000)msecs, more than (2000)msecs (1/0),process = RFSS_server_action.
-Traceback= 4045D2A8 4045F5F8 4045F504 4047F45C 4047ED38 4047F31C 40481F5C 40489F04 4048A3CC 4048AF5C 40485DE4 4048B1AC 404816A8 402E41D8 40451534 4029A764

Jul 18 01:48:18.370 EDT: %SYS-SP-3-CPUHOG: Task is running for (2000)msecs, more than (2000)msecs (2/1),process = RFSS_server_action.
-Traceback= 4045D2A8 4045F5F8 4045F504 4047F45C 4047ED38 4047F31C 40481F5C 40489F04 4048A3CC 4048AF5C 40485DE4 4048B1AC 4048B504 404817C0 402E440C 40451660
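When the log holds many of these messages, tallying the CPUHOG entries per process helps confirm that a single process (here RFSS_server_action) is responsible and shows how much time it has held the CPU. This is a minimal Python sketch, assuming one message per line in exactly the format shown above; the regex and the `parse_cpuhog`/`CpuHog` names are illustrative, not a Cisco tool.

```python
import re
from typing import List, NamedTuple

# Matches the CPUHOG text shown above. The optional "(?:[A-Z-]+-)?" part
# accepts a location tag such as "SP-" in "%SYS-SP-3-CPUHOG" (assumption:
# other tags follow the same pattern).
CPUHOG_RE = re.compile(
    r"%SYS-(?:[A-Z-]+-)?3-CPUHOG: Task is running for \((\d+)\)msecs, "
    r"more than \((\d+)\)msecs \(\d+/\d+\),process = ([^.\s]+)"
)


class CpuHog(NamedTuple):
    runtime_ms: int    # how long the task had been running
    threshold_ms: int  # the CPUHOG threshold it exceeded
    process: str       # process name, without the trailing period


def parse_cpuhog(lines: List[str]) -> List[CpuHog]:
    """Return one CpuHog record per CPUHOG line; traceback and other lines are ignored."""
    hogs = []
    for line in lines:
        m = CPUHOG_RE.search(line)
        if m:
            hogs.append(CpuHog(int(m.group(1)), int(m.group(2)), m.group(3)))
    return hogs


log = [
    "Jul 18 01:48:12.362 EDT: %SYS-SP-3-CPUHOG: Task is running for (4000)msecs, more than (2000)msecs (0/0),process = RFSS_server_action.",
    "-Traceback= 4045D2CC 4045F5F8 4045F504",
    "Jul 18 01:48:14.366 EDT: %SYS-SP-3-CPUHOG: Task is running for (2000)msecs, more than (2000)msecs (1/0),process = RFSS_server_action.",
    "Jul 18 01:48:18.370 EDT: %SYS-SP-3-CPUHOG: Task is running for (2000)msecs, more than (2000)msecs (2/1),process = RFSS_server_action.",
]
hogs = parse_cpuhog(log)
print(len(hogs), sum(h.runtime_ms for h in hogs))  # 3 8000
```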

Description:
The traceback indicates a problem writing to the flash disk. Running the privileged EXEC command "dir disk1:" causes your login session to appear to hang for a few minutes. Afterwards, the log fills up with a new batch of the %SYS-SP-3-CPUHOG syslog and traceback messages shown above.
 
------------------ show disk1: all ------------------
172683264 bytes available (83296256 bytes used)
******** ATA Flash Card Geometry/Format Info ********
ATA CARD GEOMETRY
 Number of Heads: 16
 Number of Cylinders 978
 Sectors per Cylinder 32
 Sector Size 512
 Total Sectors 500736
 
ATA CARD FORMAT
 Number of FAT Sectors 245
 Sectors Per Cluster 8
 Number of Clusters 62495
 Number of Data Sectors 500596
 Base Root Sector 598
 Base FAT Sector 108
 Base Data Sector 630
 
Error show disk1: (TF I/O failed in data-in phase)
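The geometry and format figures in this output can be cross-checked with simple arithmetic. This is a minimal Python sketch using only the numbers shown above. One assumption: the heads × cylinders × sectors product only works out if the "Sectors per Cylinder" value is read as sectors per track (per head), as in classic CHS addressing.

```python
# Figures copied from the "show disk1: all" output above.
HEADS = 16
CYLINDERS = 978
SECTORS_PER_TRACK = 32      # reported by IOS as "Sectors per Cylinder" (assumed per-track)
SECTOR_SIZE = 512           # bytes
TOTAL_SECTORS = 500736
SECTORS_PER_CLUSTER = 8
CLUSTERS = 62495
BYTES_AVAILABLE = 172683264
BYTES_USED = 83296256


def chs_sectors(heads: int, cylinders: int, sectors_per_track: int) -> int:
    """Total addressable sectors under classic CHS geometry."""
    return heads * cylinders * sectors_per_track


def cluster_area_bytes(clusters: int, sectors_per_cluster: int, sector_size: int) -> int:
    """Bytes covered by all FAT data clusters."""
    return clusters * sectors_per_cluster * sector_size


print(chs_sectors(HEADS, CYLINDERS, SECTORS_PER_TRACK))               # 500736
print(TOTAL_SECTORS * SECTOR_SIZE)                                     # 256376832
print(cluster_area_bytes(CLUSTERS, SECTORS_PER_CLUSTER, SECTOR_SIZE))  # 255979520
print(BYTES_AVAILABLE + BYTES_USED)                                    # 255979520
```

Both checks agree, so the reported format metadata is at least self-consistent. This says nothing about whether the card can actually be read, which the "TF I/O failed in data-in phase" error suggests it cannot.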
 
Workaround/Resolution:
  1. Reseat the compact flash card.
  2. If the error persists, reformat the flash card.
  3. If the error still persists, replace the flash card.

Monday, July 5, 2010

FortiOS v3.00 MR5 - CPU Usage Too High

Problem:

A FortiGate 3600 running FortiOS 3.00 MR5 Patch 2 keeps sending high-CPU SNMP traps to the SNMP trap servers. The output of “get system performance status” (or the GUI) confirms that CPU utilization is high, and “diag sys top” shows the “merged_daemons” process using 99% of the total CPU, then shortly dropping to 14%.


Cause:
This is due to the bug documented below:

0062617: race condition in flgd can cause merged_daemons to spin
The merged_daemons was constantly in the 'R' state and consuming 99% of CPU (when top is first started, the usage will display as 99% -- the usage will decrease to 14% while top is running).

Fix: Build: 0566


Workaround:
Restart merged_daemons as follows:
  • Enter diag sys top and take note of the PID of merged_daemons
  • Enter diagnose sys kill 11 [pid]
Note that merged_daemons may still climb back up to 99%.
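If the restart has to be repeated, the PID lookup can be scripted against captured “diag sys top” output. This is a minimal Python sketch, assuming each process row has the columns name, PID, state, CPU %, memory % (verify against your own firmware's output). The sample rows and PIDs below are made up for illustration, and `find_pid`/`kill_command` are hypothetical helper names.

```python
from typing import Optional

# Hypothetical "diag sys top" process rows, invented for illustration.
# Assumed column order: name, PID, state, CPU %, memory %.
SAMPLE_TOP = """\
   merged_daemons      45      R      99.0     1.9
           newcli     112      R       0.5     1.2
"""


def find_pid(top_output: str, process: str = "merged_daemons") -> Optional[int]:
    """Return the PID of `process`, or None if it is not listed.

    Lines that do not look like a process row (for example, summary
    lines at the top of the output) are skipped.
    """
    for line in top_output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == process and fields[1].isdigit():
            return int(fields[1])
    return None


def kill_command(pid: int) -> str:
    """Build the workaround command from the steps above."""
    return f"diagnose sys kill 11 {pid}"


pid = find_pid(SAMPLE_TOP)
if pid is not None:
    print(kill_command(pid))  # diagnose sys kill 11 45
```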


Resolution/Workaround:
Upgrade to FortiOS MR6 or later.

Monday, January 4, 2010

IOS: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt

Dec 18 09:54:43.989 JST: %EARL_L3_ASIC-SP-STDBY-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
Dec 18 09:54:43.993 JST: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt

Description
These messages indicate that the switch has received an invalid packet containing a Layer 3 IP checksum error. Older IOS releases normally drop such packets silently. Some IOS releases instead log this condition to warn that one or more external devices are sending IP packets with checksum errors and/or an incorrect length.

See CSCdz10360 (Need a CLI to be able to disable L3 error checking in HW) regarding this enhancement.

Workaround
These messages are purely informational. You can do one of the following:


  1. SPAN all the VLANs, inspect the Layer 3 source IP addresses, and remove the device generating the invalid packets. Unfortunately, the switch does not track the offending IP address, so the only way to find where the invalid packets are coming from is to sniff every suspected VLAN.


  2. Disable the checks with the following configuration commands (new options added by CSCdz10360):
    no mls verify ip checksum ---> stops checking for packet checksum errors
    no mls verify ip length ---> stops checking for packet length errors
    no mls verify ip length minimum ---> stops checking IP packets against the minimum length
    no mls verify ip same-address ---> stops checking for packets with identical source and destination IP addresses


  3. Do nothing, as these messages are purely informational.
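To make the four checks concrete, here is a minimal Python sketch of what each kind of error means for an IPv4 header, written against the RFC 791 header layout and the RFC 1071 one's-complement checksum. It is an illustration only: the EARL ASIC's actual logic is not documented here, and `ipv4_header_errors`, `build_header` and `offending_sources` are hypothetical helper names. Given raw IPv4 packets from a SPAN capture (option 1), `offending_sources` would list the source addresses to chase down.

```python
import socket
import struct
from typing import Iterable, List, Set


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071), folding carries as it goes."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_header_errors(packet: bytes) -> List[str]:
    """Return the names of the checks this raw IPv4 packet fails."""
    if len(packet) < 20:                      # shorter than a minimal IPv4 header
        return ["minimum-length"]
    ihl_bytes = (packet[0] & 0x0F) * 4        # header length from the IHL field
    if ihl_bytes < 20 or ihl_bytes > len(packet):
        return ["length"]
    errors = []
    total_length = struct.unpack("!H", packet[2:4])[0]
    if total_length < ihl_bytes or total_length > len(packet):
        errors.append("length")               # Total Length disagrees with what arrived
    if ones_complement_sum(packet[:ihl_bytes]) != 0xFFFF:
        errors.append("checksum")             # a valid header sums to 0xFFFF
    if packet[12:16] == packet[16:20]:
        errors.append("same-address")         # source IP == destination IP
    return errors


def build_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Build a 20-byte IPv4 header with a correct checksum (test helper)."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len, 0, 0, 64, 17, 0,
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    checksum = ~ones_complement_sum(header) & 0xFFFF
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def offending_sources(packets: Iterable[bytes]) -> Set[str]:
    """Source IPs of packets that fail any check."""
    return {
        socket.inet_ntoa(bytes(p[12:16]))
        for p in packets
        if len(p) >= 20 and ipv4_header_errors(p)
    }


good = build_header("192.0.2.1", "192.0.2.2")
corrupted = bytearray(good)
corrupted[8] ^= 0x01                          # flip a TTL bit without fixing the checksum
print(ipv4_header_errors(good))               # []
print(ipv4_header_errors(bytes(corrupted)))   # ['checksum']
```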

IOS: %ETHCNTR-3-LOOP_BACK_DETECTED : Keepalive packet loop-back detected on [chars]

Scenario
The switch reports this error message, and the port is put into the err-disabled state:
%ETHCNTR-3-LOOP_BACK_DETECTED : Keepalive packet loop-back detected on [chars]

Oct 2 10:40:13: %ETHCNTR-3-LOOP_BACK_DETECTED: Keepalive packet loop-back detected on GigabitEthernet0/1
Oct 2 10:40:13: %PM-4-ERR_DISABLE: loopback error detected on Gi0/1, putting Gi0/1 in err-disable state


Description
The problem occurs because a keepalive packet is looped back to the port that sent it. Catalyst switches send keepalives in order to prevent loops in the network, and keepalives are enabled by default on all interfaces. The error appears on the device that detects and breaks the loop, not on the device that causes the loop.
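The detection logic described above can be modelled in a few lines. This is a toy Python sketch under stated assumptions: each port stamps its keepalives with its own identity, and a port that receives its own keepalive treats that as a loop and err-disables itself. The `Keepalive`/`Port` classes and the `origin` field are hypothetical; the real Catalyst keepalive frame format is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Keepalive:
    origin: str  # identity of the sending port (hypothetical field)


@dataclass
class Port:
    name: str
    keepalive: bool = True   # enabled by default, as described above
    state: str = "up"

    def send_keepalive(self) -> Optional[Keepalive]:
        # With "no keepalive" configured, nothing is sent, so a loop goes unnoticed.
        return Keepalive(self.name) if self.keepalive else None

    def receive(self, frame: Optional[Keepalive]) -> None:
        # A keepalive arriving back at the port that sent it means a loop.
        if frame is not None and frame.origin == self.name:
            self.state = "err-disabled"


def looped_link(port: Port) -> None:
    """Simulate a loop: whatever the port sends arrives back on the same port."""
    port.receive(port.send_keepalive())


p = Port("Gi0/1")
looped_link(p)
print(p.state)  # err-disabled

q = Port("Gi0/1", keepalive=False)
looped_link(q)
print(q.state)  # up (the loop itself is still there)
```

The second case mirrors the workaround below: disabling keepalives keeps the port up, but only because the loop is no longer detected.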

Workaround
Issue the no keepalive interface configuration command in order to disable keepalives. Disabling keepalives prevents the interface from being err-disabled, but it does not remove the loop.

Permanent Fix
In Cisco IOS Software 12.2(x)SE-based releases and later, keepalives are not sent on fiber and uplink interfaces by default. Upgrading to one of these releases should prevent the issue from occurring in the first place.