Cisco

The old Intel C2000 bug is still among us. Not every device has been hit yet, and for me this is the first one. So here are my two cents on how to remediate the cluster with an RMA unit.

https://www.cisco.com/c/en/us/support/docs/field-notices/642/fn64228.html

Process:

Label all cables on the faulty unit, so you don't have to worry about any of them ending up in the wrong port when you swap the unit. I also disabled the links on the switch side so that I could calmly plug the cables in again without fearing that something would go wrong. Better safe than sorry.
Make sure the firmware level on the new device is the same as on the existing member. A firmware upgrade or downgrade may be needed.
Back up the config of the HA member that is still running, just in case something bad happens.
Give the HA link interface an IP address. From this point the new unit should find the existing member and start replicating the config.
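
The backup step could look roughly like this on the member that is still running. This is only a sketch: the file name and the TFTP server address (192.0.2.10) are placeholders, not values from my setup.

```
### On the running HA member: save a copy of the running config to local flash
copy /noconfirm running-config disk0:/pre-rma-backup.cfg
### Optionally also copy it off the box (placeholder TFTP server address)
copy /noconfirm running-config tftp://192.0.2.10/pre-rma-backup.cfg
```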

Firmware upgrade:

I have booted the unit with only a console cable connected. I can now see the firmware version, and it needs an upgrade.

ciscoasa> en
Password:
ciscoasa#
ciscoasa# sh version

Cisco Adaptive Security Appliance Software Version 9.8(2)20
Firepower Extensible Operating System Version 2.2(2.63)
Device Manager Version 7.5(1)

Compiled on Fri 02-Feb-18 06:10 PST by builders
System image file is "disk0:/asa982-20-lfbff-k8.SPA"
Config file at boot was "startup-config"

ciscoasa up 30 secs

Hardware:   ASA5516, 8192 MB RAM, CPU Atom C2000 series 2416 MHz, 1 CPU (8 cores)
Internal ATA Compact Flash, 8000MB
BIOS Flash M25P64 @ 0xfed01000, 16384KB

I have prepared a USB disk formatted as FAT and downloaded the matching firmware from cisco.com. When you plug the USB key into the ASA, you should see a disk1 that you can copy from. To see the content of disk1, issue the command “show disk1”.
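
A directory listing is another quick way to confirm that the ASA sees the USB key and the image before copying. A minimal sketch, assuming the key shows up as disk1:

```
### List the files on the USB key; the firmware image should show up here
dir disk1:
### For comparison, list what is already on the internal flash
dir disk0:
```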

We now copy over the files and make the system boot the new firmware.

ciscoasa# copy disk1:/asa9-13-1-lfbff-k8.SPA disk0:/asa9-13-1-lfbff-k8.SPA

Source filename [asa9-13-1-lfbff-k8.SPA]?

Destination filename [asa9-13-1-lfbff-k8.SPA]?

Copy in progress...CCCCC
Verifying file disk0:/asa9-13-1-lfbff-k8.SPA...
Computed Hash   SHA2: 80500c1790c76e90dde61488c3f977b8
                      69711278b6e550eeb8ea8830e19c4a23
                      8cf03fe64d1d9927d4a78e77b6090234
                      98485fbf9bc058eb3820b32e7a56f91f

Embedded Hash   SHA2: 80500c1790c76e90dde61488c3f977b8
                      69711278b6e550eeb8ea8830e19c4a23
                      8cf03fe64d1d9927d4a78e77b6090234
                      98485fbf9bc058eb3820b32e7a56f91f


Digital signature successfully validated

Writing file disk0:/asa9-13-1-lfbff-k8.SPA...

107543456 bytes copied in 26.40 secs (4136286 bytes/sec)
ciscoasa# config t
ciscoasa(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA
ciscoasa(config)# wr mem
ciscoasa(config)# reload

After the reload, the system is up and I can confirm that it has booted the new firmware.

ciscoasa> show version

Cisco Adaptive Security Appliance Software Version 9.13(1)
SSP Operating System Version 2.7(1.107)
Device Manager Version 7.5(1)

Compiled on Mon 23-Sep-19 09:28 PDT by builders
System image file is "disk0:/asa9-13-1-lfbff-k8.SPA"
Config file at boot was "startup-config"

ciscoasa up 29 secs

Hardware:   ASA5516, 8192 MB RAM, CPU Atom C2000 series 2416 MHz, 1 CPU (8 cores)
Internal ATA Compact Flash, 8000MB
BIOS Flash M25P64 @ 0xfed01000, 16384KB

Joining the HA cluster

We have now verified that the two ASA firewalls are on the same firmware level. Now connect all the cables to the firewall. On the switch side all data links are still administratively down, and the HA link between the two ASAs is a dedicated link. Those HA links are what we are going to configure now.

You can grab the lines from the running config of the existing, active member. If you don’t have the failover key, you can also reset it on the primary/existing member.
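
As a sketch of how to fetch those lines on the existing member: “show running-config failover” lists the failover configuration with the key masked, while “more system:running-config” should print the config with secrets unmasked (unless password encryption is enabled). The new key below is a placeholder.

```
### On the existing member: failover lines, key shown masked
show running-config failover
### The same lines taken from the full config, with secrets unmasked
more system:running-config | include failover
### Only if the key is unknown: set a new one in config mode,
### then use the same value on the new unit
failover key <new-shared-key>
```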

failover lan unit secondary
failover lan interface HA_FAILOVERLINK GigabitEthernet1/7
failover key ***
failover link HA_STATELINK GigabitEthernet1/8
failover interface ip HA_FAILOVERLINK 172.16.254.1 255.255.255.0 standby 172.16.254.2
failover interface ip HA_STATELINK 172.16.255.1 255.255.255.0 standby 172.16.255.2

The new ASA is now ready to contact the primary member of the cluster and start the replication. In my case, the interfaces for HA were administratively down, so we are now going to enable the links and enable failover.

interface GigabitEthernet 1/7
no shut
interface GigabitEthernet 1/8
no shut
failover

If something in the config is not OK, such as missing files, it is listed in the output, and you have to remediate it before you can try to enable HA again with the “failover” command. The output beneath shows what happens when failover is enabled successfully.

Detected an Active mate
Beginning configuration replication from mate.
End configuration replication from mate.

You can now check the failover status and see whether the standby member is in a ready state. If not, try giving the standby a reload.

failover reload-standby
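
For the status check itself, something like the following can be used on either unit. A sketch; the exact wording of the output may differ between versions.

```
### Short summary of the state of both units
show failover state
### Or only the role and state lines ("This host" / "Other host") from the full output
show failover | include host
```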

If the standby member is now in a ready state, you are ready to do a live failover. Remember to enable the ports again on the switch side.
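
A live failover can be triggered from either side. A minimal sketch; run only one of the two commands:

```
### On the currently active unit: give up the active role
no failover active
### Or, on the standby unit: take over the active role
failover active
```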

Conclusion

The process was not as bad as I thought, and there was no downtime involved. In my case, the ASDM and AnyConnect packages were missing on the new standby node. I downloaded them from the existing primary node and copied them to a USB disk. With the USB disk plugged into the standby ASA, I could then copy the files over to the ASA flash.

copy /noconfirm disk1:/anyconnect-win-4.8.01090-webdeploy-k9.pkg disk0:/anyconnect-win-4.8.01090-webdeploy-k9.pkg
copy /noconfirm disk1:/anyconnect-macos-4.8.01090-webdeploy-k9.pkg disk0:/anyconnect-macos-4.8.01090-webdeploy-k9.pkg
copy /noconfirm disk1:/VPN_client_profile.xml disk0:/VPN_client_profile.xml
copy /noconfirm disk1:/anyconnect-linux64-4.8.01090-webdeploy-k9.pkg disk0:/anyconnect-linux64-4.8.01090-webdeploy-k9.pkg
copy /noconfirm disk1:/Management_client_profile.xml disk0:/Management_client_profile.xml
copy /noconfirm disk1:/asdm-7131.bin disk0:/asdm-7131.bin
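
To find out up front which files the replicated config expects on flash, one option is to filter the running config for flash paths and compare the result with a directory listing. A sketch, assuming all referenced files live on disk0:

```
### On the existing member: config lines that reference files on the internal flash
show running-config | include disk0:
### On the new member: files actually present on the internal flash
dir disk0:
```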

From there on I could do a live failover and see the little “Active” light change on the physical ASA firewalls, with all traffic flowing uninterrupted. Mission accomplished.

Cisco seems to have a good track record with their products, but I must say that their ASA firewalls have seen a lot of critical bugs in the last couple of years, both in hardware and software…

I was not informed about the latest critical bug, so I didn’t catch it before the customer did. Always nice when a customer calls in because their primary ASA is down. It crashed in a way that meant it did not come up again; it needed a physical reboot.

Before anyone had the chance to get onsite, locate the firewall and reboot it, the secondary also died. And did not come up again! The customer needed to get online again, so there was no time to get a console cable and see what the heck was going on. So I told them to do a hard reboot of both firewalls. After the ASAs booted, they both came back up and could see each other. Great, customer online. But why and how?

Contacting Conscia Cisco support confirmed that this exact issue had been hitting multiple customers. Due to a bug, the firmware suffered a memory buffer overflow when hit by a specific udp/500 attack. Great, now we know the problem, and the fix is to upgrade the ASA firmware.

It’s not something I do often, and I always forget to write down the procedure, so here goes.

Upgrade procedure

Have a look at the Cisco ASA upgrade guide to see what version you are on and which versions you can upgrade to. I was on 9.8.2 and could go up to 9.13.x, so I did. https://www.cisco.com/c/en/us/td/docs/security/asa/upgrade/asa-upgrade/planning.html#ID-2152-0000000a
Download and upload firmware to BOTH members of the cluster
Change the boot image to the newly uploaded image
Update the secondary, make a failover
Update the primary and make a failover
Done

Uploading the images to both nodes with TFTP.

I used the portable version of Tftpd64 by Jounin; it is simple and works out of the box. I copied the freshly downloaded images to both nodes.

### Primary
DS-ESB-ASA5516x# copy /noconfirm tftp://10.0.2.14/asa9-13-1-lfbff-k8.SPA disk0:/asa9-13-1-lfbff-k8.SPA
DS-ESB-ASA5516x# copy /noconfirm tftp://10.0.2.14/asdm-7131.bin disk0:/asdm-7131.bin

### Secondary
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://10.0.2.14/asa9-13-1$
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://10.0.2.14/asdm-7131$

Change config to the new image

So now we will change over the config so that it will use the new boot images that we have uploaded. First, we remove the existing boot image, and afterwards, we set the new image together with the new ASDM image.

### Show current boot image
DS-ESB-ASA5516x# show running-config boot system
boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Remove existing boot image
DS-ESB-ASA5516x(config)# no boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Add new boot image that you just uploaded
DS-ESB-ASA5516x(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA
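
### Assumption, not captured in the original session: also point ASDM at the uploaded image,
### and save the config so the new boot setting is stored before the reload
DS-ESB-ASA5516x(config)# asdm image disk0:/asdm-7131.bin
DS-ESB-ASA5516x(config)# write memory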

### Reload the standby node for the new firmware to take effect
DS-ESB-ASA5516x(config)# failover reload-standby

### Look at the output from show failover, check if the standby is up and verify the firmware version.
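
That last check could look like this. A sketch; “failover exec mate” runs the command on the other unit, just as in the TFTP copy step.

```
### State of both units; the standby should report Standby Ready
DS-ESB-ASA5516x# show failover
### Firmware version on the standby, run from the active unit
DS-ESB-ASA5516x# failover exec mate show version
```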

Failover and reload the second node

So now the secondary node has booted with the new firmware. Time to fail over to it so we can reload the primary node and get the new firmware running there too. When doing the failover you might lose the SSH connection; just connect again. This time you will be connected to the secondary node, which is now the active node. Reload the primary, which is now standby, and wait for it to come up. The console will show that it is sending the config to the mate, just like when we did the first reload of the standby, secondary node.

### Controlled failover to the secondary, standby node
DS-ESB-ASA5516x# no failover active
### reload the primary, standby node for firmware to take effect.
DS-ESB-ASA5516x# failover reload-standby

You are done

Now it only remains to test and, if you want, fail back to the primary. But that's up to you. I did not lose a single ping through the upgrade process, so the cluster is indeed working as it should. While you are at it, why not also update the AnyConnect client, and remember to clean up the flash so the old versions and files won’t fill it up. Enjoy your newly updated cluster.
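
Cleaning up the flash could look like this. A sketch only: the old image path is the one shown in the boot config earlier, it is assumed to be the same on both units, and it is worth waiting until you are sure you won't roll back.

```
### See what is on the flash of the active unit
DS-ESB-ASA5516x# dir disk0:
### Delete the old image that is no longer referenced by "boot system"
DS-ESB-ASA5516x# delete /noconfirm disk0:/gf/asa982-20-lfbff-k8.SPA
### Same cleanup on the other unit
DS-ESB-ASA5516x# failover exec mate delete /noconfirm disk0:/gf/asa982-20-lfbff-k8.SPA
```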
