Cisco ASA HA member swap

The old Intel Atom C2000 bug is still with us. Not all devices have been affected yet, and for me this is the first. So here are my two cents on how to restore the cluster with an RMA unit.

https://www.cisco.com/c/en/us/support/docs/field-notices/642/fn64228.html

Process:

  1. Label all cables on the faulty unit, so you don't have to worry about any cable ending up in the wrong port when you swap the unit. I also shut down the ports on the switch side so that I could calmly plug everything back in without fearing that something would go wrong. Better safe than sorry.
  2. Make sure the firmware level on the new device is the same as on the existing member. If it isn't, a firmware upgrade or downgrade is needed.
  3. Back up the config of the HA member that is still running, just in case something bad happens.
  4. Give the HA link interface an IP address. From this point, the new unit should find the existing member and start replicating the config.
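Steps 2 and 3 can be sketched in ASA CLI as follows. This is a minimal sketch, not the exact commands I ran: “ciscoasa” is the default prompt, and the backup filename is a hypothetical placeholder. Compare the version line from both units before going any further.

```
### On both units: compare the software version lines
ciscoasa# show version | include Version
### On the existing member: keep a copy of the running config on local flash (hypothetical filename)
ciscoasa# copy running-config disk0:/backup-before-rma.cfg
```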

Firmware upgrade:

I booted the unit with only a console cable connected. I can now see the firmware version, and it needs an upgrade.

I prepared a FAT-formatted USB disk and downloaded the matching firmware from cisco.com. When you plug the USB key into the ASA, you should see a disk1: that you can copy from. To list the contents of disk1, issue the command “show disk1:”.

We now copy over the files and make the system boot the new firmware.
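A sketch of the copy and boot change on the new unit. The image name is borrowed from the upgrade section below and is only a placeholder; use the image that matches your existing member. If the unit already has a “boot system” line pointing at another image, remove it first with “no boot system”, as shown in the upgrade section.

```
### Optional: list the USB disk contents
ciscoasa# dir disk1:
### Copy the image from the USB disk to internal flash (placeholder filename)
ciscoasa# copy disk1:/asa9-13-1-lfbff-k8.SPA disk0:/asa9-13-1-lfbff-k8.SPA
### Point the boot statement at the new image, save and reload
ciscoasa# configure terminal
ciscoasa(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA
ciscoasa(config)# write memory
ciscoasa(config)# reload
### After the reload: confirm the running version
ciscoasa# show version | include Version
```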

After the reload, the system is up, and I can confirm that it has booted the new firmware.

Joining the HA cluster

We have now verified that the two ASA firewalls are on the same firmware level. Now connect all the cables to the firewall. On the switch side all data links are administratively down, and the HA link between the two ASAs is a dedicated link. That is the link we are now going to configure.

You can grab the failover lines from the running config of the existing member. If you don't have the failover key, you can also reset it on the primary/existing member.
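A sketch of what those lines can look like on the new unit, assuming it takes over the secondary role of the failed unit. The interface name FOLINK, the port GigabitEthernet1/8 and the addresses are hypothetical placeholders, so copy the real values from “show running-config failover” on the existing member. That output masks the key, which is why you may need to set a new one on both units.

```
### On the existing member: list the failover configuration to copy
ciscoasa# show running-config failover
### On the new unit, in global configuration mode (placeholder names and addresses)
ciscoasa(config)# failover lan unit secondary
ciscoasa(config)# failover lan interface FOLINK GigabitEthernet1/8
ciscoasa(config)# failover link FOLINK GigabitEthernet1/8
ciscoasa(config)# failover interface ip FOLINK 10.255.255.1 255.255.255.252 standby 10.255.255.2
ciscoasa(config)# failover key <shared-key>
### Only if the key is unknown: set the same new key on the existing member as well
ciscoasa(config)# failover key <shared-key>
```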

The new ASA is now ready to contact the primary member of the cluster and start the replication. In my case, the HA interfaces were administratively down, so we are now going to enable the link and enable failover.
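Continuing with the same hypothetical port as a placeholder, enabling the link and failover on the new unit can look like this:

```
### On the new unit: bring up the dedicated HA port (placeholder port)
ciscoasa(config)# interface GigabitEthernet1/8
ciscoasa(config-if)# no shutdown
ciscoasa(config-if)# exit
### Enable failover; the unit should now find its mate and pull the config
ciscoasa(config)# failover
```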

If something in the config is not OK, for example missing files, it is listed, and you have to remediate it before you can try enabling HA again with the “failover” command. When the command succeeds, the new unit starts receiving the configuration from the existing member.

You can now check the failover status and see whether the standby member is in the ready state. If not, try giving the standby a reload.
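The status check and the reload map to standard failover commands. “Standby Ready” is the state to look for on the standby; the prompt is a placeholder.

```
### On either unit: check the state of this host and the other host
ciscoasa# show failover
### From the active unit: reload the standby if it does not reach Standby Ready
ciscoasa# failover reload-standby
```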

Once the standby member is in the ready state, you are ready to do a live failover. Remember to enable the ports on the switch side again.
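A sketch of the live failover from the CLI. Run “no failover active” on the active unit (or “failover active” on the standby); your SSH session may drop and need reconnecting before you can check the result.

```
### On the active unit: hand over the active role to the standby
ciscoasa# no failover active
### After reconnecting: confirm which unit is now active
ciscoasa# show failover
```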

Conclusion

The process was not as bad as I thought, and there was no downtime involved. In my case, the ASDM and AnyConnect packages were missing on the new standby node. I downloaded them from the existing primary node and copied them to a USB disk. With the USB disk plugged into the standby ASA, I could then copy the files over to the ASA flash.
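One way to move those packages over the USB disk from the CLI, as a sketch. The ASDM filename is borrowed from the upgrade section and the AnyConnect package name is a placeholder, so check “show disk0:” on the primary for the real names.

```
### On the primary (active) node, with the USB disk plugged in
ciscoasa# copy disk0:/asdm-7131.bin disk1:/asdm-7131.bin
ciscoasa# copy disk0:/<anyconnect-package>.pkg disk1:/<anyconnect-package>.pkg
### On the standby node, after moving the USB disk over
ciscoasa# copy disk1:/asdm-7131.bin disk0:/asdm-7131.bin
ciscoasa# copy disk1:/<anyconnect-package>.pkg disk0:/<anyconnect-package>.pkg
```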

From there on I could do a live failover and watch the little “Active” light change on the physical ASA firewalls, with all traffic flowing uninterrupted. Mission accomplished.

Cisco ASA cluster – upgrade

Cisco generally has a good track record with its products, but I must say that its ASA firewalls have seen a lot of critical bugs in the last couple of years, both in hardware and software…

I was not informed about the latest critical bug, so I didn't catch it before the customer did. It's always nice when a customer calls in because their primary ASA is down. It had crashed in a way that meant it did not come up again; it needed a physical reboot.

Before we had the chance to get someone on site to locate the firewall and reboot it, the secondary also died, and did not come up again either! The customer needed to get back online, so there was no time to get a console cable on it and see what the heck was going on. So I told them to do a hard reboot of both firewalls. After booting, both ASAs came back up and could see each other. Great, customer online. But why, and how?

Conscia's Cisco support confirmed that the exact issue had been hitting multiple customers. Due to a bug, the firmware hit a memory buffer overflow when targeted by a specific UDP/500 (IKE) attack. Great, now we know the problem, and the fix is to upgrade the ASA firmware.

It's not something I do often, and I always forget to write down the procedure, so here goes.

Upgrade procedure

  1. Have a look at the Cisco ASA upgrade guide to see which version you are on and which versions you can upgrade to. I was on 9.8.2 and could go up to 9.13.x, so I did. https://www.cisco.com/c/en/us/td/docs/security/asa/upgrade/asa-upgrade/planning.html#ID-2152-0000000a
  2. Download the firmware and upload it to BOTH members of the cluster.
  3. Change the boot image to the newly uploaded image.
  4. Update the secondary, then make a failover.
  5. Update the primary, then make a failover.
  6. Done

Uploading the images to both nodes with TFTP.

I used the portable version of Tftpd64 by Jounin; it's simple and works out of the box. I then copied the freshly downloaded images to both nodes.

### Primary
DS-ESB-ASA5516x# copy /noconfirm tftp://10.0.2.14/asa9-13-1-lfbff-k8.SPA disk0:/asa9-13-1-lfbff-k8.SPA
DS-ESB-ASA5516x# copy /noconfirm tftp://10.0.2.14/asdm-7131.bin disk0:/asdm-7131.bin

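### Secondary (same copies, run on the mate via failover exec mate; the console truncated the long lines at the $)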
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://10.0.2.14/asa9-13-1$
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://10.0.2.14/asdm-7131$
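Before changing the boot image, it can be worth checking that both copies arrived intact. A sketch using the ASA “verify /md5” command, whose output you compare against the MD5 checksum listed next to the image on cisco.com; running it on the mate through failover exec is an assumption that it behaves like the copy above.

```
### On the primary: compute the MD5 of the uploaded image
DS-ESB-ASA5516x# verify /md5 disk0:/asa9-13-1-lfbff-k8.SPA
### On the secondary, through the failover link
DS-ESB-ASA5516x# failover exec mate verify /md5 disk0:/asa9-13-1-lfbff-k8.SPA
```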

Change config to the new image

So now we change the config so that the system uses the new images we uploaded. First, we remove the existing boot image, and afterwards we set the new boot image together with the new ASDM image.

### Show current boot image
DS-ESB-ASA5516x# show running-config boot system
boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Remove existing boot image
DS-ESB-ASA5516x(config)# no boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Add new boot image that you just uploaded
DS-ESB-ASA5516x(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA
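
### Also point ASDM at the new image and save the config; the reloaded standby boots from its startup config, and write memory on the active node is replicated to it
DS-ESB-ASA5516x(config)# asdm image disk0:/asdm-7131.bin
DS-ESB-ASA5516x(config)# write memory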

### Reload the standby node for the new firmware to take effect
DS-ESB-ASA5516x(config)# failover reload-standby

### Check the output of show failover: verify that the standby is up and running the new firmware version.
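A sketch of that check from the active node. The “Version” line of show failover should list both our version and the mate's, and “failover exec mate show version” asks the standby directly.

```
### On the active (primary) node: check the standby state and the version line
DS-ESB-ASA5516x# show failover
### Or ask the mate directly
DS-ESB-ASA5516x# failover exec mate show version
```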

Failover and reload the second node

So now the secondary node has booted with the new firmware. Time to fail over to it, so we can reload the primary node and have the new firmware running there as well. When doing the failover you might lose the SSH connection; just connect again. This time you will be connected to the secondary node, which is now the active node. Reload the primary, which is now the standby, and wait for it to come up. The console will show that the config is being sent to the mate, just like when we did the first reload of the standby (secondary) node.

### Controlled failover to the secondary (standby) node
DS-ESB-ASA5516x# no failover active
### Reload the primary, now the standby node, for the new firmware to take effect
DS-ESB-ASA5516x# failover reload-standby

You are done

Now all that's left is to test and, if you want, fail back to the primary, but that's up to you. I did not lose a single ping during the upgrade, so the cluster is indeed working as it should. While you are at it, why not also update the AnyConnect client? And remember to clean up the flash so the old versions and files won't fill it up. Enjoy your newly updated cluster.
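A sketch of the optional failback and the cleanup. The old image path is the one shown by “show running-config boot system” earlier; only delete it once you are sure you won't need to roll back. Depending on the release, delete may be replicated to the standby, so checking the mate's disk shows whether you need to repeat it there with failover exec mate.

```
### Optional failback: on the currently active secondary
DS-ESB-ASA5516x# no failover active
### Remove the old boot image once you are happy with the new version
DS-ESB-ASA5516x# delete /noconfirm disk0:/gf/asa982-20-lfbff-k8.SPA
### Check free space and confirm the file is gone on both units
DS-ESB-ASA5516x# show disk0:
DS-ESB-ASA5516x# failover exec mate show disk0:
```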