Install and use MegaCLI on VMware host

Over the last decade I have had the fun of managing LSI-based RAID controllers, but never on Windows machines, where the GUI-based Storage Manager tools are simple to work with.

Even though I usually manage to find the vib and get it installed, I always struggle to remember how it is installed and what the commands are. This time I will write it down for future me, or maybe for you?

Procedure

  • Find the MegaCLI vib file and download it…
  • Copy vib to ESXi host
  • Install vib
  • Use MegaCLI for whatever purpose you got

Finding the vib

This is where I struggle the most. LSI was bought by Avago, and soon after Avago was bought by Broadcom, so the old support links for the downloads now return 404, and navigating Broadcom's support site requires an education I do not have. This time the link was this one, giving you a zip file containing the MegaCLI package for all platforms.

If the link does not work next time, or a newer version is out, I also managed to find it on https://www.broadcom.com/support/download-search. Do a keyword search for MegaCLI, expand "Management Software and Tools" in the results and choose the newest "MegaCLI x.x Px". For now that is MegaCLI 5.5 P1, version 8.07.07.

Install MegaCLI

We now have the zip. Extract it, and under the "VmwareMN" folder you will find the vib that we are going to need.

### SCP it to the host
jr@mbp:~ jr$ scp /Users/jr/Download/8-07-07_MegaCLI/VmwareMN/vmware-esx-MegaCLI-8-07-07.vib root@[ESXHOST]:/tmp/

### SSH to the ESXi host and install. Reboot afterwards
[root@esxhost:~] esxcli software vib install -v /tmp/vmware-esx-MegaCLI-8-07-07.vib

If you are unlucky and get a "Could not find a trusted signer" error when trying to install the vib, the workaround is to add "--no-sig-check" at the end of the esxcli command, after the file path. Since I downloaded it from Broadcom's own site, I trust it.
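
In that case the full command looks like the line below; only skip the signature check if you trust where the vib came from.

### Install without signature check
[root@esxhost:~] esxcli software vib install -v /tmp/vmware-esx-MegaCLI-8-07-07.vib --no-sig-check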

After the host reboot (which is very annoying, but necessary), we can now find the MegaCLI binary under /opt/lsi/MegaCLI/
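
A quick sanity check after the reboot is to list that directory and ask MegaCLI how many controllers it sees; -adpCount simply prints the adapter count.

### Verify the binary is there and that the controller is detected
ls /opt/lsi/MegaCLI/
/opt/lsi/MegaCLI/MegaCli -adpCount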

Useful MegaCLI commands

### Enclosure information
 /opt/lsi/MegaCLI/MegaCli -EncInfo -aALL

### Virtual drive information
/opt/lsi/MegaCLI/MegaCli -LDInfo -Lall -aALL

### Physical drive information
/opt/lsi/MegaCLI/MegaCli -PDList -aALL

### Silence active alarm
/opt/lsi/MegaCLI/MegaCli -AdpSetProp AlarmSilence -aALL

### Disable alarm
/opt/lsi/MegaCLI/MegaCli -AdpSetProp AlarmDsbl -aALL

### Enable alarm
/opt/lsi/MegaCLI/MegaCli -AdpSetProp AlarmEnbl -aALL

### Prepare for removal
/opt/lsi/MegaCLI/MegaCli -PdPrpRmv -PhysDrv [E:S] -aN

### Unconfigured Bad to good
/opt/lsi/MegaCLI/MegaCli -PDMakeGood -PhysDrv[E:S] -aN

I found a guy who did a bit more advanced MegaCLI scripting; it's a bit old but still very useful. You can find the site here. I have done some copy-pasting from the script, but all credit goes to the guy behind the link.

### List disk status
/opt/lsi/MegaCLI/MegaCli -PDlist -aALL -NoLog | egrep 'Slot|state' | awk '/Slot/{if (x)print x;x="";}{x=(!x)?$0:x" -"$0;}END{print x;}' | sed 's/Firmware state://g'

Conclusion

The CLI is awesome, with so many possibilities and so much flexibility. In my opinion it is a bit hard to find, but once you have it installed it is easy to use. I have tested this on ESXi 6.7 and it works as it should. I hope you can use some of it.

vCloud – Changing SSL certificate

In this post, I will explain how to install a public certificate into the vCloud Director cell(s). This particular environment has a publicly signed cert that is up for renewal. A new certificate has been bought and signed and is ready to import.

vCD cells have 2 IP addresses, allowing for 2 different SSL endpoints (http and consoleproxy). Each endpoint requires its own SSL certificate, and vCloud Director reads its SSL certificates from a Java keystore. In a multi-cell environment, you need to create 2 certificates for each cell and import the certificates into the vCD Java keystore. But since we are using a wildcard certificate here, the same certificate will be used for both endpoints.

The new certificate was created from a CSR that was not generated on the vCloud cells, so we need to import both the private and the public key from an export of the certificate. In this case it's a .PFX.

The certificate is a wildcard. If you are using a UCC/SAN certificate with exact names instead, make sure the names in the certificate match the vCloud settings accordingly.

I assume you got

  • Already working/configured vCloud environment
  • New public signed certificate exported to a .PFX format (contains both public and private key)

We will

  • Connect to cell with winscp and transfer the .PFX to /tmp/
  • Connect to cell with Putty
    • Create a new keystore with the new certificate
    • Stop vcd service
    • Swap old keystore with new
    • Start vcd service

Initialize certificate change…

Use WinSCP to copy the .PFX to the cell's /tmp directory.
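
If you prefer the command line over WinSCP, a plain scp from wherever the .PFX lives does the same job; the cell hostname below is just a placeholder.

### Copy the PFX to the cell
scp wildcard2020.pfx root@[VCDCELL]:/tmp/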

The commands for creating the new keystore and importing the cert are below. Change the STOREPASS and KEYPASS variables to something meaningful for your environment. It is also important to note that the aliases of the certificates must be "http" and "consoleproxy", otherwise vCD won't find the certs.

A note about the alias: I have seen keytool generate a GUID, but also just numbers. So if your list command shows "1", change alias "1" to http or consoleproxy respectively in the commands below.

### Stop vCloud Director service
service vmware-vcd stop

### Make the passwords shell variables
STOREPASS='<pass>'
KEYPASS='<pass>'

### Add the certificate to a newly created certificates.ks keystore
/opt/vmware/vcloud-director/jre/bin/keytool \
-keystore /tmp/certificates.ks \
-storepass "$STOREPASS" \
-keypass "$KEYPASS" \
-storetype JCEKS \
-importkeystore \
-srckeystore /tmp/wildcard2020.pfx

### List certificate aliases
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keystore /tmp/certificates.ks \
-list | grep -i alias

### Rename the certificate's random alias to http
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-changealias \
-alias "te-d487d1c7-2c76-482a-8e61-69107ee3027f" \
-destalias http \
-keystore /tmp/certificates.ks

### Add the Remote Console Proxy certificate to the same certificates.ks keystore
/opt/vmware/vcloud-director/jre/bin/keytool \
-keystore /tmp/certificates.ks \
-storepass "$STOREPASS" \
-keypass "$KEYPASS" \
-storetype JCEKS \
-importkeystore \
-srckeystore /tmp/wildcard2020.pfx

### List certificate aliases
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keystore /tmp/certificates.ks -list | grep -i alias

### Rename the certificate's random alias to consoleproxy
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-changealias \
-alias "te-d487d1c7-2c76-482a-8e61-69107ee3027f" \
-destalias consoleproxy \
-keystore /tmp/certificates.ks
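
Before swapping the keystore into place, it is worth a quick check that both aliases actually ended up in the new keystore; this reuses the same list command as above.

### Verify that both the http and consoleproxy aliases exist
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keystore /tmp/certificates.ks \
-list | grep -iE 'http|consoleproxy'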

### Make a backup of the existing keystore 
cp /opt/vmware/vcloud-director/certificates.ks /opt/vmware/vcloud-director/certificates.ks_old

### Copy the new keystore file to the vCloud Director environment
cp /tmp/certificates.ks /opt/vmware/vcloud-director/certificates.ks

### Set correct permissions to the keystore file
chown vcloud:vcloud /opt/vmware/vcloud-director/certificates.ks
chmod 600 /opt/vmware/vcloud-director/certificates.ks

### Make cells generate proxy console certs based on new keystore.
cd /opt/vmware/vcloud-director/bin
./cell-management-tool certificates -p -k /opt/vmware/vcloud-director/certificates.ks -w "$STOREPASS"

### Start vCloud Director service
service vmware-vcd start

To see if the cell has booted correctly you can tail the cell log. It will give you a "startup completed in x" message.

 tail -f /opt/vmware/vcloud-director/logs/cell.log

I got more than one cell…

That's awesome – me too. You can scp the keystore from the cell with the new cert to the other cells. So let's get that newly created keystore over to the other cells.

### SSH to next cell and stop the vCloud Director service
service vmware-vcd stop

### Go to keystore path
cd /opt/vmware/vcloud-director/

### Move the existing keystore to a new filename with a .old suffix
mv certificates.ks certificates.ks.old

### Copy the new certificate into place
scp root@dc1svcdcell01:/opt/vmware/vcloud-director/certificates.ks .

### Make cells generate proxy console certs based on new keystore.
./bin/cell-management-tool certificates -p -k /opt/vmware/vcloud-director/certificates.ks -w KEYSTORE_PASSWD

### Start vCloud Director service
service vmware-vcd start

I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing.

Ceph MDS stuck in ‘rejoin’

A CephFS filesystem suddenly dies, what do you do? Well, it relies on the MDS (Metadata Server) to keep the filesystem online. Looking at the Ceph status, it tells us that the MDS cache is oversized and the filesystem is degraded. This is only a health warning, but the filesystem is unavailable because of it. In a way that is good, because it means there is nothing wrong with the Ceph cluster itself. Looking more into the problem, the MDS seems to be stuck in a recovery limbo state. Hmm…
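
To see the warnings described above, the standard status commands are enough; nothing here is specific to my cluster.

### Check overall health and the filesystem/MDS state
ceph -s
ceph health detail
ceph fs status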

The MDS services are in a "rejoin" limbo state, never coming back up.

Looking at the MDS state documentation, the rejoin state indicates that the MDS is trying to load the old cache back in before going into the "up" state. https://docs.ceph.com/docs/master/cephfs/mds-states/.

Looking closer at the logs, the process is starting over and over again but never finishes. This is due to a mechanism in the monitors kicking in and restarting the MDS service when it does not report back to the cluster as up in a timely fashion. So it never gets done with what it is doing. Hmm, and CephFS is still unavailable.
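
The respawn loop is visible in the MDS daemon log on the MDS hosts; the path below is the default log location, so adjust it if your deployment logs somewhere else.

### Watch the MDS log on an MDS host (the daemon id in the filename varies)
tail -f /var/log/ceph/ceph-mds.*.log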

Setting mds_beacon_grace to a wildly high number did not help either. I don't know if the grace period should have made any difference, but going from the default 15 to 1500 changed nothing. I was hoping it would give the MDS time to load the old cache.
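
For reference, this is roughly how the grace period can be bumped on Nautilus with the centralized config (older releases would inject the option on the mons instead). Treat it as a dead end rather than a fix, since it changed nothing here.

### Raise the MDS beacon grace from the default 15 seconds (did not help in this case)
ceph config set global mds_beacon_grace 1500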

From the logs, it is respawning the MDS due to lost contact with the cluster.

Going through an endless number of Ceph threads, I read that others have encountered this exact problem. This thread gave me the info on how to remediate it and get the cluster back online: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028981.html

Setting the MDS to wipe all client sessions was the first step. I set this through the Ceph GUI since I find it easier than the CLI command.

The default is "false", but with it set globally to "true", no client connections are made while the MDS recovers.
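
I flipped this in the GUI, but assuming the option name from the mailing list thread (mds_wipe_sessions), the CLI equivalent should look roughly like the lines below. Remember to set it back to false once the MDS is up again.

### Make the MDS wipe all client sessions during recovery
ceph config set mds mds_wipe_sessions true

### Once CephFS is healthy again, revert it
ceph config set mds mds_wipe_sessions false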

The second step was to delete the "mds*_openfiles.0" objects from the CephFS metadata pool. Looking into the pool I could see that there were many objects referring to open files, but the post stated only to delete the .0 objects. This needs to be done for all the MDS services that you have running. The "openfiles" objects are open-file hints, and it is safe to delete them. Read more about rados commands at https://docs.ceph.com/docs/giant/man/8/rados/

### Delete for all the MDS you have running. mds1,mds2, mds3 etc.
[root@dspp-mon-a-01 cephadm]# rados -p cephfs_metadata rm mds0_openfiles.0
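
If you are unsure which openfiles objects exist, listing the metadata pool first saves guessing. The loop below is only a sketch and assumes ranks 0 to 2, so adjust it to however many MDS ranks you run.

### List the open-file hint objects first to see which ranks exist
rados -p cephfs_metadata ls | grep openfiles

### Remove the .0 object for every active MDS rank (adjust the rank numbers to your setup)
for rank in 0 1 2; do rados -p cephfs_metadata rm "mds${rank}_openfiles.0"; done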

After deleting the open-file objects I stopped all MDS services on all nodes. Some of them did not stop, so I killed the process. I probably should have stopped them before deleting the open-file objects…

[root@dspp-osd-a-06 cephadm]# systemctl stop ceph-mds.target

After starting the MDS services again, the filesystem recovered in a couple of seconds. CephFS is available and "ceph -s" shows a healthy condition. Set the wipe_sessions option back to false, and CephFS could be mounted again.

What to conclude? There is a fix in 14.2.5, so when it is released it is time to update the Ceph cluster. The ticket for it should be this one: https://tracker.ceph.com/issues/41467. These guys were also a big help in resolving the problem: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AOYWQSONTFROPB4DXVYADWW7V25C3G6Z/.