ramsgaard.me

StorSimple 8100 – reuse

Posted on June 1, 2020 by Jesper Ramsgaard

I bought a Microsoft StorSimple 8100 unit. The only catch: it came without its SSDs, and the device password was unknown. Fair enough.

It’s quite an interesting unit, hardware-wise. It’s a 2U with 2x 750W PSUs, 12x 3.5″ SAS bays and 2x compute nodes. Each compute node is in fact a Xyratex CS-6000-AB containing:

1x Intel Xeon E5-2648L 1.8 GHz 8-core 70 W CPU
4x 8GB DDR3 memory
1x LSI/Avago/Broadcom 2308 SAS HBA (1x SFF-8088 and 1x internal link)
1x Mellanox ConnectX-3 10/40/56Gbit dual QSFP
1x 128GB SSD for the OS

The two compute nodes share a Xyratex HB1235 enclosure with the 12 3.5″ drive bays. This enclosure is also used by other storage vendors, for example in HPE 3PAR and Dell Compellent SANs.

IPMI/BMC enable

With no DisplayPort to connect a screen to, there is no way to see what is going on, which makes this a very proprietary piece of hardware. But once you have access to the IPMI, it suddenly becomes easy to reuse the hardware for something other than the StorSimple software.

This is how to enable the IPMI/BMC hardware.

Reseat one of the controllers or power cycle the appliance with a console cable connected
Press Esc to enter the boot options
Select “Setup Utility” from the list
It will prompt for a password (E1aD8wAbMxB3XcpjwVKD)
Go to Advanced tab
Go to IPMI BMC Configuration
Go to BMC Configuration
Scroll down till you get to the bottom and you will see the network configuration
Select LAN Channel number 1 and static IP source
Enter the IP, subnet, and gateway
Press F10 to save and exit
Log in to the BMC with a web browser (username: admin, password: admin) and access the console from there

The IPMI/BMC interface of the Xyratex CS-6000 node

Now you can open a Java-based KVM tool to get the display from the node and do what you want. Awesome!

Java….

But there is a small catch: you can’t just open the IPMI console and run it. The firmware is old, and its Java applet is signed with algorithms that are no longer allowed. So you need to change the security properties of your Java install and run the IPMI web interface in Internet Explorer in compatibility mode.

This is quite an easy fix. What I did was open Notepad as administrator and edit the following file:

C:\Program Files\Java\jre1.8.0_131\lib\security\java.security

Find the line that starts with “jdk.jar.disabledAlgorithms” and comment it out by prefixing it with a #. If the line ends with a backslash, the value continues on the next line(s), so comment those out as well. Note that this allows JAR files signed with any algorithm to run, which should be considered insecure! For us, though, it is a necessary measure to get access to the IPMI.
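If you would rather not hand-edit the file, here is a minimal Python sketch of the same edit. It assumes the standard Java properties convention that a trailing backslash continues a value on the next line, so it comments out those continuation lines too. `patch_file` is a hypothetical helper name, and it keeps a `.bak` copy next to the original.

```python
import re

def comment_out_property(text: str, key: str) -> str:
    """Prefix '#' to the property `key` and to every backslash-continued line of its value."""
    out = []
    in_property = False
    for line in text.splitlines(keepends=True):
        if not in_property:
            # The key ends at the first '=', ':' or whitespace (Java properties syntax).
            name = re.split(r"[=:\s]", line.lstrip(), maxsplit=1)[0]
            in_property = name == key
        if in_property:
            out.append("#" + line)
            # A trailing backslash means the value continues on the next line.
            in_property = line.rstrip("\r\n").endswith("\\")
        else:
            out.append(line)
    return "".join(out)

def patch_file(path: str) -> None:
    """Comment out jdk.jar.disabledAlgorithms in place, keeping a .bak copy."""
    with open(path, encoding="latin-1") as f:
        original = f.read()
    with open(path + ".bak", "w", encoding="latin-1") as f:
        f.write(original)  # keep a copy so the change is easy to revert
    with open(path, "w", encoding="latin-1") as f:
        f.write(comment_out_property(original, "jdk.jar.disabledAlgorithms"))
```

Call it from an elevated prompt, since the file lives under Program Files, e.g. `patch_file(r"C:\Program Files\Java\jre1.8.0_131\lib\security\java.security")`; the JRE folder name will differ with your Java version. Lines that are already commented out are left alone, so running it twice is harmless.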

StorSimple software

Each compute node uses VHDX native boot: the SSD holds a boot loader, and each VHDX has its own entry in that boot loader. That means they can deploy a newer version, or do a factory reset, by switching over to another VHDX disk. I was actually not aware of VHDX native boot, but it’s a very nice feature. I’m definitely going to use it on my Windows-based laptop in the future; it’s so much easier than doing a native OS install.

The StorSimple software is based on Windows Server 2012 R2. Normally you can only use the console connection for direct management, but since each compute node also has an IPMI/BMC, you can look deeper into the system.

Since I did not have the device password, the StorSimple software could do nothing. So I got hold of PCUnlocker, a password reset tool. I booted it through the IPMI, where I could attach the VHDX file and have it reset the administrator password. The account was also disabled, but PCUnlocker took care of that part as well.

After booting back into the StorSimple software, I could choose an administrator account, type in my new password, and I had access to a cmd prompt. It was a Server Core install, so no GUI, but that’s OK because I now had access to all the HCS PowerShell cmdlets.

Unfortunately the former owner had also tried to mess around with it, so the factory default VHDX images and the compute node signatures did not match and therefore the “reset-hcsfactorydefault” could not validate the factory default images. Bummer.

Many of the HCS cmdlets were compiled cmdlets from a DLL, so there was no way to see what was going on. But the “test-hcsfactoryimage” and reset/initialize scripts were full-blown PowerShell, so from there I could see what was checked to validate the VHDX image. I actually bypassed the validation and ran the reset command, but after each node had generated a new VHDX from the factory VHDX files, it booted and got stuck in the boot state of the HCS software.

I was eager to find a way to fix it, but then again the time spent would not pay off. You need an Azure subscription to actually manage StorSimple, since there is no local GUI, only the serial console. So I decided to install Windows Server 2019 on it instead. 🙂

Conclusion

It’s a nice piece of hardware, and StorSimple would have been nice to use if it did not depend on Azure. I now have the option to run a 2-node HCI cluster with Storage Spaces and a failover cluster, presenting CSV volumes to each node. I could run Hyper-V and have a 2U box with full redundancy. I still feel the urge to fix the StorSimple software, but not for now 🙂

FreeBSD – install phpipam

Posted on May 26, 2020 by Jesper Ramsgaard

Quick guide on how to install phpipam on FreeBSD. I will assume that you know how to install FreeBSD 🙂

Start with a freshly updated server, and use sudo instead of logging in directly as root.

### Patch OS
freebsd-update fetch install

### Install all required packages
pkg install nginx php74-sockets php74-openssl php74-gmp php74-gettext php74-mbstring php74-gd php74-curl php74-pear php74-pdo_mysql php74-session php74-filter php74-json php74-iconv php74-ctype mysql57-server git sudo screen

Configure mysql

We won’t do any tuning of mysql; we just create a user and a database, and off we go.

### Enable mysql on boot and start it
sysrc mysql_enable=YES
service mysql-server start

### Run mysql_secure_installation: choose to change the root password and answer no (any key other than y) to everything else.
mysql_secure_installation

### Log in to mysql, create the database and user, and grant the user access
mysql -u root -p
CREATE DATABASE phpipam;
GRANT ALL ON phpipam.* TO phpipam@localhost IDENTIFIED BY 'trwITH!lU';
FLUSH PRIVILEGES;
QUIT;
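The password above is only the example from this post, so don’t reuse it. A minimal Python sketch for generating a fresh password and the matching SQL, assuming the same database and user names (`phpipam`); `phpipam_db_sql` is a hypothetical helper. The password is limited to letters and digits so it needs no quoting in SQL or in `config.php`.

```python
import secrets
import string
from typing import Tuple

def phpipam_db_sql(db: str = "phpipam", user: str = "phpipam", length: int = 24) -> Tuple[str, str]:
    """Return (password, SQL) that creates the database and a user with full access to it."""
    alphabet = string.ascii_letters + string.digits  # nothing that needs escaping
    password = "".join(secrets.choice(alphabet) for _ in range(length))
    sql = (
        f"CREATE DATABASE {db};\n"
        f"GRANT ALL ON {db}.* TO {user}@localhost IDENTIFIED BY '{password}';\n"
        "FLUSH PRIVILEGES;\n"
    )
    return password, sql
```

Save `sql` to a file, feed it to `mysql -u root -p < file.sql`, and put `password` into `$db['pass']` in `config.php`. The `GRANT ... IDENTIFIED BY` form matches the mysql57-server installed here; MySQL 8.0 no longer accepts it and needs a separate `CREATE USER`.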

Configure phpipam

Get phpipam and put it in the www directory. Use git to get the code; this will also make version updates easier later on.

### Create folder
mkdir -p /usr/local/www/phpipam

### Get phpipam into folder
git clone https://github.com/phpipam/phpipam.git /usr/local/www/phpipam

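### Use the 1.4 release branch instead of the development branch, and fetch its submodules
cd /usr/local/www/phpipam && git checkout -b 1.4 origin/1.4
git submodule update --init --recursive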

### Create config.php
cp /usr/local/www/phpipam/config.dist.php /usr/local/www/phpipam/config.php 

### Edit config.php so it matches mysql settings you created
$db['host'] = 'localhost';
$db['user'] = 'phpipam';
$db['pass'] = 'trwITH!lU';
$db['name'] = 'phpipam';
$db['port'] = 3306;

Updating phpipam

### Create backup of config.php
cp /usr/local/www/phpipam/config.php /tmp/config.php

### Create backup of database
cd /usr/local/www/phpipam
mysqldump -uroot -p phpipam > db/bkp/phpipam_$(date -v-1d +%d-%B-%Y).db

### Pull from GitHub (replace 1.x with the version you are upgrading to)
cd /usr/local/www/phpipam
git pull
git checkout -b 1.x origin/1.x
git submodule update --init --recursive

Finish up by opening the web interface and following the upgrade procedure.
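`date -v-1d` is FreeBSD’s `date` syntax for “one day ago” (GNU date on Linux would use `date -d '-1 day'`). If you script the backup somewhere else, a rough Python equivalent of the dump step might look like this. It assumes you run it from /usr/local/www/phpipam, that `db/bkp` exists, and that `%B` gives English month names (Python’s default C locale); `backup_path` and `dump_database` are hypothetical helper names.

```python
import subprocess
from datetime import date, timedelta
from typing import Optional

def backup_path(today: Optional[date] = None) -> str:
    """Same name as db/bkp/phpipam_$(date -v-1d +%d-%B-%Y).db."""
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    return f"db/bkp/phpipam_{yesterday.strftime('%d-%B-%Y')}.db"

def dump_database(path: str, db: str = "phpipam") -> None:
    """Run mysqldump as root; -p without a value makes mysqldump prompt for the password."""
    with open(path, "wb") as out:
        subprocess.run(["mysqldump", "-uroot", "-p", db], stdout=out, check=True)
```

Usage: `dump_database(backup_path())`.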

Configure nginx

Make nginx start on boot and backup the original config. We will then add our own.

### Enable nginx on boot
sysrc nginx_enable=YES

### backup original config
mv /usr/local/etc/nginx/nginx.conf /usr/local/etc/nginx/nginx.conf.org

Now that we have the backup, let’s add the content below to a new nginx.conf.

user  www;
worker_processes 2;

error_log /var/log/nginx/error.log;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;

    #access_log  logs/access.log  main;
    sendfile on;
    #tcp_nopush     on;

    keepalive_timeout 65;

    gzip on;

    # disable max upload size
    client_max_body_size 0;
    # add timeouts for very large uploads
    client_header_timeout 30m;
    client_body_timeout 30m;

    server {
        listen 80;
        server_name ipam.ramsgaard.me;
        root /usr/local/www/phpipam;
        index index.php;
        location ~ \.php$ {
            fastcgi_split_path_info ^(.+\.php)(/.+)$;
            fastcgi_index index.php;
            fastcgi_pass unix:/var/run/php-fpm.sock;
            include fastcgi_params;
            fastcgi_param PATH_INFO $fastcgi_path_info;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }
    }


    # HTTPS server
    #
    #server {
    #    listen       443 ssl;
    #    server_name  localhost;
    #    ssl_certificate      cert.pem;
    #    ssl_certificate_key  cert.key;
    #    ssl_session_cache    shared:SSL:1m;
    #    ssl_session_timeout  5m;
    #    ssl_ciphers  HIGH:!aNULL:!MD5;
    #    ssl_prefer_server_ciphers  on;
    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}
}

Configure PHP/FPM

Let’s create a production php.ini and afterwards set up the php-fpm config file.

cp /usr/local/etc/php.ini-production /usr/local/etc/php.ini

Open the file /usr/local/etc/php-fpm.d/www.conf and uncomment the following lines.

listen.owner = www
listen.group = www
listen.mode = 0660

### Replace the TCP socket with a unix socket
;listen = 127.0.0.1:9000
listen = /var/run/php-fpm.sock

### Enable and start php-fpm 
sysrc php_fpm_enable=YES
service php-fpm start
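The www.conf changes can also be scripted. A minimal Python sketch, assuming the stock file still contains `listen = 127.0.0.1:9000` and commented-out `listen.owner/group/mode` lines. Instead of just uncommenting those lines, it sets them explicitly to `www`/`www`/`0660`, so it doesn’t depend on whatever the commented defaults were. `patch_www_conf` and `patch_file` are hypothetical names.

```python
import re

SOCKET_SETTINGS = {"listen.owner": "www", "listen.group": "www", "listen.mode": "0660"}

def patch_www_conf(text: str, socket: str = "/var/run/php-fpm.sock") -> str:
    """Apply this guide's php-fpm pool changes to the text of www.conf."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.strip()
        setting = re.match(r";?\s*(listen\.(?:owner|group|mode))\s*=", body)
        if setting:
            # Socket ownership/permissions so nginx (running as www) can use the socket.
            key = setting.group(1)
            out.append(f"{key} = {SOCKET_SETTINGS[key]}\n")
        elif re.fullmatch(r"listen\s*=\s*127\.0\.0\.1:9000", body):
            # Keep the TCP listener for reference, commented out, and add the unix socket.
            out.append(f";{body}\n")
            out.append(f"listen = {socket}\n")
        else:
            out.append(line)
    return "".join(out)

def patch_file(path: str = "/usr/local/etc/php-fpm.d/www.conf") -> None:
    with open(path) as f:
        patched = patch_www_conf(f.read())
    with open(path, "w") as f:
        f.write(patched)
```

Restart php-fpm afterwards (`service php-fpm restart`). Running the patch a second time leaves the file unchanged.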

Conclusion

We have now installed all the required components. You should now reboot the server and check that all the services come up automatically. If they do, you can proceed to the web interface of your new phpipam installation and follow the guide there to get it set up.

NetApp – ServiceProcessor stuck updating

Posted on April 23, 2020 by Jesper Ramsgaard

After updating to ONTAP 9.7, the service processors were stuck in “updating”. They never came back, not even after a reboot.

Procedure:

Disable auto update
Reboot one SP, wait for it to show online.
Run the update manually.
If it’s online and updated, enable auto update again.

### Disable autoupdate
system service-processor image modify -node <nodename> -autoupdate false

### Reboot the service processor
system service-processor reboot-sp -node <nodename>

### Initiate update
system service-processor image update -node <nodename>

### Verify version and SP status
system service-processor show

### Enable autoupdate
system service-processor image modify -node <nodename> -autoupdate true
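Since the procedure is done one node at a time, a small Python sketch that prints the commands above as a per-node runbook can help avoid mixing up node names. It only generates text; the `#` lines are notes for you and are not meant to be pasted into the ONTAP CLI. Node names such as `ctrl01` are placeholders.

```python
from typing import List

def sp_recovery_runbook(node: str) -> List[str]:
    """The commands from this post, in order, for one node, with wait notes in between."""
    return [
        f"system service-processor image modify -node {node} -autoupdate false",
        f"system service-processor reboot-sp -node {node}",
        "# wait until 'system service-processor show' reports the SP as online",
        f"system service-processor image update -node {node}",
        "system service-processor show",
        "# only when the SP is online with the correct firmware:",
        f"system service-processor image modify -node {node} -autoupdate true",
    ]

def print_runbook(nodes: List[str]) -> None:
    for node in nodes:
        print(f"## {node}")
        print("\n".join(sp_recovery_runbook(node)))
        print()
```

Usage: `print_runbook(["ctrl01", "ctrl02"])`.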

Here we see that ctrl02 is online again, but with the wrong firmware.

After the manual update, we now have the correct firmware and both are online, ready to enable autoupdate again.

https://kb.netapp.com/app/answers/answer_view/a_id/1028746/~/service-processor-firmware-update-fails-

Cloud Director 10.1 released

Posted on April 15, 2020 by Jesper Ramsgaard

I have been using vCloud since version 5.1. After a brief love affair with something called “Azure Pack”, we put all our focus into vCloud.

Version 8.20 was the first sign of a heartbeat from VCD. It confirmed that vCloud was the platform we had been looking for. Now 10.1 is released, and from my point of view it’s a big one, with many changes in both the GUI and the infrastructure. This release is also the final farewell to the old Flex GUI.

First off, we have to address the naming. I always liked the vCloud name; to me it was a strong brand. So it’s a bit sad to see it go, and now we have to get used to Cloud Director instead. Thankfully, we can still use the acronym VCD for VMware Cloud Director. #LongLiveVCD.

In the next few points, I will address some of the major things within this release.

APIs

We use a lot of the functionality of the VCD APIs. As VCD development shifts into a higher gear, so does the deprecation of older API versions. For a small service provider, it’s always hard to revisit automation that already works with the existing APIs. When moving to 10.1, we have to go through a couple of workflows to update them to use the new 34.0 API. On the other hand, it’s also a good chance to refactor and optimize.

VMware Cloud Director API version 29 and below are not supported.
VMware Cloud Director API version 30.0 is deprecated and will become unsupported after VMware Cloud Director 10.1
VMware Cloud Director API version 31.0 is deprecated.
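A quick way to check existing automation against these notes is to classify the API versions it pins. A minimal Python sketch that encodes only the three statements above; anything newer is reported as “not listed as deprecated”, which is not a guarantee of support. For example, the SAML automation scripts on this blog send `version=31.0`, which this flags as deprecated. `vcd_api_status` and `pinned_versions` are hypothetical names.

```python
import re
from typing import Dict

def vcd_api_status(version: str) -> str:
    """Classify a VCD API version string (e.g. '31.0') per the VCD 10.1 notes above."""
    major = int(version.split(".")[0])
    if major <= 29:
        return "unsupported"
    if major == 30:
        return "deprecated, unsupported after VCD 10.1"
    if major == 31:
        return "deprecated"
    return "not listed as deprecated"

def pinned_versions(script_text: str) -> Dict[str, str]:
    """Find 'version=NN.N' in a script (e.g. Accept headers) and classify each one."""
    return {v: vcd_api_status(v) for v in re.findall(r"version=(\d+\.\d+)", script_text)}
```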

NSX-T feature improvements

More of the core NSX-T features are now available through VCD.

IPSec VPN
Dedicated External Network
BGP and Route Advertisement

For some time we have been watching from the sidelines, waiting for NSX-T development to reach an acceptable level. NSX-V is still doing a good job. As someone who is right now standing up a new 16-node VMware cluster as a new provider VDC, I wish it had been 6 months later, so that all NSX-T functionality was ready and we could hopefully use NSX-T exclusively.

So we may have to look into having two 8-node clusters, one for NSX-V and one for NSX-T, so that we can already start the transition to NSX-T now…

But the good thing about being a VMware customer is that you are not left in the dust. Migration tools for NSX-V > NSX-T already exist (NSX-T Data Center Migration Coordinator), but they had no integration with VCD, which brings me to the next point!

NSX-V to NSX-T VCD Migration Tool

This is a way of helping us transition from NSX-V to NSX-T, as NSX-V approaches its end of support in January 2021.

Before, we could already create a new provider VDC backed by NSX-T and start moving workloads over to the new cluster to use NSX-T functionality, but it was all a manual process.

There is now an automated, VCD-aware way to do it. The approach requires a new cluster, since NSX-V and NSX-T can’t coexist in the same cluster. The What’s New in 10.1 states that the workflow helps with the following:

Automates migration of vCD metadata and workloads from NSX-V to NSX-T
Migrate per Org VDC migration to reduce maintenance window to single tenant
Minimize network downtime with bridged networks during migration
Live migrate with vMotion to ensure non-disruption to user workloads
Keep source VDC configuration and environment as-is to allow rollback

Tomas has a good discussion of this subject.

SSL and Certificate Management

This seems like something to read up on carefully. In short, VCD does not trust endpoint certificates unless they have been imported to the trust store.

There is a tool to help with the import, trust-infra-certs, which automatically connects to each endpoint, grabs its certificate and imports it. If this does not succeed, you will not be able to talk to those endpoints after upgrading to VCD 10.1.
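Conceptually, the tool trusts whatever certificate an endpoint presents, so it can be worth knowing which certificate that is. A minimal Python sketch, not part of VCD, that grabs an endpoint’s certificate and returns its SHA-256 fingerprint so you can compare it with what vCenter or NSX shows in its own UI. `endpoint_fingerprint` is a hypothetical helper, and `vcenter.example.local` below is a placeholder host.

```python
import hashlib
import ssl

def sha256_fingerprint(pem: str) -> str:
    """Colon-separated SHA-256 fingerprint of a PEM certificate."""
    digest = hashlib.sha256(ssl.PEM_cert_to_DER_cert(pem)).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def endpoint_fingerprint(host: str, port: int = 443) -> str:
    # Without ca_certs, get_server_certificate does not validate the chain;
    # it just returns the certificate the endpoint presents.
    return sha256_fingerprint(ssl.get_server_certificate((host, port)))
```

Usage: `print(endpoint_fingerprint("vcenter.example.local"))`.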

App Launchpad

A new feature that introduces a marketplace based on content from Bitnami. It lets us offer customers an easy way to find, deploy and manage new workloads, not just as VMs but also as containers.

Daniel did an excellent write up on this subject.

Conclusion

There is a lot more in this release to talk about (CSE 2.6, OSE 1.5, the Terraform 2.7 provider, etc.); read more in the official release notes.

I probably should have written a disclaimer about the length of this post and the lack of interesting pictures; I will try to improve next time.

I love to see VCD take flight. We look forward to being part of the journey ahead, where things like Bitnami and App Launchpad, together with more NSX-T functionality and a whole lot of other features, help us cloud providers help other businesses with their digital transformation.

Big shout out to VMware and the VCD team!

vCloud SAML authentication – Automation

Posted on March 28, 2020 by Jesper Ramsgaard

I can’t say that I did everything by myself in this post. I had great help from my colleague and friend Kasper Hansen, and also great help from the vExpert community, especially Tom Fojta.

In my last post I showed how to set up vCloud SAML against AzureAD. Now we will look at how to automate each tenant to use the same AzureAD. These days everybody has either a Microsoft or an AzureAD account, so it’s easy to invite them as guest users. That way we have controlled access and can also ensure that vCloud users have MFA enabled.

We use vRO to create vCloud tenants, and into that flow we are now introducing a new workflow that does the following (the workflow itself is just a REST call that triggers an event in Azure Automation):

Create AzureAD Groups for admin and viewer
Post federation metadata to vCloud tenant
Post federation groups to vCloud tenant

Enable SAML

Fojta has some very good articles on his blog about the basic setup of SAML against different IdP systems. I also wrote a piece on it, with Azure as the IdP.

Because we want all organisations linked to the same SAML app in Azure, all organisations need the same SAML certificate. You can only set this through the API, and what the documentation did not say was that the certificate must be trusted by the Java keystore of the cells.

Create a self-signed certificate and make vCloud trust it

These commands create a private key in the required PKCS#8 format and a certificate in X.509 format.

### Create the self-signed private key and certificate
openssl req -x509 -nodes -days 365 -newkey rsa:4096 -keyout selfsigned.key.pem -out selfsigned-x509.crt

### Convert the private key to pkcs8 format
openssl pkcs8 -topk8 -inform PEM -outform PEM -in selfsigned.key.pem -out selfsigned-pkcs8.key -nocrypt
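The Key element in the federation settings expects the PKCS#8 form, which starts with `-----BEGIN PRIVATE KEY-----`. A small Python sketch to check which form a key file is in before pasting it, assuming an unencrypted PEM file like the one produced with `-nocrypt` above. `pem_key_format` and `check_key_file` are hypothetical helper names.

```python
def pem_key_format(pem: str) -> str:
    """Identify a PEM private key by its BEGIN line."""
    headers = {
        "-----BEGIN PRIVATE KEY-----": "pkcs8",
        "-----BEGIN ENCRYPTED PRIVATE KEY-----": "pkcs8-encrypted",
        "-----BEGIN RSA PRIVATE KEY-----": "pkcs1",
    }
    for line in pem.splitlines():
        if line.startswith("-----BEGIN"):
            return headers.get(line.strip(), "unknown")
    return "unknown"

def check_key_file(path: str = "selfsigned-pkcs8.key") -> None:
    """Raise if the key file is not unencrypted PKCS#8."""
    with open(path) as f:
        fmt = pem_key_format(f.read())
    if fmt != "pkcs8":
        raise ValueError(f"{path} is {fmt}, expected unencrypted PKCS#8")
```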

Once you have created the self-signed certificate, import it on each of your cells, and restart the cells afterwards. One mistake I made here was trying to import a .pem where the private key and certificate were combined; that won’t work. Only import the certificate.

/opt/vmware/vcloud-director/jre/bin/keytool -importcert -trustcacerts -keystore /opt/vmware/vcloud-director/jre/lib/security/cacerts -alias saml -file selfsigned-x509.crt
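If all you have is a combined .pem (key and certificate in one file, which is the mistake described above), a minimal Python sketch can separate the certificate blocks for `keytool` from the key block for the federation settings. `split_pem` and `extract_certificate` are hypothetical names; the output file name is the one used in this post.

```python
import re
from typing import Dict, List

PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----.*?-----END \1-----", re.DOTALL)

def split_pem(text: str) -> Dict[str, List[str]]:
    """Separate private key blocks from certificate blocks in a combined PEM file."""
    parts: Dict[str, List[str]] = {"keys": [], "certificates": []}
    for match in PEM_BLOCK.finditer(text):
        label = match.group(1)
        if label == "CERTIFICATE":
            parts["certificates"].append(match.group(0))
        elif label.endswith("PRIVATE KEY"):
            parts["keys"].append(match.group(0))
    return parts

def extract_certificate(combined_path: str, cert_path: str = "selfsigned-x509.crt") -> None:
    """Write only the certificate block(s) to cert_path, ready for keytool."""
    with open(combined_path) as f:
        parts = split_pem(f.read())
    with open(cert_path, "w") as f:
        f.write("\n".join(parts["certificates"]) + "\n")
```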

Publish federation settings to API

We had a lot of trial and error in this step, because vCloud did not trust the certificate. Each time a PUT was sent to the API, the log /opt/vmware/vcloud-director/logs/vcloud-container-debug.log complained with “Failed to generate keystore | requestId=<id>,request=PUT.”

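Following is an example in PowerShell. Insert your PKCS#8 private key and self-signed certificate between the -----BEGIN/END PRIVATE KEY----- and -----BEGIN/END CERTIFICATE----- markers. The snippets assume $orgId and $vCDAuthorizationToken are already set.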

### XML with federation metadata for AzureAD saml app
$xmlBody = 
@'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<OrgFederationSettings xmlns="http://www.vmware.com/vcloud/v1.5" xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData" xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1" xmlns:vssd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData" xmlns:common="http://schemas.dmtf.org/wbem/wscim/1/common" xmlns:vmw="http://www.vmware.com/schema/ovf" xmlns:ovfenv="http://schemas.dmtf.org/ovf/environment/1" xmlns:vmext="http://www.vmware.com/vcloud/extension/v1.5" xmlns:ns9="http://www.vmware.com/vcloud/versions" href="https://<VCD_URI>/api/admin/org/7688ff82-77e8-4f70-a4b6-b1767ab110d1/settings/federation" type="application/vnd.vmware.admin.organizationFederationSettings+xml">
<SAMLMetadata>
    ...
</SAMLMetadata>
    <Enabled>true</Enabled>
    <SamlSPEntityId>test</SamlSPEntityId>
    <SamlAttributeMapping>
    	<EmailAttributeName>EmailAddress</EmailAttributeName>
    	<UserNameAttributeName>UserName</UserNameAttributeName>
    	<FirstNameAttributeName>http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname</FirstNameAttributeName>
    	<SurnameAttributeName>http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname</SurnameAttributeName>
    	<FullNameAttributeName>FullName</FullNameAttributeName>
    	<GroupAttributeName>Groups</GroupAttributeName>
    	<RoleAttributeName>Role</RoleAttributeName>
    </SamlAttributeMapping>
    <SamlSPKeyAndCertificateChain>
         <Key>-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----</Key>
        <CertificateChain>
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----</CertificateChain>
     </SamlSPKeyAndCertificateChain>
</OrgFederationSettings>
'@

Invoke-RestMethod -Uri "https://<VCD_URI>/api/admin/org/$orgId/settings/federation" -Method Put  -Headers @{'x-vcloud-authorization'= $vCDAuthorizationToken ; Accept = 'application/*+xml;version=31.0'; "Content-type"  =  "application/*+xml;version=31.0"} -Body ([System.Text.Encoding]::UTF8.GetBytes(($xmlBody)))

Publish SAML groups to API

Now that the SAML metadata and certificate are in place, we need to add groups to the tenant. You can read more about which groups/users should be imported in my other SAML blog post.

Each tenant has its own role IDs, so when automating the group import we need to query the vCloud API to get the role IDs. There is a specific query API for this. When using a system account, we need to specify an “X-VMWARE-VCLOUD-TENANT-CONTEXT” header in the request. This way we can query a tenant context from a system account.

### Retrieve roles from tenant
[xml]$xml = Invoke-RestMethod -UseBasicParsing -Uri 'https://<VCD_URI>/api/query?type=role&page=1&pageSize=20&links=true' -Method Get -Headers @{'x-vcloud-authorization' = $vCDAuthorizationToken; Accept = 'application/*+xml;version=31.0'; "Content-type" = "application/*+xml;version=31.0"; "X-VMWARE-VCLOUD-TENANT-CONTEXT" = "$orgId"}

### Find the href of the role named $role
$RoleHref = ($xml.QueryResultRecords.RoleRecord | where {$_.Name -eq "$role"}).href
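If you want to test the lookup outside PowerShell, here is the same lookup as a Python sketch. The sample response is a trimmed, hypothetical example of the query result shape (RoleRecord elements with `name` and `href` attributes in the vCloud 1.5 namespace), not real output. Note that the query asks for `pageSize=20`, so a tenant with more than 20 roles may need a larger page size or paging through the results.

```python
import xml.etree.ElementTree as ET
from typing import Optional

VCLOUD_NS = {"v": "http://www.vmware.com/vcloud/v1.5"}

def role_href(query_xml: str, role_name: str) -> Optional[str]:
    """Return the href of the RoleRecord whose name matches, or None if it is not on this page."""
    root = ET.fromstring(query_xml)
    for record in root.findall("v:RoleRecord", VCLOUD_NS):
        if record.get("name") == role_name:
            return record.get("href")
    return None

# Hypothetical, trimmed query result, for illustration only.
SAMPLE = """<QueryResultRecords xmlns="http://www.vmware.com/vcloud/v1.5" total="2">
  <RoleRecord name="Organization Administrator" href="https://vcd.example.com/api/admin/role/1111"/>
  <RoleRecord name="vApp User" href="https://vcd.example.com/api/admin/role/2222"/>
</QueryResultRecords>"""
```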

With the role href in hand, we can post the group together with the role, so that users can authenticate based on their SAML group membership in AzureAD.

### Define XML with role and groupid
$xmlBody =
@'
<Group xmlns="http://www.vmware.com/vcloud/v1.5"
    xmlns:ns9="http://www.vmware.com/vcloud/versions" name="{1}"
    type="application/vnd.vmware.admin.group+xml">
    <ProviderType>SAML</ProviderType>
    <Role href="{0}" type="application/vnd.vmware.admin.role+xml"/>
</Group>
'@ -f $RoleHref , $GroupId
 
### Post xml to vcd 
Invoke-RestMethod -UseBasicParsing -Uri "https://<VCD_URI>/api/admin/org/$orgId/groups"  -Method Post  -Headers @{'x-vcloud-authorization'= $vCDAuthorizationToken ; Accept = 'application/*+xml;version=31.0'; "Content-type"  =  "application/*+xml;version=31.0"} -Body ([System.Text.Encoding]::UTF8.GetBytes(($xmlBody)))
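The same Group document can also be built with an XML library instead of string formatting, which takes care of escaping if a group name ever contains characters such as `&`. A minimal Python sketch, assuming `group_id` is the value AzureAD sends in the Groups claim (`$GroupId` in the PowerShell above) and `role_href` comes from the role query; `saml_group_xml` is a hypothetical name, and the unused `ns9` namespace declaration is left out.

```python
import xml.etree.ElementTree as ET

VCLOUD = "http://www.vmware.com/vcloud/v1.5"

def saml_group_xml(group_id: str, role_href: str) -> bytes:
    """Build the Group body posted to /api/admin/org/{orgId}/groups."""
    ET.register_namespace("", VCLOUD)  # serialize vCloud elements without a prefix
    group = ET.Element(f"{{{VCLOUD}}}Group", {
        "name": group_id,
        "type": "application/vnd.vmware.admin.group+xml",
    })
    ET.SubElement(group, f"{{{VCLOUD}}}ProviderType").text = "SAML"
    ET.SubElement(group, f"{{{VCLOUD}}}Role", {
        "href": role_href,
        "type": "application/vnd.vmware.admin.role+xml",
    })
    return ET.tostring(group, encoding="utf-8")
```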

Conclusion

Now we have all the pieces for an automation that enables a tenant for SAML authentication and afterwards imports, for example, a viewer and an admin group. External users will then be invited to the AzureAD and added to the right group, and then they have access to their tenant. We can help the organisation secure access to their virtual datacenter with MFA, and they get single sign-on with their own user from their own AzureAD or Microsoft account.

We will build a service library where users can be invited to the tenant organisation, so that once one user has been invited, that user can invite their colleagues.

Homelab – v1

Posted on February 18, 2020 by Jesper Ramsgaard

This will be a short blog series about my new setup and how you can start your own homelab.

The basic idea of a homelab

I have always had a homelab: small, but enough to learn on, and the more you learn, the bigger your needs get. The first homelab consisted of 2x Apple Mac mini. The Apple Mac mini is very power efficient and very quiet. Not the beefiest hardware, but just enough to run a vCenter and vSAN.

Apple Mac Mini v5.1 mid-2011 A1347
2.3 GHz Core i5 (I5-2415M)
16GB DDR3 memory
Dual Drive kit
256GB Cache disk
600GB capacity disk

They were mounted in a Sonnet MacRack mini 1U enclosure, which has been perfect for many years. In that small setup I ran my pfSense firewall and all sorts of small VMs. Due to the small amount of memory, they were primarily FreeBSD VMs with services such as Zabbix, Weewx, OpenHAB, UniFi controller, TOR and things like that. All stuff to play around with besides VMware, of course.

“Homelabbing” is where I see people learn and have fun, without breaking too much.

The idea of new homelab

I have always had a way higher power bill than “normal” people. Servers, NAS and home automation gear standing around are not good for your power bill, which is also why my first homelab was made of Mac minis.

So instead of having huge servers in the garage or basement, I have always tried to keep the footprint down. The WAF (Wife Approval Factor) also plays a part here 🙂

I have a wall-mounted 19″ rack with 12U and 600mm depth. It is placed in the garage, where noise is no longer a problem.

I want to run an all-flash VMware vSAN cluster with three nodes. I don’t want just two hosts and a witness appliance, even though that works and is a fully supported design for small or branch offices. I want real scale. Each server should have one cache device and at least one SSD for the capacity tier. I went all-in and decided to go with two SSDs for capacity. All servers have to be connected with 10Gbit SFP+ for vSAN and vMotion.

Conclusion of upcoming homelab

Small footprint, in both power and space
3-node all-flash vSAN cluster
10Gbit SFP+ networking
Rack form factor

The new hardware

I decided to go with Supermicro hardware. It has IPMI, and some of the E300 series is now actually on the VMware compatibility list.

Supermicro kits such as the E300 are a very popular choice in the VMware community. With powerful Xeon-based CPUs and support for up to 128GB of memory, they are perfect for running a killer vSphere/vSAN setup while still keeping cost, noise and the power bill down.

BOM

Here is a list of what the hardware consists of. This gives a hell of a lot of horsepower for a homelab, and plenty of memory and CPU for running nested environments to test our NSX-V to NSX-T migrations etc.

Qty	Item
3	Supermicro E300-8D (4-core Intel D-1518, 2 GHz)
3	Intel 790p 128GB disk for the cache tier
3	Supermicro riser card
3	Supermicro SATA-DOM 32GB
12	Samsung 32GB DDR4 memory modules
3	Intel 600GB SSD for the capacity tier
3	Supermicro rackmounts

Now the hardware is documented. Next will be rack and stack and how the environment will be designed 🙂

vCloud SAML authentication

Posted on February 13, 2020 by Jesper Ramsgaard

vCloud has LDAP, SAML and local users as options for tenant authentication. In this post, we look into SAML integration with AzureAD.

The cool thing about AzureAD is that you will gain the MFA option out of the box, and when tenants want access we can also invite them from their own AzureAD tenant into the resource AzureAD tenant. This gives flexibility and overview of who has access.

ADFS is also an option, but then you need to maintain your own infrastructure with a resource AD/ADFS, and you also need a third-party MFA solution.

Process:

Setup Enterprise app in desired resource AzureAD
Setup claims
Set federation entity id for tenant
Import vCloud federation metadata to AzureAD
Import AzureAD enterprise app federation metadata to vCloud
Setup allowed users/groups in vCloud

AzureAD

Let’s get started with Azure AD configuration. Login to your AzureAD portal https://portal.azure.com. Navigate to “Azure Active Directory” > “Enterprise App” and press “New Application”. Choose “Non-gallery application”. Give it the name “vCloud SAML test” and press “Add”. This will take a couple of minutes.

Navigate back to “Enterprise Apps” > “All applications” and choose your newly created App.

For testing purposes, add/assign a test user to the app. This is under “Users and groups”. This user will be able to log in to the enterprise app with AzureAD.

Now go to “Single Sign-on”. This will ask for the sign-on method, and here we choose “SAML”, which takes us to the SAML setup. The first thing to do is to import the metadata from vCloud.

You will find the metadata by logging in to vCloud and going to the tenant, under “administration” > “federation” tab. Enter the URL of the tenant as the entity ID, apply, and afterwards download the metadata from the link.


In AzureAD, choose “Upload Metadata” and select the file downloaded from vCloud. This tells AzureAD where to redirect to and which requests to accept.

vCloud can validate a couple of user/group attributes (see the VMware documentation), so we will add some claims in Azure AD.

Now we will need to download the AzureAD metadata and import into vCloud. Fetch the data by pressing “Download” to the “Federation Metadata XML”.

Head over to the vCloud tenant federation page again. Paste the content of the downloaded metadata file, check “Use SAML identity” and apply. Now we are almost ready to try it out. But first, head over to the “Users” tab in vCloud: we need to add the users who are allowed vCloud access, together with their roles.

Here we enter the email address and role of the user from Azure AD. When the SAML response returns to vCloud, vCloud can see that the user has been authenticated in Azure AD and is an Org admin.

The next step would be to use groups and roles, so that we can put users into groups in Azure AD and manage access to the tenant that way. But for now, we can head to the tenant URL. We will be redirected to the Azure AD login page; log in and complete MFA, and we are redirected to our vCloud tenant.

And voila, we have logged into our vCloud tenant with Azure AD.

Troubleshooting:

When I first started this project, I was using a GUID as the vCloud entity ID. That meant I could get it to work with ADFS but not with AzureAD. I went all-in on the troubleshooting.

In the end, I intercepted the SAML responses. These are base64-encoded, so they are easy to decode, and afterwards I had the XML that either ADFS or AzureAD sends back. I could then compare them, and I saw some <ds> tags for the certificate that weren’t in the response from AzureAD. Unfortunately, that was a red herring and meant nothing.
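The decoding step is easy to script. A minimal Python sketch, assuming you have copied the `SAMLResponse` form value from the browser (for example from the developer tools) as a string; it base64-decodes it and lists the `<Audience>` values in the SAML 2.0 assertion namespace. `audiences` is a hypothetical helper name.

```python
import base64
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import unquote

SAML_ASSERTION = "urn:oasis:names:tc:SAML:2.0:assertion"

def decode_saml_response(value: str) -> bytes:
    """Base64-decode a SAMLResponse form value (URL-decoding it first, in case it was copied encoded)."""
    return base64.b64decode(unquote(value))

def audiences(value: str) -> List[str]:
    """All <Audience> values in the decoded response."""
    root = ET.fromstring(decode_saml_response(value))
    return [a.text or "" for a in root.iter(f"{{{SAML_ASSERTION}}}Audience")]
```

Usage: `print(audiences(copied_value))`, then compare the result with the entity ID configured on the vCloud tenant.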

By tailing the log from vCloud, tail -f /opt/vmware/vcloud-director/logs/vcloud-container-debug.log, I could get some hints when the SAML auth failed.

org.opensaml.common.SAMLException: Local entity is not the intended audience of the assertion in at least one AudienceRestriction

Doing a bit more googling, I found out that I should be looking at the <Audience> tag in the two SAML responses. And yes, that made some sense.

Azure AD sets the value of this element to the value of Issuer element of the AuthnRequest that initiated the sign-on. To evaluate the Audience value, use the value of the App ID URI that was specified during application registration.
Like the Issuer value, the Audience value must exactly match one of the service principal names that represents the cloud service in Azure AD. However, if the value of the Issuer element is not a URI value, the Audience value in the response is the Issuer value prefixed with spn:.
https://stackoverflow.com/questions/38978298/azuread-jwt-token-audience-claim-prefix-makes-jwt-token-invalid

And that was the problem, spn: prefix when not using a URL as entity id. Changing it to the URL made it work.
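The rule from the quote is easy to sanity-check before configuring a tenant. A minimal Python sketch, approximating “is a URI” as “has a scheme”, which is an assumption and not Azure AD’s exact check. `expected_audience` and `audience_matches` are hypothetical names.

```python
from urllib.parse import urlparse

def expected_audience(entity_id: str) -> str:
    """Audience Azure AD returns for a given SP entity ID (the AuthnRequest Issuer)."""
    # URI issuers are echoed back as-is; anything else gets the "spn:" prefix.
    return entity_id if urlparse(entity_id).scheme else f"spn:{entity_id}"

def audience_matches(entity_id: str) -> bool:
    """vCloud rejects the assertion unless the audience equals its own entity ID."""
    return expected_audience(entity_id) == entity_id
```

With a GUID entity ID, `audience_matches` is False, matching the “Local entity is not the intended audience” error above; with the tenant URL it is True.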

Maybe this is obvious to the rest of the world, but I didn’t know it. I’m glad my troubleshooting skills were sufficient 🙂

APC 7920 PDU Console

Posted on February 11, 2020 by Jesper Ramsgaard

Short post: I needed to reset the password on an APC PDU. The only way to do this was with a console cable, and of course this is a proprietary APC cable that is never within reach when you need it.

So I found the pinout and did a little DIY. The pinout is as in the table:

APC RJ12 pin (wire colour)	DB9 pin
1 (not used)	1
2 (GND, yellow)	5 (GND)
3 (green)	2 (RX)
4 (red)	3 (TX)
5 (GND, white)	5 (GND)
6 (not used)	1

RJ12 plugged into the PDU and connected to jump wires. White and yellow are GND; green goes to DB9 pin 2 (RX) and red to DB9 pin 3 (TX).

DB9 male console cable. White is GND (pin 5), gray is TX (pin 3) and black is RX (pin 2).

Press and hold the reset button on the APC PDU. Wait for the orange LED to blink, then press reset again. Check the console for a login prompt every second; when it responds, you have 30 seconds to log in with apc/apc and reset the password.

Install and use MegaCLI on VMware host

Posted on January 24, 2020 by Jesper Ramsgaard

Over the last decade, I have had the fun of having to manage LSI-based RAID controllers, never on Windows machines, where the GUI-based Storage Manager tools are simple to work with.

Even though I usually find the vib and get it installed, I always struggle to remember how it is installed and what the commands are. This time I will write it down for the future me, or you?

Procedure

Find the MegaCLI vib file and download it…
Copy vib to ESXi host
Install vib
Use MegaCLI for whatever purpose you got

Finding the vib

This is where I struggle the most. LSI was bought by Avago, and soon after, Avago was bought by Broadcom. So the old support links for the downloads return 404, and using Broadcom’s support site requires an education degree that I do not own. This time the link was this, giving you a zip file containing the MegaCLI package for all platforms.

If the link does not work next time, or a newer version is out, I also managed to find it on https://www.broadcom.com/support/download-search. Do a keyword search for MegaCLI, expand “management software and tools” in the results and choose the newest “MegaCLI x.x Px”. For now it’s MegaCLI 5.5 P1, version 8.07.07.

Install MegaCLI

Now that we have the zip, extract it; under the “VmwareMN” folder is the vib that we need.

### SCP it to the host
jr@mbp:~ jr$ scp /Users/jr/Download/8-07-07_MegaCLI/VmwareMN/vmware-esx-MegaCLI-8-07-07.vib root@[ESXHOST]:/tmp/

### SSH to the ESXi host and install. Reboot afterwards
[root@esxhost:~] esxcli software vib install -v /tmp/vmware-esx-MegaCLI-8-07-07.vib

If you are unlucky and get a “Could not find a trusted signer” error when installing the vib, the workaround is to add “--no-sig-check” at the end of the esxcli command, after the file path. Since I downloaded it from Broadcom’s own site, I trust it.

After the host reboot (which is very annoying, but necessary), we can now find the MegaCLI binary under /opt/lsi/MegaCLI/.

Useful MegaCLI commands

### Enclosure information
/opt/lsi/MegaCLI/MegaCli -EncInfo -aALL

### Virtual drive information
/opt/lsi/MegaCLI/MegaCli -LDInfo -Lall -aALL

### Physical drive information
/opt/lsi/MegaCLI/MegaCli -PDList -aALL

### Silence active alarm
/opt/lsi/MegaCLI/MegaCli -AdpSetProp AlarmSilence -aALL

### Disable alarm
/opt/lsi/MegaCLI/MegaCli -AdpSetProp AlarmDsbl -aALL

### Enable alarm
/opt/lsi/MegaCLI/MegaCli -AdpSetProp AlarmEnbl -aALL

### Prepare for removal
/opt/lsi/MegaCLI/MegaCli -PdPrpRmv -PhysDrv[E:S] -aN

### Unconfigured Bad to good
/opt/lsi/MegaCLI/MegaCli -PDMakeGood -PhysDrv[E:S] -aN

I found a guy who did some more advanced MegaCLI scripting. It’s a bit old but still very useful; you can find the site here. I have copy-pasted some of the script, but all credit goes to the guy behind the link.

### List disk status
/opt/lsi/MegaCLI/MegaCli -PDlist -aALL -NoLog | egrep 'Slot|state' | awk '/Slot/{if (x)print x;x="";}{x=(!x)?$0:x" -"$0;}END{print x;}' | sed 's/Firmware state://g'
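If awk one-liners are not your thing, the same grouping can be done in Python. This is a rough equivalent, not a port of the linked script: it assumes the usual “Slot Number:” and “Firmware state:” lines in the -PDList output (save the MegaCli output to a file and read it in), and the function name is my own.

```python
from typing import List


def summarize_pd_list(output: str) -> List[str]:
    """Pair each 'Slot Number' line from MegaCli -PDList with the firmware state that follows it."""
    records = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Slot Number"):
            if current is not None:
                records.append(current)
            current = line
        elif line.startswith("Firmware state") and current is not None:
            # Drop the "Firmware state:" label, like the sed step does
            current += " - " + line.split(":", 1)[1].strip()
    if current is not None:
        records.append(current)
    return records


sample_pd = """Enclosure Device ID: 32
Slot Number: 0
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 1
Firmware state: Unconfigured(bad)
"""
for record in summarize_pd_list(sample_pd):
    print(record)
# Slot Number: 0 - Online, Spun Up
# Slot Number: 1 - Unconfigured(bad)
```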

Conclusion

The CLI is awesome: so many possibilities and so flexible. In my opinion MegaCLI is a bit hard to find, but once you have it installed it’s easy to use. I have tested this on ESXi 6.7 and it works as it should. I hope you can use some of it.

vCloud – Changing SSL certificate

Posted on January 13, 2020 by Jesper Ramsgaard

In this post, I will explain how to install a public certificate into vCloud Director cell(s). This environment has a publicly signed certificate that is up for renewal. A new certificate has been bought and signed and is ready to import.

vCD cells have 2 IP addresses, which allow for 2 different SSL endpoints (http and consoleproxy). Each endpoint requires its own SSL certificate. vCloud Director reads its SSL certificates from a Java keystore. In a multi-cell environment, you need to create 2 certificates for each cell and import them into the vCD Java keystore. But since we are using a wildcard certificate here, the same certificate will be used for both endpoints.

The new certificate has been created with a CSR that was not generated on the vCloud cells, so we need to import both the private and the public key from an export of the certificate, in this case a .PFX.

The certificate is a wildcard. If you are using a UCC/SAN certificate with the exact names, make sure the names in the certificate match the vCloud settings.

I assume you have:

Already working/configured vCloud environment
New public signed certificate exported to a .PFX format (contains both public and private key)

We will

Connect to cell with winscp and transfer the .PFX to /tmp/
Connect to cell with Putty
- Create a new keystore with the new certificate
- Stop vcd service
- Swap old keystore with new
- Start vcd service

Initialize certificate change…

winscp copy the .pfx to the cell tmp directory

The commands for creating the new keystore and importing the certificate are below. Change STOREPASS and KEYPASS to something meaningful for your environment. It is also important to note that the certificate aliases must be “http” and “consoleproxy”, or vCD won’t find the certificates.

A note about the alias: I have seen it be a generated GUID but also just a number. So if your list command shows “1”, you need to rename alias 1 to http or consoleproxy respectively.
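To spot aliases that still need renaming, a small Python sketch can parse the verbose keystore listing. It assumes the `keytool -list -v` output format with “Alias name:” lines; the helper names and the abbreviated sample listing are illustrative.

```python
from typing import List, Set

# vCloud Director looks for exactly these aliases in certificates.ks
REQUIRED_ALIASES = {"http", "consoleproxy"}


def aliases_from_listing(listing: str) -> List[str]:
    """Collect alias names from `keytool -list -v` output."""
    aliases = []
    for line in listing.splitlines():
        if line.strip().lower().startswith("alias name:"):
            aliases.append(line.split(":", 1)[1].strip())
    return aliases


def aliases_to_rename(listing: str) -> List[str]:
    """Aliases in the keystore that vCD will ignore (GUIDs, numbers, ...)."""
    return [a for a in aliases_from_listing(listing) if a not in REQUIRED_ALIASES]


def missing_aliases(listing: str) -> Set[str]:
    """Required aliases that are not in the keystore yet."""
    return REQUIRED_ALIASES - set(aliases_from_listing(listing))


sample_listing = """Keystore type: JCEKS
Your keystore contains 2 entries

Alias name: http
Entry type: PrivateKeyEntry

Alias name: te-d487d1c7-2c76-482a-8e61-69107ee3027f
Entry type: PrivateKeyEntry
"""
print(aliases_to_rename(sample_listing))  # ['te-d487d1c7-2c76-482a-8e61-69107ee3027f']
print(missing_aliases(sample_listing))    # {'consoleproxy'}
```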

### Stop vCloud Director service
service vmware-vcd stop

### Set the passwords as shell variables (no spaces around =)
STOREPASS='<pass>'
KEYPASS='<pass>'

### Add the certificate to a newly created certificates.ks keystore (keytool prompts for the .pfx password).
/opt/vmware/vcloud-director/jre/bin/keytool \
-keystore /tmp/certificates.ks \
-storepass "$STOREPASS" \
-keypass "$KEYPASS" \
-storetype JCEKS \
-importkeystore \
-srckeystore /tmp/wildcard2020.pfx \
-srcstoretype PKCS12

### List certificate aliases (-v prints the "Alias name:" lines)
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keystore /tmp/certificates.ks \
-list -v | grep -i alias

### Rename certificate random alias to http
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keypass "$KEYPASS" \
-changealias \
-alias "te-d487d1c7-2c76-482a-8e61-69107ee3027f" \
-destalias http -keystore /tmp/certificates.ks

### Add the Remote Console Proxy certificate to the same certificates.ks keystore.
/opt/vmware/vcloud-director/jre/bin/keytool \
-keystore /tmp/certificates.ks \
-storepass "$STOREPASS" \
-keypass "$KEYPASS" \
-storetype JCEKS \
-importkeystore \
-srckeystore /tmp/wildcard2020.pfx \
-srcstoretype PKCS12

### List certificate aliases (-v prints the "Alias name:" lines)
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keystore /tmp/certificates.ks -list -v | grep -i alias

### Rename certificate random alias to consoleproxy
/opt/vmware/vcloud-director/jre/bin/keytool \
-storetype JCEKS \
-storepass "$STOREPASS" \
-keypass "$KEYPASS" \
-changealias \
-alias "te-d487d1c7-2c76-482a-8e61-69107ee3027f" \
-destalias consoleproxy \
-keystore /tmp/certificates.ks

### Make a backup of the existing keystore 
cp /opt/vmware/vcloud-director/certificates.ks /opt/vmware/vcloud-director/certificates.ks_old

### Copy the new keystore file to the vCloud Director environment
cp /tmp/certificates.ks /opt/vmware/vcloud-director/certificates.ks

### Set correct permissions to the keystore file
chown vcloud:vcloud /opt/vmware/vcloud-director/certificates.ks
chmod 600 /opt/vmware/vcloud-director/certificates.ks

### Make cells generate proxy console certs based on new keystore.
cd /opt/vmware/vcloud-director/bin
./cell-management-tool certificates -p -k /opt/vmware/vcloud-director/certificates.ks -w "$STOREPASS"

### Start vCloud Director service
service vmware-vcd start

To see whether the cell has booted correctly, you can tail the cell log. It will print “startup completed in x”.

 tail -f /opt/vmware/vcloud-director/logs/cell.log
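If you script the certificate change, you may want to wait for that line instead of watching tail. Below is a minimal Python sketch that polls a log file until a substring appears or a timeout expires; the function and its parameters are my own, and it does not handle log rotation.

```python
import time
from pathlib import Path


def wait_for_log_line(path: str, needle: str, timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Poll `path` until a line containing `needle` appears; return False on timeout."""
    deadline = time.monotonic() + timeout
    offset = 0
    buffer = ""
    log = Path(path)
    while True:
        if log.exists():
            with log.open("r", errors="replace") as fh:
                fh.seek(offset)
                buffer += fh.read()
                offset = fh.tell()
            if needle in buffer:
                return True
            # Keep only the trailing partial line, so a needle split across reads is still found
            buffer = buffer[buffer.rfind("\n") + 1:]
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

On a cell it would be called as wait_for_log_line("/opt/vmware/vcloud-director/logs/cell.log", "startup completed") right after service vmware-vcd start.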

I got more than one cell…

That’s awesome, me too. You can scp the keystore from the cell with the new certificate to the other cells. So let’s get the newly created keystore over to the other cells.

### SSH to next cell and stop the vCloud Director service
service vmware-vcd stop

### Go to keystore path
cd /opt/vmware/vcloud-director/

### Move the existing keystore to a new filename with an .old suffix.
mv certificates.ks certificates.ks.old

### Copy the new keystore into place and set ownership/permissions as on the first cell
scp root@dc1svcdcell01:/opt/vmware/vcloud-director/certificates.ks .
chown vcloud:vcloud certificates.ks
chmod 600 certificates.ks

### Make cells generate proxy console certs based on new keystore (set STOREPASS in this shell first).
./bin/cell-management-tool certificates -p -k /opt/vmware/vcloud-director/certificates.ks -w "$STOREPASS"

### Start vCloud Director service
service vmware-vcd start

I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing.

Ceph MDS stuck in ‘rejoin’

Posted on January 8, 2020 by Jesper Ramsgaard

The CephFS filesystem suddenly dies: what do you do? It relies on the MDS (Metadata Server) to keep the filesystem online. Looking at the Ceph status, it tells us that the MDS cache is oversized and the filesystem is degraded. This is only a health warning, yet the filesystem is unavailable because of it. That’s good in a way, because it means there is nothing wrong with the Ceph cluster itself. Looking further into the problem, the MDS seems to be stuck in a recovery limbo state. Hmm…

MDS services are in “rejoin” limbo state. Never coming back up.

According to the MDS states documentation, the rejoin state indicates that the MDS is trying to load the old cache back in before going into the “up” state. https://docs.ceph.com/docs/master/cephfs/mds-states/

Looking closer at the logs, the process is starting over and over again but never finishes. This is because the monitors kick in and restart the MDS service when it does not report in to the cluster as up in a timely fashion. So it never gets done with what it is doing, and CephFS is still unavailable.

Setting mds_beacon_grace to a wicked number, from the default 15 to 1500, did not help either. I was hoping it would give the MDS time to load the old cache, but I don’t know whether the grace setting is supposed to affect this at all.

From the logs: it is respawning the MDS due to lost contact with the cluster.

Going through an endless number of Ceph threads, I read that others have encountered this exact problem. This thread gave me the info on how to remediate it and get the cluster back online. http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028981.html

Setting the MDS to wipe all client sessions was the first step. I set this through the Ceph GUI, since I find it easier to find than the CLI command.

By default it is “false”; with it set globally to “true”, no client connections are made.

The second step was to delete the MDS “mds*_openfiles.0” objects from the CephFS metadata pool. Looking into the pool, I could see there were many objects referring to open files, but the post said to delete only the .0 objects. This needs to be done for all the MDS services you have running. The “openfiles” objects are open file hints, so it is safe to delete them. Read more on rados commands at https://docs.ceph.com/docs/giant/man/8/rados/

### Delete for all the MDS you have running: mds0, mds1, mds2 etc.
[root@dspp-mon-a-01 cephadm]# rados -p cephfs_metadata rm mds0_openfiles.0
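With several MDS ranks it is easy to miss one. A small Python sketch that picks only the mdsN_openfiles.0 objects out of `rados -p cephfs_metadata ls` output and prints the matching rm commands for review; it does not run anything, and the helper names and sample listing are my own.

```python
import re
from typing import List

# Only the first open-file-hint object per MDS rank, e.g. mds0_openfiles.0
OPENFILES_ZERO = re.compile(r"^mds\d+_openfiles\.0$")


def openfiles_objects_to_remove(rados_ls_output: str) -> List[str]:
    """Pick the mds<N>_openfiles.0 objects out of `rados -p <pool> ls` output."""
    return sorted(
        name.strip()
        for name in rados_ls_output.splitlines()
        if OPENFILES_ZERO.match(name.strip())
    )


def rm_commands(pool: str, objects: List[str]) -> List[str]:
    """Build one `rados rm` command per object (printed for review, not executed)."""
    return [f"rados -p {pool} rm {obj}" for obj in objects]


sample_ls = """mds0_openfiles.0
mds0_openfiles.1
mds1_openfiles.0
mds0_inotable
100.00000000
"""
for cmd in rm_commands("cephfs_metadata", openfiles_objects_to_remove(sample_ls)):
    print(cmd)
```

Note that mds0_openfiles.1 is left alone, matching the advice to delete only the .0 objects.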

After deleting the open file objects, I stopped all MDS services on all nodes. Some of them did not stop, so I killed the process. I probably should have stopped them before deleting the open file objects…

[root@dspp-osd-a-06 cephadm]# systemctl stop ceph-mds.target

After starting the MDS services again, it recovered in a couple of seconds. CephFS is available and “ceph -s” shows a healthy condition. I set wipe_sessions back to false, and CephFS could be mounted again.

What to conclude? There is a fix in 14.2.5, so when it’s ready it is time to update the Ceph cluster. The ticket for it should be this one: https://tracker.ceph.com/issues/41467. These guys were also a big help in resolving the problem: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/AOYWQSONTFROPB4DXVYADWW7V25C3G6Z/.

Disk mapping Windows/VMware

Posted on December 21, 2019 by Jesper Ramsgaard

Since I’m working in the datacenter department of a service provider, automation is a big thing. We already have lots of different automated workflows, everything from reading out power usage for co-location customers to creating a fully functional virtual datacenter with VMware vCloud Director.

The latest idea was to create an automatic disk expansion service. We monitor the customers’ environments with PRTG and call them to help with an expansion when more disk space is needed. But that only happens within business hours, and if our service desk is busy, we don’t always make the expansion in a timely fashion. For an Exchange server this is bad: a full disk means no mail flow.

Our backend developer (a super skilled guy) extended the service agent that we run on all customer servers with a new data collector that looks at free space and disk identifiers. If a disk is running full, it creates a RabbitMQ message that triggers a vRealize Orchestrator workflow, which finds the disk and expands it. The workflow then reports back to his service so that it can expand the disk from within Windows.

Identifying Windows disk from VMware environment

Our google-fu gave the same result over and over again: we should look at the SCSI ID. From within Windows, you can get the LUN ID and which controller the disk is on. That position should then be the same as seen from the VMware side.

While testing on Windows 2016+ this worked OK. BUT we have customers that are still on Windows 2012, and there it didn’t work. *Sigh*. If the VM had multiple controllers, we could not see which UnitId belonged to which ControllerId. So back to the drawing board.

### Find the disk that needs expansion from the VM Id, ControllerId (SCSI bus number) and UnitId (SCSI target id)
$controllerKey = ((Get-VM -Id $vmid | Get-ScsiController).ExtensionData | Where-Object { $_.BusNumber -eq $ControllerId }).Key
$vmDisk = Get-VM -Id $vmid | Get-HardDisk | Where-Object { $_.ExtensionData.ControllerKey -eq $controllerKey -and $_.ExtensionData.UnitNumber -eq $SCSITargetId }

### Afterwards the disk can have the added capacity.
$vmDisk | Set-HardDisk -CapacityGB ($vmDisk.CapacityGB + $ExpandSizeGb) -Confirm:$false

We then kept looking but could not find anything in particular. Thinking about how a physical disk has a serial number, we began to pursue that idea: the VM should see the UUID that VMware is presenting. And yes, this does seem to work on Windows 2008 through Windows 2019.

VMware VM extension data – UUID

With the disk serial number approach, it was also easier to find the disk.

### UUID can be found in the VM extension data.
$vmDisk = (Get-VM -Id $vmid | Get-HardDisk) | Where-Object {$_.ExtensionData.Backing.uuid.Replace("-","") -eq $disksn }
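The comparison itself is simple: strip the dashes from VMware’s backing UUID and compare it with the serial number Windows reports. PowerShell’s -eq is case-insensitive, so the one-liner above does not care about case; the Python sketch below lowercases both sides explicitly. The disk names and UUIDs are made up, and in practice the dictionary would be filled from PowerCLI.

```python
from typing import Dict, Optional


def normalize_uuid(value: str) -> str:
    """Remove dashes and lowercase, so VMware's backing UUID and the Windows serial compare equal."""
    return value.replace("-", "").strip().lower()


def find_disk_by_serial(disk_serial: str, vm_disks: Dict[str, str]) -> Optional[str]:
    """Return the VM disk whose backing UUID matches the serial number Windows reports.

    vm_disks maps a disk name (e.g. "Hard disk 2") to its ExtensionData.Backing.Uuid value.
    """
    wanted = normalize_uuid(disk_serial)
    for name, backing_uuid in vm_disks.items():
        if normalize_uuid(backing_uuid) == wanted:
            return name
    return None


# Made-up values for illustration
vm_disks = {
    "Hard disk 1": "6000C290-0000-0000-0000-000000000000",
    "Hard disk 2": "6000C291-1111-2222-3333-444455556666",
}
print(find_disk_by_serial("6000c291111122223333444455556666", vm_disks))  # Hard disk 2
```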

Conclusion:

I don’t know why other people are not suggesting the disk serial number approach instead of the SCSI ID. My theory is that many look at what data they can get from the vCenter GUI, and there the SCSI ID, based on controller ID and unit ID, is the only thing really available.

But there is a lot of nice data available when using PowerCLI, especially when doing automation.

Cisco ASA cluster – upgrade

Posted on December 20, 2019 by Jesper Ramsgaard

Cisco seems to have a good track record with their products, but I must say that their ASA firewalls have seen a lot of critical bugs in the last couple of years, both in hardware and software…

I was not informed about the latest critical bug, so I didn’t catch it before the customer did. Always nice when a customer calls in because their primary ASA is down. It crashed in a way that meant it did not come up again; it needed a physical reboot.

Before we had the chance to get someone onsite to locate the firewall and reboot it, the secondary also died, and it did not come up again either! The customer needed to get online again, so there was no time to get a console cable and see what the heck was going on. So I told them to hard reboot both firewalls. After booting, the ASAs both became active again and could see each other. Great, customer online. But why, and how?

Conscia’s Cisco support confirmed that the exact issue had been hitting multiple customers. Due to a bug, the firmware hit a memory buffer overflow when receiving a specific udp/500 attack. Great, now we know the problem, and the fix is to upgrade the ASA firmware.

It’s not something I do often, and I always forget to write down the procedure, so here goes.

Upgrade procedure

Have a look at the Cisco ASA upgrade guide to see which version you are on and what you can upgrade to. I was on 9.8.2 and could go up to 9.13.x, so I did. https://www.cisco.com/c/en/us/td/docs/security/asa/upgrade/asa-upgrade/planning.html#ID-2152-0000000a
Download and upload firmware to BOTH members of the cluster
Change the boot image to the newly uploaded image
Update the secondary, make a failover
Update the primary and make a failover
Done

Uploading the images to both nodes with TFTP.

I used the portable version of Tftpd64 by Jounin, simple and works out of the box. Copied the freshly downloaded images to both nodes.

### Primary
DS-ESB-ASA5516x# copy /noconfirm tftp://10.0.2.14/asa9-13-1-lfbff-k8.SPA disk0:/asa9-13-1-lfbff-k8.SPA
DS-ESB-ASA5516x# copy /noconfirm tftp://10.0.2.14/asdm-7131.bin disk0:/asdm-7131.bin

### Secondary
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://10.0.2.14/asa9-13-1$
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://10.0.2.14/asdm-7131$

Change config to the new image

So now we will change over the config so that it will use the new boot images that we have uploaded. First, we remove the existing boot image, and afterwards, we set the new image together with the new ASDM image.

### Show current boot image
DS-ESB-ASA5516x# show running-config boot system
boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Remove existing boot image
DS-ESB-ASA5516x(config)# no boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Add new boot image that you just uploaded
DS-ESB-ASA5516x(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA

### Reload the standby node for the new firmware to take effect
DS-ESB-ASA5516x(config)# failover reload-standby

### Look at the output from show failover, check if the standby is up and verify the firmware version.

Failover and reload the second node

Now the secondary node is booted with the new firmware, so it is time to fail over to it so that we can reload the primary node with the new firmware. When doing the failover you might lose the SSH connection; just connect again. This time you will be connected to the secondary node, which is now the active node. Reload the primary, which is now standby, and wait for it to come up. The console will show that it is sending config to the mate, just like when we did the first reload of the standby (secondary) node.

### Controlled failover to the secondary (standby) node
DS-ESB-ASA5516x# no failover active
### Reload the primary (now standby) node for the firmware to take effect.
DS-ESB-ASA5516x# failover reload-standby

You are done

Now all that is left is to test and, if you want, fail back to the primary, but that’s up to you. I did not lose a single ping during the upgrade process, so the cluster is indeed working as it should. While you are at it, why not also update the AnyConnect client, and remember to clean up the flash so old versions and files won’t fill it up. Enjoy your newly updated cluster.

Prevent Drive Failure at 32,768 Hours

Posted on December 2, 2019 by Jesper Ramsgaard

This is one of the nasty bugs: some SSD models fail after they have been powered on for more than 32,768 hours. Imagine running vSAN with a batch of disks that are affected. They will all fail at the same time, leaving you alone with your backup (hopefully).
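To put the number in perspective: 32,768 hours (2^15) is roughly 3.7 years of 24/7 operation, and disks installed together reach it together. A quick Python calculation; the threshold comes from the advisory, and the helper is plain arithmetic of my own.

```python
FAILURE_THRESHOLD_HOURS = 32768  # power-on hours quoted in the advisory (2**15)


def hours_remaining(power_on_hours: int) -> int:
    """Power-on hours left before a drive reaches the 32,768-hour mark (0 if already past it)."""
    return max(FAILURE_THRESHOLD_HOURS - power_on_hours, 0)


years_to_failure = FAILURE_THRESHOLD_HOURS / 24 / 365.25
print(f"{years_to_failure:.2f} years of continuous operation")  # 3.74 years of continuous operation

# A drive powered on for 30,000 hours has 2,768 hours (about 115 days) left
left = hours_remaining(30000)
print(left, round(left / 24))  # 2768 115
```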

I have seen this once before, where Intel disks were the problem. Unfortunately, the Intel SSDs were metadata disks in a Ceph storage cluster, and since they all failed at the same time, the cluster died!

That was of course because nobody was informed of the bug. When buying hardware from HPE and other enterprise hardware vendors, we get an email letting us know about the problem before it becomes a disaster.

https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00092491en_us

Update procedure – VMware

We had to do a firmware update of the disks, and we are running VMware and vSAN. Luckily, HPE has already released the patch, including guidance for VMware.

https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_6089c15599b647aca0c049ce24#tab2

Download the patch and copy it to /tmp on the ESXi servers.
Unzip it and make the .vmexe file executable: chmod +x CP****.vmexe
Put one of the hosts into maintenance mode.
Run the CP****.vmexe with ./CP****.vmexe. It will list the disks it found, and you tell it the numbers of the disks you want to have their firmware upgraded.
After the upgrade I did a reboot anyway.

Remember that a reboot of vSAN nodes can take a long time, 10-30 minutes. The server console says “vSAN initialising SSD XXX”. Give it time, it will boot.

Fetching firmware version

When you use the HPE custom VMware image, all the HPE tools are on the server, so we can query the hardware etc.

cd /opt/smartstorageadmin/ssacli/bin
Execute ./ssacli ctrl slot=0 pd all show detail

For a more complete command cheat sheet, look at https://wiki.phoenixlzx.com/page/ssacli/ or the official documentation https://support.hpe.com/hpsc/doc/public/display?docId=c03909334

This will give you all the info on the disks behind the controller. The model number is the one you can look up on HPE’s site to see whether a disk is affected.
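With many hosts, comparing model numbers by hand gets tedious. A Python sketch that parses the `ssacli ctrl slot=0 pd all show detail` output, assuming the usual “physicaldrive”, “Model:” and “Firmware Revision:” lines (the sample is abbreviated). The set of affected models is a hypothetical placeholder; fill it in from the HPE advisory.

```python
from typing import Dict, List, Set

# Hypothetical placeholder: copy the affected model numbers from the HPE advisory
AFFECTED_MODELS: Set[str] = {"EXAMPLE0800"}


def parse_ssacli_drives(output: str) -> List[Dict[str, str]]:
    """Turn `ssacli ... pd all show detail` output into one dict per physical drive."""
    drives: List[Dict[str, str]] = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("physicaldrive "):
            current = {"drive": line.split(" ", 1)[1]}
            drives.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if key in ("Model", "Firmware Revision"):
                current[key] = " ".join(value.split())  # collapse padding like "HP      X"
    return drives


def affected_drives(drives: List[Dict[str, str]], affected: Set[str]) -> List[Dict[str, str]]:
    """Drives whose model number (last word of the Model field) is in the affected set."""
    hits = []
    for drive in drives:
        words = drive.get("Model", "").split()
        if words and words[-1] in affected:
            hits.append(drive)
    return hits


sample_ssacli = """
   physicaldrive 1I:1:1
      Status: OK
      Firmware Revision: HPD1
      Model: HP      EXAMPLE0800
   physicaldrive 1I:1:2
      Status: OK
      Firmware Revision: HPD9
      Model: HP      OTHER1600
"""
for drive in affected_drives(parse_ssacli_drives(sample_ssacli), AFFECTED_MODELS):
    print(drive["drive"], drive["Model"], drive["Firmware Revision"])  # 1I:1:1 HP EXAMPLE0800 HPD1
```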

vRO – start a workflow with an AMQP message

Posted on October 21, 2019 by Jesper Ramsgaard

System-to-system interaction can be hard. API integrations are one way of doing it, but we can also use a message bus. Actually, I think a message bus is an awesome way of doing it: it gives a very loose coupling between systems, and we can queue multiple tasks or messages in RabbitMQ until the system is ready to consume them.

This is a guide to using vRO to pick up a RabbitMQ message and start a workflow with the message payload.

Adding RabbitMQ to vRO:

Open the vRO legacy client and log in
Expand Library > AMQP > Configuration
Start the workflow “Add a broker”. This will pair vRO and RabbitMQ.
Follow the wizard. I’m using a virtual host, but the input all depends on your RabbitMQ config.

Running the workflow “Add a broker” to pair vRO and RabbitMQ.

Subscribe vRO to a RabbitMQ queue:

From the same location in vRO, now run the workflow “Subscribe to queues”. This will make vRO aware of the queue and able to use it as a trigger.

Choose the broker we added in the section before.

Add the name of the queue from RabbitMQ; it must be identical to the queue name in RabbitMQ.

Now vRO is monitoring the queue, we are ready to proceed.

Creating a policy:

Now we need to create a policy that ties the event of a message from RabbitMQ to starting a workflow.

Navigate to the policy pane in vRO and create a new policy.

Inside the new policy, add a new policy element.

From the popup you can now filter for the queue that you subscribed to earlier.

Right-click the queue name and choose “Add trigger event”.

With the “onMessage” event selected, we can choose the workflow that the policy will start when a message arrives.

You will need to search again, this time for the workflow you want to start when there is a message in the queue.

Now we need to map a variable to the payload of the received message. The workflow you have mapped to needs an input variable declared as a string.

The input variable from your workflow will show, and by double-clicking the name of the variable you can bind “event.key.body” to it. This makes the message payload a variable in your workflow.

Wrap-up:

That was a lot of screen dumps, I hope it still makes sense.

Now you should set the policy to start when vRO starts, and for now also start the policy so you can see that it works. The case shown in the screen dumps has JSON as the message payload; it is passed to the messageBody variable, and inside the script the workflow extracts the values it needs to run.

Example of how to parse the JSON into a JavaScript object and afterwards pass the needed parameters on to new variables. In this case they are parameters for a PowerShell script.
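The script from that screen dump is not reproduced here. In vRO the scriptable task is JavaScript, where JSON.parse(messageBody) does the parsing; the same logic is sketched in Python for illustration. The payload keys (server, driveLetter, sizeGb) and the PowerShell parameter names are hypothetical.

```python
import json

REQUIRED_FIELDS = ("server", "driveLetter", "sizeGb")  # hypothetical payload keys


def powershell_params_from_message(message_body: str) -> dict:
    """Parse the JSON message payload and map it to (hypothetical) PowerShell parameter names."""
    payload = json.loads(message_body)
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError("message is missing fields: " + ", ".join(missing))
    return {
        "Server": str(payload["server"]),
        "DriveLetter": str(payload["driveLetter"]),
        "SizeGb": int(payload["sizeGb"]),
    }


def to_argument_string(params: dict) -> str:
    """Render the parameters as '-Name value' pairs (values with spaces would need quoting)."""
    return " ".join(f"-{name} {value}" for name, value in params.items())


message_body = '{"server": "srv01", "driveLetter": "D", "sizeGb": 20}'
params = powershell_params_from_message(message_body)
print(to_argument_string(params))  # -Server srv01 -DriveLetter D -SizeGb 20
```

Validating the required keys up front means a malformed message fails loudly in the workflow instead of starting a half-configured PowerShell run.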

Killing a virtual machine the esxcli way

Posted on October 20, 2019 by Jesper Ramsgaard

Rarely, I run into a ghost VM: I can’t do anything with it from vCenter or the local UI of the ESXi host. It looks like it’s powered off, but in fact it’s still running in a sort of ghost state. You can vMotion all VMs off and reboot the host that the ghost VM is on, but sometimes it’s a standalone host, and then killing the VM’s world with esxcli is easier.

1. Connect to the ESXi host with SSH
2. Get a list of all running VM worlds

esxcli vm process list

3. Identify the VM in the output and take note of its World ID. We will kill the world from there. Start with type soft and escalate if it doesn’t work.

esxcli vm process kill --type=[soft/hard/force] --world-id=ID

The VM should now be killed and its VMX files unlocked, so you can manage the VM with the GUI tools again. If it didn’t work, you are left with the option of rebooting the host running the ghost VM.
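On a host with many VMs, picking the right World ID out of the list is the error-prone part. A Python sketch that parses `esxcli vm process list` output, assuming the usual layout of an unindented display name followed by indented “World ID:” and other lines, and builds the kill commands in the soft, hard, force escalation order. The helpers are my own and only build command strings; the sample output is abbreviated and made up.

```python
from typing import Dict, List


def world_ids_by_name(process_list_output: str) -> Dict[str, int]:
    """Map each VM display name in `esxcli vm process list` output to its World ID."""
    result: Dict[str, int] = {}
    name = None
    for raw in process_list_output.splitlines():
        if not raw.strip():
            continue
        if not raw.startswith((" ", "\t")):
            name = raw.strip()  # unindented line: the VM's display name
        elif name is not None and raw.strip().startswith("World ID:"):
            result[name] = int(raw.split(":", 1)[1])
    return result


def kill_commands(world_id: int) -> List[str]:
    """esxcli kill commands in escalation order: try soft first, force last."""
    return [
        f"esxcli vm process kill --type={kind} --world-id={world_id}"
        for kind in ("soft", "hard", "force")
    ]


sample_list = """ghostvm
   World ID: 35942
   Process ID: 0
   VMX Cartel ID: 35941
   Display Name: ghostvm
   Config File: /vmfs/volumes/datastore1/ghostvm/ghostvm.vmx

othervm
   World ID: 36001
   Process ID: 0
   VMX Cartel ID: 36000
   Display Name: othervm
   Config File: /vmfs/volumes/datastore1/othervm/othervm.vmx
"""
ids = world_ids_by_name(sample_list)
print(kill_commands(ids["ghostvm"])[0])  # esxcli vm process kill --type=soft --world-id=35942
```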

Clear Windows print queue

Posted on October 13, 2019 by Jesper Ramsgaard

Even as an infrastructure guy in a datacenter you now and then have to deal with printers, the small beasts with their own life and horrible drivers! I always forget the path where the spooler puts its temp files, so I am putting it here for future me to find the next time I have to clear spooler files and restart the print services.

The GUI way is slow, and with the CMD version you can always work out what should be done in the GUI.

Open an elevated command prompt.
Type net stop spooler, then press “Enter”.
Execute del %systemroot%\System32\spool\printers\* /Q
Type net start spooler, then press “Enter”.
The print queue on your Windows should now be cleared.

ESXCLI host upgrade procedure

Posted on August 27, 2019 by Jesper Ramsgaard

Most of the time you would want to use VMware Update Manager for upgrades. It’s part of vCenter and a necessary tool for maintaining your environment. But for smaller deployments with standalone hosts and no vCenter, the following methods are handy and save time compared to upgrading via IPMI with an ISO.

Online mode:

This method gets the update online, with no need to download ISOs, offline bundles, etc. It will work for most upgrade use cases.

1: Connect to your ESXi host via the host client and enable SSH. Afterwards, SSH to the ESXi host and enable the ESXi firewall rule that allows the host to access the internet.

esxcli network firewall ruleset set -e true -r httpClient

2: The command below lists the available ESXi image profiles in VMware’s online depot. We filter for only those relevant to our case, an upgrade to ESXi 6.7.

esxcli software sources profile list -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml | grep -i ESXi-6.7
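Picking the newest profile from that list by eye is easy to get wrong. A Python sketch that selects the newest dated “-standard” profile for a given version, assuming profile names of the form ESXi-6.7.0-YYYYMMDDNNN[s]-standard; the regex and the abbreviated sample list are my own.

```python
import re
from typing import Optional

# e.g. ESXi-6.7.0-20190402001-standard or ESXi-6.7.0-20190401001s-standard
PROFILE_RE = re.compile(r"^ESXi-(\d+\.\d+)\.\d+-(\d{11})(s?)-standard$")


def latest_standard_profile(profile_list_output: str, version: str = "6.7") -> Optional[str]:
    """Return the newest dated '-standard' profile name for `version`, or None."""
    candidates = []
    for line in profile_list_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        match = PROFILE_RE.match(fields[0])
        if match and match.group(1) == version:
            # Newest date stamp wins; on a tie prefer the full (non-"s") patch profile
            key = (match.group(2), match.group(3) == "")
            candidates.append((key, fields[0]))
    return max(candidates)[1] if candidates else None


sample_profiles = """Name                              Vendor        Acceptance Level
--------------------------------  ------------  ----------------
ESXi-6.7.0-20190401001s-standard  VMware, Inc.  PartnerSupported
ESXi-6.7.0-20190402001-standard   VMware, Inc.  PartnerSupported
ESXi-6.7.0-20190402001-no-tools   VMware, Inc.  PartnerSupported
ESXi-6.7.0-8169922-standard       VMware, Inc.  PartnerSupported
ESXi-6.5.0-20190702001-standard   VMware, Inc.  PartnerSupported
"""
print(latest_standard_profile(sample_profiles))  # ESXi-6.7.0-20190402001-standard
```

Build-number profiles such as ESXi-6.7.0-8169922-standard are deliberately skipped, since their names do not sort against the date-stamped ones.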

3. Choose the desired profile and use the following command to upgrade to that ESXi version. Before upgrading, it’s a good idea to enter maintenance mode.

esxcli system maintenanceMode set --enable true

esxcli software profile update -p ESXi-6.7.0-20190402001-standard -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml

4. After it’s done, restart the host; after the reboot you will be running the new ESXi version.

Custom, with Offline bundle:

This method is for when you want to install a custom update, or when your hosts don’t have access to the internet.

1: Download the offline bundle from the VMware website. In this upgrade I will use an HPE custom version, but a generic version will also work.

2: After downloading the “VMware-ESXi-6.7.0-8169922-depot.zip” file, upload it to a datastore that is visible to your ESXi host. A local datastore is best if the host has one; if not, a shared datastore works too.

3: Find the profile name in the offline depot bundle:

 esxcli software sources profile list -d /vmfs/volumes/prd.r60lun01/ISO/VMware-ESXi-6.7.0-Up
019-depot.zip

Put your host into maintenance mode and enable SSH if you haven’t done so yet.

4: Execute this command to upgrade your ESXi 6.x to 6.7:

esxcli software profile update -p ESXi-6.7.0-13006603-standard -d /vmfs/volumes/your_datastore/VMware-ESXi-6.7.0-13006603-depot.zip

esxcli software profile update -p HPE-ESXi-6.7.0-Update2-Gen9plus-670.U2.10.4.1.8 -d /vmfs/volumes/prd.r60lun01/ISO/VMware-ESXi-6.7.0-Update2-13006603-HPE-Gen9plus-670.U2.10.4.1.8-Apr2019-depot.zip

The command should report that the update completed successfully; once you have checked that it did, reboot your host.

Troubleshooting

I have run into these errors:

“Failed updating the bootloader: Execution of command /usr/lib/vmware/bootloader-installer/install-bootloader failed: non-zero code returned…. return code: 1”
An upgrade error due to “insufficient space”.

This problem occurs because the swap is placed on the ESXi installation device, which is not a good thing. So let’s change it.

Go to the ESXi host UI at https://IP/ui, log in and proceed to the following:

Manage > System > Swap > Edit Settings

Choose a datastore from the dropdown and apply. The swap space is now freed from the ESXi install device, so you can try the upgrade again.

Conclusion:

After the upgrade, it’s a good idea to disable the ESXi firewall rule for outbound HTTP access again, and to stop and disable SSH, but that’s optional 🙂

esxcli network firewall ruleset set -e false -r httpClient

Now you should have an upgraded host.

NSX 6.3.6 to 6.4.5 – Controller problem encountered

Posted on June 25, 2019 by Jesper Ramsgaard

NSX upgrades can be delicate, even when everything is in its finest shape.

After successfully upgrading the NSX Managers, we proceeded with upgrading the NSX Controllers. We did a pre-check with the command “show control-cluster status”, and it looked fine. The upgrade to 6.4.6 went well, and we could vMotion VMs around after the controller had booted. But the post-checks were not OK: “show control-cluster status” did not return as expected, and we were not confident enough to proceed with the host upgrades.

After some troubleshooting, we found that the /var/log partition on 2 of the 3 controllers was full. Without any other evidence, we concluded that this was the problem. After some google-fu, we didn’t find any KB or blog on how to purge the logs.

But we found out that we could get into an engineering mode that gives us shell access. Long story short, we did the following:

1. Followed https://kb.vmware.com/s/article/2149630 to gain shell access on the NSX Manager.
1.1 The password is IAmOnThePhoneWithTechSupport
2. Extracted the root passwords for the controllers with /home/secureall/secureall/sem/WEB-INF/classes/GetNvpApiPassword.sh controller-nn
3. Logged into each controller and issued debug os-shell to gain a root shell.
4. Deleted /var/log/syslog.1 on each node.
5. Did a rolling restart of the controllers. After booting, they all joined the cluster.
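For steps 3 and 4, the two helpers below sketch what you can run once you have the root shell. They are hypothetical and are not NSX CLI commands. They assume the controller's shell provides the usual df, find, du, sort and awk tools. The first helper shows how full the log filesystem is and which files are largest. The second deletes one rotated file such as syslog.1 and reports roughly how much space came back. It only accepts rotated files. A file the syslog daemon still holds open, like the live syslog, does not free its space until the daemon reopens it.

```shell
#!/bin/sh
# Hypothetical helpers for a root shell on a controller (not NSX CLI commands).

# Show how full the filesystem holding a log directory is, then its ten
# largest files with sizes in KiB. Defaults to /var/log.
log_usage_report() {
    dir="${1:-/var/log}"
    # Capacity column (5th field) of POSIX df output for that filesystem
    df -P "$dir" | awk 'NR == 2 { print "filesystem usage: " $5 }'
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n 10
}

# Delete one rotated log file (name ending in .N, .NN, .N.gz or .NN.gz) and
# print roughly how much space that freed on its filesystem.
free_rotated_log() {
    file="$1"
    case "$file" in
        *.[0-9]|*.[0-9][0-9]|*.[0-9].gz|*.[0-9][0-9].gz) ;;  # e.g. syslog.1
        *) echo "refusing: $file does not look like a rotated log" >&2; return 1 ;;
    esac
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    dir=$(dirname "$file")
    # Available KiB (4th field of POSIX df output) before and after deleting
    before=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    rm -f -- "$file"
    after=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "freed approx $((after - before)) KiB in $dir"
}
```

On a controller that would be log_usage_report followed by free_rotated_log /var/log/syslog.1.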

After this, the status was as we wanted. In the meantime we had created a case with VMware support, and the support engineer joined us on a remote session. We told him what we had done, and together we verified that the controllers were healthy, which they were.

Next step: the VIB upgrade on the hosts.

Good commands to know:

show process monitor

show controller list all

show control-cluster status

Edit: This VMware KB article describes the exact problem we encountered. We had also contacted VMware Support, but we solved the problem before they were able to assist us. 🙂
https://kb.vmware.com/s/article/59509

Process of getting the root password for controllers.

Tomcat9 and java12 on FreeBSD

Posted on May 28, 2019 by Jesper Ramsgaard

Quick post: I had to make Tomcat9 and Java12 work together. The procedure is as follows:

1. pkg install tomcat9 (it will also install java8)
2. pkg install openjdk12

Now edit /etc/rc.conf to start Tomcat on boot and to set Tomcat's Java home to the OpenJDK 12 install:

tomcat9_enable="YES"
tomcat9_java_home="/usr/local/openjdk12"
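Before you start the service, you can check that the configured Java home actually contains a java binary. The sketch below assumes the value sits on a single tomcat9_java_home= line. check_java_home is a hypothetical helper name. On FreeBSD, sysrc -n tomcat9_java_home also prints the configured value.

```shell
#!/bin/sh
# check_java_home is a hypothetical helper: it reads tomcat9_java_home from
# an rc.conf-style file (default /etc/rc.conf) and checks for bin/java there.
check_java_home() {
    conf="${1:-/etc/rc.conf}"
    # Value of the last tomcat9_java_home= line, optional double quotes stripped
    home=$(sed -n 's/^tomcat9_java_home="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$conf" | tail -n 1)
    if [ -z "$home" ]; then
        echo "tomcat9_java_home is not set in $conf" >&2
        return 1
    fi
    if [ -x "$home/bin/java" ]; then
        echo "ok: $home/bin/java"
    else
        echo "no executable java under $home" >&2
        return 1
    fi
}
```

With the settings above and openjdk12 installed, check_java_home /etc/rc.conf should report /usr/local/openjdk12/bin/java.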

And now the part that took me a long time to figure out. Java 12 no longer has the endorsed-standards mechanism (java.endorsed.dirs), which Tomcat still uses in its startup parameters. If you remove that line from the init script, Tomcat starts up. The line is: -Djava.endorsed.dirs='/usr/local/apache-tomcat-8.0/endorsed'

command="/usr/local/bin/jsvc"
command_args="-java-home '${_tomcat_java_home}' \
        -server \
        -user ${_tomcat_catalina_user} \
        -umask ${_tomcat_umask} \
        -pidfile '${pidfile}' \
        -wait ${_tomcat_wait} \
        -outfile '${_tomcat_stdout}' \
        -errfile '${_tomcat_stderr}' \
        -classpath '${_tomcat_catalina_home}/bin/bootstrap.jar:/usr/local/share/java/classes/commons-daemon.jar:${_tomcat_catalina_home}/bin/tomcat-juli.jar${_tomcat_classpath}' \
        -Djava.util.logging.manager=${_tomcat_logging_manager} \
        -Djava.util.logging.config.file='${_tomcat_logging_config}' \
        ${_tomcat_java_opts} \
        -Djava.endorsed.dirs='/usr/local/apache-tomcat-8.0/endorsed' \<<<<<<<<<<<< Remove this line!!!
        -Djava.endorsed.dirs='${_tomcat_catalina_home}/endorsed' \
        -Dcatalina.home='${_tomcat_catalina_home}' \
        -Dcatalina.base='${_tomcat_catalina_base}' \
        -Djava.io.tmpdir='${_tomcat_catalina_tmpdir}' \
        org.apache.catalina.startup.Bootstrap \
        ${_tomcat_pipe_cmd}"

run_rc_command "$1"
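You can also script the edit with sed. The sketch below assumes the port installs the init script as /usr/local/etc/rc.d/tomcat9, so verify the path on your system. remove_endorsed_line is a hypothetical helper name. It deletes only the line that points at the old apache-tomcat-8.0 directory, leaves the ${_tomcat_catalina_home}/endorsed line in place, and keeps a .bak copy of the original. The -i.bak form works with both BSD and GNU sed.

```shell
#!/bin/sh
# remove_endorsed_line is a hypothetical helper. The default script path is
# an assumption; pass the real path of the tomcat9 rc script as $1.
remove_endorsed_line() {
    script="${1:-/usr/local/etc/rc.d/tomcat9}"
    [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
    # Delete only the hard-coded Tomcat 8.0 endorsed-dirs line; keep a backup
    sed -i.bak '\#apache-tomcat-8\.0/endorsed#d' "$script"
}
```

Run it as root, then start the service with service tomcat9 start.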

After this, you are able to start tomcat9 with Java 12 🙂