vCloud – Changing SSL certificate

In this post, I will explain how to install a public certificate into vCloud Director cell(s). This exact environment has a public signed cert that is up for renewal. A new certificate has been bought and signed and is ready to import.

vcd cells have 2 IP addresses that allow support for 2 different SSL endpoints (http and consoleproxy). Each endpoint requires its own SSL certificate. vCloud Director uses a java keystore to read its SSL certificates from.  In a multi-cell environment, you need to create 2 certificates for each cell and import the certificates into the vcd java keystore. But since we hare here using a wildcard certificate the same certificate will be used to but endpoints.

The new certificate have been created with a CSR that was not generated from the vCloud cells, so we need to import both private and public key from an export of the certificate. In this case it’s a .PFX.

Certificate is a wildcard. If you are using a UCC SAN certificate with the exact names then be sure that the names in certificate are matching accordingly to vCloud settings.

I assume you got

  • Already working/configured vCloud environment
  • New public signed certificate exported to a .PFX format (contains both public and private key)

We will

  • Connect to cell with winscp and transfer the .PFX to /tmp/
  • Connect to cell with Putty
    • Create a new keystore with the new certificate
    • Stop vcd service
    • Swap old keystore with new
    • Start vcd service

Initialize certificate change…

winscp copy the .pfx to the cell tmp directory

The commands for creating the new keystore and importing the cert is below. Change the STOREPASS and KEYPASS to something meaningful for your environment. It is also important to notice that the alias of each certificate must be “http” and “consoleproxy”. Else vcd won’t find the certs.

A note about the alias, I have seen it generate GUID but also just numbers. So if your list command is showing “1” then you need to change alias 1 to respectively http or consoleproxy.

To see if the cell have booted correctly you can tail the cell log. It will give you a “startup completed in x”.

I got more than one cell…

That’s awesome – me too. You can scp the certificate from the cell with the new cert to the other cells. So let’s get that newly created keystore over the other cells.

I hope you enjoyed reading this post. Feel free to share this on social media if it is worth sharing.

Ceph MDS stuck in ‘rejoin’

CephFS filesystem suddenly dies, what do you do? Well, It’s relaying on the MDS(MetaDataService) to keep an online filesystem. When looking at the Ceph status it gives us that the MDS cache is oversized and files system is degraded. This is only health warning, but the filesystem is not available due to it, that’s good in a way because then there is nothing wrong with the Ceph cluster itself. Looking more into the problem, MDS seems to be in a recovery limbo state. hmm…

MDS services are in “rejoin” limbo state. Never coming back up.

Looking at the MDS status documentation the rejoin state indicated that it’s trying to load the old cache back in before going into “up” state.

Closer looking at the logs the process is stating over and over again, but never finishes. This is due to a mechanism from the monitors kicking in and restarting the MDS service when not responding in to cluster with up in a timely fashion. So it never gets done with what’s its doing. Hmm, and CephFS is still unavailable.

Trying to set mds_beacon_grace to a wicked number did also not help, don’t know if the grace should be doing anything. But going from the default 15 to 1500 did not help. I was hoping to give the MDS time to load in the old cache.

From the logs its respawning the MDS due to lost contact to the cluster.

Going through an endless number of Ceph thread I were reading that others have encountered this exact problem. This thread gave me the info on how to remediate and get back online with the cluster.

Setting the MDS to wipe all client session where the first one. I set this though the Ceph GUI since I find it easier to find compared to the CLI command.

Default its “false”, but setting it to global “true” no client connections are made.

The second one was to delete the MDS “mds*_openfiles.0” from the CephFS metadata pool. Looking into the pool I could the there where many objects referring to open files. But the post stated only to delete the .0 objects. Need to be done for all the MDS services that you have running. The “openfiles” objects are open file hints. It’s safe to delete them. Read more on rados commands on

### Delete for all the MDS you have running. mds1,mds2, mds3 etc.
[root@dspp-mon-a-01 cephadm]# rados -p cephfs_metadata rm mds0_openfiles.0

After deleting the open file objects I stopped all MDS services on all nodes. Some of them did not stop, so I killed the process. Probably should have stopped them first before deleting the open file objects…..

[root@dspp-osd-a-06 cephadm]# systemctl stop

After starting up the MDS services again it recovered in a couple of seconds. CephFS is available and “ceph -s” showing healthy condition. Set the wipe_sessions back to false and now CephFS could be mounted again.

What to conclude? there is a fix in 14.2.5. So when it’s ready its is time to update the Ceph cluster. The ticket for it should be this one. These guys also did a good help to resolve the problem

Disk mapping Windows/VMware

Since I’m working in a datacenter department at a service provider automation is a big thing. We have lots of different automatic workflows already. Everything from reading out power usage for co-location customers to creating a fully functional virtual datacenter with VMware vCloud Director.

The latest idea was to create an automatic disk expansion service. We monitor the customer’s environments with PRTG and call to help them with an expansion when more disk space is needed. But that’s only within business hours and of our service desk are busy we don’t always make the expansion in a timely fashion. For an exchange server, this is bad, full-disk means no mail flow.

Our backend developer(super skilled guy) extended the service agent that we run on all customer servers, with a new data collector that looks for free space and disk-identifiers. If a disk is running full he will create a RabbitMQ ticket that will trigger a vRealize Orchestrator workflow that finds the disk and expands it. Then reports back to his services so that his service can expand the disk from within Windows.

Identifying Windows disk from VMware environment

Our google foo was giving the same result over and over again, we should look at the SCSI ID. From within Windows, you can get the LUN ID and what controller its located. That position should then be the same as seen from VMware side.

While testing it on Windows 2016+ this worked ok. BUT we have customers that are still on Windows 2012, and here it didn’t work. *Sigh*. If the VM where having multiple controllers then we could not see what UnitId were to attach to the corresponding Controller Id. So back to the drawing board.

### From VMs Id and ControllerId and UnitId the disk that needs expansion is found. 
#$vmDisk = (Get-VM -Id $vmid | Get-HardDisk) | where { $_.ExtensionData.ControllerKey -eq ((Get-VM -id $vmid | Get-ScsiController ).ExtensionData | where { $_.BusNumber -eq $ControllerId }).Key } | where { $_.ExtensionData.UnitNumber -eq $SCSITargetId }

### Afterwards the disk can have the added capacity.
$vmDisk | Set-HardDisk -CapacityGB ($vmDisk.CapacityGB + $ExpandSizeGb) -Confirm:$false

We then kept looking but could not find anything in particular. Thinking about a physical disk having a serial number we began to pursue that idea, the VM should see the UUID that VMware where presenting. And yes, this sure seems to be working a Windows 2008 through Windows 2019.

VMware VM extension data – UUID

With the disk serial number approach, it was also easier to find the disk.

### UUID can be found in the VM extension data.
$vmDisk = (Get-VM -Id $vmid | Get-HardDisk) | Where-Object {$_.ExtensionData.Backing.uuid.Replace("-","") -eq $disksn } 


Don’t know why other people are not suggesting the disk serial number approach instead of the SCSI ID. But my theory is that many looks at what data they can get from the vCenter GUI. And here the SCSI ID based on controller id and unit id is the only thing really available.

But there is a lot of nice data when using PowerCLI to look at the data. Especially when doing automation.

Cisco ASA cluster – upgrade

Cisco seems to have a good track record of there products, but I must say that there ASA firewalls have seen a lot of critical bugs in the last couple of years. Both in hardware and software…

The last critical bug I was not informed about, so didn’t catch it before the customer did. Always nice when a customer calls in with the problem of there primary ASA being down. It crashed in a way that meant that it did not come up again. It needed a physical reboot.

Before having the chance to have someone onsite locate the firewall and reboot it that secondary also died. And did not come up again! Customer needs to get online again, so there was no time to get a console cable and see what the heck was going on. So I told them to do a hard reboot on both firewalls. After the ASA booted they both became active again and could see each other. Great, customer online. But why and how.

Contact with Conscia Cisco support could confirm that the exact issue has been hitting multiple customers. Due to a bug, the firmware did a memory buffer overflow when being hit by a specific udp/500 attack. Great, now we know the problem and the fix is to upgrade ASA firmware.

It’s not something I do often, and I always forget to write down to procedure, so here goes.

Upgrade procedure

  1. Have a look at the cisco ASA upgrade guide, to see what version you and on and what is supported to go up to. I were on 9.8.2 and could go up to 9.13.x. So I did.
  2. Download and upload firmware to BOTH members of the cluster
  3. Change the boot image to the newly uploaded image
  4. Update the secondary, make a failover
  5. Update the primary and make a failover
  6. Done

Uploading the images to both nodes with TFTP.

I used the portable version of Tftpd64 by Jounin, simple and works out of the box. Copied the freshly downloaded images to both nodes.

### Primary
DS-ESB-ASA5516x# copy /noconfirm tftp:// disk0:/asa9-13-1-lfbff-k8.SPA
DS-ESB-ASA5516x# copy /noconfirm tftp:// disk0:/asdm-7131.bin

### Secondary
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://$
DS-ESB-ASA5516x# failover exec mate copy /noconfirm tftp://$

Change config to the new image

So now we will change over the config so that it will use the new boot images that we have uploaded. First, we remove the existing boot image, and afterwards, we set the new image together with the new ASDM image.

### Show current boot image
DS-ESB-ASA5516x# show running-config boot system
boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Remove existing boot image
DS-ESB-ASA5516x(config)# no boot system disk0:/gf/asa982-20-lfbff-k8.SPA

### Add new boot image that you just uploaded
DS-ESB-ASA5516x(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA

### Reload the standby node for the new firmware to take effect
DS-ESB-ASA5516x(config)# failover reload-standby

### Look at the output from show failover, check if the standby is up and verify the firmware version.

Failover and reload the second node

So now the secondary node is booted with the new firmware, time to failover to it so we can reload and have the new firmware running on the primary node. When doing the failover you might lose the SSH connection, just connect again. This time you will be connected to the second node, that is not the active node. Reload the primary, that is now standby and wait for it up come up. It will show in the console that its sending config to mate. Just like when we did it with the first reload of the standby, secondary node.

### Controlled failover to secoundary, standby node
DS-ESB-ASA5516x# no failover active
### reload the primary, standby node for firmware to take effect.
DS-ESB-ASA5516x# failover reload-standby

You are done

Now it’s only to test, and if you want, failback to the primary. But that up to you. I did not lose one ping through the upgrade process. So that cluster is indeed working as it should. While you are at it then why not also update the AnyConnect client and remember to clean up the flash so the old versions and file won’t fill it up. Enjoy your newly updated cluster.

Prevent Drive Failure at 32,768 Hours

This is one of the nasty bugs. Some SSD models will fail after they have been powered on for more than 32768 hours. Imagine running vSAN and you bought x amount of disks that where affected. They will all fail at the same time, so you are left alone with your backup(hopefully).

I seen this one time before, where Intel disks where the problems. Unfortunately the Intel SSDs where metadata disks in a Ceph storage cluster, and since they all failed at the same time, the cluster died!

This is of cause due to that nobody where informed of the bug. When buying hardware from HPE and other enterprise hardware vendors we cat a mail letting us know of the problem before it becomes a disaster.

Update procedure – VMware

We had to do a firmware update of the disks, we are running VMware and vSAN. And gladly HPE have allready released the patch. Also with guidance for VMware.

  1. download the patch, copy it to /tmp of the ESXi servers.
  2. Unzip and make the .vmexe file executable chmod +x CP****.vmexe
  3. Put one of the hosts into maintenance mode.
  4. Run the CP***.vmexe – ./CP****.vmexe. It will lists the disks that it found and you tell it the disk numbers for those you want to have firmware upgraded.
  5. After upgrade I did a reboot anyway.

Remember that reboot of vSAN nodes can take a long time, 10-30 min. On the console of the server it says: “vSAN initialising SSD XXX” Give it time, it will boot.

Fetching firmware version

When you use the HPE custom VMware image then we have all the HPE tools on the server, so that we can query hardware etc.

  1. cd /opt/smartstorageadmin/ssacli/bin
  2. Execute ./ssacli ctrl slot=0 pd all show detail

For more a command cheat sheet you could look at or the official documentation

This will give you all info on the disks behind the controller. The model number is the one that you can look up on HPEs site to see if its affected.

vRO – start a workflow with AQMP message

System to system interaction can be hard. API integrations are a way of doing it, but we can also use a message bus. Actually I think that using a message bus is a very awesome way of doing it, it’s a very loose couple between systems and we can queue multiple things and have the task or messages in RabbitMQ until system is ready to consume the messages.

This is a guide of using vRO to pick up the RabbitMQ message and start a workflow with the payload of the message.

Adding RabbitMQ to vRO:

  • Open vRO legacy client, login
  • Expand Library > AMQP > Configuration
  • Start the workflow Add a broker. This will pair vRO and RabbitMQ.
  • Follow the wizard. I’m using a virtual host, but input all depends on your RabbitMQ config.
Running the workflow “Add a broker” to pair vRO and RabbitMQ.

Subscribe vRO to a RabbitMQ queue:

From the same position in vRO, now run the workflow “Subscribe to queues“. This will make vRO aware of the queue and be able to use it as a trigger.

Subscribing to a RabbitMQ queue.
Give the queue subscription a name.
Choose the broker we added in the section before.
Add the name of the queue from RabbitMQ, must be identical from what the queue name is in RabbitMQ.

Now vRO is monitoring the queue, we are ready to proceed.

Creating a policy:

Now we need to create a policy that can tie the event of a message from RabbitMQ into a starting a workflow.

Navigate to the policy fane in vRO and create a new policy.
Inside the new policy, add a new policy element.
Choose the AMQP:Subscription.
From the popup you can now filter for the queue that you earlier made a subscription on.
Right-click the queue name and chose Add trigger event
Choose onMessage
Holding focus at the “onMessage” event we can choose a workflow that the policy will start on message.
You will need to search, this time its for the workflow that you want to start when there is a message in the queue.
Now we will need to map a variable to the payload of the message that we have received. You need to have an in variable declared as a string in the workflow that you have mapped to.
The in variable from your workflow will show and by double-clicking on the name of the variable you are able to choose the “event.key.body” to it. This will make the payload of the message become a variable for your workflow.


That was a lot of screen dumps, I hope it still makes sense.

Now you should make the policy start when vRO starts and for now also start the policy so that you can see it works. The case that is shown in the screen dumps will have a JSON as the message payload, it then sends it to the messageBody variable and then inside the script it will extract the values that it needs in for the workflow to run.

Example of how to parse the JSON to a JavaScript object and afterwards parse the needed parameters over to a new variable. In this case its parameters for a PowerShell script.

Killing a virtual machine the esxcli way

Rarely I run into a ghost VM. I can’t do anything with the VM from vCenter or local UI for the ESXi host. It looks like it’s powered off but in fact, it’s still running in a sort of ghost state. You can vMotion all VMs of and reboot that hosts that the ghost VM is on. Sometimes its standalone hosts and then killing the VMs world with esxcli is easier.

  1. Connect to the ESXi host with ssh
  2. Get a list of all running VM worlds
esxcli vm process list

3. Identify the world from the output and take note of the World ID. From here we will kill the world. Start with type soft and then escalate it if it doesn’t work.

esxcli vm process kill --type=[soft/hard/force] -–world-id=ID

VM should now be killed, the VMX files are unlocked and you can manage the VM with the GUI tools again. If it didn’t work then you are left with the option to reboot the host containing the ghost VM.

Clear Windows print queue

Even as an infrastructure guy in a datacenter you now and then have to deal with printers. The small beasts with there own life and horrible drivers! I always forget the path for where the spooler puts its temp files. So now I put it here for future me to find it again next time I have to clear spooler files and restart det print services.

GUI way is slow and by having the CMD edition you can always guess what should be done in the GUI.

  1. Open an elevated command prompt.
  2. Type net stop spooler then press “Enter“.
  3. Execute del %systemroot%\System32\spool\printers* /Q
  4. Type net start spooler then press “Enter“.
  5. The print queue on your Windows should now be cleared.

ESXCLI host upgrade procedure

Most of the time you would want to use VMware Update Manager when doing upgrade. Its part of vCenter and is necessary tool when having to maintain your environment. But for smaller deployments, with standalone hosts and no vCenter the following upgrade methods are desired and can help the upgrade time. Instead of having to upgrade with IPMI and an ISO.

Online mode:

This method is for getting the update online, no need to download ISO/offline bundles, etc. This will work for most of the upgrade use cases.

1: Connect to your ESXi host via the host client and enable SSH. Afterward ssh to the ESXi host and enable ESXi firewall rule to allow the host to access the internet.

esxcli network firewall ruleset set -e true -r httpClient

2: With the beneath command you will get a list of available ESXi packaged that are on the VMware repos. Enter this command to list all available profiles. We filter only those which are relevant to our case – upgrade to ESXi 6.7

esxcli software sources profile list -d | grep -i ESXi-6.7

3. Chose the desired profile and use the following command for choosing and upgrading the ESXi version. Before upgrade its a good idea to enter maintenance mode.

esxcli system maintenanceMode set --enable true
esxcli software profile update -p ESXi-6.7.0-20190402001-standard -d https://hostupdate.vm

4. After it’s done, you will need to restart the host, after its rebooted you will run on the new ESXi version.

Custom, with Offline bundle:

This method is for when you desire to install a custom update, or that your hosts down have access to the internet.

1: Download the offline bundle from the VMware webpage, in this upgrade I will use an HPE custom version. But if you run a generic version, that will also work.

2: After downloading the “” file, place it (upload it) to a datastore which is visible by your ESXi host. Best would be a local datastore if this host has some. If not, it can also be a shared datastore too.

3: Find the profile name from the depot offline bundle

 esxcli software sources profile list -d /vmfs/volumes/prd.r60lun01/ISO/VMware-ESXi-6.7.0-Up

Put your host into maintenance mode, enable SSH if you haven’t done yet.

3: Execute this command to upgrade your ESXi 6.x to 6.7

esxcli software profile update -p ESXi-6.7.0-13006603-standard -d /vmfs/volumes/your_datastore/

esxcli software profile update -p HPE-ESXi-6.7.0-Update2-Gen9plus-670.U2. -d /vmfs/volumes/prd.r60lun01/ISO/

After checking that your upgrade was successful, reboot your host. You should see a message saying that the upgrade completed successfully.


I have tried to get an error with:

Failed updating the bootloader: Execution of command /usr/lib/vmware/bootloader-installer/install-bootloader failed: non-zero code returned…. return code: 1”

Error when upgrading, due to “insufficient space”.

This problem is due to the SWAP is but on the installation of the ESXi, not a good thing. So let’s change it.

Go to the UI of the ESXi Hosts https://IP/ui, login and proceed to the following:

Manage > System > Swap > Edit Settings

Chose the dropdown and select a datastore. Apply and the swap space is not freed from the ESXi install device so that you can try to upgrade again.


After the upgrade, it’s a good idea to disable the ESXi firewall rule for “HTTP outside access”. Stop and disable SSH again, but it’s optional 🙂

esxcli network firewall ruleset set -e false -r httpClient

Now you should have an upgraded host.

NSX 6.3.6 to 6.4.5 – Controller problem encountered

NSX upgrades can be a delicate thing to upgrade, even though everything is in its finest shape.

After we successfully have upgrade the NSX managers we proceeded with upgrading of the NSX Controllers. We did pre-check and issued command “show control-cluster status” and it looked fine, upgrade to 6.4.6 went well and we could vMotion VMs around after the controller was booted. But post-checks was not ok, the “show control-cluster status” did not return as expected and we where not confident to proceed with the host upgrades.

After some trouble shooting we found that the /var/log partition on 2/3 of the controllers where full. Without any other evidence we concluded that this was the problem. After some google-fu we didn’t find any KB or blogs on how to purge logs.

But we found out that we could get into a engeering mode that would give us shell access. Long store short, we did the following:

1. to gain shell access on manager
1.1 password is IAmOnThePhoneWithTechSupport
2. Extracting root passwords for controllers with /home/secureall/secureall/sem/WEB-INF/classes/ controller-nn
3. Loged into each controller, and issued : debug os-shell and thereby gain root shell access.
4. Deleted /var/log/syslog.1 on each node.
5. Rolling restart of controllers and after they booted they all joined the cluster.


After this we got the status as we wanted. In the mean while we had create a case with VMware support and the supporter was on a remote session with us. We told him what we have done, we verified that the controlleres was health and they where.

Next step, VIB upgrade on the hosts.

Good commands to know:

Edit: This article from VMware have the exact problem we encountered. We also contacted VMware Support, but before they where able to assist us we had the problem solved. 🙂

Process of getting the root password for controllers.

Tomcat9 and java12 on FreeBSD

Quick post, had to make Tomcat9 and Java12 work together. The procedure is as follows:

1. pkg install tomcat9 (it will also install java8)
2. pkg install openjdk12

Now and edit /etc/rc.conf with a parameter to start tomcat on boot and set the tomcat java_home.

And not to part that took me a long time to figure out. In Java12, there is no longer a feature that tomcat is using in its startup parameters. But if you remove that from the init script you are able to start it up. The line is: Djava.endorsed.dirs=’/usr/local/apache-tomcat-8.0/endorsed’ \

After this, you are now able to boot tomcat9 with java12 🙂

Getting all domains from Office365 tenants

Mail spoffing etc. is a big problem, there are technologies that can help, but many domain owners have not yet implemented them. To help our customers we have started to monitor and see if the SPF, DKIM and DMARC policies have been implementened, and if not we can help 🙂

Our own spamfilter solution have a button that gives you an export over all the domains, nice and easy, but Office 365 CSP portal doesnt.

So there is a quick script to help with that. Next post will hopefully contain the checkscript for if the domain have implemented SPF, DKIM or DMARC.

NSX Edge PowerShell manipulation

This is from a VMware support experience. A customer could not change DNS server parameters of the NSX Edge IP Pool. But actually is was a problem due to a bug in VCD 9.5, where a Edge XML config was missing some tags and therefor not being able to validate the XML when VCD post the edited XML config back to NSX manager.

I have attached VMware support answer in the bottom of the post.

Script will get all edges from the NSX manager, then you find the correct one and fill into the next part of the script. Then you get the XML down to a file on your local machine, you then edit the file and put in the missing tags and lastly PUT the XML backup NSX manager. After this operation, it works from the GUI again.

From Support:
– The issue you are seeing is a known issue 9.5.
– Like I mentioned in the previous email, this is due to missing elements from the xml.
– From the xml in the logs, I could see there are 52 NAT rules on that edge.Correct me if I am wrong. The following 2 rules had the elements missing

I have attached the file with the list of all the NAT rules seen from the logs if you need to cross-verify.

– To fix the issue,please follow

If you have any further questions,let me know.

Have a good evening,

Best regards,


Make a clone of VMs to NAS – The PowerCLI way

Quick post, had a customer that yearly wants a clone of their VMs, copied to a NAS, and then shipped to customers HQ. The owner of the company put this as a requirement. Fair enough. I have almost always done the clone of the VMs by GUI, in the start, this was easy because they only had 5 servers, but they now have more. So this time I wanted to try and script it instead. It took me some extra time, but in the end, I think it’s worth it. My PowerShell skills are not great, still learning so bear with me.

FreeNAS installation failed – Operation not permitted

Ran into an annoying problem, have been having this problem multiple times in the past, and never remember what the fix is. So now it’s on the blog for the next time that I need it. Use the shell option in the FreeNAS installer, the disks that I wanted to install onto was ada0 and ada1.

1. sysctl kern.geom.debugflags=0x10
2. dd if=/dev/zero of=/dev/ada0 bs=512 count=1 && dd if=/dev/zero of=/dev/ada1 bs=512 count=1

This will wipe out the sectors that keep the partition schema and afterward, you can install FreeNAS without problems.

ESXi – physical memory population

Had en interesting problem where a ESXi host only showed it had 30GB of memory, but the motherboard was populated with 6*8GB modules. In earlier versions of ESXi 5.5< it was possible to use dmidecode to show how the physical hardware was populated. But since 6.0> that have been removed.

The new command to find those kind of information are now “smbiosDump”

You can also just run smbiosDump without any paramenters and you get a hole lot of information to crawl through.

Timedrift – domain controller and clients

Customers domain controllers where both virtual and due to CPU congestion it seems that time had been drifting.

So it was 5 minutes behind, and so was the clients. The fix is as follows.

1. Find the DC that have the PDC role.

2. Issue the follwing command to sync the time with some of the servers.

3. After the time on the PDC again is correct, then issue following on the other domain controllers, that are not PDC.

4. let the clients resync there time, either wait for it to happen or issue the following

Freebsd – Going from stable to release

I outdated FreeBSD 10.1-Stable server needed to be updated for it to install packages again. Problem was, it was deployed from stable, i normally never use stable because it not production ready, its a development branche. But this server was stable and here are the steps to get it to a release train.

1. Update the source tree on the server.

2. Follow the link below. Only change i did was to “-j 6” when making so that it used multiple cores.

After the merge-ui command you just choose “later” for all the promt with merge of config files.

And volia, The server is updated to FreeBSD 10.4-release. Then you can update with the binary process, freebsd-update upgrade -r 11.2-RELEASE.

In case you need to know more about the stable vs. release, here is a link i found en the FreeBSD forum.

P2V of a FreeBSD

Was looking for someone that had done a P2V of FreeBSD, but me google skills was not good enough. So here it goes.

1. Make a VM with disk a bit larger then the source phycial machine.
2. Boot the VM on a FreeBSD live cd
3. Give it a IP address

4. Make NC listen for any input on port xxxx ( )

5 on the physical maskine you run DD and pipe it to NC.

6. wait for it to finish….

Ceph – slow recovery speed

Onsite at customer they had a 36bays OSD node down in there 500TB cluster build with 4TB HDDs. When it came back online the Ceph cluster started to recover from it and rebalance the cluster.

Problem was, it was dead slow. 78Mb/s is not much when you have a 500TB Cluster. So what to do?

There a several settings within Ceph you can adjust. Here are the two settings that worked for me.

osd max backfills:
Description: The maximum number of backfills allowed to or from a single OSD.
Default value: 1

I set it to 8, and the recovery went to 350Mb/s. Set it to 16 and recovery was 700Mb/s, but clients where also affected. So 8 was a more moderat setting.

osd recovery max active

Description: The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests places an increased load on the cluster.
Default value: 3

Set it up a notch to 4.