Tuesday, 2 June 2015

Confused QMM Exchange Native Move Jobs

Setting the Scene

In a recent migration engagement I came across an error whose cause is, to date, unknown even to Dell, although it appears to be associated with target Exchange server availability.

The target Exchange server crashed due to some underlying storage issues while a couple of Native Move mailbox migration jobs were in progress.

I confirmed that, from an Exchange perspective, the mailbox moves had survived the crash (Get-MoveRequest | Get-MoveRequestStatistics) because the databases had failed over to the surviving DAG member. Once the server was recovered and the databases were moved back to the original server, I clicked Retry Move in the QMM console. I expected QMM to update itself by querying Exchange for fresh information. Instead, QMM stuck its tongue out at me with the following error:

"Parameter set cannot be resolved using the specified named parameters."
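
For reference, the Exchange-side check mentioned above looks roughly like this when run in the Exchange Management Shell. The selected columns are only a suggestion for readability; nothing here is specific to QMM.

```powershell
# Run in the Exchange Management Shell on the target side.
# Lists every move request together with its state as Exchange sees it.
Get-MoveRequest |
    Get-MoveRequestStatistics |
    Select-Object DisplayName, Status, StatusDetail, PercentComplete, TargetDatabase |
    Format-Table -AutoSize
```

If the moves show up here with a sensible status and a growing percentage, Exchange is still working on them regardless of what the QMM console reports.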

The Issue

When in Native Move migration mode, QMM is nothing more than an easy-to-drive GUI which, behind the scenes, builds New-MoveRequest PowerShell commands and submits them to the target server.

QMM keeps track of the migration progress in its own SQL database. In a way this is good because it provides access to quick status and configuration information. However, in the case of Native Move jobs, where QMM is no longer in control of data movement between servers, QMM still appears to look at its SQL database as THE source of truth, as opposed to what the Exchange server is reporting. As a result, things get out of sync quite easily with nasty consequences in terms of time and cost of the migration project.

In the case of an Exchange server failure, QMM appears to pollute its SQL database with wrong data. Common sense dictates that if a move request has been successfully submitted and has been in progress for some time, the tool should query the Exchange server for fresh progress information and update its records accordingly. QMM, however, appears to look exclusively at its own database with no regard for what Exchange reports. When I click "Retry Move", QMM fails to detect that a move request is already in progress, even though its very own database contains a record of a successful command submission, which in itself is a hint that QMM should go and ask Exchange for an update. Instead, it attempts to build a new move request command using polluted data from its SQL database.

Specifically, this is what happened in my case:
  • Fact: QMM submits New-MoveRequest commands on the target server. Data is "pulled" by the target server from the source server. Therefore parameters are interpreted from the target server's perspective.
  • Considering the point above, the correct parameter to use would be -TargetDatabase. Instead, QMM uses -RemoteTargetDatabase, as if the data were being "pushed".
  • When the target Exchange server tries to execute the command, it encounters the -RemoteTargetDatabase parameter pointing to its very own, local database.
  • The target Exchange server scratches its head: "I own this database, why on earth is the command telling me that it's remote???"
  • Quite rightly, when QMM attempts to submit the wrongly built command, Exchange gets confused and throws the error.
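
To make the difference concrete, here is a minimal sketch of what a pull-style cross-forest request looks like when submitted on the target server. The host name, delivery domain and credential are placeholders, and it assumes a source running Exchange 2010 or later with the MRS Proxy enabled (older sources use -RemoteLegacy instead of -Remote). I have not seen the exact command QMM assembles, so treat this as an illustration of the parameter sets rather than QMM's actual output.

```powershell
# Placeholder values throughout; substitute your own.
$sourceCred = Get-Credential   # account with rights in the source forest

# Pull move: submitted on the TARGET side, so the database is local to it
# and is named with -TargetDatabase.
New-MoveRequest -Identity 'Migrated_User' `
    -Remote `
    -RemoteHostName 'mrsproxy.source.example' `
    -RemoteCredential $sourceCred `
    -TargetDatabase 'Target_Database' `
    -TargetDeliveryDomain 'target.example'
```

-RemoteTargetDatabase belongs with -Outbound, which is used when the request is started in the source forest and the mailbox is pushed to the remote one. Mixing parameters from the two sets is the kind of thing that makes PowerShell complain that the parameter set cannot be resolved.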

The Fix

Dell was unable to put a finger on the cause. However, they did come up with the following workaround:

  1. Open SQL Server Management Studio. In the QMM Exchange database, locate and open the MAILBOX table.
  2. Find the failing mailbox in the MAILBOX table and take note of the value in the ID field.
  3. From the MAILBOX_PROCESSING_PROPERTY table, delete the entry matching the ID identified above.
  4. If there is a mailbox move in progress (check in the Exchange Management Shell), then cancel it. Then, in the QMM console, retry the mailbox move. This time the correct command is assembled and successfully submitted.
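
The last step can be done from the Exchange Management Shell along these lines. This is a sketch with a placeholder mailbox identity; look at the status before removing anything, since removing a request that is in progress cancels the move.

```powershell
$mailbox = 'Migrated_User'   # placeholder identity

# Look for an existing request without raising an error when there is none.
$request = Get-MoveRequest -Identity $mailbox -ErrorAction SilentlyContinue

if ($request) {
    # Show what Exchange thinks is going on before touching it.
    $request | Get-MoveRequestStatistics |
        Select-Object DisplayName, Status, PercentComplete

    # Cancel the move; -Confirm:$false suppresses the confirmation prompt.
    Remove-MoveRequest -Identity $mailbox -Confirm:$false
}
```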

Other Errors

Some other errors I've come across with Native Move jobs:

Error: Mailbox 'Migrated_User' is already being moved to 'Target_Database'.
Reason: QMM spawns multiple New-MoveRequest commands in rapid succession, a couple of seconds apart. It appears to be a tad too impatient: it does not allow Exchange enough time to process the first command, concludes that its request went nowhere, and issues a second instance of the same command. By then Exchange has already processed the first command, so the second attempt results in the above error.
Workaround: Highlight the mailbox with this error and click Retry Move. This forces QMM to query Exchange and update the status of the job.
Note: Dell neither confirmed nor denied this to be a bug. The case was closed without a resolution.
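
If you script the moves yourself, a simple guard avoids this class of error by asking Exchange first. A minimal sketch, with placeholder names and assuming a pull-style cross-forest request submitted on the target side:

```powershell
# Placeholder values throughout; substitute your own.
$mailbox    = 'Migrated_User'
$sourceCred = Get-Credential   # account with rights in the source forest

# Ask Exchange whether a request already exists for this mailbox.
$existing = Get-MoveRequest -Identity $mailbox -ErrorAction SilentlyContinue

if ($existing) {
    # A request is already there: report its progress instead of resubmitting.
    $existing | Get-MoveRequestStatistics |
        Select-Object DisplayName, Status, PercentComplete
}
else {
    # No request yet: submit one.
    New-MoveRequest -Identity $mailbox `
        -Remote `
        -RemoteHostName 'mrsproxy.source.example' `
        -RemoteCredential $sourceCred `
        -TargetDatabase 'Target_Database' `
        -TargetDeliveryDomain 'target.example'
}
```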

Error: Failed to retrieve RootDSE at 'Source.Domain' under 'DOMAIN\account'
Reason: The domain controller configured in QMM was down for maintenance or otherwise unavailable (e.g. network down).
Workaround: Highlight the mailbox with this error and click Retry Move. This forces QMM to retry the connection to the domain controller.
Note: Dell neither confirmed nor denied this to be a bug. The case was closed without a resolution.

These errors occurred for about 25-30% of the entire migrated user base. Cutover schedules (or mailbox switches) were dead in the water, given that a human had to push a button to retry the jobs.

Since you cannot rely on the schedule to switch over mailboxes, you'll have to pay someone to watch it overnight and push the button as needed, or inconvenience users during the day.

Final Thoughts

QMM for Exchange is a great tool and it can streamline much of the work. Its Native Move engine, however, is riddled with bugs and has virtually no smarts to recover from Exchange server, AD server or network failures. The logic it uses to keep track of where the Exchange servers are up to also seems flawed. Its scheduling options are great, provided that the domain controllers and Exchange servers stay available during the migration.

A word of caution: QMM doesn't work well with DAGs. Failing over databases to other DAG members doesn't go down well with QMM. Regardless of what others say, this was my experience. It may change in the future as Dell improves the product.

At the time of this writing, however, if you only intend to use it in Native Move mode, carefully weigh the cost of the software against that of a consultant who would build scripts to be used in conjunction with QMM-AD.

You would have to use Native Move if:
  • You have lots of remote users with poor links and large mailboxes, and thus you want to preserve the OST file.
  • The target server is Exchange 2013.
  • You migrate to a resource forest (users will keep logging in to their legacy domain with their legacy credentials).
If all three conditions are true, then the only option that preserves the OST file is Native Move. In this instance you might find that using QMM-AD with custom Exchange PowerShell scripts is more cost-effective.

I hope this saves someone out there a headache.