Friday, 3 March 2017

Where Has My Licence Gone?!

"Who removed my licence?"
"Probably Microsoft. Your 30-day grace period has likely expired..."

Hi All,

Recently one of my clients reported that some users lost their O365 licence. They were working yesterday and no longer could log on today - the licence was wiped. Completely. No prior notice.

What was going on?

I searched the audit log. I could see the admin re-assigning the licence so that the user can work, but no trace of its removal.

To cut it short, it ended up on Microsoft's laps. After a couple of weeks of log analysis and investigation, it turned out that following a hybrid mailbox on-boarding, the O365 licence failed to be applied to a small group of users. Even though unlicensed, they were able to connect and use their e-mail because they were in the 30-day grace period. Once the 30-day grace period ended, the mailbox has been disconnected

Why didn't I see an entry in the audit log? Simple: A licence removal event didn't occur because there was no licence to remove in the first place - remember, the license assignment failed. Going back as far as when the mailbox was migrated, we could see an audit log entry for the failed attempt assigning a licence. This can happen when scripting it, and it is easily missed, especially when you have lots of accounts.

While this resolved the mystery, it also revealed a couple of shortcomings of O365:

  • We couldn't tell from the O365 audit log entries what licences were assigned or removed from the user. Microsoft pointed out during the case that O365 and Azure maintain separate audit logs, and the Azure log is more detailed. Not so the O365 log. You can see the activity though and who actioned it. NOTE: It may take up to 12 hours for the action to appear in the log.
  • O365 does NOT alert about the imminent end of the 30-day grace period.
  • Microsoft has very little documentation about the 30-day grace period.
On the topic of documentation, the engineer who worked on the case passed me this link, which states (clutter removed by me):
Assume that you have a hybrid deployment of Microsoft Exchange Online in Microsoft Office 365 and on-premises Microsoft Exchange Server...
If a license is not assigned to the user, the mailbox may be disconnected...
This issue occurs if the mailbox was migrated to Exchange Online as a regular user mailbox ... If the user isn't licensed, and if the 30-day grace period has ended, the mailbox is disconnected...

Now that I know what to look for, I've come across this link which states:
After you create a new mailbox using the Exchange Management Shell, you have to assign it an Exchange Online license or it will be disabled when the 30-day grace period ends.

Takeway #1: Always check licences after a mailbox on-boarding in a hybrid migration.

Takeway #2: Monitor your users regularly for their licensed status. Automatic alerting may be flaky, so if you are a developer then you may want to rock up an application and use the audit APIs to extract data and send alerts.

Takeawy #3: You need to search the correct audit log. There are a couple Security and Compliance centers in different places on the portal. The one you're after is under Admin centers | Security & Compliance, then in the new window navigate to Search & Investigation | Audit Log Search

Once there, select to search for the Update user and Changed user license activities in the User administration activities section:

Happy auditing!

Friday, 10 February 2017

Email and UPN do not belong to the same namespace

Hi There,

Just recently I helped out in a case where hybrid Exchange users with the mailbox in the cloud were failing to retrieve autodiscover configuration data.

It was an environment with multiple federated domains, and mailboxes split between on-premises and O365. On-premises mailboxes worked well, only O365 mailboxes were failing, and, more bizarrely, one of the domains worked while the others didn't.

As we know, mail clients for onboarded hybrid mailboxes go through a double autodiscover process:

  1. The first iteration discovers the on-premises mail user, which then redirects the client to the cloud mailbox.
  2. The client then goes through a second autodiscover process, this time against the cloud mailbox.

Looking at the Microsoft Remote Connectivity Analyzer output, I noticed that the first iteration succeeded. The issue was with the second pass. The error:

X-AutoDiscovery-Error: LiveIdBasicAuth:LiveServerUnreachable:<X-forwarded-for:><ADFS-Business-105ms><RST2-Business-654ms-871ms-0ms-ppserver=PPV: 30 H: CO1IDOALGN268 V: 0-puid=>LiveIdSTS logon failure '<S:Fault xmlns:S=""><S:Code><S:Value>S:Sender</S:Value><S:Subcode><S:Value>wst:InvalidRequest</S:Value></S:Subcode></S:Code><S:Reason><S:Text xml:lang="en-US">Invalid Request</S:Text></S:Reason><S:Detail><psf:error xmlns:psf=""><psf:value>0x800488fc</psf:value><psf:internalerror><psf:code>0x8004786c</psf:code><psf:text>Email and UPN do not belong to the same namespace.%0d%0a</psf:text></psf:internalerror></psf:error></S:Detail></S:Fault>'<FEDERATED><UserType:Federated>Logon failed "User@domain.tld".;

Needless to say, I checked on-premises and cloud user and mailbox properties, UPNs, addresses, connector address spaces, proxy addresses, the lot, to no avail. It was all configured correctly.

Then I checked ADFS. There I found a couple of errors, so I turned on debug trace. Nothing obvious there either, so I turned it off.

I tested ADFS logon via https://sts.domain.tld/adfs/ls/IdpInitiatedSignon.aspx: Successful.

ADFS was working well per se. Somehow, it was getting incorrect details from O365.

I also suspected that somehow the ADFS proxy was breaking the SSL stream - I dealt with a similar situation before. However this idea has been dropped when the original (3rd party) proxy was replaced with Microsoft's native Web Application Proxy and the issue remained.

To recap:

  • Users had matching e-mail and UPN
  • ADFS itself was working
  • Multiple federated domains
  • ADFS Proxy (WAP) ruled out

Then I decided to re-federate the domains. Since I had very little (a.k.a. none whatsoever) information on how it was set up initially, and no details on any deployment history, tabula rasa seemed in order.

So I converted the domains to Managed, then re-federated them. Since there are multiple federated domains, I used the Convert-MsolDomainToFederated cmdlet with the SupportMultipleDomain switch.

Coffee time, to allow some time to pass to do its things (it's a distributed environment and things don't happen instantly).

<suspense>Then came the test</suspense> ...  Lo and behold: everything started to work! Every federated domain, every service.

In summary: The hybrid environment and user accounts were configured correctly, yet the wrong details were passed by O365 to ADFS. Either the trusts were incorrectly configured, or the federation metadata got corrupted. Whatever it was, re-federating the domains fixed it.

Sunday, 29 January 2017

The Curious Case of GroupPolicy Error 1053

Recently I was involved in a case where a user was losing his Taskbar icons and email signatures. He had all possible folders redirected to a remote network share over a WAN link. The icons are stored in the AppData (Roaming) structure and I wanted to bring it back to the local computer (a.k.a. cancel redirection for the AppData folder) without affecting other folders.

I knew I had to fiddle with some GPOs, and this is where the fun started.

I created a new GPO, applied security to only scope it to this one user, and applied GPUPDATE. It bombed out:

The Details of the same GPO suggests that the user is disabled:

ErrorCode: 1331
ErrorDescription: This user can't sign in because this account is currently disabled.

I new network connectivity wasn't an issue. I checked DNS, although computer settings were applied successfully. This uncovered a whole heap of inconsistencies which I corrected, however it didn't affect my GPOs. I even turned on USERENV logging - nothing useful in there either, and neither in the GroupPolicy log.

I new the account wasn't disabled because I was logged on with that account.

I was banging my head against the wall. Then I came across a forum where someone suggested to clear the credentials vault.

Checked the vault, and surprises surprise, found the credentials of another user, and yes, that user's account was DISABLED! Enabled the account, and GPUPDATE now reported that the credentials are invalid. Indeed, I noticed Security-Kerberos Warning 14 events in the user's System log to which I didn't give much attention before:

I was getting somewhere.

So apparently, once upon a time, my user had to access another user's resources for whatever reason. However, in time, the other user moved on, his password was changed and the account was disabled. The credentials in my user's Credentials Manager vault were not refreshed or removed.

There was no trace anywhere in any log that I could find of the account in the vault: not in the USERENV log, not in the domain controller's Security log - nowhere. Not one hint.

Clearing the obsolete account's credentials from my user's Credential Manager fixed it: the GPOs are now applying happily.

Takeaway point: GroupPolicy Error 1053 events in the System log with ErrorCode 1331 is likely caused by wrong credentials of another user in Credential Manager. Go check your vault and clear/update anything suspicious.