Unable to add a second SCOM 2016 MS: Event ID 1008

Recently I came across a strange situation where I was unable to add a second management server to an existing SCOM management group.

I had taken care of all the prerequisites, such as:

  • The SCOM action and SDK accounts were admins on the second management server as well as on the DB servers
  • Even the installation account and the computer accounts were admins on the RMS and on the operational DB and DW servers
  • Firewall ports were opened and traffic was allowed

Still, the second SCOM MS would not install.

SCOM secondary MS setup error

It would fail at the data warehouse configuration stage and roll back everything.

The only related error in the Application event log was:

The Open Procedure for service “MOMConnector” in DLL “D:\Program Files\Microsoft System Center 2016\Operations Manager\Server\MOMConnectorPerformance.dll” failed. Performance data for this service will not be available. The first four bytes (DWORD) of the Data section contains the error code.

Event ID 1008 Source: Perflib

I researched online a lot, but nothing solved it, until I found another error in the SCOM setup log file.

Below is part of that log, to give you the context:

[15:38:43]: Info: :Finished evaluation of rule ‘NewDBForConfigureDataWarehouseForAllServersRules’
[15:38:43]: Debug: :Action ConfigureDataWarehouseForAllServers will not be needed.
[15:38:43]: Always: :Done validating action list; now running individual actions.
[15:38:43]: Always: :Current Action: GetCommonProperties
[15:38:43]: Info: :Info:Getting Common Values for Server Postprocessor
[15:38:43]: Info: :GetCommonProperties completed.
[15:38:43]: Always: :Current Action: StartServices
[15:38:43]: Always: :Starting OM Services.
[15:38:43]: Debug: :StartService: attempting to start service OMSDK
[15:38:43]: Debug: :StartService: Able to start the service OMSDK after 0 minutes.
[15:38:43]: Debug: :StartService: attempting to start service healthservice
[15:38:43]: Debug: :StartService: Able to start the service healthservice after 0 minutes.
[15:38:43]: Debug: :StartService: attempting to start service cshost
[15:38:43]: Debug: :StartService: Able to start the service cshost after 0 minutes.
[15:38:43]: Info: :StartServices completed.
[15:38:43]: Always: :Current Action: GetDataReaderWriterAccounts
[15:38:47]: Error: :GetAccountForAProfileFromManagementGroup error: Threw Exception.Type: System.InvalidOperationException, Exception Error Code: 0x80131509, Exception.Message: Sequence contains no matching element
[15:38:47]: Error: :StackTrace:
   at System.Linq.Enumerable.First[TSource](IEnumerable`1 source, Func`2 predicate)
   at Microsoft.EnterpriseManagement.OperationsManager.Setup.ReportingComponent.GetAccountForAProfileFromManagementGroup(ManagementGroup managementGroup, String profileGuid, Guid managementTypeId, String& userName, String& userDomain)
[15:38:47]: Error: :GetDataReaderWriterAccounts failed with the following exception: : Threw Exception.Type: System.Reflection.TargetInvocationException, Exception Error Code: 0x80131604, Exception.Message: Exception has been thrown by the target of an invocation.
[15:38:47]: Error: :StackTrace:
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)
   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)
   at System.Delegate.DynamicInvokeImpl(Object[] args)
   at Microsoft.EnterpriseManagement.SetupFramework.ActionEngine.Action.Run(String displayStringNamespace, ProgressData progressData, Func`2 progressDelegate)
   at Microsoft.EnterpriseManagement.SetupFramework.ActionEngine.InstallStep.Run(String displayStringNamespace, ProgressData progressData, Func`2 progressDelegate)
[15:38:47]: Error: :Inner Exception.Type: System.InvalidOperationException, Exception Error Code: 0x80131604, Exception.Message: Sequence contains no matching element
[15:38:47]: Error: :InnerException.StackTrace:
   at System.Linq.Enumerable.First[TSource](IEnumerable`1 source, Func`2 predicate)
   at Microsoft.EnterpriseManagement.OperationsManager.Setup.ReportingComponent.GetAccountForAProfileFromManagementGroup(ManagementGroup managementGroup, String profileGuid, Guid managementTypeId, String& userName, String& userDomain)
   at Microsoft.EnterpriseManagement.OperationsManager.Setup.ReportingComponent.GetDWWriterAccountFromManagementGroup(String managementServerName, String& userName, String& userDomain)
   at Microsoft.SystemCenter.Essentials.SetupFramework.InstallItemsDelegates.OMDataWarehouseProcessor.GetDataReaderWriterAccounts()
[15:38:47]: Error: :FATAL ACTION: GetDataReaderWriterAccounts
[15:38:47]: Error: :FATAL ACTION: DWInstallActionsPostProcessor
[15:38:47]: Error: :ProcessInstalls: Running the PostProcessDelegate returned false.
[15:38:47]: Always: :SetErrorType: Setting VitalFailure. currentInstallItem: Data Warehouse Configuration
[15:38:47]: Error: :ProcessInstalls: Running the PostProcessDelegate for OMDATAWAREHOUSE failed…. This is a fatal item.  Setting rollback.
[15:38:47]: Info: :SetProgressScreen: FinishMinorStep.
[15:38:47]: Always: :!***** Installing: POSTINSTALL ***
[15:38:47]: Info: :ProcessInstalls: Rollback is set and we are not doing an uninstall so we will stop processing installs

After Binging a lot, I still did not find any solution, or any blog that mentioned something similar.

My usual mantra is to go over the error logs again and again; in most cases they point you to the hidden cause. In this case, they did for me too.

Actual Issue:

When I read the setup log, I understood that SCOM setup was unable to get the data warehouse reader/writer account details (the GetDataReaderWriterAccounts action failed with “Sequence contains no matching element”), and setup needs them to proceed and complete.
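The “read the log again and again” approach can be partly scripted. Below is a minimal Python sketch, not a Microsoft tool: it assumes each setup-log entry starts with a [HH:MM:SS]: Level: : prefix, as in the excerpt above, and the function names (parse_entries, fatal_actions, first_error) are hypothetical. Because setup logs can run several entries together on one line, it splits on that prefix rather than on line breaks, then reports the actions flagged as FATAL ACTION and the first Error entry.

```python
import re

# Assumed entry prefix, based on the log excerpt: "[15:38:47]: Error: :message".
# Entries can be fused on one line, so we split on the prefix, not on newlines.
ENTRY = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]: (\w+): :")


def parse_entries(text):
    """Return (time, level, message) tuples for every entry found in text."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs until the next entry prefix (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((m.group(1), m.group(2), text[m.end():end].strip()))
    return entries


def fatal_actions(text):
    """Names of the actions that setup flagged with 'FATAL ACTION:', in order."""
    marker = "FATAL ACTION:"
    return [msg[len(marker):].strip()
            for _, level, msg in parse_entries(text)
            if level == "Error" and msg.startswith(marker)]


def first_error(text):
    """The earliest Error entry, which usually points at the root cause."""
    return next((e for e in parse_entries(text) if e[1] == "Error"), None)


# Usage with a trimmed copy of entries from the log above (first line fused).
sample = (
    "[15:38:43]: Info: :StartServices completed.[15:38:43]: Always: :Current Action: GetDataReaderWriterAccounts\n"
    "[15:38:47]: Error: :GetAccountForAProfileFromManagementGroup error: Threw Exception.Type: "
    "System.InvalidOperationException, Exception Error Code: 0x80131509, "
    "Exception.Message: Sequence contains no matching element\n"
    "[15:38:47]: Error: :FATAL ACTION: GetDataReaderWriterAccounts\n"
    "[15:38:47]: Error: :FATAL ACTION: DWInstallActionsPostProcessor\n"
)
print(fatal_actions(sample))
print(first_error(sample))
```

On this sample it reports GetDataReaderWriterAccounts and DWInstallActionsPostProcessor as the fatal actions, and the first error carries the “Sequence contains no matching element” message that pointed to the missing DW account.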

Now comes the resolution:

Go to the SCOM console -> Administration pane -> Run As Configuration -> Profiles and check the accounts associated with the two profiles below:

SCOM DW Run As account

In my case, the associations of the Run As profiles with the SCOM DW account were not targeted properly. I added all the necessary targets, as shown in the screenshots below.

SCOM Run As profile association target 2
SCOM DW Run As profile associated target 1

Once they were added, I rebooted the server.

Before running setup again, you need to delete the management server entry manually from the RMS.

Note: There will be no trace of SCOM on the failed server itself, but if you log in to the RMS you will see an entry for the new MS in a grayed-out state.

This happens because setup successfully registered the new MS with the root management server, then failed to configure it with the data warehouse, and the rollback cannot unregister it.

Now rerun setup, and it will be successful.

Voila! Issue solved.


System Center and DevOps

In this blog post, I will share how a DevOps approach can be followed and maintained using the System Center suite of products.

Before going into details, let’s talk about DevOps first.

The definition – DevOps (development and operations) is an enterprise software development phrase used to mean a type of agile relationship between development and IT operations. The goal of DevOps is to change and improve the relationship by advocating better communication and collaboration between these two business units.

Ref: http://www.webopedia.com/TERM/D/devops_development_operations.html

In today’s fast-changing enterprise world, every business leader’s ultimate goal is to be more collaborative and interconnected across business functions, and to do that you need an IT team and technology that enable it.

There are various tools and product suites on the market to help enterprises. However, Microsoft’s System Center suite is a winner in many areas.

In a DevOps approach, from SMEs to large enterprises, we follow the approach below:

System Center DevOps Model

Monitor:

For monitoring your entire infrastructure, SCOM (System Center Operations Manager) is a great tool. Why, you ask? Out of the box, management packs are available to monitor your Microsoft technology solutions, and plenty of third-party solutions and adapters make it easier to integrate with other technologies or solutions.

Service:

When it comes to IT service management, you can rely on SCSM (System Center Service Manager).

It is a great tool for managing all your incidents, service requests, changes and problems. Of course you can also use it for release and business relationship management, but those features are not as strong. Combined with other solutions, it is a wonderful ITSM product.

Manage:

For managing your infrastructure you have SCCM (System Center Configuration Manager). Out of the box it can manage Windows software, including the OS, on desktops, laptops and servers, and with third-party solutions you can extend it to manage and patch other third-party software.

Automation:

For a successful DevOps practice you need to automate and combine all these functions. This is where SCORC (System Center Orchestrator) together with PowerShell comes in handy: you can automate almost anything across your infrastructure.

Example Scenario:

  1. SCOM detects that one of your critical web services is down and automatically creates an incident in SCSM, assigned to the L1 Wintel team.
  2. A Wintel engineer validates the alert and, from the SCSM console, runs a SCORC runbook that restarts the IIS service.
  3. Once the service is running again, the SCOM alert is auto-resolved and closed, and the SCSM incident is resolved with all the actions executed in the background captured in the incident log.
  4. From past incidents and his own experience, the engineer notices that high RAM usage is the root cause of this issue, so he raises a change request in SCSM to increase the RAM on the server.
  5. In the SCCM console, he checks that the server is compliant with all the latest security patches and critical updates.
  6. Once the change is approved in SCSM, he uses SCORC runbooks integrated with SCVMM to increase the RAM on the server.
  7. Weeks later, he pulls up a SCOM performance report and verifies that the IIS service has not gone down since the memory was increased.

This was just a high-level example of how the System Center products work hand in hand, which makes it much easier to manage enterprise-level IT infrastructure.

Resolve issues with SCOM Maintenance Mode

SCOM has had a maintenance mode feature since 2007 R2, which is useful when you need to put objects into maintenance mode from the current time until a future time.

However, starting with SCOM 2016 you also get the nice feature of scheduling maintenance mode in advance.

Even in earlier versions of SCOM you can schedule maintenance mode; check this link for more information.

Now let’s look at what makes this possible in the back end.

In the back end, a stored procedure called “dbo.p_ScheduledJobsEveryFiveMinutes” runs, as the name says, every 5 minutes. Among other jobs, it executes the stored procedure “dbo.p_MaintenanceModeJob”, which checks the dbo.MaintenanceMode table for maintenance windows whose scheduled end time has passed.

That is why the minimum maintenance mode duration is 5 minutes, and also why it can sometimes take up to 5 minutes for an object to come out of maintenance mode.
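The “up to 5 minutes” delay follows from the filter ScheduledEndTime < @Now in the WHERE clause of dbo.p_MaintenanceModeJob: a window is only closed at the first job run strictly after its scheduled end. The Python sketch below models that under one stated assumption, that the job fires at fixed 5-minute intervals counted from a known run time (real scheduling may drift); actual_end is a hypothetical helper, not part of SCOM.

```python
from datetime import datetime, timedelta

# Assumption: the five-minute job fires at fixed intervals counted from first_run.
JOB_INTERVAL = timedelta(minutes=5)


def actual_end(scheduled_end, first_run, interval=JOB_INTERVAL):
    """Return when the job would end a window scheduled to end at scheduled_end.

    The job only picks up rows whose ScheduledEndTime < @Now (strictly less),
    so the window ends at the first run strictly after scheduled_end.
    """
    if scheduled_end < first_run:
        return first_run
    runs_elapsed = (scheduled_end - first_run) // interval  # whole intervals
    return first_run + (runs_elapsed + 1) * interval


first_run = datetime(2017, 9, 25, 15, 0)  # hypothetical job run time (UTC)
for minutes in (1, 4, 5):
    end = first_run + timedelta(minutes=minutes)
    print(end.time(), "->", actual_end(end, first_run).time())
```

With a run at 15:00, a window scheduled to end at 15:01 or 15:04 is closed by the 15:05 run, while one ending exactly at 15:05 waits for the 15:10 run, so the delay is always more than zero and at most 5 minutes.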

 

So, if you find that your SCOM objects are not coming out of (or going into) maintenance mode automatically, check whether this stored procedure is running.

Here is the definition of “dbo.p_ScheduledJobsEveryFiveMinutes”, as scripted from SQL Server Management Studio (shown for reference only; there is no need to run the ALTER statement):

USE [OperationsManager]
GO
/****** Object:  StoredProcedure [dbo].[p_ScheduledJobsEveryFiveMinutes]    Script Date: 9/25/2017 3:20:56 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[p_ScheduledJobsEveryFiveMinutes]
AS
BEGIN
    SET NOCOUNT ON
    EXEC dbo.p_MaintenanceModeJob
    EXEC dbo.p_MaintenanceScheduleJob
    EXEC dbo.p_JobStatusTimeout
    RETURN 0
END

And here is “dbo.p_MaintenanceModeJob”, the procedure it calls to end expired maintenance windows:

USE [OperationsManager]
GO
/****** Object:  StoredProcedure [dbo].[p_MaintenanceModeJob]    Script Date: 9/25/2017 3:18:33 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[p_MaintenanceModeJob]
AS
BEGIN
    SET NOCOUNT ON
    DECLARE @Err int
    DECLARE @TranCount int
    DECLARE @StartTime datetime
    DECLARE @Now datetime
    DECLARE @TempMaintMode TABLE (BaseManagedEntityId uniqueidentifier,
                                  StartTime datetime)
    SET @TranCount = @@TRANCOUNT
    SET @Now = getutcdate()

    INSERT INTO @TempMaintMode
    SELECT BaseManagedEntityId, StartTime
    FROM dbo.MaintenanceMode
    WHERE EndTime IS NULL AND ScheduledEndTime < @Now
    SET @Err = @@ERROR
    IF (@Err <> 0) GOTO Error_Exit

    BEGIN TRAN
    UPDATE dbo.MaintenanceMode
    SET IsInMaintenanceMode = 0,
        EndTime = @Now,
        LastModified = @Now
    WHERE BaseManagedEntityId IN (SELECT BaseManagedEntityId FROM @TempMaintMode)
    SET @Err = @@ERROR
    IF (@Err <> 0) GOTO Error_Exit

    UPDATE dbo.MaintenanceModeHistory
    SET EndTime = @Now,
        LastModified = @Now
    FROM @TempMaintMode AS TM
    WHERE MaintenanceModeHistory.BaseManagedEntityId = TM.BaseManagedEntityId
    AND MaintenanceModeHistory.StartTime = TM.StartTime
    SET @Err = @@ERROR
    IF (@Err <> 0) GOTO Error_Exit

    COMMIT TRAN
    RETURN 0

Error_Exit:
    IF (@@TRANCOUNT > @TranCount)
    BEGIN
        ROLLBACK TRAN
    END
    RETURN 1
END
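To make the logic of dbo.p_MaintenanceModeJob explicit, here is a minimal in-memory Python model of its three steps. The class and field names are hypothetical stand-ins for the table columns, all times are assumed to be UTC (the procedure uses getutcdate()), and the transaction and rollback handling is not modeled.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class MaintenanceModeRow:          # hypothetical stand-in for a dbo.MaintenanceMode row
    entity_id: uuid.UUID           # BaseManagedEntityId
    start_time: datetime           # StartTime
    scheduled_end_time: datetime   # ScheduledEndTime
    end_time: Optional[datetime] = None
    is_in_maintenance_mode: bool = True
    last_modified: Optional[datetime] = None


@dataclass
class HistoryRow:                  # hypothetical stand-in for a dbo.MaintenanceModeHistory row
    entity_id: uuid.UUID
    start_time: datetime
    end_time: Optional[datetime] = None
    last_modified: Optional[datetime] = None


def maintenance_mode_job(modes: List[MaintenanceModeRow],
                         history: List[HistoryRow], now: datetime) -> int:
    """Mirror p_MaintenanceModeJob; returns how many windows were expired."""
    # 1. Snapshot expired windows: EndTime IS NULL AND ScheduledEndTime < @Now.
    expired = {(m.entity_id, m.start_time) for m in modes
               if m.end_time is None and m.scheduled_end_time < now}
    expired_ids = {entity_id for entity_id, _ in expired}

    # 2. Take those entities out of maintenance mode (matched on entity id only).
    for m in modes:
        if m.entity_id in expired_ids:
            m.is_in_maintenance_mode = False
            m.end_time = now
            m.last_modified = now

    # 3. Close the matching history rows (same entity AND same start time).
    for h in history:
        if (h.entity_id, h.start_time) in expired:
            h.end_time = now
            h.last_modified = now
    return len(expired)
```

A row whose ScheduledEndTime is already in the past but which still has IsInMaintenanceMode = 1 in the real dbo.MaintenanceMode table is what you would expect to see if the five-minute job has stopped running.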

 

Accidentally Deleted Microsoft.SystemCenter.SecureReferenceOverride

Did you delete the Microsoft.SystemCenter.SecureReferenceOverride MP by mistake or unknowingly?

Don’t worry; in this blog post I will help you recreate it.

You will come across this MP when you try to remove some other management pack and SCOM says it cannot be removed because Microsoft.SystemCenter.SecureReferenceOverride depends on it.

If someone new to SCOM is working on it, they will probably go ahead and delete it just to remove the other management pack. But that can become a serious issue in your SCOM infrastructure.

Before I tell you what the issues are and how to resolve them, let me explain what this management pack does for us.

The Microsoft.SystemCenter.SecureReferenceOverride management pack stores the associations between Run As accounts and Run As profiles.

So, when trying to remove the other MP, the first thing that comes to mind is to remove all the related Run As account and Run As profile associations. However, it does not always work that way: even after you have removed them, SCOM may still report the dependency and not allow you to remove the management pack.
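To see which references keep the dependency alive, it can help to inspect the exported XML of Microsoft.SystemCenter.SecureReferenceOverride. The Python sketch below assumes the usual management pack XML layout: each referenced MP appears in the manifest as a Reference element with an Alias attribute and an ID child, and overrides refer to elements from that MP as Alias!ElementName. For a given MP ID it lists the aliases pointing at it and the SecureReferenceOverride entries that still use each alias. The sample XML, MP IDs and references_to helper are trimmed or invented for illustration. One plausible explanation for the behavior described above is an alias that no override uses any more but that is still listed in the manifest.

```python
import xml.etree.ElementTree as ET


def references_to(mp_xml: str, target_mp_id: str) -> dict:
    """Map each alias that points at target_mp_id to the overrides using it.

    An alias with an empty list is an orphaned reference: nothing uses it,
    but it is still listed in the manifest of this MP.
    """
    root = ET.fromstring(mp_xml)
    aliases = [ref.get("Alias") for ref in root.iter("Reference")
               if (ref.findtext("ID") or "").strip() == target_mp_id]
    result = {}
    for alias in aliases:
        prefix = alias + "!"   # elements from other MPs are written Alias!Name
        result[alias] = [o.get("ID") for o in root.iter("SecureReferenceOverride")
                         if any(v.startswith(prefix) for v in o.attrib.values())]
    return result


# Trimmed, hypothetical export; the Example.* MP IDs are invented.
SAMPLE = """<ManagementPack>
  <Manifest>
    <Identity><ID>Microsoft.SystemCenter.SecureReferenceOverride</ID></Identity>
    <References>
      <Reference Alias="SQL"><ID>Example.SQL.Library</ID></Reference>
      <Reference Alias="Old"><ID>Example.Removed.Library</ID></Reference>
    </References>
  </Manifest>
  <Monitoring>
    <Overrides>
      <SecureReferenceOverride ID="Override1" Context="SQL!Example.SQL.DBEngine"
          SecureReference="SQL!Example.SQL.RunAsProfile" Value="0000" />
    </Overrides>
  </Monitoring>
</ManagementPack>"""

print(references_to(SAMPLE, "Example.SQL.Library"))
print(references_to(SAMPLE, "Example.Removed.Library"))
```

Here the SQL alias is still used by Override1, while the Old alias is listed in the manifest with nothing using it.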

So here is the solution:

  1. Export the Microsoft.SystemCenter.SecureReferenceOverride MP and manually remove all references in it to the MP you want to delete; refer to the blog by Marnix Wolf for more information.
  2. The second scenario is where you have already deleted the MP by mistake or unknowingly.

In that case you might now be able to remove the MP that you wanted to remove, but you will immediately see a flood of alerts in SCOM complaining about Run As accounts not being available, along with a bunch of SCOM data warehouse related alerts.

Now the only way to get things right is to recollect all the Run As profile and Run As account associations and recreate them.

Go to Administration -> Run As Configuration -> Profiles

and choose all the Run As profiles that need to be associated. Typically these will be various SQL, SharePoint and Linux Run As profiles, and the Data Warehouse account.

As soon as you associate the first one, you will see that the Microsoft.SystemCenter.SecureReferenceOverride management pack gets recreated.

Once you have done this for all the Run As profiles, voila! You solved it.

Congrats! Now everything is in order.

SCOM DW Report Deployment Errors with SharePoint 2013 MP

Today I am going to talk about one of the annoying errors that has been flooding my SCOM lately.

Error:

Data Warehouse failed to deploy reports for a management pack to SQL Reporting Services Server. Failed to deploy reporting component to the SQL Server Reporting Services server. The operation will be retried.
Exception ‘DeploymentException’: Failed to deploy reports for management pack with version dependent id ‘edf9e0b9-65aa-df29-6729-d16f0005e820’. Failed to deploy linked report ‘Microsoft.SharePoint.Server_Performance_Report’. Failed to convert management pack element reference ‘$MPElement[Name=”Microsoft.SharePoint.Foundation.2013.Responsetime”]$’ to guid. Check if MP element referenced exists in the MP. An object of class ManagementPackElement with ID 75668869-f88c-31f3-d081-409da1f06f0f was not found.
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.DataWarehouse.Deployment.Report

In short, the error says that SCOM is unable to deploy the SharePoint Server performance reports to SQL Server Reporting Services, which would mean that the SharePoint reports are unavailable. However, that was not the case for me: I could see all the SP 2013 reports listed in my Reporting pane.

So I suspected a false positive, and I spent almost 3 days breaking my head over how to resolve the alert:

  • I deleted the SharePoint 2013 MP and imported it again
  • Reconfigured it
  • Rechecked all Run As accounts

But still nothing seemed to fix it.

Then, after a lot of searching, I came across Kevin Holman’s blog, which states that this is a known issue with SharePoint 2013 MP version 15.0.4425.1000.

https://blogs.technet.microsoft.com/kevinholman/2013/05/13/configuring-the-sharepoint-2013-management-pack/

Follow the above link for more information on this.

 

Troubleshooting SCOM 2016 MS Grayed out State

While working at a customer site, I suddenly came across a situation where one of the MSs was in a grayed-out state.

Initially I followed the usual troubleshooting steps that we all do in this situation:

  • Flush the Health Service state and cache from the SCOM console. For an MS, you can do this from the Operations Manager folder -> Management Group Health

Click the MS and, under Tasks in the right-hand pane, select ‘Flush Health Service State and Cache’. It was fine for 2 minutes or so and then went back to the grayed-out state.

So, time for the next move:

  • Deleted the ‘Health Service State’ folder under the Operations Manager -> Server folder on the MS

Still the same thing: the server went back to the grayed-out state.

I then went through each and every error in Event Viewer, and that’s when I came across the error below, related to the Health Service:

SCOM_Config_Override_2 copy

Now it made sense: the main culprit here is this workflow.

This workflow is linked to OMS, but the client is neither using OMS nor has ever configured it.

So, it is quite odd to have it pop up here, especially since nothing is configured in the OMS configuration console.

Obviously, you can’t remove the OMS, System Center Advisor or Intelligence MPs from SCOM, because to do that you would have to remove other dependent MPs, which is not possible. So what do we do?

Now comes the resolution:

  • Go to the Authoring pane -> Rules and type intel in the search box
  • You will find the 3 rules below in the search results.
  • Override the first 2 rules: click the first rule, override its Enabled parameter to False, and make sure to select the Enforced check box too

SCOM_Override_3

  • Repeat this for the 2nd rule

SCOM_Override_1

  • Save these overrides in a separate MP, not in the Default MP
  • Once done, flush the Health Service state and cache again and check

Voila! The MS is now healthy, and when I checked the Event Viewer logs, even the errors were gone.

Note: Do this only if you do NOT want to connect your on-premises SCOM to OMS.

Unix / Linux SCOM Cmdlets

  • Get-SCXAgent: Returns a list of managed UNIX / Linux computers.
  • Get-SCXSSHCredential: Creates an SSH credential.
  • Install-SCXAgent: Installs the SCOM agent on discovered UNIX / Linux computers.
  • Invoke-SCXDiscovery: Invokes the discovery operation for the specified configuration of UNIX / Linux computers.
  • Remove-SCXAgent: Removes a UNIX or Linux computer from a management group.
  • Set-SCXResourcePool: Changes the managing resource pool for the targeted UNIX or Linux computer.
  • Uninstall-SCXAgent: Uninstalls the UNIX / Linux agent.
  • Update-SCXAgent: Updates the UNIX / Linux agent.
  • scxcertconfig -list: Lists the Xplat certificates installed in the management group.
  • scxcertconfig -remove: Removes the Xplat certificates installed in the management group.

Example 1:

Input: Get-SCXAgent

Output: Returns a list of all managed UNIX / Linux agents.

Example 2:

Input: Get-SCXAgent | where {$_.Name -match "X01C-XPSCOM"} | Remove-SCXAgent

Output: No output is displayed; however, the agent whose name matches X01C-XPSCOM is removed from the management group.
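Be careful with -match in Example 2: in PowerShell it is a case-insensitive regular-expression match anywhere in the string, so the pipeline removes every agent whose name merely contains X01C-XPSCOM. The Python sketch below approximates that behavior with re.search and re.IGNORECASE (an approximation; the .NET and Python regex dialects differ in edge cases) against a hypothetical list of agent names, and shows that anchoring the pattern with ^ and $ narrows it to one exact name.

```python
import re


def ps_match(name: str, pattern: str) -> bool:
    """Approximate PowerShell's -match: a case-insensitive regex search
    anywhere in the string (not a full-string comparison)."""
    return re.search(pattern, name, re.IGNORECASE) is not None


# Hypothetical agent names, for illustration only.
agents = ["X01C-XPSCOM", "x01c-xpscom2", "X01C-XPSCOM.contoso.local", "X02C-XPSCOM"]

loose = [a for a in agents if ps_match(a, "X01C-XPSCOM")]     # substring match
exact = [a for a in agents if ps_match(a, r"^X01C-XPSCOM$")]  # anchored match
print("would be removed:", loose)
print("exact name only:", exact)
```

The unanchored pattern picks up three of the four names. In PowerShell, the same safeguard is an anchored pattern such as "^X01C-XPSCOM$" or an exact -eq comparison, and running the pipeline without the final | Remove-SCXAgent first lists exactly which agents would be removed.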

Example 3:

Input: scxcertconfig -list

Output: Displays all Xplat certificates installed in the management group.

Example 4:

Input: scxcertconfig -remove-all

Output: No output is displayed; however, all Xplat certificates installed in the management group are removed.