Error Executing the Runbook in System Center Orchestrator

Issues:

The runbook is not getting triggered, and when I try to open the Orchestrator web console I see an error.

So I went through the basic troubleshooting steps:

  • Checked that the Orchestrator service account was not locked out
  • Cross-checked that the account has all the required access: it is an admin on the server, and the same account was used during setup and configuration

Everything looked fine, except that in Event Viewer I found an error saying access was denied for the Orchestrator account whenever a runbook was triggered.

I checked everything I could think of, including IIS and network settings, and ultimately understood that the Orchestrator web service was broken and, as a result, was not calling the runbooks.

Resolution:

Reinstall only the Orchestrator web service, and everything will work as expected.

 


Unable to add a second SCOM 2016 MS: Event ID 1008

Recently I came across a strange situation where I was unable to add a second management server to an existing SCOM management group.

I had taken care of all the pre-requisites like:

  • The SCOM action and SCOM SDK accounts were admins on the second management server as well as on the DB servers
  • The accounts and the computer accounts were admins on the RMS, the DB, and the DW servers
  • Firewall ports were open and traffic was allowed

Still, the second SCOM MS would not install.

SCOM secondary MS setup error

Setup fails at the Data Warehouse Configuration stage and rolls everything back.

 

The only error in the Application event log is:

The Open Procedure for service “MOMConnector” in DLL “D:\Program Files\Microsoft System Center 2016\Operations Manager\Server\MOMConnectorPerformance.dll” failed. Performance data for this service will not be available. The first four bytes (DWORD) of the Data section contains the error code.

Event ID 1008 Source: Perflib

I researched online a lot, but nothing solved it, until I found another error in the SCOM setup log file.

Below is part of the error, for you to understand the context

[15:38:43]: Info: :Finished evaluation of rule ‘NewDBForConfigureDataWarehouseForAllServersRules’

[15:38:43]: Info: :Finished evaluation of rule ‘NewDBForConfigureDataWarehouseForAllServersRules’

[15:38:43]: Debug: :Action ConfigureDataWarehouseForAllServers will not be needed.

[15:38:43]: Always: :Done validating action list; now running individual actions.

[15:38:43]: Always: :Current Action: GetCommonProperties

[15:38:43]: Info: :Info:Getting Common Values for Server Postprocessor

[15:38:43]: Info: :GetCommonProperties completed.

[15:38:43]: Always: :Current Action: StartServices

[15:38:43]: Always: :Starting OM Services.

[15:38:43]: Debug: :StartService: attempting to start service OMSDK

[15:38:43]: Debug: :StartService: Able to start the service OMSDK after 0 minutes.

[15:38:43]: Debug: :StartService: attempting to start service healthservice

[15:38:43]: Debug: :StartService: Able to start the service healthservice after 0 minutes.

[15:38:43]: Debug: :StartService: attempting to start service cshost

[15:38:43]: Debug: :StartService: Able to start the service cshost after 0 minutes.

[15:38:43]: Info: :StartServices completed.

[15:38:43]: Always: :Current Action: GetDataReaderWriterAccounts

[15:38:47]: Error: :GetAccountForAProfileFromManagementGroup error: Threw Exception.Type: System.InvalidOperationException, Exception Error Code: 0x80131509, Exception.Message: Sequence contains no matching element

[15:38:47]: Error: :StackTrace:   at System.Linq.Enumerable.First[TSource](IEnumerable`1 source, Func`2 predicate)   at Microsoft.EnterpriseManagement.OperationsManager.Setup.ReportingComponent.GetAccountForAProfileFromManagementGroup(ManagementGroup managementGroup, String profileGuid, Guid managementTypeId, String& userName, String& userDomain)

[15:38:47]: Error: :GetDataReaderWriterAccounts failed with the following exception: : Threw Exception.Type: System.Reflection.TargetInvocationException, Exception Error Code: 0x80131604, Exception.Message: Exception has been thrown by the target of an invocation.

[15:38:47]: Error: :StackTrace:   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor)   at System.Reflection.RuntimeMethodInfo.UnsafeInvokeInternal(Object obj, Object[] parameters, Object[] arguments)   at System.Delegate.DynamicInvokeImpl(Object[] args)   at Microsoft.EnterpriseManagement.SetupFramework.ActionEngine.Action.Run(String displayStringNamespace, ProgressData progressData, Func`2 progressDelegate)   at Microsoft.EnterpriseManagement.SetupFramework.ActionEngine.InstallStep.Run(String displayStringNamespace, ProgressData progressData, Func`2 progressDelegate)

[15:38:47]: Error: :Inner Exception.Type: System.InvalidOperationException, Exception Error Code: 0x80131604, Exception.Message: Sequence contains no matching element

[15:38:47]: Error: :InnerException.StackTrace:   at System.Linq.Enumerable.First[TSource](IEnumerable`1 source, Func`2 predicate)   at Microsoft.EnterpriseManagement.OperationsManager.Setup.ReportingComponent.GetAccountForAProfileFromManagementGroup(ManagementGroup managementGroup, String profileGuid, Guid managementTypeId, String& userName, String& userDomain)   at Microsoft.EnterpriseManagement.OperationsManager.Setup.ReportingComponent.GetDWWriterAccountFromManagementGroup(String managementServerName, String& userName, String& userDomain)   at Microsoft.SystemCenter.Essentials.SetupFramework.InstallItemsDelegates.OMDataWarehouseProcessor.GetDataReaderWriterAccounts()

[15:38:47]: Error: :FATAL ACTION: GetDataReaderWriterAccounts

[15:38:47]: Error: :FATAL ACTION: DWInstallActionsPostProcessor

[15:38:47]: Error: :ProcessInstalls: Running the PostProcessDelegate returned false.

[15:38:47]: Always: :SetErrorType: Setting VitalFailure. currentInstallItem: Data Warehouse Configuration

[15:38:47]: Error: :ProcessInstalls: Running the PostProcessDelegate for OMDATAWAREHOUSE failed…. This is a fatal item.  Setting rollback.

[15:38:47]: Info: :SetProgressScreen: FinishMinorStep.

[15:38:47]: Always: :!***** Installing: POSTINSTALL ***

[15:38:47]: Info: :ProcessInstalls: Rollback is set and we are not doing an uninstall so we will stop processing installs

After a lot of searching on Bing, I still did not find a solution or any blog that mentioned something similar.

My usual mantra is to go over the error logs again and again; in most cases they can direct you to the hidden cause. In this case, they did for me too.

Actual Issue:

When I read the setup log, I understood that SCOM setup was unable to get the data warehouse reader and writer account details (the GetDataReaderWriterAccounts action is the one that fails), and setup cannot complete without them.
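Going over a long setup log by eye is tedious, so here is a minimal Python sketch of the same idea: split the log into entries and list the actions marked FATAL. It assumes only the entry format visible in the excerpt above ("[hh:mm:ss]: Level: :message"); the function names and the sample string are made up for illustration and are not part of SCOM.

```python
import re

# Matches one setup-log entry header such as "[15:38:47]: Error: :".
ENTRY_START = re.compile(r"\[(\d{2}:\d{2}:\d{2})\]: (\w+): :")


def split_entries(log_text):
    """Split raw log text into (time, level, message) tuples.

    Splitting happens on the timestamp header, so entries that were
    written back to back on one physical line are still separated.
    """
    matches = list(ENTRY_START.finditer(log_text))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        message = log_text[m.end():end].strip()
        entries.append((m.group(1), m.group(2), message))
    return entries


def fatal_actions(log_text):
    """Return the names that follow 'FATAL ACTION:' in Error entries, in order."""
    prefix = "FATAL ACTION:"
    names = []
    for _time, level, message in split_entries(log_text):
        if level == "Error" and message.startswith(prefix):
            names.append(message[len(prefix):].strip())
    return names


# Hypothetical sample built from a few lines of the excerpt above.
sample = (
    "[15:38:43]: Info: :StartServices completed."
    "[15:38:43]: Always: :Current Action: GetDataReaderWriterAccounts\n"
    "[15:38:47]: Error: :FATAL ACTION: GetDataReaderWriterAccounts\n"
    "[15:38:47]: Error: :FATAL ACTION: DWInstallActionsPostProcessor\n"
)
print(fatal_actions(sample))
```

The first fatal action in the list is usually the one to read up on; the ones after it tend to be the enclosing steps failing as a consequence.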

Now comes the resolution part:

Go to the SCOM console -> Administration pane -> Run As Profiles and check the accounts associated with the two profiles below:

SCOM DW Run as account.png

In my case, the Run As profile associations for the SCOM DW account were not targeted properly. I added all the necessary targets, as shown in the screenshots below.

SCOM DW Run As profile associated target 1

SCOM Run As profile association target 2

Once they were added, I rebooted the server.

Now, before running the setup again, you will need to delete the management server entry manually from the RMS.

Note: Setup leaves no SCOM installation on the new server, but if you log in to the RMS you will see an entry for the MS in a grayed-out state.

This happens because the server successfully registered with the root management server during setup; when it then failed to register with the data warehouse, SCOM setup could not roll back that registration or unregister itself.

Now rerun the setup and it will be successful.

Voila! issue solved.

System Center and DevOps

In this blog, I will share how a DevOps approach is followed and maintained while using the System Center suite of products.

Before going into details, let’s talk about DevOps first.

The definition – DevOps (development and operations) is an enterprise software development phrase used to mean a type of agile relationship between development and IT operations. The goal of DevOps is to change and improve the relationship by advocating better communication and collaboration between these two business units.

Ref: http://www.webopedia.com/TERM/D/devops_development_operations.html

In today’s fast-changing enterprise world, every business leader’s ultimate goal is to be more collaborative and interconnected across business functions, and to do that you need an IT team and technology that enable you.

There are various tools and product suites on the market to help enterprises. However, Microsoft’s System Center suite is a winner in many areas.

In a DevOps approach, from SMEs to large enterprises, we follow the model below:

System Center DevOps Model

 

Monitor:

For monitoring your entire infrastructure, SCOM (System Center Operations Manager) is a great tool. Why, you ask? OOTB it has management packs to monitor your entire Microsoft technology stack, and there are plenty of third-party solutions and adapters that make it easier to integrate with other technologies or solutions.

Service:

When it comes to IT service management, you can rely on SCSM (System Center Service Manager).

It is a great tool to manage all your incidents, service requests, changes, and problems. Of course, you can also use it for release and business relationship management, but those features are not that great. If you combine it with other solutions, it is a wonderful ITSM product.

Manage:

For managing your infrastructure you have SCCM (System Center Configuration Manager). OOTB, the tool can manage all Windows software, including the OS, for desktops, laptops, and servers, and with third-party solutions you can extend it to manage and patch other third-party software.

Automation:

For successful DevOps you need to automate and combine all these functions. This is where SCORC (System Center Orchestrator), along with PowerShell, comes in handy. You can automate almost anything across your infrastructure.

Example Scenarios:

SCOM detects that one of your critical web services is down -> it automatically creates an incident and assigns it to the L1 Wintel team -> the Wintel engineer validates the alert and runs a runbook in SCORC from the SCSM console (which restarts the IIS service) -> now that the service is started, the SCOM alert is auto-resolved and closed -> the incident in the SCSM console is resolved, with all the actions executed in the background captured in the incident logs -> the Wintel engineer notices, from past incidents and from his own experience, that high RAM usage is the root cause of this issue -> he goes to the SCSM console and raises a change request to increase the RAM on the server -> he goes to the SCCM console and checks that the server is compliant with all the latest security patches and critical updates -> once the change is approved in SCSM, he uses SCORC runbooks, which are integrated with SCVMM, to increase the RAM on the server -> weeks later he pulls up a SCOM performance report and verifies that the IIS service has not gone down since the memory was increased.

The above was just a high-level example of how the System Center products work hand in hand. This makes it much easier to manage enterprise-level IT infrastructure.

SCOM DW Report Deployment Errors with SharePoint 2013 MP

Today I am going to talk about one of the annoying errors that has been flooding my SCOM lately.

Error:

Data Warehouse failed to deploy reports for a management pack to SQL Reporting Services Server. Failed to deploy reporting component to the SQL Server Reporting Services server. The operation will be retried.
Exception ‘DeploymentException’: Failed to deploy reports for management pack with version dependent id ‘edf9e0b9-65aa-df29-6729-d16f0005e820’. Failed to deploy linked report ‘Microsoft.SharePoint.Server_Performance_Report’. Failed to convert management pack element reference ‘$MPElement[Name=”Microsoft.SharePoint.Foundation.2013.Responsetime”]$’ to guid. Check if MP element referenced exists in the MP. An object of class ManagementPackElement with ID 75668869-f88c-31f3-d081-409da1f06f0f was not found.
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.DataWarehouse.Deployment.Report

In short, the error was telling me that SCOM was unable to deploy the SharePoint server performance reports to SCOM Reporting Services, which would mean the SharePoint reports are unavailable. However, that was not the case for me: I was able to see all the SP 2013 reports listed in my Reporting pane.

So I took it to be a false positive, and I had been breaking my head over it for almost 3 days trying to resolve the alert.

  • I deleted the SharePoint 2013 MP and added it again
  • Reconfigured it
  • Rechecked all the Run As accounts

But still nothing seemed to fix it.

Then, after a lot of searching, I came across Kevin Holman’s blog, which states that this is a known issue with SharePoint 2013 MP 15.0.4425.1000.

https://blogs.technet.microsoft.com/kevinholman/2013/05/13/configuring-the-sharepoint-2013-management-pack/

Follow the above link for more information on this.

 

Unix / Linux SCOM Cmdlets

Cmdlet                  Description
Get-SCXAgent            Returns list of managed UNIX / Linux computers
Get-SCXSSHCredential    Creates an SSH credential
Install-SCXAgent        Install SCOM agent for discovered UNIX / Linux computers.
Invoke-SCXDiscovery     Invokes the discovery operation for the specified configuration of UNIX / Linux computers.
Remove-SCXAgent         Remove a UNIX or Linux computer from a management group.
Set-SCXResourcePool     Change the managing resource pool for the targeted UNIX or Linux computer.
Uninstall-SCXAgent      Uninstall the UNIX / Linux agent.
Update-SCXAgent         Updates the UNIX / Linux agent
scxcertconfig -list     List the Xplat certificates installed in management group
scxcertconfig -remove   Remove the Xplat certificates installed in management group

Example 1:

Input: Get-SCXAgent

Output: Returns the list of all managed UNIX / Linux agents.

Example 2:

Input: Get-SCXAgent | where {$_.Name -match "X01C-XPSCOM"} | Remove-SCXAgent

Output: No output is displayed; however, the agent whose name matches X01C-XPSCOM is removed from the management group.

Example 3:

Input: scxcertconfig -list

Output: Will display all Xplat certificates installed in management group.

Example 4:

Input : scxcertconfig -remove-all

Output: No output will be displayed however, all Xplat certificates installed in management group will be removed.

SCCM Primary sites design considerations

Today I will discuss scenarios under which you might require multiple primary sites.

As a rule of thumb, use a stand-alone primary site to support management of all of your systems and users. This topology also works when your company’s different geographic locations can be served by a single primary site. To help manage and optimize network traffic, you can use multiple management points and distribution points across your infrastructure.

A stand-alone primary site supports:

  • 175,000 total clients and devices, not to exceed:
    • 150,000 desktops (computers that run Windows, Linux, and UNIX)
    • 25,000 devices that run Mac and Windows CE 7.0

For mobile device management:

  • 50,000 devices by using on-premises MDM
  • 150,000 cloud-based devices

For example, a stand-alone primary site that supports 150,000 desktops and 10,000 Mac or Windows CE 7.0 can support only an additional 15,000 devices. Those devices can be either cloud-based or managed by using on-premises MDM.
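The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration of the limits quoted above (175,000 total, 150,000 desktops, 25,000 Mac / Windows CE 7.0); the constant and function names are hypothetical, and the current numbers should always be taken from the sizing page linked below.

```python
# Limits quoted in the text for a stand-alone primary site.
MAX_TOTAL = 175_000      # total clients and devices
MAX_DESKTOPS = 150_000   # computers that run Windows, Linux, and UNIX
MAX_MAC_CE = 25_000      # devices that run Mac and Windows CE 7.0


def remaining_device_capacity(desktops, mac_ce):
    """Return how many additional devices still fit under the total limit.

    Raises ValueError if a count is negative or a per-category limit
    is exceeded.
    """
    if desktops < 0 or mac_ce < 0:
        raise ValueError("counts must be non-negative")
    if desktops > MAX_DESKTOPS:
        raise ValueError("desktop count exceeds the per-site limit")
    if mac_ce > MAX_MAC_CE:
        raise ValueError("Mac / Windows CE count exceeds the per-site limit")
    return MAX_TOTAL - desktops - mac_ce


# The example from the text: 150,000 desktops and 10,000 Mac / Windows CE devices.
print(remaining_device_capacity(150_000, 10_000))  # 15000
```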

For more information on sizing check https://docs.microsoft.com/en-us/sccm/core/plan-design/configs/size-and-scale-numbers

Now let’s get into the scenarios for considering more than one primary site.

  1. Load balancing across two Primary Sites

This scenario comes into play when you have a Central Administration Site (CAS) and two or more primary sites, with the idea of splitting the clients across the primary sites. In this scenario, if you lose one primary site, you can still support half of your environment until the other primary is recovered.

Below are pros and cons of this design:

Pros

  • If you lose the CAS or one primary, at least one primary is still functional, as are its secondary sites, until the CAS or the other primary is brought back online.

If you have a tight SLA for bringing up SCCM sites, this is your best bet.

Typically, it takes around 3 hours to bring back an SCCM site, provided a backup of the SCCM DB or SCCM site is available.

  • Removes the single point of failure from the design, as clients assigned to the other primaries would still be able to report in and be managed.

If need be, you can also manually switch clients to report to the available primary sites and continue to manage them.

Cons

  • Increased Licensing costs
  • Increased hardware costs
  • Increased SQL Replication
  • Change latency across the infrastructure, as well as locking due to replication latency

  2. Redundancy and High Availability

The data from Primary Sites and the CAS replicates among sites in the hierarchy. The CAS also provides centralized Administration and reporting.

Note that automatic Client Re-assignment does not occur when a Primary Site fails.

When a primary site fails, communication between the primary site and its secondary sites is broken, and the secondary sites cannot be re-parented. Coupled with the fact that clients cannot easily be reassigned in the time it would take to recover the failed primary site, this means there is really no valid reason to do this unless recovering the primary site would take longer than reassigning the clients and reinstalling all of the secondary sites the failed primary had.

However, this becomes valid when precautions for natural-disaster or war-type scenarios are being considered, where the other location won’t be coming back online for quite some time.

  3. Geographic Boundaries

In some scenarios, companies that span different countries require that each continent or country can share data, but also that each country’s or continent’s clients remain manageable on their own. In this case, which is a business case for continuity, it would be feasible to have more than one primary site. The choice to use another primary site should be based on connectivity and client count, because a secondary site or a remote distribution point is usually good enough for geographic separation.

  4. Political reasons, or just that your clients want it

In some scenarios, your client wants multiple primary sites and wants to segregate clients between them just because they are managed by different departments or heads.

There can also be situations where they want to segregate client data and do not want everybody in the organization to have access to all information.

In practice this is not a good reason to have multiple primary sites, as SCCM user roles and permissions can take care of it. Also, the CAS by default has access to all the information across primary sites.

However, there are situations that I have come across where this is required for client satisfaction.

Stop runbook instance by orchestrator

If you are an automation geek, you will come across multiple scenarios where you would like to stop a runbook while it is executing. Unfortunately, OOTB Microsoft Orchestrator does not have any activity that supports this.

There are PowerShell cmdlets that you can use, but they are quite complex and, most importantly, very difficult to trigger automatically (at runbook runtime).

However, there is an awesome ready-made integration pack available from Kelverion: the Kelverion integration pack for runbook management. You can use this IP to stop and start runbooks and to get the runbook status and runbook ID. Obviously this comes at a cost. Now, if you are a geek like me who likes to build things rather than buy them, see: Stop runbook instance

How it works:

The Runbook ID (its unique ID) is taken as an input parameter; you need to fetch it from the Orchestrator database or the Orchestrator web URL.

Once the unique ID is entered, the stop runbook will automatically fetch the instance ID (which is different every time a runbook runs) from the Orchestrator DB and stop only that instance of the runbook.

I hope this will be of  help to the community.

Please post your feedback in comments section below.