Troubleshooting SCOM 2016 MS Grayed out State

While working at a customer site, I came across a situation suddenly where on of the MS was in grayed out state.

Initially I followed the main troubleshooting steps in these situation that we all do:

  • Flush Health service state cache from SCOM console, for MS you can do it from Operations Manager folder -> Management group Health

Click on MS and in RHS under task select ‘Flush Health service state’ It was fine for 2 mins or so and then again back to grayed out state.

So time for next move

  • Deleted the ‘Health service state’ under operations manfer-> Server folder on the MS

Still same thing server is backed to grayed out state.

I am now going through each and every error in event viewer and that’s when I came across below error related to healthservice

SCOM_Config_Override_2 copy

Now it makes sense, so the main culprit here is this workflow.

Actually, this workflow is lined with OMS, but client is neither using OMS nor have they configured it ever.

So, it quite odd to have this pop here.

In the OMS configuration console, nothing is configured. So very odd.

Obviously, you can’t remove the OMS or System Center Advisor or Intelligence MP from SCOM as to do that you have to remove other dependant MPs which is not possible, so what do we do?

Now comes the resolution,

  • GO to Authoring Pane-> Rules type intel
  • You will find the below 3 rules show up in search results.
  • Click on first 2 rules and select enable hit override value as False make sure to select enforce check box too

SCOM_Override_3

  • Repeat it other for the 2nd rule

SCOM_Override_1

  • Save these overrides in a separate MP and not under Default MP
  • Once done, now again flush healthservice state cache and check

Voila! MS is now healthy, when I check event viewer logs, even the error are now gone.

Note: Do this only when you DO not want to connect your on-premise SCOM to OMS

Advertisements

Unix / Linux SCOM Commandlets

CMdLet Description
Get-SCXAgent Returns list of managed UNIX / Linux computers
Get-SCXSSHCredential Creates an SSH credential
Install-SCXAgent Install SCOM agent for discovered UNIX / Linux computers.
Invoke-SCXDiscovery Invokes the discovery operation for the specified configuration of UNIX / Linux computers.
Remove-SCXAgent Remove a UNIX or Linux computer from a management group.
Set-SCXResourcePool Change the managing resource pool for the targeted UNIX or Lunix computer.
Uninstall-SCXAgent Uninstall the UNIX / Linux agent.
Update-SCXAgent Updates the UNIX / Linux agent
scxcertconfig -list List the Xplat certificates installed in management group
scxcertconfig -remove Remove the Xplat certificates installed in management group

Example 1:

Input: get-SCXagent

Output: Will return list of all Unix / Linux managed agents

Example 2:

Input: get-SCXagent | where {$_.Name -match “X01C-XPSCOM”} | Remove-SCXAgent

Output: No output will be displayed however, agent that matches with name X01C-XPSCOM will be removed from management group.

Example 3:

Input: scxcertconfig -list

Output: Will display all Xplat certificates installed in management group.

Example 4:

Input : scxcertconfig -remove-all

Output: No output will be displayed however, all Xplat certificates installed in management group will be removed.

MS SCSM VS Atlasian JIRA Service Desk

Jira SDM SCSM
Overview This is built on Atlassians JIRA workflow engine, JIRA Service Desk offers a collaborative, agile platform and knowledge base that’s low cost, easy to set up. Tool OOTB not fully ITIL compliant This is a product from Microsoft and come with System Center suite. Currently this is in its 3rd generation. Tool OOTB is MOF and ITIL compliant. Offers greater flexibility in terms of integration with other system center and MS products
Ease of use Easy to set up and use. However limited OOTB capabilities, needs a lot of customizations. Easy to set up and use, 2016 edition has better UI experience that previous versions. Also third part applications like ItnetX SCSM Portal, and Cireson SCSM Portal increases UI experience
ITIL Process OOTB only Incident and service request Management, rest modules can be built. However requires effort. OOTB has Incident, service catalogue, change, problem, request fulfillment, release management process
Scalability At the very begining you need to decide what are the features that you will use as this related to design of the solution. Scalability is biggest advantage of SCSM, you can easily scale horizontally or vertically based on your needs.
Automation Capabilities Limited, when it comes to outside Atlasion solutions Enhanced automation capabilities with System center and other MS products like SharePoint, Exchange, AD and other third party products. Need to use System center orchestrator and other third party adapters from Kelverion
Licensing Costs < US$ 40,000 (Approx., will vary based on region) Licensing is based on core so it depends upon your infrastructure sizing
Hardware costs Runs on most VMs technology like VM ware, Hyper-V, Oracle VM Runs on most VMs technology like VM ware, Hyper-V, Oracle VM
Resource costs Medium Low, Tool is very ease to use, however development costs are high
Company Size SME SME
Support Costs Low Low

Patching by Orchestrator Part -2

In this blog i will share with you script and other resources that you can use to automate patching.

Restart-Computer -ComputerName REMOTE_COMPUTER_NAME -Force – Wait WinRM

This will force server to restart and return a success code only after the server restarts and WinRM service is up and running. This will ensure that the steps that you have arranged to be executed after a server reboot happens correctly and they do not get failed.

Like i said i in my earlier Patching by Orchestrator Part -1 Blog that i am unable to share complete runbooks or scripts that i have used due to NDA with my employer.

If you know of more script or other innovative ways of automating patching then please post them in comments section.

Note: Credit and risk for all the script and links that i have mentioned here goes to their respective authors.

SCCM Primary sites design considerations

Today I will discuss scenarios under which you might require multiple primary sites.

As a thumb rule use a stand-alone primary site to support management of all of your systems and users. This topology is also successful when your company’s different geographic locations can be successfully served by a single primary site. To help manage network traffic, you can use multiple management and distribution points across your infrastructure to optimize network traffic.

A stand-alone primary site supports:

  • 175,000 total clients and devices, not to exceed:
    • 150,000 desktops (computers that run Windows, Linux, and UNIX)
    • 25,000 devices that run Mac and Windows CE 7.0

For mobile device management:

  • 50,000 devices by using on-premises MDM
  • 150,000 cloud-based devices

For example, a stand-alone primary site that supports 150,000 desktops and 10,000 Mac or Windows CE 7.0 can support only an additional 15,000 devices. Those devices can be either cloud-based or managed by using on-premises MDM.

For more information on sizing check https://docs.microsoft.com/en-us/sccm/core/plan-design/configs/size-and-scale-numbers

Now let’s get into scenarios of considering more than 1 Primary sites

  1. Load balancing across two Primary Sites

This scenario comes into play when you will have a Central Administration Site (CAS), and 2 or more Primary Sites with the thought of splitting the clients across multiple primary sites, in this scenarios if you lose one Primary site, you could still support half of your environment until the other Primary is recovered.

Below are pros and cons of this design:

Pros

  • If you lose the CAS or One Primary, then at least one Primary is still functional, as are its Secondary Sites until the CAS or other Primary is brought back online.

The deciding factor for this is if you have a tight SLA in bringing up SCCM sites then this is your best bet.

Typically, it takes around 3 hours to bring back SCCM sites if you have SCCM DB as SCCM site backup available.

  • Removes the Single Point of Failure scenario from the design, as clients assigned to other primaries would still be able to report in and be managed.

If need be, you can also manually switch clients to report to the available primary sites and continue to manage them

Cons

  • Increased Licensing costs
  • Increased hardware costs
  • Increased SQL Replication
  • Change latency across the Infrastructure as well as Locking due to replication latency
  1. Redundancy and High Availability

The data from Primary Sites and the CAS replicates among sites in the hierarchy. The CAS also provides centralized Administration and reporting.

Note that automatic Client Re-assignment does not occur when a Primary Site fails.

The result of a Primary Site failure is that the Primary Site and its Secondary sites communication are now broken, and the Secondary Sites cannot be re-parented. This coupled with the fact that the Client cannot be easily re-assigned in the time it would take to recover the failed Primary Site means there is really not a valid reason to do this unless the time it will take you to recover the Primary site, is greater than the time it would take to reassign and reinstall all of the Secondary sites the failed primary had.

However, this becomes valid when the scenario of Natural Disaster or War Type precautions for redundancy are being considered where the other location won’t be coming back online for quite some time.

  1. Geographic Boundaries

In some scenarios, companies across different countries require that each continent or country can share data, but that they also must be able to still support their country or continents clients must still be manageable. In this case, which is a business case for continuity; it would be feasible to have more than one Primary Site. Making the choice to use another Primary site in this case should be based on connectivity and client count because just using a Secondary site or remote Distribution point should be good enough for Geographic separation.

  1. Political or just that your clients want it

In some scenarios, your client you want multiple primary sites and segregate clients between them just because they are being managed by different departments or heads.

There can also be situations where they want to segregate data clients between and do not want everybody in the organization to have to access to all information.

Practically this cannot be a good reason to have multiple primary sites as SCCM user roles permissions can take care of it. And CAS by default will have access to all the information across primary sites.

However, there are situations that I have come across where this is required for client satisfaction.

Patching by Orchestrator Part -1

Today i will explain you how you can achieve patching by orchestrator including complex patching procedures.

Below is list of software that you will need:

  1. Excel – To put the steps that needs to be carried out in a sequential order
  2. MS SQL DBs
  3. PS Scripts
  4. MS Orchestrator
  5. MS SCCM
  6. MS SCSM (incase you want to make patching a self service offering)

Now a few things to keep things in order:

  • In a Excel sheet arrange all steps that needs to be automated, below is just an example of column headers:

SequenceNo ActionType ComputerName Parameter1 Parameter2 Parameter3 Parameter4 Parameter5 Parameter6 Outcome Expected Patching by Orchestrator Excel sheet Template Patching by Orchestrator Template

  1. Now create DBs called ‘SteptoExecute’ in a SQL instance where you will upload the steps created in the previous steps, my suggestion is to create it in Orchestrator instance itself
  2. Another DBs called ‘ExecutedSteps’
  3. Now you have to create 1 main runbook which in turn will call multiple other runbooks that you need to create in next step.
  4. Here runbooks should be arranged to first read steps from ‘SteptoExecute’ and then invoke runbooks based on sequence#
  5. Other multiple runbooks that you need to create will mainly depend upon complexity of steps, here i will just give a high level of runbooks that will typical be required
    • SCCM Patch push code runbook
    • SCCM patching status code check runbook
    • Service Start runbook – Runbook will take service name
    • Service Stop runbook
    • Service restart runbook
    • Computer restart
    • Computer restart with timeout
    • Computer restart with check when it’s back online
    • Run Program- Can use to run a batch file
    • Web Status code checker – can be used to check response status code of web application – Click to download runbook  web application response status code checker
    • Email activity runbook – to send status of steps executed over an email
  6. Now use orchestrator runbook to invoke steps from the DBs in the sequential order arranged, use SQL DB read activity or a PS script
  7. Each step has an associated action type with it as explained earlier, which in turn will call the runbooks and runbooks will execute the steps such as stop/start/restart services, run batch files, rename files, start a web service, stop a web site and etc
  8. Make sure to make runbooks names same as action type, it will easier to invoke runbooks in the previous step
  9. Once all the steps are executed a email activity combines all the steps executed along with the results that are saved in ‘ExecutedSteps’ Db

Download link to Visio Diagram

In the next part i will share with you scripts that you will need and reference to other MVP and system center blogs which will be helpful to you.

However, due to NDA with my ex employer i will not be able to share complete runbooks or scripts that i have used. 

Please note, do test your runbooks in a dev environment first. Sometimes it can take a few trial and error in getting the steps in a sequence order.

Patching by Orchestrator Part -2

Hex & RGB Color Scale

While designing forms in Service Manage or make changes to colour scheme in SCOM you need hex colour codes.

I needed it while making changes to colour scheme in service manger self service portal and had to go to multiple portals to generate value for each colour. So i decided to put all the codes in excel sheet and publish it for everyone who needs it.

Download file: Hex & RGB Color Scale