System Center and DevOps

In this post, I will share how a DevOps approach can be followed and maintained using the System Center suite of products.

Before going into details, let’s talk about DevOps first.

The definition – DevOps (development and operations) is an enterprise software development phrase used to mean a type of agile relationship between development and IT operations. The goal of DevOps is to change and improve the relationship by advocating better communication and collaboration between these two business units.

Ref: http://www.webopedia.com/TERM/D/devops_development_operations.html

In today’s fast-changing enterprise world, every business leader’s ultimate goal is to be more collaborative and interconnected across business functions, and to do that you need an IT team and technology that enable you.

There are various tools and product suites on the market to help enterprises. However, Microsoft’s System Center suite is a winner in many areas.

In the DevOps approach, for anything from an SME to a large enterprise, we follow the model below:

[Figure: System Center DevOps Model]

 

Monitor:

For monitoring your entire infrastructure, SCOM (System Center Operations Manager) is a great tool. Why, you ask? Out of the box (OOTB) it has management packs to monitor all of your Microsoft technologies, and there are plenty of third-party solutions and adapters that make it easier to integrate with other technologies.

Service:

When it comes to IT service management, you can rely on SCSM (System Center Service Manager).

It is a great tool for managing all your incidents, service requests, changes and problems. Of course, you can also use it for release and business relationship management, but those features are not that great. If you combine it with other solutions, it is a wonderful ITSM product.

Manage:

For managing your infrastructure you have SCCM (System Center Configuration Manager). OOTB, the tool can manage all Windows software, including the OS, on desktops, laptops and servers, and with third-party solutions you can extend it to manage and patch other third-party software.

Automation:

For successful DevOps you need to automate and combine all these functions. This is where SCORC (System Center Orchestrator), along with PowerShell, comes in handy. You can automate almost anything across your infrastructure.

Example Scenarios:

  1. SCOM detects that one of your critical web services is down and automatically creates an incident, which is assigned to the L1 Wintel team.
  2. The Wintel engineer validates the alert and runs a SCORC runbook from the SCSM console, which restarts the IIS service.
  3. With the service started again, the SCOM alert is automatically resolved and closed.
  4. The incident in the SCSM console is resolved, with all the actions executed in the background captured in the incident log.
  5. From past incidents and his own experience, the Wintel engineer recognizes that high RAM usage is the root cause of this issue.
  6. He raises a change request in the SCSM console to increase the RAM on the server.
  7. He checks in the SCCM console that the server is compliant with all the latest security patches and critical updates.
  8. Once the change is approved in SCSM, he uses a SCORC runbook, which integrates with SCVMM, to increase the RAM on the server.
  9. Weeks later, he pulls a performance report from SCOM and verifies that the IIS service has not gone down since the memory was increased.

This was just a high-level example of how the System Center products work hand in hand, which makes it much easier to manage enterprise-level IT infrastructure.

Resolve issues with SCOM Maintenance Mode

SCOM has had a maintenance mode feature since 2007 R2, which is useful when you need to put objects into maintenance mode from the current time until a future time.

However, starting with SCOM 2016 we get the nice feature of scheduling maintenance mode in advance.

Even in earlier versions of SCOM you can schedule maintenance mode. Check this link for more information

Now let’s look at what makes this possible.

On the back end there is a stored procedure, “dbo.p_MaintenanceModeJob”, which is executed by another stored procedure called “dbo.p_ScheduledJobsEveryFiveMinutes”; as the name says, it runs every 5 minutes.

That is why the minimum duration for maintenance mode is 5 minutes, and also why it can sometimes take up to 5 minutes for an object to come out of maintenance mode.
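The five-minute granularity can be made concrete with a little arithmetic. The sketch below is a simplified model, not SCOM code: it assumes the job fires on an exact 300-second cadence counted from an arbitrary time zero, and the function name is purely illustrative.

```python
JOB_INTERVAL_S = 300  # cadence implied by the name p_ScheduledJobsEveryFiveMinutes


def seconds_until_processed(scheduled_end_s: int, interval_s: int = JOB_INTERVAL_S) -> int:
    """Seconds between a scheduled end time and the first job run strictly after it.

    Assumption: job runs happen at 0, interval_s, 2 * interval_s, ... and a run
    only picks up entries whose scheduled end lies strictly before the run time.
    """
    next_run = (scheduled_end_s // interval_s + 1) * interval_s
    return next_run - scheduled_end_s


# Scheduled ends right on a run, just after a run, and just before the next run.
delays = {t: seconds_until_processed(t) for t in (600, 601, 899)}
```

Under these assumptions `delays` evaluates to `{600: 300, 601: 299, 899: 1}`: an object whose window ends just after a job run waits almost the full five minutes, while one that ends just before a run is released almost immediately.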

 

So, if your SCOM objects are not coming out of maintenance mode, or are not going into maintenance mode automatically, check this procedure first and verify whether it is actually running.

The definition of “dbo.p_ScheduledJobsEveryFiveMinutes” is:

USE [OperationsManager]
GO
/****** Object:  StoredProcedure [dbo].[p_ScheduledJobsEveryFiveMinutes]    Script Date: 9/25/2017 3:20:56 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[p_ScheduledJobsEveryFiveMinutes]
AS
BEGIN
    SET NOCOUNT ON
    EXEC dbo.p_MaintenanceModeJob
    EXEC dbo.p_MaintenanceScheduleJob
    EXEC dbo.p_JobStatusTimeout
    RETURN 0
END

and the definition of “dbo.p_MaintenanceModeJob”, which it executes, is:

USE [OperationsManager]
GO
/****** Object:  StoredProcedure [dbo].[p_MaintenanceModeJob]    Script Date: 9/25/2017 3:18:33 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[p_MaintenanceModeJob]
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @Err int
    DECLARE @TranCount int
    DECLARE @StartTime datetime
    DECLARE @Now datetime
    DECLARE @TempMaintMode TABLE (BaseManagedEntityId uniqueidentifier,
                                  StartTime datetime)

    SET @TranCount = @@TRANCOUNT
    SET @Now = getutcdate()

    -- Collect the entries that are still open but whose scheduled end time has passed
    INSERT INTO @TempMaintMode
    SELECT BaseManagedEntityId, StartTime
    FROM dbo.MaintenanceMode
    WHERE EndTime IS NULL AND ScheduledEndTime < @Now

    SET @Err = @@ERROR
    IF (@Err <> 0) GOTO Error_Exit

    BEGIN TRAN

    -- Take those entities out of maintenance mode
    UPDATE dbo.MaintenanceMode
    SET IsInMaintenanceMode = 0,
        EndTime = @Now,
        LastModified = @Now
    WHERE BaseManagedEntityId IN (SELECT BaseManagedEntityId FROM @TempMaintMode)

    SET @Err = @@ERROR
    IF (@Err <> 0) GOTO Error_Exit

    -- Close the matching rows in the history table
    UPDATE dbo.MaintenanceModeHistory
    SET EndTime = @Now,
        LastModified = @Now
    FROM @TempMaintMode AS TM
    WHERE MaintenanceModeHistory.BaseManagedEntityId = TM.BaseManagedEntityId
    AND MaintenanceModeHistory.StartTime = TM.StartTime

    SET @Err = @@ERROR
    IF (@Err <> 0) GOTO Error_Exit

    COMMIT TRAN
    RETURN 0

Error_Exit:
    -- Roll back only the transaction opened by this procedure
    IF (@@TRANCOUNT > @TranCount)
    BEGIN
        ROLLBACK TRAN
    END
    RETURN 1
END
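To see the effect of the procedure without touching a real OperationsManager database, its core logic can be replayed against an in-memory SQLite table. This is only a sketch under stated assumptions: SQLite stands in for SQL Server, the table is cut down to the columns the procedure touches, the history-table update is left out, and the entity names and timestamps are made up.

```python
import sqlite3


def end_expired_maintenance(conn: sqlite3.Connection, now: str) -> int:
    """Close maintenance entries whose scheduled end has passed, as p_MaintenanceModeJob does."""
    with conn:  # single transaction: commit on success, roll back on error
        expired = conn.execute(
            "SELECT BaseManagedEntityId FROM MaintenanceMode "
            "WHERE EndTime IS NULL AND ScheduledEndTime < ?",
            (now,),
        ).fetchall()
        conn.executemany(
            "UPDATE MaintenanceMode "
            "SET IsInMaintenanceMode = 0, EndTime = ?, LastModified = ? "
            "WHERE BaseManagedEntityId = ?",
            [(now, now, row[0]) for row in expired],
        )
    return len(expired)


# Hypothetical data: two servers in maintenance mode, only one window has expired.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MaintenanceMode ("
    "BaseManagedEntityId TEXT PRIMARY KEY, IsInMaintenanceMode INTEGER, "
    "ScheduledEndTime TEXT, EndTime TEXT, LastModified TEXT)"
)
conn.executemany(
    "INSERT INTO MaintenanceMode VALUES (?, 1, ?, NULL, NULL)",
    [("server-a", "2017-09-25 15:00:00"), ("server-b", "2017-09-25 16:00:00")],
)

ended = end_expired_maintenance(conn, "2017-09-25 15:20:00")
state = dict(conn.execute(
    "SELECT BaseManagedEntityId, IsInMaintenanceMode FROM MaintenanceMode"
))
```

After the call, `ended` is 1 and `state` shows server-a out of maintenance mode while server-b stays in it. If the job never runs, nothing flips `IsInMaintenanceMode` back to 0, which is why objects appear stuck in maintenance mode.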

 

By Mistake Deleted Microsoft.SystemCenter.SecureReferenceOverride

Did you delete the Microsoft.SystemCenter.SecureReferenceOverride MP by mistake or unknowingly?

Don’t worry, in this blog post I will help you recreate it.

You will come across this MP when you try to remove some other management pack and SCOM says it cannot be removed because Microsoft.SystemCenter.SecureReferenceOverride depends on it.

Someone new to SCOM will probably go ahead and delete it, just to be able to remove the other management pack. But that can cause serious issues in your SCOM infrastructure.

Before I tell you what the issues are and how to resolve them, let me explain what this management pack does for us.

The Microsoft.SystemCenter.SecureReferenceOverride management pack stores the associations between Run As accounts and Run As profiles.

So, when trying to remove this MP, the first thing that comes to mind is to remove all the Run As account and Run As profile associations. However, it does not always work that way. Even after you have removed the references, SCOM may still say that there are dependencies and refuse to remove the management pack.

So here is the solution:

  1. Export the Microsoft.SystemCenter.SecureReferenceOverride MP and manually remove all references in it to the MP you want to delete; refer to the blog by Marnix Wolf for more information.
  2. The second scenario is where you have already deleted the MP by mistake or unknowingly.

In that case you might be able to remove the MP that you wanted to remove, but you will immediately see a flood of alerts in SCOM complaining about Run As accounts not being available, along with a bunch of SCOM data warehouse related alerts.

Now your only way to get things right is to recollect all the Run As profile and Run As account associations and recreate them.

Go to Administration -> Run As Configuration -> Profiles

Then go through all the Run As profiles that need an account associated. Typically these will be the various SQL and SharePoint Run As profiles, the Linux Run As profiles, and the Data Warehouse account.

As soon as you associate the first one, you will see that the Microsoft.SystemCenter.SecureReferenceOverride management pack gets recreated.

Once you have done this for all of the Run As profiles, voila! You have solved it.

Congrats! Now everything is back in order.

SCOM DW Report Deployment Errors with SharePoint 2013 MP

Today I am going to talk about one of the annoying errors that has been flooding my SCOM lately.

Error:

Data Warehouse failed to deploy reports for a management pack to SQL Reporting Services Server. Failed to deploy reporting component to the SQL Server Reporting Services server. The operation will be retried.
Exception ‘DeploymentException’: Failed to deploy reports for management pack with version dependent id ‘edf9e0b9-65aa-df29-6729-d16f0005e820’. Failed to deploy linked report ‘Microsoft.SharePoint.Server_Performance_Report’. Failed to convert management pack element reference ‘$MPElement[Name=”Microsoft.SharePoint.Foundation.2013.Responsetime”]$’ to guid. Check if MP element referenced exists in the MP. An object of class ManagementPackElement with ID 75668869-f88c-31f3-d081-409da1f06f0f was not found.
One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.DataWarehouse.Deployment.Report

In short, the error was telling me that SCOM was unable to deploy the SharePoint server performance reports to SCOM Reporting Services, which would mean that the SharePoint reports are unavailable. However, that was not the case for me: I could see all the SP 2013 reports listed in my Reporting pane.

So I took it to be a false positive, and I spent almost 3 days racking my brain trying to resolve the alert.

  • I deleted the SharePoint 2013 MP and added it again
  • Reconfigured it again
  • Rechecked all Run As accounts

But still nothing seemed to fix this thing.

Then, after a lot of searching, I came across Kevin Holman’s blog, which states that this is a known issue with SharePoint 2013 MP 15.0.4425.1000.

https://blogs.technet.microsoft.com/kevinholman/2013/05/13/configuring-the-sharepoint-2013-management-pack/

Follow the above link for more information on this.

 

Non Standard CCMSetup Error Codes

While troubleshooting SCCM site server roles or clients, you will come across a lot of errors. Some of the error codes are easy to understand thanks to the CMTrace tool.

You will find this tool on your SCCM media under the tools folder. Just open the log with CMTrace, copy the error code highlighted in red, press CTRL+L and paste the error code; you will immediately get a pop-up that shows what the error means.

However, there are a few cases where the SCCM team decided to use non-standard error codes, which means that whatever CMTrace tells you is not correct. If you are in that kind of situation, the table below is for you.

If you know more non-standard codes, share them in comments below.

 

Error Code    Meaning
0             Success
6             Error
7             Reboot Required
8             Setup already running
9             Prerequisite evaluation failure
10            Setup manifest hash validation failure
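If you parse logs or wrap ccmsetup in a script, the table above can be kept as a small lookup. This is a minimal sketch: the dictionary holds only the codes listed in this post, and the function name and fallback message are illustrative, not part of any SCCM tooling.

```python
# Non-standard CCMSetup codes from the table above (nothing beyond the post's list).
CCMSETUP_EXIT_CODES = {
    0: "Success",
    6: "Error",
    7: "Reboot Required",
    8: "Setup already running",
    9: "Prerequisite evaluation failure",
    10: "Setup manifest hash validation failure",
}


def describe_ccmsetup_exit_code(code: int) -> str:
    """Return the meaning from the table, or point back to CMTrace for anything else."""
    return CCMSETUP_EXIT_CODES.get(
        code, f"Code {code} is not in the table; look it up with CMTrace (CTRL+L)"
    )


for code in (7, 9, 1603):
    print(code, "->", describe_ccmsetup_exit_code(code))
```

Codes that are not in the table fall through to the usual CMTrace lookup, so the helper never claims a meaning it does not know.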

 

Troubleshooting SCOM 2016 MS Grayed out State

While working at a customer site, I suddenly came across a situation where one of the MS (management servers) was in a grayed-out state.

Initially I followed the usual first troubleshooting steps for this situation:

  • Flush the Health Service state cache from the SCOM console. For an MS you can do it from the Operations Manager folder -> Management Group Health

Click on the MS and, on the right-hand side under Tasks, select ‘Flush Health service state’. It was fine for 2 minutes or so and then went back to the grayed-out state.

So time for next move

  • Deleted the ‘Health Service State’ folder under the Operations Manager -> Server folder on the MS

Still the same: the server went back to the grayed-out state.

I then went through each and every error in Event Viewer, and that’s when I came across the error below, related to the Health Service:

[Screenshot: SCOM_Config_Override_2]

Now it makes sense: the main culprit here is this workflow.

This workflow is linked to OMS, but the client is not using OMS, nor have they ever configured it.

So it is quite odd to see it pop up here.

In the OMS configuration console, nothing is configured. Very odd.

Obviously, you can’t remove the OMS, System Center Advisor or Intelligence MPs from SCOM, because to do that you would have to remove other dependent MPs, which is not possible. So what do we do?

Now comes the resolution:

  • Go to the Authoring pane -> Rules and type intel in the search box
  • You will find 3 rules in the search results.
  • Take the first 2 rules one at a time: select the rule, override Enabled with the value False, and make sure to select the Enforced check box too

[Screenshot: SCOM_Override_3]

  • Repeat it for the 2nd rule

[Screenshot: SCOM_Override_1]

  • Save these overrides in a separate MP and not in the Default MP
  • Once done, flush the Health Service state cache again and check

Voila! The MS is now healthy, and when I check the Event Viewer logs, even the errors are gone.

Note: Do this only if you do NOT want to connect your on-premises SCOM to OMS.

Unix / Linux SCOM Cmdlets

Cmdlet                   Description
Get-SCXAgent             Returns the list of managed UNIX / Linux computers.
Get-SCXSSHCredential     Creates an SSH credential.
Install-SCXAgent         Installs the SCOM agent on discovered UNIX / Linux computers.
Invoke-SCXDiscovery      Invokes the discovery operation for the specified configuration of UNIX / Linux computers.
Remove-SCXAgent          Removes a UNIX or Linux computer from a management group.
Set-SCXResourcePool      Changes the managing resource pool for the targeted UNIX or Linux computer.
Uninstall-SCXAgent       Uninstalls the UNIX / Linux agent.
Update-SCXAgent          Updates the UNIX / Linux agent.
scxcertconfig -list      Lists the Xplat certificates installed in the management group.
scxcertconfig -remove    Removes the Xplat certificates installed in the management group.

Example 1:

Input: Get-SCXAgent

Output: Returns a list of all managed UNIX / Linux agents.

Example 2:

Input: Get-SCXAgent | where {$_.Name -match "X01C-XPSCOM"} | Remove-SCXAgent

Output: No output is displayed; however, the agent whose name matches X01C-XPSCOM is removed from the management group.
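One thing to keep in mind with Example 2 is that PowerShell's -match is a case-insensitive, unanchored regular expression match, so it selects every agent whose name contains the pattern. The sketch below mimics that behaviour in Python to show which names would be piped to Remove-SCXAgent; the agent names are hypothetical and the helper is only an approximation of -match for simple patterns like this one.

```python
import re


def ps_match(name: str, pattern: str) -> bool:
    """Approximate PowerShell -match: case-insensitive regex search anywhere in the string."""
    return re.search(pattern, name, re.IGNORECASE) is not None


# Hypothetical agent names, for illustration only.
agents = [
    "X01C-XPSCOM",
    "x01c-xpscom.contoso.local",
    "X01C-XPSCOM2",
    "X02C-XPSCOM",
]

# Names the -match filter would select (and therefore remove).
selected = [a for a in agents if ps_match(a, "X01C-XPSCOM")]

# Names an exact, case-insensitive comparison would select.
exact = [a for a in agents if a.lower() == "x01c-xpscom"]
```

Here `selected` contains the first three names, while `exact` contains only `X01C-XPSCOM`. If you want to remove exactly one agent, comparing with -eq instead of -match, or running the filter without Remove-SCXAgent first to review the result, avoids surprises.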

Example 3:

Input: scxcertconfig -list

Output: Displays all Xplat certificates installed in the management group.

Example 4:

Input: scxcertconfig -remove-all

Output: No output is displayed; however, all Xplat certificates installed in the management group are removed.