Oh, the Coffee Machine is a fertile location

Today’s discussion centred on the results of a report recently run by our head of operations.

Over the past 2 years we have transformed our Service Desk. Significant investment has been made in a new platform to replace a multitude of legacy platforms (I’m sure you recognise the kind of thing, several Remedy, HP and other custom or proprietary tools replaced by a single unified tool with a unified model).

The report is quite insightful. One of the business cases for the new system was that we would have a process that allowed our end-users and customers to create their own tickets, so that no longer would we have the “Excel Batch Trouble Ticket Update” at the end of the week from various departments.

We used to have a situation where end-users in their respective LOB would bypass the Service Desk and simply walk up to their local support staff and ‘command’ assistance. That person would then feel the need to react (unempowered due to their rank, perhaps feeling more allegiance to their LOB than to the Corporate Beast, but frankly, I think, just keen to be seen to do the right thing for their customers as quickly as they can) and would also bypass the trouble ticket system, attempt to resolve the problem and, if they were unable to, engage experts across the IT business directly. At the end of the week, the individual who had initially been ‘called for help’ by the end-user would upload an Excel file with the jobs they had performed and the estimated durations, which would create retrospective trouble tickets in our system. Of course, these were never accurate, had missing data and, more importantly, gave no indication as to the nature of the problem, the resources engaged, etc.

Now although our ‘customers’ were unhappy when they suffered impact, the fact that they had a body to shout at who would react for them was a confidence booster, and we did get a lot of flak when we proposed changing the process for them to an online system. (The reality was that around 80% [!] of all major end-user incidents were handled via this walk-up route rather than through the Service Desk.)

So, with our new “self serve” system having been in place for around 6 months, we were interested to understand the resource impact.

Guess what, our report shows a near 80% reduction in major end-user reported incidents. How good is that!!! That means that, at the same time as deploying the new trouble ticketing platform, we have somehow improved our underlying end-user service delivery infrastructure to the same extent, without any major change to management technology, any infrastructure transformation or any change to process. Our infrastructure has simply, magically, become significantly more reliable.

Doesn’t sound right eh!

So, further investigation was required. We no longer have an Excel spreadsheet, yet end-users are still contacting IT staff directly and getting them to look into their problems (we could name these end-users…we know who they are and where they are, although we will not name them because they are [either] senior [and/or] intrinsic profit centres!).

We have got rid of the Excel spreadsheet, so this workload now has no way of being captured at all.

What is striking here is that, let’s face it, end-users will find a way around systems and our staff will always try to do the right thing for them. There must be a way to channel this end-user requirement through technology the end-user is comfortable with. They all have a BlackBerry, iPhone or Android. They are all socially enabled.

Our lesson for today is that we need to learn how to think like an end-user and find a bypass system which works as well for them as it does for us.

A Funny Thing Happened Around the Coffee Machine Today

March 8, 2012

So we were discussing why our average monthly Critical Incident Ticket count totals more than 2x the number of employees in our [multi-national] company.

And therefore…if each one of our employees is individually affected by 2 major incidents each month, how are we still able to run our business?
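For the avoidance of doubt, the arithmetic is as blunt as it sounds. A back-of-the-envelope sketch, using purely hypothetical headcount and ticket figures rather than our real ones:

```python
# Hypothetical figures for illustration only; ours differ, but the ratio is the same shape.
employees = 10_000                    # headcount of the [multi-national] company
critical_tickets_per_month = 22_000   # average monthly Critical Incident Tickets

ratio = critical_tickets_per_month / employees
print(f"{ratio:.1f} critical incidents per employee per month")  # 2.2, taken literally
```

Taken literally, every employee loses part of their month to at least two ‘critical’ incidents, which is precisely why we doubt the tickets mean what they claim to mean.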

That then got us talking about the fact that:

– the Helpdesk teams (taking the calls and processing the new user- and LOB-created tickets) are unaware of “causes”

– on the opposite side of the fence, our Event Management team are unaware of the impact, and often unaware of who they should escalate to

– AND (and this was the clincher for us) our infrastructure folks are inundated with an ever-increasing list of escalated tickets and sit blissfully unaware of either the causes or the impact, yet have to work both out for themselves.

– while we still get a significant number of LOB-affecting faults where all the proactive monitoring in the world (we have a lot) does not indicate an incident; instead, incidents are triggered by end-users within a LOB calling IT staff directly, resulting in no ticket being created but significant IT resources engaged all the same.

A bit about our organization [from the perspective of one of the ITILosaurus Collective]:

– Global business supported by a unified IT infrastructure department

– A handful of LOBs (end-customer-facing businesses)

– Outsourced L0 and L1 event management and ticketing

In our business, the only constant really is Change. No cliche here, this is reality for us.

We make extensive use of virtualisation and have outsourced commodity infrastructure componentry wherever possible.

We in particular use IBM Tivoli Netcool and are transitioning to a new SaaS Service Desk platform [others in our ITILosaurus collective use BMC Event Manager, BMC Remedy and Service-Now]. We also have our own in-house developed CMDB, although it is not a rich one: it is only about 40% instrumented for entity relationships and around 80% accurate for managed entity content.
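For anyone wondering how we arrive at numbers like “40% instrumented” and “80% accurate”, here is a minimal sketch of the scoring; the CI names and counts are invented for illustration, and the real exercise runs over many thousands of records:

```python
# Invented CIs; a toy of how we score our in-house CMDB.
# "Instrumented for relationships" = CIs whose dependencies are actually recorded.
# "Accurate for content"           = CIs whose attributes matched the last audit.

cmdb_sample = [
    # (ci_name, relationships_recorded, attributes_match_audit)
    ("web-vm-001",  True,  True),
    ("web-vm-002",  False, True),
    ("ora-db-01",   True,  False),
    ("core-sw-lon", False, True),
    ("san-array-2", False, False),
]

total = len(cmdb_sample)
instrumented = sum(1 for _, rels, _ in cmdb_sample if rels)
accurate = sum(1 for _, _, ok in cmdb_sample if ok)

print(f"Relationship instrumentation: {instrumented / total:.0%}")  # 40% in this toy sample
print(f"Managed entity accuracy:      {accurate / total:.0%}")      # 60% here, ~80% for us
```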

We have a huge amount of filtering in our event management, but even so, an aerial view of our operations shows that the ITIL-structured IT operations processes isolate our production IT staff from the monitoring staff, and isolate both from the application owners.

ITIL process recommendations are creating bottlenecks in our business and reducing our ability to deliver a consistent quality of service, because the tools we use to underpin these processes simply are not up to the task.

1. Our event management cannot handle the volume of events our LOBs could throw at it.

2. We have had to create upwards of 150 individual fields in every Event record to handle our LOB correlation, enrichment and filtering without endangering pre-existing Netcool rules, which means our event management schema is getting out of control (see the sketch after this list).

3. Our Service Desk has become a management liability in itself, just to support the many and varied LOB-specific process templates.

4. The CMDB is never going to be accurate. The cost of migration to a commercial product from HP, IBM Tivoli, BMC or others is prohibitive, we know that we’ll never realise the dream anyway and therefore never achieve a return on that investment, so we continue to maintain our own CMDB engineering staff.
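To illustrate point 2: the field names below are invented (this is not our actual alerts.status schema), but the pattern is the one we have ended up with, compared with what a separate enrichment lookup might have looked like:

```python
# Invented field names; illustrative of the per-LOB column sprawl, not our real schema.

# What we ended up with: every Event record carries a family of columns per LOB,
# most of them empty for any given event.
bloated_event = {
    "Node": "ora-db-01",
    "Severity": 5,
    "Summary": "tablespace full",
    "LOB1_Impact": "Trading halted",
    "LOB1_EscalationGroup": "DBA-OnCall",
    "LOB2_Impact": None,
    "LOB2_EscalationGroup": None,
    # ... and so on, up to ~150 LOB-specific fields
}

# What we arguably should have done: keep the event lean and enrich at processing
# time from a lookup keyed on (Node, LOB), leaving the core schema (and the
# pre-existing Netcool rules that depend on it) untouched.
lean_event = {"Node": "ora-db-01", "Severity": 5, "Summary": "tablespace full"}
lob_context = {
    ("ora-db-01", "LOB1"): {"Impact": "Trading halted", "EscalationGroup": "DBA-OnCall"},
}

def enrich(event, lob):
    """Merge LOB-specific context into a copy of the event at processing time."""
    return {**event, **lob_context.get((event["Node"], lob), {})}

print(enrich(lean_event, "LOB1"))
```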

In all, the cost of ownership of our management tools is beginning to outweigh the value we achieve from those tools.

In fact, our coffee machine discussion ended with someone’s sad but very salient thought that: “because of the legacy of filtering, correlation and auto-ticketing we have put in place, we’re possibly constraining change and the real management of the new technologies and outside relationships [Cloud] we’re currently rolling out.”

So, if you are looking for the reason for this long post…Is there anybody out there? Are we all (and I speak for the ITILosaurus collective here) in the same position of buying and maintaining irrelevant service management tools?

Would it not be more effective if we simply threw away all our legacy management tools when it comes to managing our new technology platforms and just waited for customers to call…when they call, we react so fast and with such positive, friendly zeal that, although our MTTR would not change, our customers’ and stakeholders’ perception of us would change for the better?

I am always reminded of a comment I overheard from an Industry Guru several years ago. He said (and I quote): “I always buy Jaguar [this was the 1980s!]. Although I know they are going to break down, I also know that when they do, Jaguar will turn up at my door within 30 minutes with a new car that I will drive until they have resolved my problem. Then they deliver my Jaguar back and all with a smile. OK, if I bought a BMW or Mercedes, they would break down less, but when they do, I am treated like a piece of sh1t by them. That’s why I buy Jaguar.”

We are increasingly finding ourselves in a time where our management tools simply do not add value any more. Instead of paying for them to make it look like we are doing something, when clearly we are being hampered and having to cover it up with ineffective processes, let’s throw them away and have great processes. We can use the cash saved on the tools to give that McDonald’s smile to our customers!

Have you realised an ROI from your BSM?

We have all given up on the pretence that we (a) can and (b) will be able to achieve any form of meaningful Business Service Management or return any usable value from it.

1. On average (we are all from enterprise-scale businesses) we have at least 200 services which should be modelled.

2. We’re unable even to create useful models of services that have any reference to the infrastructure utilised by a specific service, or to the relevant thresholds and events that represent impact to that service at the appropriate times.

3. Even if we could create the models, we would be unable to maintain the changes to them.

Where we have attempted a high-level model of a service, it simply goes red when ANY event occurs, so it is always red!
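To show what “always red” means in practice, here is a minimal sketch (service and CI names invented) of the only kind of model we can realistically maintain: a flat list of CIs per service, with no thresholds, weighting or time context:

```python
# Invented names; a toy of why our high-level service models are permanently red.

# The only model we can realistically maintain: a flat list of CIs per service.
service_model = {
    "Online-Trading": ["web-vm-001", "web-vm-002", "ora-db-01", "core-sw-lon"],
}

# Live events keyed by CI (severity 1 = informational ... 5 = critical).
events = {"web-vm-002": 2, "core-sw-lon": 1}

def service_colour(service):
    """Naive impact: any event on any mapped CI turns the whole service red."""
    return "RED" if any(ci in events for ci in service_model[service]) else "GREEN"

print(service_colour("Online-Trading"))  # RED, even though both events are trivial
```

A useful model would need per-CI thresholds, weighting by role (one web VM out of two down is degraded, not down) and time-of-day context, which is exactly the modelling and maintenance effort none of us can sustain.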

Have you managed to deliver a real BSM platform that your organization uses every day and shows a return on your investment?

– Tell us about it

– Let us know what you use

– How do you maintain the models

We would really like to know!!! We know that we need this functionality to be able to more effectively represent our work rate and quality of service delivery to our customers.

Have you realised an ROI from your CMDB?

We have not realised a return on investment from our implementation of CMDB technology.

Have you? Please comment.

If you are currently planning to procure and deploy a CMDB because your vendor or ITIL Guru tells you you need it, look your CMDB Vendor in the eye and ask:

– Do you have a customer who has realised a return on investment from the implementation of your CMDB?

– Do they have the same scale as us?

– Can we visit them and see their CMDB implementation in action?

Watch very carefully.

ITIL: A Service Management History Lesson

OK, for those of you who go back a little way…feel free to comment on the accuracy of this, but the timeline is supposed to demonstrate how the ITIL Process Weenies Took Over:

~1989/90

(Network Fault) SNMP, HP OpenView, SunNet Manager, Network Managers, Cabletron SPECTRUM…

(Event Management) Net/Alert, MAXM, NetExpert

(Performance) Concord, Desktalk

(Traffic) Etherview, Netmetrix, Netscout, Sniffer…

~1991/92

(Compute and Application Fault) Patrol, Eco-Tools…

(Systems Administration) Tivoli…

(Trouble Ticketing) Remedy, Clarify…

(Event Management) Boole and Babbage CommandPost

~1994

(Event Management) Micromuse Netcool, IBM Tivoli TEC

(Systems Administration) CA-Unicenter

~1995

(Root-Cause) SMARTS

(Accounting) AMDOCS

(Configuration) Metasolv

~1997

(Root-Cause) RiverSoft

(Event Correlation) Micromuse Netcool Impact

(Server Monitoring) Mercury, Netcool ISMs

(Configuration) Cramer

~1998

(Root-Cause) Micromuse Netcool Precision

(Systems Monitoring) NetIQ

(Systems Administration) Voyence, NetOps, Atrium

(Business Service Management) Managed Objects, IBM Tivoli TBSM

~2000

(Predictive Fault Management) Netuitive

*****

OK, so this is a short and mucho abbreviated summary…but just look at it. There have been no fundamentally new innovations in IT Service Management technologies since 2000.

Oh yes, there are derivative works, lots of multi-variate correlation (Integrien etc.), lots of agents (Abilisoft etc.) but nothing that takes a fundamentally new approach to manage the fundamentally new technologies which we use to build our modern infrastructures.

So, what happens…where tools are not fit for purpose, we pile on the people. And what do people who are not equipped for the job (because of inadequate tools) do? They blame everyone else.

So, with a climate of finger-pointing, budget constraint and an IT Department effectively run by accountants, what happens? Outsource!!!

What happens when you outsource?

(1) Invariably you need a measure of the competence of the people for the role they are carrying out – ITIL Maturity!!! (summed up very well here: http://www.itskeptic.org/uselessness-itil-process-maturity-assessment ) – to prove that, although things are not working very well, the people doing the work cannot be blamed for any lack of knowledge, coherence or delivery.

(2) You put processes in place that have totally meaningless Service Level commitments: we answer the call within 4 rings, we always call back within an hour and then every hour after that for the duration of the incident to let you know the status. Never, though, do you get a commitment to resolve. I do love getting a call from one of my outsource suppliers, interrupting whatever I am doing at the time (I am very busy), to tell me that they have nothing to tell me!

(3) You get change-managed to the edge of any profit your company was hoping to make. No matter what anyone tells you, Outsourcing does not pay. You see, we can never actually audit, first time round, all the things our organization does. We always miss things and they come back to bite us as our outsource “partner” increases their charges to support those ‘little’ things that we missed when we originally wrote the contract. They know it already and they are banking on us not being thorough enough…because if we are thorough, you can be damn sure that they cannot deliver the service any cheaper than we can.

The rise of ITIL and its adoption is a direct consequence of the lack of innovation in IT Service Management technologies that are effective at reducing the cost of managing modern infrastructures. ITIL is simply a ‘cover my arse’ badge of merit.

The sad thing, though, is that the real technical experts and technologists, whom we really need to support us, are marginalised by ITIL. Typically these people are not process people, they are doers, but modern IT management is not about doing, it is about being seen to be following and adhering to process. A process followed is better than a problem solved, because a problem solved has no ITIL badge of honour!

So, here we are:

ITIL Rules.

CMDB is fundamentally impossible to implement in a way that achieves a return on investment or delivers value.

No CMDB, No BSM (Business Service Management).

No BSM, No impact management.

No BSM, No Service Management.

No CMDB or BSM, No Continuous Service Improvement.

And the value of ITIL is…

No CMDB? How Did We End Up Like This???

So if you’ve read our earlier post: https://itilosaurus.wordpress.com/2012/03/02/the-cmdb-whats-not-to-love/ then the rest of this post will make sense.

It’s 2012. In our collective companies we all have VLANs, ESX hosts, Load Balancers, SANs, WAN Accelerators, distributed caching etc. Each of our respective employers has outsourced its telecommunications infrastructure. We also use a range of approaches to ‘core infrastructure’ management; either outsourcing to third parties [2 and 3 Letter Acronym vendors] or off-shoring our event management to lower-cost regions. Most of our companies have some level of Cloud services (either in production or for development and test). We all have the goal of running a homogeneously managed hybrid service delivery infrastructure.

We have all tried to use ITIL principles and the CMDB as the core to our Service offerings.

We have all failed to return value on our investment in the CMDB. Read this: He is Right…

http://www.itskeptic.org/itil-cmdb-skeptic

But why is he right?

A Service Management History Lesson.

The sad thing is that ITIL’s rise has corresponded with a lack of innovation from IT Service Management Software Vendors.

This is all driven by Four Coincident Forces.

Force Number One: IBM Tivoli Netcool, CA, BMC, HP and EMC. Since the late 1990s these behemoths have swallowed all the original innovators in the IT Service Management market and they have failed to continue to innovate.

Force Number Two: At the turn of the 21st century we had the dotCOM crash, which left Venture money no longer willing to fund Enterprise software and consequently no money to fund new innovation in the IT Service Management arena.

Force Number Three: The deskilling of the IT Operations department. It started with “collective support Silos”, where staff have had their expertise narrowed, resulting in a neutered support function which has no concept of the services being delivered by its silo, no concept of the impact of changes and no knowledge of its relationship with other infrastructure silos (the CMDB was supposed to automate that!).

Force Number Four: The Demand Infrastructure Transformation. Since the year 2000, just look at the philosophical changes and technological innovations that have been introduced:

  1. The “Service” culture
  2. On-demand capacity
  3. Distributed Compute
  4. Virtualization
  5. The Amorphous Network
  6. Cloud
  7. Peer-to-Peer
  8. etc…

The result of Force One: A gradual increase in the ‘custom services’ offered to overcome the limitations of the technologies they offer (and ramp up their revenues and our dependencies upon them and their technology).

The result of Force Two: Service Management tools which are incapable of delivering upon our management requirements.

The result of Force Three: IT Operations run by accountants and non-technical people (who become reliant on the expertise of the vendor, which in turn is used to validate [make] decisions on management software and services procurement…the Fox in the Chicken Coop?).

The result of Force Four: An end-user customer base who perceive that they can have what they want when they want it, but who end up with a second-rate experience.

The CMDB. What’s Not To Love?

March 2, 2012

Oh the promise of it. The CMDB, in principle, is an entity of great aspiration and beauty.

Simply the idea that we could have a single source of truth for our entire managed infrastructure. A model containing all the entities under management and their interdependent relationships to each other.

Have a Fault? Simply use the CMDB not only to isolate the cause but also to highlight the impact, both internally within our IT real estate and in terms of the effects of the fault on our end-users and customers.

Receive a request to support a new service? The CMDB will show us whether we have the capacity to support it, enable us to quickly identify the managed entities which will support the new service, or show us where the changes need to be made and what will be impacted by them, and then capture those changes to the infrastructure once complete.
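In principle, that impact question is just a walk over the relationship graph. A minimal sketch of the promise, with invented CIs and relationships, and assuming (heroically) that the data is complete and accurate:

```python
# Invented CIs and dependencies; a toy of the impact walk the CMDB is meant to enable.
from collections import deque

# "depends_on": consumer -> the providers it relies upon.
depends_on = {
    "Online-Trading-Service": ["web-vm-001", "web-vm-002"],
    "web-vm-001": ["esx-host-07", "ora-db-01"],
    "web-vm-002": ["esx-host-08", "ora-db-01"],
    "ora-db-01": ["san-array-2"],
}

# Invert to "supports": provider -> its consumers, so we can walk upwards from a fault.
supports = {}
for consumer, providers in depends_on.items():
    for provider in providers:
        supports.setdefault(provider, []).append(consumer)

def impacted_by(faulty_ci):
    """Breadth-first walk up the dependency graph from the faulty CI."""
    seen, queue = set(), deque([faulty_ci])
    while queue:
        ci = queue.popleft()
        for consumer in supports.get(ci, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)

print(impacted_by("san-array-2"))
# ['Online-Trading-Service', 'ora-db-01', 'web-vm-001', 'web-vm-002']
```

The walk, of course, is only as good as the edges in the graph, which is where the rest of this post comes in.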

The CMDB was conceived with the Mainframe, had its gestation with the tail end of Client-Server computing and has become the in-vogue answer to all IT problems since the advent of dynamic computing.

The problem is, it is not the answer.

If you have a CMDB (or a central config reference source database), ask yourself: “How accurate is my data?”

We know from our own experience in our companies that our CMDB completeness ranges from ~15% [I kid you not!] to a claimed ~70%. The average amongst us, though, is a rolling ~45% accuracy, but we all note that it is getting harder and harder even to maintain what we have, as our infrastructures are partially outsourced and we look to utilise hybrid Cloud services across the piste.

The real problem we have, though, is that at any given point in time we do not reliably know which of the records in our CMDB are true.

The reality is that we are unable to complete a Version 1 CMDB which we can then change-manage. Unlike painting the Forth Road or Golden Gate Bridges (where once we finish painting the bridge, we start from the beginning and paint it again), if the CMDB were a bridge we would paint part of the deck, part of the pylons and part of the cables, and while we were painting, parts of what we had painted would be replaced, and parts we were in the middle of painting would be replaced too. Not only do we never get a model of our infrastructure to start with, we never ever get a model, period.
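For what it is worth, the only way any of us can even ask “which records are truth” is to reconcile a CMDB snapshot against whatever discovery says is actually out there. A minimal sketch of that reconciliation, with invented records and the (generous) assumption that the discovery run is the closer-to-truth side:

```python
# Invented records; discovery output is assumed to be nearer the truth than the CMDB.

cmdb = {
    "web-vm-001": {"os": "RHEL 5", "owner": "Trading"},
    "web-vm-002": {"os": "RHEL 6", "owner": "Trading"},
    "old-app-09": {"os": "Win2003", "owner": "HR"},      # decommissioned months ago
}

discovered = {
    "web-vm-001": {"os": "RHEL 6", "owner": "Trading"},  # patched since the last CMDB update
    "web-vm-002": {"os": "RHEL 6", "owner": "Trading"},
    "new-vm-042": {"os": "RHEL 6", "owner": "Payments"}, # spun up, never registered
}

stale   = sorted(set(cmdb) - set(discovered))   # in the CMDB, no longer in the estate
missing = sorted(set(discovered) - set(cmdb))   # in the estate, never made it into the CMDB
drifted = sorted(ci for ci in set(cmdb) & set(discovered) if cmdb[ci] != discovered[ci])
verifiably_true = len(set(cmdb) & set(discovered)) - len(drifted)

print(f"stale: {stale}  missing: {missing}  drifted: {drifted}")
print(f"{verifiably_true} of {len(cmdb)} CMDB records verifiably true at this instant")
```

And in a virtualised, partially outsourced, Cloud-dipped estate, the “discovered” side of that comparison is itself a moving target, which is exactly the bridge we never finish painting.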