Metrics. Not Just For Breakfast Anymore

Over the past couple of years, I have found myself being drawn back to my IT roots, looking to solve the same old problems that plagued IT when I was so much younger, had a full head of hair, and still had to learn that I hadn’t learned it all quite yet.  Back in the day, my boss asked me how the systems were running, and how IT was performing.

I thought a moment, and responded, “All of the systems appear to be running well, we haven’t had any downtime lately, and the server room is humming along nicely.”  He waited.  I broke the silence with “It’s all good.”  My boss, being the patient and well-mannered fellow that he was, reiterated, “So the systems are all up, but how is IT doing?  Are we at capacity on any of the systems, and are our processes working like they should?”  I couldn’t respond honestly, so I admitted it.  He had never asked me before how our processes were working, so it must have been all that golf he had been playing lately that had gotten to him.  We were blind to whether we were doing the right things, and doing them well or poorly.  My engineers and I had put together some fantastic systems and processes for the company (reliable, scalable, capable), but had forgotten to consider how we would be able to measure when we needed to scale, improve, support, or replace them.  DOH!  We did have basic system health gauges, but that was just for monitoring CPU and RAM thresholds.  Time to think bigger, and smaller.

Why do we collect metrics?  Metrics are a critical component of management, whether you are managing information security, projects, or programs.  If you aren’t monitoring your exposures and measuring your results, how will you know whether you have been successful?  IT is all about strategy.  We implement systems in order to meet business objectives.  IT systems support the objectives of the business.  The business could still run without IT.  Ineffectively, inefficiently, and at a much slower pace, but the business could still run.  Without metrics, how do you prove the value that your IT or Security team is bringing to the organization?  How do you justify continued spending on improvements, new tools, new technologies?

Another example, my mid-life crisis.  My wife allowed me to buy a Jeep, recently.  Always wanted one, had to have one.  My Jeep is a complex system designed to convey me, my lovely wife, our grandkids, dogs, and our worldly goods from place to place, effectively, efficiently, and at an elevated but controllable pace.  It has a fuel gauge, oil pressure indicator, and a speedometer on it, and it is also outfitted with a computerized system for monitoring fluid levels, time between service checks, tire pressure, and other critical components.  I monitor the speedometer closely, pay attention to the oil pressure indicator (it’s finicky), and keep an eye on the fuel gauge day-to-day.  The other items are hooked up to a computer because, although they are critical, they don’t need my constant attention.  Each has an acceptable performance level set as a baseline, and the computer compares readings against that baseline over time and flags the deviations that would indicate problems.

The same is true for IT & IS projects, functions and processes.  Let’s look at some of the more critical ones.

Project Metrics:

  • Project / Program Schedule  – Planned versus Actual
    • This is a day-to-day trackable item that should provide indicators of a project’s progress towards success.  Falling behind or moving too fast could be precursors of trouble ahead.  Being ahead of schedule may sound like a good thing, but if you get too far ahead, you may end up with an idle team that is wasting time and growing discouraged or inattentive, and you may lose those idle resources because the perception is that you don’t really need them.
  • Phase Progress – The percent designed, implemented, tested, released, etc.
    • It is always good to mini-schedule your current phase, and keep your finger on the pulse of current activities.
  • Key Deliverables
    • It is important to watch for progress on specific deliverables, and watch for dependencies that delay other deliverables, especially those along the “critical path”, the path that indicates the minimum activity and deliverables for success.
  • Milestones Planned versus Actual
    • Again, these are key objectives along the critical path.  Watch again for delayed deliverables or resource conflicts.
  • Cost
    • Planned versus Actual to date
      • This measurement should track as close to plan as possible in order to avoid surprises, but in the eventuality that there are deviations, it is always good to have as much warning as possible.
    • Planned Total at Completion versus Revised Estimate at Completion
      • This measurement is a lessons-learned metric that can indicate issues with your original plan, and if tracked over time, can help pinpoint when and why the wheels started to come off.
  • Defects
    • Obviously, no one wants or plans for defects, but it is wise to track their discovery, planning, correction, time and costs.
  • Resources – People, hardware, software, tools, etc.
    • Track availability, conflicts, time spent, associated costs.
    • Compare Planned versus Actual
  • Risks – New, Closed, Open & Trending
  • Issues – New, Closed, Open & Trending
  • Deliverables – Planned versus Actual to date
  • Tests – Planned versus Actual Conducted & Trending
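
Most of the items above boil down to the same arithmetic: planned versus actual, and how far apart they are.  Here is a minimal sketch of that comparison in Python.  The `PlannedVsActual` class, the `flag_deviations` helper, the 10% tolerance, and all of the numbers are assumptions of mine for illustration, not part of any particular project management tool.  Note that it flags deviations in both directions, since being too far ahead can be a warning sign too.

```python
from dataclasses import dataclass


@dataclass
class PlannedVsActual:
    """One tracked item, e.g. cost to date or tests conducted (hypothetical structure)."""
    name: str
    planned: float
    actual: float

    @property
    def variance(self) -> float:
        # Positive means over plan, negative means under plan.
        return self.actual - self.planned

    @property
    def variance_pct(self) -> float:
        # Percentage deviation relative to the plan; 0.0 if nothing was planned.
        if self.planned == 0:
            return 0.0
        return 100.0 * self.variance / self.planned


def flag_deviations(items, tolerance_pct=10.0):
    """Return names of items deviating from plan by more than the tolerance, in either direction."""
    return [i.name for i in items if abs(i.variance_pct) > tolerance_pct]


# Illustrative numbers only.
items = [
    PlannedVsActual("Cost to date", planned=100_000, actual=118_000),
    PlannedVsActual("Milestones reached", planned=4, actual=4),
    PlannedVsActual("Tests conducted", planned=40, actual=30),
]
for item in items:
    print(f"{item.name}: {item.variance:+.0f} ({item.variance_pct:+.1f}%)")
print("Needs attention:", flag_deviations(items))
```

The tolerance is the part worth arguing about with your stakeholders; the right value depends on the project and on how much warning you want.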

Security Controls

  • Purpose of the control.
    • It is imperative to understand what a given control does.  Although not truly a metric, in that it can’t be clearly measured, it is a success criterion for what you will ultimately be measuring against.  Understand:
      • The Threats – What actors could do your environment harm?
      • The Risks – What could those actors do to cause harm?
      • The Impacts – What would be the result if threats acted?
        • Technical / Operational (SLA, resources, downtime, etc.)
        • Financial
        • Reputational
      • The Vulnerabilities / Attack Surface
        • How much of the control or environment is exposed both externally, and to insiders?
        • What known vulnerabilities does the control expose, introduce, or not protect against?
  • Issues – New, Closed, Open & Trending
    • Uptime
    • Unplanned outages
    • Number of events identified
    • Number of suspicious events
    • Number of events escalated into investigations
    • Number of confirmed incidents
    • Number of incidents opened
    • Number of incidents remaining open
    • Number of incidents closed
    • Number of incidents escalated to Management (impactful)
    • Number of critical incidents
  • Risks – New, Closed, Open & Trending
  • Deliverables – Planned versus Actual to date
    • These can be reports generated, action requests, advisories issued, etc.
  • Tests – Planned versus Actual Conducted & Trending
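
“New, Closed, Open & Trending” shows up several times in the lists above, and it is simple to compute once you are counting per period.  The sketch below is one way to do it in Python.  The `open_trend` function and the weekly figures are hypothetical, and it assumes you start with nothing open before the first period.

```python
def open_trend(periods):
    """periods: list of (label, new, closed) tuples in chronological order.

    Returns (label, new, closed, open, trend) rows, where open is the running
    backlog and trend compares it with the previous period.
    """
    rows = []
    open_count = 0  # Assumption: nothing is open before the first period.
    previous_open = None
    for label, new, closed in periods:
        open_count += new - closed
        if previous_open is None:
            trend = "n/a"
        elif open_count > previous_open:
            trend = "rising"
        elif open_count < previous_open:
            trend = "falling"
        else:
            trend = "flat"
        rows.append((label, new, closed, open_count, trend))
        previous_open = open_count
    return rows


# Illustrative weekly incident counts, not real data.
weekly = [("Week 1", 5, 2), ("Week 2", 4, 4), ("Week 3", 6, 3), ("Week 4", 1, 5)]
for row in open_trend(weekly):
    print(row)
```

The same function works for risks or issues; only the thing being counted changes.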

The bulk of these items are easily quantifiable and applicable across multiple security controls, processes, or products.  Qualitative information can be useful to support these metrics and provide indicators of health and sustainability.  Measure input and output times, CPU and RAM usage, and the number of users over time to indicate acceptance and value.  If you are replacing a manual system, calculate the cost savings as well.

Again, collect metrics that MEAN something, to you and your constituents, to the health of your processes and systems, and to management who pays for your services.  Know why you are collecting metrics in general, and document why you are collecting each SPECIFIC metric.  If management requests a certain number or graph be reported, get the reason and reporting format clearly defined.  Pretty charts are great, until something goes pear-shaped, and management comes to you to explain why you didn’t raise the alarm, and why they weren’t made aware.

Determine, based on the purpose of the control or process, what you need to measure to prove success.  Determine, along with that KEY PERFORMANCE INDICATOR (KPI), what other metrics are important for monitoring health and performance, and then figure out how you are going to measure these metrics.  Set thresholds for these indicators, and start looking at how to display them and convey threat, risk, performance levels and other stats.  Look for the ability to feed these metrics into two formats:  Reports and Dashboards.

  • Reports are generally static snapshots in time.
  • Dashboards are live, changing, and show what is going on now, from this minute to the next.
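
Setting thresholds, as described above, can be as simple as a warning level and a critical level per indicator.  Here is a rough sketch in Python, assuming indicators where higher is worse.  The `classify` function, the indicator names, and the threshold values are all made up for illustration; your own baselines should drive the real numbers.

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value, warning, critical):
    """Classify a reading where higher values are worse (e.g. CPU %, open incidents)."""
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.OK


# Hypothetical thresholds per indicator: (warning, critical).
THRESHOLDS = {
    "cpu_percent": (75, 90),
    "open_incidents": (10, 25),
}

# Hypothetical current readings.
readings = {"cpu_percent": 82, "open_incidents": 4}

status = {name: classify(value, *THRESHOLDS[name]) for name, value in readings.items()}
for name, state in status.items():
    print(f"{name}: {readings[name]} -> {state.value}")
```

A report would capture `status` at a point in time, while a dashboard would recompute it as new readings arrive.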

You now have eyes and ears on the health, utility, and success of your controls and processes.  Congratulations, now plan and act on that information!