Application Security Metrics That Auditors Can Trust

Why Metrics Matter for Audit Assurance

Controls either work or they do not — but determining which requires more than a point-in-time check. Metrics provide the longitudinal evidence that auditors need to assess whether security controls are operating effectively over time, not just on the day of the audit.

An organisation that can produce consistent, system-generated application security metrics demonstrates operational maturity. An organisation that cannot produce metrics — or produces only manually compiled numbers — reveals a fundamental gap in its control environment.

For compliance officers, metrics serve a dual purpose: they satisfy regulatory reporting requirements (DORA, NIS2, and sector-specific mandates increasingly require quantitative security reporting) and they provide early warning when controls are degrading. A rising mean time to remediate or a declining testing coverage percentage signals problems before they become audit findings.

This guide defines the metrics that matter, explains how auditors can assess their trustworthiness, and identifies the most common ways metrics can be manipulated.

Categories of AppSec Metrics

Application security metrics fall into four categories, each serving a different assurance purpose:

  • Coverage metrics: Measure the breadth of the security programme — what percentage of the application portfolio is being tested and monitored
  • Efficiency metrics: Measure how well the organisation responds to findings — how quickly vulnerabilities are remediated and how effectively resources are used
  • Risk metrics: Measure the current risk exposure — how many vulnerabilities are open, how old they are, and how many have been accepted
  • Compliance metrics: Measure adherence to internal policies and external regulatory requirements — whether gates are enforced, SLAs are met, and evidence is complete

Key Metrics Reference

The following table provides a comprehensive reference for the most important application security metrics. For each metric, auditors should understand what it measures, what a reasonable target looks like, where the data should come from, and how it can be manipulated.

| Metric Name | What It Measures | Target / Threshold | Data Source | Manipulation Risk |
|---|---|---|---|---|
| SAST Coverage Rate | % of applications with active SAST scanning | 100% for Tier 1-3; risk-based for Tier 4 | SAST platform, CI/CD pipeline records | Excluding applications from scope to inflate the percentage |
| DAST Coverage Rate | % of applications with DAST scanning at the required frequency | 100% for Tier 1-2; 100% annually for Tier 3 | DAST platform, scan schedule records | Running scans with limited scope or skipping authenticated paths |
| SCA Coverage Rate | % of applications with active dependency scanning | 100% for Tier 1-3 | SCA platform, build pipeline logs | Excluding repositories or build configurations |
| Threat Model Coverage | % of Tier 1-2 applications with current threat models | 100% for Tier 1; 100% for Tier 2 | Threat modelling tool or document repository | Creating superficial models that do not reflect actual architecture |
| Mean Time to Remediate (Critical) | Average days from vulnerability identification to verified fix for critical severity | ≤15 days | Vulnerability management platform | Downgrading severity to avoid SLA; closing without verification |
| Mean Time to Remediate (High) | Average days from identification to verified fix for high severity | ≤30 days | Vulnerability management platform | Same as above |
| Vulnerability Reopen Rate | % of vulnerabilities that are closed then reappear | <5% | Vulnerability management platform | Redefining reopen criteria; creating new tickets instead of reopening |
| False Positive Rate | % of findings marked as false positive after triage | <20% (varies by tool) | Security testing tools, triage records | Classifying true positives as false to reduce workload |
| Open Critical/High Vulnerabilities | Count of unresolved critical and high severity vulnerabilities | Trending downward; zero critical in Tier 1 apps | Vulnerability management platform | Downgrading severity; moving to risk acceptance without proper approval |
| Vulnerability Age (P90) | 90th percentile age of open vulnerabilities | Within SLA for each severity level | Vulnerability management platform | Resetting discovery date; closing and re-opening |
| Exception/Suppression Count | Number of active vulnerability exceptions or suppressions | Trending stable or downward; each with documented approval | Exception register, vulnerability platform | Informal suppression outside tracked systems |
| Risk Acceptance Backlog | Count of formally accepted risks with open review dates | All within review period; none overdue | Risk register | Setting review dates far in the future to avoid reassessment |
| Policy Gate Pass Rate | % of releases that pass all security policy gates on first attempt | >80% (improving over time) | CI/CD pipeline, policy engine logs | Weakening gate criteria to increase pass rate |
| Approval Bypass Rate | % of releases that bypassed required security approvals | <2%; all with documented exception | Change management system, pipeline logs | Using emergency procedures routinely |
| SLA Compliance Rate | % of vulnerabilities remediated within defined SLA | >90% | Vulnerability management platform | Adjusting SLA definitions; pausing SLA timers |
| Evidence Completeness Score | % of required audit evidence artifacts that are available and current | >95% | Evidence repository, GRC platform | Generating evidence retrospectively before audit |

Coverage Metrics in Detail

Coverage metrics answer the question: is the organisation testing what it should be testing?

The most fundamental coverage metric is the percentage of applications with active security testing relative to the total application inventory. This should be segmented by application tier, because a 90% overall coverage rate could conceal the fact that several Tier 1 critical applications are untested.

Auditors should request coverage data broken down as follows:

  • SAST coverage by application tier
  • DAST coverage by application tier, with frequency verification
  • SCA coverage by application tier
  • Percentage of critical applications with current threat models
  • Security testing coverage by application tier compared to the required frequency defined in the classification framework

A coverage metric is only meaningful if the denominator is accurate. If the application inventory is incomplete, coverage percentages are misleading. Auditors should cross-reference the application inventory used for coverage calculations with other sources (CMDB, deployment platforms, cloud account inventories) to verify completeness.
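That cross-reference can be as simple as a set difference. The sketch below is illustrative only; the application names and source inventories are hypothetical, and in practice the inputs would be exports from the CMDB, deployment platform, and cloud accounts:

```python
# Cross-check the coverage denominator against independent inventories.
# Any application known to another source but absent from the metrics
# inventory silently inflates every coverage percentage.

def inventory_gaps(metrics_inventory, *other_sources):
    """Return applications present in any independent source but
    missing from the inventory used for coverage calculations."""
    known = set(metrics_inventory)
    seen_elsewhere = set().union(*other_sources)
    return sorted(seen_elsewhere - known)

# Hypothetical data
coverage_inventory = {"payments-api", "web-portal"}
cmdb = {"payments-api", "web-portal", "legacy-batch"}
cloud_accounts = {"payments-api", "web-portal", "ml-scoring"}

print(inventory_gaps(coverage_inventory, cmdb, cloud_accounts))
# ['legacy-batch', 'ml-scoring']
```

Each name in the result is an application whose vulnerabilities and scan status are invisible to every reported metric, which is exactly the discrepancy an auditor should ask about.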

Efficiency Metrics in Detail

Efficiency metrics answer the question: when the organisation finds vulnerabilities, does it fix them effectively?

Mean Time to Remediate (MTTR) is the most important efficiency metric. It should be tracked by severity level and by application tier, because a 30-day average MTTR is acceptable for high-severity findings but not for critical findings in Tier 1 applications.

MTTR should be measured from the date of identification (when the finding was first reported by a scanning tool or tester) to the date of verified remediation (when a rescan or retest confirms the fix is effective). Organisations that measure to the date the developer closes the ticket — without verification — are reporting an incomplete metric.
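The identification-to-verified-fix measurement can be sketched in a few lines. The field names (`identified`, `verified`, `severity`) are illustrative, not from any particular platform's export format; the key point is that findings without a verification date are excluded rather than counted as remediated:

```python
from collections import defaultdict
from datetime import date

def mttr_by_severity(findings):
    """Mean days from identification to *verified* remediation, per
    severity. Findings with no verification date are excluded: they
    are either still open or closed without retest evidence."""
    buckets = defaultdict(list)
    for f in findings:
        if f.get("verified"):
            buckets[f["severity"]].append((f["verified"] - f["identified"]).days)
    return {sev: sum(days) / len(days) for sev, days in buckets.items()}

# Hypothetical findings
findings = [
    {"severity": "critical", "identified": date(2024, 3, 1), "verified": date(2024, 3, 11)},
    {"severity": "critical", "identified": date(2024, 3, 5), "verified": date(2024, 3, 25)},
    {"severity": "high", "identified": date(2024, 3, 1), "verified": date(2024, 3, 29)},
    {"severity": "high", "identified": date(2024, 2, 1), "verified": None},  # closed, never retested
]

print(mttr_by_severity(findings))
# {'critical': 15.0, 'high': 28.0}
```

Note that the fourth finding contributes nothing to the metric: a ticket closed without a retest is an open question, not a remediation.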

Vulnerability reopen rate indicates fix quality. A rate above 5% suggests that remediations are superficial or that root causes are not being addressed.

False positive rate indicates tool effectiveness and triage quality. An unusually high false positive rate may indicate that findings are being dismissed as false positives rather than investigated — auditors should sample false positive classifications to verify they are legitimate.

Scan-to-fix cycle time measures the end-to-end elapsed time from when a scan completes to when all findings from that scan are resolved. This captures delays in triage and assignment that MTTR alone may not reveal.

Risk Metrics in Detail

Risk metrics answer the question: what is the current vulnerability exposure, and is it being managed?

The count of open critical and high-severity vulnerabilities is a baseline risk metric. It should be tracked as a trend over time — a stable or declining count indicates an effective programme; a rising count indicates that findings are being generated faster than they are remediated.

Vulnerability ageing measures how long vulnerabilities remain open. The 90th percentile age is more useful than the average because averages can be skewed by a large number of quickly resolved low-severity findings. If the P90 age exceeds the SLA, the organisation is not meeting its own remediation commitments for a significant portion of findings.
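A minimal P90 calculation, using the nearest-rank method on hypothetical open dates (production tooling would read these from the vulnerability management platform):

```python
import math
from datetime import date

def p90_age_days(open_dates, today):
    """90th-percentile age in days of open findings, nearest-rank method:
    the smallest age such that at least 90% of findings are no older."""
    ages = sorted((today - d).days for d in open_dates)
    rank = math.ceil(0.9 * len(ages))  # 1-based nearest-rank index
    return ages[rank - 1]

# Hypothetical sample: ten findings opened on consecutive days in January
opened = [date(2024, 1, d) for d in range(1, 11)]
print(p90_age_days(opened, today=date(2024, 1, 31)))  # 29
```

Here the P90 age is 29 days even though the mean is about 25: the percentile view surfaces the old tail that an average smooths away.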

Exception and suppression counts reveal how the organisation handles findings it chooses not to fix. Every exception should have a documented approval, a compensating control, and a review date. A rising exception count without corresponding justification indicates that the exception process is being used to avoid remediation rather than manage genuine risk.

Risk acceptance backlog tracks formally accepted risks. Auditors should verify that each accepted risk has an owner, a documented rationale, a compensating control, and a review date — and that overdue reviews are escalated.

Compliance Metrics in Detail

Compliance metrics answer the question: is the organisation following its own policies and meeting regulatory requirements?

Policy gate pass rate measures how often releases pass security gates on the first attempt. A very low pass rate may indicate that security requirements are unclear or that development teams are not receiving adequate feedback early enough. A suspiciously high pass rate (approaching 100%) may indicate that gates are too lenient.

Approval bypass rate is a critical control metric. In regulated environments, every bypass should be documented with an exception approval. A bypass rate above 2% — or any bypasses without documented approval — is a significant finding.

SLA compliance rate measures whether vulnerabilities are remediated within the timeframes defined in the vulnerability management policy. This should be tracked by severity and tier. Organisations that consistently miss SLAs for critical findings in Tier 1 applications have a material control weakness.
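Segmenting by severity and tier is straightforward once each remediated finding carries both attributes. The sketch below uses the 15- and 30-day targets from the reference table; the record layout is hypothetical:

```python
from collections import defaultdict

SLA_DAYS = {"critical": 15, "high": 30}  # policy thresholds from the metrics table

def sla_compliance(records):
    """% of remediated findings fixed within SLA, keyed by (severity, tier)."""
    met, total = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["severity"], r["tier"])
        total[key] += 1
        if r["days_to_fix"] <= SLA_DAYS[r["severity"]]:
            met[key] += 1
    return {k: round(100 * met[k] / total[k], 1) for k in total}

# Hypothetical remediation records
records = [
    {"severity": "critical", "tier": 1, "days_to_fix": 10},
    {"severity": "critical", "tier": 1, "days_to_fix": 22},  # SLA breach
    {"severity": "high", "tier": 2, "days_to_fix": 28},
]
print(sla_compliance(records))
# {('critical', 1): 50.0, ('high', 2): 100.0}
```

An aggregate rate over these records would read 66.7% and hide the fact that half of the critical Tier 1 findings missed their SLA, which is precisely the segmentation point made above.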

Evidence completeness score measures the organisation’s readiness for audit by tracking what percentage of required evidence artifacts are available, current, and properly stored. This is a meta-metric that indicates governance maturity.

What Makes a Metric Trustworthy for Auditors

Not all metrics are equally reliable. Auditors should assess the trustworthiness of reported metrics against the following criteria:

  • System-generated: The metric is produced automatically by a security tool or platform, not compiled manually in a spreadsheet. Manual compilation introduces both error and manipulation risk.
  • Tamper-resistant: The data source has access controls and audit logs that prevent unauthorised modification. Metrics from a read-only dashboard connected to a controlled data source are more trustworthy than exported reports.
  • Consistent methodology: The metric is calculated the same way every reporting period. Changes in methodology (e.g., changing how MTTR is calculated mid-year) must be documented and justified.
  • Historical trends available: At least 12 months of historical data is available to identify trends. A single data point is a measurement, not a metric — trends reveal whether controls are improving, stable, or degrading.
  • Segmented by risk tier: Aggregate metrics conceal important variations. Metrics should be available by application tier, by business unit, or by severity to support meaningful analysis.
  • Reconcilable: The metric can be traced back to underlying data. If the MTTR is reported as 12 days, auditors should be able to view the individual remediation records that produce that average.

Metrics That Can Be Gamed — and How Auditors Can Detect It

Metrics are only useful if they reflect reality. The following are the most common manipulation techniques and the audit procedures to detect them:

Lowering Severity

Vulnerabilities are downgraded from critical to high, or from high to medium, to avoid triggering SLA requirements or executive reporting thresholds.

Detection: Compare severity distributions over time. A sudden decrease in critical findings with a corresponding increase in high findings is suspicious. Sample downgraded findings and assess whether the severity change was justified and approved.
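One way to automate the first check is a simple period-over-period heuristic: flag quarters where the critical count falls sharply while the high count rises by a comparable amount. The thresholds below (`min_drop`, `tolerance`) are illustrative defaults, not an established standard, and flagged periods still need manual sampling:

```python
def severity_shift(prev, curr, min_drop=10, tolerance=0.3):
    """Flag a severity distribution change consistent with downgrading:
    criticals drop by at least `min_drop` while highs rise by roughly
    the same amount (within `tolerance` of the drop)."""
    drop = prev.get("critical", 0) - curr.get("critical", 0)
    rise = curr.get("high", 0) - prev.get("high", 0)
    if drop < min_drop or rise <= 0:
        return False
    return abs(drop - rise) <= tolerance * drop

# Hypothetical quarterly severity counts
q1 = {"critical": 40, "high": 120}
q2 = {"critical": 12, "high": 151}
print(severity_shift(q1, q2))  # True -- sample the downgraded findings
```

A flagged period is not proof of manipulation; legitimate bulk remediation of criticals can look similar, which is why the procedure ends with sampling the individual severity changes for justification and approval.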

Bulk Closures Before Audit

A large number of vulnerabilities are closed immediately before an audit period, either through mass risk acceptance or by marking findings as resolved without verification.

Detection: Plot vulnerability closure dates on a timeline. Clustering of closures before audit periods is a clear indicator. Request retest evidence for a sample of recently closed findings.
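The clustering test can be quantified as a rate ratio: closures per day in the window before the audit versus closures per day over the earlier history. The window length and the example dates are hypothetical; a ratio well above 1 is the signal to pull retest evidence:

```python
from datetime import date, timedelta

def pre_audit_spike(closures, audit_start, window_days=14):
    """Daily closure rate in the N days before the audit, divided by
    the daily rate over the earlier observed history."""
    window_start = audit_start - timedelta(days=window_days)
    in_window = [d for d in closures if window_start <= d < audit_start]
    earlier = [d for d in closures if d < window_start]
    if not earlier:
        return float("inf")  # no baseline at all is itself suspicious
    history_days = (window_start - min(earlier)).days or 1
    baseline = len(earlier) / history_days
    return (len(in_window) / window_days) / baseline

# Hypothetical history: roughly one closure a week, then ten closures
# in the fortnight before an audit starting on 1 April.
steady = [date(2024, 1, 1) + timedelta(weeks=w) for w in range(9)]
spike = [date(2024, 3, 18) + timedelta(days=i) for i in range(10)]
print(round(pre_audit_spike(steady + spike, audit_start=date(2024, 4, 1)), 1))  # 6.1
```

A ratio of roughly 6 means the team closed findings six times faster in the pre-audit window than at any point before it, which is exactly the clustering the timeline plot would show visually.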

Excluding Applications from Scope

Applications are removed from the inventory or excluded from scanning to improve coverage percentages and reduce the total vulnerability count.

Detection: Compare the application inventory used for metrics with other authoritative sources (cloud account inventories, deployment platform records, network scans). Any discrepancies require explanation.

Resetting Discovery Dates

Vulnerabilities are closed and re-opened (or new tickets are created for the same finding) to reset the clock on ageing metrics.

Detection: Look for findings with identical descriptions and different creation dates. Check whether the vulnerability management platform tracks original discovery date separately from ticket creation date.
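The first check is a grouping exercise: any (application, finding) pair with more than one creation date deserves a look. This sketch matches on exact titles for simplicity; real tooling might fuzzy-match descriptions or match on scanner rule IDs instead, and the sample records are hypothetical:

```python
from collections import defaultdict

def suspected_reopens(findings):
    """Group findings by (application, title); a group with multiple
    creation dates may be one issue re-filed to reset its age."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f["app"], f["title"])].append(f["created"])
    return {key: sorted(dates) for key, dates in groups.items() if len(dates) > 1}

# Hypothetical findings export
findings = [
    {"app": "payments-api", "title": "SQL injection in /search", "created": "2024-01-10"},
    {"app": "payments-api", "title": "SQL injection in /search", "created": "2024-05-02"},
    {"app": "web-portal", "title": "Missing CSP header", "created": "2024-03-15"},
]
print(suspected_reopens(findings))
# {('payments-api', 'SQL injection in /search'): ['2024-01-10', '2024-05-02']}
```

If the platform tracks original discovery date separately from ticket creation date, the same grouping run on discovery dates distinguishes a genuine regression from a re-filed ticket.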

Weakening Gate Criteria

Policy gate thresholds are loosened (e.g., changing the blocking threshold from “no high or critical” to “no critical only”) to improve pass rates without improving actual security.

Detection: Request the change history for policy gate configurations. Any changes to thresholds should be approved through the governance process and documented with rationale.

Recommended Reporting Cadence

Different stakeholders require different levels of detail at different frequencies. The following cadence balances operational needs with governance requirements:

| Reporting Level | Frequency | Audience | Content |
|---|---|---|---|
| Operational | Weekly | AppSec team, Security Champions, Development Team Leads | New findings, remediation progress, overdue items, scan failures, immediate action items |
| Management | Monthly | CISO, AppSec Lead, Development Directors, Compliance Officer | Trend analysis, MTTR by severity/tier, coverage changes, SLA compliance, exception summary, emerging risks |
| Executive / Audit | Quarterly | Board / Risk Committee, External Auditors, Regulators | Risk posture summary, programme maturity assessment, key metric trends (12-month view), material findings, regulatory compliance status, resource adequacy |

For regulated organisations subject to DORA, quarterly reporting to the management body on ICT risk — including application security — is a regulatory expectation, not a best practice recommendation.

Further Reading

New to CI/CD auditing? Start with our Auditor's Guide.