Application Security Metrics That Auditors Can Trust

Why Metrics Matter for Audit Assurance

Controls either work or they do not — but determining which requires more than a point-in-time check. Metrics provide the longitudinal evidence that auditors need to assess whether security controls are operating effectively over time, not just on the day of the audit.

An organisation that can produce consistent, system-generated application security metrics demonstrates operational maturity. An organisation that cannot produce metrics — or produces only manually compiled numbers — reveals a fundamental gap in its control environment.

For compliance officers, metrics serve a dual purpose: they satisfy regulatory reporting requirements (DORA, NIS2, and sector-specific mandates increasingly require quantitative security reporting) and they provide early warning when controls are degrading. A rising mean time to remediate or a declining testing coverage percentage signals problems before they become audit findings.

This guide defines the metrics that matter, explains how auditors can assess their trustworthiness, and identifies the most common ways metrics can be manipulated.

Categories of AppSec Metrics

Application security metrics fall into four categories, each serving a different assurance purpose:

  • Coverage metrics: Measure the breadth of the security programme — what percentage of the application portfolio is being tested and monitored
  • Efficiency metrics: Measure how well the organisation responds to findings — how quickly vulnerabilities are remediated and how effectively resources are used
  • Risk metrics: Measure the current risk exposure — how many vulnerabilities are open, how old they are, and how many have been accepted
  • Compliance metrics: Measure adherence to internal policies and external regulatory requirements — whether gates are enforced, SLAs are met, and evidence is complete

Key Metrics Reference

The following table provides a comprehensive reference for the most important application security metrics. For each metric, auditors should understand what it measures, what a reasonable target looks like, where the data should come from, and how it can be manipulated.

| Metric Name | What It Measures | Target / Threshold | Data Source | Manipulation Risk |
|---|---|---|---|---|
| SAST Coverage Rate | % of applications with active SAST scanning | 100% for Tier 1-3; risk-based for Tier 4 | SAST platform, CI/CD pipeline records | Excluding applications from scope to inflate the percentage |
| DAST Coverage Rate | % of applications with DAST scanning at the required frequency | 100% for Tier 1-2; 100% annually for Tier 3 | DAST platform, scan schedule records | Running scans with limited scope or skipping authenticated paths |
| SCA Coverage Rate | % of applications with active dependency scanning | 100% for Tier 1-3 | SCA platform, build pipeline logs | Excluding repositories or build configurations |
| Threat Model Coverage | % of Tier 1-2 applications with current threat models | 100% for Tier 1; 100% for Tier 2 | Threat modelling tool or document repository | Creating superficial models that do not reflect actual architecture |
| Mean Time to Remediate (Critical) | Average days from vulnerability identification to verified fix for critical severity | ≤15 days | Vulnerability management platform | Downgrading severity to avoid SLA; closing without verification |
| Mean Time to Remediate (High) | Average days from identification to verified fix for high severity | ≤30 days | Vulnerability management platform | Same as above |
| Vulnerability Reopen Rate | % of vulnerabilities that are closed then reappear | <5% | Vulnerability management platform | Redefining reopen criteria; creating new tickets instead of reopening |
| False Positive Rate | % of findings marked as false positive after triage | <20% (varies by tool) | Security testing tools, triage records | Classifying true positives as false to reduce workload |
| Open Critical/High Vulnerabilities | Count of unresolved critical and high severity vulnerabilities | Trending downward; zero critical in Tier 1 apps | Vulnerability management platform | Downgrading severity; moving to risk acceptance without proper approval |
| Vulnerability Age (P90) | 90th percentile age of open vulnerabilities | Within SLA for each severity level | Vulnerability management platform | Resetting discovery date; closing and re-opening |
| Exception/Suppression Count | Number of active vulnerability exceptions or suppressions | Trending stable or downward; each with documented approval | Exception register, vulnerability platform | Informal suppression outside tracked systems |
| Risk Acceptance Backlog | Count of formally accepted risks with open review dates | All within review period; none overdue | Risk register | Setting review dates far in the future to avoid reassessment |
| Policy Gate Pass Rate | % of releases that pass all security policy gates on first attempt | >80% (improving over time) | CI/CD pipeline, policy engine logs | Weakening gate criteria to increase pass rate |
| Approval Bypass Rate | % of releases that bypassed required security approvals | <2%; all with documented exception | Change management system, pipeline logs | Using emergency procedures routinely |
| SLA Compliance Rate | % of vulnerabilities remediated within defined SLA | >90% | Vulnerability management platform | Adjusting SLA definitions; pausing SLA timers |
| Evidence Completeness Score | % of required audit evidence artifacts that are available and current | >95% | Evidence repository, GRC platform | Generating evidence retrospectively before audit |

Coverage Metrics in Detail

Coverage metrics answer the question: is the organisation testing what it should be testing?

The most fundamental coverage metric is the percentage of applications with active security testing relative to the total application inventory. This should be segmented by application tier, because a 90% overall coverage rate could conceal the fact that several Tier 1 critical applications are untested.

Auditors should request coverage data broken down as follows:

  • SAST coverage by application tier
  • DAST coverage by application tier, with frequency verification
  • SCA coverage by application tier
  • Percentage of critical applications with current threat models
  • Security testing coverage by application tier compared to the required frequency defined in the classification framework

A coverage metric is only meaningful if the denominator is accurate. If the application inventory is incomplete, coverage percentages are misleading. Auditors should cross-reference the application inventory used for coverage calculations with other sources (CMDB, deployment platforms, cloud account inventories) to verify completeness.
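That cross-reference can be as simple as a set difference. The sketch below is illustrative only; the application names and source inventories are hypothetical, and in practice the inputs would be exports from the CMDB, deployment platform, and cloud accounts:

```python
# Cross-check the coverage denominator against independent inventories.
# Any application known to another source but absent from the metrics
# inventory silently inflates every coverage percentage.

def inventory_gaps(metrics_inventory, *other_sources):
    """Return applications present in any independent source but
    missing from the inventory used for coverage calculations."""
    known = set(metrics_inventory)
    seen_elsewhere = set().union(*other_sources)
    return sorted(seen_elsewhere - known)

# Hypothetical data
coverage_inventory = {"payments-api", "web-portal"}
cmdb = {"payments-api", "web-portal", "legacy-batch"}
cloud_accounts = {"payments-api", "web-portal", "ml-scoring"}

print(inventory_gaps(coverage_inventory, cmdb, cloud_accounts))
# ['legacy-batch', 'ml-scoring']
```

Each name in the result is an application whose vulnerabilities and scan status are invisible to every reported metric, which is exactly the discrepancy an auditor should ask about.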

Efficiency Metrics in Detail

Efficiency metrics answer the question: when the organisation finds vulnerabilities, does it fix them effectively?

Mean Time to Remediate (MTTR) is the most important efficiency metric. It should be tracked by severity level and by application tier, because a 30-day average MTTR is acceptable for high-severity findings but not for critical findings in Tier 1 applications.

MTTR should be measured from the date of identification (when the finding was first reported by a scanning tool or tester) to the date of verified remediation (when a rescan or retest confirms the fix is effective). Organisations that measure to the date the developer closes the ticket — without verification — are reporting an incomplete metric.
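The identification-to-verified-fix measurement can be sketched in a few lines. The field names (`identified`, `verified`, `severity`) are illustrative, not from any particular platform's export format; the key point is that findings without a verification date are excluded rather than counted as remediated:

```python
from collections import defaultdict
from datetime import date

def mttr_by_severity(findings):
    """Mean days from identification to *verified* remediation, per
    severity. Findings with no verification date are excluded: they
    are either still open or closed without retest evidence."""
    buckets = defaultdict(list)
    for f in findings:
        if f.get("verified"):
            buckets[f["severity"]].append((f["verified"] - f["identified"]).days)
    return {sev: sum(days) / len(days) for sev, days in buckets.items()}

# Hypothetical findings
findings = [
    {"severity": "critical", "identified": date(2024, 3, 1), "verified": date(2024, 3, 11)},
    {"severity": "critical", "identified": date(2024, 3, 5), "verified": date(2024, 3, 25)},
    {"severity": "high", "identified": date(2024, 3, 1), "verified": date(2024, 3, 29)},
    {"severity": "high", "identified": date(2024, 2, 1), "verified": None},  # closed, never retested
]

print(mttr_by_severity(findings))
# {'critical': 15.0, 'high': 28.0}
```

Note that the fourth finding contributes nothing to the metric: a ticket closed without a retest is an open question, not a remediation.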

Vulnerability reopen rate indicates fix quality. A rate above 5% suggests that remediations are superficial or that root causes are not being addressed.

False positive rate indicates tool effectiveness and triage quality. An unusually high false positive rate may indicate that findings are being dismissed as false positives rather than investigated — auditors should sample false positive classifications to verify they are legitimate.

Scan-to-fix cycle time measures the end-to-end elapsed time from when a scan completes to when all findings from that scan are resolved. This captures delays in triage and assignment that MTTR alone may not reveal.

Risk Metrics in Detail

Risk metrics answer the question: what is the current vulnerability exposure, and is it being managed?

The count of open critical and high-severity vulnerabilities is a baseline risk metric. It should be tracked as a trend over time — a stable or declining count indicates an effective programme; a rising count indicates that findings are being generated faster than they are remediated.

Vulnerability ageing measures how long vulnerabilities remain open. The 90th percentile age is more useful than the average because averages can be skewed by a large number of quickly resolved low-severity findings. If the P90 age exceeds the SLA, the organisation is not meeting its own remediation commitments for a significant portion of findings.
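A minimal P90 calculation, using the nearest-rank method on hypothetical open dates (production tooling would read these from the vulnerability management platform):

```python
import math
from datetime import date

def p90_age_days(open_dates, today):
    """90th-percentile age in days of open findings, nearest-rank method:
    the smallest age such that at least 90% of findings are no older."""
    ages = sorted((today - d).days for d in open_dates)
    rank = math.ceil(0.9 * len(ages))  # 1-based nearest-rank index
    return ages[rank - 1]

# Hypothetical sample: ten findings opened on consecutive days in January
opened = [date(2024, 1, d) for d in range(1, 11)]
print(p90_age_days(opened, today=date(2024, 1, 31)))  # 29
```

Here the P90 age is 29 days even though the mean is about 25: the percentile view surfaces the old tail that an average smooths away.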

Exception and suppression counts reveal how the organisation handles findings it chooses not to fix. Every exception should have a documented approval, a compensating control, and a review date. A rising exception count without corresponding justification indicates that the exception process is being used to avoid remediation rather than manage genuine risk.

Risk acceptance backlog tracks formally accepted risks. Auditors should verify that each accepted risk has an owner, a documented rationale, a compensating control, and a review date — and that overdue reviews are escalated.

Compliance Metrics in Detail

Compliance metrics answer the question: is the organisation following its own policies and meeting regulatory requirements?

Policy gate pass rate measures how often releases pass security gates on the first attempt. A very low pass rate may indicate that security requirements are unclear or that development teams are not receiving adequate feedback early enough. A suspiciously high pass rate (approaching 100%) may indicate that gates are too lenient.

Approval bypass rate is a critical control metric. In regulated environments, every bypass should be documented with an exception approval. A bypass rate above 2% — or any bypasses without documented approval — is a significant finding.

SLA compliance rate measures whether vulnerabilities are remediated within the timeframes defined in the vulnerability management policy. This should be tracked by severity and tier. Organisations that consistently miss SLAs for critical findings in Tier 1 applications have a material control weakness.
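Segmenting by severity and tier is straightforward once each remediated finding carries both attributes. The sketch below uses the 15- and 30-day targets from the reference table; the record layout is hypothetical:

```python
from collections import defaultdict

SLA_DAYS = {"critical": 15, "high": 30}  # policy thresholds from the metrics table

def sla_compliance(records):
    """% of remediated findings fixed within SLA, keyed by (severity, tier)."""
    met, total = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["severity"], r["tier"])
        total[key] += 1
        if r["days_to_fix"] <= SLA_DAYS[r["severity"]]:
            met[key] += 1
    return {k: round(100 * met[k] / total[k], 1) for k in total}

# Hypothetical remediation records
records = [
    {"severity": "critical", "tier": 1, "days_to_fix": 10},
    {"severity": "critical", "tier": 1, "days_to_fix": 22},  # SLA breach
    {"severity": "high", "tier": 2, "days_to_fix": 28},
]
print(sla_compliance(records))
# {('critical', 1): 50.0, ('high', 2): 100.0}
```

An aggregate rate over these records would read 66.7% and hide the fact that half of the critical Tier 1 findings missed their SLA, which is precisely the segmentation point made above.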

Evidence completeness score measures the organisation’s readiness for audit by tracking what percentage of required evidence artifacts are available, current, and properly stored. This is a meta-metric that indicates governance maturity.

What Makes a Metric Trustworthy for Auditors

Not all metrics are equally reliable. Auditors should assess the trustworthiness of reported metrics against the following criteria:

  • System-generated: The metric is produced automatically by a security tool or platform, not compiled manually in a spreadsheet. Manual compilation introduces both error and manipulation risk.
  • Tamper-resistant: The data source has access controls and audit logs that prevent unauthorised modification. Metrics from a read-only dashboard connected to a controlled data source are more trustworthy than exported reports.
  • Consistent methodology: The metric is calculated the same way every reporting period. Changes in methodology (e.g., changing how MTTR is calculated mid-year) must be documented and justified.
  • Historical trends available: At least 12 months of historical data is available to identify trends. A single data point is a measurement, not a metric — trends reveal whether controls are improving, stable, or degrading.
  • Segmented by risk tier: Aggregate metrics conceal important variations. Metrics should be available by application tier, by business unit, or by severity to support meaningful analysis.
  • Reconcilable: The metric can be traced back to underlying data. If the MTTR is reported as 12 days, auditors should be able to view the individual remediation records that produce that average.

Metrics That Can Be Gamed — and How Auditors Can Detect It

Metrics are only useful if they reflect reality. The following are the most common manipulation techniques and the audit procedures to detect them:

Lowering Severity

Vulnerabilities are downgraded from critical to high, or from high to medium, to avoid triggering SLA requirements or executive reporting thresholds.

Detection: Compare severity distributions over time. A sudden decrease in critical findings with a corresponding increase in high findings is suspicious. Sample downgraded findings and assess whether the severity change was justified and approved.
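One way to automate the first check is a simple period-over-period heuristic: flag quarters where the critical count falls sharply while the high count rises by a comparable amount. The thresholds below (`min_drop`, `tolerance`) are illustrative defaults, not an established standard, and flagged periods still need manual sampling:

```python
def severity_shift(prev, curr, min_drop=10, tolerance=0.3):
    """Flag a severity distribution change consistent with downgrading:
    criticals drop by at least `min_drop` while highs rise by roughly
    the same amount (within `tolerance` of the drop)."""
    drop = prev.get("critical", 0) - curr.get("critical", 0)
    rise = curr.get("high", 0) - prev.get("high", 0)
    if drop < min_drop or rise <= 0:
        return False
    return abs(drop - rise) <= tolerance * drop

# Hypothetical quarterly severity counts
q1 = {"critical": 40, "high": 120}
q2 = {"critical": 12, "high": 151}
print(severity_shift(q1, q2))  # True -- sample the downgraded findings
```

A flagged period is not proof of manipulation; legitimate bulk remediation of criticals can look similar, which is why the procedure ends with sampling the individual severity changes for justification and approval.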

Bulk Closures Before Audit

A large number of vulnerabilities are closed immediately before an audit period, either through mass risk acceptance or by marking findings as resolved without verification.

Detection: Plot vulnerability closure dates on a timeline. Clustering of closures before audit periods is a clear indicator. Request retest evidence for a sample of recently closed findings.
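The clustering test can be quantified as a rate ratio: closures per day in the window before the audit versus closures per day over the earlier history. The window length and the example dates are hypothetical; a ratio well above 1 is the signal to pull retest evidence:

```python
from datetime import date, timedelta

def pre_audit_spike(closures, audit_start, window_days=14):
    """Daily closure rate in the N days before the audit, divided by
    the daily rate over the earlier observed history."""
    window_start = audit_start - timedelta(days=window_days)
    in_window = [d for d in closures if window_start <= d < audit_start]
    earlier = [d for d in closures if d < window_start]
    if not earlier:
        return float("inf")  # no baseline at all is itself suspicious
    history_days = (window_start - min(earlier)).days or 1
    baseline = len(earlier) / history_days
    return (len(in_window) / window_days) / baseline

# Hypothetical history: roughly one closure a week, then ten closures
# in the fortnight before an audit starting on 1 April.
steady = [date(2024, 1, 1) + timedelta(weeks=w) for w in range(9)]
spike = [date(2024, 3, 18) + timedelta(days=i) for i in range(10)]
print(round(pre_audit_spike(steady + spike, audit_start=date(2024, 4, 1)), 1))  # 6.1
```

A ratio of roughly 6 means the team closed findings six times faster in the pre-audit window than at any point before it, which is exactly the clustering the timeline plot would show visually.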

Excluding Applications from Scope

Applications are removed from the inventory or excluded from scanning to improve coverage percentages and reduce the total vulnerability count.

Detection: Compare the application inventory used for metrics with other authoritative sources (cloud account inventories, deployment platform records, network scans). Any discrepancies require explanation.

Resetting Discovery Dates

Vulnerabilities are closed and re-opened (or new tickets are created for the same finding) to reset the clock on ageing metrics.

Detection: Look for findings with identical descriptions and different creation dates. Check whether the vulnerability management platform tracks original discovery date separately from ticket creation date.
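The first check is a grouping exercise: any (application, finding) pair with more than one creation date deserves a look. This sketch matches on exact titles for simplicity; real tooling might fuzzy-match descriptions or match on scanner rule IDs instead, and the sample records are hypothetical:

```python
from collections import defaultdict

def suspected_reopens(findings):
    """Group findings by (application, title); a group with multiple
    creation dates may be one issue re-filed to reset its age."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f["app"], f["title"])].append(f["created"])
    return {key: sorted(dates) for key, dates in groups.items() if len(dates) > 1}

# Hypothetical findings export
findings = [
    {"app": "payments-api", "title": "SQL injection in /search", "created": "2024-01-10"},
    {"app": "payments-api", "title": "SQL injection in /search", "created": "2024-05-02"},
    {"app": "web-portal", "title": "Missing CSP header", "created": "2024-03-15"},
]
print(suspected_reopens(findings))
# {('payments-api', 'SQL injection in /search'): ['2024-01-10', '2024-05-02']}
```

If the platform tracks original discovery date separately from ticket creation date, the same grouping run on discovery dates distinguishes a genuine regression from a re-filed ticket.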

Weakening Gate Criteria

Policy gate thresholds are loosened (e.g., changing the blocking threshold from “no high or critical” to “no critical only”) to improve pass rates without improving actual security.

Detection: Request the change history for policy gate configurations. Any changes to thresholds should be approved through the governance process and documented with rationale.

Recommended Reporting Cadence

Different stakeholders require different levels of detail at different frequencies. The following cadence balances operational needs with governance requirements:

| Reporting Level | Frequency | Audience | Content |
|---|---|---|---|
| Operational | Weekly | AppSec team, Security Champions, Development Team Leads | New findings, remediation progress, overdue items, scan failures, immediate action items |
| Management | Monthly | CISO, AppSec Lead, Development Directors, Compliance Officer | Trend analysis, MTTR by severity/tier, coverage changes, SLA compliance, exception summary, emerging risks |
| Executive / Audit | Quarterly | Board / Risk Committee, External Auditors, Regulators | Risk posture summary, programme maturity assessment, key metric trends (12-month view), material findings, regulatory compliance status, resource adequacy |

For regulated organisations subject to DORA, quarterly reporting to the management body on ICT risk — including application security — is a regulatory expectation, not a best practice recommendation.

Further Reading

New to CI/CD auditing? Start with our Auditor's Guide.