Skip to content

NIST AI RMF Measure Function

The Measure function is the analysis-and-tracking function of the NIST AI Risk Management Framework 1.0 (NIST AI RMF). Per NIST AI 100-1 §5.3, Measure employs quantitative, qualitative, or mixed-method tools, techniques, and methodologies to analyze, assess, benchmark, and monitor AI risk and related impacts. It uses knowledge from the Map function and informs the Manage function. AI systems should be tested before deployment and regularly while in operation.

Measure is organized into four categories (MEASURE 1 through MEASURE 4) covering 22 subcategories — including MEASURE 2, the largest single category in the entire AI RMF with 13 subcategories (one per trustworthy AI characteristic plus monitoring, TEVV scaffolding, and meta-evaluation). This page reproduces each category and subcategory statement verbatim from NIST AI 100-1 Table 3 and adds a short note on how each shows up in enterprise practice.

Primary source

This page is a structured guide to the NIST AI RMF Measure function — official NIST documentation. The authoritative framework text is published in NIST AI 100-1 (January 2023), Table 3. The NIST AI RMF Playbook on the AI Resource Center provides suggested actions, transparency and documentation guidance, and references for each subcategory.

How Measure fits into NIST AI RMF 1.0

NIST AI RMF 1.0 organizes AI risk management into four functions: GOVERN, MAP, MEASURE, and MANAGE. Measure is the second of the three system-level functions; it takes its inputs from Map and produces outputs that feed Manage:

  • GOVERN — cross-cutting accountability, policies, oversight, and decision rights
  • MAP — system context and risk identification
  • MEASURE — system analysis, assessment, and tracking (this page)
  • MANAGE — system prioritization, treatment, and response

Per NIST AI 100-1 §5.3, "measurement outcomes will be utilized in the MANAGE function to assist risk monitoring and response efforts. It is incumbent on Framework users to continue applying the MEASURE function to AI systems as knowledge, methodologies, risks, and impacts evolve over time."

The four Measure categories at a glance

MeasureAI system evaluation and tracking — NIST AI RMF 1.0
MEASURE 1Appropriate methods and metrics are identified and applied — 3 subcategories
MEASURE 2AI systems are evaluated for trustworthy characteristics — 13 subcategories
MEASURE 3Mechanisms for tracking identified AI risks over time — 3 subcategories
MEASURE 4Feedback about efficacy of measurement — 3 subcategories

MEASURE 1: Appropriate methods and metrics are identified and applied.

NIST AI 100-1, Table 3: Appropriate methods and metrics are identified and applied.

MEASURE 1 covers the selection of approaches and metrics, the regular re-assessment of those choices, and the use of independent assessors and external perspectives.

MEASURE 1.1

NIST AI 100-1, Table 3: Approaches and metrics for measurement of AI risks enumerated during the MAP function are selected for implementation starting with the most significant AI risks. The risks or trustworthiness characteristics that will not – or cannot – be measured are properly documented.

In practice: Prioritize evaluation methodology against the most significant risks first; document the gaps — what cannot be measured, with what consequence, so reviewers can act on the gap rather than learn about it during audit.

MEASURE 1.2

NIST AI 100-1, Table 3: Appropriateness of AI metrics and effectiveness of existing controls are regularly assessed and updated, including reports of errors and potential impacts on affected communities.

In practice: Metrics and controls drift as the system and its deployment context change; a recurring re-assessment with named owners catches both metric obsolescence and unreported impacts.

MEASURE 1.3

NIST AI 100-1, Table 3: Internal experts who did not serve as front-line developers for the system and/or independent assessors are involved in regular assessments and updates. Domain experts, users, AI actors external to the team that developed or deployed the AI system, and affected communities are consulted in support of assessments as necessary per organizational risk tolerance.

In practice: Separation of duties applied to evaluation: people who built the system shouldn't be the only people grading it, and the breadth of external consultation scales with the risk profile of the system.

MEASURE 2: AI systems are evaluated for trustworthy characteristics.

NIST AI 100-1, Table 3: AI systems are evaluated for trustworthy characteristics.

MEASURE 2 is where the NIST trustworthy AI characteristics become concrete evaluation work. Its 13 subcategories cover TEVV documentation, human-subject protections, performance demonstration, production monitoring, and one subcategory per trustworthy AI characteristic (valid and reliable; safe; secure and resilient; accountable and transparent; explainable; privacy-enhanced; fair with harmful bias managed; environmental impact), closing with a meta-evaluation of the TEVV process itself.

MEASURE 2.1

NIST AI 100-1, Table 3: Test sets, metrics, and details about the tools used during TEVV are documented.

In practice: TEVV documentation includes data fixtures, the metrics that drive pass/fail decisions, and the tools used — enough information that an independent reviewer could re-run the evaluation.

MEASURE 2.2

NIST AI 100-1, Table 3: Evaluations involving human subjects meet applicable requirements (including human subject protection) and are representative of the relevant population.

In practice: Human-subject evaluations (red-teaming with real users, usability studies, fairness audits with affected communities) follow the applicable human-subjects protocols and use representative populations — not whoever was available.

MEASURE 2.3

NIST AI 100-1, Table 3: AI system performance or assurance criteria are measured qualitatively or quantitatively and demonstrated for conditions similar to deployment setting(s). Measures are documented.

In practice: Performance is demonstrated under conditions matching the deployment setting — not just laboratory or in-distribution data — and the measurements are captured for audit.

MEASURE 2.4

NIST AI 100-1, Table 3: The functionality and behavior of the AI system and its components – as identified in the MAP function – are monitored when in production.

In practice: Production monitoring is configured against the components and behaviors identified during Map, not against generic placeholder metrics.

MEASURE 2.5

NIST AI 100-1, Table 3: The AI system to be deployed is demonstrated to be valid and reliable. Limitations of the generalizability beyond the conditions under which the technology was developed are documented.

In practice: Validity-and-reliability claims are demonstrated, not asserted; generalization boundaries are documented so operators know where the system is not validated.

MEASURE 2.6

NIST AI 100-1, Table 3: The AI system is evaluated regularly for safety risks – as identified in the MAP function. The AI system to be deployed is demonstrated to be safe, its residual negative risk does not exceed the risk tolerance, and it can fail safely, particularly if made to operate beyond its knowledge limits. Safety metrics reflect system reliability and robustness, real-time monitoring, and response times for AI system failures.

In practice: Safety evaluation is recurring, not one-off; the system is shown to fail safely (especially out-of-distribution), and the safety metrics are operational signals — reliability, robustness, latency on failure — not aspirational.

MEASURE 2.7

NIST AI 100-1, Table 3: AI system security and resilience – as identified in the MAP function – are evaluated and documented.

In practice: Adversarial testing (prompt injection, data exfiltration, model evasion), security posture review, and resilience checks (component failure, vendor outage) are conducted and recorded.

MEASURE 2.8

NIST AI 100-1, Table 3: Risks associated with transparency and accountability – as identified in the MAP function – are examined and documented.

In practice: Documented examination of where the system is opaque, who is accountable for outputs, and what redress paths exist for affected users.

MEASURE 2.9

NIST AI 100-1, Table 3: The AI model is explained, validated, and documented, and AI system output is interpreted within its context – as identified in the MAP function – to inform responsible use and governance.

In practice: Explanations connect model behavior to the deployment context; output interpretation is grounded in the context Map established, not in generic explainability artifacts.

MEASURE 2.10

NIST AI 100-1, Table 3: Privacy risk of the AI system – as identified in the MAP function – is examined and documented.

In practice: Privacy evaluation tied to the privacy threats Map identified — training-data leakage, inference attacks, sensitive-data inclusion in prompts — with documented findings, not just policy statements.

MEASURE 2.11

NIST AI 100-1, Table 3: Fairness and bias – as identified in the MAP function – are evaluated and results are documented.

In practice: Bias evaluation against the specific bias risks Map identified, with stratified results captured (per the three NIST bias categories — systemic, computational/statistical, human-cognitive).

MEASURE 2.12

NIST AI 100-1, Table 3: Environmental impact and sustainability of AI model training and management activities – as identified in the MAP function – are assessed and documented.

In practice: Training compute footprint, ongoing inference cost, and sustainability of the deployment posture are assessed and documented as part of evaluation, not as an afterthought.

MEASURE 2.13

NIST AI 100-1, Table 3: Effectiveness of the employed TEVV metrics and processes in the MEASURE function are evaluated and documented.

In practice: Meta-evaluation: are the metrics we chose actually useful, and is the evaluation process catching what it should? A recurring check that prevents the evaluation framework itself from going stale.

MEASURE 3: Mechanisms for tracking identified AI risks over time are in place.

NIST AI 100-1, Table 3: Mechanisms for tracking identified AI risks over time are in place.

MEASURE 3 covers risk tracking over time — including for risks that are difficult to quantify — and the feedback loops from end users and impacted communities.

MEASURE 3.1

NIST AI 100-1, Table 3: Approaches, personnel, and documentation are in place to regularly identify and track existing, unanticipated, and emergent AI risks based on factors such as intended and actual performance in deployed contexts.

In practice: Risk tracking covers existing risks and the unanticipated/emergent risks that surface from real-world deployment — named owners, documented approach, recurring cadence.

MEASURE 3.2

NIST AI 100-1, Table 3: Risk tracking approaches are considered for settings where AI risks are difficult to assess using currently available measurement techniques or where metrics are not yet available.

In practice: Acknowledged ambiguity: when no good metric exists, use qualitative tracking and incident-based monitoring rather than pretending the risk does not exist.

MEASURE 3.3

NIST AI 100-1, Table 3: Feedback processes for end users and impacted communities to report problems and appeal system outcomes are established and integrated into AI system evaluation metrics.

In practice: End-user feedback and appeal mechanisms are part of the evaluation surface — not a separate support flow — and the signal they generate is fed back into the metrics that drive Measure decisions.

MEASURE 4: Feedback about efficacy of measurement is gathered and assessed.

NIST AI 100-1, Table 3: Feedback about efficacy of measurement is gathered and assessed.

MEASURE 4 closes the loop on the Measure function itself — is the measurement approach producing useful information that domain experts and stakeholders consider valid?

MEASURE 4.1

NIST AI 100-1, Table 3: Measurement approaches for identifying AI risks are connected to deployment context(s) and informed through consultation with domain experts and other end users. Approaches are documented.

In practice: Measurement methodology is grounded in the actual deployment context and tested against the people who understand it — domain experts and end users — and documented so the approach is reviewable.

MEASURE 4.2

NIST AI 100-1, Table 3: Measurement results regarding AI system trustworthiness in deployment context(s) and across the AI lifecycle are informed by input from domain experts and relevant AI actors to validate whether the system is performing consistently as intended. Results are documented.

In practice: Trustworthiness results are validated by domain experts and relevant AI actors before being relied upon for decisions — not just by the team that ran the measurement.

MEASURE 4.3

NIST AI 100-1, Table 3: Measurable performance improvements or declines based on consultations with relevant AI actors, including affected communities, and field data about context-relevant risks and trustworthiness characteristics are identified and documented.

In practice: Track measurable improvements and declines using both stakeholder consultation and field data; document both so the system's trajectory is auditable.

How to operationalize Measure in Modulos

Measure outcomes are AI-system-level evaluation and monitoring records captured per project. In Modulos they can be represented using:

  • Methods and metrics selection (MEASURE 1): project assets and control narratives that capture the evaluation methodology, plus Runtime Inspection tests with schedules and assignees that put it into practice.
  • Trustworthy-characteristic evaluation (MEASURE 2): Runtime Inspection tests linked to controls and evidence, covering the trustworthy AI characteristics; result history is preserved on each test.
  • Independent assessment (MEASURE 1.3): the project reviewer role approves status changes; teams use this for separation of duties where possible, recognising that Project Owners can also act as reviewers.
  • Risk tracking over time (MEASURE 3): project risks updated as Runtime Inspection signals and field data come in, with decision logs preserved on each change.
  • Feedback efficacy (MEASURE 4): reviews and decision history on the project capture stakeholder input on measurement results so the rationale is durable for audit.

Measure outputs flow forward into Manage (which risks get treated, with what residual). For the broader operating model, see Operationalizing NIST AI RMF in Modulos.

Cross-framework mapping (preview)

The Measure function maps loosely onto two adjacent frameworks that many organizations adopt alongside NIST AI RMF:

  • ISO/IEC 42001:2023 — Measure outcomes correspond most directly to Clause 9 (performance evaluation — monitoring, measurement, analysis, and internal audit) and Annex A controls on AI system performance assessment and verification. NIST AI RMF Measure is often the implementation pattern that produces evidence for the ISO 42001 performance-evaluation requirements.
  • EU AI Act (Regulation (EU) 2024/1689) — for high-risk AI systems, the requirements set out in Article 9 (risk management system, including ongoing testing) and Article 15 (accuracy, robustness, and cybersecurity) are satisfied by providers; Measure outcomes are the practical evidence base. Article 26 sets deployer obligations including monitoring, use according to instructions, and logging; Article 72 sets the provider post-market monitoring system. Measure outcomes feed each of these surfaces.

Preview

Detailed control-by-control mappings are the subject of dedicated pages and are not included here. The deep mapping artifacts will live at /frameworks/nist-ai-rmf/iso-42001-mapping and /frameworks/nist-ai-rmf/eu-ai-act-mapping.

For framework-level comparison rather than control mapping, see ISO/IEC 42001 vs NIST AI RMF.

Disclaimer

This page reproduces and summarises publicly available NIST guidance for orientation and operational use. The authoritative source for the NIST AI Risk Management Framework Measure function is NIST AI 100-1 (January 2023), Table 3, and the NIST AI RMF Playbook. This page does not constitute legal advice.