---
layout: raw-data.njk
title: "stage 2 pattern analysis"
---

# Stage 2: Cross-Document Pattern Analysis

## Date and Agent
- Date: 2025-11-19
- Agent: Claude (general-purpose agent)

## Objective

Identify systematic patterns, issues, and opportunities across all 30 documents comprising the identification standards framework through deep analysis of annotation sets, Tom's manual review notes, and cross-document relationships using the MCP server. This stage builds on Stage 1's initial exploration to uncover recurring themes, structural problems, and consolidation opportunities that will inform the restructuring approach in Phase 2.

## Methodology

### Tools and Approaches Used

1. **Annotation Set Analysis**:
   - Reviewed TomNotesManualReview.md (40 high-level findings)
   - Examined 6 annotation JSON files in detail (representative sample from 23 total):
     - Authentication Assurance Standard (8 annotations)
     - Identification Terminology (6 annotations)
     - Conforming with Standards (24 annotations - second most heavily annotated)
     - Implementing Federation Assurance Standard (10 annotations)
     - Assessing Identification Risk (27 annotations)
     - Overview of Standards (6 annotations)
   - Analyzed annotation structure: JSON format with `bid` (block ID matching docrefId), `a` (annotation text), organized by document

2. **Semantic Neighbor Analysis**:
   - Used `find_semantic_neighbors` with k=15-20, min_score=0.75 on key nodes
   - Traced concepts across document boundaries to identify consolidation opportunities
   - Notable finding: Many nodes link exclusively to "About Identification Management" document

3. **Structural Pattern Queries**:
   - Document connectivity analysis (semantic similarity relationship counts)
   - Virtual node analysis (229 virtual nodes indicating hierarchy gaps)
   - Content type distribution by document
   - Hierarchy depth and complexity analysis

4. **Document Comparison**:
   - Standards vs implementation guides (voice, structure, content)
   - Process guidance vs foundational materials
   - Legal framework integration patterns
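
The semantic neighbor queries described above can be sketched as a threshold-filtered cosine-similarity search. This is a minimal illustration, not the MCP server's actual implementation (which the source does not show); only the parameter values (k=15-20, min_score=0.75) come from the methodology above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def find_semantic_neighbors(query_id, embeddings, k=15, min_score=0.75):
    """Return up to k nodes most similar to query_id, above min_score.

    `embeddings` maps node id -> embedding vector (a hypothetical
    in-memory stand-in for the MCP server's index). Nodes scoring
    below min_score are dropped, which is why some queries in this
    analysis returned zero results.
    """
    query_vec = embeddings[query_id]
    scored = [
        (node_id, cosine(query_vec, vec))
        for node_id, vec in embeddings.items()
        if node_id != query_id
    ]
    scored = [(n, s) for n, s in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```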

### Annotation Set Structure Findings

**JSON Format**:
```json
{
  "name": "nz_identification-management/[slug]_[version]_en",
  "jurisdiction": "nz",
  "slug": "identification-management/[document-name]",
  "language": "en",
  "version": "YYYY",
  "au": "Tom Barraclough",
  "annotations": [
    {
      "id": "N",
      "bid": "part1-section2-para3",  // Maps to docrefId in MCP server
      "a": "Annotation text here",
      "au": "Tom Barraclough",
      "t": [],  // tags
      "u": [],  // users
      "n": [],  // notes
      "r": []   // replies
    }
  ]
}
```

**Annotation Coverage**:
- 23 annotation files for identification management documents (excluding external legal docs)
- Annotation density varies significantly:
  - Highest: Assessing Identification Risk (27 annotations), Conforming with Standards (24)
  - Moderate: Implementing guides (8-10 annotations)
  - Low: Core standards (6-8 annotations) - constrained by non-modification requirement
  - Contact/administrative pages: 1-2 annotations

**Annotation Traceability**:
- `bid` field directly corresponds to `docrefId` property in MCP server nodes
- Enables precise mapping of feedback to specific content blocks
- Critical for Phase 2 verification and citation traceability
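
The `bid`-to-`docrefId` mapping can be sketched as a simple join. The annotation fields (`bid`, `a`) follow the JSON format shown above; the node shape (`docrefId`, `content`) is an assumption about the MCP server's node records. Surfacing orphaned `bid`s matters for the Phase 2 verification step, so unmatched feedback is not silently dropped.

```python
def map_annotations_to_nodes(annotation_set, nodes):
    """Join annotations to content blocks via bid == docrefId.

    `annotation_set` follows the JSON format shown above; `nodes` is a
    hypothetical list of MCP-server node dicts, each with a `docrefId`
    and `content`. Returns (matched pairs, orphaned bids).
    """
    by_docref = {node["docrefId"]: node for node in nodes}
    matched, orphaned = [], []
    for ann in annotation_set["annotations"]:
        node = by_docref.get(ann["bid"])
        if node is None:
            orphaned.append(ann["bid"])
        else:
            matched.append({"annotation": ann["a"], "content": node["content"]})
    return matched, orphaned
```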

## Key Findings

### 1. Tom's Feedback Themes (Categorized)

#### Theme A: Passive Voice Pervades Guidance Materials

**Prevalence**: Appears in 15+ annotations across multiple documents

**Pattern**: Documentation uses passive constructions that obscure agency and responsibility.

**Examples from annotations**:
- "Passive voice is confusing" (Assessing Risk, part1-para2)
- "Why so passive? 'Issuance' - by who?" (Conforming, part3-subpart3-title)
- "Very vague. Very passive" (Conforming, part3-subpart2-section4-para2)
- "Passive" (multiple instances in Conforming document)
- "Would be much clearer in active voice" (Assessing Risk, part4-det7)

**Contrast identified by Tom**:
- "Active voice - much clearer" (Assessing Risk, part3-para1, part3-section1-para1, part3-det5-para1)
- Tom specifically praises active voice sections, noting they are "much clearer" and "better"

**Specific manifestations**:
- "Credential enrolment" rather than "X enrols the credential" (Manual review notes)
- "Application of the controls will contribute..." vs "Apply these controls to..."
- "Is performed" vs "You perform" or "DIA performs"

**Scope of issue**: Passive voice is systemic across the guidance materials, but some sections of Assessing Identification Risk use active voice, demonstrating that an alternative is possible.

**Implication for Phase 2**: Guidance materials (not core standards) should be systematically rewritten in active voice. This is feasible given the constraint allows modification of guidance.

#### Theme B: Content Fragmentation and "Tucked Away" Information

**Prevalence**: Appears in 12+ annotations, explicitly noted in manual review

**Pattern**: Critical information is hidden in detail expanders, buried in separate documents, or placed late in documents rather than prominently featured.

**Examples from annotations**:

*Detail Expanders Hiding Essential Content*:
- "The detail expanders here don't help because they're the main point of the content" (Conforming, part1-title)
- "There is no point to burying these in detail expands" (Conforming, part3-subpart3-section7)
- "Detail expands unnecessary here" (Conforming, part3-subpart3-section8)
- "So much essential information in here all buried away" (Assessing Risk, part3-det4)
- "So much information tucked away" (Assessing Risk, part3-det5, part3-det7)
- "Rehashes information... Much clearer, but buried away in a detail expander" (Assessing Risk, part3-det3)
- "This is accidentally buried under a detail expand when it should be its own detail expand" (Assessing Risk, part3-section3-det1)

*Conformance Process Prominence*:
- "It feels like the process of conforming with the standards is the whole point of even reading the standards in the first place. People are coming to the standards to try and conform with them, or to understand others' conformance with them. It's not clear why this is tucked away." (Conforming, part1-para1)
- Manual review notes: "It feels like the process of conforming with the standards is the whole point"

*Threshold Information Buried*:
- "This is again a really important entry threshold consideration for anyone who is considering whether they want to even bother reading the standards" (Conforming, part3-subpart3-section6-para1)
- "Again, re-conformance and point in time assessment is an important threshold consideration for any entity that might be considering the conformance process" (Conforming, part4)
- "This is a really important point to establish much earlier because they seem exclusive" (Conforming, part3-subpart1-section1-para5)
- "This again seems like a precondition for performing this whole exercise and it should be brought up much earlier" (Assessing Risk, part4-det9)

*Important Guidance Placed Late*:
- "Much clearer and better. Should come up much earlier" (Assessing Risk, part3-para3)
- "Really useful information. Good guidance. Tucked away and could be brought up much more prominently" (Assessing Risk, part3-section3-para1)
- "This would be much better said up front" (Assessing Risk, part4-det8)

**Tom's explicit directive**: "Get rid of all detail expanders" (Manual review notes)

**Implication for Phase 2**:
- Eliminate all detail expander syntax (`+++` ... `+++`)
- Restructure content hierarchy to surface critical information
- Make conformance process prominent, possibly as organizing framework
- Place threshold/precondition information at document beginning
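
Removing the expanders while surfacing their content could be done mechanically. This sketch assumes an expander is an opening `+++ <summary>` line, a body, and a closing `+++` line; the exact syntax is inferred from the `+++` notation above and should be checked against the real source files.

```python
import re

# Matches one detail-expander block: an opening "+++ <summary>" line,
# the body, and a closing "+++" line. Syntax is assumed, not confirmed.
EXPANDER = re.compile(
    r"^\+\+\+[ \t]*(?P<summary>[^\n]*)\n(?P<body>.*?)^\+\+\+[ \t]*$\n?",
    re.MULTILINE | re.DOTALL,
)

def surface_expanders(markdown: str) -> str:
    """Replace each detail expander with a heading plus its body,
    so formerly hidden content becomes part of the main flow."""
    def unwrap(match: re.Match) -> str:
        summary = match.group("summary").strip()
        body = match.group("body").rstrip("\n")
        heading = f"#### {summary}\n\n" if summary else ""
        return f"{heading}{body}\n"
    return EXPANDER.sub(unwrap, markdown)
```

Promoting each summary to a heading (rather than deleting it) preserves scanability, which several annotations above identify as the real goal.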

#### Theme C: Standards-Guidance Separation Creates Friction

**Prevalence**: Core issue noted in multiple annotations and manual review

**Pattern**: Each core standard has a separate implementation guide of similar size, creating navigation burden and conceptual separation.

**Tom's observations**:
- "Standards are standards. Guidance is guidance. Does having separate pages really help? It's not for reading, it's for working through." (Manual review)
- "The guidance is really useful for understanding the standards, but is stored separately on another page. There is probably a simple way to indicate when something is guidance and when it's a standard so that they can be included in the same document." (Manual review)
- "Given the guidance is authoritative (ie, referred to by the standards maker, and also linked to in the standards themselves, there is not really any reason why they can't be stored together.)" (Manual review)

**Supporting evidence from semantic searches (Stage 1)**:
- Implementation guidance often scored higher than standards in user queries
- Example: "information assurance frameworks" query - implementing guide scored 0.911 vs standard 0.871

**Document pair analysis**:
1. Authentication Assurance Standard (328 nodes) + Implementing guide (280 nodes) = 608 nodes split across 2 documents
2. Federation Assurance Standard (431 nodes) + Implementing guide (438 nodes) = 869 nodes split across 2 documents
3. Information Assurance Standard (169 nodes) + Implementing guide (214 nodes) = 383 nodes split across 2 documents
4. Binding Assurance Standard (161 nodes) + Implementing guide (158 nodes) = 319 nodes split across 2 documents

**Total**: 2,179 nodes across 8 documents for 4 topics
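
The pair totals above reduce to simple sums; a sketch like the following (node counts copied from the list above) makes the consolidation arithmetic checkable.

```python
# Node counts for each standard/guide pair, from the list above:
# (standard nodes, implementation guide nodes).
pairs = {
    "Authentication": (328, 280),
    "Federation": (431, 438),
    "Information": (169, 214),
    "Binding": (161, 158),
}

# Combined size of each would-be merged document.
pair_totals = {name: std + guide for name, (std, guide) in pairs.items()}

# All nodes currently split across 8 documents for 4 topics.
grand_total = sum(pair_totals.values())
```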

**User impact**: Users must navigate between documents to understand requirements (standard) and how to meet them (guidance), creating cognitive load and navigation friction.

**Implication for Phase 2**: Integrate standards and guidance while maintaining clear visual/structural distinction between normative requirements and explanatory material.

#### Theme D: Terminology Authority and Function Unclear

**Prevalence**: 6 annotations on terminology page + manual review notes

**Pattern**: Terminology page relies on dictionary definitions rather than asserting authority, lacks examples, unclear function.

**Tom's key annotation**:
- "The function of this page needs some thought. If DIA publishes the standards, DIA can declare the meaning of terms. The meanings should be consistent with other relevant standards and instruments, including the law, but relying on a dictionary definition for such a specialised area is not a useful source of authority. Also, the identification terminology page should theoretically play a role in helping people understand complex terms used in the wider standards and guidance. Therefore, if the definitions provided are opaque or difficult to understand, or don't include examples, then they don't serve that function." (Terminology, h1)

**Specific issues**:
- "Not clear why dictionary definitions are useful in such a specialised area" (Terminology, subpart1-para2-tb1)
- "What's the value of the source if you're expanding it in some way?" (Terminology, subpart1-para2-tb1-tr2-td2-line3)
- "'Undergone a process' by who? On what authority?" (Terminology, para1)
- "Developed by who?" (Terminology, subpart2-para1)
- "What does 'its poor interaction with other words' mean?" (Terminology, subpart2-para2-tb1-tr1-td2-line5)

**Manual review observation**: Only 9 terms (from ~60 total) actually point to external standards (8 to ISO 31073 risk management, 1 to NIST digital identity guidelines)

**Implication for Phase 2**: Reconsider terminology approach - should DIA assert definitional authority? Should terms include plain language explanations and examples? Should terminology support understanding rather than just define?

#### Theme E: DISTF Relationship Self-Defeating

**Prevalence**: Multiple annotations across conformance and federation documents

**Pattern**: Standards distance themselves from DISTF despite DISTF being the only mandatory conformance framework.

**Tom's observations**:
- "The standards are at pains to distinguish themselves from the DISTF framework, but when it comes down to it, the only mandatory use of the standards is the DISTF. That seems self-defeating." (Conforming, part2-para2; also Manual review)
- "What is the point of laying it out like this when there is only one mandated conformance framework" (Conforming, part2-para1)
- "The federation standard guidance references digital credentials quite frequently. It's worth using this as a way in to exploring the overlap or not between these standards and the DISTF digital identity framework." (Implementing Federation, h1)
- "Facilitation mechanisms are exclusively digital, meaning there is a strong overlap with DISTF/digital identity as a concept" (Implementing Federation, part3-para4-ex1)

**Stage 1 finding**: Federation Standard has highest semantic similarity with DISTF legal documents, confirming it's the primary linkage point.

**Implication for Phase 2**: Make DISTF relationship explicit and prominent rather than downplaying it. Federation Standard should be positioned as key connection to legal framework.

#### Theme F: Conformance Should Be Central, Not Peripheral

**Prevalence**: Explicit in multiple annotations, core theme in manual review

**Tom's key statement**:
- "It feels like the process of conforming with the standards is the whole point of even reading the standards in the first place. People are coming to the standards to try and conform with them, or to understand others' conformance with them. It's not clear why this is tucked away." (Conforming, part1-para1)

**Supporting observations**:
- "This should be the way in to understanding the standards and the guidance" (Conforming, part3-subpart1-section1)
- "If the main purpose of the identification standards is to demonstrate conformance, and the main way of demonstrating conformance is to generate evidence and perform assessments, then these checklists should be the most important and predominant way into all of this material, and the material should be directed toward explaining these assets." (Conforming, part3-subpart2-section5)

**Checklists observation**:
- "There are existing checklists here for collating evidence for each standard. These need to be factored in to any design of implementation assets." (Conforming, part3-subpart2-section5-para3)
- Checklists currently exist as separate Word documents rather than integrated markdown

**Stage 1 finding**: Conformance semantic searches returned lower scores (0.85-0.88) than other topics (0.89-0.93), suggesting conformance content is less semantically connected to standards content.

**Implication for Phase 2**:
- Make conformance process more prominent, possibly as organizing framework
- Integrate checklists as markdown rather than separate downloads
- Structure materials to support conformance assessment workflow
- Consider conformance process as primary user journey

#### Theme G: Structure and Formatting Issues

**Prevalence**: Multiple annotations across documents

**Pattern**: Various structural problems that impede scanability and comprehension.

**Specific issues identified**:

*Heading/Formatting*:
- "Should be a heading" (Authentication Standard, part4-subpart2-section9-para1)
- "Formatting is not easy to skim for the reader here - headings or bold better" (Assessing Risk, part1-para2)
- "This would be easier to read as a matrix rather than a table with words" (Conforming, part3-subpart1-section2-tb1)

*Reference Clarity*:
- "When requirements state 'this control' without specifying the identifier, the meaning of the requirement has to be inferred from the position of the paragraph in relation to other paragraphs/the heading. Using specific in-text numbering or DocRef cross referencing would be better." (Authentication Standard, part5-section25-para2)
- "Looking at FA3.01 here in the text, it's not clear if this applies to the standard or to the guidance" (Implementing Federation, part4-subpart2-section45-para1)

*Missing Definitions*:
- "This paragraph and those that immediately follow do not define a facilitation mechanism" (Implementing Federation, part3-para1)
- "This subpart should state very clearly up front who a relying party is and very simply explain the concept" (Overview, part1-subpart1)

*Unclear Language*:
- "'Robustness' is not a useful concept here. It is robust enough, or it's not, so the question is 'Whether it is robust'" (Overview, part1-subpart1-para4-tb1)
- "Why 'usually'? What is the point of mentioning this?" (Conforming, part3-subpart2-section5-para1)
- "What is meant by 'potentially a new concept'? To who? In what context?" (Conforming, part3-subpart2-section5-para4)

*Inconsistent Terminology*:
- "Should be consistent in referring to this as the Federation Assurance Standard not the federation standard" (Implementing Federation, part4-subpart1-section39-para8)

**Implication for Phase 2**: Improve heading hierarchy, use specific cross-references rather than "this control", ensure key concepts are defined early, use precise language.

#### Theme H: Privacy/Identification Relationship Inconsistent

**Prevalence**: Noted in manual review, relates to Information Assurance Standard

**Tom's observation**:
- "In the guidance materials on other pages, a lot is made of the fact that identification and privacy overlap but aren't the same, but then a lot of the standards are explicitly stated to be an application of an information privacy principle. That seems inconsistent and makes things harder than they need to be in terms of comprehension and familiarity." (Manual review)
- Example cited: "This is the application of Information privacy principle 1 of the Privacy Act 2020" ([DocRef](https://docref.digital.govt.nz/nz/identification-management/implementing-the-information-assurance-standard/2024/en/#part2-subpart1-section3-para1))

**Stage 1 finding**: Privacy Act 2020 is largest document in database (3,611 nodes, 38.5% of all nodes), suggesting significant integration.

**Implication for Phase 2**: Clarify relationship between identification standards and Privacy Act - are they an application of privacy principles, or distinct but overlapping concerns?

#### Theme I: Implementation Assets and Structured Tools Needed

**Prevalence**: Multiple annotations on Assessing Risk document

**Pattern**: Recognition that structured implementation tools (workbooks, forms) are valuable and could be enhanced.

**Tom's observations**:
- "This is a good implementation asset to turn into a web form" (Assessing Risk, part3-section3-det1-summary)
- "This would be a great structured exercise to turn into a transparent web form of some kind that tallies up the scores in front of you" (Assessing Risk, part3-section3-det1-summary)
- "These work books are a good indication of the need for structured implementation assets. It is probably worth considering whether this level of structured implementation is a better place to start than describing the process in the abstract" (Assessing Risk, part6-para3)
- "This Excel workbook includes some calculation of scores... the same thing could be a simple rules as code programme" (Assessing Risk, part6-para3-2)

**Implication for Phase 2**: Consider how restructured materials can better support creation of structured implementation tools. Document structure should facilitate rather than hinder tool development.

#### Theme J: Good Practices Identified

**Prevalence**: Several annotations praise effective approaches

**Pattern**: Tom identifies sections that work well and should be models.

**Examples**:
- "Active voice - much clearer" (multiple annotations in Assessing Risk)
- "Really useful information. Good guidance" (Assessing Risk, part3-section3-para1)
- "Useful and important as a design approach for any re-generated or re-drafted standards" (Overview, part2-para1)
- "Good way in to understanding federation, credentials, other aspects of identification" (Implementing Federation, part2-subpart3-para1)
- "In a way, this federation assurance standard is the easiest and most relatable way in to understanding identification as a whole, because most people think of a driver's licence when they think of 'ID'" (Implementing Federation, part2-para2)

**Implication for Phase 2**: Replicate successful patterns - active voice, clear explanations, relatable examples (driver's licence). Federation Standard may be best entry point for users.

### 2. Semantic Neighbor Analysis: Cross-Document Patterns

#### Pattern: "About Identification Management" as Universal Neighbor

**Finding**: Multiple semantic neighbor queries from different documents returned exclusively nodes from "About Identification Management" (60 nodes total) document.

**Queries demonstrating this**:
1. Authentication Assurance Standard (#part4-title) → 10 results, all from "About Identification Management"
2. Federation Assurance Standard (#part5-subpart2) → 10 results, all from "About Identification Management"
3. Implementing Authentication Standard (#part2-subpart1-section3) → 10 results, all from "About Identification Management"

**Content returned**: Repeatedly the same nodes about identification management elements, diagram describing entity-credential-relying party relationships.

**Analysis**:
- "About Identification Management" document (only 60 nodes) has extremely high semantic connectivity (74,870 total connections - highest in database)
- This tiny document serves as conceptual hub explaining foundational concepts
- Standards and guidance documents semantically link to foundational explanations rather than to each other
- Suggests current structure separates conceptual understanding from requirements

**Implication for Phase 2**: Foundational concepts from "About Identification Management" should be integrated into restructured materials rather than separated. Users shouldn't need to navigate to separate document for conceptual understanding.

#### Pattern: Conformance Node Lacks Semantic Neighbors

**Finding**: Semantic neighbor query for conformance section (#part3-subpart1-section1) returned 0 results above 0.75 threshold.

**Analysis**:
- Conformance content is semantically isolated from other materials
- Despite being "the whole point" (per Tom), conformance guidance doesn't connect to standards content semantically
- Confirms Tom's observation that conformance is "tucked away"

**Implication for Phase 2**: Conformance content needs better integration with standards requirements, not just separate guidance document.

#### Pattern: Levels of Assurance Node Lacks Semantic Neighbors

**Finding**: Semantic neighbor query for Levels of Assurance (#part2) returned 0 results above 0.75 threshold.

**Analysis**:
- LoA is crosscutting concept but semantically isolated
- Despite being fundamental framework, LoA content doesn't connect to control specifications
- Suggests LoA is explained abstractly rather than integrated into requirements

**Implication for Phase 2**: LoA framework should be more prominently integrated throughout standards and guidance, not just explained in separate document.

### 3. Structural Pattern Analysis: Cypher Query Results

#### Document Connectivity (Semantic Similarity Relationships)

**Top 20 most connected documents**:

1. About Identification Management: 74,870 connections (extreme outlier despite only 60 nodes)
2. Privacy Act 2020: 28,440 connections (3,611 nodes - external document)
3. DISTF Act: 7,960 connections (935 nodes - external legal framework)
4. Implementing Federation Standard: 4,150 connections (438 nodes)
5. Federation Assurance Standard: 3,160 connections (431 nodes)
6. DISTF Rules: 2,970 connections (348 nodes - external)
7. Implementing Authentication Standard: 2,630 connections (280 nodes)
8. Authentication Assurance Standard: 2,600 connections (328 nodes)
9. Identification Terminology: 2,240 connections (383 nodes)
10. Implementing Information Standard: 1,860 connections (214 nodes)
11. DISTF Regulations: 1,760 connections (242 nodes - external)
12. Assessing Identification Risk: 1,740 connections (225 nodes)
13. Conforming with Standards: 1,690 connections (205 nodes)
14. Counter Fraud Techniques: 1,640 connections (212 nodes)
15. Derived Information: 1,470 connections (177 nodes)
16. Implementing Binding Standard: 1,430 connections (158 nodes)
17. Authenticator Types: 1,300 connections (204 nodes)
18. Information Assurance Standard: 1,300 connections (169 nodes)
19. Authority to Act: 1,260 connections (154 nodes)
20. Binding Assurance Standard: 1,230 connections (161 nodes)

**Key observations**:

*Connectivity Analysis*:
- "About Identification Management" averages roughly 1,248 connections per node (74,870 ÷ 60) - extremely dense
- Core standards average 2,073 connections per document (8,290 ÷ 4)
- Implementation guides average 2,518 connections per document (10,070 ÷ 4)
- Implementation guides are MORE connected than the standards they implement
- Conformance has only 1,690 connections despite 205 nodes (8.2 per node) - relatively isolated

*External Document Integration*:
- Privacy Act (2nd), DISTF Act (3rd), DISTF Rules (6th), DISTF Regulations (11th) all in top 20
- External legal documents are highly integrated into semantic network
- Supports need to clarify relationship with Privacy Act and DISTF

*Federation Documents Dominance*:
- Federation Standard (5th) and Implementing Federation (4th) are most connected ID management standards
- Confirms Federation Standard is primary interface to DISTF
- Supports Tom's observation that federation is "easiest way in to understanding identification"
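
The per-node density comparisons above can be reproduced from the table's figures. This is an illustrative subset only; connection and node counts are copied from the connectivity table above.

```python
# (total semantic-similarity connections, node count) per document,
# taken from the connectivity table above (subset, for illustration).
documents = {
    "About Identification Management": (74_870, 60),
    "Privacy Act 2020": (28_440, 3_611),
    "Conforming with Standards": (1_690, 205),
}

def density(connections: int, nodes: int) -> float:
    """Average semantic-similarity connections per node."""
    return round(connections / nodes, 1)

densities = {name: density(c, n) for name, (c, n) in documents.items()}
```

The contrast is stark: the tiny "About" document is two orders of magnitude denser per node than the conformance material.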

#### Virtual Nodes Indicating Structural Gaps

**Finding**: 229 virtual nodes (containers) created to fill missing hierarchy parents.

**Distribution by document** (first 50 results):
- Assessing Identification Risk: 7 virtual containers at level 0
- Authentication Assurance Standard: 8 virtual containers at level 0
- Authenticator Types: 7 virtual containers at level 0
- Authority to Act: 9 virtual containers at level 0
- Binding Assurance Standard: 7 virtual containers at level 0
- Conforming with Standards: 5 virtual containers at level 0
- Counter Fraud Techniques: 14 virtual containers at level 0
- DISTF Regulations: 29 virtual containers at level 0 (highest)

**All virtual nodes have**:
- `content: "[Container: partN]"` or similar
- `level: 0` (root level)
- `isVirtual: true`

**Analysis**:
- Virtual nodes are structural placeholders, not content
- Indicate documents have content nodes but missing structural parent nodes
- Counter Fraud (14) and DISTF Regulations (29) have most structural gaps
- Suggests inconsistent hierarchy across documents

**Implication for Phase 2**: Ensure consistent, complete hierarchical structure in restructured materials.

#### Content Type Distribution

**Key patterns by document type**:

*Core Standards Pattern* (Authentication example):
- text/text: 147 nodes (structural narrative)
- metadata/metadata: 58 nodes (headings, labels)
- text/structural: 50 nodes (requirements organization)
- text/list: 40 nodes (enumerated requirements)
- table/table: 15 nodes (control specifications)
- container/virtual: 8 nodes (hierarchy gaps)

*Guidance Pattern* (Assessing Risk example):
- text/text: 65 nodes (explanatory content)
- text/list: 65 nodes (guidance steps)
- table/table: 40 nodes (assessment matrices)
- metadata/metadata: 22 nodes (labels)
- text/figure: 11 nodes (diagrams)
- text/example: 5 nodes (scenarios)

*Implementation Guide Pattern* (Implementing Authentication example):
- Similar mix to guidance but more examples and practical content

**Observations**:
- Standards are heavy on structural content (requirements, controls, tables)
- Guidance is heavy on lists, examples, and assessment tools
- Both use substantial metadata (headings, labels)
- Supports integration approach - standards provide structure, guidance provides understanding

#### Hierarchy Depth and Complexity

**Documents by maximum hierarchy depth**:

*Most complex* (depth 6-7):
1. Implementing Information Standard: maxDepth 7, avgDepth 3.42, 214 nodes
2. Assessing Identification Risk: maxDepth 6, avgDepth 2.70, 225 nodes
3. Authority to Act: maxDepth 6, avgDepth 2.42, 154 nodes
4. Conforming with Standards: maxDepth 6, avgDepth 3.27, 205 nodes
5. Derived Information: maxDepth 6, avgDepth 2.57, 177 nodes
6. Implementing Authentication: maxDepth 6, avgDepth 2.75, 280 nodes
7. Implementing Binding: maxDepth 6, avgDepth 2.93, 158 nodes
8. Implementing Federation: maxDepth 6, avgDepth 2.89, 438 nodes

*Moderate complexity* (depth 4-5):
- Core standards: depth 4-5, avgDepth 2.2-2.6
- Identification Terminology: depth 5, avgDepth 4.17 (highest average - many nested definitions)
- Federation Standard: depth 5, avgDepth 2.63

*Shallow* (depth 1-3):
- Navigation pages (Guidance, Identification Standards, Superseded): depth 1
- Introductory materials (About, Training): depth 2-3

**Observations**:
- Implementation guides are MORE hierarchically complex than standards they implement
- Conforming and Assessing Risk are among most complex (depth 6)
- Depth 6-7 may be excessive for scanability
- Average depth 2.7-3.4 suggests deep nesting in some sections

**Implication for Phase 2**:
- Review hierarchy depth - is 7 levels necessary?
- Consider flatter structure with better heading hierarchy
- Balance detail with scanability

### 4. Standards vs Guidance Comparison

#### Voice and Tone Differences

**Core Standards Characteristics**:
- Passive voice predominates: "This standard applies to...", "Application of controls will contribute..."
- Third person: "A Relying Party", "A Credential Provider"
- Abstract: "The scope of requirements is related to..."
- Technical: "Authenticators", "Binding Assurance", "Facilitation Mechanisms"
- Regulatory tone: "must", "shall", "requirements"

**Guidance Characteristics** (where active voice used):
- More direct: "You should", "Consider", "Use"
- Examples and scenarios: driver's licence example
- Explanatory: "This means...", "For example..."
- Practical: workbooks, checklists, assessment tools
- Still some passive voice: "is performed", "are provided"

**Inconsistency**: Not all guidance uses active voice - Assessing Risk mixes passive and active sections.

**Tom's preference**: Clear from annotations that active voice is "much clearer" and preferred.

#### Structure and Content Design Differences

**Standards Structure**:
- Part/Section/Subsection hierarchy
- Controls organized by level of assurance (LoA1, LoA2, LoA3)
- Tables specifying requirements
- Cross-references to other standards
- Objectives stated first, then controls

**Guidance Structure**:
- Explanatory sections
- Step-by-step processes
- Examples and use cases
- Detail expanders (marked for removal)
- Assessment matrices and workbooks
- Checklists for evidence gathering

**Effectiveness**: Stage 1 semantic searches showed guidance often scores higher than standards for user queries, suggesting guidance is more semantically aligned with user information needs.

#### Integration Opportunities Identified

**Current separation creates problems**:
1. User must navigate between documents to understand requirement (standard) and how to meet it (guidance)
2. Guidance is authoritative (referenced by standards) but treated as secondary
3. Only a visual distinction is needed - standard and guidance could share one document with distinct formatting
4. Tom's annotation: "no reason why they can't be stored together"

**Potential integration approaches**:
1. **Interleaved**: Guidance immediately follows relevant standard requirement
2. **Parallel columns**: Standard in one column, guidance in adjacent column
3. **Expandable sections**: Standard visible, guidance optionally visible (but not hidden by default)
4. **Clearly marked sections**: Visual styling to distinguish standard from guidance

**Constraint consideration**: Core standards text cannot be modified, but structure and presentation can be improved. Guidance can be fully rewritten.

### 5. Crosscutting Themes Across Multiple Documents

#### Theme: Threshold Information Placement

**Pattern**: Information users need BEFORE engaging with standards is placed late in documents or buried in expanders.

**Examples**:
- Conformance assessment types (qualified vs audited) - should be upfront decision point
- Whether risk assessment is required - user needs to know before performing exercise
- Re-conformance requirements - affects whether to pursue initial conformance
- Relationship to DISTF - users need to know if DISTF is their context

**Impact**: Users may invest time in wrong approach or miss critical context.

**Implication for Phase 2**: Create "before you start" or "is this relevant to you" sections at the beginning of materials.

#### Theme: Circular References Without Clear Entry Point

**Pattern**: Documents reference each other without establishing user journey or entry point.

**Examples**:
- Standards reference guidance, guidance references standards
- Conformance references risk assessment, risk assessment references conformance
- All standards reference terminology, terminology assumes familiarity with standards
- Tom's annotation: "When all pages are split into so many separated documents, you lose control over how people are approaching the information"

**Impact**: Users don't know where to start or how to navigate relationships.

**Implication for Phase 2**:
- Establish clear entry points for different user types
- Create guided pathways through materials
- Consider single cohesive document vs multi-document navigation

#### Theme: Biometrics Emerging as Important but Under-Addressed

**Annotations noting biometrics**:
- "Note importance of biometrics" (Authentication Standard, part5-subpart2-section30)
- "Note importance of biometrics in terms of adding other biometric guidance materials" (Authentication Standard, part5-subpart3-title)
- Privacy Biometrics Code in OtherMaterialsToEvaluate

**Implication for Stage 3**: Biometrics materials should be evaluated for integration.

#### Theme: Examples and Practical Scenarios Valued

**Tom's annotations**:
- Federation as "easiest way in" because people relate to driver's licence
- Praise for example scenarios
- Suggestion to turn assessment matrices into interactive tools

**Pattern**: Concrete examples make abstract requirements comprehensible.

**Implication for Phase 2**: Increase use of examples, particularly relatable ones like driver's licence.

## Supporting Evidence

### Evidence of Systematic Passive Voice

**From Authentication Assurance Standard** (core standard - cannot be modified):
- "This standard applies to any Relying Party (RP)." ([DocRef](https://docref.digital.govt.nz/nz/identification-management/authentication-assurance-standard/2024/en/#part1-para3))
- "Application of the controls in this standard will contribute to..." ([DocRef](https://docref.digital.govt.nz/nz/identification-management/authentication-assurance-standard/2024/en/#part3-para1))
- "The scope of the requirements in this standard is explicitly related to..." ([DocRef](https://docref.digital.govt.nz/nz/identification-management/authentication-assurance-standard/2024/en/#part1-para3))

**From Conforming with Standards** (guidance - can be modified):
- Multiple annotations noting "passive", "very passive", "why so passive"
- Contrast with sections Tom praises: "Active voice - much clearer"

**Scope of issue**: Passive voice is systemic across guidance materials, not just isolated instances.

### Evidence of Content Fragmentation Impact

**Document size comparison**:
- 4 core standards: 1,089 nodes total
- 4 implementation guides: 1,090 nodes total
- Nearly 1:1 ratio of standard to guidance content

**Navigation burden**: User seeking to understand and implement authentication requirements must navigate:
1. Authentication Assurance Standard (328 nodes)
2. Implementing Authentication Assurance Standard (280 nodes)
3. Possibly Authenticator Types guidance (204 nodes)
4. Possibly Conforming with Standards (205 nodes)
5. Possibly Assessing Risk (225 nodes)

**Total**: 1,242 nodes across 5 documents for a single topic.

### Evidence of "About Identification Management" as Conceptual Hub

**Connectivity density**:
- About ID Management: 1,248 connections per node
- Next highest (Privacy Act): 7.9 connections per node
- 158x more dense than the next highest

**Content**: Explains entity, credential provider, relying party, facilitation mechanisms - foundational concepts.

**Problem**: Foundational concepts separated from requirements, forcing navigation.

### Evidence of Conformance Isolation

**Semantic connectivity**:
- Conforming with Standards: 1,690 total connections, 205 nodes = 8.2 connections per node
- Compare to Federation Standard: 3,160 connections, 431 nodes = 7.3 connections per node
- Comparable per-node density, but far lower absolute connectivity
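The per-node density figures quoted above follow directly from the reported counts; a minimal check (the helper function is illustrative, not part of the MCP tooling):

```python
def connections_per_node(total_connections: int, node_count: int) -> float:
    """Semantic connectivity density: total relationship count
    divided by the number of nodes in the document."""
    return round(total_connections / node_count, 1)

# Figures from the connectivity comparison above
conforming = connections_per_node(1690, 205)   # 8.2
federation = connections_per_node(3160, 431)   # 7.3
```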

**Semantic neighbor query**: The Conformance section returned 0 results above the 0.75 threshold.

**Tom's annotation**: "It feels like the process of conforming with the standards is the whole point"

**Gap**: Most important user goal has weakest semantic integration.

### Evidence of Privacy Act Integration Confusion

**Size**: Privacy Act is 3,611 nodes (38.5% of database).

**Connectivity**: 28,440 connections (2nd highest).

**Explicit reference**: Information Assurance implementation guidance states "This is the application of Information privacy principle 1 of the Privacy Act 2020" ([DocRef](https://docref.digital.govt.nz/nz/identification-management/implementing-the-information-assurance-standard/2024/en/#part2-subpart1-section3-para1))

**Contradiction**: Manual review notes identify tension between "identification and privacy overlap but aren't the same" messaging vs explicit IPP1 application statement.

## Cross-Document Patterns Identified

### Pattern 1: Detail Expanders Hide Essential Information

**Prevalence**: Systematic across guidance documents

**Documents affected**: Conforming, Assessing Risk, possibly others

**Content hidden**:
- Conformance process explanation
- Risk assessment methodology
- Threshold decision information
- Important caveats and context

**Tom's directive**: "Get rid of all detail expanders"

**Phase 2 action**: Remove all `+++` ... `+++` syntax, surface all content with proper heading hierarchy.
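The expander removal described above could be scripted. The sketch below assumes each expander opens with `+++` followed by a title on the same line and closes with a bare `+++` on its own line; the actual syntax in the source documents may differ, and the target heading level would need to fit each document's existing hierarchy:

```python
import re

def surface_expanders(markdown: str, heading_level: int = 4) -> str:
    """Replace each +++ Title ... +++ expander with a visible heading
    followed by its previously hidden body text."""
    pattern = re.compile(
        r"^\+\+\+\s*(?P<title>.*?)\s*\n(?P<body>.*?)^\+\+\+\s*$",
        re.DOTALL | re.MULTILINE,
    )

    def promote(match: re.Match) -> str:
        heading = "#" * heading_level + " " + match.group("title").strip()
        return heading + "\n\n" + match.group("body").strip() + "\n"

    return pattern.sub(promote, markdown)
```

Run over a guidance file, this surfaces every expander's content in place, preserving reading order while removing the hidden-by-default behaviour.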

### Pattern 2: Implementation Guides Are More Accessible Than Standards

**Evidence**:
- Semantic searches score guidance higher than standards
- Guidance has more examples, scenarios, practical content
- Tom praises federation implementation guide as "easiest way in"
- Guidance documents average higher semantic connectivity per node

**Paradox**: More accessible content treated as secondary to less accessible standards.

**Phase 2 opportunity**: Make guidance prominence match its value to users.

### Pattern 3: Foundational Concepts Separated from Application

**Evidence**:
- "About Identification Management" (60 nodes) is semantic hub but separate document
- Terminology (383 nodes) is separate reference
- Levels of Assurance (90 nodes) is separate framework
- Users must navigate to separate documents for conceptual understanding

**Impact**: Increased cognitive load, navigation burden.

**Phase 2 opportunity**: Integrate foundational concepts into main materials.

### Pattern 4: No Clear User Journey or Entry Points

**Evidence**:
- Tom's annotation: "lose control over how people are approaching the information"
- No "start here" guidance for different user types
- No clear distinction between "read first" and "reference later" materials
- Conformance (likely primary user goal) is "tucked away"

**User types not explicitly addressed**:
- Credential providers seeking conformance
- Assessors evaluating conformance
- Relying parties evaluating credential providers
- Policy makers understanding framework
- Technical implementers building systems

**Phase 2 opportunity**: Create explicit user journeys and entry points.

### Pattern 5: Checklists and Practical Tools Separate from Guidance

**Evidence**:
- 6 Word document checklists for conformance evidence
- Excel workbook for risk assessment
- Tools referenced in guidance but not integrated
- Tom's annotation: "checklists should be the most important and predominant way into all of this material"

**Impact**: Users must download separate files, switch contexts.

**Phase 2 opportunity**: Integrate checklists as markdown, make them navigation/organization framework.

## Opportunities for Improvement

### Immediate Wins (Low Complexity, High Impact)

1. **Eliminate detail expanders**: Remove all `+++` ... `+++` syntax, surface content with heading hierarchy
2. **Systematic active voice conversion**: Rewrite guidance materials (not core standards) in active voice
3. **Surface threshold information**: Move "before you start" information to document beginnings
4. **Consistent terminology usage**: "Federation Assurance Standard" not "federation standard"
5. **Improve cross-references**: Use specific identifiers not "this control"

### Medium Complexity Improvements

6. **Integrate "About Identification Management" concepts**: Weave foundational explanations into main materials
7. **Make conformance prominent**: Structure materials to support conformance workflow
8. **Clarify DISTF relationship**: Explicitly position Federation Standard as linkage to legal framework
9. **Convert checklists to markdown**: Integrate assessment tools into main materials
10. **Add examples and scenarios**: Particularly relatable ones like driver's licence
11. **Improve heading hierarchy**: Ensure scannability, avoid excessive depth (>5 levels)

### Complex Restructuring Needs

12. **Integrate standards and guidance**: Present requirements and implementation guidance together with clear visual distinction
13. **Flatten hierarchy where excessive**: Review depth 6-7 sections for consolidation opportunities
14. **Create user journey pathways**: Explicit entry points for credential providers, assessors, relying parties
15. **Clarify Privacy Act relationship**: Resolve tension between "overlap but distinct" and "application of IPP1" messaging
16. **Establish terminology authority**: Move from dictionary definitions to DIA-asserted definitions with examples
17. **Organize by conformance process**: Consider making conformance workflow primary organizational framework

## Decisions Made

### Annotation Sampling Approach

**Decision**: Examined 6 of 23 annotation files in detail (26% sample) plus comprehensive review of TomNotesManualReview.md.

**Rationale**:
- Conforming (24 annotations) and Assessing Risk (27 annotations) are most heavily annotated - capture majority of detailed feedback
- Core standards (Authentication, Binding, Federation, Information) have fewer annotations due to non-modification constraint
- Guidance documents show more variation in feedback
- Sample covers standards, implementation guides, guidance, and overview materials
- Manual review notes provide synthesis across all documents

**Confidence**: High - patterns are consistent across sampled documents and align with manual review synthesis.

### Semantic Neighbor Query Strategy

**Decision**: Targeted queries on representative nodes from different document types rather than exhaustive coverage.

**Rationale**:
- Stage 1 already conducted broad semantic searches on topics
- Stage 2 focus is on cross-document patterns, not comprehensive content mapping
- Queries revealed important patterns ("About ID" as hub, conformance/LoA isolation)
- Diminishing returns from additional queries given clear patterns

**Confidence**: Sufficient for pattern identification, though Phase 2 content retrieval will require more comprehensive queries.

### Cypher Query Prioritization

**Decision**: Focused on connectivity, virtual nodes, content types, and hierarchy depth rather than other possible analyses.

**Rationale**:
- These queries directly address Tom's concerns (fragmentation, structure, complexity)
- Virtual nodes indicate hierarchy gaps relevant to Phase 2 structure design
- Connectivity analysis reveals integration opportunities
- Other analyses (e.g., tag distribution, embedding coverage) less relevant to restructuring goals

## Questions and Uncertainties

### Question 1: Electronic Identity Verification Act Integration

**Context**: Tom asks in manual review: "Should we add the electronic identity verification act and regulations?"

**Observations**:
- EIVA only referenced in Federation Standard (per Tom's notes)
- EIVA doesn't acknowledge existence of identification standards
- Unclear if identification standards conformance satisfies EIVA obligations

**Resolution**: Stage 3 (Other Materials Evaluation) will specifically address EIVA relevance.

### Question 2: Extent of Standards-Guidance Integration

**Options identified**:
1. Same document with visual styling to distinguish standard from guidance
2. Interleaved content (guidance after each requirement)
3. Parallel presentation (columns or side-by-side)
4. Separate but tightly linked (improved navigation)

**Constraint**: Core standards text cannot be modified, only structure/presentation.

**Resolution**: Stage 6 (Final Recommendations) will propose specific integration approach after evaluating against AI guidance principles in Stage 5.

### Question 3: Conformance as Organizing Framework

**Tom's suggestion**: Conformance should be central, possibly organizational framework.

**Uncertainty**:
- Would organizing entire resource around conformance workflow overwhelm users who just want to understand concepts?
- How to balance conformance-focused organization with other user needs?
- Should there be multiple pathways (conformance-focused, concept-focused, reference-focused)?

**Resolution**: Stage 4 (Thematic Synthesis) and Stage 5 (AI Guidance Evaluation) will inform this decision.

### Question 4: Single Document vs Multi-Document Structure

**Tom's observation**: "It would be better in some ways to have a single document that is well structured"

**Trade-offs**:
- Single document: Better control of user journey, easier to maintain consistency, harder to navigate
- Multi-document: Modular, easier to update independently, harder to ensure coherent journey

**Considerations**:
- markdown processing capabilities (how large can a single file be?)
- DocRef system requirements (does it expect separate documents?)
- User navigation patterns (search vs browse)

**Resolution**: Stage 6 (Structure Proposal) will address this based on technical constraints and user needs.

### Question 5: Terminology Approach - Authority vs. Alignment

**Tom's critique**: Dictionary definitions insufficient for specialized area.

**Options**:
1. DIA asserts definitional authority
2. Align with DISTF legal definitions
3. Align with international standards (ISO, NIST)
4. Use plain language with examples rather than formal definitions
5. Hybrid approach - legal definition + plain language explanation + examples

**Uncertainty**: What approach best serves users while maintaining necessary precision?

**Resolution**: Stage 4 will synthesize terminology needs, Stage 5 will evaluate against content design principles.

### Question 6: Audience Segmentation Strategy

**Identified audiences**:
- Credential providers seeking conformance (primary?)
- Assessors performing conformance assessment
- Relying parties evaluating credential providers
- Policy makers understanding framework
- Technical implementers building systems

**Uncertainty**:
- Should restructured materials have explicit audience pathways?
- Can single structure serve all audiences or need multiple views?
- Which audience should be prioritized if trade-offs needed?

**Resolution**: Stage 4 (Thematic Synthesis) will develop audience analysis.

### Question 7: Biometrics Guidance Integration

**Context**: Multiple annotations note "importance of biometrics", Privacy Biometrics Code in OtherMaterialsToEvaluate.

**Uncertainty**:
- Is biometrics guidance essential for identification management?
- Should Privacy Commissioner's Biometrics Code be integrated or referenced?
- Is current authentication standard biometrics coverage sufficient?

**Resolution**: Stage 3 (Other Materials Evaluation) will assess biometrics materials.

### Question 8: Level of Assurance Prominence

**Current state**: LoA is a foundational concept but sits in a separate 90-node document with no semantic neighbors above the 0.75 threshold.

**Uncertainty**:
- Should LoA be taught earlier and more prominently?
- Should controls be organized by LoA level rather than current organization?
- Should LoA expression be scaffolded through examples before abstract framework?

**Resolution**: Stage 4 will assess LoA role in information architecture, Stage 6 will propose specific treatment.

### Question 9: Implementation Tools Strategy

**Tom's suggestions**: Web forms, rules as code, interactive assessment tools.

**Uncertainty**:
- Are interactive tools in scope for this project?
- Or should restructured materials simply better support tool development?
- What role do Excel workbooks play (familiar to orgs, offline, but not automated)?

**Resolution**: Likely out of scope for Phase 1-2, but restructured content should facilitate tool development.

### Question 10: Historical Material Handling

**Documents**:
- Resource Material Evidence of Identity Standard (2021, 32 nodes)
- Superseded Standards (2021, 20 nodes)

**Uncertainty**:
- Include in restructured resource or archive separately?
- Is there value in showing evolution of standards?
- Do some organizations still reference superseded versions?

**Resolution**: Stage 6 will address based on stakeholder needs.

## Next Steps for Stage 3

Based on Stage 2 cross-document pattern analysis, Stage 3 (Evaluation of Other Materials Relevance) should focus on:

### 1. Electronic Identity Verification Act and Regulations Priority

**Rationale**:
- Tom explicitly asks about EIVA integration
- EIVA is referenced in Federation Standard
- Federation Standard is primary DISTF linkage point
- Unclear if standards conformance satisfies EIVA obligations

**Questions to answer**:
- Can users meaningfully engage in identification management practice without knowing EIVA requirements?
- Does conformance with identification standards satisfy EIVA obligations?
- Should EIVA be integrated into restructured materials or referenced externally?
- How does EIVA relate to DISTF framework?

### 2. Privacy Biometrics Code Evaluation

**Rationale**:
- Multiple annotations note biometrics importance
- Authentication Standard covers biometrics but may be insufficient
- Privacy implications are significant

**Questions to answer**:
- Does Privacy Biometrics Code fill gaps in current standards?
- Should biometrics guidance be expanded in restructured materials?
- Is Privacy Commissioner's code essential reading for identification practitioners?

### 3. Content Design Guidance Relevance

**Rationale**:
- 40+ documents on content design best practices available
- Stage 2 identified systematic voice, structure, and navigation issues
- Phase 2 will require content design decisions

**Questions to answer**:
- Which content design principles are most relevant to technical standards documentation?
- Can content design guidance inform standards-guidance integration approach?
- Should plain language principles be applied to guidance materials?

### 4. 10 Minimum Cybersecurity Standards Assessment

**Rationale**:
- Tom notes standards link to security standards "without specificity"
- Security and identification overlap significantly
- NCSC guidance is authoritative for NZ public service

**Questions to answer**:
- Are cybersecurity standards essential for identification management conformance?
- Should specific NCSC standards be referenced or integrated?
- Is current "link without specificity" approach insufficient?

### 5. Consideration of Integration vs. Reference Trade-offs

**For each material set, evaluate**:
- **Integrate directly**: Content becomes part of restructured standards
- **Reference as dependency**: Acknowledged as prerequisite reading
- **Extract key content**: Incorporate essential elements, reference rest
- **Leave external**: Acknowledge but don't integrate

**Criteria**:
- Can user demonstrate conformance without this material?
- Does this material fill gaps in current standards?
- Would integration improve or complicate restructured materials?
- Is this material stable or frequently updated (integration feasibility)?

### 6. Alignment with Four Core Standards Constraint

**Critical consideration**:
- Some external materials may have requirements that would necessitate changes to core standards text
- But core standards text cannot be modified
- Identify any conflicts or tensions

### 7. Output for Stage 3

Create `/WorkingFolder/03_other_materials_evaluation.md` with:
- Assessment of each material set's relevance
- Integration recommendations (integrate, reference, extract, or leave external)
- Gaps identified in current standards that materials could fill
- Conflicts or tensions with existing standards
- Priority order for materials that should be integrated
- Implications for Phase 2 content retrieval and structure

This will set up Stage 4 (Thematic Synthesis) to develop comprehensive improvement recommendations based on both internal patterns (Stage 2) and external materials assessment (Stage 3).