# Project Summary: NZ API Standard Creation

## Technology Used

### GraphRAG (Graph Retrieval-Augmented Generation)

GraphRAG combines two components:

1. **Knowledge Graph**: A graph database (Neo4j in this project) that stores documents as connected nodes with relationships, properties, and metadata tags
2. **Retrieval-Augmented Generation**: An AI querying technique that retrieves relevant information from the graph before generating content

In this project, the NZ API Guidelines were stored as 5,612 interconnected nodes in a Neo4j graph database. Each node contained document content, metadata tags (like "must-or-shall", "definition", "good-practice"), and DocRef citation links. The graph structure preserved the hierarchical relationships between sections, subsections, and paragraphs.
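
To make the node structure concrete, here is a minimal in-memory sketch in Python. It is not the project's Neo4j schema, which is not reproduced in this document: the `DocNode` class, its field names, the helper functions, and the sample content and DocRef strings are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocNode:
    """Hypothetical stand-in for one graph node (field names are assumptions)."""
    node_id: str
    content: str
    tags: set[str] = field(default_factory=set)  # e.g. "must-or-shall", "definition"
    docref: str = ""                              # citation link back to the source
    parent_id: Optional[str] = None               # section -> subsection -> paragraph


def children_of(nodes: list[DocNode], parent_id: str) -> list[DocNode]:
    """Follow the hierarchy one level down from a given node."""
    return [n for n in nodes if n.parent_id == parent_id]


def with_tag(nodes: list[DocNode], tag: str) -> list[DocNode]:
    """Select nodes carrying a given metadata tag."""
    return [n for n in nodes if tag in n.tags]


# Invented sample data, only to exercise the helpers.
nodes = [
    DocNode("sec-1", "Security"),
    DocNode("sec-1.1", "APIs must use TLS.", {"must-or-shall"},
            "docref:example/sec-1.1", "sec-1"),
    DocNode("sec-1.2", "Consider rate limits.", {"good-practice"},
            "docref:example/sec-1.2", "sec-1"),
]

print([n.node_id for n in children_of(nodes, "sec-1")])
print([n.docref for n in with_tag(nodes, "must-or-shall")])
```

In a real graph database the `parent_id` field would typically be an explicit relationship between nodes rather than a property; it is flattened here only to keep the sketch self-contained.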

GraphRAG enabled semantic search across this graph: questions asked in natural language retrieved the relevant nodes with their citations intact. The system used vector embeddings (Nomic AI's `nomic-embed-text-v1.5`) to find content by meaning rather than by keywords alone.
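
The idea of finding content "by meaning" can be sketched as ranking nodes by cosine similarity between a query vector and each node's embedding. The three-dimensional vectors and node names below are invented toy values; real embeddings have hundreds of dimensions, and this sketch makes no claim about how the project's server or Neo4j computes similarity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k nodes whose embeddings are closest to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), node_id) for node_id, vec in index.items()),
        reverse=True,
    )
    return [node_id for _, node_id in scored[:k]]


# Toy embeddings (assumed values, for illustration only).
toy_index = {
    "oauth-grants": [0.9, 0.1, 0.0],
    "tls-config": [0.7, 0.3, 0.1],
    "versioning": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of a security-related question

print(top_k(query, toy_index))  # → ['oauth-grants', 'tls-config']
```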

### MCP Server (Model Context Protocol)

MCP (Model Context Protocol) is a standardized protocol that allows AI systems to connect to external data sources through dedicated servers. MCP servers act as intermediaries between the AI and specialized databases or APIs.

**How it works:**
1. The AI sends a structured query to the MCP server
2. The server translates the query into database operations (Cypher queries, semantic searches)
3. The server retrieves results from the database
4. Results are formatted and returned to the AI with proper citations
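
The four steps above can be sketched as a toy request/response round trip. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method. Everything else here is an assumption for illustration: the tool name `semantic_search`, its `query` and `tag` arguments, the sample rows, and the keyword match that stands in for the real Cypher and semantic search.

```python
def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Step 1: the structured query sent by the AI client."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Invented rows standing in for graph nodes.
FAKE_ROWS = [
    {"text": "APIs must use TLS for all traffic.", "tag": "must-or-shall",
     "docref": "docref:example/tls"},
    {"text": "Consider caching responses.", "tag": "good-practice",
     "docref": "docref:example/cache"},
]


def run_database_query(query: str, tag: str) -> list[dict]:
    """Steps 2-3: stand-in for the database operation (plain keyword match)."""
    return [r for r in FAKE_ROWS
            if r["tag"] == tag and query.lower() in r["text"].lower()]


def handle_request(request: dict) -> dict:
    """Step 4: format results, citations attached, as a tool result."""
    params = request.get("params", {})
    if request.get("method") != "tools/call" or params.get("name") != "semantic_search":
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Unknown method or tool"}}
    rows = run_database_query(**params["arguments"])
    text = "\n".join(f'{r["text"]} [{r["docref"]}]' for r in rows)
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": [{"type": "text", "text": text}]}}


response = handle_request(
    build_tool_call(1, "semantic_search", {"query": "tls", "tag": "must-or-shall"})
)
print(response["result"]["content"][0]["text"])
```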

**This project's setup:**
- **Server name**: `docref-graphrag`
- **Connection**: Remote MCP server at `https://api-standard-re-draft.vercel.app/mcp`
- **Access method**: Via `npx mcp-remote` command
- **Database backend**: Neo4j graph database with 5,612 document nodes
- **Search capabilities**: 6 semantic search functions plus direct Cypher query access

The MCP server provided pre-approved tools for schema inspection, semantic search (filtered, balanced, cross-document), and context-aware searches without requiring user confirmation for each query.
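
The project's actual `.mcp.json` is not reproduced in this document. As a hedged sketch, the snippet below generates a configuration in the layout commonly used for project-level MCP configuration with `mcp-remote`, using only the server name and URL listed above. The `mcpServers` / `command` / `args` key layout is an assumption about the file, not a copy of it.

```python
import json

# Server name and URL come from the setup list above; the key layout is assumed.
mcp_config = {
    "mcpServers": {
        "docref-graphrag": {
            "command": "npx",
            "args": ["mcp-remote", "https://api-standard-re-draft.vercel.app/mcp"],
        }
    }
}

# Render the JSON that would be saved as .mcp.json in the project root.
print(json.dumps(mcp_config, indent=2))
```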

## How the Work Was Performed

### The Problem

The source material consisted of 5,612 document nodes across three parts of the NZ API Guidelines (Parts A, B, and C). The full content (~176,000 tokens) would technically fit within the 200,000-token context window, but it would occupy about 88% of it, leaving roughly 24,000 tokens for instructions, tool output, and drafting. A file-based approach was chosen instead so that research could be organized systematically and only the relevant material loaded at each step.

### The Solution

A file-based context management strategy combined with GraphRAG queries:

1. **External Memory**: Use the file system as external memory by saving search results to organized markdown files
2. **Systematic Research**: Execute all searches upfront and save results locally before drafting
3. **Selective Loading**: Read only relevant research files during drafting, never loading all research at once
4. **Citation Preservation**: Maintain DocRef citations throughout the process for traceability
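
The strategy above can be sketched as two small helpers: one appends each search's results, with citations, to a topic file, and one loads only the files needed for the section being drafted. The function names, the `## Search N: query` heading layout, and the sample results are assumptions; the project's real research files may be laid out differently.

```python
import tempfile
from pathlib import Path


def save_search(research_dir: Path, topic_file: str, number: int,
                query: str, results: list[tuple[str, str]]) -> Path:
    """External memory: append one search and its cited results to a topic file."""
    research_dir.mkdir(parents=True, exist_ok=True)
    path = research_dir / topic_file
    lines = [f"## Search {number}: {query}", ""]
    lines += [f"- {text} [{docref}]" for text, docref in results]
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n\n")
    return path


def load_for_section(research_dir: Path, topic_files: list[str]) -> str:
    """Selective loading: read only the research files a section needs."""
    return "\n".join(
        (research_dir / name).read_text(encoding="utf-8") for name in topic_files
    )


# Demonstration in a throwaway directory with invented results.
with tempfile.TemporaryDirectory() as tmp:
    research = Path(tmp) / "research"
    save_search(research, "03_security.md", 1, "OAuth 2.0 grant types",
                [("Use the authorization code grant.", "docref:example/oauth")])
    save_search(research, "04_deployment.md", 1, "API versioning",
                [("Version every published API.", "docref:example/versioning")])
    security_context = load_for_section(research, ["03_security.md"])

print(security_context)
```

Because every saved line keeps its citation in brackets, anything drafted from a loaded file can still be traced back to its source node.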

### The Execution Process

**Phase 1: Planning and Configuration (13:05:50)**
- Created execution plan documenting the context management strategy
- Configured MCP server connection to docref-graphrag database
- Documented LLM instructions for the project
- Files created: `.mcp.json`, `EXECUTION_PLAN.md`, `LLM_INSTRUCTIONS_NZ_API_STANDARD.md`

**Phase 2: Research (13:05:50 - 13:41:21, 36 minutes)**
- Executed 47 semantic searches across 8 topical areas
- Organized results into research files by topic: design, development, security, deployment, operations, definitions, patterns, good practices
- Captured approximately 1,073 search results with full content and DocRef citations
- Files created: `CLAUDE.md`, 8 research files in `research/` directory

**Phase 3: Drafting (13:41:21 - 14:16:22, 35 minutes)**
- Read research files selectively during drafting
- Generated main standard document organized by API lifecycle phases
- Applied requirement elevation criteria (SHOULD → MUST where justified)
- Created project completion summary
- Output: Version 1.0 (13,805 words, 232 citations)
- Files created: `nz-api-standard.md`, `Completion Report and Deliverables.md`

**Phase 4: Enhancement (14:16:22 - 14:32:15, 16 minutes)**
- Executed additional search for external reference links (`links-out-external` tag)
- Expanded Appendix E with 50+ complete URLs
- Updated completion summary
- Output: Version 1.1 (14,657 words, 280 citations)

**Phase 5: Formatting (14:32:15 - 15:04:32, 32 minutes)**
- Created Strucdown format variant
- Converted bullet styles and adjusted citation formatting
- Files created/modified: `strucdown-nz-api-standard.md`
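
The phase durations can be recomputed from the reported timestamps. This is a small verification sketch using only the times listed above; the helper names are illustrative.

```python
from datetime import datetime

# Phase boundaries as reported in the execution process above.
PHASES = {
    "Research": ("13:05:50", "13:41:21"),
    "Drafting": ("13:41:21", "14:16:22"),
    "Enhancement": ("14:16:22", "14:32:15"),
    "Formatting": ("14:32:15", "15:04:32"),
}


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def as_min_sec(seconds: int) -> str:
    """Format a duration as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs:02d}s"


durations = {name: elapsed_seconds(*span) for name, span in PHASES.items()}
total = elapsed_seconds("13:05:50", "15:04:32")

for name, secs in durations.items():
    print(f"{name}: {as_min_sec(secs)}")
print(f"Total: {as_min_sec(total)}")
```

This prints 35m 31s, 35m 01s, 15m 53s, and 32m 17s for the four phases, and 118m 42s in total, which rounds to the 1 hour 59 minutes reported elsewhere in this document.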

### Tools and Technologies

- **MCP Server**: docref-graphrag (remote server via npx)
- **Database**: Neo4j graph database with vector embeddings
- **Embeddings**: Nomic AI embeddings (nomic-embed-text-v1.5)
- **Search Methods**: Semantic search, filtered search, cross-document search, Cypher queries
- **AI System**: Claude Code with 200,000 token context window
- **Context Management**: File-based external memory system
- **Version Control**: Git (6 commits tracking progression)

## Project Files

### Configuration

**`.mcp.json`**
- Purpose: MCP server configuration
- Content: Defines connection to docref-graphrag remote server
- Usage: Enables AI to query the NZ API Guidelines database through standardized protocol

### Planning and Instructions

**`LLM_INSTRUCTIONS_NZ_API_STANDARD.md`**
- Purpose: Original project requirements and specifications
- Content: Search strategy, output format requirements, content organization instructions, requirement elevation criteria
- Created: Initial commit (13:05:50)

**`EXECUTION_PLAN.md`**
- Purpose: Context management strategy documentation
- Content: Three-phase approach, file organization structure, research file specifications, token budget management plan
- Created: Initial commit (13:05:50)
- Key contribution: Defined the external memory strategy used to keep context usage within the 200,000-token window

**`CLAUDE.md`**
- Purpose: Project-specific guidance for Claude Code
- Content: Database structure overview, MCP server configuration, essential content tags, project workflow phases, search strategy best practices, quality checklist
- Created: Research phase commit (13:41:21)
- Function: Provides context for AI working in this repository

### Research Files

**`research/01_design.md`**
- Purpose: Design phase research results
- Content: 5 searches covering design principles, co-design, consumer focus, business process analysis (142 results)
- Topics: Design-driven development, API usability, dogfooding, granularity

**`research/02_development.md`**
- Purpose: Development phase research results
- Content: 7 searches covering REST APIs, HTTP methods, headers, error handling, testing, data validation (153 results)
- Topics: HTTP verbs, Content-Type headers, error response patterns, API testing requirements

**`research/03_security.md`**
- Purpose: Security phase research results
- Content: 7 searches covering OAuth 2.0, authentication, TLS, threat protection (158 results)
- Topics: OAuth grant types, OpenID Connect, API keys, security frameworks, 8 authentication patterns

**`research/04_deployment.md`**
- Purpose: Deployment phase research results
- Content: 5 searches covering versioning, API gateways, documentation, releases (124 results)
- Topics: Header-based vs URL-based versioning, gateway capabilities, catalogue requirements

**`research/05_operations.md`**
- Purpose: Operations phase research results
- Content: 7 searches covering monitoring, performance, lifecycle management, governance (190 results)
- Topics: Analytics, SLAs, API Manager functions, deprecation processes

**`research/06_definitions.md`**
- Purpose: Terminology definitions
- Content: 3 searches retrieving all definition-tagged content (107 results)
- Topics: 40 essential API terms with standardized definitions

**`research/07_patterns.md`**
- Purpose: Authentication patterns
- Content: 3 searches for authentication and authorization patterns (82 results)
- Topics: 8 documented patterns from internal use to CIBA flows

**`research/08_good_practices.md`**
- Purpose: Best practice guidance
- Content: 6 searches for good-practice tagged content by topic (117 results)
- Topics: Elevation candidates (SHOULD → MUST), practices across lifecycle phases

### Data Quality Note: Search Count Discrepancy

**Observed discrepancy**: The individual research file headers report a total of 43 searches (5+7+7+5+7+3+3+6), but verification via counting actual `## Search` markers in the files shows 47 searches total (5+7+7+5+8+4+4+7).

**Explanation**: Four of the eight research files contain one more `## Search` section than their header claims, which accounts for the difference of four. Specifically:
- `05_operations.md`: Header says 7, contains 8 `## Search` markers
- `06_definitions.md`: Header says 3, contains 4 `## Search` markers
- `07_patterns.md`: Header says 3, contains 4 `## Search` markers
- `08_good_practices.md`: Header says 6, contains 7 `## Search` markers

The research files retain their original headers as historical artifacts. Summary documents (`Methodology and Technology Overview.md`, `Timeline and Metrics Report.md`, `Completion Report and Deliverables.md`) now correctly report 47 total searches based on actual file content verification.
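
The verification described above can be sketched as follows. The counting function shows one plausible way to count `## Search` markers in a file's text; the two dictionaries simply restate the header claims and marker counts given in this note, since the research files themselves are not reproduced here.

```python
import re


def count_search_markers(markdown: str) -> int:
    """Count headings that start with '## Search' at the beginning of a line."""
    return len(re.findall(r"^## Search\b", markdown, flags=re.MULTILINE))


# Values restated from this document.
HEADER_CLAIMS = {
    "01_design.md": 5, "02_development.md": 7, "03_security.md": 7,
    "04_deployment.md": 5, "05_operations.md": 7, "06_definitions.md": 3,
    "07_patterns.md": 3, "08_good_practices.md": 6,
}
MARKER_COUNTS = {
    "01_design.md": 5, "02_development.md": 7, "03_security.md": 7,
    "04_deployment.md": 5, "05_operations.md": 8, "06_definitions.md": 4,
    "07_patterns.md": 4, "08_good_practices.md": 7,
}

# Files whose header claim differs from the counted markers.
mismatches = {
    name: (HEADER_CLAIMS[name], MARKER_COUNTS[name])
    for name in HEADER_CLAIMS
    if HEADER_CLAIMS[name] != MARKER_COUNTS[name]
}

print(sum(HEADER_CLAIMS.values()), sum(MARKER_COUNTS.values()))  # → 43 47
print(sorted(mismatches))
```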

### Primary Output Documents

**`nz-api-standard.md`**
- Purpose: Main deliverable - The New Zealand API Standard
- Content: 14,657-word technical standard organized by API lifecycle phases
- Structure: 7 main sections (including Design, Development, Security, Deployment, and Operations) + 5 appendices
- Citations: 280 DocRef citations providing traceability to source material
- Versions: v1.0 (13,805 words, 232 citations), v1.1 (+852 words, +48 citations)
- Created: Drafting phase (14:16:22), enhanced (14:32:15), formatted (15:04:32)

**`strucdown-nz-api-standard.md`**
- Purpose: Strucdown format variant of the standard
- Content: Same content as main standard, formatted for Strucdown conversion
- Format differences: Bullet style adjustments, inline DocRef citation formatting
- Created: Formatting phase (14:49:24)

### Summary and Reporting Documents

**`Completion Report and Deliverables.md`**
- Purpose: Detailed project completion documentation
- Content: Executive summary, objectives, methodology, deliverables breakdown, key achievements, challenges overcome, lessons learned, version 1.1 enhancements
- Statistics: Document structure, quality metrics, execution metrics, content metrics
- Created: Drafting phase (14:16:22), updated during enhancement (14:32:15)
- Length: 969 lines

**`Timeline and Metrics Report.md`**
- Purpose: Factual timeline of work performed
- Content: Commit-by-commit timeline with timestamps, elapsed times, activities, work period analysis, output metrics
- Format: Tables with quantitative data, no promotional language
- Created: Post-project documentation
- Total duration tracked: 1 hour 59 minutes (13:05:50 - 15:04:32)

**`Methodology and Technology Overview.md`** (this document)
- Purpose: Methodology documentation and file context reference
- Content: GraphRAG and MCP Server explanations, execution process description, file-by-file documentation
- Created: Post-project documentation

## Key Metrics

- **Source material**: 5,612 document nodes across 3 parts
- **Searches executed**: 47 semantic searches
- **Research results captured**: ~1,073 findings
- **Research files created**: 8 organized markdown files
- **Output word count**: 14,657 words
- **DocRef citations**: 280
- **Total files created**: 11+ markdown files
- **Total duration**: 1 hour 59 minutes
- **Peak context usage**: 152,929 / 200,000 tokens (76%) (claimed, unverified)
- **Git commits**: 6 commits tracking progression
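
Several of these figures can be cross-checked against numbers stated elsewhere in this document. The sketch below only restates those numbers and redoes the arithmetic; it does not verify the underlying files.

```python
# Result counts per research file, as listed under "Research Files".
RESULTS_PER_FILE = {
    "01_design.md": 142, "02_development.md": 153, "03_security.md": 158,
    "04_deployment.md": 124, "05_operations.md": 190, "06_definitions.md": 107,
    "07_patterns.md": 82, "08_good_practices.md": 117,
}

total_results = sum(RESULTS_PER_FILE.values())

# Version 1.0 -> 1.1 deltas.
word_delta = 14_657 - 13_805
citation_delta = 280 - 232

# Peak context usage as a share of the 200,000-token window (claimed figure).
peak_pct = 152_929 / 200_000 * 100

print(total_results)        # → 1073
print(word_delta)           # → 852
print(citation_delta)       # → 48
print(f"{peak_pct:.1f}%")   # → 76.5%
```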

## Context Management Outcome

Storing research results in files and loading them selectively during drafting kept the source material organized and every DocRef citation traceable. Reported peak context usage was 152,929 of 200,000 tokens (76%, unverified), and the standard was produced with no reported loss of research data.