About DocRef MCP Servers
This page explains how DocRef datasets are made available to AI systems through MCP servers, enabling a technique called GraphRAG (Graph Retrieval-Augmented Generation).
What is an MCP Server?
MCP (Model Context Protocol) is a standardized protocol developed by Anthropic that allows AI systems to connect to external data sources through dedicated servers. MCP servers act as intermediaries between the AI and specialized databases or APIs.
How it works:
- The AI sends a structured query to the MCP server
- The server translates the query into database operations (searches, traversals)
- The server retrieves results from the database
- Results are formatted and returned to the AI with proper citations
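The first step above can be sketched as a JSON-RPC 2.0 message. MCP defines a `tools/call` method whose parameters carry a tool name and an arguments object. The tool name `semantic_search` is one of the tools listed later on this page; the argument names `query` and `limit` are assumptions made for illustration, not the server's documented schema.

```python
import json
from itertools import count

# Simple incrementing request ids, as JSON-RPC expects one id per request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" and "limit" are assumed argument names (hypothetical).
request = build_tool_call(
    "semantic_search",
    {"query": "requirements for biometric authentication", "limit": 5},
)
print(json.dumps(request, indent=2))
```

A real client would send this message over the transport the server exposes and read the matching response by `id`; that part is omitted here.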
The key benefit is that AI systems can access far more information than fits in their context window, retrieving only what's relevant to each specific query.
What is GraphRAG?
GraphRAG combines two components:
- Knowledge Graph: A database structure (typically Neo4j) that stores documents as connected nodes with relationships, properties, and metadata tags
- Retrieval-Augmented Generation (RAG): An AI technique that retrieves relevant information from external sources before generating content
In DocRef's implementation:
- Documents are broken into individual elements (paragraphs, sections, definitions, tables)
- Each element becomes a node in a graph database
- Nodes are connected by relationships: CHILD_OF (hierarchical structure) and SEMANTIC_SIMILARITY (meaning-based connections)
- Vector embeddings enable semantic search - finding content by meaning, not just keywords
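As a rough illustration of the structure described above, the sketch below models a few nodes in memory, with a parent pointer standing in for CHILD_OF and cosine similarity standing in for semantic search. The field names, the three-dimensional toy vectors, and the sample text are all invented for the example; DocRef's actual storage is the graph database, not Python objects.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentNode:
    node_id: str
    text: str
    embedding: List[float]
    parent_id: Optional[str] = None  # stands in for a CHILD_OF relationship


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(nodes, query_embedding, top_k=2):
    """Return the top_k nodes whose embeddings are closest to the query."""
    ranked = sorted(
        nodes, key=lambda n: cosine(n.embedding, query_embedding), reverse=True
    )
    return ranked[:top_k]


# Toy data: one section, one paragraph under it, one unrelated paragraph.
nodes = [
    DocumentNode("s1", "Toy section about logging in", [0.9, 0.1, 0.0]),
    DocumentNode("s1.p1", "Toy paragraph about passwords", [0.8, 0.2, 0.1], "s1"),
    DocumentNode("s2.p1", "Toy paragraph about record keeping", [0.0, 0.1, 0.9], "s2"),
]

results = semantic_search(nodes, query_embedding=[1.0, 0.0, 0.0])
for node in results:
    print(node.node_id, "child of", node.parent_id)
```

The query vector here is written by hand; in practice it would come from the same embedding model used for the nodes.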
MCP Servers Used in This Project
identification-management-standards
This MCP server provides access to New Zealand's identification management standards.
Database Statistics:
- 9,374 DocumentNode entities across 30 documents
- 7,454 nodes with embeddings (79.5% coverage using 768-dimensional vectors)
- Relationships:
  - CHILD_OF: 10,208 hierarchical parent-child links
  - SEMANTIC_SIMILARITY: 89,735 precomputed semantic similarity connections (K=10 neighbors per node)
Content includes:
- 4 core identification standards (Federation, Information, Authentication, Binding)
- 4 implementation guides (one per standard)
- Supporting materials: terminology, risk assessment, conformance guidance, counter-fraud techniques
- Related legislation: Privacy Act 2020, Digital Identity Services Trust Framework Act
Available tools:
| Tool | Purpose |
|---|---|
| semantic_search | Natural language queries for content discovery |
| find_semantic_neighbors | Fast traversal of precomputed semantic similarity links |
| search_by_document | Retrieve all nodes from a specific document |
| get_hierarchical_context | Navigate parent/child/sibling relationships |
| run_cypher_query | Custom Neo4j queries for advanced analysis |
| get_document_stats | Metadata and collection statistics |
| get_schema | Node properties and available relationships |
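To give a feel for what a custom query through `run_cypher_query` might look like, here is a minimal sketch that wraps a Cypher statement in an MCP `tools/call` request. The `DocumentNode` label and `CHILD_OF` relationship come from this page. The `id` and `text` properties, the child-to-parent edge direction, and the `query`/`params` argument names are assumptions; `get_schema` would be the place to check the real ones.

```python
import json

# Walk CHILD_OF links upward from one node to all of its ancestors.
# Property names (id, text) and edge direction are hypothetical.
ANCESTORS_QUERY = """
MATCH (n:DocumentNode {id: $node_id})-[:CHILD_OF*1..]->(ancestor:DocumentNode)
RETURN ancestor.id AS id, ancestor.text AS text
""".strip()


def cypher_tool_call(node_id: str, request_id: int = 1) -> dict:
    """Wrap the Cypher query in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "run_cypher_query",
            # Assumed argument names: "query" for the statement,
            # "params" for its bound parameters.
            "arguments": {"query": ANCESTORS_QUERY, "params": {"node_id": node_id}},
        },
    }


call = cypher_tool_call("example-node-id")
print(json.dumps(call, indent=2))
```

Passing `node_id` as a bound parameter rather than formatting it into the string avoids quoting problems and injection.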
How it was used: This server was the primary source for content retrieval when generating the consolidated identification management standard. AI agents queried the server to retrieve relevant content with full citations, which were then synthesized into the output document.
generative-ai-guidance-gcdo
This MCP server provides access to the Government Chief Digital Officer's (GCDO) guidance for responsible AI use in the public service.
Database Statistics:
- 1,393 DocumentNode entities across 23 documents
- 1,027 nodes with embeddings (73.7% coverage using 768-dimensional vectors)
- Relationships:
  - CHILD_OF: 1,212 hierarchical links
  - SEMANTIC_SIMILARITY: 10,215 connections (K=10, threshold=0.7)
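The K and threshold figures above suggest a nearest-neighbor precomputation. The sketch below shows one plausible way to derive such links: for each node, keep at most K other nodes whose cosine similarity meets the threshold. Whether DocRef applies the threshold before or after the top-K cut, and whether links are stored in one or both directions, is not stated here, so treat those choices as assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_edges(embeddings, k=10, threshold=0.7):
    """Return (source, target, score) links: up to k neighbors per node,
    each with similarity >= threshold, strongest first."""
    edges = []
    for src, src_vec in embeddings.items():
        scored = [
            (cosine(src_vec, vec), dst)
            for dst, vec in embeddings.items()
            if dst != src
        ]
        # Drop weak matches, then keep the k strongest.
        scored = sorted((s, d) for s, d in scored if s >= threshold)[::-1]
        edges.extend((src, dst, round(score, 3)) for score, dst in scored[:k])
    return edges


# Toy two-dimensional embeddings; real ones on this page are 768-dimensional.
toy = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
edges = similarity_edges(toy, k=1, threshold=0.7)
print(edges)
```

This brute-force version compares every pair of nodes, which is fine for a toy set; a collection of thousands of nodes would normally use a vector index instead.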
Content includes:
- Public Service AI Framework (foundational principles)
- Responsible AI Guidance for GenAI
- Topic-specific guidance: governance, security, privacy, transparency, bias/discrimination, accessibility
- Implementation guidance: procurement, skills/capabilities
- Supporting materials: glossary of AI terms, cloud jurisdictional risk guidance
How it was used: This server was used for evaluation, not generation. It allowed validation of AI-generated content against government AI principles.
How MCP Servers Enable AI-Assisted Drafting
Combining DocRef structured data with MCP servers supports the following workflow, in which citations are carried from the source documents through to human review:
Source Documents
↓
[DocRef]
Structure + Citations + Embeddings
↓
[MCP Server]
Semantic Search + Graph Traversal
↓
[AI System]
Query → Retrieve → Generate
↓
Output with Citations
↓
[Human Review]
Click citations to verify
Key capabilities enabled:
- Overcome context limitations: Source documents often exceed AI context windows (e.g., 9,374 nodes across 30 documents). MCP servers allow selective retrieval of only relevant content.
- Preserve citations: DocRef URLs travel with the content through every step. AI outputs include clickable links to exact source locations.
- Semantic understanding: Vector embeddings enable finding content by meaning. A search for "authentication requirements" returns relevant content even if it uses different words.
- Structural navigation: Graph relationships allow traversing from a specific paragraph to its parent section, sibling paragraphs, or related content across documents.
- Systematic research: AI agents can execute comprehensive searches, save results, then synthesize - ensuring no relevant content is missed.
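The "search, save, then synthesize" pattern in the last item can be sketched with plain files acting as external memory. Everything below is illustrative: `fake_search` is a stand-in for a real MCP `semantic_search` call, the file naming is invented, and the `example.invalid` URLs are placeholders rather than DocRef links.

```python
import json
import tempfile
from pathlib import Path


def run_research(queries, search, out_dir: Path) -> Path:
    """Run every query and save each result set to its own JSON file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for i, query in enumerate(queries, start=1):
        record = {"query": query, "hits": search(query)}
        path = out_dir / f"query_{i:02d}.json"
        path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out_dir


def load_findings(out_dir: Path) -> list:
    """Reload saved results, de-duplicating passages by citation URL."""
    seen = {}
    for path in sorted(out_dir.glob("query_*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        for hit in record["hits"]:
            seen.setdefault(hit["url"], hit)
    return list(seen.values())


def fake_search(query: str) -> list:
    """Hypothetical stand-in for an MCP search: one hit per leading word."""
    return [
        {"url": f"https://example.invalid/doc#{word}", "text": f"Passage about {word}"}
        for word in query.split()[:2]
    ]


with tempfile.TemporaryDirectory() as tmp:
    notes = run_research(
        ["identity binding", "binding evidence"], fake_search, Path(tmp) / "notes"
    )
    findings = load_findings(notes)

print(len(findings), "unique passages")
```

Because results live on disk rather than in the conversation, the synthesis step can reload only what it needs, and each passage keeps its URL for later citation.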
Question-and-Answer Capabilities
MCP servers also enable human-AI interaction for document understanding:
- Human asks question: "What are the requirements for biometric authentication?"
- AI queries MCP server: Semantic search across all documents
- AI provides answer: Natural language response with pinpoint citations
- Human verifies: Click DocRef links to see exact source text
This creates an "explainable knowledge base" where AI answers can always be traced back to authoritative source documents.
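One way to picture how an answer stays traceable is to keep each retrieved passage paired with its source URL and render both together. The sketch below is a minimal illustration; the `Passage` type, the output layout, and the `example.invalid` URLs are invented for the example and do not reflect DocRef's actual citation format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Passage:
    text: str
    url: str  # the citation that travels with the text


def render_answer(summary: str, passages: List[Passage]) -> str:
    """Render a summary followed by a numbered list of cited sources."""
    lines = [summary, "", "Sources:"]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage.text} ({passage.url})")
    return "\n".join(lines)


passages = [
    Passage("Toy passage one", "https://example.invalid/doc#p1"),
    Passage("Toy passage two", "https://example.invalid/doc#p2"),
]
answer = render_answer("Toy summary of the retrieved passages.", passages)
print(answer)
```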
Technical Implementation
For those interested in the technical details:
- Database: Neo4j graph database
- Embeddings: 768-dimensional vectors (Nomic AI embeddings)
- MCP Protocol: Standardized JSON-RPC interface
- Access: Via the `npx mcp-remote` command or direct API calls
- Context management: File-based external memory for systematic research
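For readers wondering what access through `mcp-remote` typically involves, the sketch below generates the kind of client configuration commonly used with it: an `mcpServers` map whose entry launches `npx mcp-remote <url>`. The server key reuses a name from this page, but the URL is a placeholder and the exact configuration used in this project is an assumption.

```python
import json

# Placeholder endpoint: the real server URL is not given on this page.
SERVER_URL = "https://example.invalid/mcp"


def remote_server_config(name: str, url: str) -> dict:
    """Build an mcpServers entry that proxies a remote server via mcp-remote."""
    return {
        "mcpServers": {
            name: {
                "command": "npx",
                "args": ["mcp-remote", url],
            }
        }
    }


config = remote_server_config("identification-management-standards", SERVER_URL)
print(json.dumps(config, indent=2))
```

The printed JSON would go into the MCP client's configuration file; where that file lives depends on the client.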
For more details on the API Standard project's implementation, see the Methodology and Technology Overview.
Learn More
- For an introduction to DocRef itself, see About DocRef
- For transparency materials on how these servers were used, see the individual project pages: API Standard, Identification Management, AI Guidance