DocRef

What is it?

DocRef is a digital regulatory infrastructure system created by Syncopate Lab. It converts regulatory text into structured, machine-readable data with precise, granular citations. This lets users work with documents as datasets and maintain traceable links between downstream digital implementation systems and the upstream natural-language documents they implement.

What Problem Does DocRef Solve?

When regulation is published on a website as PDF or HTML, it arrives wrapped in irrelevant material (navigation, footers, formatting artifacts). The content that matters gets buried in this noise and can be overlooked by AI systems. Traditional documents also provide no technical means to precisely cite or reference specific paragraphs, sub-paragraphs, or table cells.

DocRef solves these problems by:

  1. Converting documents into clean, structured data, separating presentation from content
  2. Providing a unique, permanent URL for every document element (paragraphs, bullet points, table cells, definitions), giving other documents, code, and digital systems a foundation layer to build on

Importantly, we also emphasise the use of portable, widely available formats, as well as easy methods for writing and converting documents that minimise the need for custom processes.

Key Capabilities

1. Machine-Readable Format

Documents on DocRef are published in a structured format where every element is individually addressable. This enables:

  • Block-by-block review by human experts
  • Sets of annotations and tags to be stored as their own structured datasets
  • Integration into automated systems and AI workflows
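As a rough illustration of what "every element is individually addressable" can mean in practice, the sketch below models a document as a list of element records and serialises it as JSON Lines, one record per element, so each block can be reviewed, annotated, or loaded into another system on its own. The `Element` fields and the JSON Lines output are assumptions for illustration, not DocRef's actual schema or export format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Element:
    """One individually addressable block of a document (hypothetical schema)."""
    element_id: str  # anchor-style identifier, e.g. "part1-para3"
    kind: str        # e.g. "paragraph", "list-item", "table-cell"
    text: str        # clean content, with presentation stripped away


def to_jsonl(elements: list[Element]) -> str:
    """Serialise elements as JSON Lines: one self-contained record per element."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in elements)


doc = [
    Element("part1-para1", "paragraph", "This Part sets out the purpose of the standard."),
    Element("part1-para2", "paragraph", "Agencies must publish an annual report."),
]
print(to_jsonl(doc))
```

Because each line stands alone, a reviewer or pipeline can process one element at a time without parsing the whole document.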

2. Granular Citation System

DocRef's most distinctive feature is its precise and customisable citation system. Every part of a document gets a unique URL that is easy for both humans and machines to read:

  • Specific paragraphs: https://docref.digital.govt.nz/.../en/#part1-para3
  • Sub-paragraphs: https://docref.digital.govt.nz/.../en/#part2-subpart1-section2-para1
  • Table cells: https://docref.digital.govt.nz/.../en/#part3-tb1-tr2-td1
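Because the anchors in these example URLs follow a predictable pattern (a label followed by a number, with segments joined by hyphens), a machine can recover an element's position in the hierarchy from the URL alone. The parser below is a minimal sketch that assumes every anchor segment follows that label-plus-number pattern; DocRef's real identifier grammar may allow other forms, and the `example.org` URL is a hypothetical stand-in.

```python
import re
from urllib.parse import urlsplit

# Assumed segment shape: lowercase label followed by a number, e.g. "part3", "tb1", "td1".
SEGMENT = re.compile(r"([a-z]+)(\d+)")


def parse_citation(url: str) -> list[tuple[str, int]]:
    """Split a citation URL's fragment into (label, number) pairs, outermost first."""
    fragment = urlsplit(url).fragment
    if not fragment:
        raise ValueError(f"no element anchor in {url!r}")
    path = []
    for segment in fragment.split("-"):
        match = SEGMENT.fullmatch(segment)
        if match is None:
            raise ValueError(f"unrecognised anchor segment {segment!r}")
        path.append((match.group(1), int(match.group(2))))
    return path


print(parse_citation("https://example.org/doc/en/#part3-tb1-tr2-td1"))
# [('part', 3), ('tb', 1), ('tr', 2), ('td', 1)]
```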

These citation URLs travel with the data through every step of a process, providing full traceability. They also include version numbers, making it quick to assess whether a downstream system relies on out-of-date documents.
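One way a downstream system could act on those version numbers is to record the version it cited alongside each element and compare it with the latest published version. In this sketch, the `Citation` record, the integer version numbers, and the `latest` lookup table are all hypothetical; they stand in for however versions are actually encoded in DocRef URLs.

```python
from typing import NamedTuple


class Citation(NamedTuple):
    document: str    # hypothetical document identifier
    version: int     # version the downstream system cited (assumed to be an integer)
    element_id: str  # e.g. "part1-para3"


def stale_citations(citations: list[Citation], latest_versions: dict[str, int]) -> list[Citation]:
    """Return citations pinned to a version older than the latest published one.

    Raises KeyError if a cited document is missing from latest_versions.
    """
    return [c for c in citations if c.version < latest_versions[c.document]]


rules = [
    Citation("privacy-standard", 2, "part1-para3"),
    Citation("privacy-standard", 3, "part2-para1"),
    Citation("records-guide", 1, "part4-para2"),
]
latest = {"privacy-standard": 3, "records-guide": 1}
print(stale_citations(rules, latest))
```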

3. Annotation and Tagging System

Users can manually add notes and tags to key parts of documents:

  • Mark paragraphs as imposing "mandatory requirements"
  • Tag content as "best practice" or "definition"
  • Note where expectations are stated using "should" vs "must"
  • Add custom tags relevant to specific projects

The annotation system produces its own datasets, which can also be integrated into downstream systems. We are also exploring automated tagging and annotation systems.
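Because annotations point at element identifiers rather than at page positions, they can be stored and queried as a dataset of their own. The sketch below assumes a simple hypothetical record of element ID, tag, and optional note, and groups element IDs by tag so that, for example, every paragraph tagged as a mandatory requirement can be pulled out in one step. The tag names are illustrative, not a fixed DocRef vocabulary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A note or tag attached to one document element (hypothetical schema)."""
    element_id: str
    tag: str
    note: str = ""


def index_by_tag(annotations: list[Annotation]) -> dict[str, list[str]]:
    """Group annotated element IDs by tag, preserving input order within each tag."""
    index: dict[str, list[str]] = defaultdict(list)
    for annotation in annotations:
        index[annotation.tag].append(annotation.element_id)
    return dict(index)


annotations = [
    Annotation("part1-para2", "mandatory-requirement", "Uses 'must'"),
    Annotation("part1-para4", "best-practice", "Uses 'should'"),
    Annotation("part2-para1", "mandatory-requirement"),
    Annotation("part3-para1", "definition"),
]
print(index_by_tag(annotations)["mandatory-requirement"])
# ['part1-para2', 'part2-para1']
```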

4. Structural Modeling

Because of its pinpoint hierarchical identifier system, DocRef models the structure of documents, tracking:

  • Cross-references between paragraphs
  • Parent-child relationships between sections, subsections, and paragraphs
  • Links between separate parts of multi-part documents
  • Connections between bullet points in lists

This gives AI systems a way of understanding document structure and relationships, rather than treating everything as a "big blob of text."
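If identifiers are hierarchical in the way the example citation URLs suggest, parent-child relationships can be derived directly from them. This sketch assumes that an element's parent is its identifier with the last hyphen-separated segment removed; that convention is inferred from the example anchors, not confirmed as DocRef's rule, and it does not cover cross-references, which would need to be extracted separately.

```python
from collections import defaultdict
from typing import Optional


def parent_of(element_id: str) -> Optional[str]:
    """Drop the last segment: "part2-subpart1-para1" -> "part2-subpart1" (assumed convention)."""
    head, sep, _ = element_id.rpartition("-")
    return head if sep else None


def build_children(element_ids: list[str]) -> dict[str, list[str]]:
    """Map each parent identifier to its direct children, in input order."""
    children: dict[str, list[str]] = defaultdict(list)
    for element_id in element_ids:
        parent = parent_of(element_id)
        if parent is not None:
            children[parent].append(element_id)
    return dict(children)


def ancestors(element_id: str) -> list[str]:
    """All enclosing elements, outermost first, e.g. to give an AI model context."""
    segments = element_id.split("-")
    return ["-".join(segments[:i]) for i in range(1, len(segments))]


ids = [
    "part2",
    "part2-subpart1",
    "part2-subpart1-section2",
    "part2-subpart1-section2-para1",
    "part2-subpart1-section2-para2",
]
print(build_children(ids)["part2-subpart1-section2"])
print(ancestors("part2-subpart1-section2-para1"))
```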

5. Version Control and Comparison

The DocRef identifier system, together with the practice of working with documents as datasets, means we can show two documents side-by-side and highlight the changes between them. These changes are themselves available as a dataset: a programmatic change-log.
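With both versions held as datasets keyed by element identifier, a basic change-log reduces to comparing two dictionaries. This is a minimal sketch assuming each version is a plain `{element_id: text}` mapping; it only detects added, removed, and modified elements, and does not attempt to recognise elements that were moved or renumbered, which a real comparison would need to handle.

```python
def change_log(old: dict[str, str], new: dict[str, str]) -> list[dict[str, str]]:
    """List element-level changes between two versions of a document."""
    changes = []
    # Sorted for stable output; this is lexical, so "para10" sorts before "para2".
    for element_id in sorted(old.keys() | new.keys()):
        if element_id not in new:
            changes.append({"element_id": element_id, "change": "removed"})
        elif element_id not in old:
            changes.append({"element_id": element_id, "change": "added"})
        elif old[element_id] != new[element_id]:
            changes.append({
                "element_id": element_id,
                "change": "modified",
                "before": old[element_id],
                "after": new[element_id],
            })
    return changes


v1 = {
    "part1-para1": "This Part sets out the purpose of the standard.",
    "part1-para2": "Agencies should publish an annual report.",
    "part1-para3": "Reports may be published online.",
}
v2 = {
    "part1-para1": "This Part sets out the purpose of the standard.",
    "part1-para2": "Agencies must publish an annual report.",
    "part1-para4": "Reports must be published online.",
}
for entry in change_log(v1, v2):
    print(entry)
```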

Unlike "tracked changes" in a typical word processor, the list of changes can be exported and used to quickly identify which downstream systems need updating when an upstream document changes.
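Once changes are expressed as a list of element identifiers, finding the downstream systems that need attention is a set intersection against whatever each system cites. The system names and the `dependencies` mapping below are hypothetical; in practice that mapping would come from the DocRef citations stored in each downstream dataset or codebase.

```python
def affected_systems(changed_ids: list[str], dependencies: dict[str, list[str]]) -> list[str]:
    """Return, sorted by name, the downstream systems citing at least one changed element."""
    changed = set(changed_ids)
    return sorted(name for name, cited in dependencies.items() if changed & set(cited))


dependencies = {
    "eligibility-rules": ["part1-para2", "part2-para5"],
    "reporting-checklist": ["part3-para1"],
    "chatbot-index": ["part1-para3"],
}
print(affected_systems(["part1-para2", "part1-para3"], dependencies))
# ['chatbot-index', 'eligibility-rules']
```

A stricter version might also flag systems that cite an ancestor of a changed element, since a change to a paragraph can alter the meaning of the section that contains it.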

How DocRef Works in Practice

The general process for using DocRef with AI systems:

  1. Collate a document collection - Define the scope of your digital regulatory system, pulling together documents from diverse sources and preparing them for establishment as a reusable infrastructure layer
  2. Convert documents to structured data - Transform regulation from PDF/DOCX/HTML/Markdown into precise structured data with metadata, hierarchies, and element-level citations, or use our editor to write your document for DocRef from the start
  3. Add annotations and tags - Manually enrich key parts with notes identifying requirements, definitions, best practices, and other categories
  4. Model document structure - Track cross-references, hierarchies, and relationships between elements
  5. Add semantic search capabilities - Generate vector embeddings for each element to enable meaning-based search
  6. Incorporate document references into downstream datasets and codebases - design and deploy "rules as code" systems, or create document archives that are searchable and maintainable in multiple formats
  7. Deploy as MCP server - Make the enriched dataset available to AI systems through a Model Context Protocol server (see About MCP Servers)
  8. AI queries and generates - AI systems search the dataset, retrieve relevant content, and produce outputs with precise citations
  9. Human review with traceability - Reviewers can click any citation to verify against the original source
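Step 5 can be illustrated with a brute-force nearest-neighbour search using cosine similarity. In this sketch the vectors are tiny hand-written stand-ins; a real system would generate them with an embedding model and would likely use a vector index rather than scanning every element. What it shows is that results come back as element identifiers, so the citation for each match carries through to steps 8 and 9.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank element IDs by similarity to the query vector, highest first."""
    scored = [(element_id, cosine(query, vec)) for element_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
index = {
    "part1-para2": [0.9, 0.1, 0.0],
    "part2-para1": [0.1, 0.8, 0.2],
    "part3-tb1-tr2-td1": [0.0, 0.2, 0.9],
}
results = search([1.0, 0.0, 0.1], index)
for element_id, score in results:
    print(element_id, round(score, 3))
```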

Why This Matters

For Regulatory Drafting

The datasets and tooling created for one project are fully reusable. The same system can power:

  • A search tool for finding relevant requirements
  • A publishing system for structured documents
  • A question-and-answer chatbot with pinpoint citations
  • AI-assisted drafting of new regulatory instruments

For Compliance

Once regulatory documents are on DocRef, AI systems can:

  • Evaluate code, datasets, or documents against requirements
  • Provide opinions on compliance with specific provisions
  • Generate structured checklists and navigation systems
  • Produce "rules as code" compliance assets
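A "rules as code" asset can keep its link to the provision it implements by carrying the DocRef citation alongside its result. The rule, the input shape, and the `example.org` URL below are hypothetical; the sketch only shows the pattern of pairing each automated check with an element-level citation so that a reviewer can trace a failure back to its source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    passed: bool
    citation: str  # URL of the provision this check implements


# Hypothetical citation; a real rule would point at an actual DocRef element URL.
ANNUAL_REPORT_CITATION = "https://example.org/privacy-standard/en/#part1-para2"


def check_annual_report(agency: dict) -> CheckResult:
    """Hypothetical rule for the provision 'Agencies must publish an annual report.'"""
    return CheckResult(
        passed=bool(agency.get("annual_report_published")),
        citation=ANNUAL_REPORT_CITATION,
    )


result = check_annual_report({"annual_report_published": False})
print(result.passed, result.citation)
```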

For Transparency

Every AI-generated output can include DocRef citations. If you see (DocRef) or ([DocRef](https://docref.digital.govt.nz/...)) in a document, you can click it to see exactly which source material a human or machine author has purported to rely upon.
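The inline citation form described here is straightforward to generate and to recover programmatically, which is what lets a reviewer (or a script) check every citation in an AI-generated answer. This sketch assumes the `([DocRef](url))` link form and a hypothetical `example.org` document URL; the helper names are illustrative.

```python
import re
from urllib.parse import urljoin

# Matches the URL inside a "[DocRef](...)" markdown link.
DOCREF_LINK = re.compile(r"\[DocRef\]\((https?://[^)\s]+)\)")


def docref_link(document_url: str, element_id: str) -> str:
    """Build an inline citation pointing at one element of a document."""
    return f"([DocRef]({urljoin(document_url, '#' + element_id)}))"


def extract_citations(text: str) -> list[str]:
    """Return every cited URL in the text, in order of appearance."""
    return DOCREF_LINK.findall(text)


answer = (
    "Agencies must publish an annual report "
    + docref_link("https://example.org/privacy-standard/en/", "part1-para2")
    + "."
)
print(answer)
print(extract_citations(answer))
```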

DocRef Sites in This Project

Learn More

Contact

To learn more about DocRef or discuss licensing and re-use: