Open Causal JSON Data Standard

Overview

The Open Causal platform proposes a standardized JSON format for sharing and exchanging causal graphs of all types, including directed acyclic graphs (DAGs), cyclic causal models, and other causal structures. This data standard enables interoperability across different tools, platforms, and research workflows, making causal knowledge more accessible and machine-readable.

Purpose

The JSON standard serves several key purposes:

  • Interoperability: Facilitate seamless exchange of causal graphs between different software tools, platforms, and research workflows.
  • Flexibility: Support diverse graph types including DAGs, cyclic causal models, undirected graphs, and mixed structures.
  • Reproducibility: Enable precise documentation of causal assumptions in a machine-readable format with full metadata.
  • Accessibility: Provide a human-readable yet structured format that is easy to parse, generate, and validate.
  • Rich Semantics: Capture detailed information about variables, causal roles, measurement details, and temporal relationships.
  • Standardization: Promote consistent representation of causal knowledge across disciplines and research contexts.

Schema Specification (version 0.1)

The complete JSON schema for specifying causal graphs includes metadata, node definitions, and edge definitions. Below is the full structure with all possible fields:

Complete JSON Structure


{
  "graph": {
    "id": "unique-identifier",
    "date": "2024-01-01T00:00:00Z",
    "title": "Graph Title",
    "description": "A description of the causal graph",
    "creator": {
      "name": "Creator Name",
      "orcid": "0000-0000-0000-0000"
    },
    "contributors": [
      {
        "name": "Contributor Name",
        "orcid": "0000-0000-0000-0001",
        "role": "author"
      }
    ],
    "licence": "cc-by-4.0",
    "type": "dag",
    "cyclicity": "acyclic",
    "related_publication": [
      {
        "doi": "10.1234/example",
        "title": "Related Publication Title",
        "year": 2024
      }
    ],
    "statistical_unit": "individual"
  },
  "nodes": [
    {
      "id": "node1",
      "name": "Treatment",
      "description": "The treatment or exposure variable",
      "position": {"x": 100, "y": 150},
      "causal_role": "exposure",
      "variable_type": "continuous",
      "time_point": "baseline",
      "structural_model": "linear",
      "ontology_ids": ["ONTOLOGY:12345"]
    },
    {
      "id": "node2",
      "name": "Outcome",
      "description": "The primary outcome",
      "position": {"x": 300, "y": 150},
      "causal_role": "outcome",
      "variable_type": "binary"
    }
  ],
  "edges": [
    {
      "source": "node1",
      "target": "node2",
      "type": "directed",
      "position": {"curve": 0.5},
      "sign": "positive"
    }
  ]
}

Field Definitions

Graph Object

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the graph. Should be URL-safe and immutable. |
| date | ISO 8601 datetime | Yes | Creation or last-modified date of the graph. |
| title | string | Yes | Human-readable title of the causal graph. |
| description | string | No | Detailed description of the graph, its context, and assumptions. |
| creator | object | Yes | Person who created the graph. Contains 'name' (required) and 'orcid' (optional). |
| contributors | array of objects | No | List of people who contributed to the graph, with their roles. |
| licence | string | Yes | License under which the graph is distributed (e.g., CC BY 4.0 or CC BY-SA 4.0; the examples in this document write CC BY 4.0 as 'cc-by-4.0'). |
| type | enum | Yes | Graph type: 'dag' (Directed Acyclic Graph), 'cyclic' (with cycles), 'undirected', 'mixed'. |
| cyclicity | enum | Yes | Cyclicity status: 'acyclic', 'cyclic', or 'unknown'. |
| related_publication | array of objects | No | List of publications that discuss or use this causal graph. |
| statistical_unit | string | No | Unit of analysis for the graph (e.g., 'individual', 'household', 'population', 'time-series'). |
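
The required fields and enum values listed for the graph object can be checked mechanically. The following is a minimal sketch in Python, not part of the standard: the function name check_graph_object and the wording of the messages are illustrative assumptions, and the date check relies on datetime.fromisoformat, which accepts common ISO 8601 forms but is not a complete ISO 8601 parser in older Python versions.

```python
from datetime import datetime

# Required keys and enum values as listed in the Graph Object field definitions.
REQUIRED_GRAPH_FIELDS = ("id", "date", "title", "creator", "licence", "type", "cyclicity")
GRAPH_TYPES = {"dag", "cyclic", "undirected", "mixed"}
CYCLICITY_VALUES = {"acyclic", "cyclic", "unknown"}


def check_graph_object(graph: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required field: {key}"
        for key in REQUIRED_GRAPH_FIELDS
        if key not in graph
    ]

    if "date" in graph:
        try:
            # Older Python versions reject a trailing "Z", so normalise it first.
            datetime.fromisoformat(str(graph["date"]).replace("Z", "+00:00"))
        except ValueError:
            problems.append("date is not a valid ISO 8601 datetime")

    if "type" in graph and graph["type"] not in GRAPH_TYPES:
        problems.append(f"unknown type: {graph['type']!r}")
    if "cyclicity" in graph and graph["cyclicity"] not in CYCLICITY_VALUES:
        problems.append(f"unknown cyclicity: {graph['cyclicity']!r}")

    creator = graph.get("creator")
    if creator is not None and (not isinstance(creator, dict) or "name" not in creator):
        problems.append("creator must be an object containing 'name'")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a tool report every issue in one pass.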

Nodes Array

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the node within the graph. |
| name | string | Yes | Human-readable name of the variable or concept. |
| description | string | No | Detailed description of what the node represents. |
| position | object {x, y} | No | X and Y coordinates for visualization. Allows tools to preserve layout. |
| causal_role | enum | No | Role in the causal structure: 'exposure', 'outcome', 'confounder', 'mediator', 'collider', 'other'. |
| variable_type | enum | No | Data type: 'binary', 'categorical', 'continuous', 'count', 'survival', 'other'. |
| time_point | string | No | Time point at which the variable is measured (e.g., 'baseline', 'month_6', 'follow-up'). |
| structural_model | string | No | Assumed structural model: 'linear', 'nonlinear', 'multiplicative', 'threshold', 'other'. |
| ontology_ids | array of strings | No | References to ontologies (e.g., SNOMED-CT, UMLS, MeSH) for standardized variable definitions. |
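
Node entries can be checked in the same spirit. The sketch below is an illustration under assumptions rather than a reference validator: the function name check_nodes is hypothetical, and it only covers the required fields, uniqueness of 'id', the two enum fields, and the shape of 'position'.

```python
# Enum values as listed in the Nodes Array field definitions.
CAUSAL_ROLES = {"exposure", "outcome", "confounder", "mediator", "collider", "other"}
VARIABLE_TYPES = {"binary", "categorical", "continuous", "count", "survival", "other"}


def check_nodes(nodes: list[dict]) -> list[str]:
    """Return a list of problems found in the nodes array."""
    problems = []
    seen_ids = set()

    for index, node in enumerate(nodes):
        label = node.get("id", f"<node at index {index}>")

        for key in ("id", "name"):
            if key not in node:
                problems.append(f"{label}: missing required field {key!r}")

        # Node ids must be unique within the graph.
        if "id" in node:
            if node["id"] in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(node["id"])

        role = node.get("causal_role")
        if role is not None and role not in CAUSAL_ROLES:
            problems.append(f"{label}: unknown causal_role {role!r}")

        variable_type = node.get("variable_type")
        if variable_type is not None and variable_type not in VARIABLE_TYPES:
            problems.append(f"{label}: unknown variable_type {variable_type!r}")

        position = node.get("position")
        if position is not None:
            if not isinstance(position, dict) or not {"x", "y"} <= set(position):
                problems.append(f"{label}: position must contain 'x' and 'y'")

    return problems
```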

Edges Array

| Field | Type | Required | Description |
|---|---|---|---|
| source | string | Yes | ID of the source (from) node. |
| target | string | Yes | ID of the target (to) node. |
| type | enum | Yes | Edge type: 'directed', 'undirected', 'bidirected', 'dashed' (unknown direction). |
| position | object | No | Visualization metadata for edge layout (e.g., curve parameter for curved edges). |
| sign | enum | No | Direction of effect: 'positive' (increases), 'negative' (decreases), 'unknown'. |
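
Because 'source' and 'target' hold node IDs, a useful consistency check is that every edge refers to a node that is actually declared. A minimal sketch, assuming a whole document is passed in as a Python dict; the function name check_edges is illustrative and not defined by the standard.

```python
# Enum values as listed in the Edges Array field definitions.
EDGE_TYPES = {"directed", "undirected", "bidirected", "dashed"}
SIGNS = {"positive", "negative", "unknown"}


def check_edges(document: dict) -> list[str]:
    """Return a list of problems found in the edges array of a document."""
    node_ids = {node["id"] for node in document.get("nodes", []) if "id" in node}
    problems = []

    for index, edge in enumerate(document.get("edges", [])):
        label = f"edge {index}"

        # Both endpoints are required and must refer to declared nodes.
        for endpoint in ("source", "target"):
            if endpoint not in edge:
                problems.append(f"{label}: missing required field {endpoint!r}")
            elif edge[endpoint] not in node_ids:
                problems.append(f"{label}: {endpoint} {edge[endpoint]!r} is not a node id")

        if "type" not in edge:
            problems.append(f"{label}: missing required field 'type'")
        elif edge["type"] not in EDGE_TYPES:
            problems.append(f"{label}: unknown type {edge['type']!r}")

        sign = edge.get("sign")
        if sign is not None and sign not in SIGNS:
            problems.append(f"{label}: unknown sign {sign!r}")

    return problems
```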

Example:

{
  "graph": {
    "id": "12345678",
    "date": "2026-04-13T00:00:00Z",
    "title": "Ice Cream Sales and Drowning Deaths",
    "description": "A DAG showing that ice cream sales and drowning deaths are correlated due to a common cause (warm weather), not a direct causal relationship.",
    "creator": {
      "name": "Abdullah Ademoğlu",
      "orcid": "0000-0000-0000-0000"
    },
    "licence": "cc-by-4.0",
    "type": "dag",
    "cyclicity": "acyclic",
    "statistical_unit": "population"
  },
  "nodes": [
    {
      "id": "weather",
      "name": "Warm Weather",
      "description": "Temperature and seasonality (summer months)",
      "position": {"x": 50, "y": 50},
      "causal_role": "confounder",
      "variable_type": "continuous"
    },
    {
      "id": "ice_cream",
      "name": "Ice Cream Sales",
      "description": "Weekly ice cream sales in units sold",
      "position": {"x": 200, "y": 150},
      "causal_role": "exposure",
      "variable_type": "continuous"
    },
    {
      "id": "drowning",
      "name": "Drowning Deaths",
      "description": "Weekly drowning fatalities",
      "position": {"x": 350, "y": 150},
      "causal_role": "outcome",
      "variable_type": "count"
    }
  ],
  "edges": [
    {
      "source": "weather",
      "target": "ice_cream",
      "type": "directed",
      "sign": "positive"
    },
    {
      "source": "weather",
      "target": "drowning",
      "type": "directed",
      "sign": "positive"
    },
    {
      "source": "ice_cream",
      "target": "drowning",
      "type": "dashed"
    }
  ]
}
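
A tool consuming a document like the one above may want to confirm that the declared 'cyclicity' agrees with the edges. The sketch below uses Kahn's topological-sort algorithm and rests on two assumptions that the standard does not state: only edges of type 'directed' are treated as able to form a cycle, and every edge refers to a declared node. The function name and the abridged document are illustrative.

```python
from collections import defaultdict, deque


def directed_part_is_acyclic(document: dict) -> bool:
    """Return True if the 'directed' edges of the document contain no cycle."""
    in_degree = {node["id"]: 0 for node in document["nodes"]}
    children = defaultdict(list)

    for edge in document["edges"]:
        if edge["type"] == "directed":  # assumption: other edge types are ignored
            children[edge["source"]].append(edge["target"])
            in_degree[edge["target"]] += 1

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(node_id for node_id, degree in in_degree.items() if degree == 0)
    removed = 0
    while queue:
        node_id = queue.popleft()
        removed += 1
        for child in children[node_id]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Any node that could not be removed lies on, or downstream of, a cycle.
    return removed == len(in_degree)


# Abridged version of the example above (ids, graph cyclicity, and edge types only).
example = {
    "graph": {"cyclicity": "acyclic"},
    "nodes": [{"id": "weather"}, {"id": "ice_cream"}, {"id": "drowning"}],
    "edges": [
        {"source": "weather", "target": "ice_cream", "type": "directed"},
        {"source": "weather", "target": "drowning", "type": "directed"},
        {"source": "ice_cream", "target": "drowning", "type": "dashed"},
    ],
}

declared_acyclic = example["graph"]["cyclicity"] == "acyclic"
print(directed_part_is_acyclic(example) == declared_acyclic)  # prints True
```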

Use Cases

  • Cross-Platform Exchange: Share causal graphs between Open Causal and other analysis tools, such as R or Python packages like DoWhy and causalml.
  • API Integration: Enable programmatic access to graph definitions via REST and GraphQL APIs.
  • Version Control: Track changes to causal assumptions over time using Git or another version control system.
  • Validation: Implement automated consistency checks of causal structures and identify d-separation relationships.
  • Meta-Analysis: Systematically review and compare causal assumptions across multiple studies and disciplines.
  • Sensitivity Analysis: Evaluate robustness of findings to violations of causal assumptions.
  • Domain Mapping: Link variables to biomedical ontologies (SNOMED, UMLS, MeSH) for standardized interpretation.
  • Cyclic Models: Support feedback loops and equilibrium dynamics in dynamic systems and complex networks.
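
For the version control use case, diffs are easier to read when the same graph always serializes to the same text. The following sketch shows one possible way to do that with Python's standard json module; the ordering rules (nodes by 'id', edges by source, target, and type, object keys alphabetically) are assumptions chosen for illustration and are not prescribed by the standard.

```python
import json


def canonical_json(document: dict) -> str:
    """Serialize a document so that equivalent graphs produce identical text."""
    ordered = dict(document)  # shallow copy; the caller's document is not modified

    # Sort nodes and edges so that list order does not create spurious diffs.
    ordered["nodes"] = sorted(document.get("nodes", []), key=lambda node: node["id"])
    ordered["edges"] = sorted(
        document.get("edges", []),
        key=lambda edge: (edge["source"], edge["target"], edge["type"]),
    )

    # sort_keys fixes the key order inside every object; ensure_ascii=False keeps
    # non-ASCII names readable in the file.
    return json.dumps(ordered, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
```

Writing the result of canonical_json to the tracked file before each commit means a change in the diff corresponds to a change in the graph rather than to a reordering by the exporting tool.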

Design Principles

The standard is built on the following principles:

  • Universality: Support all types of causal graphs: DAGs, cyclic models, undirected graphs, and mixed structures.
  • Simplicity: Easy to understand and implement for developers and researchers across disciplines.
  • Completeness: Capture all essential information about causal structures including temporal dynamics and measurement details.
  • Semantic Richness: Include causal roles, variable types, ontology links, and structural assumptions.
  • Flexibility: Support diverse modeling contexts without sacrificing clarity or standardization.
  • Compatibility: Align with existing standards where possible (DAGitty, Tetrad, causal-graph libraries).
  • Machine-Readability: Optimized for automated parsing, validation, and computational analysis.
  • Human-Readability: Remain understandable and editable by researchers without requiring specialized tools.
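
As an illustration of the compatibility principle, a document can be translated into the textual model syntax used by DAGitty. The sketch below is a hedged example, not an official converter: the function name to_dagitty is hypothetical, only the 'exposure' and 'outcome' roles and the 'directed' and 'bidirected' edge types are mapped, all other information is dropped, and node IDs are assumed to be usable as DAGitty variable names without quoting.

```python
def to_dagitty(document: dict) -> str:
    """Render a document as DAGitty-style model text (partial mapping only)."""
    lines = ["dag {"]

    for node in document["nodes"]:
        attributes = []
        # Only these two roles are carried over as node attributes in this sketch.
        if node.get("causal_role") in ("exposure", "outcome"):
            attributes.append(node["causal_role"])
        position = node.get("position")
        if position and "x" in position and "y" in position:
            attributes.append(f'pos="{position["x"]},{position["y"]}"')
        suffix = f" [{','.join(attributes)}]" if attributes else ""
        lines.append(f"{node['id']}{suffix}")

    # Edge types without an entry here (e.g. 'dashed') are skipped.
    arrows = {"directed": "->", "bidirected": "<->"}
    for edge in document["edges"]:
        arrow = arrows.get(edge["type"])
        if arrow is not None:
            lines.append(f"{edge['source']} {arrow} {edge['target']}")

    lines.append("}")
    return "\n".join(lines)
```

The mapping is deliberately lossy: fields such as 'sign', 'time_point', and 'ontology_ids' are not written out, so a round trip through this text form would not reproduce the original document.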

Contributing

We welcome contributions from the research community to improve and extend this standard. If you have suggestions or feedback, or would like to contribute to its development, please reach out to us or visit our GitHub repository.