Open Causal JSON Data Standard
Overview
The Open Causal platform proposes a standardized JSON format for exchanging causal graphs of all types, including directed acyclic graphs (DAGs), cyclic causal models, and other causal structures. The standard is intended to make causal graphs interoperable across tools, platforms, and research workflows, and to make causal knowledge more accessible and machine-readable.
Purpose
The JSON standard serves several key purposes:
- Interoperability: Facilitate seamless exchange of causal graphs between different software tools, platforms, and research workflows.
- Flexibility: Support diverse graph types including DAGs, cyclic causal models, undirected graphs, and mixed structures.
- Reproducibility: Enable precise documentation of causal assumptions in a machine-readable format with full metadata.
- Accessibility: Provide a human-readable yet structured format that is easy to parse, generate, and validate.
- Rich Semantics: Capture detailed information about variables, causal roles, measurement details, and temporal relationships.
- Standardization: Promote consistent representation of causal knowledge across disciplines and research contexts.
Schema Specification (version 0.1)
A causal graph document has three top-level members: graph (metadata), nodes (node definitions), and edges (edge definitions). The full structure, with all possible fields, is shown below:
Complete JSON Structure
{
  "graph": {
    "id": "unique-identifier",
    "date": "2024-01-01T00:00:00Z",
    "title": "Graph Title",
    "description": "A description of the causal graph",
    "creator": {
      "name": "Creator Name",
      "orcid": "0000-0000-0000-0000"
    },
    "contributors": [
      {
        "name": "Contributor Name",
        "orcid": "0000-0000-0000-0001",
        "role": "author"
      }
    ],
    "licence": "cc-by-4.0",
    "type": "dag",
    "cyclicity": "acyclic",
    "related_publication": [
      {
        "doi": "10.1234/example",
        "title": "Related Publication Title",
        "year": 2024
      }
    ],
    "statistical_unit": "individual"
  },
  "nodes": [
    {
      "id": "node1",
      "name": "Treatment",
      "description": "The treatment or exposure variable",
      "position": {"x": 100, "y": 150},
      "causal_role": "exposure",
      "variable_type": "continuous",
      "time_point": "baseline",
      "structural_model": "linear",
      "ontology_ids": ["ONTOLOGY:12345"]
    },
    {
      "id": "node2",
      "name": "Outcome",
      "description": "The primary outcome",
      "position": {"x": 300, "y": 150},
      "causal_role": "outcome",
      "variable_type": "binary"
    }
  ],
  "edges": [
    {
      "source": "node1",
      "target": "node2",
      "type": "directed",
      "position": {"curve": 0.5},
      "sign": "positive"
    }
  ]
}
Field Definitions
Graph Object
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the graph. Should be URL-safe and immutable. |
| date | ISO 8601 datetime | Yes | Creation or last modified date of the graph in ISO 8601 format. |
| title | string | Yes | Human-readable title of the causal graph. |
| description | string | No | Detailed description of the graph, its context, and assumptions. |
| creator | object | Yes | Person who created the graph. Contains 'name' (required) and 'orcid' (optional). |
| contributors | array of objects | No | List of people who contributed to the graph with their roles. |
| licence | string | Yes | Licence under which the graph is distributed, e.g., CC BY 4.0 (written 'cc-by-4.0' in the examples in this document) or CC BY-SA 4.0. |
| type | enum | Yes | Graph type: 'dag' (Directed Acyclic Graph), 'cyclic' (with cycles), 'undirected', 'mixed'. |
| cyclicity | enum | Yes | Cyclicity status: 'acyclic', 'cyclic', or 'unknown'. |
| related_publication | array of objects | No | List of publications that discuss or use this causal graph. |
| statistical_unit | string | No | Unit of analysis for the graph (e.g., 'individual', 'household', 'population', 'time-series'). |
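
The required fields and enum values in the table above can be checked mechanically. The following is a minimal sketch, not part of the standard: the function name `check_graph_metadata` and the wording of the messages are assumptions made for illustration. Note that Python's `datetime.fromisoformat` accepts only a subset of ISO 8601, so a stricter or more lenient date check may be needed in practice.

```python
from datetime import datetime

# Required fields and enum values copied from the Graph Object table above.
REQUIRED_GRAPH_FIELDS = ("id", "date", "title", "creator", "licence", "type", "cyclicity")
GRAPH_TYPES = {"dag", "cyclic", "undirected", "mixed"}
CYCLICITY_VALUES = {"acyclic", "cyclic", "unknown"}


def check_graph_metadata(graph: dict) -> list[str]:
    """Return human-readable problems found in the `graph` object (empty if none)."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_GRAPH_FIELDS
        if name not in graph
    ]
    if "type" in graph and graph["type"] not in GRAPH_TYPES:
        problems.append(f"unknown type: {graph['type']!r}")
    if "cyclicity" in graph and graph["cyclicity"] not in CYCLICITY_VALUES:
        problems.append(f"unknown cyclicity: {graph['cyclicity']!r}")
    creator = graph.get("creator")
    if isinstance(creator, dict) and "name" not in creator:
        problems.append("creator is missing required field: name")
    if "date" in graph:
        try:
            # Older Python versions reject a trailing "Z" in fromisoformat(),
            # so it is rewritten as an explicit UTC offset first.
            datetime.fromisoformat(str(graph["date"]).replace("Z", "+00:00"))
        except ValueError:
            problems.append(f"date is not ISO 8601: {graph['date']!r}")
    return problems
```

Returning a list of messages rather than raising on the first error lets a tool report every problem in a file at once.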
Nodes Array
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the node within the graph. |
| name | string | Yes | Human-readable name of the variable or concept. |
| description | string | No | Detailed description of what the node represents. |
| position | object {x, y} | No | X and Y coordinates for visualization. Allows tools to preserve layout. |
| causal_role | enum | No | Role in the causal structure: 'exposure', 'outcome', 'confounder', 'mediator', 'collider', 'other'. |
| variable_type | enum | No | Data type: 'binary', 'categorical', 'continuous', 'count', 'survival', 'other'. |
| time_point | string | No | Time point at which the variable is measured (e.g., 'baseline', 'month_6', 'follow-up'). |
| structural_model | string | No | Assumed structural model: 'linear', 'nonlinear', 'multiplicative', 'threshold', 'other'. |
| ontology_ids | array of strings | No | References to ontologies (e.g., SNOMED-CT, UMLS, MeSH) for standardized variable definitions. |
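
As a sketch of how the node rules above might be enforced, the function below checks the two required fields, the two enum-valued fields, and the uniqueness of node ids within a graph. The name `check_nodes` and the message texts are illustrative assumptions, not part of the standard.

```python
from collections import Counter

# Enum values copied from the Nodes Array table above.
CAUSAL_ROLES = {"exposure", "outcome", "confounder", "mediator", "collider", "other"}
VARIABLE_TYPES = {"binary", "categorical", "continuous", "count", "survival", "other"}


def check_nodes(nodes: list[dict]) -> list[str]:
    """Return human-readable problems found in the `nodes` array (empty if none)."""
    problems = []
    for index, node in enumerate(nodes):
        for name in ("id", "name"):
            if name not in node:
                problems.append(f"node {index}: missing required field {name!r}")
        role = node.get("causal_role")
        if role is not None and role not in CAUSAL_ROLES:
            problems.append(f"node {index}: unknown causal_role {role!r}")
        variable_type = node.get("variable_type")
        if variable_type is not None and variable_type not in VARIABLE_TYPES:
            problems.append(f"node {index}: unknown variable_type {variable_type!r}")
    # Node ids must be unique within the graph.
    id_counts = Counter(node["id"] for node in nodes if "id" in node)
    for node_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate node id: {node_id!r}")
    return problems
```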
Edges Array
| Field | Type | Required | Description |
|---|---|---|---|
| source | string | Yes | ID of the source (from) node. |
| target | string | Yes | ID of the target (to) node. |
| type | enum | Yes | Edge type: 'directed', 'undirected', 'bidirected', 'dashed' (unknown direction). |
| position | object | No | Visualization metadata for edge layout (e.g., curve parameter for curved edges). |
| sign | enum | No | Sign of the effect: 'positive' (the source increases the target), 'negative' (the source decreases the target), 'unknown'. |
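
Because source and target hold node ids, an edge is only meaningful if both ids exist in the nodes array. The sketch below checks that, together with the enum values from the table above. The function name `check_edges` is a hypothetical choice for illustration, and the standard itself does not prescribe how dangling references should be reported.

```python
# Enum values copied from the Edges Array table above.
EDGE_TYPES = {"directed", "undirected", "bidirected", "dashed"}
EDGE_SIGNS = {"positive", "negative", "unknown"}


def check_edges(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Return human-readable problems found in the `edges` array (empty if none)."""
    known_ids = {node["id"] for node in nodes if "id" in node}
    problems = []
    for index, edge in enumerate(edges):
        for end in ("source", "target"):
            if end not in edge:
                problems.append(f"edge {index}: missing required field {end!r}")
            elif edge[end] not in known_ids:
                problems.append(f"edge {index}: {end} {edge[end]!r} is not a node id")
        if "type" not in edge:
            problems.append(f"edge {index}: missing required field 'type'")
        elif edge["type"] not in EDGE_TYPES:
            problems.append(f"edge {index}: unknown type {edge['type']!r}")
        sign = edge.get("sign")
        if sign is not None and sign not in EDGE_SIGNS:
            problems.append(f"edge {index}: unknown sign {sign!r}")
    return problems
```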
Example:
{
  "graph": {
    "id": "12345678",
    "date": "2026-04-13T00:00:00Z",
    "title": "Ice Cream Sales and Drowning Deaths",
    "description": "A DAG showing that ice cream sales and drowning deaths are correlated due to a common cause (warm weather), not a direct causal relationship.",
    "creator": {
      "name": "Abdullah Ademoğlu",
      "orcid": "0000-0000-0000-0000"
    },
    "licence": "cc-by-4.0",
    "type": "dag",
    "cyclicity": "acyclic",
    "statistical_unit": "population"
  },
  "nodes": [
    {
      "id": "weather",
      "name": "Warm Weather",
      "description": "Temperature and seasonality (summer months)",
      "position": {"x": 50, "y": 50},
      "causal_role": "confounder",
      "variable_type": "continuous"
    },
    {
      "id": "ice_cream",
      "name": "Ice Cream Sales",
      "description": "Weekly ice cream sales in units sold",
      "position": {"x": 200, "y": 150},
      "causal_role": "exposure",
      "variable_type": "continuous"
    },
    {
      "id": "drowning",
      "name": "Drowning Deaths",
      "description": "Weekly drowning fatalities",
      "position": {"x": 350, "y": 150},
      "causal_role": "outcome",
      "variable_type": "count"
    }
  ],
  "edges": [
    {
      "source": "weather",
      "target": "ice_cream",
      "type": "directed",
      "sign": "positive"
    },
    {
      "source": "weather",
      "target": "drowning",
      "type": "directed",
      "sign": "positive"
    },
    {
      "source": "ice_cream",
      "target": "drowning",
      "type": "dashed"
    }
  ]
}
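
One consistency check that a document like the example above invites is whether the declared cyclicity matches the edges. The sketch below uses Kahn's topological-sort algorithm on a trimmed copy of the example. It rests on two assumptions that the standard does not state: only edges of type 'directed' can form a directed cycle, and every edge endpoint refers to an existing, unique node id. The function name `has_directed_cycle` is hypothetical.

```python
from collections import defaultdict, deque


def has_directed_cycle(document: dict) -> bool:
    """Detect a directed cycle with Kahn's algorithm, using 'directed' edges only."""
    node_ids = [node["id"] for node in document["nodes"]]
    children = defaultdict(list)
    indegree = {node_id: 0 for node_id in node_ids}
    for edge in document["edges"]:
        if edge["type"] == "directed":  # assumption: other edge types are ignored
            children[edge["source"]].append(edge["target"])
            indegree[edge["target"]] += 1
    # Repeatedly remove nodes with no remaining incoming directed edges.
    queue = deque(node_id for node_id in node_ids if indegree[node_id] == 0)
    removed = 0
    while queue:
        current = queue.popleft()
        removed += 1
        for child in children[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node that could not be removed lies on, or downstream of, a cycle.
    return removed != len(node_ids)


# Trimmed copy of the ice cream example (only the fields the check needs).
example = {
    "graph": {"type": "dag", "cyclicity": "acyclic"},
    "nodes": [{"id": "weather"}, {"id": "ice_cream"}, {"id": "drowning"}],
    "edges": [
        {"source": "weather", "target": "ice_cream", "type": "directed"},
        {"source": "weather", "target": "drowning", "type": "directed"},
        {"source": "ice_cream", "target": "drowning", "type": "dashed"},
    ],
}

found = "cyclic" if has_directed_cycle(example) else "acyclic"
print(found == example["graph"]["cyclicity"])  # True: declared and computed values agree
```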
Use Cases
- Cross-Platform Exchange: Share causal graphs between Open Causal and other analysis tools (e.g., R, or Python packages such as DoWhy and causalml).
- API Integration: Enable programmatic access to graph definitions via REST and GraphQL APIs.
- Version Control: Track changes to causal assumptions over time using Git and version control systems.
- Validation: Automate consistency checks on causal structures and identify d-separation relationships.
- Meta-Analysis: Systematically review and compare causal assumptions across multiple studies and disciplines.
- Sensitivity Analysis: Evaluate robustness of findings to violations of causal assumptions.
- Domain Mapping: Link variables to biomedical ontologies (SNOMED, UMLS, MeSH) for standardized interpretation.
- Cyclic Models: Support feedback loops and equilibrium dynamics in dynamic systems and complex networks.
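
For the version control use case, diffs are easier to read when the same graph always serializes to the same text. The sketch below shows one possible approach using Python's standard json module. It assumes that the order of nodes, edges, and object keys carries no meaning, which the standard does not state explicitly, and the function name `canonical_dump` is hypothetical.

```python
import json


def canonical_dump(document: dict) -> str:
    """Serialize a graph document to a stable, diff-friendly JSON string."""
    ordered = dict(document)  # shallow copy, so the caller's document is untouched
    # Assumption: node and edge order is not meaningful, so both can be sorted.
    ordered["nodes"] = sorted(document.get("nodes", []), key=lambda node: node["id"])
    ordered["edges"] = sorted(
        document.get("edges", []),
        key=lambda edge: (edge["source"], edge["target"], edge["type"]),
    )
    # sort_keys fixes the key order; ensure_ascii=False keeps names such as
    # "Ademoğlu" readable instead of escaping them.
    return json.dumps(ordered, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
```

Two files that describe the same graph with different key or element order then produce identical output, so Git reports a change only when the causal assumptions themselves change.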
Design Principles
The standard is built on the following principles:
- Universality: Support all types of causal graphs: DAGs, cyclic models, undirected graphs, and mixed structures.
- Simplicity: Easy to understand and implement for developers and researchers across disciplines.
- Completeness: Capture all essential information about causal structures including temporal dynamics and measurement details.
- Semantic Richness: Include causal roles, variable types, ontology links, and structural assumptions.
- Flexibility: Support diverse modeling contexts without sacrificing clarity or standardization.
- Compatibility: Align with existing standards where possible (DAGitty, Tetrad, causal-graph libraries).
- Machine-Readability: Optimized for automated parsing, validation, and computational analysis.
- Human-Readability: Remain understandable and editable by researchers without requiring specialized tools.
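
To illustrate the compatibility principle, the sketch below converts a document into DAGitty's text syntax. The mapping is an assumption made for this example, not something the standard defines: 'directed' becomes `->`, 'bidirected' becomes `<->`, only the 'exposure' and 'outcome' roles are carried over as node attributes, and edges of any other type are returned separately rather than exported. It also assumes that node ids are plain identifiers that need no quoting. The function name `to_dagitty` is hypothetical.

```python
# Assumed mapping from this standard's edge types to DAGitty edge operators.
DAGITTY_ARROWS = {"directed": "->", "bidirected": "<->"}
# Causal roles that are written out as DAGitty node attributes in this sketch.
DAGITTY_ROLES = {"exposure", "outcome"}


def to_dagitty(document: dict) -> tuple[str, list[dict]]:
    """Return DAGitty model text and the list of edges that were not exported."""
    lines = ["dag {"]
    for node in document["nodes"]:
        role = node.get("causal_role")
        attribute = f" [{role}]" if role in DAGITTY_ROLES else ""
        lines.append(f"  {node['id']}{attribute}")
    skipped = []
    for edge in document["edges"]:
        arrow = DAGITTY_ARROWS.get(edge["type"])
        if arrow is None:
            skipped.append(edge)  # no mapping assumed for 'undirected' or 'dashed'
            continue
        lines.append(f"  {edge['source']} {arrow} {edge['target']}")
    lines.append("}")
    return "\n".join(lines), skipped
```

Returning the skipped edges makes the information loss visible to the caller instead of silently dropping part of the graph.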
Contributing
We welcome contributions from the research community to improve and extend this standard. If you have suggestions, feedback, or would like to contribute to the development of this data standard, please reach out to us or visit our GitHub repository.