2 CKG Dev Workflow
Status: Version 1.0 (Sept ’25)
CKG - Knowledge graph and publishing system development workflow.
Goal
The goal is to make the knowledge in a complex document corpus easier to use, by means of a knowledge graph that enables the following:
- publishing search results as multi-format publications, and
- data analysis, by providing FAIR linked open data.
2.1 Workflow
The workflow covers the stages from harvesting an unstructured document corpus (web or PDF) through to converting it into structured data.
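As an illustration of the step from unstructured web pages to structured data, the following is a minimal sketch using only the Python standard library. The class name `SectionHarvester`, the record layout, and the sample HTML are assumptions made for this example; they do not describe the project's actual harvester or the real AR6 page markup.

```python
from html.parser import HTMLParser


class SectionHarvester(HTMLParser):
    """Collect headings and paragraphs from an HTML page as plain records."""

    # Block-level tags captured by this sketch (an illustrative choice).
    TAGS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.records = []   # structured output: one dict per captured block
        self._tag = None    # tag currently being captured, if any
        self._buffer = []   # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        # Start capturing only when not already inside a captured block.
        if tag in self.TAGS and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                kind = "paragraph" if tag == "p" else "heading"
                self.records.append({"kind": kind, "tag": tag, "text": text})
            self._tag = None


def harvest(html):
    """Parse an HTML string and return its list of structured records."""
    parser = SectionHarvester()
    parser.feed(html)
    parser.close()
    return parser.records


# Placeholder markup, not taken from the AR6 site.
SAMPLE_HTML = """
<h2>1.1 Example heading</h2>
<p>First paragraph with <em>inline</em> markup.</p>
<p>Second   paragraph.</p>
"""

records = harvest(SAMPLE_HTML)
for record in records:
    print(record)
```

Records of this kind could then be mapped onto the knowledge graph data model before import; a PDF source would need a different extraction step in front of the same record format.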
Wikibase is used for storage and knowledge graph creation to support the following features:
- community annotation,
- search that outputs ‘publication ready documents’, including via LLMs, and
- knowledge graph data services.
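To make the search feature more concrete, the sketch below builds a request URL for the `wbsearchentities` module of the MediaWiki Action API, which Wikibase provides for looking up entities by label. The endpoint `https://wikibase.example.org/w/api.php` and the helper name `build_entity_search_url` are hypothetical; the actual instance URL is not given in this document. No network request is made here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: replace with the api.php URL of the actual Wikibase.
API_URL = "https://wikibase.example.org/w/api.php"


def build_entity_search_url(term, language="en", limit=5, api_url=API_URL):
    """Return a wbsearchentities request URL for items matching `term`."""
    params = {
        "action": "wbsearchentities",
        "search": term,         # text to look up in labels and aliases
        "language": language,   # language to search in
        "type": "item",         # restrict results to items
        "limit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


url = build_entity_search_url("carbon budget")
print(url)

# Round-trip check: the search term survives URL encoding.
print(parse_qs(urlsplit(url).query)["search"])
```

The resulting URL can be fetched with any HTTP client; the JSON response lists matching entities, which a publishing or annotation step could then resolve to full items.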
2.1.1 Workflow steps (summary)
The steps below work with the IPCC Sixth Assessment Report (AR6) corpus.
- Locate AR6 report data sources - authors, glossary, acronyms list, etc.
- Web scrape the AR6 report texts (corpus harvest)
- Design initial knowledge graph data model
- Import AR6 into Wikibase
- Map Wikibase report navigation to MediaWiki Categories, so that the report can be browsed in MediaWiki
- Harvest data:
  - Authors
  - Glossary
  - Acronyms list
  - References
  - Bibliographic data
  - etc.
- Import the AR6 data harvested above into Wikibase
- Annotate the report using that AR6 data
- Community annotation: #semanticClimate, Stockholm Environment Institute (SEI), Potsdam Institute for Climate Impact Research (PIK), UNESCO, UNFCCC, etc.
- Wikibase to Wikidata data mapping
- Wikibase data analysis and visualisations
- Publications of AR6 generated from Wikibase, available via:
  - REST API
  - Command line
  - Jupyter Notebooks
  - Python CMS (e.g., Wagtail)
  - Graph RAG LLM
- All of the above publishing channels use the same framework: the Computational Publishing Service (CPS) with its publishing engine, CPS_Impress. CPS_Impress publishes from Wikibase to HTML using Jinja templating in a ‘Model View Controller’ architecture. Paged Media CSS styles are used to create PDF-like layouts. Publications are saved back to the knowledge graph and made available online as shareable resources.
- Knowledge Graph: FAIR linked open data and semantic outputs from Wikibase:
  - REST API
  - RDF export
  - Dokieli RDFa
  - Wikidata export
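The publishing pattern described for CPS_Impress (data from Wikibase as the model, an HTML template as the view, and a small controller joining the two, with Paged Media CSS for a print-like layout) can be sketched as follows. This is not CPS_Impress code: to stay within the Python standard library it uses `string.Template` as a stand-in for Jinja, and the entity dictionary, the template, and the function name `render_publication` are all assumptions made for illustration.

```python
from html import escape
from string import Template

# Model: a simplified stand-in for an item fetched from Wikibase.
# The ID, label, and description are placeholders, not real AR6 data.
entity = {
    "id": "Q1",
    "label": "Example term",
    "description": "Placeholder definition used for illustration.",
}

# View: an HTML template carrying CSS Paged Media rules for a print-like
# layout. A Jinja template would play this role in a Jinja-based engine.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>$label</title>
<style>
@page { size: A4; margin: 20mm; }
h1 { break-after: avoid; }
</style>
</head>
<body>
<h1>$label</h1>
<p id="$id">$description</p>
</body>
</html>
""")


def render_publication(item):
    """Controller: escape the model's values and pass them to the view."""
    safe = {key: escape(str(value)) for key, value in item.items()}
    return PAGE_TEMPLATE.substitute(safe)


html_page = render_publication(entity)
print(html_page)
```

Keeping the template separate from the data means the same item could be rendered through different views for different channels, and the `@page` rule lets a paged-media-aware renderer produce a PDF-like layout from the same HTML.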