2 Workflow
2.1 Goal
The goal is to make the knowledge in a complex document corpus easier to use by employing a knowledge graph that enables the following:
- publishing of search results as multi-format publications, and
- data analysis through FAIR linked open data.
2.2 Workflow
The workflow covers the stages from harvesting an unstructured document corpus (web or PDF) to converting it into structured data.
Wikibase is used for storage and knowledge graph creation, supporting the following features:
- community annotation,
- search that outputs ‘publication ready documents’, including via LLMs, and
- knowledge graph data services (a minimal query sketch follows below).
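
As a minimal illustration of the knowledge graph data service, the sketch below fetches a single item record from a Wikibase instance through the standard Wikibase Action API module `wbgetentities`. The endpoint URL and item ID are placeholders, not the project's actual instance.

```python
# Minimal sketch: fetch one entity record from a Wikibase instance via the
# standard Action API module wbgetentities. Endpoint and item ID are placeholders.
import requests

WIKIBASE_API = "https://example-wikibase.org/w/api.php"  # hypothetical endpoint

def get_entity(qid: str) -> dict:
    """Return the raw JSON record (labels, descriptions, claims) for one entity."""
    response = requests.get(
        WIKIBASE_API,
        params={"action": "wbgetentities", "ids": qid, "format": "json", "languages": "en"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["entities"][qid]

item = get_entity("Q42")  # placeholder item ID
print(item["labels"]["en"]["value"])
```
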
2.2.1 Workflow steps
- Report sources
- IPCC web scrape of report texts (corpus harvest; see the harvest sketch at the end of this section)
- Generate initial knowledge graph data model (iterative development; see the data-model sketch at the end of this section)
- Wikibase report import (see the import sketch at the end of this section)
- Wikibase to MediaWiki report navigation mapping for report browsing
- Harvest data:
- Authors
- Glossary
- Acronyms list
- References
- Bibliographic data
- Import the above data to Wikibase
- Annotate report using the above data (see the annotation sketch at the end of this section)
- Community annotation: #semanticClimate, Stockholm Environment Institute (SEI), Potsdam Institute for Climate Impact Research (PIK), UNESCO, UNFCCC, etc.
- Wikibase to Wikidata data mapping
- Wikibase data analysis and visualisations
- Publications available via the following channels:
- REST API
- Command line
- Jupyter Notebooks
- Python CMS (e.g., Wagtail)
- Graph RAG LLM
- All of the above publishing channels use the same framework: the Computational Publishing Service (CPS) with its publishing engine, CPS_Impress. CPS_Impress publishes from Wikibase to HTML using Jinja templating in a ‘Model View Controller’ architecture (see the templating sketch at the end of this section). Paged Media CSS styles are used to create PDF-like layouts. Publications are saved back to the knowledge graph and published online as shareable resources.
- Knowledge Graph - FAIR linked open data and semantic outputs from Wikibase (see the SPARQL sketch at the end of this section):
- REST API
- RDF export
- Dokieli RDFa
- Wikidata export
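
The sketches below illustrate individual workflow steps under stated assumptions; they are not the project's actual implementation. First, the corpus-harvest step: scraping one IPCC report chapter page into paragraph records. The chapter URL and the record format are assumptions chosen for the example.

```python
# Illustrative corpus-harvest sketch: scrape one IPCC chapter page into
# paragraph records. URL and record format are assumptions for this example.
import requests
from bs4 import BeautifulSoup

CHAPTER_URL = "https://www.ipcc.ch/report/ar6/wg3/chapter/chapter-1/"  # example page

def harvest_chapter(url: str) -> list[dict]:
    """Return one {id, text} record per non-empty paragraph on the page."""
    html = requests.get(url, timeout=60).text
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for i, para in enumerate(soup.find_all("p")):
        text = para.get_text(" ", strip=True)
        if text:
            records.append({"id": f"para_{i:04d}", "text": text})
    return records

paragraphs = harvest_chapter(CHAPTER_URL)
print(f"harvested {len(paragraphs)} paragraphs")
```
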
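A possible shape for the initial, iteratively developed data model, expressed as a plain Python structure before it is mapped to Wikibase items and properties. All class and property names here are assumptions, not the project's schema.

```python
# Hypothetical first-pass data model; class and property names are placeholders
# to be refined iteratively and then mapped to Wikibase entities.
DATA_MODEL = {
    "classes": ["Report", "Chapter", "Paragraph", "Author",
                "GlossaryTerm", "Acronym", "Reference"],
    "properties": {
        "part of":       {"domain": "Chapter",   "range": "Report"},
        "has author":    {"domain": "Report",    "range": "Author"},
        "mentions term": {"domain": "Paragraph", "range": "GlossaryTerm"},
        "expands to":    {"domain": "Acronym",   "range": "GlossaryTerm"},
        "cites":         {"domain": "Paragraph", "range": "Reference"},
    },
}
```
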
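A sketch of the Wikibase import step using the Action API write module `wbeditentity`. The endpoint is a placeholder and the snippet assumes a `requests` session that has already logged in with write permissions.

```python
# Sketch of the Wikibase import step via the Action API module wbeditentity.
# Endpoint is a placeholder; the session is assumed to be logged in already.
import json
import requests

API = "https://example-wikibase.org/w/api.php"  # hypothetical endpoint

def create_item(session: requests.Session, label: str, description: str) -> str:
    """Create a new Wikibase item and return its Q-id."""
    # A CSRF token is required for all Wikibase write actions.
    token = session.get(
        API, params={"action": "query", "meta": "tokens", "format": "json"}
    ).json()["query"]["tokens"]["csrftoken"]

    data = {
        "labels": {"en": {"language": "en", "value": label}},
        "descriptions": {"en": {"language": "en", "value": description}},
    }
    result = session.post(
        API,
        data={
            "action": "wbeditentity",
            "new": "item",
            "data": json.dumps(data),
            "token": token,
            "format": "json",
        },
    ).json()
    return result["entity"]["id"]
```
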
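A sketch of the annotation step: harvested glossary terms and acronyms are matched against paragraph text by simple dictionary lookup. The term lists and item IDs are tiny placeholders standing in for the real harvested vocabularies.

```python
# Dictionary-based annotation sketch; terms and Wikibase IDs are placeholders.
import re

GLOSSARY = {"carbon budget": "Q101", "mitigation": "Q102"}  # term -> item id
ACRONYMS = {"GHG": "Q201", "NDC": "Q202"}                   # acronym -> item id

def annotate(text: str) -> list[dict]:
    """Return one annotation record per matched glossary term or acronym."""
    annotations = []
    for vocab in (GLOSSARY, ACRONYMS):
        for term, qid in vocab.items():
            for match in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
                annotations.append(
                    {"term": term, "item": qid, "start": match.start(), "end": match.end()}
                )
    return annotations

print(annotate("GHG mitigation depends on the remaining carbon budget."))
```
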
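A sketch of the CPS_Impress publishing pattern: Wikibase query results (the model) are rendered to HTML (the view) through a Jinja template, with a Paged Media stylesheet linked for PDF-like layout. The template structure, field names and stylesheet name are assumptions.

```python
# Jinja rendering sketch in the Model-View-Controller spirit of CPS_Impress.
# Field names, template markup and stylesheet name are illustrative only.
from jinja2 import Template

TEMPLATE = Template(
    """<html>
  <head><link rel="stylesheet" href="paged-media.css"/></head>
  <body>
    <h1>{{ title }}</h1>
    {% for section in sections %}
    <section id="{{ section.id }}">
      <h2>{{ section.heading }}</h2>
      <p>{{ section.text }}</p>
    </section>
    {% endfor %}
  </body>
</html>"""
)

model = {  # in practice the model would come from Wikibase queries
    "title": "Search results: carbon budget",
    "sections": [
        {"id": "s1", "heading": "AR6 WG3 Chapter 1", "text": "Example paragraph text."},
    ],
}

html = TEMPLATE.render(**model)
print(html[:80])
```
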
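Finally, a sketch of consuming the semantic outputs: a SPARQL query against the Wikibase query service for items of a placeholder ‘glossary term’ class. The endpoint URL and entity IDs are assumptions; the `wd:`/`wdt:` prefixes are the defaults configured by a Wikibase query service.

```python
# SPARQL consumption sketch; endpoint and entity IDs are placeholders.
import requests

SPARQL_ENDPOINT = "https://example-wikibase.org/query/sparql"  # hypothetical

QUERY = """
SELECT ?term ?termLabel WHERE {
  ?term wdt:P31 wd:Q300 .   # placeholder: instance of 'glossary term'
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

rows = requests.get(
    SPARQL_ENDPOINT, params={"query": QUERY, "format": "json"}, timeout=60
).json()["results"]["bindings"]
for row in rows:
    print(row["termLabel"]["value"])
```
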