Data Repositories

Open science resources from the International Digital Oral History Lab

We are committed to the principles of open science and FAIR (Findable, Accessible, Interoperable, Reusable) data. Our GitHub repositories house the technical infrastructure, research tools, datasets, and documentation that underpin our projects — enabling researchers worldwide to reproduce, extend, and build upon our work.

MeDoraH Project

Transforming Oral History Research Through Semantic Technologies
Semantic Web  ·  Natural Language Processing  ·  FAIR Data  ·  Knowledge Graph

MeDoraH is a collaborative research project between UCL and TU Darmstadt, developing innovative digital methods for oral history research. The project integrates semantic web technologies with historical-interpretative analysis to understand the evolution of Digital Humanities.

Bridging computational methods and humanistic inquiry, MeDoraH provides a comprehensive framework for representing oral history interviews, their metadata, and associated analytical data — designed to support advanced content analysis, facilitate interdisciplinary research, and adhere to FAIR data principles.

  • File Management: Digital library capturing technical metadata, provenance, and file relationships.
  • Content Modelling: Detailed models representing structure, semantics, and relationships.
  • Enrichment: Representing complex relationships for advanced querying and knowledge discovery.
View on GitHub

MDOH Project

Multimodal Digital Oral History
Multimodal Analysis  ·  Sound as Data  ·  Digital Hermeneutics  ·  Laughter Detection

MDOH develops methodologies and technical workflows for active engagement with the oral, aural, and sonic affordances of oral history collections — across both retro-digitised and born-digital materials. The project treats oral history artifacts as multifaceted resources rather than text-only objects, working across multiple representational modalities.

Central to MDOH is a commitment to reflexive digital practice. While leveraging computational approaches, the project remains attuned to oral history as a subjective and intersubjective meaning-making process, situated within specific cultural, temporal, and technological contexts.

Repository Structure

  • data/: released datasets and documentation
  • docs/: methodology notes and workflow descriptions
  • src/: reusable code, pipelines, and utilities
  • notebooks/: exploratory analysis and prototypes
  • outputs/: reproducible figures, tables, and exports
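
For anyone cloning the repository, the layout listed above can be verified with a few lines of Python. This is only a minimal sketch: the function name `missing_dirs` is hypothetical and is not part of the MDOH codebase, and the directory names are taken directly from the list above.

```python
from pathlib import Path

# Top-level directories named in the repository structure above.
EXPECTED_DIRS = ("data", "docs", "src", "notebooks", "outputs")


def missing_dirs(repo_root: Path) -> list[str]:
    """Return the expected top-level directories that are absent from repo_root.

    Hypothetical helper for illustration; the order follows EXPECTED_DIRS.
    """
    return [name for name in EXPECTED_DIRS if not (repo_root / name).is_dir()]
```

Calling `missing_dirs(Path("path/to/clone"))` returns an empty list when all five directories are present.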
View on GitHub

MeDoraH_NLP

NLP Toolkit & Text Mining Suite
Workbench  ·  Workflow  ·  Hermeneutic Analysis  ·  Clustering  ·  Visualisation

A comprehensive suite of text mining and knowledge graph construction tools that enables researchers to transform unstructured historical narratives into structured, semantically rich knowledge representations.

  • Workflows: Information extraction and knowledge graph construction.
  • LLM Workbench: Cross-platform desktop app for hermeneutic analysis.
  • Preprocessing: Segmentation, sentence boundary detection, and context-aware pair generation.
  • Visualiser: Interactive visualisation of the ontology structure and graph data.
  • Hybrid Clustering: LLM embedding and prompt-based clustering.
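
To make the preprocessing step more concrete, here is a rough sketch of sentence boundary detection followed by pair generation. It is not the MeDoraH_NLP implementation. The regular expression is a deliberately naive boundary rule, and "context-aware pair" is interpreted here, as an assumption, to mean a target sentence paired with the sentences that immediately precede it. The names `split_sentences`, `ContextPair`, and `context_pairs` are hypothetical.

```python
import re
from dataclasses import dataclass

# Naive boundary rule (an assumption, not the project's rule): split on
# whitespace that follows ., ! or ? and precedes a capital letter or quote.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z\"'])")


def split_sentences(text: str) -> list[str]:
    """Split transcript text into sentences using the naive rule above."""
    return [s.strip() for s in _BOUNDARY.split(text.strip()) if s.strip()]


@dataclass(frozen=True)
class ContextPair:
    """A target sentence together with its preceding context."""
    context: str
    target: str


def context_pairs(sentences: list[str], window: int = 2) -> list[ContextPair]:
    """Pair each sentence with up to `window` preceding sentences."""
    pairs = []
    for i, target in enumerate(sentences):
        # The first sentence has no predecessors, so its context is empty.
        context = " ".join(sentences[max(0, i - window):i])
        pairs.append(ContextPair(context=context, target=target))
    return pairs
```

A rule this simple will mis-split on abbreviations and on speech transcribed without punctuation, so a real pipeline would likely rely on a trained segmenter. The window size controls how much preceding speech accompanies each target sentence.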
View on GitHub

MeDoraH_Ontology

Ontology & Schema Definitions
Ontology  ·  Design Guidelines  ·  FAIR Data  ·  Metadata

Ontology and schema definitions supporting FAIR data principles and semantic enrichment for oral history research. This repository provides the formal knowledge representation layer that underpins the entire MeDoraH technical infrastructure.

  • Core Ontology: OWL/RDF ontology covering the domains Actor, Event, Artefact, ConceptualItem, SpatialEntity, and TemporalEntity.
  • Metadata Schemas: Standardised schemas for technical metadata, provenance, and relationships.
  • Properties: Relation definitions, domains, ranges, and specialisation hierarchies.
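
As an illustration of how relation definitions with domains, ranges, and specialisation hierarchies fit together, the sketch below models them in plain Python. The six class names come from the list above. The three relations (`participatedIn`, `interviewedAt`, `tookPlaceAt`) are invented for this example and are not taken from the MeDoraH ontology. Note also that in OWL, domain and range license inferences rather than act as validation constraints; the sketch treats them as a simple check only to keep the idea visible.

```python
from dataclasses import dataclass
from typing import Optional

# Top-level domains named in the Core Ontology description.
CORE_CLASSES = frozenset({
    "Actor", "Event", "Artefact",
    "ConceptualItem", "SpatialEntity", "TemporalEntity",
})


@dataclass(frozen=True)
class Relation:
    """A relation with a domain class, a range class, and an optional parent."""
    name: str
    domain_cls: str
    range_cls: str
    parent: Optional[str] = None  # name of the more general relation, if any

    def __post_init__(self) -> None:
        # Reject relations that refer to classes outside the core set.
        for cls in (self.domain_cls, self.range_cls):
            if cls not in CORE_CLASSES:
                raise ValueError(f"unknown class: {cls}")


# Hypothetical relations, for illustration only.
RELATIONS = {
    r.name: r
    for r in (
        Relation("participatedIn", "Actor", "Event"),
        Relation("interviewedAt", "Actor", "Event", parent="participatedIn"),
        Relation("tookPlaceAt", "Event", "SpatialEntity"),
    )
}


def allows(relation: Relation, subject_cls: str, object_cls: str) -> bool:
    """Check a subject/object class pair against the relation's domain and range."""
    return subject_cls == relation.domain_cls and object_cls == relation.range_cls


def ancestors(name: str) -> list[str]:
    """Walk up the specialisation hierarchy, nearest parent first."""
    chain = []
    parent = RELATIONS[name].parent
    while parent is not None:
        chain.append(parent)
        parent = RELATIONS[parent].parent
    return chain
```

The check compares class names exactly and does not reason over subclasses, which an OWL reasoner would handle.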
View on GitHub

Contribute to Open Research

We welcome contributions from the community — whether bug reports, documentation improvements, methodological critiques, or new implementations. All repositories follow open-source best practices.

Visit our GitHub Organisation  ·  medorah@ucl.ac.uk