Architecting and developing a multi-tenant SaaS platform for workforce and operations management (~70% complete). Designed scalable microservices with React, Node.js, FastAPI, Flask, MongoDB, and Docker. Built an AI-assisted document-processing pipeline using PaddleOCR and local LLM inference for classification and structured extraction. Implemented JWT authentication with refresh tokens, RBAC, rate limiting, audit logging, and user-isolated storage. Developed multi-role dashboards (user, organization, admin) with Ant Design, Mantine UI, Recharts, and Framer Motion; built job/workflow pipelines with multi-state tracking; and orchestrated 4+ services with Docker Compose and health-check monitoring.
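The auth layer described above can be sketched in a few lines. This is a minimal illustration of HMAC-signed, JWT-style access tokens using only the standard library; the key, claim names, and TTL are hypothetical, and a production system would use a vetted JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(user_id: str, role: str, ttl_s: int = 900) -> str:
    """Build a compact header.payload.signature token (HS256-style)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user_id, "role": role,
                                 "exp": int(time.time()) + ttl_s}).encode())
    sig = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(),
                          hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_token(token: str):
    """Return the claims dict if the signature is valid and unexpired, else None."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return None
    expected = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(),
                               hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    raw = base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4))
    claims = json.loads(raw)
    return claims if claims["exp"] > time.time() else None
```

The verified `role` claim is what an RBAC middleware would check before allowing access to organization- or admin-level routes.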
rdf:type → Person
Tathagata Ghosh
Data Engineer | Knowledge Graphs, ETL, Python, SQL, Cloud & Analytics. Building reliable, scalable, and well-governed data systems.
Open to Work — Data Engineer / Cloud Data Engineer / Analytics Engineer · Germany / EU (Hybrid or Remote)
Graph · Key triples
Subject — predicate → object (semantic relations)
ex:ProfileGraph
Instance of the profile ontology — central node and relations.
ex:About
I'm a Data Engineer at heart, focused on building reliable, scalable, and well-governed data systems that turn complex, unstructured information into usable products.
My core background is in data engineering, cloud platforms, and metadata-driven architectures—working with Python, SQL, distributed processing, and modern storage layers. At HZDR, I built a FAIR-aligned research data platform (DRACO DataMaster) using ontologies, knowledge graphs, MongoDB, and object storage (MinIO) to structure and serve large-scale scientific data for high-energy physics workflows.
Alongside this, I work as a freelance engineer for a stealth SaaS startup, contributing to an ERP-style platform where data modeling, visualization, workflow automation, and DevOps converge—including system design, analytics dashboards, API-driven data services, and containerized deployments with Docker.
What drives me is building end-to-end data products: from ingestion and modeling, to governance, visualization, and operational reliability. I'm seeking opportunities in the data and cloud engineering domain where I can keep growing while contributing to impactful, production-grade platforms.
Focus areas: Data Engineering & ETL · Python & SQL · Metadata systems & Knowledge Graphs · Cloud & DevOps (Docker, CI/CD, GCP) · Analytics & Visualization
ex:Experience
Built DRACO DataMaster—a FAIR-compliant, open-source research data infrastructure for a laser experiment. Migrated PostgreSQL to MongoDB with an ontology-driven schema; stored terabytes in CEPH/MinIO. Developed a knowledge graph with RDF/JSON-LD and Protégé (OWL) for semantic interoperability. Implemented ETL pipelines (Python, ThreadPoolExecutor) and InfluxDB + Grafana dashboards for real-time monitoring. Presented RDM solutions at OUTPUT2024 (TU Dresden).
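The ThreadPoolExecutor-based ETL fan-out mentioned above can be sketched as follows. The `extract_transform` step is a hypothetical stand-in for the real pipeline stage, which would read a raw measurement file from object storage and return a metadata-enriched document.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_transform(record_id: int) -> dict:
    """Hypothetical I/O-bound stage: fetch one raw record and enrich it."""
    return {"id": record_id, "status": "transformed"}


def run_pipeline(record_ids, max_workers: int = 8):
    """Fan out extract/transform tasks across a thread pool and collect results.

    Threads suit this workload because the stages are dominated by network
    and disk I/O (MinIO reads, MongoDB writes), not CPU.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_transform, rid): rid
                   for rid in record_ids}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```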
Architected a Cognitive RPA proof-of-concept: Flask-based portal where candidates submit forms and CVs and get instant role matching via Google Cloud. Document AI & Vertex AI NLP for extracting and normalizing skills/education (≈40% better parsing than typical ATS). BigQuery pipelines with TF-IDF and cosine similarity; Python Cloud Function ranks top-5 applicant–job matches. Cloud Composer (Airflow) orchestrated form → Document AI → Vertex AI → TF-IDF → BigQuery. Terraform, Docker, Cloud Run for IaC and CI/CD. Looker Studio dashboard for application volumes and fit-score distributions; Stackdriver for pipeline monitoring. 75% reduction in initial screening time.
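The TF-IDF and cosine-similarity ranking step can be illustrated with a small pure-Python sketch. The real pipeline computed this in BigQuery over Document AI output; the tokenization and example texts here are simplified stand-ins.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (term frequency x inverse document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_matches(candidate, jobs, k=5):
    """Rank job descriptions against one candidate profile, best first."""
    vecs = tfidf_vectors([candidate] + jobs)
    cand, job_vecs = vecs[0], vecs[1:]
    scored = sorted(enumerate(job_vecs),
                    key=lambda iv: cosine(cand, iv[1]), reverse=True)
    return [i for i, _ in scored[:k]]
```

In the production flow, a Cloud Function applied this ranking per applicant and wrote the top-5 matches back to BigQuery for the Looker Studio dashboard.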
Drafted and implemented Data Management Plans (DMPs) following FAIR and CARE principles. Evaluated RDM tools (RDMO, RADAR) and revamped bilingual RDM websites for improved accessibility. Supported legal and ethical data handling for scientific projects.
Worked with healthcare data, standardized documentation, and supported clinical research for Project MIRACUM. Deployed software in local environments and on the intranet; researched and implemented a FHIR server; implemented the OMOP data model for medical informatics students. Wrote Python ETL scripts with PostgreSQL for data transformation; collected, stored, and retrieved medical data for analysis and reporting.
Theoretical foundations of cloud operation and use; hands-on work with Google Cloud technologies and complex application examples.
Real-Time Analyst (Nov 2020 – Sep 2021): Workforce scheduling and planning with IEX NICE and Aspect WFM; real-time management, absenteeism tracking, PTO, and intraday SL monitoring; managed teams in remote locations. Fraud Prevention Representative (Jan – Oct 2020): Airbnb fraud prevention via customer chats and emails; fraud identification and resolution; met productivity, quality, and customer-experience targets.
Ensured mechanical execution complied with the contract; handled client follow-up; enforced safety standards; maintained the work breakdown structure; supervised the site.
End-to-end redesign of garment production workflow using Lean (Kanban), Six Sigma (DMAIC), and JIT. Dynamic line segregation by destination country; color-coded Kanban and batch labels; real-time dashboards (WIP, throughput, staging). Cutting-to-stitching handover ≤30 mins. ~45% defect reduction in 4 months; near-zero missing-garment incidents; ~30% improvement in on-time shipment. QA hub consolidation, barcode scanners, and shipment-ID staging.
Founded the Aeromodelling Club at GITAM University; organized a national-level ornithopter workshop (150+ participants from 3 states); served as president until 2019.
Process validation of a CNC machine as part of an internship project in the QA department.
ex:Projects
Three selected projects from my work at Otto-von-Guericke Universität Magdeburg: educational chatbots, slide recommendation systems, and team-formation tools.
OttoBot — Transformer-based Educational Chatbot
Nov 2023 – Feb 2024 · OVGU Magdeburg
OttoBot is a university guide at your fingertips: a Transformer-based educational chatbot for Otto-von-Guericke University that answers questions about OVGU policies and procedures using retrieval-augmented generation (RAG). We integrated LangChain with Llama-2, HuggingFace embeddings, and FAISS for semantic search, plus web crawlers and Unstructured URL Loader for ingesting live content. The system combines document loaders, character text splitting, vector stores, and a Streamlit UI—all with open-source tools—to deliver tailored, context-aware answers for students and staff.
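OttoBot's retrieval-augmented flow can be sketched without the full LangChain stack. Here a toy hashed bag-of-words embedding stands in for the HuggingFace sentence embeddings, and a linear scan stands in for the FAISS index; the chunk texts and prompt template are illustrative only.

```python
import hashlib
import math


def embed(text, dim=64):
    """Toy embedding: hash tokens into a fixed-size normalized vector.
    A real RAG stack would call a HuggingFace embedding model here."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (FAISS stand-in)."""
    q = embed(question)
    return sorted(chunks,
                  key=lambda c: -sum(a * b for a, b in zip(q, embed(c))))[:k]


def build_prompt(question, chunks):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n---\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The assembled prompt is what gets passed to the Llama-2 model, keeping its answers grounded in the crawled OVGU pages rather than its pretraining data.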
Read paper (PDF)
OCR Strategy for Keyword Extraction & Slide Recommendation
Nov 2022 – Feb 2023 · OVGU Magdeburg
A recommendation subsystem for SQLValidator that delivers automatic instructional feedback during online exercise sessions. We used optical character recognition (Tesseract) to extract keywords from lecture slides and exercise sheets, then applied TF-IDF and cosine similarity to map SQL tasks to the most relevant course slides. When a student submits an incorrect solution, the system recommends specific lecture slides to review. The pipeline includes preprocessing (cropping, logo masking), NLTK stop-word removal, keyword dictionaries for German and English, and a 0.2 similarity threshold—achieving 72% precision for English slides.
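The thresholded recommendation step can be sketched as below. Keyword-set (Jaccard) overlap stands in for the paper's TF-IDF cosine score, and the slide IDs and keyword lists are invented for illustration; the 0.2 cutoff mirrors the threshold described above.

```python
def recommend_slides(task_keywords, slides, threshold=0.2):
    """Recommend slides whose keyword overlap with the failed SQL task
    clears the similarity threshold, most similar first.

    `slides` maps slide IDs to OCR-extracted keyword lists; Jaccard
    overlap stands in for the TF-IDF cosine score used in the paper.
    """
    task = set(task_keywords)
    recs = []
    for slide_id, kws in slides.items():
        kws = set(kws)
        sim = len(task & kws) / len(task | kws) if task | kws else 0.0
        if sim >= threshold:
            recs.append((slide_id, round(sim, 3)))
    return sorted(recs, key=lambda x: -x[1])
```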
Read paper (PDF)
Project Partner Recommendation System (Big Five)
May 2022 – Oct 2022 · IEEE WCCCT 2023
A team-formation recommendation system for university course projects that addresses the challenge of project breakdowns due to mismatched personalities and preferences. We used a Big Five personality questionnaire to elicit collaboration-relevant traits (neuroticism, agreeableness, conscientiousness, extraversion, openness), then combined collaborative filtering—grouping students with similar personality profiles—with utility-based recommendation so that team academic scores fall within a chosen threshold. The result is academically balanced teams better suited to productive collaboration. Co-authored with Chukwuka Victor Obionwu, Damanpreet Singh Walia, Taruna Tiwari, David Broneske, and Gunter Saake.
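The hybrid idea above can be sketched with a greedy pairing pass: match each student to the most personality-similar free partner whose grade falls within a utility threshold. The trait vectors, grades, and `max_score_gap` value are illustrative assumptions, not the paper's actual parameters.

```python
import math


def trait_distance(a, b):
    """Euclidean distance between two Big Five profiles (values in [0, 1])."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_students(students, max_score_gap=1.0):
    """Greedy pairing: collaborative-filtering-style trait similarity,
    constrained by a utility threshold on the academic-score gap.

    `students` maps name -> (big_five_vector, grade).
    """
    free = dict(students)
    teams = []
    while len(free) >= 2:
        name, (traits, grade) = next(iter(free.items()))
        del free[name]
        candidates = [(other, trait_distance(traits, t))
                      for other, (t, g) in free.items()
                      if abs(grade - g) <= max_score_gap]
        if not candidates:
            continue  # no compatible partner within the score threshold
        partner = min(candidates, key=lambda c: c[1])[0]
        del free[partner]
        teams.append((name, partner))
    return teams
```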
Read paper (PDF)
Other projects
Credit Card Fraud Detection (BQML, GCP, Google Data Studio) · Apr–Sep 2021
ex:Education
M.Sc. Digital Engineering
Otto-von-Guericke Universität Magdeburg
Apr 2021 – Mar 2025 · Magdeburg, Germany
Specializations: Databases, In-Memory Technology, System Architecture, DevOps. Master's thesis: DRACO DataMaster — FAIR-compliant research data infrastructure and knowledge graph (supervised by HZDR).
B.Tech Mechanical Engineering
GITAM University
Sep 2015 – Apr 2019 · Visakhapatnam, India
Specializations: Operations Management, Material Technology, Mechanics, Statistics.
ex:Thesis
DRACO DataMaster: A Metadata-Driven Approach Utilizing Ontologies and Knowledge Graphs for the Laser Particle Acceleration
Otto-von-Guericke Universität Magdeburg, February 2025. Supervised by Prof. Dr.-Ing. Bernhard Preim (OVGU) and Dr. Oliver Knodel (HZDR).
Abstract
DRACO (Dresden Laser Acceleration Source) is a state-of-the-art high-power ultra-short pulse laser experiment at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). This thesis develops a DRACO DataMaster extension for advanced data handling: an automated pipeline that builds knowledge graphs from unsorted tabular data, enriched with metadata via ontologies tailored for DRACO experiments. The approach aligns with FAIR principles (Findable, Accessible, Interoperable, Reusable), enabling deeper scientific insight through improved data integration, structuring, and visualization—and a robust toolset for data-driven research at HZDR.
Key contributions
- Knowledge graph–based exploration — Force-directed layout (Barnes-Hut, O(N log N)) to visualize relationships between experimental entities (devices, shots, measurements).
- Interactive visualization — Real-time filtering, parameterized measurement plots, and device-activity views for anomaly detection and validation.
- Ontology-driven data management — RDF/OWL ontology (Protégé, RDFLib); MongoDB and MinIO for structured storage; semantic consistency and efficient retrieval.
- UX & accessibility — Shneiderman’s mantra, colorblind-safe palettes (Paul Tol), and reproducible pipelines (Git, Docker).
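The ontology-driven approach can be made concrete with a toy JSON-LD record for one experimental entity. The `ex:` namespace, class, and property names here are hypothetical illustrations; the real DRACO ontology defines its own OWL classes and properties.

```python
import json

# One knowledge-graph node for a (hypothetical) laser shot, expressed as
# JSON-LD so it can be stored in MongoDB yet remain valid RDF.
doc = {
    "@context": {
        "ex": "https://example.org/draco#",
        "recordedBy": {"@id": "ex:recordedBy", "@type": "@id"},
    },
    "@id": "ex:shot-042",
    "@type": "ex:LaserShot",
    "ex:pulseEnergyJ": 28.5,        # literal property
    "recordedBy": "ex:camera-3",    # object property linking to a device node
}

serialized = json.dumps(doc, indent=2)
```

Because the document carries its own `@context`, the same record can be loaded into an RDFLib graph for SPARQL-style queries or served as-is from MongoDB.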
Research questions addressed
- RQ1: How can knowledge graphs enhance the interpretability of complex experimental datasets?
- RQ2: What role does interactive visualization play in anomaly detection and validation?
- RQ3: How can an ontology-based framework improve data integration, retrieval, and interoperability?
Tech stack
Python 3.9 · MongoDB · MinIO · Protégé (OWL) · RDFLib · Streamlit · PyVis (Vis.js) · Plotly · Pandas · NumPy · NetworkX · Docker
ex:Skills
Top skills
JavaScript · PHP · Data Warehouse Architecture · Python · SQL · ETL · Knowledge Graphs
Scientific SW & Data
RDF/OWL, FAIR/CARE principles, ontology engineering, semantic traceability, knowledge graphs, Protégé
Languages & Backend
Python (Pandas, NumPy), SQL, Bash · Flask, FastAPI · Git/GitLab CI, Docker, GCP, MongoDB, MinIO
Frontend & Visualization
React, PyVis (Vis.js), Streamlit · Looker Studio, Power BI, Grafana · Ant Design, Mantine UI, Recharts
Data & AI
Vertex AI, Document AI, PaddleOCR, LLM inference · Reproducible research, CI/CD, automated testing
Certifications
A Hands-on Introduction to Engineering Simulations · SQL · Introduction to programming with MATLAB · HTML Essential Training · Introduction to CSS
ex:References
… fulfilled the student assistance tasks to the fullest satisfaction. His behavior towards colleagues and research partners was always exemplary.
We thank Mr. Ghosh for his performance and wish him all the best for the future.
ex:Contact
Open to Data Engineer / Cloud Data Engineer / Analytics Engineer roles in Germany or EU (Hybrid or Remote). Work authorization available. Say hello or share an idea.
Magdeburg, Saxony-Anhalt, Germany