Architecting and developing a multi-tenant SaaS platform for workforce and operations management (~70% complete). Designed scalable microservices with React, Node.js, FastAPI, Flask, MongoDB, and Docker. Built an AI-assisted document-processing pipeline using PaddleOCR and local LLM inference for classification and structured extraction. Implemented JWT authentication with refresh tokens, RBAC, rate limiting, audit logging, and user-isolated storage. Developed multi-role dashboards (user, organization, admin) with Ant Design, Mantine UI, Recharts, and Framer Motion; built job/workflow pipelines with multi-state tracking; and orchestrated 4+ services with Docker Compose and health-check monitoring.
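The auth layer described above can be sketched in a few lines. This is a minimal illustration of HMAC-signed, JWT-style access tokens using only the standard library; the key, claim names, and TTL are hypothetical, and a production system would use a vetted JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_token(user_id: str, role: str, ttl_s: int = 900) -> str:
    """Build a compact header.payload.signature token (HS256-style)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": user_id, "role": role,
                                 "exp": int(time.time()) + ttl_s}).encode())
    sig = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(),
                          hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"


def verify_token(token: str):
    """Return the claims dict if the signature is valid and unexpired, else None."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return None
    expected = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(),
                               hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    raw = base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4))
    claims = json.loads(raw)
    return claims if claims["exp"] > time.time() else None
```

The verified `role` claim is what an RBAC middleware would check before allowing access to organization- or admin-level routes.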
rdf:type → Person
Tathagata Ghosh
Data Engineer | Knowledge Graphs, ETL, Python, SQL, Cloud & Analytics. Building reliable, scalable, and well-governed data systems.
Open to Work — Data Engineer / Cloud Data Engineer / Analytics Engineer · Germany / EU (Hybrid or Remote)
Graph · Key triples
Subject — predicate → object (semantic relations)
ex:ProfileGraph
Instance of the profile ontology — central node and relations.
ex:About
I'm a Data Engineer at heart, focused on building reliable, scalable, and well-governed data systems that turn complex, unstructured information into usable products.
My core background is in data engineering, cloud platforms, and metadata-driven architectures—working with Python, SQL, distributed processing, and modern storage layers. At HZDR, I built a FAIR-aligned research data platform (DRACO DataMaster) using ontologies, knowledge graphs, MongoDB, and object storage (MinIO) to structure and serve large-scale scientific data for high-energy physics workflows.
Alongside this, I work as a freelance engineer for a stealth SaaS startup, contributing to an ERP-style platform where data modeling, visualization, workflow automation, and DevOps converge—including system design, analytics dashboards, API-driven data services, and containerized deployments with Docker.
What drives me is building end-to-end data products: from ingestion and modeling, to governance, visualization, and operational reliability. I'm seeking opportunities in the data and cloud engineering domain where I can keep growing while contributing to impactful, production-grade platforms.
Focus areas: Data Engineering & ETL · Python & SQL · Metadata systems & Knowledge Graphs · Cloud & DevOps (Docker, CI/CD, GCP) · Analytics & Visualization
ex:Experience
Built DRACO DataMaster—a FAIR-compliant, open-source research data infrastructure for a laser experiment. Migrated PostgreSQL to MongoDB with an ontology-driven schema; stored terabytes in CEPH/MinIO. Developed a knowledge graph with RDF/JSON-LD and Protégé (OWL) for semantic interoperability. Implemented ETL pipelines (Python, ThreadPoolExecutor) and InfluxDB + Grafana dashboards for real-time monitoring. Presented RDM solutions at OUTPUT2024 (TU Dresden).
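The ThreadPoolExecutor-based ETL fan-out mentioned above can be sketched as follows. The `extract_transform` step is a hypothetical stand-in for the real pipeline stage, which would read a raw measurement file from object storage and return a metadata-enriched document.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_transform(record_id: int) -> dict:
    """Hypothetical I/O-bound stage: fetch one raw record and enrich it."""
    return {"id": record_id, "status": "transformed"}


def run_pipeline(record_ids, max_workers: int = 8):
    """Fan out extract/transform tasks across a thread pool and collect results.

    Threads suit this workload because the stages are dominated by network
    and disk I/O (MinIO reads, MongoDB writes), not CPU.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_transform, rid): rid
                   for rid in record_ids}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```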
Architected a Cognitive RPA proof-of-concept: Flask-based portal where candidates submit forms and CVs and get instant role matching via Google Cloud. Document AI & Vertex AI NLP for extracting and normalizing skills/education (≈40% better parsing than typical ATS). BigQuery pipelines with TF-IDF and cosine similarity; Python Cloud Function ranks top-5 applicant–job matches. Cloud Composer (Airflow) orchestrated form → Document AI → Vertex AI → TF-IDF → BigQuery. Terraform, Docker, Cloud Run for IaC and CI/CD. Looker Studio dashboard for application volumes and fit-score distributions; Stackdriver for pipeline monitoring. 75% reduction in initial screening time.
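The TF-IDF and cosine-similarity ranking step can be illustrated with a small pure-Python sketch. The real pipeline computed this in BigQuery over Document AI output; the tokenization and example texts here are simplified stand-ins.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build simple TF-IDF vectors (term frequency x inverse document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_matches(candidate, jobs, k=5):
    """Rank job descriptions against one candidate profile, best first."""
    vecs = tfidf_vectors([candidate] + jobs)
    cand, job_vecs = vecs[0], vecs[1:]
    scored = sorted(enumerate(job_vecs),
                    key=lambda iv: cosine(cand, iv[1]), reverse=True)
    return [i for i, _ in scored[:k]]
```

In the production flow, a Cloud Function applied this ranking per applicant and wrote the top-5 matches back to BigQuery for the Looker Studio dashboard.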
Drafted and implemented Data Management Plans (DMPs) following FAIR and CARE principles. Evaluated RDM tools (RDMO, RADAR) and revamped bilingual RDM websites for improved accessibility. Supported legal and ethical data handling for scientific projects.
Worked with healthcare data, standardized documentation, and supported clinical research for Project MIRACUM. Deployed software in local environments and on the intranet; researched and implemented a FHIR server; implemented the OMOP data model for medical informatics students. Wrote Python ETL scripts with PostgreSQL for data transformation; collected, stored, and retrieved medical data for analysis and reporting.
Theoretical foundations of cloud operation and use; hands-on work with Google Cloud technologies and complex application examples.
Real-Time Analyst (Nov 2020 – Sep 2021): Workforce scheduling and planning with IEX NICE and Aspect WFM; real-time management, absenteeism tracking, PTO, and intraday SL monitoring; managed teams in remote locations. Fraud Prevention Representative (Jan – Oct 2020): Airbnb fraud prevention via customer chats and emails; fraud identification and resolution; met productivity, quality, and customer-experience targets.
Ensured mechanical execution complied with the contract; handled client follow-up; enforced safety standards; maintained the work breakdown structure; supervised the site.
End-to-end redesign of garment production workflow using Lean (Kanban), Six Sigma (DMAIC), and JIT. Dynamic line segregation by destination country; color-coded Kanban and batch labels; real-time dashboards (WIP, throughput, staging). Cutting-to-stitching handover ≤30 mins. ~45% defect reduction in 4 months; near-zero missing-garment incidents; ~30% improvement in on-time shipment. QA hub consolidation, barcode scanners, and shipment-ID staging.
Founded the Aeromodelling Club at GITAM University; organized a national-level ornithopter workshop (150+ participants from 3 states); served as president until 2019.
Process validation of a CNC machine as part of an internship project in the QA department.
ex:Projects
Three selected projects from my work at Otto-von-Guericke Universität Magdeburg: educational chatbots, slide recommendation systems, and team-formation tools.
OttoBot — Transformer-based Educational Chatbot
Nov 2023 – Feb 2024 · OVGU Magdeburg
OttoBot is a university guide at your fingertips: a Transformer-based educational chatbot for Otto-von-Guericke University that answers questions about OVGU policies and procedures using retrieval-augmented generation (RAG). We integrated LangChain with Llama-2, HuggingFace embeddings, and FAISS for semantic search, plus web crawlers and Unstructured URL Loader for ingesting live content. The system combines document loaders, character text splitting, vector stores, and a Streamlit UI—all with open-source tools—to deliver tailored, context-aware answers for students and staff.
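OttoBot's retrieval-augmented flow can be sketched without the full LangChain stack. Here a toy hashed bag-of-words embedding stands in for the HuggingFace sentence embeddings, and a linear scan stands in for the FAISS index; the chunk texts and prompt template are illustrative only.

```python
import hashlib
import math


def embed(text, dim=64):
    """Toy embedding: hash tokens into a fixed-size normalized vector.
    A real RAG stack would call a HuggingFace embedding model here."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (FAISS stand-in)."""
    q = embed(question)
    return sorted(chunks,
                  key=lambda c: -sum(a * b for a, b in zip(q, embed(c))))[:k]


def build_prompt(question, chunks):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n---\n".join(retrieve(question, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The assembled prompt is what gets passed to the Llama-2 model, keeping its answers grounded in the crawled OVGU pages rather than its pretraining data.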
Read paper (PDF)
OCR Strategy for Keyword Extraction & Slide Recommendation
Nov 2022 – Feb 2023 · OVGU Magdeburg
A recommendation subsystem for SQLValidator that delivers automatic instructional feedback during online exercise sessions. We used optical character recognition (Tesseract) to extract keywords from lecture slides and exercise sheets, then applied TF-IDF and cosine similarity to map SQL tasks to the most relevant course slides. When a student submits an incorrect solution, the system recommends specific lecture slides to review. The pipeline includes preprocessing (cropping, logo masking), NLTK stop-word removal, keyword dictionaries for German and English, and a 0.2 similarity threshold—achieving 72% precision for English slides.
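The thresholded recommendation step can be sketched as below. Keyword-set (Jaccard) overlap stands in for the paper's TF-IDF cosine score, and the slide IDs and keyword lists are invented for illustration; the 0.2 cutoff mirrors the threshold described above.

```python
def recommend_slides(task_keywords, slides, threshold=0.2):
    """Recommend slides whose keyword overlap with the failed SQL task
    clears the similarity threshold, most similar first.

    `slides` maps slide IDs to OCR-extracted keyword lists; Jaccard
    overlap stands in for the TF-IDF cosine score used in the paper.
    """
    task = set(task_keywords)
    recs = []
    for slide_id, kws in slides.items():
        kws = set(kws)
        sim = len(task & kws) / len(task | kws) if task | kws else 0.0
        if sim >= threshold:
            recs.append((slide_id, round(sim, 3)))
    return sorted(recs, key=lambda x: -x[1])
```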
Read paper (PDF)
Project Partner Recommendation System (Big Five)
May 2022 – Oct 2022 · IEEE WCCCT 2023
A team-formation recommendation system for university course projects that addresses the challenge of project breakdowns due to mismatched personalities and preferences. We used a Big Five personality questionnaire to elicit collaboration-relevant traits (neuroticism, agreeableness, conscientiousness, extraversion, openness), then combined collaborative filtering—grouping students with similar personality profiles—with utility-based recommendation so that team academic scores fall within a chosen threshold. The result is academically balanced teams better suited to productive collaboration. Co-authored with Chukwuka Victor Obionwu, Damanpreet Singh Walia, Taruna Tiwari, David Broneske, and Gunter Saake.
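The hybrid idea above can be sketched with a greedy pairing pass: match each student to the most personality-similar free partner whose grade falls within a utility threshold. The trait vectors, grades, and `max_score_gap` value are illustrative assumptions, not the paper's actual parameters.

```python
import math


def trait_distance(a, b):
    """Euclidean distance between two Big Five profiles (values in [0, 1])."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_students(students, max_score_gap=1.0):
    """Greedy pairing: collaborative-filtering-style trait similarity,
    constrained by a utility threshold on the academic-score gap.

    `students` maps name -> (big_five_vector, grade).
    """
    free = dict(students)
    teams = []
    while len(free) >= 2:
        name, (traits, grade) = next(iter(free.items()))
        del free[name]
        candidates = [(other, trait_distance(traits, t))
                      for other, (t, g) in free.items()
                      if abs(grade - g) <= max_score_gap]
        if not candidates:
            continue  # no compatible partner within the score threshold
        partner = min(candidates, key=lambda c: c[1])[0]
        del free[partner]
        teams.append((name, partner))
    return teams
```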
Read paper (PDF)
Other projects
Credit Card Fraud Detection (BQML, GCP, Google Data Studio) · Apr–Sep 2021
ex:Education
M.Sc. Digital Engineering
Otto-von-Guericke Universität Magdeburg
Apr 2021 – Mar 2025 · Magdeburg, Germany
Specializations: Databases, In-Memory Technology, System Architecture, DevOps. Master's thesis: DRACO DataMaster — FAIR-compliant research data infrastructure and knowledge graph (supervised by HZDR).
B.Tech Mechanical Engineering
GITAM University
Sep 2015 – Apr 2019 · Visakhapatnam, India
Specializations: Operations Management, Material Technology, Mechanics, Statistics.
ex:Thesis
DRACO DataMaster: A Metadata-Driven Approach Utilizing Ontologies and Knowledge Graphs for the Laser Particle Acceleration
Otto-von-Guericke Universität Magdeburg, February 2025. Supervised by Prof. Dr.-Ing. Bernhard Preim (OVGU) and Dr. Oliver Knodel (HZDR).
Abstract
DRACO (Dresden Laser Acceleration Source) is a state-of-the-art high-power ultra-short pulse laser experiment at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). This thesis develops a DRACO DataMaster extension for advanced data handling: an automated pipeline that builds knowledge graphs from unsorted tabular data, enriched with metadata via ontologies tailored for DRACO experiments. The approach aligns with FAIR principles (Findable, Accessible, Interoperable, Reusable), enabling deeper scientific insight through improved data integration, structuring, and visualization—and a robust toolset for data-driven research at HZDR.
Key contributions
- Knowledge graph–based exploration — Force-directed layout (Barnes-Hut, O(N log N)) to visualize relationships between experimental entities (devices, shots, measurements).
- Interactive visualization — Real-time filtering, parameterized measurement plots, and device-activity views for anomaly detection and validation.
- Ontology-driven data management — RDF/OWL ontology (Protégé, RDFLib); MongoDB and MinIO for structured storage; semantic consistency and efficient retrieval.
- UX & accessibility — Shneiderman’s mantra, colorblind-safe palettes (Paul Tol), and reproducible pipelines (Git, Docker).
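The ontology-driven approach can be made concrete with a toy JSON-LD record for one experimental entity. The `ex:` namespace, class, and property names here are hypothetical illustrations; the real DRACO ontology defines its own OWL classes and properties.

```python
import json

# One knowledge-graph node for a (hypothetical) laser shot, expressed as
# JSON-LD so it can be stored in MongoDB yet remain valid RDF.
doc = {
    "@context": {
        "ex": "https://example.org/draco#",
        "recordedBy": {"@id": "ex:recordedBy", "@type": "@id"},
    },
    "@id": "ex:shot-042",
    "@type": "ex:LaserShot",
    "ex:pulseEnergyJ": 28.5,        # literal property
    "recordedBy": "ex:camera-3",    # object property linking to a device node
}

serialized = json.dumps(doc, indent=2)
```

Because the document carries its own `@context`, the same record can be loaded into an RDFLib graph for SPARQL-style queries or served as-is from MongoDB.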
Research questions addressed
- RQ1: How can knowledge graphs enhance the interpretability of complex experimental datasets?
- RQ2: What role does interactive visualization play in anomaly detection and validation?
- RQ3: How can an ontology-based framework improve data integration, retrieval, and interoperability?
Tech stack
Python 3.9 · MongoDB · MinIO · Protégé (OWL) · RDFLib · Streamlit · PyVis (Vis.js) · Plotly · Pandas · NumPy · NetworkX · Docker
ex:Skills
Top skills
JavaScript · PHP · Data Warehouse Architecture · Python · SQL · ETL · Knowledge Graphs
Scientific SW & Data
RDF/OWL, FAIR/CARE principles, ontology engineering, semantic traceability, knowledge graphs, Protégé
Languages & Backend
Python (Pandas, NumPy), SQL, Bash · Flask, FastAPI · Git/GitLab CI, Docker, GCP, MongoDB, MinIO
Frontend & Visualization
React, PyVis (Vis.js), Streamlit · Looker Studio, Power BI, Grafana · Ant Design, Mantine UI, Recharts
Data & AI
Vertex AI, Document AI, PaddleOCR, LLM inference · Reproducible research, CI/CD, automated testing
Certifications
A Hands-on Introduction to Engineering Simulations · SQL · Introduction to programming with MATLAB · HTML Essential Training · Introduction to CSS
ex:References
… fulfilled the student assistance tasks to the fullest satisfaction. His behavior towards colleagues and research partners was always exemplary.
We thank Mr. Ghosh for his performance and wish him all the best for the future.
ex:Contact
Open to Data Engineer / Cloud Data Engineer / Analytics Engineer roles in Germany or EU (Hybrid or Remote). Work authorization available. Say hello or share an idea.
Magdeburg, Saxony-Anhalt, Germany