Our Work

Projects That Run in Production

Every project listed here was delivered to a real client, runs on real infrastructure, and handles real data. No concept projects. No internal experiments.

CASE / 01

Automotive Data Platform for Swedish E-commerce

automotive_etl.py

5M+ Parts Records | 149 Brands | Real-Time Sync

CLIENT TYPE: Swedish Automotive E-commerce Platform
INDUSTRY: Automotive / E-commerce / Retail
SERVICES: Data Engineering, Web Scraping, ETL Pipeline
STACK: Python, Apache Airflow, PostgreSQL, Scrapy, PrestaShop, TecDoc, Docker

THE CHALLENGE

A Swedish automotive parts retailer needed to integrate over 5 million parts records from TecDoc and multiple supplier feeds into their PrestaShop storefront. Manual catalog management was causing product gaps, pricing errors, and inventory mismatches that were costing them sales. Existing processes involved overnight manual batch uploads that frequently failed without notification.

WHAT WE BUILT

— A full TecDoc integration pipeline extracting parts data, vehicle compatibility mappings, and brand hierarchies across 149 automotive brands
— Automated supplier feed ingestion from multiple European parts distributors with schema normalization and conflict resolution
— A data enrichment layer matching TecDoc article numbers to supplier stock and pricing data
— A custom PrestaShop PHP plugin for real-time catalog sync from the PostgreSQL warehouse
— Apache Airflow DAGs for scheduled pipeline execution with retry logic, failure alerting, and full task logging
— Data quality checks at every transformation stage catching null anomalies and schema drift before records reach the storefront
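
To illustrate the data quality checks listed above, here is a minimal sketch of a batch-level check for schema drift and null anomalies. It is not the client's production code: the column names in EXPECTED_SCHEMA, the QualityReport structure, and the 5 percent null threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical expected schema for a normalized parts record (assumption).
EXPECTED_SCHEMA = {"article_number": str, "brand": str, "price": float, "stock": int}


@dataclass
class QualityReport:
    schema_drift: list  # human-readable descriptions of drift
    null_rates: dict    # column -> fraction of null (or missing) values
    passed: bool


def check_batch(records, max_null_rate=0.05):
    """Check a batch of dict records for schema drift and null anomalies."""
    drift = []
    null_counts = {col: 0 for col in EXPECTED_SCHEMA}

    for i, rec in enumerate(records):
        missing = EXPECTED_SCHEMA.keys() - rec.keys()
        extra = rec.keys() - EXPECTED_SCHEMA.keys()
        if missing:
            drift.append(f"record {i}: missing columns {sorted(missing)}")
        if extra:
            drift.append(f"record {i}: unexpected columns {sorted(extra)}")
        for col, expected_type in EXPECTED_SCHEMA.items():
            value = rec.get(col)
            if value is None:
                # A missing column is counted as a null as well.
                null_counts[col] += 1
            elif not isinstance(value, expected_type):
                drift.append(
                    f"record {i}: {col} is {type(value).__name__}, "
                    f"expected {expected_type.__name__}"
                )

    total = max(len(records), 1)
    null_rates = {col: n / total for col, n in null_counts.items()}
    too_many_nulls = any(rate > max_null_rate for rate in null_rates.values())
    return QualityReport(drift, null_rates, passed=not drift and not too_many_nulls)
```

In a pipeline, a check like this would typically run as its own task between transformation stages, so that a failed report stops the run before any record reaches the storefront.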

THE OUTCOME

5 million+ parts records fully integrated and searchable by vehicle registration number
149 brands covered with complete vehicle compatibility data
Catalog update cycle reduced from overnight manual batches to sub-hourly automated runs
Zero silent failures: every pipeline error surfaces immediately via Slack alerting
Client team operates the system independently with runbooks and observability dashboards


CASE / 02

Medallion Architecture Pipeline for Healthcare Claims Data

medallion_dag.py

3 Layer Architecture | Full Observability | Apache Airflow Orchestrated

CLIENT TYPE: Healthcare Data Platform
INDUSTRY: Healthcare / Data Engineering
SERVICES: Data Engineering, Data Architecture, ETL Pipeline
STACK: Python, Apache Airflow, PostgreSQL, dbt, Docker

THE CHALLENGE

A healthcare data platform needed to process large volumes of CMS Medicare claims data across inpatient, outpatient, carrier, and prescription drug domains. The existing pipeline was a collection of unorchestrated scripts with no data quality enforcement, no observability, and no clear separation between raw and analytical data. Data engineers spent significant time debugging failures that were discovered only when downstream reports broke.

WHAT WE BUILT

— A three-layer medallion architecture with clear separation between raw ingestion, transformation, and analytical output
— Bronze layer: raw file ingestion with schema validation, audit logging, and idempotent load patterns
— Silver layer: standardized transformations, deduplication, business rule enforcement, and data type normalization
— Gold layer: clean, aggregated analytical tables structured for direct consumption by reporting and AI systems
— Apache Airflow DAGs with task-level logging, SLA monitoring, and failure alerting across all three layers
— dbt models for declarative, version-controlled transformations with automated testing
— Full data lineage tracking showing the path from raw source file to analytical table for every record
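
The idempotent load pattern mentioned for the bronze layer can be sketched as follows. This is an illustrative example only: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table names bronze_audit and bronze_claims, along with the content-hash approach, are assumptions rather than the delivered design.

```python
import hashlib
import sqlite3


def init_bronze(conn):
    """Create the (hypothetical) audit and raw-data tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bronze_audit ("
        "file_hash TEXT PRIMARY KEY, file_name TEXT, row_count INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bronze_claims (file_hash TEXT, raw_line TEXT)"
    )


def load_file(conn, file_name, content):
    """Load raw lines once per unique file content; return rows inserted."""
    file_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    already_loaded = conn.execute(
        "SELECT 1 FROM bronze_audit WHERE file_hash = ?", (file_hash,)
    ).fetchone()
    if already_loaded:
        return 0  # Re-running the task for the same file is a no-op.

    lines = [line for line in content.splitlines() if line.strip()]
    with conn:  # One transaction: data rows and audit row commit together.
        conn.executemany(
            "INSERT INTO bronze_claims (file_hash, raw_line) VALUES (?, ?)",
            [(file_hash, line) for line in lines],
        )
        conn.execute(
            "INSERT INTO bronze_audit VALUES (?, ?, ?)",
            (file_hash, file_name, len(lines)),
        )
    return len(lines)
```

Because the audit row and the data rows are written in the same transaction, a retried task either finds the file fully loaded or not loaded at all, which is what makes orchestrator retries safe.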

THE OUTCOME

Zero undetected pipeline failures since deployment: every error surfaces with context before downstream impact
Time the data engineering team spends debugging reduced by over 60 percent
New claim types added to the pipeline without restructuring existing DAGs
Complete data lineage from raw source to analytical layer for governance and audit requirements


CASE / 03

RAG Knowledge Pipeline over Legal Document Corpus

rag_retrieval.py

Semantic Search | Citation Grounded | 90%+ Retrieval Precision

CLIENT TYPE: Legal Technology Platform
INDUSTRY: Legal / AI / Knowledge Management
SERVICES: RAG Systems, AI Development, Data Engineering
STACK: Python, LangChain, OpenAI API, pgvector, PostgreSQL, FastAPI, Docker

THE CHALLENGE

A legal technology platform needed to enable semantic search and accurate question answering over a large corpus of legal documents. Users needed to query complex legal content in natural language and receive precise, cited answers without the hallucination risk of a general-purpose LLM. The document corpus was large, varied in format, and required careful chunking to preserve legal context across section boundaries.

WHAT WE BUILT

— A full document ingestion pipeline handling PDF, HTML, and structured text formats with layout-aware parsing
— A chunking strategy with overlap and section boundary preservation to maintain legal context
— Embedding generation using OpenAI text-embedding-3-large with batch processing for the full corpus
— pgvector integration in PostgreSQL for vector storage, cosine similarity search, and metadata filtering by document type and date range
— LangChain retrieval chain with custom prompt templates engineered for legal accuracy and mandatory citation formatting
— A ground truth evaluation framework testing retrieval precision and answer groundedness across hundreds of real queries
— FastAPI deployment with JWT authentication and rate limiting for production use
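
One way to picture the chunking strategy described above is a sliding word window that never crosses a section boundary. The sketch below is a simplified illustration, not the production chunker: the heading pattern ("Section N" or "Article N"), the word-based window, and the default sizes are all assumptions.

```python
import re

# Assumption: sections start with a line like "Section 12" or "Article 3".
SECTION_HEADING = re.compile(r"^(Section|Article)\s+\d+", re.MULTILINE)


def split_sections(text):
    """Split text at section headings, keeping each heading with its body."""
    starts = [m.start() for m in SECTION_HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # Keep any preamble before the first heading.
    bounds = starts + [len(text)]
    parts = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [p for p in parts if p]


def chunk_section(section, max_words=200, overlap=40):
    """Sliding word window inside one section; chunks never cross sections."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = section.split()
    heading = section.splitlines()[0]
    chunks, step = [], max_words - overlap
    for start in range(0, len(words), step):
        window = " ".join(words[start:start + max_words])
        # Prefix the heading on continuation chunks so context is not lost.
        chunks.append(window if start == 0 else f"{heading} (cont.) {window}")
        if start + max_words >= len(words):
            break
    return chunks


def chunk_document(text, max_words=200, overlap=40):
    """Chunk every section independently and return a flat list of chunks."""
    return [
        chunk
        for section in split_sections(text)
        for chunk in chunk_section(section, max_words, overlap)
    ]
```

Repeating the heading on continuation chunks is one simple way to keep each chunk attributable to its section when it is later retrieved and cited.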

THE OUTCOME

Retrieval precision above 90 percent on the ground truth evaluation set
Every answer includes citations linking directly to the source document and paragraph
Average query response time under two seconds including retrieval and generation
Fully modular architecture: swap embedding model, vector store, or LLM without rewriting the pipeline
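
For readers who want to see how a retrieval precision figure can be computed against a ground truth set, here is a minimal sketch. It assumes precision is measured as precision@k averaged over queries; the exact metric definition, the value of k, and the function names are our illustrative assumptions, not a description of the evaluation framework itself.

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)


def mean_precision(results, ground_truth, k=5):
    """Average precision@k over every query in the ground truth set.

    results:      query -> ranked list of retrieved document ids
    ground_truth: query -> set of relevant document ids
    """
    scores = [
        precision_at_k(results.get(query, []), relevant, k)
        for query, relevant in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

A query with no retrieved results scores 0.0 here, so gaps in retrieval lower the average rather than being silently skipped.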


CASE / 04

AI Automation Pipeline for Business Operations

automation_flow.json

80% Manual Work Eliminated | Self-Monitoring | Multi-Channel Output

CLIENT TYPE: Business Operations Platform
INDUSTRY: Operations / AI Automation
SERVICES: AI Automation, Agent Development, Workflow Engineering
STACK: n8n, Python, LangChain, OpenAI API, PostgreSQL, Slack API, Docker

THE CHALLENGE

A business operations team was spending multiple hours per day on a multi-step data handling process that involved fetching data from external sources, applying classification logic, writing results to a database, and notifying relevant stakeholders. The process was manual, error-prone, and delayed decisions because the data was always hours behind.

WHAT WE BUILT

— An end-to-end n8n workflow replacing the entire manual process with a fully autonomous pipeline
— Custom Python function nodes for data transformation and business logic that exceeded n8n's native capabilities
— LangChain integration for LLM-powered classification and decision steps within the workflow
— Webhook and scheduled polling triggers replacing manual process initiation
— PostgreSQL write-back with structured output and full audit trail of every automated decision
— Slack alerting for both successful completions and failures with full context for rapid resolution
— Error handling, retry logic, and circuit breaker patterns preventing cascade failures
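
The retry and circuit breaker patterns named above can be sketched in a few lines of plain Python. This is a generic illustration under our own assumptions (class names, thresholds, and exponential backoff delays are hypothetical), not the workflow's actual implementation.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # Cool-down in seconds.
        self.clock = clock              # Injectable for testing.
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # Open (or re-open) the circuit.
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retry(breaker, func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry with exponential backoff, but stop at once if the breaker is open."""
    for attempt in range(1, attempts + 1):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # Do not hammer a dependency the breaker has isolated.
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Retries handle short-lived errors, while the breaker stops a persistently failing dependency from being called at all, which is how one failing step is kept from dragging down the steps after it.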

THE OUTCOME

Manual processing time reduced by over 80 percent
Data freshness improved from hours to minutes
Zero missed processing cycles since deployment
The same automation pattern is now reused across three additional workflows for the same client


RESPONSE TIME < 24H

Ready to build something that actually works in production?

Tell us about your data challenge. We will respond within 24 hours with a clear assessment and a practical plan.

Start a Conversation