basix | data, software and the universe

The Annoying Truth About ‘Serverless’ Data

Serverless mostly means ‘someone else runs the servers’. You still pay.

Lambda Architecture Without the Trauma

Hybrid batch+streaming can work—if you pick a single source of truth and stop duplicating business logic.
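One reading of "stop duplicating business logic": keep the rule in a single pure function and call it from both the batch path and the streaming path. This is a minimal sketch. The `Order` record and the `classify_order` rule (including its threshold) are hypothetical. The point is the shape, not the rule.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Order:
    # Hypothetical event shape, for illustration only.
    order_id: str
    amount_cents: int


def classify_order(order: Order) -> str:
    # The business rule lives here and nowhere else (made-up threshold).
    return "large" if order.amount_cents >= 100_000 else "standard"


def batch_job(orders: Iterable[Order]) -> Dict[str, str]:
    # Batch path: recompute everything from the single source of truth.
    return {o.order_id: classify_order(o) for o in orders}


def stream_handler(order: Order, state: Dict[str, str]) -> None:
    # Streaming path: apply the same function to one event as it arrives.
    state[order.order_id] = classify_order(order)


# Both paths agree because they share one implementation of the rule.
history = [Order("a", 250_000), Order("b", 1_500)]
stream_view: Dict[str, str] = {}
for event in history:
    stream_handler(event, stream_view)
assert batch_job(history) == stream_view
```

If the two paths ever disagree, the difference now points at the input data rather than at two copies of the rule drifting apart.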

Vector Search Pipelines: Embeddings Are Data Engineering Too

Embeddings drift; treat them like any other dataset.
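Treating embeddings "like any other dataset" can start with a drift check. Keep a reference snapshot from when the index was built, then compare each fresh batch against it. This sketch uses centroid cosine distance, a deliberately crude signal chosen for brevity. It is not necessarily what the post does, and it will miss changes that leave the mean direction intact. The function names and the 0.05 threshold are hypothetical, and the code assumes non-empty batches with non-zero centroids.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    # Element-wise mean of a non-empty batch of equal-length embeddings.
    dim = len(vectors[0])
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid_drift(reference: Sequence[Vector], current: Sequence[Vector]) -> float:
    # 0.0 = centroids point the same way; up to 2.0 = opposite directions.
    return 1.0 - cosine_similarity(centroid(reference), centroid(current))


def has_drifted(reference: Sequence[Vector], current: Sequence[Vector],
                threshold: float = 0.05) -> bool:
    # Hypothetical threshold; calibrate it per model and per dataset.
    return centroid_drift(reference, current) > threshold
```

In practice you would run this on a schedule and alert the same way you would for a null-rate spike in a regular table.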

Feature Stores: Centralize Reuse, Decentralize Blame

A feature store is a contract system with extra steps.
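Taking "contract system" literally: each feature declares an owner, a type, and a freshness limit, and a check reports violations tagged with the owner. That tag is what "decentralize blame" looks like in code. `FeatureContract`, `check_feature`, and their fields are hypothetical. This is not the API of any particular feature store product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional


@dataclass(frozen=True)
class FeatureContract:
    name: str
    owner: str               # who hears about it when the contract breaks
    dtype: type
    max_staleness: timedelta
    nullable: bool = False


def check_feature(
    contract: FeatureContract,
    value: Any,
    computed_at: datetime,
    now: Optional[datetime] = None,
) -> List[str]:
    # Returns human-readable violations, each tagged with the owning team.
    now = now or datetime.now(timezone.utc)
    problems: List[str] = []
    if value is None:
        if not contract.nullable:
            problems.append("null value")
    elif not isinstance(value, contract.dtype):
        problems.append(
            f"expected {contract.dtype.__name__}, got {type(value).__name__}"
        )
    age = now - computed_at
    if age > contract.max_staleness:
        problems.append(f"stale: {age} old, limit {contract.max_staleness}")
    return [f"{contract.name} [{contract.owner}]: {p}" for p in problems]
```

Consumers reuse the centrally registered feature, and the violation message routes to the producing team.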

Observability: Trace IDs for Data Pipelines (Yes, It Works)

Correlate events across ingest → transform → serve. Debugging gets boring.
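This sketch shows the core idea: mint a trace ID once at ingest, carry it in the record envelope, and log it at every stage. Grepping for one ID then reconstructs a record's whole path. The `Record` envelope, the stage functions, and the `amount_cents` field are hypothetical. A real deployment would more likely emit these as structured log fields or tracing spans than as plain log lines.

```python
import logging
import uuid
from dataclasses import dataclass, field, replace
from typing import Any, Dict

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")


@dataclass(frozen=True)
class Record:
    payload: Dict[str, Any]
    # Minted once when the record is created at the pipeline edge.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def emit(stage: str, rec: Record) -> None:
    # Every stage logs the same trace_id, so one search finds the whole path.
    log.info("trace_id=%s stage=%s", rec.trace_id, stage)


def ingest(raw: Dict[str, Any]) -> Record:
    rec = Record(payload=raw)
    emit("ingest", rec)
    return rec


def transform(rec: Record) -> Record:
    # dataclasses.replace copies trace_id unchanged into the new record.
    out = replace(rec, payload={**rec.payload,
                                "amount_eur": rec.payload["amount_cents"] / 100})
    emit("transform", out)
    return out


def serve(rec: Record) -> Dict[str, Any]:
    emit("serve", rec)
    # Expose the ID to consumers so their bug reports can quote it.
    return {**rec.payload, "trace_id": rec.trace_id}
```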

Serving Layers: Materialized Views, Caches, and the Myth of ‘Realtime’

Realtime is a budget decision.

Metadata-Driven Pipelines: Dynamic Doesn’t Mean Uncontrolled

Drive config from metadata, but validate like a paranoid adult.
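"Validate like a paranoid adult" can be as simple as rejecting unknown keys, checking types and allowed values, and failing the whole run before any pipeline executes. This is a minimal sketch. The metadata schema (`source_table`, `mode`, `watermark_column`) and the cross-field rules are hypothetical examples, not a specific framework's format.

```python
from typing import Any, Dict, List

# Hypothetical metadata schema for one pipeline entry.
REQUIRED: Dict[str, type] = {
    "name": str,
    "source_table": str,
    "target_table": str,
    "mode": str,
}
OPTIONAL: Dict[str, type] = {"watermark_column": str}
ALLOWED_MODES = {"full", "incremental"}


def validate_pipeline_config(cfg: Dict[str, Any]) -> List[str]:
    errors: List[str] = []
    # Unknown keys are usually typos; reject them instead of ignoring them.
    unknown = set(cfg) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        errors.append(f"unknown keys: {sorted(unknown)}")
    for key, expected in {**REQUIRED, **OPTIONAL}.items():
        if key not in cfg:
            if key in REQUIRED:
                errors.append(f"missing key: {key}")
        elif not isinstance(cfg[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    mode = cfg.get("mode")
    if isinstance(mode, str) and mode not in ALLOWED_MODES:
        errors.append(f"mode must be one of {sorted(ALLOWED_MODES)}")
    # Cross-field rules: checks a type system alone would not catch.
    if mode == "incremental" and "watermark_column" not in cfg:
        errors.append("incremental mode requires watermark_column")
    if "source_table" in cfg and cfg.get("source_table") == cfg.get("target_table"):
        errors.append("source_table and target_table must differ")
    return errors


def load_validated(configs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Fail closed: one bad entry stops the run before anything executes.
    problems: Dict[str, List[str]] = {}
    for i, cfg in enumerate(configs):
        errs = validate_pipeline_config(cfg)
        if errs:
            problems[str(cfg.get("name", f"#{i}"))] = errs
    if problems:
        raise ValueError(f"invalid pipeline metadata: {problems}")
    return configs
```

Collecting every error before raising means one config fix cycle, not one per typo.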

Bronze Table Quality Gates: Yes, Even Bronze

If you ingest garbage, you’ll analyze garbage. That’s not ‘agile’.
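Even a bronze gate can stay cheap. Check a few structural rules per row, quarantine failures with reasons instead of dropping them, and fail the batch when the reject rate crosses a limit. In this sketch the required columns, the duplicate-ID rule, and the 1% limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical bronze contract: these columns must be present and non-empty.
REQUIRED_COLUMNS = ("event_id", "event_ts")
MAX_REJECT_RATE = 0.01  # hypothetical: fail the batch above 1% bad rows


@dataclass
class GateResult:
    accepted: List[Row] = field(default_factory=list)
    quarantined: List[Row] = field(default_factory=list)

    @property
    def reject_rate(self) -> float:
        total = len(self.accepted) + len(self.quarantined)
        return len(self.quarantined) / total if total else 0.0


def bronze_gate(rows: Iterable[Row],
                max_reject_rate: float = MAX_REJECT_RATE) -> GateResult:
    result = GateResult()
    seen_ids = set()
    for row in rows:
        problems = [f"missing {c}" for c in REQUIRED_COLUMNS
                    if row.get(c) in (None, "")]
        event_id = row.get("event_id")
        if event_id is not None:
            if event_id in seen_ids:
                problems.append("duplicate event_id")
            seen_ids.add(event_id)
        if problems:
            # Keep bad rows with their reasons instead of dropping them silently.
            result.quarantined.append({**row, "_problems": problems})
        else:
            result.accepted.append(row)
    if result.reject_rate > max_reject_rate:
        raise ValueError(
            f"bronze gate failed: {result.reject_rate:.1%} of rows rejected")
    return result
```

A quarantine table keeps the raw evidence, so "bronze is raw" still holds while garbage stops flowing downstream unnoticed.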

Kubernetes for Data Jobs: The Part Where YAML Becomes a Lifestyle

It’s great until you run 5000 pods and discover quotas.
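The "discover quotas" moment is mostly arithmetic. A namespace ResourceQuota can cap the pod count as well as the summed CPU and memory requests, so the number of pods that can run at once is the minimum over those limits. The `Quota` and `PodRequest` classes below are plain Python stand-ins that loosely mirror the `pods`, `requests.cpu`, and `requests.memory` quota fields. The numbers in the usage line are made up, and the sketch ignores node capacity, scheduling overhead, and any other quota fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    # Stand-in for a namespace quota; CPU in millicores, memory in MiB.
    pods: int
    cpu_millicores: int
    memory_mib: int


@dataclass(frozen=True)
class PodRequest:
    cpu_millicores: int
    memory_mib: int


def max_concurrent_pods(quota: Quota, req: PodRequest) -> int:
    # The tightest limit wins: pod count, total CPU, or total memory.
    return min(
        quota.pods,
        quota.cpu_millicores // req.cpu_millicores,
        quota.memory_mib // req.memory_mib,
    )


def waves_needed(total_pods: int, quota: Quota, req: PodRequest) -> int:
    per_wave = max_concurrent_pods(quota, req)
    if per_wave == 0:
        raise ValueError("a single pod does not fit in the quota")
    return -(-total_pods // per_wave)  # ceiling division


q = Quota(pods=1000, cpu_millicores=200_000, memory_mib=819_200)
r = PodRequest(cpu_millicores=500, memory_mib=2048)
print(max_concurrent_pods(q, r), waves_needed(5000, q, r))  # prints: 400 13
```

With these made-up numbers, a 1,000-pod quota is not the binding limit; CPU and memory requests cap the run at 400 concurrent pods.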

Change Data Capture on Azure: Event Hubs, Debezium, and Reality

Azure can do CDC fine—if you respect throughput units and partition keys.
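Both constraints can be sanity-checked before deploying. For sizing, this sketch assumes Standard-tier ingress of 1 MB/s or 1,000 events/s per throughput unit, whichever limit is hit first; check current Azure documentation for your tier. For keys, events sharing a partition key land on the same partition, which preserves per-key order (for example, all changes to one row) but can create a hot partition. `partition_for` uses an MD5-based stand-in hash purely for illustration. It is **not** the hash Event Hubs or a Kafka client actually uses.

```python
import hashlib
import math
from collections import Counter
from typing import Sequence

# Assumed Standard-tier ingress limits per throughput unit; verify for your tier.
TU_INGRESS_MB_PER_S = 1.0
TU_INGRESS_EVENTS_PER_S = 1000


def required_throughput_units(mb_per_s: float, events_per_s: float) -> int:
    # Whichever limit is hit first decides the TU count (minimum 1).
    by_bytes = math.ceil(mb_per_s / TU_INGRESS_MB_PER_S)
    by_events = math.ceil(events_per_s / TU_INGRESS_EVENTS_PER_S)
    return max(1, by_bytes, by_events)


def partition_for(key: str, partitions: int) -> int:
    # Stand-in hash for illustration only; NOT the service's real partitioner.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partitions


def hottest_partition_share(keys: Sequence[str], partitions: int) -> float:
    # Fraction of events that land on the busiest partition.
    if not keys:
        return 0.0
    counts = Counter(partition_for(k, partitions) for k in keys)
    return max(counts.values()) / len(keys)
```

A share near 1.0 means one key (say, a single busy tenant or table) is serializing the whole stream onto one partition, however many partitions you provisioned.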