Data pipelines.
Written plainly.
By an engineer.

I'm Arnau Villoro, a data engineer in Barcelona. I write about Python, dbt, AWS — and the quiet habits that keep production boring. 8 years of notes, 69 posts, zero filler.

Solving the problem with small files in the Data Lake

This post addresses the common problem of small files in data lakes, which can lead to significant performance degradation and increased costs. It provides an in-depth guide on understanding the issues caused by small files, determining optimal file sizes, and effectively managing file sizes using tools like Apache Spark, AWS Athena, Delta Lake, and Apache Iceberg. The post also covers strategies for tracking file sizes and partitioning tables to optimize data processing and storage efficiency.

New post · One Environment to Rule Them All

Crafting solutions. Delivering efficiency.

Latest writing.

Honest notes from production · updated monthly

Latest · ⚙️ DE · 10 Jun 2026 · 03 Mins read

One Environment to Rule Them All

Do we really need dev, test, and sandbox? Explore the case for a single prod environment, how dbt handles it, how to run Python this way, and the pros and cons of going prod-only.

How to Be Nice to the Data Team

⚙️ DE

05 May 2026 · 09 Mins read

Marimo notebooks for Python projects

🛠️ Tools/Utils

15 Apr 2026 · 06 Mins read

Recovering files from S3 using Delete Markers

⚙️ DE

17 Mar 2026 · 05 Mins read

Protecting Production Tables in dbt

⚙️ DE

17 Feb 2026 · 04 Mins read

Scalable GitHub Actions for Modern Repos

☁️ Cloud/DevOps

14 Jan 2026 · 04 Mins read

Scaling ECS Python Deployments with a Modular Monorepo

☁️ Cloud/DevOps

09 Dec 2025 · 04 Mins read