Lars dove into data pipelines and emerged bearing arrows and wishing for far fewer copies.

What is there to think about regarding data pipelines, and what makes them interesting?

Which tools are out there, and why might you want to use them?

Why all this talk about making fewer copies of data?

What does Lars' current ideal pipeline look like, and where does Elixir fit in?

Links
Quotes
  • I've been reading a lot about data pipelines
  • What's so special about data pipelines?
  • There's a lot of special tooling
  • There's a lot of bad, bad tooling
  • Less than optimal tooling
  • Converging on something bigger
  • He got me eventually
  • All of your steps in one bucket
  • What tools do you associate with data?
  • I inherited a data pipeline
  • BashReduce
  • Iterate on the L and the T
  • The modern data stack
  • And then you demand more work
  • No unnecessary copies
  • Barely a copy
  • Reconnecting with my Python roots