Single-man data platform | Readings

Defining what matters when building a data platform is tricky. I think some aspects of how we profile data could be automated, and there are already data profiling tools in products like Power BI (Microsoft) and Datadog. Beyond that, I think that if we standardized how we write some of our models, we could automate the generation of certain metrics. Take revenue per month: if we set up a table with standardized column names, where we know which column holds the revenue and which columns are the dimensions, why couldn’t this metric be generated automatically?

Moreover, one of the articles in the list below argues that self-learning AI is still missing a crucial breakthrough: a way to automatically generate useful datasets. We focus on the tooling and architecture of data platforms, but I see this as the missing piece that would make us much more productive, both at work and in our daily activities. If what we do is automatically processed and fed into a data system that knows what to do with that data, we can enter a feedback loop. This is already happening with ads: there is no human in the loop deciding which ads Instagram shows you. Based on our browsing history, we are simply fed the ads we are most likely to watch and click on.
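To make the revenue-per-month idea more concrete, here is a minimal Python sketch, assuming a hypothetical convention where each metric table declares its measure column, its date column and its dimension columns. The `MetricSpec` class, the `orders` table and its column names are illustrative only, not an existing tool. From that declaration alone, the sketch computes monthly totals in memory and also generates the equivalent SQL query (assuming PostgreSQL’s `date_trunc`).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricSpec:
    # Hypothetical convention: the table declares which column is the
    # measure, which one is the event date, and which are dimensions.
    measure: str
    date_column: str
    dimensions: tuple[str, ...] = ()


def monthly_metric(rows, spec):
    """Sum spec.measure per (month, *dimensions) over a list of dict rows."""
    totals = defaultdict(float)
    for row in rows:
        d = row[spec.date_column]
        key = (f"{d.year:04d}-{d.month:02d}",) + tuple(row[c] for c in spec.dimensions)
        totals[key] += row[spec.measure]
    # Sort by month, then by dimension values, for stable output.
    return dict(sorted(totals.items()))


def metric_sql(table, spec):
    """Generate the equivalent SQL; assumes PostgreSQL's date_trunc."""
    columns = [f"date_trunc('month', {spec.date_column}) AS month"]
    columns += list(spec.dimensions)
    columns.append(f"SUM({spec.measure}) AS total_{spec.measure}")
    # Positional GROUP BY: the month plus every dimension column.
    group_by = ", ".join(str(i) for i in range(1, len(spec.dimensions) + 2))
    return f"SELECT {', '.join(columns)} FROM {table} GROUP BY {group_by}"


# Illustrative data only.
orders = [
    {"order_date": date(2023, 1, 5), "country": "PT", "revenue": 100.0},
    {"order_date": date(2023, 1, 20), "country": "PT", "revenue": 50.0},
    {"order_date": date(2023, 2, 1), "country": "ES", "revenue": 70.0},
]
spec = MetricSpec(measure="revenue", date_column="order_date", dimensions=("country",))
print(monthly_metric(orders, spec))
print(metric_sql("orders", spec))
```

The point of the design is that the metric is declared once and both the computation and the query are derived from it; a real platform would still need to handle things like currencies, time zones and late-arriving data.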

This is a very hard problem, and one that I think will keep data engineering relevant for years to come. The closest comparison I can think of is the battery industry: the world relies on batteries more and more and their potential is immense, but it is a very difficult problem that advances slowly. In data, those slow advances will eventually lead to the “single-man data platform” and to constant feedback loops on how we can improve ourselves and our work.

Readings of the week