Reading Feed

Articles I've read with my notes and highlights

If you thought the speed of writing code was your problem - you have bigger problems by Michiel Scholten
AI CoE Tech Stack Squad - Agile Board
  • S3 Files works best when you need interactive, shared access to data that lives in Amazon S3 through a high performance file system interface. It’s ideal for workloads where multiple compute resources—whether production applications, agentic AI agents using Python libraries and CLI tools, or machine learning (ML) training pipelines—need to read, write, and mutate data collaboratively. You get shared access across compute clusters without data duplication, sub-millisecond latency, and automatic synchronization with your S3 bucket.
The Markdown File That Beat a $50M Vector Database by Micheal Lanham
  • Derived retrieval layers. When scale demands semantic search, you build an index over the files. OpenClaw does this with SQLite and sqlite-vec. The files remain the source of truth. The index is a search optimization.
  • Start with a Markdown file. You can always add a database later.
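A minimal sketch of the "files are the source of truth, the index is derived" idea from the highlights above. It builds a throwaway keyword index in SQLite from a folder of Markdown files. This is an assumption-laden illustration, not OpenClaw's implementation: it swaps semantic search with sqlite-vec for plain keyword lookup using only the standard library, and the names `build_index`, `search`, and `demo` and the `terms` table are hypothetical.

```python
import re
import sqlite3
import tempfile
from pathlib import Path


def build_index(notes_dir: Path) -> sqlite3.Connection:
    """Rebuild a disposable keyword index from the Markdown files on disk."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (term TEXT, path TEXT)")
    for md_file in sorted(notes_dir.glob("*.md")):
        text = md_file.read_text(encoding="utf-8").lower()
        words = set(re.findall(r"[a-z0-9]+", text))
        conn.executemany(
            "INSERT INTO terms (term, path) VALUES (?, ?)",
            [(word, md_file.name) for word in words],
        )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> list[str]:
    """Return the names of the files that contain the given word."""
    rows = conn.execute(
        "SELECT DISTINCT path FROM terms WHERE term = ? ORDER BY path",
        (term.lower(),),
    )
    return [path for (path,) in rows]


def demo() -> list[str]:
    """Write two sample notes, index them, and look up one word."""
    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp)
        (notes / "parquet.md").write_text("# Parquet\nVariant shredding notes\n", encoding="utf-8")
        (notes / "git.md").write_text("# Git\nReflog and bisect notes\n", encoding="utf-8")
        conn = build_index(notes)
        try:
            return search(conn, "bisect")
        finally:
            conn.close()
```

Because the index lives in memory and is rebuilt from the files, deleting it loses nothing. The Markdown stays authoritative.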
How we build evals for Deep Agents by LangChain Accounts
  • More evals ≠ better agents. Instead, build targeted evals that reflect desired behaviors in production.
Beyond the dashboard: how BlaBlaCar PMs use AI to self-serve data by Dorothée Clerc
  • The Barrier: It lacked our internal map. Users were forced to replace generic table placeholders with real column names manually; a tedious process that was highly prone to human error. To be able to scale this, the AI needed to know our specific architecture, not just general SQL syntax.
Context Anchoring
Breaking the Microbatch Barrier: The Architecture of Apache Spark Real-Time Mode
  • Microbatch mode processes batches of data called epochs. Epoch boundaries are decided upfront using start and end offsets. Real-time mode instead processes longer duration epochs but modifies how data flows within each epoch.
  • We essentially evolved the micro-batch in Structured Streaming into a checkpoint interval.
Still Missing Critical Pieces by Julien Simon
Building an MCP Ecosystem at Pinterest by Pinterest Engineering
  • Contrast with the MCP OAuth Standard: The MCP specification defines an OAuth 2.0 authorization flow where users explicitly authenticate with each MCP server, typically involving consent screens and per-server token management. Our approach is different: users already authenticate against our internal auth stack when they open a surface like the AI chat interface, so we piggyback on that existing session. There is no additional login prompt or consent dialog when a user invokes an MCP tool.
Cognitive Helmets for the AI Bicycle Part 2: The Sometimes-Wrong Bot by Cat Hicks
  • developers in my interviews have pointed out that in their first months using Claude Code in a more “raw” way, counting on themselves to manage and monitor every output for sustained hours, they have felt a creeping sense of fatigue. One person called this “over-monitoring,” and multiple people have used the metaphor of “becoming a manager
  • If our goal is to help our junior colleagues integrate into organizational goals to use this tooling, we also need to listen to them about their challenges and friction points, and believe in their potential for learning.
Cognitive Helmets for the AI Bicycle: Part 1 by Cat Hicks
  • Avoid the temptation to spin up so many parallel tasks that you are in constant “cram.”
  • Another metacognitive strategy is something with the unglamorous name of pretesting. But it’s actually a fascinating window of insight into that “functional architecture” of our problem-solving minds. Simply put, if we prompt ourselves to try to generate an answer for something we don’t know before we go try to learn it, we learn better.
Your Data Agents Need Context
  • What agentic coding tools such as Claude Code are doing is making data engineers vastly more productive
Beyond Hypermodern: Python is easy now
  • Or do it dynamically:

      [project]
      name = "postmodern"
      dynamic = ["version"]
      …

      [tool.hatch.version]
      source = "vcs"

Rethinking open source mentorship in the AI era by Abigail Cabunoc Mayes
  • C: Implementation
      Comprehension: Require issue before pull request; host an in-person code sprint for live discussions
      Context: Add AI disclosure or AGENTS.md
      Continuity: Watch who comes back
  • AI tools are here to stay. The question is whether we adapt our practices to maintain what makes open source work: human relationships, knowledge transfer, and the multiplier effect.
OpenClaw and the Dream of Free Labour by The Daemon
Variant Type in Apache Parquet for Semi-Structured Data
  • Variant type—a feature that brings native support for semi-structured data to Parquet, significantly improving efficiency compared to less efficient formats such as JSON
  • Traditional approaches that store JSON as text strings require full parsing to access any field, making queries slow and resource-intensive. Variant solves this by storing data in a structured binary format that enables direct field access through offset-based navigation. Query engines can jump directly to nested fields without deserializing the entire document, dramatically improving performance.
  • Binary encodings like BSON improve upon plain JSON by storing data in binary format, but they still redundantly store field names like “timestamp”, “user”, and “event” in every row, wasting storage space
  • Variant data can be shredded by extracting frequently accessed fields into separate, strongly-typed columns
  • If the field matches the expected schema, its value is written to the strongly typed field. If the field does not match, the original representation is written as a Variant-encoded binary field and the corresponding strongly typed field is left NULL.
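The shredding rule in the last highlight can be sketched for a single field as follows. This is an illustration under stated assumptions, not the Parquet encoding: JSON bytes stand in for the real Variant binary format, the field is assumed to be present in every row, and `shred_field` is a hypothetical helper name.

```python
import json
from typing import Any, Optional, Tuple


def shred_field(record: dict, field: str, expected_type: type) -> Tuple[Optional[Any], Optional[bytes]]:
    """Return (typed_value, variant_fallback) for one field of one row."""
    value = record[field]
    # bool is a subclass of int in Python, so reject it explicitly for int columns.
    is_bool_in_int_column = expected_type is int and isinstance(value, bool)
    if isinstance(value, expected_type) and not is_bool_in_int_column:
        # Matches the expected schema: goes to the strongly typed column.
        return value, None
    # Does not match: keep the original representation, leave the typed column NULL.
    # JSON bytes are a stand-in for the Variant binary encoding (assumption).
    return None, json.dumps(value).encode("utf-8")


rows = [
    {"user": "ana", "timestamp": 1700000000},
    {"user": "bo", "timestamp": "2023-11-14T22:13:20Z"},
]
shredded = [shred_field(row, "timestamp", int) for row in rows]
```

The first row lands in the typed column and the second falls back to the encoded form, so a reader that only needs integer timestamps can skip decoding for rows like the first.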
The Art of Learning in the AI Age by Jose Blanca
  • Exercises are opportunities to practice. It is through this practice that you develop your problem-solving skills. You will be tempted to let the AI write the code for you, but if you want to grow, you must resist that urge. If your objective is learning, do not use AI to write code you don’t understand—unless you intend to study that code until you do.
  • You won’t learn German or Chinese just by reading a grammar book or a dictionary. Similarly, you won’t become a good programmer just by reading about syntax. At the start of your journey prog
The Reviewer Isn’t the Bottleneck by Rishi Baldawa
  • Whether you can systematically extract what a good reviewer knows and run it at CI speed, I genuinely don’t know. Every check you write is one less thing a human has to catch. But reviewers don’t just catch bugs
  • They catch drift, intent mismatches, architectural decisions that look fine locally and cause problems three services away
ETL is Dead by Ananth Packkildurai
The Ivy Lee Method: Focus Better with This 100-Year-Old Strategy
Porto (OPO) Airport Delays
Making Retrospectives Effective with Small Concrete Actions and Rotating Facilitators by Ben Linders
  • He also encouraged rotating retrospective facilitators. It is challenging to fairly represent one’s own ideas and opinions while facilitating the retrospective, as every person brings their own unique perspective, Žabkar Nordberg said. Having different people who facilitate retrospectives helps build ownership and engagement
Using Git with coding agents by Simon Willison
  • Git has a mechanism called the reflog which can often capture details of code that hasn’t been committed to a permanent branch. Agents can search that, and search other branches too.
  • When you run a bisect operation you provide Git with some kind of test condition and a start and ending commit range. Git then runs a binary search to identify the earliest commit for which your test condition fails.
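The binary search behind a bisect can be sketched in a few lines. This is a simplified model, not Git's implementation: it assumes a linear history listed oldest first, a first commit that passes, a last commit that fails, and a test that keeps failing once it starts failing. The commit ids and the `first_bad_commit` and `passes` names are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], passes: Callable[[str], bool]) -> str:
    """Binary-search commits (oldest first) for the earliest one whose test fails."""
    good, bad = 0, len(commits) - 1  # known-good and known-bad ends of the range
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(commits[mid]):
            good = mid  # failure was introduced after mid
        else:
            bad = mid  # failure was introduced at or before mid
    return commits[bad]


history = ["a1", "b2", "c3", "d4", "e5", "f6"]  # made-up commit ids
broken_since = "d4"
checked = []  # records which commits the search actually tested


def passes(commit: str) -> bool:
    """Stand-in test condition: commits before broken_since pass."""
    checked.append(commit)
    return history.index(commit) < history.index(broken_since)


culprit = first_bad_commit(history, passes)
```

With six commits the search tests only two of them, which is the point of bisecting: the number of test runs grows with the logarithm of the range, not its length.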