Reading Feed

Articles I've read with my notes and highlights

Google DeepMind Introduces ATLAS Scaling Laws for Multilingual Language Models by Robert Krzaczyński
  • Google DeepMind researchers have introduced ATLAS, a set of scaling laws for multilingual language models that formalize how model size, training data volume, and language mixtures interact as the number of supported languages increases.
  • Results show that fine-tuning is more compute-efficient at lower token budgets, while pre-training becomes advantageous once training data and compute exceed a language-dependent threshold. For 2B-parameter models, this crossover typically occurs between about 144B and 283B tokens, providing a practical guideline for selecting an approach based on available resources.
  • Rather than training one enormous model on redundant data from every language, how large would a dedicated translation model need to be, and how much smaller could the base model then be?
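  • A minimal sketch of that crossover guideline as a decision rule, assuming the ~144B-283B token band the summary reports for 2B-parameter models; the function name and the "either" middle case are illustrative, not from the paper:

```python
def choose_training_approach(token_budget: float,
                             crossover_low: float = 144e9,
                             crossover_high: float = 283e9) -> str:
    """Pick an approach from the available token budget (hypothetical helper).

    ATLAS guideline for 2B-parameter models: fine-tuning is more
    compute-efficient below the language-dependent crossover band
    (~144B-283B tokens); pre-training becomes advantageous above it.
    """
    if token_budget < crossover_low:
        return "fine-tune"   # below the band: fine-tuning is cheaper
    if token_budget > crossover_high:
        return "pre-train"   # above the band: pre-training pays off
    return "either"          # inside the band: depends on the language mix


print(choose_training_approach(50e9))   # fine-tune
print(choose_training_approach(400e9))  # pre-train
```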
Google Introduces TranslateGemma Open Models for Multilingual Translation by Daniel Dominguez
Why DuckDB is my first choice for data processing
How to parametrize exception testing in PyTest? by Kacper Borucki
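  • A common way to do this (not necessarily the article's exact approach): parametrize the expected exception alongside the input, using contextlib.nullcontext for cases that should not raise. parse_port here is a made-up function under test:

```python
from contextlib import nullcontext

import pytest


def parse_port(value: str) -> int:
    """Toy function under test: parse and validate a TCP port."""
    port = int(value)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


@pytest.mark.parametrize(
    ("value", "expectation"),
    [
        ("8080", nullcontext()),               # valid input: nothing raised
        ("0", pytest.raises(ValueError)),      # out of range
        ("70000", pytest.raises(ValueError)),  # out of range
        ("abc", pytest.raises(ValueError)),    # not a number at all
    ],
)
def test_parse_port(value, expectation):
    # One parametrized test covers the happy path and every failure mode.
    with expectation:
        parse_port(value)
```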
Tips for getting coding agents to write good Python tests by Simon Willison
Why Stoicism is one of the best mind-hacks ever devised by Lary Wallace
  • Only by envisioning the bad can we truly appreciate the good; gratitude does not arrive when we take things for granted. It’s precisely this gratitude that leaves us content to cede control of what the world has already removed from our control anyway.
AI’s trillion-dollar opportunity: Context graphs by Ashu Garg, Jaya Gupta
  • We call the accumulated structure formed by those traces a context graph: not “the model’s chain-of-thought,” but a living record of decision traces stitched across entities and time so precedent becomes searchable. Over time, that context graph becomes the real source of truth for autonomy – because it explains not just what happened, but why it was allowed to happen.
  • Once you have decision records, the “why” becomes first-class data. Over time, these records naturally form a context graph: the entities the business already cares about (accounts, renewals, tickets, incidents, policies, approvers, agent runs) connected by decision events (the moments that matter) and “why” links. Companies can now audit and debug autonomy and turn exceptions into precedent instead of re-learning the same edge case in Slack every quarter.
  • The orchestration layer sees the full picture: what inputs were gathered, what policies applied, what exceptions were granted, and why. Because it’s executing the workflow, it can capture that context at decision time – not after the fact via ETL, but in the moment, as a first-class record. That’s the context graph, and that will be the single most valuable asset for companies in the era of AI.
  • High headcount. If a company has 50 people doing a workflow manually (routing tickets, triaging requests, or reconciling data between systems), that’s a signal. The labor exists because the decision logic is too complex to automate with traditional tooling.
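  • A rough sketch of how decision records and the graph over them could be modeled; the article describes the concept, and all names and fields below are my own guesses, not its schema:

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """One decision event, captured by the orchestrator at decision time."""
    entity: str             # e.g. "renewal:acme", "ticket:4812"
    action: str             # what was decided
    inputs: dict            # what was gathered before deciding
    policies: list[str]     # which policies applied
    exception: str | None   # any exception granted
    why: str                # the rationale, stored as first-class data


@dataclass
class ContextGraph:
    """Entities connected by decision events and their 'why' links."""
    records: list[DecisionRecord] = field(default_factory=list)

    def record(self, rec: DecisionRecord) -> None:
        # Captured in the moment the workflow executes, not via later ETL.
        self.records.append(rec)

    def precedent(self, entity: str) -> list[DecisionRecord]:
        """Precedent becomes searchable: past decisions touching an entity."""
        return [r for r in self.records if r.entity == entity]


graph = ContextGraph()
graph.record(DecisionRecord(
    entity="renewal:acme",
    action="approved a 10% discount",
    inputs={"arr": 120_000, "churn_risk": "medium"},
    policies=["discounts over 15% need VP approval"],
    exception=None,
    why="retention priority for at-risk accounts",
))
print(graph.precedent("renewal:acme"))
```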
Memory: How Agents Learn
  • Here’s the dirty secret: when building agents with the API, we’ve made them capable, but we haven’t yet figured out how to make them learn.
  • Pattern 1: Session Memory. Store messages in a database, retrieve them before every response, add them to the context. Agno gives you this out of the box — just give your agent a database.
  • Pattern 2: User Memory. Remember facts about the user across sessions. The MemoryManager extracts preferences automatically and stores them in the database.
  • Pattern 3: Learned Memory. Now let’s add learned memory: insights that apply beyond just one user. The key is a custom tool that saves learnings to a knowledge base.
  • The quality of your knowledge base determines the quality of learning. Garbage in, garbage out. The solution: the agent proposes learnings, but only saves with explicit user approval.
  • A learning is worth saving if it’s:
    - Specific: “Tech P/E ratios typically range 20-35x”, not “P/E varies”
    - Actionable: can be applied to future queries
    - Generalizable: useful beyond this one conversation
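  • The three patterns compressed into one framework-agnostic sketch; Agno's real API looks different, and every name below is hypothetical. Note the explicit approval gate on learned memory from the last two bullets:

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy illustration of the three memory patterns (not Agno's API)."""
    sessions: dict = field(default_factory=dict)    # Pattern 1: per-session history
    user_facts: dict = field(default_factory=dict)  # Pattern 2: facts per user
    learnings: list = field(default_factory=list)   # Pattern 3: cross-user insights

    # Pattern 1: store every message; retrieve before each response.
    def add_message(self, session_id: str, message: str) -> None:
        self.sessions.setdefault(session_id, []).append(message)

    # Pattern 2: remember facts about a user across sessions.
    def remember_fact(self, user_id: str, fact: str) -> None:
        self.user_facts.setdefault(user_id, []).append(fact)

    # Pattern 3: the agent proposes a learning; it is saved only with
    # explicit user approval, keeping garbage out of the knowledge base.
    def propose_learning(self, learning: str, approved: bool) -> bool:
        if approved:
            self.learnings.append(learning)
        return approved


memory = AgentMemory()
memory.add_message("s1", "What's a typical tech P/E?")
memory.remember_fact("u1", "prefers concise answers")
memory.propose_learning("Tech P/E ratios typically range 20-35x", approved=True)
```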

10 Predictions for Data Infrastructure in 2026
My LLM coding workflow going into 2026 by Addy Osmani
  • the first step is brainstorming a detailed specification with the AI
  • The key point is to avoid huge leaps. By iterating in small loops, we greatly reduce the chance of catastrophic errors and we can course-correct quickly. LLMs excel at quick, contained tasks - use that to your advantage.
  • I think Claude Skills have potential because they turn what used to be fragile repeated prompting into something durable and reusable by packaging instructions, scripts, and domain-specific expertise into modular capabilities that tools can automatically apply when a request matches the Skill.
  • automated tests, do code reviews - both manual and AI-assisted
  • No matter how much AI I use, I remain the accountable engineer.
  • Frequent commits are your save points - they let you undo AI missteps and understand changes.
  • spin up a fresh git worktree for a new feature or sub-project. This lets me run multiple AI coding sessions in parallel on the same repo without them interfering, and I can later merge the changes
  • Use your CI/CD, linters, and code review bots - AI will work best in an environment that catches mistakes automatically.
  • one of my goals is to bolster the quality gates around AI code contribution: more tests, more monitoring, perhaps even AI-on-AI code reviews. It might sound paradoxical (AIs reviewing AIs), but I’ve seen it catch things one model missed.
  • Treat every AI coding session as a learning opportunity - the more you know, the more the AI can help you, creating a virtuous cycle.
  • Dunning-Kruger on steroids (it may seem like you built something great, until it falls apart)
Claude Code On-The-Go