Reading Feed

Articles I've read with my notes and highlights

10 Predictions for Data Infrastructure in 2026
My LLM coding workflow going into 2026 by Addy Osmani
  • the first step is brainstorming a detailed specification with the AI
  • The key point is to avoid huge leaps. By iterating in small loops, we greatly reduce the chance of catastrophic errors and we can course-correct quickly. LLMs excel at quick, contained tasks - use that to your advantage.
  • I think Claude Skills have potential because they turn what used to be fragile repeated prompting into something durable and reusable, packaging instructions, scripts, and domain-specific expertise into modular capabilities that tools can automatically apply when a request matches the Skill
  • write automated tests, do code reviews - both manual and AI-assisted
  • No matter how much AI I use, I remain the accountable engineer.
  • Frequent commits are your save points - they let you undo AI missteps and understand changes.
  • spin up a fresh git worktree for a new feature or sub-project. This lets me run multiple AI coding sessions in parallel on the same repo without them interfering, and I can later merge the changes
  • Use your CI/CD, linters, and code review bots - AI will work best in an environment that catches mistakes automatically.
  • one of my goals is to bolster the quality gates around AI code contribution: more tests, more monitoring, perhaps even AI-on-AI code reviews. It might sound paradoxical (AIs reviewing AIs), but I’ve seen it catch things one model missed.
  • Treat every AI coding session as a learning opportunity - the more you know, the more the AI can help you, creating a virtuous cycle.
  • Dunning-Kruger on steroids (it may seem like you built something great, until it falls apart)
Claude Code On-The-Go
How uv got so fast by Andrew Nesbitt
  • HTTP range requests for metadata. Wheel files are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. This is HTTP protocol work, not Rust.
  • Zero-copy deserialization. uv uses rkyv to deserialize cached data without copying it. The data format is the in-memory format. This is a Rust-specific technique.
  • uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.
  • pip could implement parallel downloads, global caching, and metadata-only resolution tomorrow. It doesn’t, largely because backwards compatibility with fifteen years of edge cases takes precedence - but that means pip will always be slower than a tool that starts fresh with modern assumptions.
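The tail-of-zip trick above can be sketched locally. The snippet below is a simplified illustration (not uv's actual code): it builds a small zip in memory as a stand-in for a wheel on PyPI, then recovers the file listing from the final bytes alone, which is exactly what an HTTP range request would fetch.

```python
import io
import struct
import zipfile

# Build a small zip in memory (stand-in for a wheel hosted on PyPI).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/__init__.py", "")
    zf.writestr("pkg-1.0.dist-info/METADATA", "Name: pkg\nVersion: 1.0\n")
data = buf.getvalue()

# Simulate an HTTP range request: fetch only the tail of the file,
# where zip archives keep their central directory (the file listing).
tail = data[-1024:]

# Locate the End of Central Directory record (signature PK\x05\x06).
eocd = tail.rfind(b"PK\x05\x06")
# EOCD layout: total entry count at offset +10 (2 bytes),
# central directory size at +12 (4 bytes), its file offset at +16 (4 bytes).
count, cd_size, cd_offset = struct.unpack_from("<HII", tail, eocd + 10)

# A second range request for just the central directory would yield the
# file names; here we slice the same in-memory bytes.
names = []
pos = cd_offset
for _ in range(count):
    # Name/extra/comment lengths sit at offsets +28/+30/+32 of each
    # 46-byte central directory header; the file name follows the header.
    name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
    names.append(data[pos + 46 : pos + 46 + name_len].decode())
    pos += 46 + name_len + extra_len + comment_len

print(names)  # file listing recovered without reading any file contents
```

Fetching the listing this way costs two small range requests instead of a full wheel download, which is why the fallback chain gets progressively slower.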
Beyond Indexes: How Open Table Formats Optimize Query Performance by Jack Vanlightly
  • All rows sharing the same partition key values are written into the same directory or group of files. This creates data locality such that when a query includes a filter on the partition column (for example, WHERE EventDate = ‘2025-10-01’), the engine can identify exactly which partitions contain relevant data and ignore the rest. This process is known as partition pruning
  • Delta supports Z-ordering, which is a multidimensional clustering technique. Instead of sorting by a single column, Z-order interleaves the bits of multiple column values (for example, Country, Nationality, and EventDate) into a single composite key that preserves spatial locality
  • Z-ordering is particularly effective for queries that filter on multiple dimensions simultaneously
  • But when data isn’t organized by that column and all countries are mixed randomly across files, the statistics show it. Each file’s min/max range becomes broad, often spanning most of the column’s domain. The result is poor selectivity and less effective pruning, because the query planner can’t confidently skip any files
  • A Bloom filter is a compact probabilistic data structure that can quickly test whether a value might exist in a data file (or row group). If the Bloom filter says “no,” the engine can skip reading that section entirely
  • They effectively trade storage and maintenance cost for speed, just like secondary indexes do in an RDBMS
  • Open table formats like Iceberg, Delta, and Hudi store data in immutable, columnar files optimized for large analytical scans. Query performance depends on data skipping (pruning), the ability to avoid reading irrelevant files or row groups based on metadata. Pruning effectiveness depends on data layout.
  • Data layout levers:
    - Partitioning provides strong physical grouping across files, enabling efficient partition pruning when filters match partition keys.
    - Sorting improves data locality within partitions, tightening column value ranges and enhancing row-group-level pruning.
    - Compaction consolidates small files and enforces consistent sort order, making pruning more effective (and reducing the small-file cost that partitioning can sometimes introduce).
    - Z-ordering (Delta) and Liquid Clustering (Databricks) extend sorting to multi-dimensional and adaptive clustering strategies.
  • Column statistics in Iceberg manifest files and Parquet row groups drive pruning by recording min/max values per column; the statistics reflect the physical layout.
  • Bloom filters add another layer of pruning, especially for unsorted columns and exact-match predicates. Some systems maintain sidecar indexes such as histograms or primary-key-to-file maps for faster lookups (e.g., Hudi, Paimon).
  • Materialized views and precomputed projections further accelerate queries by storing data in the shape of common query patterns (e.g., Dremio Reflections). These require some data duplication and maintenance, and are the closest equivalent (in spirit) to the secondary index of an RDBMS.
  • Both rely on structure and layout to guide access; the difference is that instead of maintaining B-tree structures, OTFs lean on looser data layout and lightweight metadata to guide search (pruning being a search optimization)
  • Secondary indexes, so valuable in OLTP, add little to data warehousing. Analytical queries don’t pluck out individual rows; they aggregate, filter, and join across millions
  • You can’t sort or cluster the table twice without making a copy of it. But it turns out that copying, in the form of materialized views, is a valuable strategy for supporting diverse queries over the same table, as exemplified by Dremio Reflections. These make the same cost trade-offs as secondary indexes: space and maintenance for read speed
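Min/max pruning is easy to see in miniature. This sketch uses hypothetical file names and made-up statistics; the point is how tight per-file ranges let the planner skip files, while a randomly mixed layout makes every range span the whole domain.

```python
# Each "file" carries min/max stats for a column, as Parquet row groups
# and Iceberg manifests do (values here are invented for illustration).
files = [
    {"path": "f1.parquet", "min": "AR", "max": "DE"},
    {"path": "f2.parquet", "min": "DK", "max": "NO"},
    {"path": "f3.parquet", "min": "NZ", "max": "ZA"},
]

def prune(files, value):
    """Keep only files whose [min, max] range could contain value."""
    return [f["path"] for f in files if f["min"] <= value <= f["max"]]

# Sorted layout: tight ranges, so WHERE country = 'FR' touches one file.
print(prune(files, "FR"))  # ['f2.parquet']

# Unsorted layout: every file spans most of the domain, nothing is skipped.
mixed = [{"path": f["path"], "min": "AR", "max": "ZA"} for f in files]
print(prune(mixed, "FR"))  # all three files survive pruning
```

The same check runs at file level and again inside each file at row-group level, which is why sorting "tightens" ranges at both granularities.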
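A toy Bloom filter illustrates the "a no answer means skip" behavior described above. The size, hash count, and hashing scheme here are arbitrary choices for the sketch, not taken from any particular engine.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: 'no' is definitive, 'yes' may be a false positive."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, value):
        # Derive k bit positions by salting a hash of the value.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        return all(self.bits >> p & 1 for p in self._positions(value))

# One filter per data file: the engine skips files whose filter says "no".
bf = BloomFilter()
for user_id in ("u17", "u42", "u99"):
    bf.add(user_id)

print(bf.might_contain("u42"))     # an added value always answers True
print(bf.might_contain("u12345"))  # almost certainly False -> skip the file
```

The filter stores no values, only bits, which is why it stays compact enough to keep alongside the file metadata.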
DEW - The Year in Review 2025 by Ananth Packkildurai
Prompt caching: 10x cheaper LLM tokens, but how? | ngrok blog by ngrok
Zero to One: Learning Agentic Patterns by Philipp Schmid
  • An initial LLM acts as a router, classifying the user’s input and directing it to the most appropriate specialized task or LLM.
  • Reflection Pattern

An agent evaluates its own output and uses that feedback to refine its response iteratively. This pattern, also known as Evaluator-Optimizer, uses a self-correction loop.

  • The key to success with any LLM application, especially complex agentic systems, is empirical evaluation. Define metrics, measure performance, identify bottlenecks or failure points, and iterate on your design. Resist the urge to over-engineer
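The Reflection (Evaluator-Optimizer) loop above can be sketched with stubbed model calls. `generate` and `evaluate` are hypothetical stand-ins for real LLM calls; the loop structure is the pattern.

```python
# Stubbed model calls: a real agent would call an LLM for both roles.
def generate(task, feedback=None):
    """Producer: drafts (or revises) an answer. The stub marks revisions."""
    draft = f"draft for: {task}"
    return draft + " (revised)" if feedback else draft

def evaluate(output):
    """Evaluator: critiques the draft. The stub accepts only revised drafts."""
    if "(revised)" in output:
        return True, None
    return False, "needs revision"

def reflection_loop(task, max_rounds=3):
    """Generate, critique, and revise until accepted or out of budget."""
    feedback = None
    output = ""
    for _ in range(max_rounds):
        output = generate(task, feedback)
        ok, feedback = evaluate(output)
        if ok:
            return output
    return output  # best effort after max_rounds

print(reflection_loop("summarize the report"))
```

Capping the loop with `max_rounds` matters in practice: without it, a harsh evaluator can burn tokens indefinitely.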
How to Choose the Right Embedding Model for RAG - Milvus Blog
  • Sparse vectors (like BM25) focus on keyword frequency and document length. They’re great for explicit matches but blind to synonyms and context—“AI” and “artificial intelligence” would look unrelated
  • Dense vectors (like those produced by BERT) capture deeper semantics. They can see that “Apple releases new phone” is related to “iPhone product launch,” even without shared keywords. The downside is higher computational cost and less interpretability
  • Since one token is roughly 0.75 words
  • The key is balance. For most general-purpose applications, 768–1,536 dimensions strike the right mix of efficiency and accuracy. For tasks that demand high precision—such as academic or scientific searches—going beyond 2,000 dimensions can be worthwhile. On the other hand, resource-constrained systems (such as edge deployments) may use 512 dimensions effectively, provided retrieval quality is validated. In some lightweight recommendation or personalization systems, even smaller dimensions may be enough
  • Under the hood, BERT’s input vectors combined three elements: token embeddings (the word itself), segment embeddings (which sentence it belongs to), and position embeddings (where it sits in the sequence). Together, these gave BERT the ability to capture complex semantic relationships at both the sentence and document level. This leap made BERT state-of-the-art for tasks like question answering and semantic search.
  • Dense vectors capture deep semantics, handling synonyms and paraphrases (e.g., “iPhone launch” ≈ “Apple releases new phone”).
  • Sparse vectors assign explicit term weights. Even if a keyword doesn’t appear, the model can infer relevance—for example, linking “iPhone new product” with “Apple Inc.” and “smartphone.”
  • Multi-vectors refine dense embeddings further by allowing each token to contribute its own interaction score, which is helpful for fine-grained retrieval.
  • With the right tweaks, LLMs can generate embeddings that rival, and sometimes surpass, purpose-built models. Two notable examples are LLM2Vec and NV-Embed.
  • Screen with MTEB subsets: use benchmarks, especially retrieval tasks, to build an initial shortlist of candidates.
  • Test with real business data: create evaluation sets from your own documents to measure recall, precision, and latency under real-world conditions.
  • Check database compatibility: sparse vectors require inverted index support, while high-dimensional dense vectors demand more storage and computation. Ensure your vector database can accommodate your choice.
  • Handle long documents smartly: use segmentation strategies, such as sliding windows, for efficiency, and pair them with large context window models to preserve meaning.
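The sparse-vs-dense contrast can be shown with toy scores. The 3-d "embeddings" below are invented for illustration, not produced by a real model; the point is that keyword overlap sees nothing shared while a dense representation can still place the texts nearby.

```python
import math

def keyword_overlap(a, b):
    """Sparse-style score: Jaccard overlap of tokens (no notion of synonyms)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

q = "Apple releases new phone"
d = "iPhone product launch"

# Sparse view: the two texts share no keywords at all.
print(keyword_overlap(q, d))  # 0.0

# Dense view: made-up 3-d embeddings that a trained model might place nearby.
emb = {q: [0.9, 0.1, 0.3], d: [0.8, 0.2, 0.35]}
print(cosine(emb[q], emb[d]))  # high similarity despite zero keyword overlap
```

Real sparse methods like BM25 also weight by term frequency and document length, but the blindness to synonyms shown here is the same.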
The Ultimate Guide to LLM Evaluation: Metrics, Methods & Best Practices by Kelsey Kinzer
AWS Launches ECS Express Mode to Simplify Containerised Application Deployment by Matt Saunders
Create and update Apache Iceberg tables with partitions in the AWS Glue Data Catalog using the AWS SDK and AWS CloudFormation
APACHE SPARK OPTIMISATIONS by Guna Chandra Durgapu
Column Storage for the AI Era by Julien Le Dem
Agent Engineering: A New Discipline by LangChain
  • Agent engineering is the iterative process of refining non-deterministic LLM systems into reliable production experiences. It is a cyclical process: build, test, ship, observe, refine, repeat.
Getting into public speaking - James Brooks
Skills vs Dynamic MCP Loadouts by Armin Ronacher
  • You still declare tools ahead of time in the system message, but they are not injected into the conversation when the initial system message is emitted. Instead they appear at a later point. The tool definitions however still have to be static for the entire conversation, as far as I know. So the tools that could exist are defined when the conversation starts. The way Anthropic discovers the tools is purely by regex search.
Why your mock breaks later | Ned Batchelder by @nedbat@hachyderm.io
Introducing AWS Glue 5.1 for Apache Spark | AWS Big Data Blog
Context Engineering: How RAG, agents, and memory make LLMs actually useful
  • This dual-memory approach mirrors human cognition:
    - Short-term memory (Redis): like working memory, it holds the current conversation context. Fast access, automatic expiration.
    - Long-term memory (vector store): persistent knowledge that grows over time. Important patterns and learnings are embedded and searchable.
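A minimal sketch of the two memories, with plain Python containers standing in for Redis and a vector store (the class names and toy 2-d embeddings are invented for illustration):

```python
import time

class ShortTermMemory:
    """Stand-in for Redis: per-session conversation context with a TTL."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # session_id -> (expires_at, messages)

    def append(self, session_id, message):
        _, messages = self.store.get(session_id, (0, []))
        self.store[session_id] = (time.time() + self.ttl, messages + [message])

    def get(self, session_id):
        expires, messages = self.store.get(session_id, (0, []))
        return messages if time.time() < expires else []  # automatic expiry

class LongTermMemory:
    """Stand-in for a vector store: persistent, similarity-searchable items."""

    def __init__(self):
        self.items = []  # (embedding, text); a real system embeds with a model

    def add(self, embedding, text):
        self.items.append((embedding, text))

    def search(self, query_embedding, k=1):
        def dot(u, v):
            return sum(a * b for a, b in zip(u, v))
        ranked = sorted(self.items, key=lambda it: -dot(it[0], query_embedding))
        return [text for _, text in ranked[:k]]

stm = ShortTermMemory()
stm.append("s1", "user: hi")

ltm = LongTermMemory()
ltm.add([1.0, 0.0], "prefers concise answers")
ltm.add([0.0, 1.0], "works in data engineering")

print(stm.get("s1"), ltm.search([0.9, 0.1]))
```

At answer time an agent would merge both: recent turns from the short-term store plus the top-k long-term matches, all packed into the prompt context.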