<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://americanexpress.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://americanexpress.io/" rel="alternate" type="text/html" /><updated>2026-06-15T11:13:09-04:00</updated><id>https://americanexpress.io/feed.xml</id><title type="html">American Express Technology</title><subtitle>American Express Technology Open Source and Blog</subtitle><entry><title type="html">Cell-Based Architecture for Resilient Payment Systems</title><link href="https://americanexpress.io/cell-based-architecture-for-resilient-payment-systems/" rel="alternate" type="text/html" title="Cell-Based Architecture for Resilient Payment Systems" /><published>2026-06-11T00:00:00-04:00</published><updated>2026-06-11T00:00:00-04:00</updated><id>https://americanexpress.io/cell-based-architecture-for-resilient-payment-systems</id><content type="html" xml:base="https://americanexpress.io/cell-based-architecture-for-resilient-payment-systems/"><![CDATA[<p>The American Express core payments ecosystem is a global platform relied on by Card Members and partners around the
        world. Every day, it processes live payment transactions that require high availability, low latency, and
        predictable performance.</p>
      <p>Resiliency is not an afterthought; it has been encoded into the system’s design from the beginning. Localized faults
        are contained within defined boundaries, and recovery is designed to be fast and predictable.</p>
      <p>To achieve this, the platform is built around a cell-based architecture that isolates failures, maintains low-latency
        processing, and scales capacity without expanding the failure domain.</p>
      <p>This blog post outlines the principles that guide this architecture and how they help us build a resilient payments
        platform at global scale.</p>
      <h2 id="core-payments-ecosystem">Core Payments Ecosystem</h2>
      <p>In 2018, we started a journey to modernize our core payments ecosystem. This platform processes live card and
        payment transactions and is mission-critical to our Card Members and partners.</p>
      <p>As we modernized the platform, resiliency remained a primary design requirement. We needed an architecture that
        could continue processing transactions reliably, even when individual components failed. This decision was heavily
        influenced by our historical design patterns, which predated the term “cell-based architecture,” but share many of
        the same principles.</p>
      <p>Our new platform targeted cloud-native technologies, which meant we needed to think differently about how we
        designed for resiliency and scalability.</p>
      <p>In the next sections, we’ll discuss some of the design principles we follow in our core payments ecosystem and
        how they not only improve our ability to process payments reliably but also help us reduce latency and
        scale more easily.</p>
      <h2 id="what-is-cell-based-architecture">What Is Cell-Based Architecture?</h2>
      <p>Cell-based architecture is an architecture pattern that has gained popularity in the cloud-native
        distributed systems space.</p>
      <p>The idea behind the concept is to group related microservices, databases, and other components into independent
        instances called cells. Each cell is able to function independently without reliance on other cells.</p>
      <p><img src="../_post_assets/cell-based-architecture-for-resilient-payment-systems/img/img1.jpg" alt="Cell Based Architecture Concept" class="center-block" /></p>
      <p><em>In this diagram: Each cell contains its own services and data so a failure stays within that cell instead of spreading
          across the platform.</em></p>
      <p>The primary benefit of cell-based architecture is reducing the blast radius of failures. With each cell being
        independent, if one cell experiences issues, it doesn’t impact the others. The trade-off is that cell-based
        architecture often increases management overhead and architectural complexity, as it requires careful design
        to ensure that cells are truly independent and that data is appropriately localized.</p>
      <p>However, for mission-critical systems like payments, we find that the benefits of a reduced blast radius and improved
        resiliency outweigh the additional complexity.</p>
      <p>We’ve also found that when implemented well, a cell-based architecture can help platforms reduce latency (by
        reducing external dependencies and network hops) and improve scaling by introducing additional independent cells.</p>
      <h2 id="how-we-follow-cell-based-architecture">How We Follow Cell-Based Architecture</h2>
      <p>Each instance of our core payments ecosystem is designed as a cell, which:</p>
      <ul>
        <li>Is an independently deployable unit that can process payments on its own.</li>
        <li>Has its own set of microservices, databases, and other components.</li>
        <li>Is a single failure domain, meaning that if one cell experiences issues, it doesn’t cascade the failure beyond the cell boundary.</li>
        <li>Can be taken out of rotation for maintenance or in response to failures without impacting the overall system or requiring coordination with other cells.</li>
        <li>Has no synchronous cross-cell dependencies in the critical path of processing transactions.</li>
      </ul>
      <p>A cell is defined by its failure boundaries rather than a specific infrastructure construct. In practice, cells
        never span multiple regions—everything required to process transactions (DNS, databases, microservices, and
        supporting services) remains local within that boundary.</p>
      <p>To achieve this, we follow a set of core principles that guide our design decisions and help us ensure that our
        cells are truly independent and resilient.</p>
      <h2 id="data-and-processing-locality-by-default">Data and Processing Locality by Default</h2>
      <p>Processing payments requires data: currency rates, merchant category codes, and so on. Some data is static, while
        some data changes with each transaction.</p>
      <h3 id="static--semi-static-data-replication">Static &amp; Semi-Static Data Replication</h3>
      <p>For static or semi-static data like currency rates and merchant category codes, we replicate that data to each cell.</p>
      <p><img src="../_post_assets/cell-based-architecture-for-resilient-payment-systems/img/img2.jpg" alt="Static and semi-static data replication" class="center-block" /></p>
      <p><em>In this diagram: Reference data is pushed into every cell ahead of time so transaction processing never needs a
          synchronous lookup to a central source.</em></p>
      <p>Rather than relying on a fall-through read to a centralized system of record during transaction processing, we
        pre-populate this data in each cell ahead of time. This keeps reference data local before transactions arrive, avoids
        cache-miss latency during processing, and preserves critical-path isolation.</p>
      <p>The replication work happens outside the transaction path, which lets us keep the data available locally without
        introducing synchronous cross-cell dependencies.</p>
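      <p>As a minimal sketch of this idea, the class below keeps a cell-local copy of reference data that a replication
        process overwrites out of band, while the transaction path only ever reads locally. The class name, method names,
        and rate values are hypothetical and chosen for illustration; they do not describe the actual implementation.</p>

```python
import threading


class LocalReferenceData:
    """Cell-local copy of reference data (hypothetical helper for illustration)."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._data = dict(initial)

    def apply_snapshot(self, snapshot):
        # Called by the replication process, outside the transaction path.
        # Build the new mapping first, then swap it in under the lock.
        new_data = dict(snapshot)
        with self._lock:
            self._data = new_data

    def lookup(self, key):
        # Called on the transaction path: a purely local read with no
        # fall-through to a central system of record on a miss.
        with self._lock:
            return self._data.get(key)


# Illustrative values only.
rates = LocalReferenceData({"USD/EUR": 0.92})
rates.apply_snapshot({"USD/EUR": 0.93, "USD/GBP": 0.79})
```

      <p>In this sketch a missing key simply returns <code>None</code>; how a real cell treats missing reference data is a
        separate design decision that the sketch does not address.</p>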
      <h2 id="dynamic-data-routing">Dynamic Data Routing</h2>
      <p>Not all data is static and not all data can be pre-populated. For more dynamic data (data that changes with each
        transaction), data replication may not be fast enough to ensure that every cell has the right data at the right
        time. We don’t want to route transactions to cells that don’t have the latest data, as that would increase latency
        and potentially lead to processing failures.</p>
      <p>Instead, we use deterministic routing to route transactions to the cell where the right data is already available. In
        a recent article, <a href="https://www.americanexpress.io/migrating-the-payments-network-twice/">Migrating the Payment Network Twice with Zero Downtime</a>,
        we introduced the Global Transaction Router, which is responsible for managing connectivity and routing transactions
        to the appropriate cell. It can do so because it understands just enough of the payment specifications to make routing
        decisions based on the transaction data.</p>
      <p>For example, we may route transactions based on partner, market, or payment type; how we route depends on the payment
        transaction data and the use case, but the key is that we selectively route transactions to where they are needed when
        there is a need for strong data consistency across transactions.</p>
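      <p>One way to picture deterministic routing is an ordered rule table: the first rule that matches the transaction
        decides the cell, so identical inputs always land in the same place. The rule structure, field names, and cell
        names below are assumptions made for this sketch, not the Global Transaction Router’s actual design.</p>

```python
from typing import NamedTuple, Optional


class RouteRule(NamedTuple):
    partner: Optional[str]        # None matches any value
    market: Optional[str]
    payment_type: Optional[str]
    cell: str


def select_cell(txn, rules, default_cell):
    """Return the cell of the first rule that matches the transaction."""
    for rule in rules:
        if (rule.partner in (None, txn.get("partner"))
                and rule.market in (None, txn.get("market"))
                and rule.payment_type in (None, txn.get("payment_type"))):
            return rule.cell
    return default_cell


# Hypothetical rules: one partner pinned to a cell, one market and payment type pinned to another.
RULES = [
    RouteRule("partner-a", None, None, "cell-1"),
    RouteRule(None, "GB", "debit", "cell-2"),
]
```

      <p>Because the function depends only on the transaction fields and the rule table, repeated calls with the same
        transaction return the same cell.</p>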
      <p><img src="../_post_assets/cell-based-architecture-for-resilient-payment-systems/img/img3.jpg" alt="Dynamic data routing and replication" class="center-block" /></p>
      <p><em>In this diagram: The router sends a transaction to the cell that already has the authoritative dynamic state, while
          replication continues asynchronously outside the critical path.</em></p>
      <p>We keep transaction processing localized by restricting microservice communication to pod-to-pod interactions within
        the cell’s Kubernetes network, ensuring all processing remains within the cell’s boundaries.</p>
      <p>Failover data is synchronized across cells using message-based replication. That replication happens
        asynchronously, outside the transaction path, so it doesn’t impact latency or availability.</p>
      <p>No in-flight transaction waits for replication to complete; if the latest state is required, the Global Transaction
        Router sends the transaction to the cell where that data is already authoritative or available.</p>
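      <p>The separation between the transaction path and replication can be sketched with an in-process queue and a
        background worker. This is only an illustration: <code>AsyncReplicator</code> and its <code>send</code> callable
        are hypothetical stand-ins, and real message-based replication would need durable messaging, retries, and ordering
        guarantees that are omitted here.</p>

```python
import queue
import threading


class AsyncReplicator:
    """Hands replication messages to a background worker (illustrative sketch)."""

    def __init__(self, send):
        self._send = send                  # callable that delivers one message to a peer cell
        self._outbox = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, message):
        # Transaction path: enqueue and return immediately, never wait for delivery.
        self._outbox.put(message)

    def _drain(self):
        while True:
            message = self._outbox.get()
            try:
                self._send(message)
            except Exception:
                pass  # a real system would retry or alert; omitted in this sketch
            finally:
                self._outbox.task_done()

    def wait_until_drained(self):
        # For tests and shutdown only; not meant for the transaction path.
        self._outbox.join()
```

      <p>The caller of <code>publish</code> returns as soon as the message is queued, which mirrors the rule that no
        in-flight transaction waits for replication to complete.</p>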
      <p>We only allow our microservices to talk to localized database instances. This keeps latency predictable and avoids
        unnecessary network hops, but it requires deliberate routing decisions.</p>
      <p>By introducing deterministic routing at the edge, we can ensure that transactions are routed to the cell where the
        right data is already available.</p>
      <h2 id="enforced-boundaries-for-ingress-and-egress">Enforced Boundaries for Ingress and Egress</h2>
      <p>Along with its routing capabilities, the Global Transaction Router also serves as a key enforcer of our “local only” processing.</p>
      <p>Transactions must enter a cell through the Global Transaction Router; if a cell cannot process a transaction and
        that transaction needs to be rerouted to another cell, it must also go through the Global Transaction Router.</p>
      <p>In this way, the Global Transaction Router also serves as a payments mesh, connecting our cells globally.</p>
      <p><img src="../_post_assets/cell-based-architecture-for-resilient-payment-systems/img/img4.jpg" alt="Ingress and egress enforcement" class="center-block" /></p>
      <p><em>In this diagram: All cross-cell traffic is funneled through the Global Transaction Router, which preserves strict cell boundaries.</em></p>
      <p>Preventing cross-cell dependencies becomes increasingly difficult as platforms grow.</p>
      <p>By tightly controlling cross-cell communication through the Global Transaction Router, we prevent cells from forming
        strong dependencies on each other, as they do not have the ability to communicate at all—only the Global Transaction
        Router can communicate across cells.</p>
      <p>This enforcement occasionally results in duplicated services where shared implementations might otherwise seem
        simpler, but it preserves cell independence and improves latency by reducing cross-cell network hops.</p>
      <p>The same principle applies to observability. Each cell publishes logs, metrics, and traces to observability components
        localized within that cell first, so losing part of the observability stack only reduces visibility for that cell
        instead of the entire platform. We still aggregate observability data asynchronously to provide global dashboards,
        alerting, and fleet-wide analysis, but that aggregation remains outside the transaction’s critical path.</p>
      <h2 id="cells-break-in-isolation-other-cells-replace-them">Cells Break in Isolation; Other Cells Replace Them</h2>
      <p>The ability to reroute transactions to other cells is a key part of our resiliency strategy.</p>
      <p>When failures occur, their impact stays contained within the affected cell, and transactions are automatically rerouted
        to a healthy cell where processing restarts.</p>
      <p>We reroute not only new incoming transactions but also transactions that were already in-flight in the failing cell.</p>
      <p>Our Payments Processing subsystem follows an orchestrated microservices architecture, where an orchestrator microservice
        manages the processing workflow and calls other microservices to perform specific tasks.</p>
      <p>If a downstream service begins to fail, the orchestrator detects the failure, halts processing, and sends the transaction
        back to the Global Transaction Router to be rerouted to another cell.</p>
      <p><img src="../_post_assets/cell-based-architecture-for-resilient-payment-systems/img/img5.jpg" alt="Reroute on failure" class="center-block" /></p>
      <p><em>In this diagram: When a cell fails mid-flow, the transaction is rerouted and restarted in a healthy cell rather than
          resumed across cells.</em></p>
      <p>We do not attempt to resume partially processed transactions across cells. Instead, we restart transaction processing
        in another cell with the original transaction data.</p>
      <p>This restart is only safe while the transaction is still within the core payments ecosystem. Once a transaction has been
        sent to an external system (e.g., card issuer), we consider that a point of no return, and we don’t allow transactions
        to be rerouted after that point.</p>
      <p>Card authorizations are structured so that the point of no return is toward the end of processing. If a transaction
        fails before the point of no return, we can safely reroute and restart processing without worrying about duplicate
        transactions or data consistency issues.</p>
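      <p>The restart rule can be sketched as follows, under the assumption that processing is a sequence of internal steps
        followed by a single external call. All names here (<code>process</code>, <code>RerouteRequested</code>,
        <code>send_to_external</code>) are hypothetical and only illustrate where the point of no return sits.</p>

```python
class RerouteRequested(Exception):
    """Raised to hand the original transaction back to the router."""


def process(txn, internal_steps, send_to_external):
    # Work on a copy so the original transaction data stays untouched
    # and can be restarted from scratch in another cell.
    working = dict(txn)
    try:
        for step in internal_steps:
            working = step(working)
    except Exception as exc:
        # Failure before the point of no return: safe to restart elsewhere.
        raise RerouteRequested(txn["id"]) from exc
    # Point of no return: once this call is made, the sketch never asks for a reroute.
    return send_to_external(working)
```

      <p>Errors raised by the external call are deliberately not converted into a reroute request, since the transaction
        may already have left the core payments ecosystem by then.</p>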
      <p>For other payment types, we manage idempotency through transaction identifiers. Each transaction carries a unique
        transaction identifier that remains consistent across retries and reroutes. Downstream systems use these identifiers
        to detect and suppress duplicate requests, allowing retries and reroutes to be handled safely without introducing
        inconsistencies or duplicate transactions.</p>
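      <p>A downstream system’s duplicate suppression might look like the sketch below, which remembers the first result
        for each transaction identifier. The in-memory dictionary and the <code>IdempotentHandler</code> name are
        assumptions for illustration; a real system would need durable, shared storage and an expiry policy.</p>

```python
class IdempotentHandler:
    """Suppresses duplicate requests by transaction identifier (illustrative sketch)."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}    # transaction id mapped to the first result

    def handle(self, txn):
        txn_id = txn["id"]
        if txn_id in self._results:
            # Duplicate from a retry or reroute: return the earlier outcome.
            return self._results[txn_id]
        result = self._handler(txn)
        self._results[txn_id] = result
        return result
```

      <p>Because the identifier stays the same across retries and reroutes, a replayed request reaches the underlying
        handler only once.</p>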
      <p>The restart model emphasizes the importance of avoiding shared state between cells. Cross-cell shared state would
        introduce synchronization challenges and potential consistency issues, especially during failover
        scenarios. Communication failures between cells could impact the ability to process transactions globally, which
        we want to avoid at all costs for a payments system.</p>
      <p>In our architecture, cells are designed to be loosely coupled. Each cell has its own database clusters, and the
        microservices within a cell only communicate with the local database cluster.</p>
      <p>When a cell fails, its impact stays confined to that cell, allowing other cells to continue processing transactions normally.</p>
      <p>When rerouted, transactions are processed without reliance on state from the previous cell.</p>
      <p>At any point in time, a cell can be taken out of rotation. When a cell is taken out of rotation either automatically or
        manually, another cell takes its place. This does not have to be a binary cutover. As discussed in
        <a href="https://www.americanexpress.io/migrating-the-payments-network-twice/">Migrating the Payment Network Twice with Zero Downtime</a>,
        the Global Transaction Router can shift traffic between cells by percentage, allowing us to gradually drain a cell
        for maintenance, validate a recovering cell under partial load, or respond more safely during incidents.</p>
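      <p>Percentage-based traffic shifting can be illustrated by hashing a transaction identifier into one of 100 buckets
        and assigning bucket ranges to cells. Hashing is an assumption made for this sketch so that the same transaction
        always maps to the same cell; the article does not describe how the Global Transaction Router implements its
        percentages.</p>

```python
import hashlib


def pick_cell(txn_id, weights):
    """weights: list of (cell, percent) pairs whose percents sum to 100."""
    if sum(percent for _, percent in weights) != 100:
        raise ValueError("percentages must sum to 100")
    digest = hashlib.sha256(txn_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100    # stable value in 0..99
    upper = 0
    for cell, percent in weights:
        upper += percent
        if upper > bucket:
            return cell
    raise AssertionError("unreachable when percentages sum to 100")
```

      <p>Draining a cell then becomes a matter of lowering its percentage step by step, for example from 50 to 10 to 0,
        while the remaining cells absorb the difference.</p>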
      <h2 id="minimal-dependencies-at-the-edge">Minimal Dependencies at the Edge</h2>
      <p>Because the Global Transaction Router sits at the edge, it is a critical service providing connectivity, routing,
        and resiliency. To ensure its availability, we aim to keep dependencies within this system as small as possible.</p>
      <p>The closer to the edge, the fewer dependencies we aim for.</p>
      <p>But we don’t just reduce the dependencies; we also aim to keep them out of the critical path.</p>
      <p>If our logging infrastructure becomes unavailable, we don’t want that to impact the ability to process
        transactions. We do this by using an asynchronous logger configured with a buffer truncation policy, so if the
        buffer is full, we drop logs instead of blocking transaction processing.</p>
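      <p>The drop-instead-of-block behavior can be sketched with a bounded queue. <code>DroppingLogBuffer</code> is a
        hypothetical name and this is not the logger configuration used in production; it only shows the trade-off of
        losing log records rather than stalling a transaction.</p>

```python
import queue


class DroppingLogBuffer:
    """Bounded log buffer that drops records when full (illustrative sketch)."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def log(self, record):
        # Transaction path: never block. If the buffer is full, drop the record.
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        # Called by a background shipper; returns everything buffered so far.
        records = []
        while True:
            try:
                records.append(self._queue.get_nowait())
            except queue.Empty:
                return records
```

      <p>Counting dropped records, as the sketch does, keeps the loss visible even though it never slows the caller.</p>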
      <p>If our configuration service becomes unavailable, we want to continue running with the last known configuration. For
        this, we maintain an in-memory configuration that is updated asynchronously. During an outage, we keep serving
        from that copy, and once the service becomes available again, we pull the latest configuration.</p>
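      <p>A last-known-configuration holder might be sketched like this. The <code>fetch</code> callable stands in for a
        call to a configuration service and is an assumption of the example, as are the class and method names.</p>

```python
class LastKnownConfig:
    """Serves the last successfully fetched configuration (illustrative sketch)."""

    def __init__(self, fetch, initial):
        self._fetch = fetch              # callable returning a dict; may raise
        self._current = dict(initial)

    def refresh(self):
        # Runs on a background schedule, never on the transaction path.
        try:
            self._current = dict(self._fetch())
            return True
        except Exception:
            # Configuration service unavailable: keep serving the last known values.
            return False

    def get(self, key, default=None):
        # Transaction path: a plain in-memory read.
        return self._current.get(key, default)
```

      <p>Reads never touch the configuration service, so an outage there only delays updates instead of blocking
        transactions.</p>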
      <p><img src="../_post_assets/cell-based-architecture-for-resilient-payment-systems/img/img6.jpg" alt="Reducing dependencies at the edge" class="center-block" /></p>
      <p><em>In this diagram: The edge path stays thin and resilient by handling logging and configuration asynchronously instead
          of letting those dependencies block transactions.</em></p>
      <p>Keeping dependencies out of the critical path reduces failure points. This requires deliberate trade-offs: accepting
        degraded non-critical functionality (logging, metrics) to preserve transaction processing.</p>
      <h2 id="summary">Summary</h2>
      <p>In distributed payments systems, resiliency isn’t achieved through monitoring and retries alone—it’s achieved by
        defining clear failure boundaries and enforcing them through design.</p>
      <p>By organizing our core payments ecosystem into isolated, independently recoverable cells, we transform major failures
        into controlled routing decisions. Locality, deterministic routing, idempotent processing, and strict boundary
        enforcement work together to ensure growth and change don’t increase risk.</p>
      <p>This discipline underpins our cell-based architecture, enabling us to operate a global payments platform with low
        latency and high resiliency—principles that continue shaping our evolution.</p>
      ]]></content><author><name></name></author><category term="architecture" /><category term="payments" /><category term="resiliency" /><summary type="html"><![CDATA[How American Express uses cell-based architecture to deliver resilient, low-latency payments at global scale.]]></summary></entry><entry><title type="html">Reimagining Software Delivery with AI</title><link href="https://americanexpress.io/reimagining-software-delivery-with-ai/" rel="alternate" type="text/html" title="Reimagining Software Delivery with AI" /><published>2026-05-20T00:00:00-04:00</published><updated>2026-05-20T00:00:00-04:00</updated><id>https://americanexpress.io/reimagining-software-delivery-with-ai</id><content type="html" xml:base="https://americanexpress.io/reimagining-software-delivery-with-ai/"><![CDATA[<p>With the rapid rise of AI agents, we’ve entered a new phase of technological acceleration. Every week introduces new
      models, new capabilities, and new benchmarks, with one agent claiming deeper reasoning, another promising greater
      autonomy, and so on. Each new release expands context windows, multimodal inputs, or tool integration. The
      landscape is evolving at breakneck speed.</p>
    <p>As engineers, technical project managers, product owners, quality engineers, and leaders, it’s natural to ask which
      of these tools truly matter—and how they can be applied to improve the way we build and deliver software.</p>
    <p>The truth is that the race to adopt the ‘best’ model will never end. Technology has always evolved this way. The real
      question isn’t which model tops a benchmark, but rather: how do we use AI to improve the way we deliver value?</p>
    <p>As we sought to answer this question, we discovered something unexpected.</p>
    <h2 id="the-opportunity">The Opportunity</h2>
    <p>Our objective seemed straightforward: improve the product delivery lifecycle by leveraging emerging AI capabilities.</p>
    <p>Initially, we approached the challenge like many organizations do — evaluate tools, pilot agents, integrate those
      agents into development workflows, and measure productivity gains. But we quickly realized that improving delivery
      wasn’t primarily a tooling problem.</p>
    <p>The software lifecycle spans ideation, requirement definition, design, implementation, testing, deployment, and
      feedback. It involves product, architecture, engineering, QA, and leadership — each with different artifacts,
      incentives, and feedback loops.</p>
    <p>AI could not simply be inserted into one step to magically transform outcomes. More importantly, we realized
      this transformation could not be owned by engineering alone. The opportunity extended across the
      lifecycle — from product teams shaping intent in the earliest phases, to delivery teams executing against
      validated requirements, to QA organizations continuously strengthening quality and release confidence.</p>
    <p>If we wanted meaningful impact, we had to rethink the entire lifecycle — from early ideation and product
      definition through engineering delivery, testing, and production release.</p>
    <h2 id="introducing-ideation-to-implementation">Introducing Ideation to Implementation</h2>
    <p>Instead of treating AI as a coding assistant bolted onto implementation, we shifted our perspective and
      reframed AI as a strategic co-creation partner embedded across the lifecycle. In this
      model, AI does not replace expertise; it amplifies it. These were the principles we looked for
      in this new approach:</p>
    <ul>
      <li>It should help leaders clarify intent earlier.</li>
      <li>It should enable product teams to test and refine concepts before committing engineering capacity.</li>
      <li>It should strengthen traceability between business objectives and technical execution.</li>
      <li>It should accelerate feedback loops and reduce ambiguity before code is written.</li>
    </ul>
    <p>The goal was to leverage AI to enhance alignment, accelerate value realization, and consistently turn ideas
      into outcomes with greater precision and confidence. Rather than chasing model releases, we redesigned how we use
      AI to turn ideas into real customer impact.</p>
    <h2 id="a-recharged-software-development-life-cycle">A Recharged Software Development Life Cycle</h2>
    <p>Inspired by traditional SDLC and Agile principles, we designed a recharged lifecycle enhanced with AI across
      four integrated phases:</p>
    <p>Envision + Define → Verify + Specify → Build + Integrate → Test + Release</p>
    <p>Each phase produces measurable outputs, but the power lies in how they connect.</p>
    <h2 id="1-envision--define">1. Envision + Define</h2>
    <p>Goal: Transform an ambiguous idea into structured, prioritized capabilities ready for engineering.</p>
    <p>Upstream ambiguity is the largest drag on velocity and the root cause of downstream rework. When intent is
      unclear, everything slows: sprint planning, estimation, testing, and integration.</p>
    <p>AI can help bring structure to early-stage thinking. In this phase, product and business stakeholders remain
      central, with AI helping accelerate discovery, alignment, prioritization, and readiness before implementation begins.</p>
    <p>Market signals, research notes, and competitive inputs can be synthesized quickly and business capabilities
      can be mapped to measurable OKRs. Additionally, features can be decomposed into user stories with acceptance
      criteria, risks, and dependencies.</p>
    <p>Instead of starting engineering with loosely formed epics, teams generate:</p>
    <ul>
      <li>Planning-tool-ready feature sets</li>
      <li>Clear Gherkin-based acceptance criteria</li>
      <li>High-level architecture and integration diagrams</li>
      <li>Dependency maps and risk registers</li>
      <li>Preliminary estimations and readiness assessments</li>
    </ul>
    <p>AI accelerates artifact creation, while humans validate feasibility and tradeoffs. The goal is not more
      documentation but clearer intent. When done well, this phase reduces ambiguity, improves sprint
      predictability, and shortens time-to-value.</p>
    <h2 id="2-verify--specify">2. Verify + Specify</h2>
    <p>Goal: Convert validated features into implementation-ready specifications with measurable completeness.</p>
    <p>If “Envision” reduces ambiguity, “Verify” eliminates hidden risk. Most delivery failures don’t originate in
      code. They originate in incomplete or misaligned specifications — hidden edge cases, undocumented assumptions,
      missing non-functional requirements, or late-discovered integration constraints.</p>
    <p>In this phase, AI acts as a systematic reviewer. User stories are rigorously evaluated against structured quality
      criteria, with acceptance criteria strengthened, dependencies validated, and data flows and observability
      requirements clearly defined. In parallel, architecture alignment is assessed early, diagrams are reviewed
      to close logical gaps, and integration risks are surfaced—so scaling and performance considerations are already
      modeled before sprint one begins.</p>
    <p>Rather than relying solely on human review cycles, AI can serve as a second-pass auditor, identifying patterns
      of ambiguity across large backlogs.</p>
    <p>This can help shift readiness from assumption to validation and support informed estimation. Historical velocity
      patterns and complexity comparisons inform sprint shaping and capacity modeling. The result is AI that does not replace
      team judgment but helps augment it with pattern recognition.</p>
    <p>Before moving forward, readiness should be measurable:</p>
    <ul>
      <li>Complete acceptance criteria</li>
      <li>Validated architecture alignment</li>
      <li>Documented dependencies and mitigation strategies</li>
      <li>Defined test strategy</li>
      <li>Clear linkage to KPIs</li>
    </ul>
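    <p>One simple way to make this checklist measurable is to score each story against it. The field names below are
      hypothetical labels for the five criteria above, and the equal weighting is an assumption of this sketch rather
      than a description of any tooling.</p>

```python
READINESS_CRITERIA = (
    "acceptance_criteria_complete",
    "architecture_alignment_validated",
    "dependencies_documented",
    "test_strategy_defined",
    "kpi_linkage_clear",
)


def readiness_report(story):
    """Return which criteria a story still misses and the share it satisfies."""
    missing = [name for name in READINESS_CRITERIA if not story.get(name, False)]
    score = (len(READINESS_CRITERIA) - len(missing)) / len(READINESS_CRITERIA)
    return {"ready": not missing, "score": score, "missing": missing}
```

    <p>A story is reported as ready only when nothing is missing; the score is there to show progress, not to replace
      team judgment.</p>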
    <p>The cultural shift is subtle but powerful: engineers may begin to focus less on interpreting requirements and
      more on executing validated intent, with effects that can compound over time.</p>
    <h2 id="3-build--integrate">3. Build + Integrate</h2>
    <p>Goal: Translate validated intent into production-grade software with speed and discipline.</p>
    <p>By the time we reach this stage, ambiguity should be minimal. Requirements are hardened, architecture is
      aligned, and dependencies are mapped.</p>
    <p>Here, AI functions as a force multiplier, not an autopilot.</p>
    <p>It can support structured code scaffolding aligned to user stories, automated unit test generation, inline
      documentation, and contract validation. When implemented correctly, boilerplate shrinks, cognitive load
      decreases, and engineers can focus on design integrity and problem solving instead of repetition.</p>
    <p>Continuous alignment with architecture becomes critical here. Architectural drift — small deviations accumulating
      over time — is one of the most expensive long-term risks in software systems.</p>
    <p>AI-assisted analysis can detect pattern deviations, inconsistent data models, unused interfaces, or emerging
      security and performance anti-patterns before they spread.</p>
    <p>Integration, often the true bottleneck in delivery, shifts left. Early contract validation, mock generation,
      and scenario simulation can reduce late-stage surprises. Breaking changes may be identified sooner, and stabilization
      cycles may shorten.</p>
    <p>CI/CD pipelines continue to evolve as well. AI can support the contextual analysis of build failures, help identify
      flaky tests, and assist in surfacing coverage gaps and quality trends. The pipeline can become more than a
      deployment mechanism — it can become an intelligent feedback engine.</p>
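    <p>As a small, non-AI baseline for the flaky-test signal mentioned above, a test can be flagged when it both passed
      and failed on the same commit. This heuristic and the <code>find_flaky_tests</code> name are assumptions for
      illustration; an AI-assisted pipeline would likely combine such a signal with richer context.</p>

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """runs: iterable of (commit, test_name, passed) tuples."""
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(bool(passed))
    # A test is flagged when one commit has seen both a pass and a fail.
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)
```

    <p>A test that fails consistently on a commit is not flagged, which keeps genuine regressions separate from
      flakiness.</p>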
    <p>When this phase operates on validated specifications, the impact is tangible:</p>
    <ul>
      <li>Shorter development cycles</li>
      <li>Reduced rework</li>
      <li>Fewer integration failures</li>
      <li>Slower technical debt accumulation</li>
      <li>Increased delivery confidence</li>
    </ul>
    <p>Engineering time shifts from correction to creation. That shift is where competitive advantage lives.</p>
    <h2 id="4-test--release">4. Test + Release</h2>
    <p>Goal: Deliver with confidence, not hope.</p>
    <p>Testing in a recharged lifecycle is continuous. Because specifications are hardened early, AI can
      generate meaningful test cases directly from acceptance criteria, surfacing edge cases sooner, optimizing
      regression suites instead of bloating them, and making traceability between requirements and coverage
      explicit. This can lead to smarter testing, because QA teams are not simply downstream validators in this
      model. With AI-assisted traceability, risk analysis, and regression optimization, quality organizations
      become more active participants in continuously improving delivery confidence across the lifecycle.</p>
    <p>Quality continues to shift from reactive detection to proactive insight. AI helps analyze defect patterns,
      build histories, performance regressions, and runtime signals to highlight risk areas before incidents occur.</p>
    <p>As a result, release decisions become more evidence-informed rather than deadline-driven. Readiness can be
      evaluated through measurable indicators: defect trends, coverage completeness, performance stability, dependency
      health, and rollback preparedness.</p>
    <p>After deployment, the loop should continue, with production serving as an ongoing source of learning. Usage patterns,
      feature adoption, and friction signals can be analyzed and fed back into the next “Envision” phase, supporting a
      more adaptive lifecycle. When Test + Release operates as an intelligent feedback system, organizations can experience
      fewer escaped defects, faster recovery, and greater stakeholder trust.</p>
    <p>Over time, this can contribute to making reliability a stronger strategic differentiator and helping build trust.</p>
    <h2 id="our-findings-from-tools-to-mindset">Our Findings: From Tools to Mindset</h2>
    <p>After applying this model, one insight became clear: the biggest challenge is not engineering complexity; it’s
      mindset. When implemented intentionally, AI can sharpen thinking, expose gaps earlier, strengthen alignment, and
      accelerate learning. But it does not replace judgment, accountability, or ownership. Those remain deeply human responsibilities.</p>
    <p>What emerges is a new way of working:</p>
    <ul>
      <li>Intent clarified before execution.</li>
      <li>Specifications validated before implementation.</li>
      <li>Integration treated as continuous.</li>
      <li>Quality made predictive.</li>
      <li>Feedback fueling the next idea.</li>
    </ul>
    <p>AI is not redefining software delivery because it writes code faster, but because it changes when and how clarity
      is achieved. When ambiguity is reduced earlier and feedback is embedded across the lifecycle, the entire system, rather
      than individual steps, accelerates. Teams that do this well can move faster, reduce rework, and deliver with
      greater confidence—not by working harder, but by operating with sharper clarity from the start. This is what it
      means to move from ideation to implementation with precision.</p>
    ]]></content><author><name></name></author><category term="ai" /><category term="software-delivery" /><category term="spec-driven" /><summary type="html"><![CDATA[AI is reshaping the lifecycle by reducing ambiguity early, strengthening alignment across teams, and accelerating the journey from ideation to implementation.]]></summary></entry><entry><title type="html">Trust Without Disclosure: Why Zero-Knowledge Proofs Could Help Build Trust in AI Agents</title><link href="https://americanexpress.io/trust-without-disclosure-why-zero-knowledge-proofs-could-help-build-trust-in-ai-agents/" rel="alternate" type="text/html" title="Trust Without Disclosure: Why Zero-Knowledge Proofs Could Help Build Trust in AI Agents" /><published>2026-05-06T00:00:00-04:00</published><updated>2026-05-06T00:00:00-04:00</updated><id>https://americanexpress.io/trust-without-disclosure-why-zero-knowledge-proofs-could-help-build-trust-in-ai-agents</id><content type="html" xml:base="https://americanexpress.io/trust-without-disclosure-why-zero-knowledge-proofs-could-help-build-trust-in-ai-agents/"><![CDATA[<p>We’re moving from systems that respond to our questions to AI agents
    that act on our behalf. In this new era, AI agents can help book travel,
    manage tasks, and coordinate across systems, with less human
    intervention at each step.</p>
  <p>This creates a practical problem: How do we trust these agents? How do
    we verify what they are allowed to do, or what they have done, without
    exposing sensitive information?</p>
  <p>Enter zero-knowledge proofs—a cryptographic technique that lets you
    prove you know something without revealing what you know. It sounds like
    a magic trick, and in many ways, it is. But unlike magic, the
    mathematics behind it are provably sound.</p>
  <h2 id="the-agentic-ai-trust-problem">The Agentic AI Trust Problem</h2>
  <p>Consider what happens if your AI assistant negotiates a deal with a
    vendor’s AI agent. Your agent needs to prove it has authorization to
    spend within a certain threshold, but revealing your exact budget gives
    the vendor leverage. The vendor’s agent needs to verify that your
    agent isn’t bluffing, but it doesn’t necessarily need to know your
    financial details.</p>
  <p>Traditional approaches fail here. Revealing everything destroys
    negotiating leverage. Revealing nothing undermines trust. We need
    something in between: proof without disclosure.</p>
  <p>This isn’t hypothetical. As organizations explore AI agents for more
    sensitive workflows across healthcare systems, financial platforms, and
    enterprise infrastructure, the question of <em>how agents prove things to
      each other</em> has become urgent.</p>
  <h2 id="proving-without-showing-the-zk-paradigm">Proving Without Showing: The ZK Paradigm</h2>
  <p>The classic illustration involves a cave with two paths, A and B, that
    meet at a magic door in the back. Peggy claims she knows the password to
    open the door. Victor wants proof, but Peggy refuses to reveal the
    password itself.</p>
  <p>The protocol: Peggy enters the cave and randomly chooses a path. Victor,
    who can’t see which path she took, calls out which path he wants her to
    exit from. If Peggy knows the password, she can always comply—she either
    exits from the path she entered or uses the door to cross to the other
    side.</p>
  <p>Each successful round cuts Victor’s doubt in half. After 20 rounds,
    there’s less than a one-in-a-million chance Peggy is faking it. Victor
    becomes statistically convinced Peggy knows the password—without ever
    learning what it is.</p>
  <p>This captures the three essential properties of zero-knowledge proofs.</p>
  <ul>
    <li>
      <p><strong>Completeness:</strong> if Peggy truly knows the password, she can always
        convince Victor.</p>
    </li>
    <li>
      <p><strong>Soundness:</strong> if Peggy doesn’t know the password, she can’t
        consistently fool Victor.</p>
    </li>
    <li>
      <p><strong>Zero-Knowledge:</strong> Victor learns nothing beyond the fact that Peggy
        knows the password.</p>
    </li>
  </ul>
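  <p>The halving of Victor’s doubt can be checked numerically. The sketch below is a minimal simulation of the
    cave protocol, assuming a prover who does not know the password and therefore passes a round only when
    Victor happens to request the path she entered. The function names are illustrative and not part of any
    library.</p>
  <div class="language-python highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>import random


def cheating_prover_passes(rounds, rng):
    """Return True if a prover WITHOUT the password survives every round."""
    for _ in range(rounds):
        entered = rng.choice("AB")    # path the prover picked before the challenge
        requested = rng.choice("AB")  # path the verifier calls out
        if entered != requested:      # she cannot cross the door without the password
            return False
    return True


def soundness_error(rounds):
    """Exact probability that a cheating prover passes all rounds: (1/2) ** rounds."""
    return 0.5 ** rounds


rng = random.Random(7)  # fixed seed so the simulation is repeatable
trials = 100_000
passed = sum(cheating_prover_passes(3, rng) for _ in range(trials))
print(f"3 rounds: simulated {passed / trials:.3f}, exact {soundness_error(3):.3f}")
print(f"20 rounds: exact {soundness_error(20):.2e}")  # 2 ** -20, about 9.54e-07
</code></pre>
    </div>
  </div>
  <p>Twenty rounds give a cheating probability of 1 in 1,048,576, which is where the “less than one in a
    million” figure comes from.</p>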
  <h2 id="from-theory-to-agentic-reality">From Theory to Agentic Reality</h2>
  <p>The cave example is interactive—it requires back-and-forth
    communication. Many modern ZK systems have evolved to support
    <em>non-interactive proofs,</em> where a prover generates a single proof that
    anyone can verify without further communication. This is essential for
    agentic AI, where agents may need to prove credentials asynchronously
    across different systems.</p>
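  <p>To make the idea of a single, self-contained proof concrete, here is a toy sketch of one well-known
    construction that the text above does not name: a Schnorr proof of knowledge made non-interactive with the
    Fiat-Shamir transform, in which a hash of the transcript replaces the verifier’s random challenge. The
    group parameters are deliberately tiny and insecure, and the function names are illustrative assumptions.
    It is not how SNARKs, STARKs, or Bulletproofs work internally.</p>
  <div class="language-python highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>import hashlib
import secrets

# Toy parameters, far too small for real use: P = 2 * Q + 1 and G has order Q modulo P.
P, Q, G = 23, 11, 2


def challenge(public_key, commitment):
    """Fiat-Shamir step: derive the challenge by hashing the public transcript."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret):
    """Return (public_key, proof) showing knowledge of secret without revealing it."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q)       # fresh randomness for every proof
    commitment = pow(G, nonce, P)
    response = (nonce + challenge(public_key, commitment) * secret) % Q
    return public_key, (commitment, response)


def verify(public_key, proof):
    """Check G**response == commitment * public_key**challenge (mod P)."""
    commitment, response = proof
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


public_key, proof = prove(secret=7)
print(verify(public_key, proof))  # the verifier needs only public_key and proof
</code></pre>
    </div>
  </div>
  <p>The prover sends one message and never hears back from the verifier, which is the property that makes
    asynchronous verification across systems possible.</p>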
  <p>Three main approaches have emerged, each with distinct trade-offs:</p>
  <h4 id="zk-snarks-compact-but-trust-dependent">zk-SNARKs: Compact but Trust-Dependent</h4>
  <p>Succinct Non-Interactive Arguments of Knowledge produce remarkably small
    proofs—around 200 bytes regardless of what’s being proven. Verification
    is fast, making them ideal for resource-constrained environments. The
    catch: they require a <em>trusted setup</em> ceremony. If this setup is
    compromised, fake proofs become possible.</p>
  <blockquote>
    <p>The trusted setup challenge: SNARKs require a one-time ceremony where
      multiple parties jointly generate cryptographic parameters. The setup
      remains secure as long as a single participant acts honestly and
      destroys their contribution—only if every participant colluded to
      combine their secret inputs could proofs be forged. This 1-of-N
      security model is why ceremonies involve many independent
      participants, but the coordination required is operationally complex
      for agentic systems that need rapid, dynamic deployment. Newer
      “universal setup” approaches (like Plonk) reduce this burden but don’t
      eliminate it entirely.</p>
  </blockquote>
  <h4 id="zk-starks-transparent-and-post-quantum-friendly">zk-STARKs: Transparent and Post-Quantum Friendly</h4>
  <p>Scalable Transparent Arguments of Knowledge eliminate the trusted setup
    entirely. Everything needed to verify proofs is publicly derivable.
    They’re also built on hash functions rather than elliptic curves, making
    them more resistant to quantum computing attacks. The trade-off: larger
    proofs (tens to hundreds of kilobytes instead of a few hundred bytes)
    and more computational overhead, which can increase verification time
    and on-chain costs.</p>
  <h4 id="bulletproofs-efficient-and-setup-free">Bulletproofs: Efficient and Setup-Free</h4>
  <p>Bulletproofs require no trusted setup and are particularly well-suited
    for proving that a value falls within a certain range—without revealing
    the value itself. Proof size grows slowly relative to what’s being
    proven, keeping them practical even in constrained environments.</p>
  <h2 id="performance-reality-async-over-real-time">Performance Reality: Async Over Real-Time</h2>
  <p>Proof generation takes time—seconds to minutes depending on circuit
    complexity. Today, this often makes ZK proofs better suited for
    asynchronous workflows: pre-flight credential checks, batch audit
    generation, or background compliance verification. An agent negotiating
    a contract can generate proofs between message exchanges; an agent
    executing millisecond trades cannot. Hardware acceleration is closing
    this gap but hasn’t eliminated it.</p>
  <p><img src="../_post_assets/trust-without-disclosure-why-zero-knowledge-proofs-could-help-build-trust-in-ai-agents/img/zero-knowledge.jpg" alt="The Zero-Knowledge Frontier" /></p>
  <h2 id="where-zk-proofs-meet-agentic-ai">Where ZK Proofs Meet Agentic AI</h2>
  <p>The intersection of zero-knowledge proofs and agentic AI opens
    possibilities that neither technology enables alone:</p>
  <h4 id="agent-to-agent-authentication">Agent-to-Agent Authentication</h4>
  <blockquote>
    <p>When AI agents interact, they need to verify each other’s capabilities
      and authorizations. An agent could prove it’s authorized to access
      certain data, that its operation falls within specified parameters, or
      that it meets the requirements set by the receiving system—all without
      revealing the underlying credentials or system architecture.</p>
  </blockquote>
  <h4 id="verifiable-agent-reasoning">Verifiable Agent Reasoning</h4>
  <blockquote>
    <p>One of the challenges with AI agents is understanding <em>why</em> they made
      certain decisions. ZK proofs could allow an agent to prove its
      reasoning followed certain rules or constraints without exposing its
      full reasoning chain, protecting proprietary models while enabling
      accountability.</p>
  </blockquote>
  <h4 id="privacy-preserving-collaboration">Privacy-Preserving Collaboration</h4>
  <blockquote>
    <p>Multiple AI agents working together often need to share information
      selectively. A medical AI agent could prove that a patient meets
      defined eligibility criteria without revealing their complete medical
      history. A financial AI agent could prove that a transaction falls
      within approved limits without exposing full account details.</p>
  </blockquote>
  <h4 id="audit-without-surveillance">Audit Without Surveillance</h4>
  <blockquote>
    <p>Regulators and compliance systems need to verify AI agents operate
      within bounds, but constant surveillance creates privacy and
      competitive concerns. ZK proofs enable agents to generate audit trails
      that support compliance audits without exposing operational details.</p>
  </blockquote>
  <h2 id="real-world-adoption-dids-vcs-and-beyond">Real-World Adoption: DIDs, VCs, and Beyond</h2>
  <p>Some Verifiable Credentials (VCs) and Decentralized Identifiers (DIDs)
    already leverage ZK proofs in production environments. Standardized
    credential frameworks and digital identity wallet initiatives are
    enabling selective disclosure—proving “I’m over 18” or “I hold
    certification X” without exposing full identity documents. Agentic
    commerce frameworks are now exploring VCs as the trust substrate for
    agent-to-agent transactions.</p>
  <p>On an emerging frontier, ZK circuits are being developed that allow
    model creators to prove their training data was used under selected
    licensing or data-governance requirements—without revealing the dataset
    itself. As regulators increase scrutiny of AI training practices, this
    capability becomes a potential differentiator.</p>
  <h2 id="current-limitations">Current Limitations</h2>
  <p>Honest assessment is essential. Several constraints limit immediate
    deployment:</p>
  <ol>
    <li>
      <p><strong>Tooling fragmentation:</strong> Proofs generated in one system (Circom)
        may not readily verify in another (Noir) without translation.
        Portability across agentic platforms—where Agent A’s proof must
        verify on Agent B’s stack—remains immature.</p>
    </li>
    <li>
      <p><strong>Blockchain dependency:</strong> Many of the most mature ZK
        implementations emerged from blockchain infrastructure (Ethereum
        L2s, Zcash, Mina). Enterprise tooling outside crypto is maturing but
        early-stage.</p>
    </li>
    <li>
      <p><strong>Computational overhead:</strong> Proof generation remains
        resource-intensive. Better suited for high-value, asynchronous
        verification than real-time decision loops.</p>
    </li>
    <li>
      <p><strong>Standards gap:</strong> There is not yet a broadly adopted standard for
        ZK-based trust in AI agent interactions. W3C’s DID and Verifiable
        Credentials specs provide the most mature foundation—already
        referenced by governments (EU eIDAS 2.0) and enterprises. The
        Decentralized Identity Foundation (DIF) and Internet Identity
        Workshop (IIW) are convening efforts, but agent-to-agent trust
        protocols remain undefined.</p>
    </li>
  </ol>
  <h2 id="looking-forward">Looking Forward</h2>
  <p>The trust infrastructure for AI agents is still catching up to their
    capabilities. Zero-knowledge proofs represent one promising
    direction—offering a mechanism to establish verifiable trust without
    requiring full disclosure of underlying data.</p>
  <p>Early convergence is visible. ZK-based identity frameworks are being
    explored as a way for agents to assert credentials selectively.
    Verifiable computation approaches could allow an agent to demonstrate
    what code it ran and on what inputs—shifting the basis of trust from
    assertion to proof. Standards work is beginning to examine how these
    tools might support compliant AI operations across different regulatory
    contexts.</p>
  <p>Whether and how quickly these approaches are adopted remains an open
    question. But the underlying cryptographic primitives are well-studied,
    and the problems they address are real.</p>
  <p>The cave example showed how to prove you know a password without
    revealing it. The agentic AI era presents an opportunity to scale that
    principle to everything agents do: proving authorization, proving
    compliance, proving correctness—all without disclosure.</p>
  <p>What began as an elegant mathematical curiosity in 1985 may become part
    of the trust infrastructure for a world where autonomous agents act on
    our behalf. An idea that once seemed like magic may prove increasingly
    practical as autonomous systems take on more responsibility.</p>
  ]]></content><author><name>Pratyaksh Gupta</name></author><category term="agentic" /><category term="privacy" /><category term="cryptography" /><summary type="html"><![CDATA[As AI agents begin operating across systems with greater autonomy, ZK proofs may help verify specific claims or permissions without exposing sensitive data.]]></summary></entry><entry><title type="html">Building Trust in AI-Powered Transactions with Amex Agentic Commerce Experiences (ACE) Developer Kit</title><link href="https://americanexpress.io/building-trust-in-ai-powered-transactions-with-amex-agentic-commerce-experiences/" rel="alternate" type="text/html" title="Building Trust in AI-Powered Transactions with Amex Agentic Commerce Experiences (ACE) Developer Kit" /><published>2026-04-22T00:00:00-04:00</published><updated>2026-04-22T00:00:00-04:00</updated><id>https://americanexpress.io/building-trust-in-ai-powered-transactions-with-amex-agentic-commerce-experiences</id><content type="html" xml:base="https://americanexpress.io/building-trust-in-ai-powered-transactions-with-amex-agentic-commerce-experiences/"><![CDATA[<h2 id="introduction-from-user-driven-to-agent-driven-commerce">Introduction: From User-Driven to Agent-Driven Commerce</h2>
  <p>What if purchases could be made on a Card Member’s behalf by an agent that understands what they need, when they need
    it, and how to execute the transaction?</p>
  <p>In this scenario, the Card Member doesn’t explicitly add items to a cart or tap “Buy Now.” Instead, an agent
    could recommend options and complete a purchase using an American Express account, based on permissions defined
    by the Card Member.</p>
  <p>At American Express, we’re building for this shift with the Agentic Commerce Experiences (ACE) Developer Kit.</p>
  <p>This shift challenges a core assumption of how payments work today.</p>
  <p>Traditional systems are built around user actions where a person browses, decides, and executes each step of a
    transaction. As agents begin to act on behalf of users, that interaction model may no longer hold. In a more
    autonomous agentic AI system, instead of repeatedly translating intent into action, the user defines intent
    once, and the agent continuously evaluates context, makes decisions, and executes when conditions are met. This
    is the new model the Amex Agentic Commerce Experiences (ACE) Developer Kit is designed to enable—bringing
    intent-driven, agent-powered transactions onto the American Express network with trust and control.</p>
  <h2 id="the-core-challenge-enabling-delegation">The Core Challenge: Enabling Delegation</h2>
  <p>This evolution introduces a complex problem: how do we help ensure that any agent taking delegated actions
    is explicitly authorized, controlled, and accountable?</p>
  <p>How can an agent acting on a customer’s behalf prove it has the required authority to make purchases? How can
    guardrails be set for the agent’s actions? Who is accountable if the agent makes a mistake? These questions require
    the payment ecosystem to adapt for agent-powered transactions.</p>
  <h2 id="what-it-takes-core-capabilities">What it Takes: Core Capabilities</h2>
  <p>As we explored these challenges in building the ACE Developer Kit, it became clear that enabling AI-powered payments
    required a new set of capabilities.</p>
  <ol>
    <li>
      <p><em>Establishing Trust Through Identity and Enrollment</em></p>
      <p>Trust begins with a clear, explicit setup: the Card Member provides their payment instrument for agents registered
        with Amex, completes issuer authentication, and defines controls on how that instrument can be used. This
        interaction model is supported by capabilities such as agent registration and account enablement, which help
        create a verifiable relationship between the Card Member, the agent, and the payment instrument.</p>
    </li>
    <li>
      <p><em>Representing Intent as Enforceable Boundaries</em></p>
      <p>At the time of purchase, the agent may play a role in both discovery and execution—recommending what to buy,
        selecting from relevant merchants, and completing the transaction. These decisions are guided by natural language
        instructions that express the Card Member’s intent, which ultimately helps determine what is purchased, how much
        can be spent, and with which merchants.</p>
      <p>With that in mind, we built an interaction model where intent captures qualifiers such as:</p>
      <ul>
        <li>What should be purchased</li>
        <li>Where it could be purchased from</li>
        <li>How much can be spent</li>
        <li>When additional approvals are required</li>
      </ul>
      <p>Intent helps define the boundaries within which the agent can operate, and the ACE Developer Kit helps make
        those boundaries enforceable in practice.</p>
    </li>
    <li>
      <p><em>Enforcing Boundaries at Execution</em></p>
      <p>Delegation often involves enforced boundaries, including spending limits, merchant restrictions, frequency
        controls, and conditional execution. These are applied at execution, helping to ensure agents operate within
        defined boundaries.</p>
    </li>
    <li>
      <p><em>Securing Execution Across the Transaction Flow</em></p>
      <p>At the payment authorization stage, payment credentials are designed to be used in a limited and controlled
        manner; actions are tied to verified intent, while execution is auditable and traceable.</p>
      <p>The ACE Kit is designed to support this through:</p>
      <ul>
        <li>Scoped payment credentials</li>
        <li>Short-lived authorization artifacts</li>
        <li>Strong verification mechanisms</li>
      </ul>
      <p>In this way, the agent can complete a payment within a defined framework.</p>
    </li>
  </ol>
  <h2 id="what-this-means-for-developers">What This Means for Developers</h2>
  <p>Instead of building complex payment and risk infrastructure from scratch, developers can integrate a framework that helps manage identity, intent, and execution in a consistent way.
    As the platform evolves, additional tools will further simplify integration and help accelerate development.</p>
  <h2 id="how-this-works-in-practice">How This Works in Practice</h2>
  <p>When these capabilities come together, an adapted model of payments emerges. In the ACE Developer Kit, this
    interaction model is implemented through a sequence of coordinated steps.</p>
  <p>Intent is first captured and stored as a structured contract. At the time of payment credential
    issuance, Amex generates a scoped credential tied to that intent.</p>
  <p>Before a payment credential is generated, the ACE Kit is designed to validate:</p>
  <ul>
    <li>That the intent is still valid</li>
    <li>That constraints are satisfied</li>
  </ul>
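  <p>As an illustration only, the two checks above might look like the following sketch. Every name here
    (<code class="language-plaintext highlighter-rouge">Intent</code>,
    <code class="language-plaintext highlighter-rouge">PurchaseRequest</code>,
    <code class="language-plaintext highlighter-rouge">validation_errors</code>) and every field is a hypothetical
    stand-in chosen for this example; none of them is taken from the ACE Developer Kit APIs.</p>
  <div class="language-python highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal


@dataclass(frozen=True)
class Intent:
    """Hypothetical structured contract capturing the Card Member's boundaries."""
    intent_id: str
    max_amount: Decimal
    allowed_merchants: frozenset
    expires_at: datetime


@dataclass(frozen=True)
class PurchaseRequest:
    """Hypothetical request an agent makes when it wants a scoped credential."""
    intent_id: str
    merchant: str
    amount: Decimal


def validation_errors(intent, request, now):
    """Return reasons to refuse a credential; an empty list means every check passed."""
    errors = []
    if request.intent_id != intent.intent_id:
        errors.append("request is not bound to this intent")
    if now >= intent.expires_at:
        errors.append("intent is no longer valid")
    if request.amount > intent.max_amount:
        errors.append("amount exceeds spending limit")
    if request.merchant not in intent.allowed_merchants:
        errors.append("merchant not permitted")
    return errors


now = datetime(2026, 4, 22, tzinfo=timezone.utc)
intent = Intent("intent-123", Decimal("150.00"), frozenset({"example-grocer"}),
                now + timedelta(days=7))
within = PurchaseRequest("intent-123", "example-grocer", Decimal("42.50"))
too_much = PurchaseRequest("intent-123", "example-grocer", Decimal("500.00"))
print(validation_errors(intent, within, now))    # prints []
print(validation_errors(intent, too_much, now))  # prints ['amount exceeds spending limit']
</code></pre>
    </div>
  </div>
  <p>Returning the full list of failed checks, rather than stopping at the first, is a design choice for this
    sketch that makes refusals easier to explain to the agent and the Card Member.</p>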
  <h2 id="partner-integration-supporting-existing-commerce-flows">Partner Integration: Supporting Existing Commerce Flows</h2>
  <p>The ACE Developer Kit is designed to integrate with partner environments—including AI agent providers, platforms, and merchants—without requiring fundamental changes
    to their existing payment infrastructure. Partners interact with the Kit through a combination of synchronous
    APIs and asynchronous notifications. API-driven interactions allow partners to initiate enrollment, create
    intent, and request payment credentials, while event-driven notifications provide updates on state transitions
    such as authentication completion or transaction outcomes. This dual model allows partners to operate in both
    request-driven and event-driven environments, depending on their architecture.</p>
  <p>In a typical flow, the partner initiates a request to create or update an intent. Amex evaluates the request,
    applies risk controls, and determines whether criteria are met to permit execution. Scoped payment credentials
    are then issued and used by the merchant to process the transaction through existing payment rails.</p>
  <p><em>Security Protocols and Design Patterns</em></p>
  <p>Security is a foundational element of this architecture and is implemented across all stages of the lifecycle.</p>
  <p>All API interactions are authenticated using OAuth and mutual TLS, ensuring that each request is associated with
    a verified partner or agent identity. This allows access to be scoped, monitored, and revoked as needed.</p>
  <p>Sensitive payloads are protected using industry-standard encryption mechanisms, helping ensure account details
    and credentials remain protected in transit.</p>
  <p>The platform follows a zero-trust model with respect to client input. No request is accepted based solely on
    client-provided data. Instead, each step is validated using signed artifacts that bind to a specific
    intent. These artifacts are short-lived and include protections against replay, ensuring that requests cannot be
    reused or forged. This combination of authentication, encryption, tokenization, and verification allows ACE to
    enforce strong security guarantees without introducing unnecessary friction for developers.</p>
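  <p>The pattern of short-lived artifacts that are bound to an intent and protected against replay can be
    sketched as follows. This is a simplified illustration under stated assumptions, not the ACE
    implementation: an HMAC stands in for whatever signing scheme the platform actually uses, the replay cache
    is an in-memory set, and all names, fields, and the 60-second lifetime are hypothetical.</p>
  <div class="language-python highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical key held only by the verifier
MAX_AGE_SECONDS = 60                  # hypothetical artifact lifetime
_seen_nonces = set()                  # in-memory replay cache, for this sketch only


def _sign(intent_id, issued_at, nonce):
    """Compute an HMAC-SHA256 tag over the fields the artifact is bound to."""
    message = f"{intent_id}|{issued_at}|{nonce}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_artifact(intent_id, now):
    """Create an artifact bound to one intent, with an issue time and a one-time nonce."""
    nonce = secrets.token_hex(16)
    return {"intent_id": intent_id, "issued_at": now, "nonce": nonce,
            "signature": _sign(intent_id, now, nonce)}


def accept_artifact(artifact, expected_intent_id, now):
    """Accept only an untampered, fresh, first-use artifact for the expected intent."""
    expected = _sign(artifact["intent_id"], artifact["issued_at"], artifact["nonce"])
    if not hmac.compare_digest(expected, artifact["signature"]):
        return False  # forged or tampered
    if artifact["intent_id"] != expected_intent_id:
        return False  # bound to a different intent
    age = now - artifact["issued_at"]
    if age > MAX_AGE_SECONDS or 0 > age:
        return False  # expired, or issued in the future
    if artifact["nonce"] in _seen_nonces:
        return False  # replayed
    _seen_nonces.add(artifact["nonce"])
    return True


artifact = issue_artifact("intent-123", now=1000)
print(accept_artifact(artifact, "intent-123", now=1010))  # first use is accepted
print(accept_artifact(artifact, "intent-123", now=1011))  # the same artifact is rejected
</code></pre>
    </div>
  </div>
  <p>A production system would typically keep the replay cache in shared storage and expire entries once the
    artifact lifetime has passed.</p>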
  <p><em>Scaling Across Ecosystems</em></p>
  <p>As adoption grows, the ACE Developer Kit is designed to scale and be interoperable with different standards
    and models.</p>
  <p>By decoupling intent and credential issuance, and keeping the transaction processing the same for merchants, ACE
    avoids dependency across components, allowing each to scale independently. This approach enables partners to
    onboard incrementally, extend their use cases over time, and operate at scale without requiring changes to the
    core model.</p>
  <p><em>Developer Experience and Integration Simplicity</em></p>
  <p>While the ACE Developer Kit enforces complex validation and security controls, the integration model is
    intentionally designed to remain straightforward. Developers interact with a small set of well-defined
    capabilities: enrollment, intent creation, credential retrieval, and lifecycle management, which map cleanly
    to existing application flows. This reduces the need for custom orchestration logic and allows developers to focus
    on building agentic experiences.</p>
  <p>To simplify integration, the platform will eventually provide supporting tools such as SDKs, Agent Skills, an
    MCP server, and reference implementations. These abstractions encapsulate common patterns, allowing developers to
    focus on building user experiences rather than managing low-level details. Structured error responses and
    consistent request and response patterns help ensure that integrations are predictable and easier to debug.</p>
  <p>This balance between strong controls and simple integration is critical for enabling adoption at scale.</p>
  <h2 id="what-we-learned-key-principles">What We Learned: Key Principles</h2>
  <p>As we worked through these challenges, a few principles became clear.</p>
  <ul>
    <li>Delegation should always be explicit and verifiable, rather than implied.</li>
    <li>Intent should capture the Card Member’s goal in a way that can be enforced.</li>
    <li>Payment credentials should be scoped to the intent.</li>
  </ul>
  <p>These principles shaped how we approached the problem.</p>
  <h2 id="conclusion">Conclusion</h2>
  <p>The ACE Developer Kit brings a new payments model to life, providing developers with the capabilities
    to enable secure delegation, enforceable intent, and controlled execution on the American Express network.</p>
  <p>Developers can explore the APIs, integration patterns, and supporting tools on the American Express developer
    portal: https://developer.americanexpress.com</p>
  ]]></content><author><name></name></author><category term="agentic-ai" /><category term="agentic-commerce" /><category term="partner-integrations" /><summary type="html"><![CDATA[A model for secure, intent-driven transactions executed by agents on behalf of Card Members.]]></summary></entry><entry><title type="html">Optimizing Istio for Large-Scale Enterprise Applications</title><link href="https://americanexpress.io/optimizing-istio-for-large-scale-enterprise-applications/" rel="alternate" type="text/html" title="Optimizing Istio for Large-Scale Enterprise Applications" /><published>2026-03-30T00:00:00-04:00</published><updated>2026-03-30T00:00:00-04:00</updated><id>https://americanexpress.io/optimizing-istio-for-large-scale-enterprise-applications</id><content type="html" xml:base="https://americanexpress.io/optimizing-istio-for-large-scale-enterprise-applications/"><![CDATA[<h2 id="overview">Overview</h2>
  <p>In today’s rapidly evolving cloud-native application landscape, adopting service meshes has become vital for effectively managing the complexities inherent in microservices architectures. Among the leading solutions, Istio stands out by offering a comprehensive suite of features, including traffic management, security, and observability.</p>
  <p>If a large enterprise is expanding its use of Istio, performance optimization should sit front and center in the overall implementation strategy. Below, I’ll delve into proven strategies for enhancing Istio’s performance in large enterprises.</p>
  <h2 id="sidecar-resource-usage-and-sizing">Sidecar resource usage and sizing</h2>
  <p>Managing Istio sidecar resources can pose significant challenges that often go unnoticed, potentially leading to application issues if not addressed properly. The default resource requests for the sidecar are 128Mi for memory and 100m for CPU, with limits set at 1Gi for memory and 2 cores for CPU.</p>
  <h3 id="when-to-update-cpu-requests">When to Update CPU Requests</h3>
  <ul>
    <li>The Envoy proxy consumes approximately 0.5 vCPU per 1,000 requests per second. CPU requests should be increased when transactions per second (TPS) are high and the Istio sidecar approaches its configured CPU limit.</li>
  </ul>
  <h3 id="when-to-update-memory-requests">When to Update Memory Requests</h3>
  <ul>
    <li>The Envoy proxy uses 50 MB of memory per 1,000 requests per second going through the proxy. When numerous entries—such as egress or import and export resources—are added to the namespace, the sidecar will require additional memory to manage these configurations effectively.</li>
  </ul>
  <p>The sample deployment configuration below provides a guide for modifying resource requests and limits. It’s crucial to specify limits for both CPU and memory; omitting these will result in limits being set to unlimited, which could lead to resource contention and instability.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">spec</span><span class="pi">:</span>
  <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">matchLabels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">myapp</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">myapp</span>
      <span class="na">annotations</span><span class="pi">:</span>
        <span class="na">sidecar.istio.io/proxyMemoryLimit</span><span class="pi">:</span> <span class="s">3Gi</span>
        <span class="na">sidecar.istio.io/proxyCPULimit</span><span class="pi">:</span> <span class="s1">'</span><span class="s">3'</span>
        <span class="na">sidecar.istio.io/proxyCPU</span><span class="pi">:</span> <span class="s1">'</span><span class="s">1'</span>
        <span class="na">sidecar.istio.io/proxyMemory</span><span class="pi">:</span> <span class="s">2Gi</span>
</code></pre>
    </div>
  </div>
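  <p>The per-1,000-requests figures quoted above can be turned into a rough starting estimate before tuning
    the annotations. The helper below is a back-of-the-envelope sketch with an illustrative function name. It
    covers only the traffic-driven portion and ignores baseline usage and the extra memory needed for large
    mesh configurations, so treat the output as a floor to validate against real metrics.</p>
  <div class="language-python highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>import math

CPU_PER_1000_RPS = 0.5       # vCPU per 1,000 requests per second, as quoted above
MEMORY_MB_PER_1000_RPS = 50  # MB per 1,000 requests per second, as quoted above


def estimate_sidecar_load(peak_rps):
    """Estimate traffic-driven Envoy CPU (millicores) and memory (MB) at peak_rps."""
    thousands = peak_rps / 1000
    cpu_millicores = math.ceil(thousands * CPU_PER_1000_RPS * 1000)
    memory_mb = math.ceil(thousands * MEMORY_MB_PER_1000_RPS)
    return cpu_millicores, memory_mb


print(estimate_sidecar_load(3000))  # prints (1500, 150)
</code></pre>
    </div>
  </div>
  <p>At 3,000 requests per second, for example, the estimate of 1,500 millicores already exceeds a CPU request
    of 1 core, which suggests raising the request and checking that the limit leaves headroom.</p>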
  <h2 id="when-to-use-l4-over-l7">When to use L4 over L7</h2>
  <p>Istio is capable of handling both Layer 7 (L7) and Layer 4 (L4) communications during pod-to-pod interactions, depending on the protocol specified for the destination Kubernetes service. If the <code class="language-plaintext highlighter-rouge">appProtocol</code> is set to ‘tcp’, Istio treats the connection to that service as an L4 connection; otherwise, it is classified as L7.</p>
  <p>In high-traffic scenarios where multiple hops are required between microservices before reaching the final response, each hop adds additional latency. If L7 controls are not required, disabling them can improve latency when using Istio.</p>
  <table>
    <thead>
      <tr>
        <th>Layer</th>
        <th>Use When</th>
        <th>Examples</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>L7</td>
        <td>Advanced traffic routing, observability, or security for HTTP/HTTPS traffic is needed.</td>
        <td>Canary deployments, API routing, fault injection.</td>
      </tr>
      <tr>
        <td>L4</td>
        <td>Protocol-agnostic traffic management or low-latency handling for non-HTTP protocols is needed.</td>
        <td>Database traffic, gRPC, streaming services.</td>
      </tr>
    </tbody>
  </table>
  <h3 id="l7-destination-appprotocol-as-http">L7 destination [appProtocol as ‘http’]</h3>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">Service</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">myapp</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">ipFamilies</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">IPv4</span>
  <span class="na">ports</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">http-8080</span>
      <span class="na">protocol</span><span class="pi">:</span> <span class="s">TCP</span>
      <span class="na">appProtocol</span><span class="pi">:</span> <span class="s">http</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">8080</span>
      <span class="na">targetPort</span><span class="pi">:</span> <span class="m">8080</span>
  <span class="na">internalTrafficPolicy</span><span class="pi">:</span> <span class="s">Cluster</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">ClusterIP</span>
  <span class="na">ipFamilyPolicy</span><span class="pi">:</span> <span class="s">SingleStack</span>
  <span class="na">sessionAffinity</span><span class="pi">:</span> <span class="s">None</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">myapp</span>
</code></pre>
    </div>
  </div>
  <h3 id="l4-destination--appprotocol-as-tcp">L4 destination [appProtocol as ‘tcp’]:</h3>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">kind</span><span class="pi">:</span> <span class="s">Service</span>
<span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">myapp</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">ipFamilies</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">IPv4</span>
  <span class="na">ports</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">tcp-8080</span>
      <span class="na">protocol</span><span class="pi">:</span> <span class="s">TCP</span>
      <span class="na">appProtocol</span><span class="pi">:</span> <span class="s">tcp</span>
      <span class="na">port</span><span class="pi">:</span> <span class="m">8080</span>
      <span class="na">targetPort</span><span class="pi">:</span> <span class="m">8080</span>
  <span class="na">internalTrafficPolicy</span><span class="pi">:</span> <span class="s">Cluster</span>
  <span class="na">type</span><span class="pi">:</span> <span class="s">ClusterIP</span>
  <span class="na">ipFamilyPolicy</span><span class="pi">:</span> <span class="s">SingleStack</span>
  <span class="na">sessionAffinity</span><span class="pi">:</span> <span class="s">None</span>
  <span class="na">selector</span><span class="pi">:</span>
    <span class="na">app</span><span class="pi">:</span> <span class="s">myapp</span>
</code></pre>
    </div>
  </div>
  <h3 id="sample-test-results">Sample Test results:</h3>
  <p>In a complex multi-hop microservices architecture (approximately 15 hops), tests show that using L4 yields about 53% better response times compared to L7 for pod-to-pod communication. This highlights the importance of choosing the right traffic management layer to optimize performance in microservices architectures.</p>
  <h3 id="with-layer7">With Layer7</h3>
  <h3 id="tps-graph">TPS graph:</h3>
  <p><img src="../_post_assets/optimizing-istio-for-large-scale-enterprise-applications/img/img1.jpg" alt="img1.jpg" /></p>
  <h3 id="response-time-graph-p90-reaching-till-340ms-at-6ktps">Response time graph [P90 reaches 340 ms at 6k TPS]</h3>
  <p><img src="../_post_assets/optimizing-istio-for-large-scale-enterprise-applications/img/img2.jpg" alt="img2.jpg" /></p>
  <h3 id="with-layer4">With Layer4</h3>
  <h3 id="tps-graph-1">TPS graph</h3>
  <p><img src="../_post_assets/optimizing-istio-for-large-scale-enterprise-applications/img/img3.jpg" alt="img3.jpg" /></p>
  <h3 id="response-time-graph-p90-reaching-only-till-140ms">Response time graph [P90 reaches only 140 ms]</h3>
  <p><img src="../_post_assets/optimizing-istio-for-large-scale-enterprise-applications/img/img4.jpg" alt="img4.jpg" /></p>
  <h2 id="istio-retry-logic">Istio Retry logic</h2>
  <p>The default retry policy for the mesh retries on connect-failure, refused-stream, unavailable, cancelled, and retriable-status-codes. Be cautious about retriable-status-codes: combined with the configuration for http.StatusServiceUnavailable, it means that Istio will, by default, retry any 503 error, even one intentionally returned by the service. Applications that do not work well with the default retries should override the retry policy as described in the HTTPRetry reference:</p>
  <p><a href="https://istio.io/latest/docs/reference/config/networking/virtual-service/#HTTPRetry">https://istio.io/latest/docs/reference/config/networking/virtual-service/#HTTPRetry</a></p>
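  <p>As a minimal sketch of such an override, the <code class="language-plaintext highlighter-rouge">VirtualService</code> below sets an explicit retry policy for one destination. The resource name, host, attempt count, and timeout are illustrative assumptions, not recommended values; only the <code class="language-plaintext highlighter-rouge">retries</code> fields come from the HTTPRetry reference.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">networking.istio.io/v1beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">VirtualService</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">myapp-retries</span>   <span class="c1"># hypothetical name</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">hosts</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">myapp.mynamespace.svc.cluster.local</span>
  <span class="na">http</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">route</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">destination</span><span class="pi">:</span>
            <span class="na">host</span><span class="pi">:</span> <span class="s">myapp.mynamespace.svc.cluster.local</span>
      <span class="na">retries</span><span class="pi">:</span>
        <span class="na">attempts</span><span class="pi">:</span> <span class="m">2</span>            <span class="c1"># illustrative value</span>
        <span class="na">perTryTimeout</span><span class="pi">:</span> <span class="s">2s</span>      <span class="c1"># illustrative value</span>
        <span class="c1"># retriable-status-codes is left out, so 503 responses returned by the service should not be retried</span>
        <span class="na">retryOn</span><span class="pi">:</span> <span class="s">connect-failure,refused-stream</span>
</code></pre>
    </div>
  </div>
  <p>An explicit <code class="language-plaintext highlighter-rouge">retryOn</code> list like this keeps retries limited to connection-level failures, which are generally safer to repeat than application-level errors. Verify the resulting behavior against the Istio version in use before relying on it.</p>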
  <h2 id="limiting-the-configuration-sprawl-that-needs-to-be-pushed-out">Limiting the configuration sprawl that needs to be pushed out</h2>
  <p>The most straightforward way to improve control plane performance is to minimize the scope and size of the proxy configuration pushed to the data plane. For example, consider a workload ‘myapp’. Instead of pushing configuration for every service in the mesh, push only the configuration relevant to ‘myapp’ and the services it depends on. The <code class="language-plaintext highlighter-rouge">Sidecar</code> resource gives precise control over which configuration each proxy receives, so only the necessary data is pushed to the data plane.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">networking.istio.io/v1beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Sidecar</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">sidecar-myapp</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">egress</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">hosts</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s1">'</span><span class="s">*/mysecondapp.mysecondappnamespace.svc.cluster.local'</span>
        <span class="pi">-</span> <span class="s1">'</span><span class="s">*/myexternalendpoint.com'</span>
  <span class="na">workloadSelector</span><span class="pi">:</span>
    <span class="na">labels</span><span class="pi">:</span>
      <span class="na">app</span><span class="pi">:</span> <span class="s">myapp</span>
</code></pre>
    </div>
  </div>
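  <p>The same resource can also set a mesh-wide default. The sketch below assumes that <code class="language-plaintext highlighter-rouge">istio-system</code> is the mesh root namespace and that workloads only need to reach services in their own namespace and in <code class="language-plaintext highlighter-rouge">istio-system</code>; both are assumptions to adjust for a real mesh.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">networking.istio.io/v1beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Sidecar</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">default</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">istio-system</span>   <span class="c1"># assumed root namespace; no workloadSelector, so it acts as the default</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">egress</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">hosts</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s1">'</span><span class="s">./*'</span>              <span class="c1"># services in the workload's own namespace</span>
        <span class="pi">-</span> <span class="s1">'</span><span class="s">istio-system/*'</span>   <span class="c1"># services in the Istio system namespace</span>
</code></pre>
    </div>
  </div>
  <p>Workloads that call services in other namespaces would then need a more specific <code class="language-plaintext highlighter-rouge">Sidecar</code>, such as the ‘myapp’ example above, so it is worth mapping dependencies before applying a restrictive default.</p>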
  <h2 id="smart-dns-proxy">Smart DNS proxy</h2>
  <p>Smart DNS Proxy is a feature in Istio that enhances DNS resolution for workloads within the service mesh. It allows Istio sidecars to intercept DNS queries and resolve them based on Istio’s service registry. There are a few things that need to be evaluated when setting up ServiceEntries for external access.</p>
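  <p><a href="https://istio.io/latest/docs/ops/configuration/traffic-management/dns/#proxy-dns-resolution">https://istio.io/latest/docs/ops/configuration/traffic-management/dns/#proxy-dns-resolution</a></p>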
  <ul>
    <li>Switch to <code class="language-plaintext highlighter-rouge">resolution: NONE</code> to avoid proxy DNS lookups entirely. This is suitable for many use cases.</li>
    <li>If the domains being resolved are controlled internally, increasing their TTL is recommended.</li>
    <li>If <code class="language-plaintext highlighter-rouge">ServiceEntry</code> is only needed by a few workloads, its scope can be limited with <code class="language-plaintext highlighter-rouge">exportTo</code> or a <a href="https://istio.io/latest/docs/reference/config/networking/sidecar/">Sidecar</a>.</li>
  </ul>
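  <p>As a rough illustration of the first and third points, the <code class="language-plaintext highlighter-rouge">ServiceEntry</code> below uses <code class="language-plaintext highlighter-rouge">resolution: NONE</code> and limits its visibility to its own namespace. The resource name, host, and port are placeholders chosen for this sketch.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">networking.istio.io/v1beta1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">ServiceEntry</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">myexternalendpoint</span>   <span class="c1"># hypothetical name</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">hosts</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">myexternalendpoint.com</span>
  <span class="na">location</span><span class="pi">:</span> <span class="s">MESH_EXTERNAL</span>
  <span class="na">ports</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">number</span><span class="pi">:</span> <span class="m">443</span>
      <span class="na">name</span><span class="pi">:</span> <span class="s">tls</span>
      <span class="na">protocol</span><span class="pi">:</span> <span class="s">TLS</span>
  <span class="na">resolution</span><span class="pi">:</span> <span class="s">NONE</span>   <span class="c1"># the proxy does not resolve this host itself</span>
  <span class="na">exportTo</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s1">'</span><span class="s">.'</span>            <span class="c1"># visible only within this namespace</span>
</code></pre>
    </div>
  </div>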
  <h2 id="reduce-stampede-of-dns-requests-to-upstream">Reduce stampede of DNS requests to upstream</h2>
  <p>Unlike most clients, which resolve DNS on demand and then cache the results, the Istio proxy never performs synchronous DNS requests. When a <code class="language-plaintext highlighter-rouge">ServiceEntry</code> with <code class="language-plaintext highlighter-rouge">resolution: DNS</code> is configured, the proxy periodically resolves the configured hostnames and uses the results for all requests. This interval is fixed at 30 seconds and cannot be changed. The refresh occurs even if the proxy never sends any requests to these applications, and regardless of the TTL values returned by the DNS server. In large clusters with many service entries, this can generate a heavy volume of DNS queries to upstream resolvers.</p>
  <h3 id="the-problem-synchronized-30-second-dns-refreshes">The problem: synchronized 30-second DNS refreshes</h3>
  <p>Because the DNS refresh interval is fixed and identical across all proxies, large Istio meshes can experience highly synchronized DNS lookups. When hundreds or thousands of Envoy sidecars refresh DNS at the same 30-second boundary, a classic thundering herd effect ensues, leading to problems such as:</p>
  <ul>
    <li>Burst spikes in DNS queries every 30 seconds</li>
    <li>Increased load on CoreDNS / kube-dns or external DNS providers</li>
    <li>DNS latency spikes or rate limiting from upstream DNS servers</li>
    <li>Increased control-plane pressure during mass restarts or rollouts</li>
  </ul>
  <h3 id="this-behavior-becomes-especially-problematic-when">Why this behavior becomes especially problematic:</h3>
  <ul>
    <li>During events like rolling restarts, deployments, or config pushes:
      <ul>
        <li>Many proxies restart and reinitialize Envoy clusters simultaneously</li>
        <li>DNS resolution is triggered immediately during Envoy cluster warming</li>
        <li>This stacks on top of periodic refreshes, compounding DNS pressure</li>
      </ul>
    </li>
    <li>Each Envoy sidecar independently maintains its own DNS cache and schedules periodic asynchronous resolution using a timer-driven event loop. However, since the refresh interval is deterministic and starts at roughly the same time (e.g., proxy startup or cluster warming), thousands of sidecars can align their DNS queries on the same boundary.</li>
    <li>Envoy’s DNS refresh behavior is interval-driven and does not strictly honor upstream TTLs in all cases. When TTLs are low (or effectively overridden by <code class="language-plaintext highlighter-rouge">dns_refresh_rate</code>), queries are issued more frequently than necessary.</li>
  </ul>
  <h3 id="fix-via-pilot_dns_jitter_duration">Fix via PILOT_DNS_JITTER_DURATION</h3>
  <p>Thankfully, a solution exists. <code class="language-plaintext highlighter-rouge">PILOT_DNS_JITTER_DURATION</code> is an Istio configuration setting that introduces randomized jitter into DNS refresh scheduling across proxies.</p>
  <p>Instead of all Envoy sidecars refreshing DNS exactly every 30 seconds at the same moment, Istio spreads those refreshes across a configurable time window. Each proxy still refreshes DNS on the same fixed interval, but <strong>the refreshes are intentionally de-synchronized</strong>.</p>
  <p>This means:</p>
  <ul>
    <li>The 30-second DNS refresh interval remains unchanged</li>
    <li>Refresh timing is staggered across proxies</li>
    <li>DNS query traffic is evenly distributed over time</li>
  </ul>
  <h3 id="resulting-benefits">Resulting benefits</h3>
  <ul>
    <li>Eliminates DNS query bursts caused by synchronized refreshes</li>
    <li>Reduces load and rate-limit risk on DNS infrastructure</li>
    <li>Improves DNS latency stability and P99 behavior</li>
    <li>Makes large Istio meshes more resilient during restarts and scaling events</li>
  </ul>
  <h3 id="when-to-use-it">When to use it</h3>
  <p>PILOT_DNS_JITTER_DURATION is strongly recommended for:</p>
  <ul>
    <li>Large Istio deployments with many sidecars</li>
    <li>Heavy use of ServiceEntry with resolution: DNS</li>
    <li>Environments sensitive to DNS performance or quotas</li>
  </ul>
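  <p>One possible way to set it, assuming istiod is installed from an <code class="language-plaintext highlighter-rouge">IstioOperator</code> manifest, is to add the variable to the pilot component’s environment. The <code class="language-plaintext highlighter-rouge">5s</code> value is purely illustrative; a suitable jitter window depends on mesh size and DNS capacity, and availability of the setting should be checked against the Istio version in use.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">install.istio.io/v1alpha1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">IstioOperator</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">components</span><span class="pi">:</span>
    <span class="na">pilot</span><span class="pi">:</span>
      <span class="na">k8s</span><span class="pi">:</span>
        <span class="na">env</span><span class="pi">:</span>
          <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">PILOT_DNS_JITTER_DURATION</span>
            <span class="na">value</span><span class="pi">:</span> <span class="s2">"</span><span class="s">5s"</span>   <span class="c1"># illustrative jitter window, not a recommendation</span>
</code></pre>
    </div>
  </div>
  <p>After rolling out a change like this, the DNS query rate seen by CoreDNS or the upstream resolver is a useful signal for confirming that the 30-second spikes have flattened.</p>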
  <h2 id="logging-optimization">Logging Optimization</h2>
  <p>Istio can produce a significant volume of logs when default logging is enabled at the cluster level. This excessive logging can degrade performance, increase storage costs, and complicate log analysis. A better practice is to log only errors by default while allowing application teams to manage logging settings for their own applications. The steps below outline how to implement this:</p>
  <ul>
    <li><em>Add <a href="https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#MeshConfig-ExtensionProvider">MeshConfig.ExtensionProvider.EnvoyFileAccessLogProvider</a> at the cluster level to enable cluster-wide logging:</em></li>
  </ul>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">extensionProviders</span><span class="pi">:</span>
   <span class="pi">-</span> <span class="na">envoyFileAccessLog</span><span class="pi">:</span>
       <span class="na">path</span><span class="pi">:</span> <span class="s">/dev/stdout</span>
     <span class="na">name</span><span class="pi">:</span> <span class="s">envoy-access-logs</span>
</code></pre>
    </div>
  </div>
  <ul>
    <li><em>Next, create a cluster-wide Telemetry object that logs only errors. The filter below can be adjusted to suit your requirements:</em></li>
  </ul>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">telemetry.istio.io/v1alpha1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Telemetry</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">default-exception-logging</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">istio-system</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">accessLogging</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">providers</span><span class="pi">:</span> 
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">envoy-access-logs</span>
      <span class="na">filter</span><span class="pi">:</span>
        <span class="na">expression</span><span class="pi">:</span> <span class="s2">"</span><span class="s">response.code</span><span class="nv"> </span><span class="s">&gt;=</span><span class="nv"> </span><span class="s">400</span><span class="nv"> </span><span class="s">||</span><span class="nv"> </span><span class="s">xds.cluster_name</span><span class="nv"> </span><span class="s">==</span><span class="nv"> </span><span class="s">'BlackHoleCluster'</span><span class="nv"> </span><span class="s">||</span><span class="nv"> </span><span class="s">xds.cluster_name</span><span class="nv"> </span><span class="s">==</span><span class="nv"> </span><span class="s">'PassthroughCluster'"</span>
</code></pre>
    </div>
  </div>
  <ul>
    <li><em>For production setups, it is also recommended to enable info-level access logging for Istio gateways, using the Telemetry object below:</em></li>
  </ul>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">telemetry.istio.io/v1alpha1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Telemetry</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">disable-providers-envoy-access-logs</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">istio-gateways</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">accessLogging</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">providers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">envoy-access-logs</span>
</code></pre>
    </div>
  </div>
  <ul>
    <li><em>Application teams will now see only the minimum error logs selected by the filter defined at the istio-system namespace level. When needed, they can enable full access logging for their own workloads with the Telemetry object below:</em></li>
  </ul>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">telemetry.istio.io/v1alpha1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Telemetry</span>
<span class="na">metadata</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">myapp-telemetry</span>
    <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
    <span class="na">accessLogging</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">providers</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">envoy-access-logs</span>
    <span class="na">selector</span><span class="pi">:</span>
      <span class="na">matchLabels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">myapp</span>
</code></pre>
    </div>
  </div>
  <h2 id="metrics-optimization">Metrics Optimization</h2>
  <p>Istio offers a wide range of additional metrics that can be easily enabled or disabled, as outlined below. However, leveraging these metrics comes with trade-offs in resource consumption and system complexity. Therefore, it is advisable to enable only the necessary metrics in production environments, while maintaining the flexibility to toggle metrics on or off in development and testing environments. Here’s an overview of the potential impacts:</p>
  <h3 id="enabling-additional-metrics">Enabling additional metrics:</h3>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">annotations</span><span class="pi">:</span>
  <span class="c1"># proxyStatsMatcher is a ProxyConfig field; on a pod it is set through this annotation</span>
  <span class="na">proxy.istio.io/config</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">proxyStatsMatcher:    ## this part is adding additional metrics</span>
    <span class="s">  inclusionRegexps:</span>
    <span class="s">    - ".*upstream_rq_.*"</span>
    <span class="s">    - ".*upstream_cx_.*"</span>
    <span class="s">    - ".*downstream_rq_.*"</span>
    <span class="s">    - ".*downstream_cx_.*"</span>
</code></pre>
    </div>
  </div>
  <h3 id="increased-resource-usage">Increased Resource Usage:</h3>
  <ul>
    <li>Collecting and exporting additional metrics increases CPU and memory usage for the Envoy sidecar proxies.</li>
    <li>The Prometheus server may also consume more resources to scrape, store, and query the expanded dataset.</li>
  </ul>
  <h3 id="higher-network-overhead">Higher Network Overhead:</h3>
  <ul>
    <li>Exporting metrics from sidecars to telemetry systems generates additional network traffic.</li>
    <li>This can impact overall cluster performance, especially in high-traffic environments.</li>
  </ul>
  <h3 id="storage-requirements">Storage Requirements:</h3>
  <ul>
    <li>More metrics mean larger storage requirements for time-series databases like Prometheus.</li>
    <li>Long-term retention policies may need adjustment to accommodate the increased data volume.</li>
  </ul>
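  <p>Going the other way, standard metrics that are not needed can be switched off with the Telemetry API. The sketch below assumes the built-in <code class="language-plaintext highlighter-rouge">prometheus</code> provider and disables a single metric for one namespace; the resource name and the choice of <code class="language-plaintext highlighter-rouge">REQUEST_SIZE</code> are only examples.</p>
  <div class="language-yaml highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">telemetry.istio.io/v1alpha1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Telemetry</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">trim-metrics</span>   <span class="c1"># hypothetical name</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">mynamespace</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">metrics</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">providers</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">prometheus</span>
      <span class="na">overrides</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">match</span><span class="pi">:</span>
            <span class="na">metric</span><span class="pi">:</span> <span class="s">REQUEST_SIZE</span>   <span class="c1"># example metric to drop</span>
          <span class="na">disabled</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre>
    </div>
  </div>
  <p>Before disabling a metric, confirm that no dashboard or alert depends on it.</p>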
  <h2 id="conclusion">Conclusion</h2>
  <p>Running Istio at enterprise scale requires deliberate trade-offs rather than enabling every feature by default. As environments grow, unmanaged observability, retries, sidecars, and configuration sprawl can introduce significant performance and operational overhead.</p>
  <p>A more thoughtful approach goes a long way: focus on high-signal telemetry, right-size sidecars, lean on L4 over L7 when possible, and be selective about when to use retries. Keeping configuration scope controlled is equally important to ensure the control plane remains scalable and predictable.</p>
  <p>Advanced patterns like Smart DNS proxies can be useful in targeted scenarios, but they introduce additional DNS load and should be adopted cautiously with clear justification and monitoring.</p>
  <p>Ultimately, successful Istio operations depend on continuous tuning—measuring impact, refining configurations, and evolving alongside workload and traffic changes.</p>
  <p><em>*Note: Istio is an open-source technology.</em></p>
  ]]></content><author><name>Gurpreet Singh</name></author><category term="kubernetes" /><category term="istio" /><category term="service-mesh" /><summary type="html"><![CDATA[Improving performance and scalability in complex microservices environments.]]></summary></entry><entry><title type="html">Migrating the Payments Network Twice with Zero Downtime</title><link href="https://americanexpress.io/migrating-the-payments-network-twice/" rel="alternate" type="text/html" title="Migrating the Payments Network Twice with Zero Downtime" /><published>2026-03-12T00:00:00-04:00</published><updated>2026-03-12T00:00:00-04:00</updated><id>https://americanexpress.io/migrating-the-payments-network-twice</id><content type="html" xml:base="https://americanexpress.io/migrating-the-payments-network-twice/"><![CDATA[<p>If you tuned in to Monster Scale Summit this year, you may have seen our talk on migrating the American Express Payments Network - not once, but twice — with zero customer-impacting downtime — meaning no transactions were interrupted and no planned maintenance windows were required during either migration.
    The session focused on how we moved live payments traffic reliably under strict operational constraints.
    If you missed it, the talk is available to watch on the <a href="https://www.scylladb.com/monster-scale-summit/">Monster Scale Summit website</a>.</p>
  <p>This article expands on the conference talk and dives deeper into the engineering decisions, tradeoffs, and lessons learned across both migrations.</p>
  <h2 id="context-the-payments-network">Context: The Payments Network</h2>
  <p>The payments network is a mission-critical distributed system responsible for processing critical payments traffic, including live card authorization.
    It serves as the bridge between American Express merchants, acquirers, and issuers globally.</p>
  <p>This platform must be continuously available, operate at low latency, and handle large volumes of critical traffic.</p>
  <h2 id="migration-constraints">Migration Constraints</h2>
  <p>In 2018, American Express began a multi-year modernization of our payments network, including migrating from a legacy platform to a new microservices-based architecture.</p>
  <p>A migration of this scale had to operate within several non-negotiable constraints:</p>
  <ul>
    <li>The migration had to be performed online, with no planned or unplanned downtime.</li>
    <li>The new platform had to reimplement existing payment processing logic; regressions in functionality were not acceptable.</li>
    <li>Latency, throughput, and resiliency characteristics had to remain consistent, and in some cases improve.</li>
    <li>Payment requests could not be dropped, delayed, or left unanswered.</li>
  </ul>
  <p>Not only did we need to migrate under these constraints once - we needed to do it twice.</p>
  <h2 id="migration-1-from-the-legacy-payments-network-to-the-new-platform">Migration #1: From the Legacy Payments Network to the New Platform</h2>
  <p>The first migration involved transitioning live card authorization traffic from the legacy payments network to a new, modernized platform.</p>
  <p>While the payments network is large and complex, real-time card authorization traffic is primarily handled by two subsystems: a routing layer (which we’ll refer to as the “Global Transaction Router” or “GTR”, for simplicity) and the payments processing platform.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-rt-arch.jpg" alt="High-level Real-Time Payments Network Architecture" class="center-block-migrating-payments" /></p>
  <p>Understanding these two subsystems is key to understanding how we approached the migration.</p>
  <h3 id="global-transaction-router-gtr">Global Transaction Router (GTR)</h3>
  <p>The GTR acts as the gateway into the payments network.
    Unlike typical backend platforms, card authorization traffic is primarily sent over long-lived TCP connections carrying ISO8583 messages, a message format specific to payments.</p>
  <p>The GTR manages these long-lived connections from acquirers and issuers and routes incoming transactions to the payments processing platform.
    It is also responsible for routing responses from the payments processing platform to network participants.</p>
  <p>The router intentionally implements a minimal understanding of payment protocols - just enough to make routing decisions.
    Its primary role is to make routing, failover, and traffic-shaping decisions without owning payment processing logic.</p>
  <p>Acting as the gateway, the GTR also provides centralized traffic control and resiliency for the payments network.
    It sits at the edge of the payments network and is highly specialized, optimized for low latency and high throughput.</p>
  <h3 id="payments-processing-platform">Payments Processing Platform</h3>
  <p>The payments processing platform is where the complex, business-critical payment processing logic lives.</p>
  <p>This platform is implemented as a microservices-based architecture, consisting of numerous services and databases.
    As transactions flow through the payments network, the payments processing platform validates, enriches, and transforms them.</p>
  <p>This logic has been developed and refined over many years.
    Rebuilding this logic was a significant undertaking, and ensuring parity with the legacy system was critical.</p>
  <h3 id="migration-strategy">Migration Strategy</h3>
  <p>Rebuilding the full payments network from scratch was a significant, multi-year effort.
    It involved complex processing logic, extensive edge cases, and exception handling.
    Waiting for full platform completion before migrating live traffic was not an option.
    In the meantime, any new functionality would have had to be built in both the legacy and new systems, leading to duplicated effort and increased risk of functionality drift.</p>
  <p>Instead, we broke the migration into three stages:</p>
  <ul>
    <li>Stage 1: Connection Migration</li>
    <li>Stage 2: Shadow Traffic</li>
    <li>Stage 3: Canary Routing</li>
  </ul>
  <h4 id="stage-1-connection-migration">Stage 1: Connection Migration</h4>
  <p>In the first stage, we wanted to introduce the GTR into the flow of transactions.
    This was the most critical stage of the migration - it enabled every other stage and was the first time a new component was inserted into the live traffic path.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-connection-migration.jpg" alt="Connection Migration: GTR in the Flow" class="center-block-migrating-payments-2" /></p>
  <p>When new connections landed on the GTR, it routed all traffic to the legacy payments network.
    This allowed us to introduce the GTR without requiring processing logic parity.</p>
  <p>For each incoming connection, the GTR established a corresponding connection to the legacy payments network.
    Any transaction received on the incoming connection was forwarded to the legacy payments network over the downstream connection.
    No logic, no message parsing, just simple forwarding.</p>
  <p>This approach allowed us to insert centralized traffic control and resiliency into the payments network with minimal risk.
    To reduce risk further, we migrated connections in small batches, monitoring system health and performance closely.
    Observability and metrics from the GTR were critical during this stage.</p>
  <h4 id="stage-2-shadow-traffic">Stage 2: Shadow Traffic</h4>
  <p>With the GTR in place, we were able to introduce shadow traffic to the new payments processing platform.</p>
  <p>Shadow traffic is, at its core, a replay of live production traffic.
    We deployed a dedicated production instance of the new payments processing platform and replayed a copy of live traffic to it.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-shadow-traffic.jpg" alt="Shadow Traffic Validation" class="center-block-migrating-payments-2" /></p>
  <p>If there were any functional discrepancies between the legacy and new payments processing platform, they would show up here.</p>
  <p>This shadow traffic capability allowed us to validate payment processing logic in a production-like environment without impacting live traffic.
    It did not replace traditional unit and functional testing, but rather it provided a final validation step before routing live traffic to the new platform.</p>
  <h4 id="stage-3-canary-routing">Stage 3: Canary Routing</h4>
  <p>With processing logic validated via shadow traffic and the GTR in place, we were ready to route live traffic to the new payments processing platform.</p>
  <p>We applied canary deployment principles to the platform migration.
    We extended the GTR with just enough understanding of payment protocols to make routing decisions based on transaction attributes.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-canary-routing.jpg" alt="Canary Routing in Action" class="center-block-migrating-payments-2" /></p>
  <p>This allowed us to take small percentages of live traffic and route them to the new payments processing platform.
    As functionality was ready, we identified customer segments and transaction types that could be routed to the new platform.</p>
  <p>The GTR took care of routing these transactions to the appropriate backend platform based on the canary configurations.
    All canary decisions were enforced centrally by the GTR, before transactions reached the payments processing platform.
    This canary routing capability was implemented as custom logic within the GTR to support this migration and has since become a critical component of the Payments Network architecture.</p>
  <p>We started with 1%; when everything looked good, we increased to 5%, then 10%, and so on.</p>
  <p>If anomalies were detected, we immediately reverted all routing back to the legacy payments network.
    This gradual approach allowed us to migrate live traffic with minimal risk.
    We avoided any big-bang cutovers or customer impacts.</p>
  <p>In addition to reducing risk, this approach reduced duplicated development effort.
    It allowed the platform to evolve with real traffic without needing to maintain two separate codebases for an extended period.</p>
  <h2 id="migration-2-kubernetes-infrastructure-migration">Migration #2: Kubernetes Infrastructure Migration</h2>
  <p>After the new payments processing platform was operational, we faced a second major migration that reused the same traffic control patterns established during the platform migration.
    We needed to move from a legacy Kubernetes infrastructure to a new Kubernetes environment.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-k8s-migration.jpg" alt="Brand New Platform" class="center-block-migrating-payments-2" /></p>
  <p>Due to significant differences in networking, security, and operational practices between the two environments, an in-place migration was not feasible.
    This required a full rebuild of the payments network infrastructure in the new Kubernetes environment.</p>
  <p>This meant we needed to migrate live traffic again - with zero downtime.
    Latency, throughput, and resiliency characteristics had to remain consistent as well.</p>
  <h3 id="environment-setup-and-validation">Environment Setup and Validation</h3>
  <p>The first step in this migration was establishing the new Kubernetes environment.
    We leveraged infrastructure-as-code to ensure consistency and repeatability across test and production environments.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-iac.jpg" alt="IaC Everything" class="center-block-migrating-payments-2" /></p>
  <p>Pod and service configurations were exported from our existing production environment.
    They were redefined as declarative infrastructure-as-code configurations.</p>
  <p>This approach ensured consistency across regions and environments.
    It took time to get right, but once we had a solid foundation, we could spin up new environments quickly, both for the initial migration and future expansions.
    Any new infrastructure changes now start with infrastructure-as-code definitions.</p>
  <h4 id="performance-and-resiliency-testing">Performance and Resiliency Testing</h4>
  <p>With the new environment established, we validated that it could meet our performance and resiliency requirements.
    We first established a performance baseline in our existing environment.
    We then deployed the same application versions into the new environment and ran load tests to compare performance characteristics.
    The new environment exhibited differences that required tuning.</p>
  <p>We implemented those tuning changes via infrastructure-as-code and rolled them out to all environments.</p>
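  <p>A baseline comparison of this kind can be sketched as a percentile check between two sets of latency samples. The percentiles, the 5% tolerance, and the function names below are assumptions for illustration; they are not the thresholds or tooling used in the migration.</p>

```python
import statistics


def percentile(samples, pct):
    """Return the pct-th percentile (1..99) of the samples using inclusive quantiles."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[pct - 1]


def compare(baseline_ms, candidate_ms, tolerance=0.05, pcts=(50, 95, 99)):
    """Flag percentiles where the candidate is slower than baseline by more than tolerance."""
    report = {}
    for pct in pcts:
        base = percentile(baseline_ms, pct)
        cand = percentile(candidate_ms, pct)
        report[f"p{pct}"] = {
            "baseline": base,
            "candidate": cand,
            "regressed": cand > base * (1 + tolerance),
        }
    return report
```

  <p>Comparing tail percentiles as well as the median matters here, since an environment can match on typical latency while still differing in the tail.</p>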
  <p>Resiliency testing followed a similar approach.
    We ran various failure scenarios in the existing environment, documented the results, and then ran the same scenarios in the new environment.
    Any discrepancies were investigated and resolved via infrastructure-as-code changes.</p>
  <p>Before moving any traffic, we ensured the new environment met or exceeded all performance and resiliency requirements.</p>
  <h3 id="canary--again">Canary — Again</h3>
  <p>With the new environment validated, we were ready to migrate live traffic again - with zero downtime.</p>
  <p><img src="../_post_assets/migrating-the-payments-network-twice/img/pn-canary-again.jpg" alt="More Canary" class="center-block-migrating-payments-3" /></p>
  <p>We reused the same canary routing strategy from the first migration.
    This time, we were routing traffic between two identical payments processing platforms.
    External ISO8583 connectivity continued to terminate at the edge; canary routing was applied only to internal gRPC traffic between the GTR and the payments processing platform.</p>
  <p>As we built the GTR, we implemented canary deployments leveraging Envoy Proxy and a custom control plane.
    While our initial implementation was focused on routing between different versions within the same region, we extended this capability to route between different regions.</p>
  <p>We called this multi-region canary routing.
    This allowed us to route all traffic from one region to another.
    With traffic re-routed, we were free to enable the new Kubernetes environment in the original region.</p>
  <p>Once ready, we routed percentages of traffic back to the original region, now running the new Kubernetes environment.
    We gradually increased traffic back to the original region, monitoring system health and performance closely.</p>
  <p>Observability was as critical to this step as the canary routing itself.
    Our business metrics, application logs, and application health metrics all gave us visibility into how the new environment was performing under live traffic.
    If issues were detected, we could quickly revert all traffic back to the secondary region.</p>
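  <p>The ramp-and-revert loop described above can be sketched as a small controller. This is a simplified illustration under stated assumptions: <code>set_weight</code> and <code>is_healthy</code> are hypothetical callbacks standing in for the routing control plane and the observability checks, and a real ramp would hold at each step for an observation period before moving on.</p>

```python
from typing import Callable, List


def ramp_traffic(steps: List[int],
                 set_weight: Callable[[int], None],
                 is_healthy: Callable[[], bool]) -> int:
    """Walk through the ramp steps; revert to 0 if a health check fails.

    Returns the final percentage routed to the migrated region.
    """
    for percent in steps:
        set_weight(percent)
        if not is_healthy():
            set_weight(0)   # send all traffic back to the secondary region
            return 0
    return steps[-1] if steps else 0
```

  <p>Keeping the revert path inside the same loop as the ramp means rollback uses the same mechanism as rollout, rather than a separate procedure.</p>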
  <h2 id="lessons-learned">Lessons Learned</h2>
  <p>Both migrations were significant undertakings, and we learned a lot along the way.</p>
  <h3 id="traffic-control-was-essential">Traffic Control was Essential</h3>
  <p>The GTR and Envoy Proxy-based canary routing were essential components of both migrations.
    They provided the traffic control needed to safely route live traffic between different platforms and environments.</p>
  <p>These capabilities were initially developed as glue code, but over time evolved into critical components of our payments network architecture.</p>
  <h3 id="rolling-back-is-a-first-class-capability">Rolling Back is a First-Class Capability</h3>
  <p>In both migrations, the ability to quickly and safely roll back changes was essential.
    Designing systems and processes with rollback in mind reduced risk and allowed us to respond quickly to any issues that arose.</p>
  <h3 id="invest-in-observability">Invest in Observability</h3>
  <p>Observability was critical to the success of both migrations.
    Having deep visibility into system health, performance, and business metrics allowed us to make informed decisions during the migrations.</p>
  <h3 id="shadow-traffic-is-invaluable">Shadow Traffic is Invaluable</h3>
  <p>The shadow traffic capability provided a final validation step before routing live traffic to the new payments processing platform.
    This capability was essential in identifying any unknown discrepancies between the legacy and new systems.</p>
  <p>We’ve since leveraged this capability for ongoing testing and validation of new features and changes.
    We are also using this capability to validate other downstream systems migrations.</p>
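  <p>As a rough illustration of the idea, shadow comparison can be sketched as serving the live response from one system while mirroring the request to another and recording field-level differences. The handler functions, the ignored fields, and the response shape are hypothetical; this is not a description of how our shadow traffic capability is built.</p>

```python
def shadow_compare(request, legacy_handler, new_handler,
                   ignore_fields=("trace_id", "timestamp")):
    """Serve from legacy, mirror to new, and report field-level differences."""
    primary = legacy_handler(request)
    try:
        shadow = new_handler(request)
    except Exception as exc:  # a shadow failure must never affect the live response
        return primary, {"shadow_error": repr(exc)}

    diffs = {}
    for key in sorted(set(primary) | set(shadow)):
        if key in ignore_fields:
            continue  # fields expected to differ between systems
        if primary.get(key) != shadow.get(key):
            diffs[key] = (primary.get(key), shadow.get(key))
    return primary, diffs
```

  <p>The caller always receives the legacy response; the differences are only recorded for later analysis.</p>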
  <h3 id="infrastructure-as-code-is-non-negotiable">Infrastructure-as-Code is Non-Negotiable</h3>
  <p>Leveraging infrastructure-as-code for the Kubernetes migration ensured consistency and repeatability.
    It allowed us to manage complex infrastructure changes with confidence, and it set the foundation for future expansions.</p>
  <h3 id="the-most-important-lesson">The Most Important Lesson</h3>
  <p>The most important lesson was patience and discipline.
    In payments, success is measured in reliability, even if it takes longer to get there.</p>
  ]]></content><author><name></name></author><category term="payments" /><category term="platform-engineering" /><category term="reliability" /><summary type="html"><![CDATA[The architecture and coordination that kept global transactions flowing through complex application and infrastructure changes.]]></summary></entry><entry><title type="html">When Human Feedback Is Scarce, How Do You Evaluate AI?</title><link href="https://americanexpress.io/when-human-feedback-is-scarce-how-do-you-evaluate-ai/" rel="alternate" type="text/html" title="When Human Feedback Is Scarce, How Do You Evaluate AI?" /><published>2026-03-02T00:00:00-05:00</published><updated>2026-03-02T00:00:00-05:00</updated><id>https://americanexpress.io/when-human-feedback-is-scarce-how-do-you-evaluate-ai</id><content type="html" xml:base="https://americanexpress.io/when-human-feedback-is-scarce-how-do-you-evaluate-ai/"><![CDATA[<p>Evaluating AI systems is easy… until it isn’t.</p>
  <p>For many user-facing applications like travel planning, clinical note
    drafting, or conversational agents, there is no single “right answer.”
    Quality is subjective, contextual, and often best judged by people. As
    a result, human feedback such as ratings, preferences, and real-world
    behavior is the most reliable signal we have.</p>
  <p>But in early-stage systems and research prototypes, that signal is often
    too sparse, too expensive, or too slow to guide development. This
    creates a gap between how AI systems are evaluated in research settings
    and how they need to be evaluated in real-world deployment.</p>
  <p>This challenge sits at the heart of a new ICLR paper, <a href="https://arxiv.org/abs/2512.17267"><strong><em>AutoMetrics:
          Approximate Human Judgments with Automatically Generated
          Evaluators</em></strong></a>, authored by
    researchers from American Express and Stanford HAI.</p>
  <p>The work introduces a practical open-source framework for transforming small amounts
    of human feedback into scalable and interpretable evaluation metrics,
    helping teams move from prototypes to production with greater
    confidence.</p>
  <h2 id="from-expensive-human-labels-to-practical-metrics">From Expensive Human Labels to Practical Metrics</h2>
  <p>Today, when evaluating AI systems, there is often a trade-off between two
    imperfect options:</p>
  <ul>
    <li>
      <p><strong>Human evaluation:</strong> Accurate but costly and slow.</p>
    </li>
    <li>
      <p><strong>LLM-as-a-Judge</strong>: Fast and inexpensive but can be brittle and often
        poorly aligned with what users actually care about.</p>
    </li>
  </ul>
  <p>AutoMetrics offers a third path.</p>
  <p>The key idea is simple but powerful: instead of relying on a single
    evaluator, AutoMetrics learns a weighted combination of evaluation
    metrics that best matches human judgment, using fewer than 100 feedback
    points.</p>
  <p><img src="../_post_assets/when-human-feedback-is-scarce-how-do-you-evaluate-ai/img/image1.jpg" alt="Evaluate AI" /></p>
  <p>The framework operates in four steps:</p>
  <ol>
    <li>
      <p><strong>Generate candidate metrics</strong></p>
      <p>AutoMetrics automatically creates task-specific evaluation criteria
        (e.g., clarity, usefulness, policy compliance) using LLMs.</p>
    </li>
    <li>
      <p><strong>Retrieve existing metrics from MetricBank</strong></p>
      <p>The system draws from <strong>MetricBank</strong>, a curated library of 48
        well-documented evaluation metrics spanning tasks like
        summarization, dialogue, code generation, and safety.</p>
    </li>
    <li>
      <p><strong>Learn how to combine them</strong></p>
      <p>Using lightweight regression, AutoMetrics identifies which metrics
        matter most and how they should be weighted to best predict human
        feedback.</p>
    </li>
    <li>
      <p><strong>Report interpretable evaluators</strong></p>
      <p>The output isn’t just a score—it’s a breakdown of <em>why</em> a system is
        performing well or poorly.</p>
    </li>
  </ol>
  <p>The result is an evaluation signal that is data-efficient, adaptive, and
    explainable.</p>
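  <p>The third step, learning how to combine metrics, can be sketched with a small least-squares fit. This is a simplified stand-in rather than the AutoMetrics implementation: the post says only “lightweight regression,” so ordinary least squares with an optional ridge term is an assumption, and <code>solve</code>, <code>fit_weights</code>, and <code>predict</code> are hypothetical names.</p>

```python
def solve(a, b):
    """Solve the linear system a x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(metric_scores, human_scores, ridge=1e-6):
    """Least-squares weights (plus intercept) mapping metric scores to human ratings."""
    rows = [[1.0] + list(r) for r in metric_scores]   # leading 1.0 is the intercept term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    for i in range(1, k):
        xtx[i][i] += ridge                             # small ridge penalty for stability
    xty = [sum(r[i] * y for r, y in zip(rows, human_scores)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, metric_row):
    """Score one output with the learned weighted combination."""
    return weights[0] + sum(w * m for w, m in zip(weights[1:], metric_row))
```

  <p>The learned weights are what make the evaluator interpretable in this sketch: a weight near zero suggests that a candidate metric adds little to predicting human ratings for the task.</p>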
  <h2 id="stronger-alignment-with-human-judgment">Stronger Alignment with Human Judgment</h2>
  <p>The paper analyzes AutoMetrics across five diverse tasks, including
    dialogue, product description generation, code completion, and travel
    planning. Across these settings, AutoMetrics consistently outperforms
    strong baselines.</p>
  <p><img src="../_post_assets/when-human-feedback-is-scarce-how-do-you-evaluate-ai/img/image2.jpg" alt="Evaluate AI 2" /></p>
  <p>It improves correlation with human ratings by up to 33.4% compared to
    standard LLM-as-a-Judge approaches.</p>
  <ul>
    <li>
      <p>Performance saturates with approximately 80 human feedback examples,
        making it practical even for low-data settings.</p>
    </li>
    <li>
      <p>The learned metrics remain stable under irrelevant changes and
        sensitive to real quality degradations, a key requirement for
        trustworthy evaluation.</p>
    </li>
  </ul>
  <p>In other words, AutoMetrics doesn’t just assign scores; it behaves like a
    reliable proxy for how people actually judge quality.</p>
  <h2 id="evaluation-that-can-drive-optimization">Evaluation That Can Drive Optimization</h2>
  <p>One of the most compelling findings goes beyond measurement. The authors
    show that AutoMetrics can be used as a proxy reward signal to optimize
    an AI agent, matching or even exceeding the performance of systems
    trained with a fully verifiable reward. In a realistic
    airline-assistance benchmark, agents optimized with AutoMetrics improved
    at the same rate as those trained with explicit ground-truth rewards.</p>
  <p>This opens the door to human-aligned optimization in domains where
    rewards are subjective, ambiguous, or hard to formalize.</p>
  <h2 id="why-this-matters">Why This Matters</h2>
  <p>For practitioners building real-world AI systems, AutoMetrics points to
    a future where:</p>
  <ul>
    <li>
      <p>Evaluation adapts as products evolve</p>
    </li>
    <li>
      <p>Small amounts of user feedback go much further</p>
    </li>
    <li>
      <p>Metrics are understandable enough to guide iteration, not just
        leaderboard scores</p>
    </li>
  </ul>
  <p>By releasing AutoMetrics and MetricBank as open-source tools, the
    authors aim to make adaptive, human-aligned evaluation a standard part
    of the AI development workflow.</p>
  <h2 id="looking-ahead-evaluation-that-keeps-pace-with-ai">Looking Ahead: Evaluation That Keeps Pace with AI</h2>
  <p>As AI systems move faster from prototype to production, evaluation can
    no longer be an afterthought or a bottleneck. AutoMetrics shows that
    it’s possible to ground evaluation in human judgment without requiring
    massive labeling efforts, and to do so in a way that remains explainable,
    adaptive, and actionable.</p>
  <p>The broader implication is clear: evaluation itself must become a
    learning system. By discovering what users actually value and
    translating that signal into scalable metrics, AutoMetrics reframes
    evaluation as a first-class component of AI development, rather than a
    scorecard at the end. These metrics can then be used to optimize AI
    agent configurations.</p>
  <p>For teams building AI in open-ended, user-facing domains, this work
    points toward a future where small amounts of real feedback can drive
    rapid iteration, reliable optimization, and more human-aligned systems
    from day one.</p>
  <p>As the community continues to explore adaptive evaluation, AutoMetrics
    provides both a practical toolkit and a compelling blueprint for how we
    measure progress in AI—when the only true reference is human judgment.</p>
  <p><strong>Read the full ICLR paper:</strong><br />
    <a href="https://arxiv.org/abs/2512.17267"><em>AutoMetrics: Approximate Human Judgments with Automatically Generated
        Evaluators</em></a></p>
  <p><em>Note: This research is part of an industrial affiliate program.</em></p>
  ]]></content><author><name></name></author><category term="ai" /><category term="research" /><category term="evaluations" /><summary type="html"><![CDATA[AutoMetrics turns limited human feedback into scalable, human-aligned AI evaluation.]]></summary></entry><entry><title type="html">Mastering Decision-Making in Technology</title><link href="https://americanexpress.io/mastering-decision-making-in-technology/" rel="alternate" type="text/html" title="Mastering Decision-Making in Technology" /><published>2026-02-11T00:00:00-05:00</published><updated>2026-02-11T00:00:00-05:00</updated><id>https://americanexpress.io/mastering-decision-making-in-technology</id><content type="html" xml:base="https://americanexpress.io/mastering-decision-making-in-technology/"><![CDATA[<p>We all do a common thing, every single day, especially in the fast-paced
    world of engineering leadership: making decisions. Big ones, small ones,
    the kind that keep you up at night, and the ones you barely notice.</p>
  <p>For the longest time, I prided myself on my “gut feeling” and ability to
    make quick calls. Sometimes it works spectacularly. Other times… well,
    let’s just say hindsight is 20/20, and some decisions felt more like
    stumbling in the dark than striding confidently forward.</p>
  <p>I realized that just being smart or experienced wasn’t enough. Leading a
    team, building complex systems, and navigating the business landscape
    demands more. It demands smarter decision-making. Not just faster, but
    better. I needed a process to cut through the noise and, frankly, get
    out of my own way.</p>
  <p>So, I went on a bit of a quest, researching, completing trainings and
    courses, and diving deep into the art and science of decision-making
    strategy. And wow, did I learn a few things! I want to share my journey
    and some “aha!” moments, hoping they might help you level up your own
    decision-making.</p>
  <h2 id="step-1-define-the-problem">Step 1: Define the Problem</h2>
  <p>How often have we jumped into coding a solution, only to realize later
    we misunderstood the core need? Well-defined problems lead to
    breakthrough solutions.</p>
  <p>Adopt a more rigorous Problem Definition Process:</p>
  <ul>
    <li>
      <p><strong>Problem Statement</strong>: Write it down. Is it clear? Is it actually
        multiple problems? What does success look like? Who needs to be
        involved?</p>
    </li>
    <li>
      <p><strong>Need</strong>: What’s the fundamental need? Who benefits? Why?</p>
    </li>
    <li>
      <p><strong>Justification</strong>: Does this align with our strategy? What are the
        measurable benefits? How do we ensure implementation?</p>
    </li>
    <li>
      <p><strong>Context</strong>: What have we or others tried? What are the constraints
        (tech debt, budget, regulations)?</p>
    </li>
  </ul>
  <p><em>“Instead of just fixing a slow page, we dug deeper to define the
      problem as: an API endpoint’s response times increased significantly
      over the past month, correlating with a decline in user engagement.”</em></p>
  <h2 id="step-2-choose-your-battles">Step 2: Choose Your Battles</h2>
  <p>What now? We design a better decision-making system.</p>
  <p>Not every decision needs a 10-page analysis. Focus intense effort on the
    critical, high-impact decisions. Which ones truly warrant the deep dive?
    Think of it as triaging decisions, classifying them as low-, medium-,
    or high-stakes.</p>
  <p>Leaders can get stuck treating all decisions as equal. The skill is in
    knowing when to slow down and invest more thinking versus when to move
    fast and conserve energy for the calls that matter most.</p>
  <h2 id="step-3-recognize-and-spot-biases">Step 3: Recognize and Spot Biases</h2>
  <p>We’re all biased. It’s not a moral failing; it’s just how our brains are
    wired. We take mental shortcuts (heuristics) to deal with complexity,
    but sometimes these shortcuts lead us down the wrong path. Think of it
    like wearing slightly warped glasses—the world looks almost right, but
    things are subtly off, leading to missteps.</p>
  <p>Here are the common culprits:</p>
  <ul>
    <li><strong>Action-Oriented Bias</strong>: We want to go fast, jumping into solutions
      before fully understanding the problem. We need to embrace uncertainty
      and explore before executing.</li>
  </ul>
  <blockquote>
    <p><em>“I dove right into coding a complex feature request without writing a
        proper design doc. Halfway through, it hit me—I’d completely missed
        some crucial edge cases and overlooked key non-functional
        requirements.”</em></p>
  </blockquote>
  <ul>
    <li><strong>Pattern-Recognition Bias</strong>: Seeing patterns where none exist, often
      based on past (but maybe irrelevant) experiences. Like assuming a new
      coding challenge is exactly like one you solved five years ago,
      ignoring crucial differences. Change the angle and look from a
      different perspective.</li>
  </ul>
  <blockquote>
    <p><em>“I caught myself assuming a performance issue must be the database
        again, without even checking the caching layers or network latency
        first. I defaulted to my past experiences instead of considering other
        possibilities.”</em></p>
  </blockquote>
  <ul>
    <li><strong>Stability Bias</strong>: Preferring the status quo even when change is
      needed. “If it ain’t broke, don’t fix it” can be dangerous in a
      dynamic environment. Sometimes, you need to shake things up!</li>
  </ul>
  <blockquote>
    <p><em>“Hesitating to upgrade frameworks that are outdated and lack
        essential features because it feels too disruptive.”</em></p>
  </blockquote>
  <ul>
    <li><strong>Interest Bias</strong>: This one is very common for tech leaders, letting
      personal or team incentives cloud judgment. Is this really the best
      technical solution, or does it just let my team use that shiny new
      framework they love? It is important to make those interests explicit!</li>
  </ul>
  <blockquote>
    <p><em>“Let’s do Rust for a new service, even if the rest of the team isn’t
        proficient in it.”</em></p>
  </blockquote>
  <ul>
    <li><strong>Social Bias</strong>: Grounded in groupthink or letting the loudest voice
      dominate. We need processes that encourage diverse viewpoints and
      depersonalize debate.</li>
  </ul>
  <blockquote>
    <p><em>“I remember that architecture review where I found myself deferring
        to the most senior engineer’s opinion. Even though I had concerns, I
        hesitated to speak up, and I noticed that other junior members did the
        same. The senior voice dominated the conversation, and our quieter
        perspectives were never heard.”</em></p>
  </blockquote>
  <p>Recognizing these biases is like turning on a light switch. It’s about
    seeing potential pitfalls before falling into them.</p>
  <h2 id="step-4-deploy-countermeasures-for-biases">Step 4: Deploy Countermeasures for Biases</h2>
  <p>Use targeted tactics:</p>
  <ul>
    <li>
      <p>Think statistically and rely on data rather than intuition.</p>
    </li>
    <li>
      <p>Make sure to gather diverse perspectives.</p>
    </li>
    <li>
      <p>Aggregate input from multiple team members to improve decision
        quality.</p>
    </li>
  </ul>
  <h2 id="step-5-embed-those-countermeasures">Step 5: Embed those Countermeasures</h2>
  <p>Make it routine. Add bias checks to your formal decision processes (like
    project kick-offs or solution or strategy reviews).</p>
  <p>The key to embedding is ritualizing good practices. A few practical ways
    to make it stick:</p>
  <ul>
    <li>
      <p>Add a simple “bias check” question into project templates: “What blind
        spots might affect this decision?”</p>
    </li>
    <li>
      <p>In retrospectives, explicitly review not just outcomes but the
        decision process: Did we rush? Did we ignore dissenting opinions?</p>
    </li>
    <li>
      <p>Incorporate bias-awareness and problem-definition training into
        onboarding for engineers and managers so that new team members are
        aligned from the start.</p>
    </li>
    <li>
      <p>Celebrate examples of good decision-making, not just good results —
        sometimes a well-structured process prevents disaster, even if the
        initial idea didn’t pan out.</p>
    </li>
  </ul>
  <p>Over time, these small rituals hardwire bias awareness and structured
    decision-making into the team’s cultural DNA, so it becomes second
    nature.</p>
  <h2 id="step-6-remain-grounded-in-strategy">Step 6: Remain Grounded in Strategy</h2>
  <p>Decisions don’t happen in a vacuum. They need to serve a larger
    strategy.</p>
  <p>I used to think strategy was just for senior executives. But strategy is
    crucial at every level. Why? Scarcity. We don’t have infinite time,
    money, or people. Strategy helps us make choices about where to focus
    our limited resources.</p>
  <p>Crucially, strategy needs:</p>
  <ul>
    <li>
      <p><strong>Internal Fit</strong> (do the pieces work together logically, reinforcing
        each other?)</p>
    </li>
    <li>
      <p><strong>External Fit</strong> (does it match the reality of the market, tech
        trends, regulations, etc.?).</p>
    </li>
  </ul>
  <p>Your internal plan isn’t helpful if the external world makes it
    obsolete. Strategy ensures that every investment and development aligns
    with the broader objectives of the enterprise, creating a whole that is
    greater than the sum of its parts.</p>
  <h2 id="step-7-move-from-gut-feel-to-hypothesis-driven">Step 7: Move From Gut Feel to Hypothesis-Driven</h2>
  <p>This was a big shift. Instead of saying “I think this feature will
    work,” start saying, “My hypothesis is that if we build feature X
    (independent variable), then we will see a Y% increase in (dependent
    variable).”</p>
  <p>Why? Because most ideas, even good-sounding ones, often don’t deliver
    the expected value when tested scientifically! We need to move from
    intuition to evidence.</p>
  <p>Process:</p>
  <ul>
    <li>
      <p><strong>Ask Questions</strong>: Start broad (exploratory questions) especially with
        unknowns. Why is the system slow? What are users really trying to do?</p>
    </li>
    <li>
      <p><strong>Collect Facts / Stats</strong>: Gather as many facts, data points, and
        different perspectives as possible.</p>
    </li>
    <li>
      <p><strong>Formulate Hypotheses</strong>: Get specific (confirmatory questions). Make
        them measurable and testable.</p>
    </li>
    <li>
      <p><strong>Test &amp; Learn</strong>: Test and gather data! Run experiments (POCs, tests,
        user studies) and learn from others who have done it in the past.</p>
    </li>
    <li>
      <p><strong>Refine</strong>: Was the hypothesis right, wrong, or partially right?
        Update your understanding and iterate.</p>
    </li>
  </ul>
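  <p>For a hypothesis such as “feature X will increase conversion by Y%,” the Test &amp; Learn step might be sketched as a two-proportion z-test on experiment results. This is one common choice, assumed here for illustration; the function name and the conversion counts in any usage are hypothetical.</p>

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) comparing variant B against control A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail of the standard normal
    lift = (p_b - p_a) / p_a                     # relative change versus control
    return lift, z, p_value
```

  <p>Stating the expected lift before running the experiment, then comparing it with the measured lift and p-value, keeps the decision tied to evidence rather than to how convincing the idea sounded.</p>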
  <h2 id="the-road-ahead">The Road Ahead</h2>
  <p>This isn’t an overnight transformation and each of our leadership
    transformation journeys will look different. It’s an ongoing practice of
    awareness, discipline, and learning.</p>
  <p>But the payoff? More confident decisions, better team alignment,
    strategies that actually work, and ultimately, building better products
    and stronger teams to win.</p>
  <p>It’s about shifting from simply reacting to proactively architecting our
    decisions and strategies. It takes effort, but the clarity and
    effectiveness it brings are invaluable.</p>
  ]]></content><author><name></name></author><category term="engineering-leadership" /><category term="decision-making" /><category term="cognitive-bias" /><summary type="html"><![CDATA[From Gut Instinct to Intentional Leadership Decisions]]></summary></entry><entry><title type="html">The Innovation Behind Amex’s Platinum Card Refresh</title><link href="https://americanexpress.io/the-innovation-behind-amexs-platinum-card-refresh/" rel="alternate" type="text/html" title="The Innovation Behind Amex’s Platinum Card Refresh" /><published>2026-01-29T00:00:00-05:00</published><updated>2026-01-29T00:00:00-05:00</updated><id>https://americanexpress.io/the-innovation-behind-amexs-platinum-card-refresh</id><content type="html" xml:base="https://americanexpress.io/the-innovation-behind-amexs-platinum-card-refresh/"><![CDATA[<p>At American Express, innovation is not a one-time milestone, it’s a continuous journey.
    Over the past decade, we’ve reimagined the way we deliver new Card products and benefits
    to our Card Members by thoughtfully investing in technology platform modernization, API
    architecture, data-driven insights, and digital experiences. As a result, we’ve completed
    more than 200 Card refreshes since 2019, averaging over 30 each year. Our most recent U.S.
    Platinum Card refresh—the most ambitious yet across both consumer and business Cards—showcases
    this transformation in action.</p>
  <p><strong>Continuous Modernization Fuels Innovation</strong></p>
  <p>Our technology transformation has been deliberate and multi-year, with
    ongoing investments in continuous modernization. These investments enhance
    our modular architecture, reusable frameworks, and advanced big data platforms
    that drive speed and efficiency.</p>
  <p>Gen AI-powered rule generation and UI-driven configuration are two key foundational
    investments. Our technical architecture places a rules engine above the base code.
    That means Gen AI generates core decision rules for benefits, offers, and rewards,
    enabling faster configuration and deployment. Parameters and configurations are
    managed through intuitive UI layers, while Gen AI supports internal teams’ real-time
    setup, speeding up end-to-end delivery and reducing time and complexity for our engineers.</p>
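  <p>To illustrate what placing a rules engine above the base code can mean in general, here is a minimal sketch in which benefit rules are data that a generic evaluator applies. It does not describe our production engine; the <code>Rule</code> fields, the benefit name, and the values are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Rule:
    """Hypothetical benefit rule expressed as data rather than code."""
    benefit: str
    attribute: str     # transaction attribute to inspect
    equals: Any        # required value for that attribute
    min_amount: float  # minimum spend for the rule to apply
    credit: float      # credit granted when the rule matches


def load_rules(config: List[Dict[str, Any]]) -> List[Rule]:
    """Build rules from configuration, e.g. rows saved by a configuration UI."""
    return [Rule(**row) for row in config]


def evaluate(rules: List[Rule], txn: Dict[str, Any]) -> List[str]:
    """Return the benefits whose rules match the transaction."""
    matched = []
    for rule in rules:
        if txn.get(rule.attribute) == rule.equals and txn.get("amount", 0.0) >= rule.min_amount:
            matched.append(rule.benefit)
    return matched
```

  <p>In a design like this, adding a benefit means adding a configuration row, while the evaluator code stays unchanged.</p>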
  <p>Previously, setting up new benefits could take three months or more.
    Now, with UI-driven configuration and Gen AI–powered business rule
    creation, setup timelines have been reduced to as little as 6–8 weeks.</p>
  <p><strong>APIs: The Building Blocks of Exceptional Experiences</strong></p>
  <p>At the center of our digital ecosystem is a robust API-led architecture.
    Our core products and benefits are powered by APIs designed for
    transparency and scalability.</p>
  <p>For customers, this means seamless experiences: real-time benefit
    tracking, self-service enrollment, and personalized recommendations. For
    engineers, APIs act as modular building blocks—ready-to-use components
    that can be combined to create new digital experiences without starting
    from scratch.</p>
  <p>This flexible foundation also made it possible to bring lifestyle,
    dining, and payment experiences together in one cohesive ecosystem. Now,
    our mobile app serves as a hub where customers can manage their finances
    and explore curated lifestyle offerings, like discovering their next Resy
    reservation or a new travel inspiration—all in one place.</p>
  <p>And we’re already building for what’s next: many of our APIs are
    AI-ready, paving the way for intelligent, context-aware features in
    travel, dining, and beyond. Our API strategy ensures that as technology
    evolves, our platforms evolve along with it.</p>
  <p><strong>Data-Driven Insights at Scale</strong></p>
  <p>American Express’ closed-loop model provides a high-definition view of our
    customers, allowing us to deliver benefits that resonate. Our new cloud-based
    data platform unlocks scalable computing power, real-time analytics, and robust
    governance to enable innovation while meeting regulatory requirements. For the
    Platinum Card refresh, this level of data-driven rigor helped us curate new dining,
    lifestyle and business benefits. For example, dining is a top passion for American
    Express Card Members, who spent over $87B on dining in the U.S. in 2024, so offering
    stronger dining value to Card Members was a key priority in this update.</p>
  <p><strong>Looking Ahead: Digital at the Core</strong></p>
  <p>The Platinum Card refresh is another exciting milestone in our digital journey, but
    there is more to come. We’re making strategic investments in our mobile app, website,
    and tech platforms to enable future product updates and seamless digital experiences
    for our customers. From more intuitive personalization to new integrations across
    benefits and rewards, digital is at the heart of every product refresh.</p>
  ]]></content><author><name></name></author><category term="data" /><category term="infrastructure" /><category term="apis" /><summary type="html"><![CDATA[How Amex’s modern platforms, APIs, and data are powering a better card experience.]]></summary></entry><entry><title type="html">Beyond Vanilla RAG: 7 Techniques for Better Retrieval-Augmented Generation</title><link href="https://americanexpress.io/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/" rel="alternate" type="text/html" title="Beyond Vanilla RAG: 7 Techniques for Better Retrieval-Augmented Generation" /><published>2026-01-14T00:00:00-05:00</published><updated>2026-01-14T00:00:00-05:00</updated><id>https://americanexpress.io/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation</id><content type="html" xml:base="https://americanexpress.io/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/"><![CDATA[<hr />
    <div class="fix-self-rag">
      <p>Large Language Models (LLMs) are trained on vast datasets, yet they
        still struggle when queries require information outside their training
        data. Correct responses to challenging queries might involve proprietary
        information, recent events, or specialized knowledge not captured during
        training. One popular approach to mitigate this issue is
        Retrieval-Augmented Generation (RAG), which enhances LLMs by leveraging
        external knowledge to deliver better responses.</p>
      <p>The standard, or “vanilla,” RAG process involves the following steps:</p>
      <ol>
        <li><strong>Document Chunking</strong>: Splitting a document or article into smaller,
          manageable chunks.</li>
        <li><strong>Vectorization</strong>: Using an embedding model to transform these
          chunks into vector representations and store them in a vector store
          along with relevant metadata.</li>
        <li><strong>Similarity Search</strong>: When a query is received, the system
          vectorizes the query using the same embedding model and performs a
          similarity search to retrieve the top k chunks that are most
          relevant to the query.</li>
        <li><strong>Response Generation</strong>: The query, along with the top k chunks, is
          passed to an LLM to generate a response based on the retrieved
          information.</li>
      </ol>
    </div>
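    <p>The four steps above can be sketched in a few lines of Python. This is a minimal
      illustration rather than a production pipeline: <code>embed</code> is a toy bag-of-words
      stand-in for a real embedding model, the in-memory list stands in for a vector store,
      and the final LLM call is left out. All function names and the sample text are assumptions made for this example.</p>

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 12) -> list[str]:
    """Step 1: split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Step 2 (toy stand-in): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, store: list[tuple[Counter, str]], k: int = 2) -> list[str]:
    """Step 3: embed the query the same way and return the k most similar chunks."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Step 4: assemble the prompt that would be sent to an LLM (call not shown)."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

document = (
    "The billing service retries failed payments three times. "
    "Refunds are issued within five business days. "
    "Support is available by chat around the clock."
)
# The "vector store": each entry pairs a vector with its chunk text as metadata.
store = [(embed(c), c) for c in chunk(document, size=8)]
question = "How many times are failed payments retried?"
hits = top_k(question, store, k=1)
print(build_prompt(question, hits))
```

    <p>Fixed-size word chunking is the simplest possible choice here; real systems often
      split on sentence or section boundaries instead.</p>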
    <p>While vanilla RAG is effective in many cases, it is not without
      limitations. It may fail to retrieve the most relevant chunks or
      generate accurate responses, especially for more complex or nuanced
      questions. These limitations have driven significant research efforts
      aimed at enhancing the basic RAG approach.</p>
    <p>In this blog post, we will explore seven advanced RAG approaches grouped
      into three core strategies: Reasoning-based, Retrieval reliability,
      and Knowledge structure-enhanced. Each approach is designed to improve on
      vanilla RAG and produce better responses from the LLM. By the
      end of this post, you’ll have a clearer understanding of these advanced
      techniques and the types of applications they are best suited for.</p>
    <p><strong>Reasoning-based</strong>: Self-RAG, ActiveRAG, Chain-of-Note, RAFT</p>
    <p><strong>Retrieval reliability</strong>: CorrectiveRAG, Adaptive-RAG</p>
    <p><strong>Knowledge structure-enhanced</strong>: Graph-Enhanced RAG</p>
    <h2 id="reasoning-based">Reasoning-Based</h2>
    <h3 id="self-rag">Self-RAG</h3>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage1.jpg" alt="Self Rag Image" /></p>
    <div class="adjust-bullet-spacing">
      <p>The Self-RAG [<a href="#ref-self-rag">1</a>] approach leverages a fine-tuned model to make more
        informed decisions during the question-answering process. Unlike the
        vanilla RAG approach, which always retrieves additional context,
        Self-RAG introduces a conditional retrieval mechanism. Here’s how it
        works:</p>
      <ol>
        <li>
          <p><strong>Initial Query and Conditional Retrieval</strong>:
            The process starts with a query being sent to the model, which then
            decides whether extra context needs to be retrieved from the vector
            store. If retrieval is necessary, the model retrieves relevant
            chunks.</p>
        </li>
        <li><strong>Chunk Evaluation and Response Generation</strong>:
          For each retrieved chunk, a two-fold evaluation takes place:
          <ul>
            <li>The model checks if the chunk is relevant to the query.</li>
            <li>Regardless of the chunk’s relevance, the model generates a preliminary response.</li>
          </ul>
        </li>
        <li><strong>Self-Reflection and Validation</strong>:<br />
          The generated response, alongside the query and the chunk, is then
          passed through the model again to evaluate whether:
          <ul>
            <li>The response is supported by the chunk.</li>
            <li>The response is useful for answering the question.</li>
          </ul>
        </li>
        <li>
          <p><strong>Re-Ranking Based on “Tokens”</strong>:<br />
            Self-RAG ranks the retrieved chunks based on three key factors,
            which the model emits as reflection tokens: Relevance, Usefulness, and
            Supportiveness (this ranking step is not depicted in the diagram).
            The top <em>k</em> re-ranked chunks are selected.</p>
        </li>
        <li><strong>Final Answer Generation</strong>:<br />
          Finally, the top <em>k</em> ranked chunks are sent back to the model one
          last time, along with the original query, to generate the final,
          refined answer (not depicted in the diagram).</li>
      </ol>
    </div>
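    <p>The control flow of these five steps can be sketched as follows. This is only an
      outline under stated assumptions: in Self-RAG the decisions come from the fine-tuned
      model itself, whereas here <code>needs_retrieval</code>, <code>retrieve</code>,
      <code>generate</code>, and <code>critique</code> are hypothetical callables replaced by
      deterministic stubs, and the equal weighting of the three scores is an assumption of this example.</p>

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    chunk: str
    draft: str
    relevant: float   # is the chunk relevant to the query?
    supported: float  # is the draft supported by the chunk?
    useful: float     # does the draft help answer the query?

    @property
    def score(self) -> float:
        # Equal weighting of the three signals is an assumption of this sketch.
        return self.relevant + self.supported + self.useful

def self_rag(query, needs_retrieval, retrieve, generate, critique, k=2):
    """All four callables are hypothetical stand-ins for calls to the model."""
    if not needs_retrieval(query):                        # step 1: conditional retrieval
        return generate(query, [])
    candidates = []
    for chunk in retrieve(query):
        draft = generate(query, [chunk])                  # step 2: preliminary response
        relevant, supported, useful = critique(query, chunk, draft)  # steps 2-3
        candidates.append(Candidate(chunk, draft, relevant, supported, useful))
    candidates.sort(key=lambda c: c.score, reverse=True)  # step 4: re-rank
    return generate(query, [c.chunk for c in candidates[:k]])  # step 5: final answer

# Deterministic stubs so the control flow can be exercised without a model.
DOCS = [
    "Office hours are nine to five.",
    "Refunds take five business days.",
    "Refunds need a receipt.",
]

def stub_generate(query, chunks):
    return f"answer from {len(chunks)} chunk(s): " + " | ".join(chunks)

def stub_critique(query, chunk, draft):
    hit = 1.0 if "refund" in chunk.lower() else 0.0
    return hit, hit, hit

answer = self_rag("How long do refunds take?", lambda q: True, lambda q: DOCS,
                  stub_generate, stub_critique, k=2)
print(answer)
```

    <p>Note how every retrieved chunk costs one generation call and one critique call before
      the final answer is produced, which is where the inference cost discussed below comes from.</p>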
    <p>Self-RAG excels at handling single-hop questions, where the answer can
      be found within a single retrieved chunk. Its strong results on
      benchmarks such as PopQA, TriviaQA, PubHealth, ARC-Challenge,
      Biography, and ASQA are attributed to the multiple rounds of
      self-reflection and reasoning achieved through repeated LLM calls. This
      iterative process strengthens the model’s reasoning and improves
      accuracy.</p>
    <div class="fix-self-rag">
      <p>However, there is a trade-off. Self-RAG requires:</p>
      <ul>
        <li>Fine-tuning two large language models.</li>
        <li>Multiple calls to one fine-tuned LLM during the inference stage.</li>
      </ul>
    </div>
    <p>While these factors contribute to its superior performance, they also
      make Self-RAG less cost-effective, especially for applications requiring
      real-time responses or operating under strict computational budgets.</p>
    <h3 id="activerag">ActiveRAG</h3>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage2.jpg" alt="Active Rag Image" /></p>
    <p>ActiveRAG [<a href="#ref-active-rag">2</a>] can be thought of as running two
      tasks in parallel. On one hand, a Chain-of-Thought (CoT) query is sent
      to an LLM to generate a step-by-step reasoning response for the
      question. Simultaneously, chunks relevant to the query are retrieved
      and sent to the LLM along with one of four knowledge
      construction prompting strategies, which enhances the LLM’s reasoning
      process. For example, one strategy helps the LLM better understand the
      query by leveraging the retrieved context.</p>
    <p>In the final cognitive nexus step, ActiveRAG integrates the result of
      the knowledge construction step with the original CoT response to identify
      potential errors in that response, ultimately producing the final, refined answer.</p>
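    <p>A minimal sketch of this flow, assuming a generic <code>llm</code> callable that maps a
      prompt string to a reply string. The prompt wording, the <code>strategy_prompt</code>
      text, and the stub model are all illustrative assumptions, not the prompts used by ActiveRAG.</p>

```python
from concurrent.futures import ThreadPoolExecutor

def active_rag(query, chunks, llm, strategy_prompt):
    """`llm` is any callable mapping a prompt string to a reply string (hypothetical)."""
    cot_prompt = f"Think step by step, then answer: {query}"
    knowledge_prompt = "\n".join([strategy_prompt, *chunks, f"Question: {query}"])
    # Run the CoT call and the knowledge construction call side by side.
    with ThreadPoolExecutor(max_workers=2) as pool:
        cot_future = pool.submit(llm, cot_prompt)
        knowledge_future = pool.submit(llm, knowledge_prompt)
        cot_reply = cot_future.result()
        knowledge_reply = knowledge_future.result()
    # Cognitive nexus: check the CoT draft against the constructed knowledge.
    nexus_prompt = "\n".join([
        "Revise the draft if the knowledge contradicts it.",
        f"Draft: {cot_reply}",
        f"Knowledge: {knowledge_reply}",
        f"Question: {query}",
    ])
    return llm(nexus_prompt)

calls = []  # records every prompt the stub receives

def stub_llm(prompt):
    """Stand-in model: echoes the first line of the prompt."""
    calls.append(prompt)
    return "reply to: " + prompt.splitlines()[0]

final = active_rag(
    "Who wrote the report?",
    ["The report was written by the audit team."],
    stub_llm,
    "Summarize what the passages say about the question.",
)
print(final)
print(len(calls), "LLM calls")
```

    <p>Running the first two calls concurrently hides some latency, but the approach still
      needs three LLM calls per question in this sketch.</p>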
    <p>ActiveRAG has outperformed baseline methods on several benchmarks, including Natural
      Questions (NQ), TriviaQA, PopQA, and WebQ, demonstrating its strength in
      single-hop questions. This improvement is largely due to the explicit
      expansion of the LLM’s reasoning capabilities in the knowledge
      construction step, coupled with the cognitive nexus step, which
      self-checks the CoT response against the retrieved information.</p>
    <p>Unlike some other approaches, ActiveRAG does not require fine-tuning of
      any large or small LMs. However, it does involve multiple calls to LLMs,
      which can lead to higher latency and increased computational costs.</p>
    <h3 id="chain-of-note">Chain-of-Note</h3>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage3.jpg" alt="Chain-of-Note Image" /></p>
    <div class="fix-self-rag">
      <p>The Chain-of-Note [<a href="#ref-chain-of-note">3</a>] approach leverages a fine-tuned model. In this
        approach, retrieved chunks of information along with the query are
        passed to the model. The model’s response not only provides the final
        answer but also includes explanatory notes on how the answer was derived
        from the retrieved chunks, reducing the risk of hallucination. There are
        three types of notes that can be generated:</p>
      <ul>
        <li><strong>Relevant (contains the answer)</strong>: The chunk directly provides the
          correct answer.</li>
        <li><strong>Irrelevant (model knows the answer)</strong>: The model already knows the
          answer independently of the retrieved chunk.</li>
        <li><strong>Irrelevant (model doesn’t know the answer)</strong>: The chunk does not
          help, and the model acknowledges uncertainty.</li>
      </ul>
    </div>
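    <p>The three note types amount to a small decision table, sketched below. In
      Chain-of-Note this behavior is learned through fine-tuning; the function here only
      illustrates the intended outcome for each case, and the names <code>NoteType</code>,
      <code>resolve</code>, and <code>format_output</code> are hypothetical.</p>

```python
from enum import Enum
from typing import Optional

class NoteType(Enum):
    RELEVANT = "chunk contains the answer"
    IRRELEVANT_KNOWN = "chunk is irrelevant, model knows the answer"
    IRRELEVANT_UNKNOWN = "chunk is irrelevant, model does not know the answer"

def resolve(chunk_answer: Optional[str], model_answer: Optional[str]) -> tuple[NoteType, str]:
    """Pick the note type and final answer for one retrieved chunk.

    `chunk_answer` is the answer found in the chunk (None if absent);
    `model_answer` is what the model knows on its own (None if nothing).
    """
    if chunk_answer is not None:
        return NoteType.RELEVANT, chunk_answer
    if model_answer is not None:
        return NoteType.IRRELEVANT_KNOWN, model_answer
    return NoteType.IRRELEVANT_UNKNOWN, "unknown"

def format_output(note: NoteType, answer: str) -> str:
    """Render the note followed by the answer, mirroring the note-then-answer shape."""
    return f"Note: {note.value}.\nAnswer: {answer}"

print(format_output(*resolve(None, "Paris")))
```

    <p>The third branch is what reduces hallucination: when neither the chunk nor the model
      has the answer, the output says so instead of guessing.</p>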
    <p>Chain-of-Note has outperformed baseline methods on key benchmarks such as Natural Questions
      (NQ), TriviaQA, and WebQ, particularly excelling at single-hop
      questions. This improvement is due to the model’s additional
      self-refinement steps before producing the final answer, which
      strengthens its reasoning abilities.</p>
    <p>During inference, only a single call to the fine-tuned model is
      required, which keeps operational efficiency high. However, the data
      collection process for fine-tuning a model can be resource intensive.
      For instance, Chain-of-Note leveraged ChatGPT to generate answers with
      notes for 10,000 questions sampled from the NQ dataset. While the
      approach is effective, using a more robust, commercial LLM for
      production deployments may offer better performance. Chain-of-Note can
      also be expensive when developers need to fine-tune a model on their own
      dataset for their use cases.</p>
    <h3 id="raft">RAFT</h3>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage4.jpg" alt="RAFT Image" /></p>
    <p>RAFT [<a href="#ref-raft">4</a>] employs a fine-tuned model. During the training phase, in
      addition to relevant chunks, RAFT intentionally includes irrelevant
      chunks in the training datasets. The model generates responses in a
      Chain-of-Thought (CoT) style, incorporating reasoning and citing
      relevant documents. This training strategy equips the model to identify
      and disregard irrelevant chunks during inference, enabling it to provide
      accurate answers even when such chunks are mistakenly retrieved.</p>
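    <p>The training-data idea can be illustrated with a short sketch that mixes one
      relevant chunk with randomly sampled irrelevant ones. The record layout, the function
      name <code>build_training_example</code>, and the sample text are assumptions for
      illustration and do not reproduce RAFT’s actual data format.</p>

```python
import random

def build_training_example(question, relevant_chunk, distractor_pool, cot_answer,
                           num_distractors=2, seed=0):
    """Build one fine-tuning record that mixes a relevant chunk with distractors."""
    rng = random.Random(seed)
    distractors = rng.sample(distractor_pool, num_distractors)
    context = [relevant_chunk, *distractors]
    rng.shuffle(context)  # position should not reveal which chunk is relevant
    docs = "\n".join(f"[doc {i}] {text}" for i, text in enumerate(context, start=1))
    return {
        "prompt": f"{docs}\nQuestion: {question}",
        "target": cot_answer,   # CoT-style answer that cites the relevant chunk
        "context": context,
    }

pool = [
    "Office hours are nine to five.",
    "The cafeteria is closed on Fridays.",
    "Parking permits renew every January.",
]
example = build_training_example(
    question="How long do refunds take?",
    relevant_chunk="Refunds are issued within five business days.",
    distractor_pool=pool,
    cot_answer="The refund policy chunk says refunds are issued within five "
               "business days, so the answer is five business days.",
)
print(example["prompt"])
```

    <p>Shuffling the context is a design choice of this sketch, so that a model trained on
      such records cannot rely on chunk position to find the relevant one.</p>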
    <p>RAFT has outperformed baseline methods on benchmarks such as PubMed, HotPotQA,
      HuggingFace, Torch Hub, and TensorFlow Hub, showcasing its effectiveness in
      tackling multi-hop questions, where generating a correct
      answer often requires synthesizing information from multiple chunks
      located in different contexts. RAFT’s strong performance on
      multi-hop questions stems from its enhanced reasoning capacity, allowing
      it to analyze both relevant and irrelevant chunks concurrently.</p>
    <p>At first glance, RAFT may resemble the Chain-of-Note approach; however,
      its distinctive training methodology sets it apart. By deliberately
      including irrelevant chunks during training, RAFT bolsters the
      robustness of its fine-tuned model, ensuring better performance in
      inference scenarios where irrelevant information may arise. Moreover, it
      is specifically trained to excel at multi-hop questions rather than just
      single-hop inquiries.</p>
    <h2 id="retrieval-reliability">Retrieval Reliability</h2>
    <h3 id="correctiverag">CorrectiveRAG</h3>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage5.jpg" alt="CorrectiveRag Image" /></p>
    <p>The CorrectiveRAG [<a href="#ref-corrective-rag">5</a>] approach introduces a Retrieval Evaluator,
      fine-tuned using T5-large, to assess the relevance of retrieved chunks
      to the query. If the retrieved chunks are judged correct, the query and chunks
      are sent to a large language model (LLM) to generate the final response.
      However, if the chunks are deemed incorrect, CorrectiveRAG falls back to
      a web search, and the query and the search results are then
      passed to the LLM for response generation.</p>
    <p>In cases where ambiguity arises, CorrectiveRAG combines both the
      retrieved chunks from the vector store and the search results from the
      web search, along with the query, and sends them to the LLM for
      generating a response.</p>
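    <p>The three-way routing described above can be sketched as follows. The numeric
      thresholds and the callables <code>evaluator</code>, <code>web_search</code>, and
      <code>llm</code> are assumptions made for this example; in CorrectiveRAG the evaluator
      is the fine-tuned T5-large model.</p>

```python
def verdict(scores, upper=0.7, lower=0.3):
    """Map evaluator scores to a verdict; both thresholds are assumed values."""
    best = max(scores, default=0.0)
    if best >= upper:
        return "correct"
    if best > lower:
        return "ambiguous"
    return "incorrect"

def corrective_rag(query, chunks, evaluator, web_search, llm):
    """`evaluator`, `web_search`, and `llm` are hypothetical callables."""
    decision = verdict([evaluator(query, chunk) for chunk in chunks])
    if decision == "correct":
        context = list(chunks)                      # trust the vector store
    elif decision == "incorrect":
        context = web_search(query)                 # fall back to web search
    else:
        context = list(chunks) + web_search(query)  # ambiguous: use both
    return llm(query, context)

def stub_search(query):
    return ["web result"]

def stub_llm(query, context):
    return f"{len(context)} passage(s) used"

chunks = ["chunk a", "chunk b"]
for score in (0.9, 0.5, 0.1):
    reply = corrective_rag("q", chunks, lambda q, c: score, stub_search, stub_llm)
    print(score, verdict([score]), reply)
```

    <p>In environments where external web search is not allowed, the <code>web_search</code>
      callable would have to be replaced by another fallback source, which changes the
      approach’s guarantees.</p>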
    <p>CorrectiveRAG has outperformed baseline methods on benchmarks such as PopQA, Biography,
      PubHealth, and ARC-Challenge, demonstrating its effectiveness at
      handling single-hop questions. This performance boost is due to the
      added evaluation of the correctness of retrieved chunks and the
      integration of web search results when necessary.</p>
    <p>Compared to previous approaches, CorrectiveRAG only requires fine-tuning
      a small language model (LM) as a classifier. During inference, it
      requires just one call to a small LM and one call to an LLM, with
      occasional web searches. This makes CorrectiveRAG more cost-effective
      than methods that rely on fine-tuning one or more LLMs.</p>
    <p>However, the reliance on web searching, when retrieved chunks are
      ambiguous or incorrect, is a double-edged sword. While web searches can
      improve the accuracy of answers, certain use cases prohibit external web
      searching, limiting the applicability of CorrectiveRAG in such
      environments.</p>
    <h3 id="adaptive-rag">Adaptive-RAG</h3>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage6.jpg" alt="Adaptive-Rag Image" /></p>
    <p>Adaptive-RAG [<a href="#ref-adaptive-rag">6</a>] employs a classifier, fine-tuned on T5-large, to
      assess the complexity of incoming queries. If a query is classified as
      native, meaning the LLM can answer it without retrieval, the query is
      sent directly to an LLM for a response. For simple
      queries, the system retrieves chunks only once; the query and the
      retrieved chunks are then sent to the LLM for a response. In the case of
      complex queries, Adaptive-RAG retrieves chunks multiple times before
      passing the query and chunks to the LLM for final response generation.</p>
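    <p>A minimal sketch of this routing, assuming hypothetical <code>classify</code>,
      <code>retrieve</code>, and <code>llm</code> callables. The multi-round loop shown here,
      where earlier chunks are appended to the search text, is a simplification chosen for
      the example and not necessarily how Adaptive-RAG performs multi-step retrieval.</p>

```python
def adaptive_rag(query, classify, retrieve, llm, max_rounds=3):
    """Route a query by complexity; all three callables are hypothetical."""
    label = classify(query)            # "native", "simple", or "complex"
    if label == "native":
        return llm(query, [])          # no retrieval
    if label == "simple":
        return llm(query, retrieve(query))  # single retrieval
    # Complex: retrieve repeatedly, letting earlier chunks steer later searches.
    context = []
    search_text = query
    for _ in range(max_rounds):
        new_chunks = [c for c in retrieve(search_text) if c not in context]
        if not new_chunks:
            break
        context.extend(new_chunks)
        search_text = query + " " + " ".join(new_chunks)
    return llm(query, context)

# Toy keyword index standing in for a vector store.
INDEX = {
    "founder": "The company was founded by Ada.",
    "Ada": "Ada was born in London.",
}

def stub_retrieve(text):
    return [chunk for key, chunk in INDEX.items() if key in text]

def stub_llm(query, context):
    return f"{len(context)} chunk(s)"

question = "Where was the founder born?"
for label in ("native", "simple", "complex"):
    print(label, adaptive_rag(question, lambda q: label, stub_retrieve, stub_llm))
```

    <p>In the complex case the second round finds the chunk about Ada only because the first
      round surfaced her name, which is the essence of a multi-hop question.</p>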
    <p>Adaptive-RAG has outperformed baseline methods on benchmarks such as SQuAD, Natural
      Questions (NQ), TriviaQA, MuSiQue, HotPotQA, and 2WikiMultiHopQA,
      achieving superior results in either accuracy or cost-effectiveness.
      This indicates that Adaptive-RAG is efficient at handling both
      single-hop and multi-hop questions, particularly when incoming queries
      may vary in complexity.</p>
    <p>While both Adaptive-RAG and CorrectiveRAG utilize a small LM as a
      classifier, they differ in their approaches. Adaptive-RAG classifies the
      query prior to any chunk retrieval. For native questions, the system
      sends them directly to the LLM without retrieving additional
      information. For complex questions, multiple retrievals are performed,
      but Adaptive-RAG does not revert to web searching. We are curious about
      the potential outcomes of combining Adaptive-RAG with CorrectiveRAG to
      leverage their respective strengths.</p>
    <h2 id="knowledge-structure-enhanced">Knowledge Structure-Enhanced</h2>
    <h3 id="graph-enhanced-rag">Graph-Enhanced RAG</h3>
    <p>Knowledge Graphs can significantly enhance the capabilities of vanilla
      RAG. To differentiate this general approach from Microsoft’s GraphRAG,
      which is not the focus of this blog post, we refer to this approach as
      Graph-Enhanced RAG [<a href="#ref-graph-enhanced-rag">7</a>].</p>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage7.jpg" alt="Graph-Enhanced Rag Image" /></p>
    <p>The vanilla RAG approach begins with a vector database that stores the
      vectors of chunks (e.g., c1, c2, and c3) extracted from articles or
      documents. The texts of these chunks are stored as metadata linked to
      their corresponding vectors (this metadata is not depicted in the
      diagram). When a query is received, the system retrieves the top k
      chunks from the vector database based on their similarity to the query.</p>
    <p><img src="../_post_assets/beyond-vanilla-rag-7-techniques-for-better-retrieval-augmented-generation/img/ragimage8.jpg" alt="Graph-Enhanced Rag2 Image" /></p>
    <p>By incorporating a knowledge graph, we can conceptualize the chunks as
      nodes (e.g., n1, n2, and n3), with the chunk texts and vectors
      represented as properties of these nodes. Relationships can then be
      established among the nodes, such as a NEXT relationship that describes
      the sequence of the chunks.</p>
    <p>Subsequently, LLMs can be employed to extract entities from the nodes,
      including people (e.g., p1 and p2), organizations, companies (e.g., c1),
      locations, and more. These entities become new nodes within the graph.
      Relationships between the extracted entities and the original chunks can
      also be added (e.g., MENTIONS).</p>
    <p>Additionally, relationships among the newly created nodes can be
      identified using LLMs. For example, person p1 works for company c1, and
      person p2 also works for company c1. We can infer that p1 and p2 are
      colleagues based on this information.</p>
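    <p>The graph described above can be sketched with plain Python triples. The node ids
      follow the examples in the text; the relationship names <code>WORKS_FOR</code> and
      <code>COLLEAGUE_OF</code> are hypothetical labels chosen for this example, and a real
      deployment would typically use a graph database rather than an in-memory set.</p>

```python
from itertools import combinations

# Edges are (source, relationship, target) triples; all node ids are illustrative.
edges = {
    ("n1", "NEXT", "n2"),        # chunk order
    ("n2", "NEXT", "n3"),
    ("n1", "MENTIONS", "p1"),    # chunk-to-entity links
    ("n2", "MENTIONS", "p2"),
    ("n2", "MENTIONS", "c1"),
    ("p1", "WORKS_FOR", "c1"),   # entity-to-entity links extracted by an LLM
    ("p2", "WORKS_FOR", "c1"),
}

def infer_colleagues(edges):
    """Derive COLLEAGUE_OF edges for people who work for the same company."""
    employees = {}
    for source, relationship, target in edges:
        if relationship == "WORKS_FOR":
            employees.setdefault(target, set()).add(source)
    return {
        (a, "COLLEAGUE_OF", b)
        for people in employees.values()
        for a, b in combinations(sorted(people), 2)
    }

def chunks_mentioning(edges, entity):
    """Return the chunk nodes that mention a given entity."""
    return sorted(s for s, r, t in edges if r == "MENTIONS" and t == entity)

print(infer_colleagues(edges))
print(chunks_mentioning(edges, "c1"))
```

    <p>Traversals like <code>chunks_mentioning</code> are what let a multi-hop question pull
      in chunks that are related through shared entities rather than through vector similarity alone.</p>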
    <p>One of the advantages of Graph-Enhanced RAG is that it does not require
      fine-tuning any models. When a new context is introduced, developers can
      easily extend the existing knowledge graph. This approach is
      particularly effective for multi-hop questions and overarching tasks,
      where responses may need to synthesize content from across an entire
      article or document, such as generating a summary.</p>
    <p>However, Graph-Enhanced RAG does necessitate calls to LLMs for entity
      and relationship extraction from the texts of the nodes. Additionally,
      depending on the size and content of the documents, storing the
      knowledge graph may require significant space, which can increase costs.</p>
    <h2 id="summary">Summary</h2>
    <table class="advanced-rag-table">
      <colgroup>
        <col style="width: 4%" />
        <col style="width: 23%" />
        <col style="width: 22%" />
        <col style="width: 23%" />
        <col style="width: 25%" />
      </colgroup>
      <thead>
        <tr>
          <th>#</th>
          <th>Name</th>
          <th>Need to fine-tune model(s)?</th>
          <th>Fine-tuned model size x number of models</th>
          <th>Applications</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>1</td>
          <td>Self-RAG</td>
          <td>Yes</td>
          <td>7b x 2</td>
          <td>single-hop QA</td>
        </tr>
        <tr>
          <td>2</td>
          <td>ActiveRAG</td>
          <td>No</td>
          <td>n/a</td>
          <td>single-hop QA</td>
        </tr>
        <tr>
          <td>3</td>
          <td>Chain-of-Note</td>
          <td>Yes</td>
          <td>7b x 1</td>
          <td>single-hop QA</td>
        </tr>
        <tr>
          <td>4</td>
          <td>RAFT</td>
          <td>Yes</td>
          <td>7b x 1</td>
          <td>
            <p>single-hop QA</p>
            <p>multi-hop QA</p>
          </td>
        </tr>
        <tr>
          <td>5</td>
          <td>CorrectiveRAG</td>
          <td>Yes</td>
          <td>0.77b x 1</td>
          <td>single-hop QA+</td>
        </tr>
        <tr>
          <td>6</td>
          <td>Adaptive-RAG</td>
          <td>Yes</td>
          <td>0.77b x 1</td>
          <td>
            <p>single-hop QA</p>
            <p>multi-hop QA</p>
          </td>
        </tr>
        <tr>
          <td>7</td>
          <td>Graph-Enhanced RAG</td>
          <td>No</td>
          <td>n/a</td>
          <td>
            <p>single-hop QA</p>
            <p>multi-hop QA</p>
            <p>overarching task</p>
          </td>
        </tr>
      </tbody>
    </table>
    <p>The table above provides a high-level summary of the seven advanced RAG
      approaches, highlighting whether each method requires fine-tuning a
      model, the model size, and the applications for which each approach is
      best suited.</p>
    <p>Overall, there is no single RAG approach that universally fits all use
      cases. Developers should choose a RAG approach based on their specific
      question types (e.g., single-hop or multi-hop) and other requirements
      (e.g., whether latency is critical and whether fine-tuning a model is feasible). For
      instance, if the question set includes a mix of single-hop and multi-hop
      questions, and latency is a critical factor, developers might consider
      starting with Adaptive-RAG. Conversely, if the questions are complex and
      necessitate information drawn from various parts of the context, and
      fine-tuning a model is not feasible, Graph-Enhanced RAG may be the
      better option. Advanced RAG approaches will continue to evolve, with new
      ones emerging over time. By leveraging these
      techniques, we can improve the quality of LLM answers for our use cases.</p>
    <p><em>This article summarizes publicly available research on
        retrieval-augmented generation (RAG) techniques. It is provided for
        informational purposes only.</em></p>
    <h3 id="academic-papers"><strong>Academic Papers</strong></h3>
    <p><a id="ref-self-rag"></a>
      <strong>1. Self-RAG</strong><br />
      Asai, A., Wu, Z., Wang, Y., Sil, A., &amp; Hajishirzi, H. (2023). <em>Self-RAG:
        Learning to Retrieve, Generate, and Critique through Self-Reflection.</em>
      arXiv preprint arXiv:2310.11511.
      <a href="https://arxiv.org/pdf/2310.11511">https://arxiv.org/pdf/2310.11511</a></p>
    <p><a id="ref-active-rag"></a>
      <strong>2. ActiveRAG</strong><br />
      Xu, Z., Liu, Z., Liu, Y., Xiong, C., Yan, Y., Wang, S., Yu, S., Liu, Z.,
      &amp; Yu, G. (2024). <em>ActiveRAG: Autonomous Knowledge Assimilation and
        Accommodation through Active Retrieval.</em> arXiv preprint
      arXiv:2402.13547.
      <a href="https://ar5iv.labs.arxiv.org/html/2402.13547">https://ar5iv.labs.arxiv.org/html/2402.13547</a></p>
    <p><a id="ref-chain-of-note"></a>
      <strong>3. Chain-of-Note</strong><br />
      Yu, W., Zhang, H., Pan, X., Cao, P., Ma, K., Li, J., Wang, H., &amp; Yu, D.
      (2023). <em>Chain-of-Note: Enhancing Robustness in
        Retrieval-Augmented Language Models.</em> arXiv preprint arXiv:2311.09210.
      <a href="https://arxiv.org/pdf/2311.09210">https://arxiv.org/pdf/2311.09210</a></p>
    <p><a id="ref-raft"></a>
      <strong>4. RAFT</strong><br />
      Zhang, T., Patil, S. G., Jain, N., Shen, S., Zaharia, M., Stoica, I., &amp;
      Gonzalez, J. E. (2024). <em>RAFT: Adapting Language Model to
        Domain Specific RAG.</em> arXiv preprint
      arXiv:2403.10131.
      <a href="https://arxiv.org/pdf/2403.10131">https://arxiv.org/pdf/2403.10131</a></p>
    <p><a id="ref-corrective-rag"></a>
      <strong>5. CorrectiveRAG</strong><br />
      Yan, S.-Q., Gu, J.-C., Zhu, Y., &amp; Ling, Z.-H. (2024). <em>Corrective
        Retrieval-Augmented Generation.</em> arXiv preprint arXiv:2401.15884.
      <a href="https://arxiv.org/pdf/2401.15884">https://arxiv.org/pdf/2401.15884</a></p>
    <p><a id="ref-adaptive-rag"></a>
      <strong>6. Adaptive-RAG</strong><br />
      Jeong, S., Baek, J., Cho, S., Hwang, S. J., &amp; Park, J. C. (2024).
      <em>Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language
        Models through Question Complexity.</em> arXiv preprint arXiv:2403.14403.
      <a href="https://arxiv.org/pdf/2403.14403">https://arxiv.org/pdf/2403.14403</a></p>
    <h3 id="graph-enhanced-rag-and-related-resources"><strong>Graph-Enhanced RAG and Related Resources</strong></h3>
    <p><a id="ref-graph-enhanced-rag"></a>
      <strong>7. Neo4j Product Examples – SEC EDGAR Data Prep Repository</strong><br />
      Neo4j Product Examples. (n.d.). <em>Data Preparation for SEC EDGAR
        Knowledge Graph Examples.</em> GitHub repository. Retrieved November 2025,
      from <a href="https://github.com/neo4j-product-examples/data-prep-sec-edgar">https://github.com/neo4j-product-examples/data-prep-sec-edgar</a></p>
    <p><strong>8. DeepLearning.AI Short Course</strong><br />
      DeepLearning.AI. (n.d.). <em>Knowledge Graphs &amp; Retrieval-Augmented
        Generation [Online short course].</em> Retrieved November 2025, from
      <a href="https://www.deeplearning.ai/short-courses/knowledge-graphs-rag/">https://www.deeplearning.ai/short-courses/knowledge-graphs-rag/</a></p>
    ]]></content><author><name></name></author><category term="llms" /><category term="rag" /><category term="generative-ai" /><summary type="html"><![CDATA[Improving LLM Accuracy with Modern Retrieval Techniques]]></summary></entry></feed>