    Working Effectively with Engineering Teams on Agentic Projects

    How product and project managers can collaborate productively with engineers on agentic systems — including what to ask, what not to assume, and how to manage the information asymmetry.

    Jay Burgess · 7 min read

    The information asymmetry between managers and engineers is not new, but agentic systems amplify it. A traditional software feature either works or it doesn't. An agentic system works sometimes, in some ways, for some inputs — and understanding the boundaries of that performance requires technical fluency that most managers don't have and shouldn't need to develop fully. The productive response to this asymmetry is not to fake technical depth. It is to develop the right questions, the right vocabulary, and the right instincts for when to trust an engineering assessment and when to probe further.

    Three questions unlock most technical conversations about agentic systems. First: what does failure look like, and how will we know when it happens? Engineers who cannot answer this clearly are building a system that nobody will be able to debug in production. Second: what is the agent allowed to do without human approval, and who decided that? This surfaces the autonomy design decisions that directly affect your risk posture. Third: how do we measure whether this is working better over time? If there is no eval, no metric, and no improvement loop, the system will degrade silently. These questions are not adversarial — they are the same questions good engineers ask themselves. Asking them as a manager signals that you understand how agentic systems actually work.

    Managing the relationship also means protecting engineering teams from the two most common managerial failure modes in AI projects. The first is over-specification: telling the engineering team exactly how to build the agent rather than specifying the outcome and constraints. Agentic systems require significant experimentation; prescriptive requirements eliminate the slack that makes that experimentation possible. The second is under-specification: handing the engineering team a vague goal and expecting them to make the governance, risk, and business model decisions that are properly yours to make. Engineers should not be deciding which workflows require human review, what success means, or how to communicate uncertainty to users. Those are product and project decisions.

    The most effective cross-functional dynamic on agentic projects is one where managers own the outcome definition, the user experience, the risk framework, and the stakeholder relationship, while engineers own the architecture, the tooling, the observability, and the implementation approach. Overlap in the middle is healthy: managers who understand enough architecture to ask good questions, and engineers who understand enough business context to make good tradeoffs. Power dynamics become a problem when technical teams withhold information about limitations because they fear the project will be cancelled, or when managers override engineering's safety judgment without understanding the implications. Both failure modes are manageable with explicit communication norms and a shared vocabulary established early.
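
    One way to make this ownership split inspectable is to write it down as data and check a decision log against it. The sketch below is illustrative only: the decision names, the `decisionOwners` map, and the `misroutedDecisions` helper are hypothetical names chosen for this example, not part of any established tool.

```typescript
// Hypothetical ownership map; the keys mirror the split described in the text.
type Owner = "manager" | "engineer" | "shared";
type Role = "manager" | "engineer";

const decisionOwners: Record<string, Owner> = {
  outcomeDefinition: "manager",
  userExperience: "manager",
  riskFramework: "manager",
  stakeholderRelationship: "manager",
  architecture: "engineer",
  tooling: "engineer",
  observability: "engineer",
  implementationApproach: "engineer",
  vocabulary: "shared",
  tradeoffs: "shared",
};

// One entry per decision actually made on the project (assumed log format).
type DecisionLogEntry = { decision: string; decidedBy: Role };

// Returns the entries where a decision was made by someone other than its owner.
// Shared decisions and decisions missing from the map are not flagged.
function misroutedDecisions(log: DecisionLogEntry[]): DecisionLogEntry[] {
  return log.filter((entry) => {
    const owner = decisionOwners[entry.decision];
    return owner !== undefined && owner !== "shared" && owner !== entry.decidedBy;
  });
}
```

    A risk framework decided by an engineer, or an architecture choice dictated by a manager, would both show up in the result. That gives the team a concrete item to discuss rather than a general feeling that roles have blurred.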

    What this means in practice

    The practical question is not whether the idea is interesting but how a team turns it into a workflow that can be inspected, repeated, and improved. Here the operating focus is direct: develop the three diagnostic questions that unlock technical conversations about agentic systems, and avoid the two managerial failure modes that destroy cross-functional trust.

    That means the engineering work starts before the first model call. The team must decide what the agent is allowed to know, what it is allowed to do, what evidence it must produce, and which actions require a human decision. This is the difference between an impressive demo and a system that can survive real users, changing inputs, and production constraints.
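
    The four decisions above can be captured in a small contract that exists before any model call is made. The following is a minimal sketch under stated assumptions: `AgentContract`, `decideToolCall`, and the refund example with its tool names are all hypothetical and only illustrate the shape of such a contract.

```typescript
// Hypothetical contract shape; every field name is an illustrative assumption.
type AgentContract = {
  allowedDataSources: string[];    // what the agent is allowed to know
  allowedTools: string[];          // what the agent is allowed to do
  requiredEvidence: string[];      // what each run must produce
  humanApprovalRequired: string[]; // tools that need a human decision first
};

type ToolDecision = "allow" | "needs_human_approval" | "deny";

// Tools outside the contract are denied; listed tools may still need approval.
function decideToolCall(contract: AgentContract, tool: string): ToolDecision {
  if (contract.allowedTools.indexOf(tool) === -1) return "deny";
  if (contract.humanApprovalRequired.indexOf(tool) !== -1) return "needs_human_approval";
  return "allow";
}

// Example contract for an imagined refund-handling agent.
const refundAgentContract: AgentContract = {
  allowedDataSources: ["order_history", "refund_policy"],
  allowedTools: ["lookup_order", "draft_reply", "issue_refund"],
  requiredEvidence: ["order_id", "policy_clause_cited"],
  humanApprovalRequired: ["issue_refund"],
};
```

    The design choice worth noting is the default: anything not listed is denied, so widening the agent's autonomy requires an explicit edit that someone can review.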

    A credible implementation also includes a feedback path. Every agent run should leave behind enough context for another engineer to answer four questions: what goal was attempted, what context was used, which tools were called, and why the system believed the task was complete. If those questions cannot be answered from logs, traces, or structured outputs, the agent is still operating as a black box.
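
    The four questions can be turned into a check that runs against each stored run record. This is a sketch, not a logging standard: the `AgentRunRecord` fields and the `unansweredQuestions` helper are assumptions made for illustration.

```typescript
// Hypothetical record left behind by one agent run. Fields are optional
// because the point of the check is to find the ones that were never written.
type AgentRunRecord = {
  goal?: string;
  contextUsed?: string[];
  toolsCalled?: string[];
  completionReason?: string;
};

// Returns the questions another engineer could NOT answer from this record.
function unansweredQuestions(run: AgentRunRecord): string[] {
  const gaps: string[] = [];
  if (!run.goal) gaps.push("What goal was attempted?");
  if (!run.contextUsed || run.contextUsed.length === 0) gaps.push("What context was used?");
  // An empty list is a valid answer here ("no tools were called");
  // only a missing field counts as a gap.
  if (run.toolsCalled === undefined) gaps.push("Which tools were called?");
  if (!run.completionReason) gaps.push("Why did the system believe the task was complete?");
  return gaps;
}
```

    A run for which this returns an empty list is inspectable in the sense used above. Any other result names exactly which part of the run is still a black box.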

    Reference Diagram

    A simple architecture to reason from

    Use this diagram as a starting point, not as a universal blueprint. The important move is to make the stages visible. Once stages are visible, you can assign owners, define contracts, set permissions, measure quality, and decide where human review belongs.

    Workflow Map

    1. What does failure look like? Engineers who can't answer can't debug in production.
    2. What can it do autonomously? Surfaces autonomy design, which directly affects your risk posture.
    3. How do we measure improvement? No eval loop means silent degradation.
    4. Manager owns: Outcome + Risk + Stakeholders. These decisions are not engineering decisions.
    5. Engineer owns: Architecture + Implementation. These decisions are not management decisions.
    6. Shared: Vocabulary + Tradeoffs. The productive overlap that enables good tradeoffs.

    The productive overlap
    The most effective cross-functional dynamic is managers who understand enough architecture to ask good questions, and engineers who understand enough business context to make good tradeoffs. Invest in building this overlap explicitly.
    Code Example

    Agentic project kickoff conversation guide

    The example below is intentionally small. It records the kickoff questions and the managerial failure modes as plain data. Production agentic systems should start with compact contracts in the same spirit, because small contracts are testable. Once the boundary is working, you can add richer orchestration without losing control of the core behavior.

    // TypeScript: agentic project kickoff conversation guide

    // Each entry pairs a diagnostic question with the reason for asking it.
    type DiagnosticQuestion = { question: string; purpose: string };

    const diagnosticQuestions: DiagnosticQuestion[] = [
      {
        question: "What does failure look like, and how will we know when it happens?",
        purpose: "Surfaces observability gaps and error taxonomy before build",
      },
      {
        question: "What is the agent allowed to do without human approval, and who decided that?",
        purpose: "Surfaces autonomy design decisions that affect your risk posture",
      },
      {
        question: "How do we measure whether this is working better over time?",
        purpose: "Ensures an eval and improvement loop exists from day one",
      },
    ];

    // The two managerial failure modes to avoid.
    const managerialFailureModes: string[] = [
      "Over-specification: prescribing the implementation instead of the outcome",
      "Under-specification: deferring governance and risk decisions to engineering",
    ];
    Illustrative pattern — not production-ready

    Implementation notes

    Treat these notes as the first design review checklist. They are deliberately concrete because agentic systems fail most often in the gaps between the model, the tools, the data, and the human operating process.

    Design note 1

    Ask all three questions at project kickoff, not after the first production incident.

    Design note 2

    Establish explicit communication norms about what happens when engineering discovers a limitation.

    Design note 3

    Create a shared vocabulary document in the first sprint so both sides mean the same things.

    Common failure modes

    Naming how a pattern breaks makes it easier to apply. These are the failure modes to watch for when a team moves from reading about this idea to deploying it inside a real workflow.

    Technical teams withhold information about limitations to prevent project cancellation.
    Managers override engineering safety judgment without understanding the downstream implications.
    Governance and risk decisions get made by engineers because managers haven't specified them.

    Operating checklist

    Before this pattern graduates from experiment to production, require a short operating checklist. The checklist should include the owner of the workflow, the allowed tools, the risk rating for each tool, the data sources the agent can use, the completion criteria, the review path, and the rollback plan. If a team cannot fill out that checklist, the workflow is not ready for higher autonomy.
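
    The checklist items listed above translate directly into a typed record with a completeness check. The sketch below assumes a three-level risk rating and uses hypothetical names (`OperatingChecklist`, `missingChecklistItems`, `readyForHigherAutonomy`). Adapt both to whatever the team already uses.

```typescript
// Hypothetical checklist shape mirroring the items named in the text.
type ToolEntry = { name: string; risk: "low" | "medium" | "high" };

type OperatingChecklist = {
  workflowOwner: string;
  allowedTools: ToolEntry[]; // each tool carries its own risk rating
  dataSources: string[];
  completionCriteria: string;
  reviewPath: string;
  rollbackPlan: string;
};

// Accepts a partially filled checklist and names the items still missing.
function missingChecklistItems(draft: Partial<OperatingChecklist>): string[] {
  const missing: string[] = [];
  if (!draft.workflowOwner) missing.push("workflowOwner");
  if (!draft.allowedTools || draft.allowedTools.length === 0) missing.push("allowedTools");
  if (!draft.dataSources || draft.dataSources.length === 0) missing.push("dataSources");
  if (!draft.completionCriteria) missing.push("completionCriteria");
  if (!draft.reviewPath) missing.push("reviewPath");
  if (!draft.rollbackPlan) missing.push("rollbackPlan");
  return missing;
}

// The workflow is ready for higher autonomy only when nothing is missing.
function readyForHigherAutonomy(draft: Partial<OperatingChecklist>): boolean {
  return missingChecklistItems(draft).length === 0;
}
```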

    The checklist should also define how the system will be evaluated after launch. Useful metrics include task success rate, human correction rate, average iterations per completed task, cost per successful run, escalation rate, and the number of blocked tool calls. These metrics turn agent quality into an engineering conversation instead of an opinion about whether the output felt good.
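
    As a rough illustration, the metrics named above can be computed from a list of per-run outcomes. The record shape and the `summarizeRuns` helper are hypothetical. The sketch also makes two assumptions the text does not settle: a "completed" task is treated as a successful run, and cost per successful run divides the total cost of all runs by the number of successes.

```typescript
// Hypothetical per-run outcome record.
type RunOutcome = {
  succeeded: boolean;
  humanCorrected: boolean;
  escalated: boolean;
  iterations: number;
  costUsd: number;
  blockedToolCalls: number;
};

type LaunchMetrics = {
  taskSuccessRate: number;
  humanCorrectionRate: number;
  escalationRate: number;
  avgIterationsPerCompletedTask: number | null; // null when nothing succeeded
  costPerSuccessfulRun: number | null;          // null when nothing succeeded
  blockedToolCalls: number;
};

// Returns null for an empty list so callers cannot mistake "no data" for 0%.
function summarizeRuns(runs: RunOutcome[]): LaunchMetrics | null {
  if (runs.length === 0) return null;
  const successes = runs.filter((r) => r.succeeded);
  const share = (count: number) => count / runs.length;
  const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
  const successIterations = successes.reduce((sum, r) => sum + r.iterations, 0);
  return {
    taskSuccessRate: share(successes.length),
    humanCorrectionRate: share(runs.filter((r) => r.humanCorrected).length),
    escalationRate: share(runs.filter((r) => r.escalated).length),
    avgIterationsPerCompletedTask:
      successes.length > 0 ? successIterations / successes.length : null,
    costPerSuccessfulRun: successes.length > 0 ? totalCost / successes.length : null,
    blockedToolCalls: runs.reduce((sum, r) => sum + r.blockedToolCalls, 0),
  };
}
```

    Charging failed runs to the cost of each success is a deliberate choice in this sketch: it keeps an agent that fails cheaply and often from looking inexpensive.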

    Finally, make the learning loop explicit. When the agent fails, decide whether the fix belongs in the prompt, the retrieval layer, the tool contract, the permission model, the evaluation suite, or the human process. Mature agentic engineering is not the absence of failures. It is the ability to classify failures quickly and improve the system without expanding risk.
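
    The learning loop becomes easier to run when each failure is tagged with where its fix belongs. The following sketch uses the six fix locations named above. The `FailureReport` shape, the `expandsRisk` flag, and both helper functions are assumptions made for illustration.

```typescript
// The six places a fix can belong, taken from the text.
type FixLocation =
  | "prompt"
  | "retrieval"
  | "tool_contract"
  | "permission_model"
  | "evaluation_suite"
  | "human_process";

// Hypothetical failure report filled in during triage.
type FailureReport = {
  runId: string;
  summary: string;
  fixLocation: FixLocation;
  expandsRisk: boolean; // true if the proposed fix widens what the agent may do
};

// Counts failures per fix location so recurring weak spots become visible.
function tallyByFixLocation(reports: FailureReport[]): Record<FixLocation, number> {
  const tally: Record<FixLocation, number> = {
    prompt: 0,
    retrieval: 0,
    tool_contract: 0,
    permission_model: 0,
    evaluation_suite: 0,
    human_process: 0,
  };
  reports.forEach((report) => {
    tally[report.fixLocation] += 1;
  });
  return tally;
}

// Fixes that expand risk should go through review before they ship.
function fixesNeedingRiskReview(reports: FailureReport[]): FailureReport[] {
  return reports.filter((report) => report.expandsRisk);
}
```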

    Key Takeaways
    Ask three questions on every agentic project: what does failure look like, what can the agent do autonomously, and how do we measure improvement.
    Avoid over-specifying the implementation and under-specifying the outcome — both are common managerial failure modes in AI projects.
    Managers own outcome definition, risk framework, and stakeholder communication; engineers own architecture and implementation.