Self-debugging in CodeExecutionAgent #6207


Closed · 1 task done
ekzhu opened this issue Apr 4, 2025 · 3 comments · Fixed by #6306

Labels: code-execution (execute generated code), proj-agentchat
Milestone: 0.4.x-python

Comments

@ekzhu (Collaborator) commented Apr 4, 2025

Confirmation

  • I confirm that I am a maintainer and so can use this template. If I am not, I understand this issue will be closed and I will be asked to use a different template.

Issue body

Follow-up to #6098: add an auto-debugging loop to CodeExecutionAgent so it automatically tries to regenerate code when there is an error.

ekzhu added the code-execution and proj-agentchat labels Apr 4, 2025
ekzhu added this to the 0.4.x-python milestone Apr 4, 2025
@Ethan0456 (Contributor) commented

Hi @ekzhu,

Below is a proposed approach along with a few questions I had while thinking about it:

Questions

  1. State management: Should we store each retry's code and result in model_context, or only update the agent context after a successful execution?
  2. User feedback during retries: Should we yield full code + execution result at every retry step, or keep it simple with messages like "Code execution failed with exit code ..., retrying..."?
  3. User intervention possibility: If we opt to yield full code + result, it opens up the opportunity to delegate control to a user_agent (if present), allowing for interactive feedback to improve the code generation trajectory. Would that align with the intended design?
```python
model_result: CreateResult | None = None
execution_result: CodeExecutionEvent | None = None

for attempt in range(max_code_retries):
    # Generate (or regenerate) code with the model.
    async for inference_output in self._call_llm(
        model_client=model_client,
        model_client_stream=model_client_stream,
        system_messages=system_messages,
        model_context=model_context,
        agent_name=agent_name,
        cancellation_token=cancellation_token,
    ):
        if isinstance(inference_output, CreateResult):
            model_result = inference_output
        else:
            # Streaming chunk event
            yield inference_output

    assert model_result is not None, "No model result was produced."

    ...

    inferred_text_message: CodeGenerationEvent = CodeGenerationEvent(
        content=str(model_result.content),
        code_blocks=self._extract_markdown_code_blocks(model_result.content),
        source=agent_name,
    )

    # Execute the generated code if present; stop retrying on success.
    execution_result = await self.execute_code_block([inferred_text_message], cancellation_token)
    if execution_result.result.exit_code == 0:
        break
```

@ekzhu (Collaborator, Author) commented Apr 6, 2025

> State management: Should we store each retry's code and result in model_context, or only update the agent context after a successful execution?

For simplicity, let's store all events in the model context, including both unsuccessful and successful ones.
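As a minimal sketch of what that could look like inside the retry loop, assuming autogen-core's ChatCompletionContext API (async add_message) and the AssistantMessage/UserMessage model types, and reusing the variable names from the proposal above (including the assumption that the execution result exposes .result.output alongside .result.exit_code):

```python
from autogen_core.models import AssistantMessage, UserMessage

# Inside each attempt, successful or not:
# record the generated code as an assistant message...
await model_context.add_message(
    AssistantMessage(content=str(model_result.content), source=agent_name)
)
# ...and the execution output as a user message, so the next
# regeneration attempt can see the error it has to fix.
await model_context.add_message(
    UserMessage(content=execution_result.result.output, source="code_executor")
)
```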

> User feedback during retries: Should we yield full code + execution result at every retry step, or keep it simple with messages like "Code execution failed with exit code ..., retrying..."?

Yield the full code and result for transparency. Users don't trust models at this point in most scenarios.
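In code terms, that could be as simple as yielding both events on every attempt rather than a terse status string; the variable names below come from the proposal sketch above:

```python
# Surface the full generated code and the raw execution result to the
# caller on each attempt, not just a "retrying..." notice.
yield inferred_text_message   # CodeGenerationEvent: the generated code blocks
yield execution_result        # CodeExecutionEvent: exit code and output
```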

> User intervention possibility: If we opt to yield full code + result, it opens up the opportunity to delegate control to a user_agent (if present), allowing for interactive feedback to improve the code generation trajectory. Would that align with the intended design?

User intervention should happen outside of the agent; it should be part of a team. After a certain number of unsuccessful attempts, the agent should just stop, reflect, and pass control to the next agent, which could be the user if the team is orchestrated that way.
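For illustration, a team-level arrangement along those lines might look like the sketch below. It assumes the 0.4.x autogen_agentchat / autogen_ext APIs (the agent class is named CodeExecutorAgent there, even though this issue calls it CodeExecutionAgent); the exact constructor parameters are assumptions, not taken from this issue.

```python
from autogen_agentchat.agents import CodeExecutorAgent, UserProxyAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o")

# Agent that generates and executes code; once its internal retry budget
# is exhausted it stops, reflects, and yields the turn like any other agent.
coder = CodeExecutorAgent(
    "coder",
    code_executor=LocalCommandLineCodeExecutor(work_dir="coding"),
    model_client=model_client,
)

# The human sits outside the agent, as a separate team member.
user = UserProxyAgent("user")

# Round-robin orchestration: the coder attempts the task (with self-debugging),
# then the user can give interactive feedback, and the cycle repeats.
team = RoundRobinGroupChat([coder, user])
```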

@ekzhu (Collaborator, Author) commented Apr 6, 2025

I think a simple loop with a counter may not address all cases.

At the end of each iteration, it should use the model to determine whether the code error can be fixed, and if not, exit the loop and return a final response to the caller. We can use structured output for this to ensure the model output can be used in the loop condition.

If the code error can be fixed, it attempts another iteration.

If the code execution succeeds, then apply a final reflection.

It still needs a maximum retry count to avoid the model going off the rails.

Let's experiment with this using a few examples, especially data science and data analytics scenarios.
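As a rough sketch of that control flow (not the actual CodeExecutionAgent implementation), the "can this error be fixed?" decision can be modeled as structured output with Pydantic; the generate/execute/decide/reflect callables here are hypothetical placeholders for the model and executor calls:

```python
from typing import Awaitable, Callable

from pydantic import BaseModel


class RetryDecision(BaseModel):
    """Structured output the model produces after a failed execution."""
    is_fixable: bool
    reason: str


class ExecResult(BaseModel):
    exit_code: int
    output: str


async def self_debug_loop(
    task: str,
    generate: Callable[[str], Awaitable[str]],             # LLM writes code for the task
    execute: Callable[[str], Awaitable[ExecResult]],        # executor runs the code
    decide: Callable[[str], Awaitable[RetryDecision]],      # structured-output call
    reflect: Callable[[str, ExecResult], Awaitable[str]],   # final reflection / summary
    max_retries: int = 3,
) -> str:
    code, result = "", ExecResult(exit_code=1, output="not run")
    for _ in range(max_retries):
        code = await generate(task)
        result = await execute(code)
        if result.exit_code == 0:
            # Success: finish with a final reflection over the trajectory.
            return await reflect(code, result)
        # Ask the model, via structured output, whether the error is fixable.
        decision = await decide(
            f"The code failed with:\n{result.output}\nCan this be fixed?"
        )
        if not decision.is_fixable:
            break  # Judged unfixable: stop early instead of burning retries.
    # Out of retries or unfixable: still return a final response to the caller.
    return await reflect(code, result)
```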

Ethan0456 added a commit to Ethan0456/autogen that referenced this issue Apr 15, 2025
ekzhu closed this as completed in aad6caa Apr 22, 2025