Commit Graph

3 Commits

Author SHA1 Message Date
amrit
69e0676b51 chore: refine system prompt and small update to eval (#1940)

## Summary by cubic
Refined the system prompt for the email assistant to clarify tool usage, safety protocols, and response guidelines. Updated eval test case builders for more realistic coverage and improved test data generation.

- **Prompt Improvements**
  - Expanded instructions on when and how to use tools, safety checks, and bulk actions.
  - Added detailed workflow examples, safety protocols, and clearer self-check steps.
  - Updated common use cases and removed manual instruction responses.

- **Eval Updates**
  - Replaced and improved test case builders for Gmail search and email composition.
  - Made test prompts and expected outputs more realistic and varied.


## Summary by CodeRabbit

* **New Features**
  * Enhanced AI assistant guidance with more detailed instructions for tool usage, safety protocols, and workflow examples.
  * Added comprehensive safety protocols for bulk and destructive email operations, including confirmation steps and undo guidance.
  * Expanded support for contextual assistance and smart organization workflows.

* **Refactor**
  * Improved and modularized test case generation for AI email search and composition, with stricter validation and clearer prompts.

* **Style**
  * Updated prompt language to prioritize relevance in email retrieval instead of a fixed number of recent emails.

2025-08-06 21:00:58 -07:00
amrit
bf2c8f3a60 fix: eval lint err for extra arguments and autofix failing test (#1768)
Fixes the eval check script error caused by extra arguments left over from earlier code.
    
---

## Summary by cubic
Fixed eval script errors by removing extra arguments from AiChatPrompt calls in all eval tasks.

## Summary by CodeRabbit

* **Refactor**
  * Simplified AI Chat evaluation tests by streamlining prompt inputs.
* **Chores**
  * Disabled automatic draft generation in email thread workflows.
2025-07-20 11:43:20 -07:00
amrit
35bf6df65b feat: add a way to evaluate llms for our usecase against our prompts (#1700) 2025-07-15 10:22:42 -07:00