Core
Full Browser Automation in Your Side Panel
Use multimodal large language models to navigate the web and manage tasks with simple, intuitive commands. Unlike many other web agents, Fuji Chat runs as a browser extension, so you can call on it whenever you need it. Even midway through a task, you can hand the work off to the agent and watch it take care of the rest.
Prior Knowledge Augmentation System
Fuji Chat draws on past experience when navigating websites, which improves its understanding of how individual sites behave. Users can customize this knowledge and inject domain-specific insights in real time by adding instructions in the settings menu.
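One way to picture how user instructions could reach the model is a prompt builder that appends only the instructions matching the current site. This is a minimal sketch, not Fuji Chat's implementation: `SiteInstruction`, `matchesHost`, and `buildPrompt` are hypothetical names, and the prompt layout is an assumption.

```typescript
// Hypothetical shape of an instruction saved in the settings menu.
interface SiteInstruction {
  hostPattern: string; // e.g. "github.com"; also matches subdomains
  text: string;        // the domain-specific hint written by the user
}

// True when `host` is the pattern itself or one of its subdomains.
function matchesHost(host: string, pattern: string): boolean {
  return host === pattern || host.endsWith("." + pattern);
}

// Compose a prompt that carries only the instructions relevant to `host`.
function buildPrompt(
  task: string,
  host: string,
  instructions: SiteInstruction[],
): string {
  const relevant = instructions.filter((i) => matchesHost(host, i.hostPattern));
  const lines = [`Task: ${task}`, `Site: ${host}`];
  if (relevant.length > 0) {
    lines.push("User instructions for this site:");
    for (const i of relevant) {
      lines.push(`- ${i.text}`);
    }
  }
  return lines.join("\n");
}

// Example: only the github.com hint is included for gist.github.com.
const examplePrompt = buildPrompt("Star the repository", "gist.github.com", [
  { hostPattern: "github.com", text: "The Star button is in the page header." },
  { hostPattern: "amazon.com", text: "Prefer the search box over category menus." },
]);
```

Filtering by host keeps the prompt short and avoids feeding the model hints written for unrelated sites.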
In the future, Fuji Chat will support additional mechanisms to enhance its web interaction capabilities, including:
Custom text/CSS selectors
JavaScript execution
Enhancing Website Understanding
When analyzing a webpage and composing a prompt, Fuji Chat keeps only the elements relevant to interaction, which improves the success rate of its actions.
The system leverages HTML semantics and WAI-ARIA roles to accurately identify interactive elements:
amazon.com     534    547
twitter.com     56    121
github.com    1364   1446
Fuji Chat aims to capture every essential interactive element while filtering out redundant or hidden components, which improves its web navigation accuracy.
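The filtering described above can be sketched as a predicate over element metadata. This is an illustration under stated assumptions rather than Fuji Chat's actual logic: `ElementSnapshot`, `isInteractive`, and `filterInteractive` are hypothetical names, the snapshot stands in for a real DOM element, and the tag and role lists are a small subset of HTML and WAI-ARIA widget roles.

```typescript
// Simplified, hypothetical snapshot of a DOM element.
interface ElementSnapshot {
  tag: string;          // lowercase tag name, e.g. "button"
  role?: string;        // value of the `role` attribute, if present
  hasHref?: boolean;    // for <a>: whether an href is set
  ariaHidden?: boolean; // aria-hidden="true"
  hiddenAttr?: boolean; // `hidden` attribute present
}

// Natively interactive HTML elements (subset).
const INTERACTIVE_TAGS = new Set(["button", "input", "select", "textarea", "summary"]);

// WAI-ARIA widget roles that signal interactivity (subset).
const INTERACTIVE_ROLES = new Set([
  "button", "link", "checkbox", "radio", "tab", "menuitem",
  "textbox", "combobox", "switch", "option", "slider", "searchbox",
]);

function isInteractive(el: ElementSnapshot): boolean {
  // Hidden elements are filtered out first.
  if (el.ariaHidden || el.hiddenAttr) return false;
  // An explicit ARIA widget role marks the element as interactive.
  if (el.role !== undefined && INTERACTIVE_ROLES.has(el.role)) return true;
  // An anchor is only a link when it has an href.
  if (el.tag === "a") return el.hasHref === true;
  return INTERACTIVE_TAGS.has(el.tag);
}

function filterInteractive(els: ElementSnapshot[]): ElementSnapshot[] {
  return els.filter(isInteractive);
}

// Example page fragment: two of the five elements survive the filter.
const kept = filterInteractive([
  { tag: "button" },                    // kept: native control
  { tag: "div", role: "tab" },          // kept: ARIA widget role
  { tag: "div" },                       // dropped: no semantics
  { tag: "a" },                         // dropped: anchor without href
  { tag: "input", ariaHidden: true },   // dropped: hidden
]);
```

Note that the plain `div` is dropped even if a script makes it clickable, which is why pages without semantic markup are harder to handle.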
Benchmarks
Fuji Chat’s ability to complete real-world tasks has been compared with other agents on industry benchmarks. Fuji Chat records the highest success rate in six of the seven reported results:
GPT-4 (All Tools)   11.1%   17.1%   44.2%   60.5%    9.5%   48.8%   25.6%
WebVoyager          53.3%   51.2%   65.1%   76.7%   61.9%   63.4%   65.1%
Fuji Chat           64.4%   65.1%   60.4%   81.4%   76.2%   73.2%   86.0%
Note: Fuji Chat was benchmarked using the GPT-4o model, while WebVoyager results are from their February 2024 report using GPT-4V.
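To compare the agents at a glance, the reported figures can be reduced to an unweighted mean per agent. This is only a rough summary: it assumes each of the seven values is a per-website success rate and weights them equally, which the source does not state, so the result is not an official overall score. The names `reportedRates`, `mean`, and `averages` are illustrative.

```typescript
// Success rates (%) exactly as reported above, in the order listed.
const reportedRates: Record<string, number[]> = {
  "GPT-4 (All Tools)": [11.1, 17.1, 44.2, 60.5, 9.5, 48.8, 25.6],
  "WebVoyager": [53.3, 51.2, 65.1, 76.7, 61.9, 63.4, 65.1],
  "Fuji Chat": [64.4, 65.1, 60.4, 81.4, 76.2, 73.2, 86.0],
};

// Arithmetic mean; rejects empty input instead of returning NaN.
function mean(values: number[]): number {
  if (values.length === 0) {
    throw new Error("cannot average an empty list");
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Unweighted mean per agent, formatted to one decimal place.
const averages: Record<string, string> = {};
for (const name of Object.keys(reportedRates)) {
  averages[name] = mean(reportedRates[name]).toFixed(1);
}
// averages: { "GPT-4 (All Tools)": "31.0", "WebVoyager": "62.4", "Fuji Chat": "72.4" }
```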
A more detailed benchmark report will be released soon.
Limitations
While Fuji Chat offers advanced automation, some limitations exist:
1. Missing Semantics
Fuji Chat relies on semantic HTML and accessibility roles to identify interactive elements. Websites that do not follow accessibility standards may cause inconsistencies in detection.
2. Non-Semantic Web Technologies
Some applications, like Google Sheets, rely on Canvas/WebGL instead of standard HTML elements, making interaction difficult.
3. Limited Interaction Types
Currently, Fuji Chat can scroll entire webpages, but it may struggle with:
Scrolling within specific containers
Dropdown menus with excessive options
Drag-and-drop interactions
Workaround: Users can leverage Fuji Chat's "instructions" feature to manually guide interactions when necessary.
Future Development
Fuji Chat is intended to be more than a tool for automating individual online tasks: the goal is a state-of-the-art web automation agent that handles complex workflows.
1. Supporting Programmatic Usage
Fuji Chat will introduce a JavaScript API to facilitate integration with automation frameworks like:
Puppeteer
Playwright
Selenium
This API will enable:
Automated performance benchmarking
Fuji Chat as a sub-agent in larger AI systems
Task execution triggered by external signals (e.g., scheduled tasks, email triggers, etc.)
Cloud-based Fuji Chat services
2. Cross-Tab Workflows
Most real-world automation spans multiple websites and requires context awareness across tabs. Fuji Chat will introduce:
Cross-tab memory to retain information between sessions
Seamless automation even when switching tabs
3. Copilot Mode
Fuji Chat will proactively seek user input when necessary, such as:
Logging in
Entering verification codes
Reviewing actions before proceeding
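Since Copilot Mode is a planned feature, the following is only a sketch of how an agent might decide when to pause for the user. It covers the three cases listed above; `PlannedStep`, `Decision`, and `reviewStep` are hypothetical names and do not describe a Fuji Chat API.

```typescript
// Hypothetical description of the next step the agent intends to take.
type PlannedStep =
  | { kind: "click"; label: string }
  | { kind: "type"; field: string; fieldType: "text" | "password" | "one-time-code" }
  | { kind: "submit"; irreversible: boolean };

// Either continue automatically or hand control to the user with a reason.
type Decision = { proceed: true } | { proceed: false; reason: string };

function reviewStep(step: PlannedStep): Decision {
  switch (step.kind) {
    case "type":
      // Credentials and verification codes must come from the user.
      if (step.fieldType === "password") {
        return { proceed: false, reason: "login required" };
      }
      if (step.fieldType === "one-time-code") {
        return { proceed: false, reason: "verification code required" };
      }
      return { proceed: true };
    case "submit":
      // Irreversible submissions are shown to the user before proceeding.
      return step.irreversible
        ? { proceed: false, reason: "review before proceeding" }
        : { proceed: true };
    case "click":
      return { proceed: true };
  }
}

// Example: typing into a password field pauses the agent.
const loginDecision = reviewStep({ kind: "type", field: "Password", fieldType: "password" });
```

Returning a reason alongside the decision lets the side panel explain why the agent stopped instead of simply stalling.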
4. Long-Term & Decentralized Memory
To improve efficiency, Fuji Chat is extending its "Prior Knowledge Augmentation" system to store task-specific insights. Planned improvements include:
Task saving for quick re-use
Task & instruction sharing
A knowledge extraction tool to create general automation rules
A Wikipedia-like knowledge base where users can collaboratively enhance Fuji Chat’s capabilities
Autonomous site exploration to generate useful automation instructions
Fuji Chat is committed to building a powerful, intelligent AI partner for modern web automation. 🚀