Fuji Chat Docs

Core

Full Browser Automation in Your Side Panel

Fuji Chat uses multimodal large language models to navigate websites and carry out web tasks from simple, natural-language commands. Unlike many other web agents, Fuji Chat is available as a browser extension that runs in the side panel, so you can call it whenever you need it. You can even hand off a task midway and watch the agent take care of the rest.


Prior Knowledge Augmentation System

Fuji Chat draws on past experience when navigating websites, which improves its understanding of how a site behaves. Users can also inject domain-specific guidance in real time by adding instructions in the settings menu.
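As a rough illustration of how domain-specific instructions could be attached to a task, here is a minimal sketch. The `DomainInstruction` shape, the function names, and the prompt layout are all assumptions made for this example; they are not Fuji Chat's actual schema or prompt format.

```typescript
// Hypothetical shape of one instruction entered in the settings menu.
interface DomainInstruction {
  domain: string;      // e.g. "github.com"
  instruction: string; // free-text guidance for that site
}

// Return the instructions that apply to a hostname, treating subdomains
// as matches (e.g. "gist.github.com" matches "github.com").
function instructionsFor(hostname: string, all: DomainInstruction[]): string[] {
  const host = hostname.toLowerCase();
  return all
    .filter(({ domain }) => {
      const d = domain.toLowerCase();
      return host === d || host.slice(-(d.length + 1)) === "." + d;
    })
    .map(({ instruction }) => instruction);
}

// Prepend any matching instructions to the task description.
function buildPrompt(task: string, hostname: string, all: DomainInstruction[]): string {
  const hints = instructionsFor(hostname, all);
  if (hints.length === 0) return `Task: ${task}`;
  const bullets = hints.map((h) => `- ${h}`).join("\n");
  return `Site-specific instructions:\n${bullets}\n\nTask: ${task}`;
}
```

Matching on the hostname suffix (with a leading dot) keeps an instruction for `github.com` from leaking onto an unrelated host such as `notgithub.com`.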

In the future, Fuji Chat will support additional mechanisms to enhance its web interaction capabilities, including:

  • Custom text/CSS selectors

  • JavaScript execution


Enhancing Website Understanding

When analyzing a webpage and composing a prompt, Fuji Chat keeps only the relevant elements, which helps maintain a high success rate in interactions.

The system uses HTML semantics and WAI-ARIA roles to identify interactive elements. The counts below compare detection by HTML tags alone with detection by HTML tags plus WAI-ARIA roles:

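| Website     | Interactive Elements (HTML Tags) | Interactive Elements (HTML Tags + WAI-ARIA) |
|-------------|----------------------------------|---------------------------------------------|
| amazon.com  | 534                              | 547                                         |
| twitter.com | 56                               | 121                                         |
| github.com  | 1364                             | 1446                                        |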

Fuji Chat aims to capture every essential interactive element while filtering out redundant or hidden components, which improves its web navigation accuracy.
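The difference between the two detection strategies can be sketched on a simplified element model. This is only an illustration: the `ElementInfo` type is a hypothetical snapshot rather than a live DOM node, the tag and role lists are short samples and not exhaustive, and Fuji Chat's actual filtering rules are not documented here.

```typescript
// Simplified, hypothetical snapshot of a DOM element.
interface ElementInfo {
  tag: string;       // lowercase tag name, e.g. "button"
  role?: string;     // value of the WAI-ARIA role attribute, if any
  hasHref?: boolean; // whether an <a> element carries an href
  hidden?: boolean;  // true if the element is not rendered
}

// Illustrative samples only; real lists would be longer.
const INTERACTIVE_TAGS = ["button", "input", "select", "textarea", "summary"];
const INTERACTIVE_ROLES = ["button", "link", "checkbox", "radio", "tab", "menuitem", "switch", "textbox"];

function isInteractiveByTag(el: ElementInfo): boolean {
  // An <a> without href is a placeholder, not a link.
  if (el.tag === "a") return el.hasHref === true;
  return INTERACTIVE_TAGS.indexOf(el.tag) !== -1;
}

function isInteractiveByRole(el: ElementInfo): boolean {
  return el.role !== undefined && INTERACTIVE_ROLES.indexOf(el.role) !== -1;
}

// Count visible interactive elements under both strategies.
function countInteractive(elements: ElementInfo[]): { tagsOnly: number; tagsAndAria: number } {
  const visible = elements.filter((el) => !el.hidden);
  const tagsOnly = visible.filter((el) => isInteractiveByTag(el)).length;
  const tagsAndAria = visible.filter(
    (el) => isInteractiveByTag(el) || isInteractiveByRole(el)
  ).length;
  return { tagsOnly, tagsAndAria };
}
```

A `<div role="button">` is invisible to the tag-only check but picked up once roles are considered, which is the kind of gap that the twitter.com row (56 vs. 121) suggests.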


Benchmarks

Fuji Chat’s ability to complete real-world tasks has been compared with other models using industry benchmarks. In the results below, Fuji Chat achieves the highest success rate on six of the seven websites; on Apple, WebVoyager scores higher (65.1% vs. 60.4%):

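| Model             | Allrecipes | ArXiv | Apple | Google Search | BBC News | GitHub | Cambridge Dictionary |
|-------------------|------------|-------|-------|---------------|----------|--------|----------------------|
| GPT-4 (All Tools) | 11.1%      | 17.1% | 44.2% | 60.5%         | 9.5%     | 48.8%  | 25.6%                |
| WebVoyager        | 53.3%      | 51.2% | 65.1% | 76.7%         | 61.9%    | 63.4%  | 65.1%                |
| Fuji Chat         | 64.4%      | 65.1% | 60.4% | 81.4%         | 76.2%    | 73.2%  | 86.0%                |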

Note: Fuji Chat was benchmarked using the GPT-4o model, while the WebVoyager results are from its February 2024 report, which used GPT-4V. The two sets of results therefore rely on different underlying models.

A more detailed benchmark report will be released soon.
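For readers who want to summarize the table themselves, the following sketch computes a per-model average and lists the sites where a given model does not lead. The numbers are copied from the table above; the helper names are made up for this example. Note that the average is an unweighted mean over the seven sites, which is not necessarily the overall task success rate if the sites have different numbers of tasks.

```typescript
const SITES = [
  "Allrecipes", "ArXiv", "Apple", "Google Search",
  "BBC News", "GitHub", "Cambridge Dictionary",
];

// Success rates (%) from the benchmark table, in SITES order.
const RESULTS: Record<string, number[]> = {
  "GPT-4 (All Tools)": [11.1, 17.1, 44.2, 60.5, 9.5, 48.8, 25.6],
  "WebVoyager": [53.3, 51.2, 65.1, 76.7, 61.9, 63.4, 65.1],
  "Fuji Chat": [64.4, 65.1, 60.4, 81.4, 76.2, 73.2, 86.0],
};

// Unweighted mean across sites.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sites where some other model matches or beats `model`.
function sitesNotLed(model: string): string[] {
  const others = Object.keys(RESULTS).filter((m) => m !== model);
  return SITES.filter((_, i) =>
    others.some((m) => RESULTS[m][i] >= RESULTS[model][i])
  );
}
```

With these inputs, `sitesNotLed("Fuji Chat")` returns `["Apple"]`, and the unweighted means come out at roughly 72.4 for Fuji Chat, 62.4 for WebVoyager, and 31.0 for GPT-4 (All Tools).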


Limitations

While Fuji Chat offers advanced automation, some limitations exist:

1. Missing Semantics

Fuji Chat relies on semantic HTML and accessibility roles to identify interactive elements. Websites that do not follow accessibility standards may cause inconsistencies in detection.

2. Non-Semantic Web Technologies

Some applications, like Google Sheets, render their interface with Canvas/WebGL instead of standard HTML elements. This leaves few HTML elements for Fuji Chat to identify and makes interaction difficult.

3. Limited Interaction Types

Currently, Fuji Chat can scroll entire webpages, but it may struggle with:

  • Scrolling within specific containers

  • Dropdown menus with excessive options

  • Drag-and-drop interactions

Workaround: Users can leverage Fuji Chat's "instructions" feature to manually guide interactions when necessary.


Future Development

Fuji Chat is not just a tool for automating online tasks; it is also intended to serve as a web automation agent for complex workflows.

1. Supporting Programmatic Usage

Fuji Chat will introduce a JavaScript API to facilitate integration with automation frameworks like:

  • Puppeteer

  • Playwright

  • Selenium

This API will enable:

  • Automated performance benchmarking

  • Fuji Chat as a sub-agent in larger AI systems

  • Task execution triggered by external signals (e.g., scheduled tasks or email triggers)

  • Cloud-based Fuji Chat services

2. Cross-Tab Workflows

Most real-world automation spans multiple websites and requires context awareness across tabs. Fuji Chat will introduce:

  • Cross-tab memory to retain information between sessions

  • Seamless automation even when switching tabs
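Because cross-tab memory is a planned feature, its design is not described here. The sketch below only illustrates the general idea of a key-value store that one tab writes and another reads; the `MemoryEntry` type and `CrossTabMemory` class are hypothetical, and a real extension would also need to persist this state somewhere outside a single page.

```typescript
// Hypothetical record of something the agent learned while working in a tab.
interface MemoryEntry {
  tabId: number; // tab in which the value was observed
  key: string;   // e.g. "orderNumber"
  value: string;
}

class CrossTabMemory {
  private entries: MemoryEntry[] = [];

  // Store a value; a newer value for the same key replaces the older one,
  // regardless of which tab wrote it.
  remember(entry: MemoryEntry): void {
    this.entries = this.entries.filter((e) => e.key !== entry.key);
    this.entries.push(entry);
  }

  // Look up a value from any tab.
  recall(key: string): string | undefined {
    const match = this.entries.filter((e) => e.key === key)[0];
    return match ? match.value : undefined;
  }

  // List the keys contributed by a given tab, e.g. for debugging.
  keysFrom(tabId: number): string[] {
    return this.entries.filter((e) => e.tabId === tabId).map((e) => e.key);
  }
}
```

For instance, an order number read on a shop's confirmation page in one tab could be recalled when filling in a support form in another.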

3. Copilot Mode

Fuji Chat will proactively seek user input when necessary, such as:

  • Logging in

  • Entering verification codes

  • Reviewing actions before proceeding
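One way to picture Copilot Mode is as a check that runs before each planned action and decides whether to pause for the user. The sketch below is an assumption about how such a check might look, not a description of Fuji Chat's behavior; `PlannedAction`, `Decision`, and the `irreversible` flag are hypothetical. The `"password"` input type and the `"one-time-code"` autocomplete hint are standard HTML values.

```typescript
// Hypothetical description of the next step the agent plans to take.
interface PlannedAction {
  kind: "click" | "type" | "navigate";
  inputType?: string;     // HTML input type, e.g. "password"
  autocomplete?: string;  // HTML autocomplete hint, e.g. "one-time-code"
  irreversible?: boolean; // assumed flag for steps such as placing an order
}

interface Decision {
  askUser: boolean;
  reason: string;
}

// Decide whether the agent should pause and hand control to the user.
function decide(action: PlannedAction): Decision {
  if (action.kind === "type" && action.inputType === "password") {
    return { askUser: true, reason: "login credentials required" };
  }
  if (action.kind === "type" && action.autocomplete === "one-time-code") {
    return { askUser: true, reason: "verification code required" };
  }
  if (action.irreversible === true) {
    return { askUser: true, reason: "review before proceeding" };
  }
  return { askUser: false, reason: "safe to proceed" };
}
```

The three branches mirror the three cases listed above: logging in, entering verification codes, and reviewing actions before they are carried out.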

4. Long-Term & Decentralized Memory

To improve efficiency, Fuji Chat is extending its Prior Knowledge Augmentation system to store task-specific insights over the long term. Planned improvements include:

  • Task saving for quick re-use

  • Task & instruction sharing

  • A knowledge extraction tool to create general automation rules

  • A Wikipedia-like knowledge base where users can collaboratively enhance Fuji Chat’s capabilities

  • Autonomous site exploration to generate useful automation instructions

Fuji Chat is committed to building a powerful, intelligent AI partner for modern web automation. 🚀


Last updated 3 months ago