Fundamental
Under the simple UI of Fuji Chat, which resides in the side panel of your browser, is an LLM-powered agent enhanced by sensors and actuators. While the core of the agent is a streamlined ReAct-based system, we have put significant effort into ensuring that sensors provide relevant information and actuators execute tasks reliably.
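As a rough illustration of the ReAct-style loop described above, here is a minimal Python sketch. It is not Fuji Chat's actual code: the `Step` type, the `react_loop` function, the toy `page` dictionary, and the scripted policy standing in for the language model are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    thought: str   # the model's reasoning for this turn
    action: str    # name of an actuator, or "finish"
    argument: str  # input passed to the actuator


def react_loop(
    policy: Callable[[str, List[Step]], Step],
    sense: Callable[[], str],
    actuators: Dict[str, Callable[[str], None]],
    max_steps: int = 5,
) -> List[Step]:
    """Alternate observing, reasoning, and acting until the policy finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        observation = sense()                  # sensor: read the current page state
        step = policy(observation, history)    # LLM stand-in: choose the next action
        history.append(step)
        if step.action == "finish":
            break
        actuators[step.action](step.argument)  # actuator: change the page
    return history


# Toy page state (hypothetical) instead of a real browser tab.
page = {"search_box": "", "submitted": False}


def sense() -> str:
    return f"search_box={page['search_box']!r} submitted={page['submitted']}"


def type_text(text: str) -> None:
    page["search_box"] = text


def click(target: str) -> None:
    if target == "search_button":
        page["submitted"] = True


def scripted_policy(observation: str, history: List[Step]) -> Step:
    # A real agent would call a language model here; this is a fixed script.
    if "search_box=''" in observation:
        return Step("The box is empty, so type the query.", "type", "mount fuji")
    if "submitted=False" in observation:
        return Step("The query is typed, so submit it.", "click", "search_button")
    return Step("The search was submitted.", "finish", "")


trace = react_loop(scripted_policy, sense, {"type": type_text, "click": click})
for step in trace:
    print(f"{step.action}({step.argument!r}): {step.thought}")
```

The loop re-reads the page through the sensor before every decision, so each action is chosen against the current state rather than a stale one. The `max_steps` cap is an assumption added here to keep the sketch from looping forever.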
Unlike approaches that send a page's entire HTML string or a bare screenshot to the language model, Fuji Chat avoids overwhelming the model with raw data. Instead, it sends:
A clean screenshot of the webpage.
An annotated screenshot, enhanced with textual descriptions that highlight key interactive elements like input fields and buttons.
Because the interactive elements are called out explicitly, this approach streamlines web navigation and makes it easier for the agent to act on buttons and input fields through clicks and typing.
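One way to picture the textual descriptions that accompany the annotated screenshot is sketched below in Python. The source does not specify the annotation format, so the numbered labels, the `Element` fields, and the `describe_elements` and `click_point` helpers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    tag: str     # e.g. "input" or "button"
    text: str    # visible label or placeholder
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


def describe_elements(elements: List[Element]) -> str:
    """Build one numbered text line per interactive element."""
    lines = []
    for label, el in enumerate(elements, start=1):
        lines.append(
            f'[{label}] {el.tag} "{el.text}" at ({el.x}, {el.y}), '
            f"{el.width}x{el.height}"
        )
    return "\n".join(lines)


def click_point(elements: List[Element], label: int) -> Tuple[int, int]:
    """Map a label chosen by the model back to the centre of that element."""
    el = elements[label - 1]  # labels start at 1
    return (el.x + el.width // 2, el.y + el.height // 2)


# Hypothetical elements, as if extracted from a page with a search form.
elements = [
    Element("input", "Search", 100, 40, 200, 30),
    Element("button", "Go", 310, 40, 60, 30),
]

description = describe_elements(elements)
print(description)
print(click_point(elements, 2))
```

With labels like these, the model only has to answer with a number, and the actuator resolves that number to pixel coordinates for the click.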
Note: All prompts (text and image) are sent directly to the API of your choice. Fuji Chat does not collect any user data or attempt to store any information.