I still don't buy the we needed it to be a whole Browser and not a Chrome Extension argument:
- your interface is still literally a chrome extension side panel
- none of the agentic browsers from the bigger players like Atlas and Comet really took off either
I do think the server side integration is required:
- with rtrvr.ai a ton of users are integrating our web agent chrome extension via Remote MCP from chatgpt.com as well as triggering as an API endpoint remotely. Your implementation is limited to only local connections as I understand.
- the biggest unlock for users is running at scale, so just being able to launch a hundred cloud browsers, do a task, and return results while you do other things. So we see hybrid cloud/local execution as the key unlock for this year
Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Last year was a lot of technical builders exploring the capabilities, and I am excited for this year of making these agentic browsers useful!
One simple example is an extension can't see cross origin iframes. This means it could never do soemthing like fill out a payment form for you if it's an extension.
Limited computation and action space is another as well as bot detection systems.
For example a javascript method trying to automate something like microsoft word in an iframe will have a tough time because the second you inject code in there they will block you.
We honestly haven't faced any bot detection or blocking issues. Owning the browser layer exposes to you much more detection just look at Comet getting blocked on Amazon etc.
what permissions are you talking about? No user permissions/any insecure permissions are needed to navigate cross origin iframes, shadow DOMs and likewise. It comes down to your architecture choices and capabilities - rtrvr can navigate these diff realms without ever taking debugger or such insecure permissions
> whole Browser and not a Chrome Extension argument
Both of us are definitely biased to think our own approach is better :)
But without owning the binary, we couldn't shipped today's feature -- Agent with access to your filesystem and being able to run shell commands like Claude Cowork.
> your interface is still literally a chrome extension side panel
Yep, our interface is a chrome extension to make iterating on the UX faster. But it uses a ton of C++ APIs that we expose under `chrome.browseros.*`
> Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
> But without owning the binary, we couldn't shipped today's feature -- Agent with access to your filesystem and being able to run shell commands like Claude Cowork
Chrome Extension can also access local files and can also execute LLM generated code in sandboxes
Really cool product. How do you plan to monetize it?
You guys need some marketing help. There’s a lot of potential here, but you don’t do a good job of selling it. Tell me what problems I’m going to be able to solve or what headaches it will eliminate. Can it going into that shitty Canvas app my kids’ school uses, identify outstanding assignments or low grades and send me a daily text summary? Can it automate buying everything on my grocery list and setting up delivery? Or look up flight options, ask me what I want and book it for me? Even better, I’m stuck having to look up international flights for 7 people in three households, get everyone to agree on one and then book them. Please build something that will do that.
Have you ever thought about a marketplace for premade workflows? Or a library of integrations that are already tested that a user can mix and match to create complex automations? Or access to more MCP servers?
For example, it would be really neat to trigger jobs that perform some task and then make a call to Twilio or something to send an alert. Or some building blocks that tie into my Square account or Amazon account. I want to be able to describe the results I want, but I don’t want to explain how to interact with a particular service and then test that.
I would love to be able to give a prompt like this: “review my item library in Square, identify items that are missing descriptions or are miscategorized, propose the fixes, and confirm with me before making any changes.” That’s an extremely tedious task that requires a lot of clicking and page loads. I hate it and I would pay for your product if you could save me that time.
Or this: “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful. Doing this manually can take days.
> “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful.
You could try this use case on on agent builder even today. We also have a scheduled tasks for you to schedule it to run monthly
> Have you ever thought about a marketplace for premade workflows?
We want to do this and are moving towards that! But we first need to make the premade (or user published) workflows very reliable.
At the chromium level, you have access to every single DOM element and coordinate space around it. So, when a click happens either user or agent, we have a neat way of enforcing required action (either allow it or nullify the click).
We are still at early version. And mostly targeting enterprise sites (like SAP) which don't change that often.
> we're adding browser-level guardrails (think IAM for agents)
This sounds interesting, but where would I go to see these guardrails and their implementation? I tried searching in the repository and couldn't find them.
What would be great is if it could work in the browser like Claude in chrome and communicate (with my control) back to objects on my desktop like my ide for example or really anything
Ohh, interesting, technically this should already be possible. Because we already package gemini-cli into the sidecar (bun) binary. We just have to create a good UX.
What angle are you looking at this from? Is it for convenience? Or do you not like terminal UI and need a web-friendly UI for these agents?
There are bunch of tedious / routine tasks that AI can automate.
I think the big hurdle is mostly education / shift in mindset. We are so used to doing the task manually that most of us (including me) don't pause to think if I should be doing this or can I give to an agent.
I had browseros do a bunch of data validation for me in my Dolibarr ERP system. It cross checked my new master data against our old ERP, flagging bad links and filling in missing data. I could have done it much quicker overall with the api and some scripting, but it was easy to just write a two line prompt telling it where the data is and how to manage disagreements. Then I just watched it run on a second monitor for a few hours while I worked other projects.
I used a local Ollama model and though it was kind of amazing that it worked. I couldn't turn a typical user lose with something like this yet, but I think I see the vision. I image a lot of automation could happen this way in the future. I put less effort into the prompt than I would have needed to spend teaching someone from the office pool to accomplish the same goal, and got a good enough result.
In practice I have found that I can accomplish the same results in a stricter, more accurate, and faster way just using codex on the command line with some scripting and API access, but that's not going to work for a lot of people and putting it in the browser is pretty convenient... The MCP server that's built in can also become a bit of an API for the entire web if you're careful in how you use it, which opens up possibilities for things that don't have real APIs.
Good question. We think the browser is becoming the new OS. It doesn’t really matter anymore if you’re on Windows, macOS, or Linux—the browser is where most work already happens.
We see a future where it’s the main gateway to everything, and where agents live and work alongside you inside the browser. That’s why we call it BrowserOS. :)
Is this really true? Mobile device users are all mostly forced to use apps rather than the browser for most stuff, and people on desktop PCs/laptops are probably either using them for gaming (all desktop apps), or work where a lot of stuff is desktop apps.
Sure regular consumer stuff like social media is webapps (if they're not mobile only), and if you're interacting with like salesforce or a customer support tracker or an issue tracker or something you're likely using a webapp, but the move to mobile devices for most consumer stuff means that people still using PCs are largely power users.
Hey cool stuff since last update!
I still don't buy the we needed it to be a whole Browser and not a Chrome Extension argument:
- your interface is still literally a chrome extension side panel
- none of the agentic browsers from the bigger players like Atlas and Comet really took off either
I do think the server side integration is required:
- with rtrvr.ai a ton of users are integrating our web agent chrome extension via Remote MCP from chatgpt.com as well as triggering as an API endpoint remotely. Your implementation is limited to only local connections as I understand.
- the biggest unlock for users is running at scale, so just being able to launch a hundred cloud browsers, do a task, and return results while you do other things. So we see hybrid cloud/local execution as the key unlock for this year
Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Last year was a lot of technical builders exploring the capabilities, and I am excited for this year of making these agentic browsers useful!
Extensions are limited though.
One simple example is an extension can't see cross origin iframes. This means it could never do soemthing like fill out a payment form for you if it's an extension.
Limited computation and action space is another as well as bot detection systems.
For example a javascript method trying to automate something like microsoft word in an iframe will have a tough time because the second you inject code in there they will block you.
> One simple example is an extension can't see cross origin iframes
Sounds like a skill issue, our web agent is able to interact with cross origin iframes to for example solve captchas: https://www.youtube.com/watch?v=LD3afouKPYc
We honestly haven't faced any bot detection or blocking issues. Owning the browser layer exposes to you much more detection just look at Comet getting blocked on Amazon etc.
With specific user permission to do so sure but in general it is blocked.
You're still limited in lots of annoying ways though
what permissions are you talking about? No user permissions/any insecure permissions are needed to navigate cross origin iframes, shadow DOMs and likewise. It comes down to your architecture choices and capabilities - rtrvr can navigate these diff realms without ever taking debugger or such insecure permissions
Thanks!
> whole Browser and not a Chrome Extension argument
Both of us are definitely biased to think our own approach is better :)
But without owning the binary, we couldn't shipped today's feature -- Agent with access to your filesystem and being able to run shell commands like Claude Cowork.
> your interface is still literally a chrome extension side panel
Yep, our interface is a chrome extension to make iterating on the UX faster. But it uses a ton of C++ APIs that we expose under `chrome.browseros.*`
> Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Thanks! We'll look into publishing a blog soon!
> But without owning the binary, we couldn't shipped today's feature -- Agent with access to your filesystem and being able to run shell commands like Claude Cowork
Chrome Extension can also access local files and can also execute LLM generated code in sandboxes
You can't run shell commands though. Bash is the most powerful tool in claude code!
https://github.com/browseros-ai/BrowserOS/issues/99#issuecom...
I didn't hear back there, but huzzah, it looks like this is in there. I'm glad to see it!
Thanks for initial feature request! We do read every single request :)
Yes, we expose BrowserOS as an MCP server -- that you can use from claude code, cursor, opencode, etc -- https://docs.browseros.com/features/use-with-claude-code
MCP server works out of box (unlike Chrome DevTools MCP which requires tricky setup).
Really cool product. How do you plan to monetize it?
You guys need some marketing help. There’s a lot of potential here, but you don’t do a good job of selling it. Tell me what problems I’m going to be able to solve or what headaches it will eliminate. Can it going into that shitty Canvas app my kids’ school uses, identify outstanding assignments or low grades and send me a daily text summary? Can it automate buying everything on my grocery list and setting up delivery? Or look up flight options, ask me what I want and book it for me? Even better, I’m stuck having to look up international flights for 7 people in three households, get everyone to agree on one and then book them. Please build something that will do that.
Keep at it because this thing is cool!
> You guys need some marketing help. There’s a lot of potential here, but you don’t do a good job of selling it.
Thank you for the feedback. Ack, we need to do a better job of marketing.
> How do you plan to monetize it? Our goal is to eventually to sell license for enterprise browsers.
Have you ever thought about a marketplace for premade workflows? Or a library of integrations that are already tested that a user can mix and match to create complex automations? Or access to more MCP servers?
For example, it would be really neat to trigger jobs that perform some task and then make a call to Twilio or something to send an alert. Or some building blocks that tie into my Square account or Amazon account. I want to be able to describe the results I want, but I don’t want to explain how to interact with a particular service and then test that.
I would love to be able to give a prompt like this: “review my item library in Square, identify items that are missing descriptions or are miscategorized, propose the fixes, and confirm with me before making any changes.” That’s an extremely tedious task that requires a lot of clicking and page loads. I hate it and I would pay for your product if you could save me that time.
Or this: “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful. Doing this manually can take days.
> “Every month, alert me to any fluctuations in product cost and which items in my Square catalog are affected. Highlight any items where my COGS exceeds 35%. All the invoices are available in my email.” That would be incredibly powerful.
You could try this use case on on agent builder even today. We also have a scheduled tasks for you to schedule it to run monthly
> Have you ever thought about a marketplace for premade workflows?
We want to do this and are moving towards that! But we first need to make the premade (or user published) workflows very reliable.
IAM for agents sounds interesting but how is it reliably enforced? You also built evals?
Thanks!
> how is it reliably enforced?
At the chromium level, you have access to every single DOM element and coordinate space around it. So, when a click happens either user or agent, we have a neat way of enforcing required action (either allow it or nullify the click).
We are still at early version. And mostly targeting enterprise sites (like SAP) which don't change that often.
What use case did you have in mind?
> we're adding browser-level guardrails (think IAM for agents)
This sounds interesting, but where would I go to see these guardrails and their implementation? I tried searching in the repository and couldn't find them.
We are still in early versions of the feature! Haven't released on our repo yet.
What use case did you have? Happy to show a demo of current version we have (you can hit me up on discord or slack -- links available on our repo)
What would be great is if it could work in the browser like Claude in chrome and communicate (with my control) back to objects on my desktop like my ide for example or really anything
Ohh, interesting, technically this should already be possible. Because we already package gemini-cli into the sidecar (bun) binary. We just have to create a good UX.
What angle are you looking at this from? Is it for convenience? Or do you not like terminal UI and need a web-friendly UI for these agents?
Thanks a lot, i wanted to ask about the headless agent use case: How does it compare to using https://github.com/vercel-labs/agent-browser
BrowserOS should work with agent browser as well in headless mode!
On top of that, if you want headful mode, you can use our MCP server https://docs.browseros.com/features/use-with-claude-code
Would love to understand your use case! You can hit me up at nithin[at]browseros.com
Genuine question. What's the use case for agentic browsing?
There are bunch of tedious / routine tasks that AI can automate.
I think the big hurdle is mostly education / shift in mindset. We are so used to doing the task manually that most of us (including me) don't pause to think if I should be doing this or can I give to an agent.
I had browseros do a bunch of data validation for me in my Dolibarr ERP system. It cross checked my new master data against our old ERP, flagging bad links and filling in missing data. I could have done it much quicker overall with the api and some scripting, but it was easy to just write a two line prompt telling it where the data is and how to manage disagreements. Then I just watched it run on a second monitor for a few hours while I worked other projects.
I used a local Ollama model and though it was kind of amazing that it worked. I couldn't turn a typical user lose with something like this yet, but I think I see the vision. I image a lot of automation could happen this way in the future. I put less effort into the prompt than I would have needed to spend teaching someone from the office pool to accomplish the same goal, and got a good enough result.
In practice I have found that I can accomplish the same results in a stricter, more accurate, and faster way just using codex on the command line with some scripting and API access, but that's not going to work for a lot of people and putting it in the browser is pretty convenient... The MCP server that's built in can also become a bit of an API for the entire web if you're careful in how you use it, which opens up possibilities for things that don't have real APIs.
It seems cool, will it work in headless mode without X11/Wayland/.. ?
At the moment, our product is designed for headful mode.
But if you want to use our browser in headless and use playwright that would work too! (we are chromium fork)
Which local model works best with this? (Assuming MacOS with 32GB unified RAM)
gpt-oss 20B works well. You'll want at least 12k context length for agent mode.
why are you calling this an OS
Good question. We think the browser is becoming the new OS. It doesn’t really matter anymore if you’re on Windows, macOS, or Linux—the browser is where most work already happens.
We see a future where it’s the main gateway to everything, and where agents live and work alongside you inside the browser. That’s why we call it BrowserOS. :)
Is this really true? Mobile device users are all mostly forced to use apps rather than the browser for most stuff, and people on desktop PCs/laptops are probably either using them for gaming (all desktop apps), or work where a lot of stuff is desktop apps.
Sure regular consumer stuff like social media is webapps (if they're not mobile only), and if you're interacting with like salesforce or a customer support tracker or an issue tracker or something you're likely using a webapp, but the move to mobile devices for most consumer stuff means that people still using PCs are largely power users.
> if you're interacting with like salesforce or a customer support tracker or an issue tracker or something you're likely using a webapp
Precisely. I think most knowledge work (especially at business) still happens browser. That is the workflow we want to target!
While that may be true, honest feedback is that it is confusing, possibly even misleading. But I hope whatever you pick works for you