saadn92 3 hours ago [-]
I built something like this but much worse. No extension, no recording: I literally sit there with Chrome DevTools open, do the action manually, copy the 3-4 network requests into a Python script, and replay them with urllib and a cookie jar.
It's absurd but it works. Gumroad's cover image upload, for example: their actual API can't do it, but the browser makes 3 requests (a presign to their Rails Active Storage endpoint, a PUT of the binary to S3, a POST of the signed_blob_id to attach it). Captured those once in April and have been replaying them since. I uploaded covers and thumbnails to 9 products today without opening a browser.
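For anyone curious, the three-request replay described above can be sketched roughly like this (the endpoints, field names, and payload shapes are placeholders of my own, not Gumroad's actual API):

```python
# Hypothetical sketch of the replay approach: three captured requests
# (presign, S3 PUT, attach) replayed with urllib and a shared cookie jar.
import json
import urllib.request
from http.cookiejar import CookieJar

# One opener with a cookie jar so the session cookie survives across calls.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar())
)

def presign(filename: str, byte_size: int) -> dict:
    """Step 1: ask the Active-Storage-style endpoint for an upload slot."""
    req = urllib.request.Request(
        "https://example.com/rails/active_storage/direct_uploads",  # placeholder
        data=json.dumps(
            {"blob": {"filename": filename, "byte_size": byte_size}}
        ).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return json.load(opener.open(req))

def upload_to_s3(put_url: str, payload: bytes) -> None:
    """Step 2: PUT the raw bytes to the presigned S3 URL."""
    req = urllib.request.Request(put_url, data=payload, method="PUT")
    opener.open(req)

def attach(product_id: str, signed_blob_id: str) -> None:
    """Step 3: POST the signed blob id so the app attaches it to the product."""
    req = urllib.request.Request(
        f"https://example.com/products/{product_id}/cover",  # placeholder
        data=json.dumps({"signed_blob_id": signed_blob_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener.open(req)
```

The cookie jar is the important part: it replays the logged-in session, which is the only "auth" these internal endpoints usually check.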
Obviously falls apart the second they change anything.
tim-projects 11 hours ago [-]
If you could take this recording and turn it into a Playwright script, that would be a massive time saver.
Having to redo recordings once they break sounds like too much hassle.
arjunchint 11 hours ago [-]
Hey, that's a great idea; we'll look into this export option. But how would being a Playwright script save time?
Right now, since we re-execute the code in a custom sandbox, we use our own syntax and exposed methods. So even now you can edit the generated script.
JSR_FDED 19 hours ago [-]
Maybe there’s a middle ground where a small local model can roll with the variations in a site that would break a script, while saving on per-token costs?
arjunchint 15 hours ago [-]
We found Gemini Flash to be the sweet spot for both agentic actions as well as writing code. Even Flash-Lite is too hit or miss.
We are thinking through self-healing mechanisms, like falling back to a live web agent and rewriting the script.
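The fallback shape described here is easy to sketch (both callables below are hypothetical stand-ins; a real version would also re-record or rewrite the failing script):

```python
# Sketch of "script first, live agent on failure": run the cheap recorded
# script, and only invoke the slower, costlier agent when the script breaks.
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_fallback(script: Callable[[], T],
                      live_agent: Callable[[], T]) -> T:
    try:
        return script()        # fast path: replay the recorded script
    except Exception:
        return live_agent()    # slow path: hand the task to a live agent

# Usage: the agent only runs when the script raises.
def broken_script():
    raise RuntimeError("selector no longer matches")

result = run_with_fallback(broken_script, lambda: "agent completed task")
# result == "agent completed task"
```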
amelius 20 hours ago [-]
The problem: I don't trust extensions one bit.
quarkcarbon279 16 hours ago [-]
The reason we open our client-side code is to build trust in putting rtrvr's DOM intelligence in your web apps - https://github.com/rtrvr-ai/rover/tree/main . Our monetization is straightforward: subscriptions - https://www.rtrvr.ai/pricing . The problem of extensions shipping anything or selling user data comes up when people build them as side gigs, not when they pour more than a year into building a highly accurate automation engine. We also have cloud sandboxes if you prefer running the same intelligence in the cloud rather than on your own device.
Auditing the code is fairly straightforward if it isn't obfuscated - so long as it doesn't execute dynamic code, that is. But the big issue is that you can't control when the extension itself gets an update (to my knowledge). And it isn't uncommon for browsing data, or the extension itself, to be sold down the road to someone shadier than the original author.
amelius 10 hours ago [-]
Yes, this exactly.
daylab 14 hours ago [-]
Oh, this is clever - running in the main world dodges a lot of the usual scraping pain. How do you handle sites with a strict CSP that blocks inline scripts? Is the extension somehow exempt?
arjunchint 12 hours ago [-]
We execute the code in a sandbox and proxy the fetch calls through the main world!
rvz 23 hours ago [-]
Aren't there many ways for the website to just break the automation?
Does this work on sites that have protection against LLMs such as captchas, LLM tarpits and PoW challenges?
I just see this as a never ending cat and mouse game.
acoyfellow 22 hours ago [-]
It is. They are saying “we are willing to chase the mouse for you for money”.
arjunchint 20 hours ago [-]
The bigger goal is to build and maintain a global library of popular automations. Users can also quickly re-record and recreate the scripts to update.
Since it runs inside your own browser, there should be no captchas or challenges. On failure, it can fall back to our regular web agent, which can solve captchas.
Big picture, with the launch of Mythos it might become impossible for websites to keep up, and they will have to go the Salesforce route and just expose APIs for everything.
PS: Also, our data policy if you are interested: https://www.rtrvr.ai/blog/rtrvr-ai-privacy-security-how-we-h...