When I see a hn post with no critical comments I assume all comments are either seeded or biased (commenting on my own bias)
scroll down and you'll see all the critical comments about the landing page lol
Submarining - well-known issue on HN.
it's hilarious how many have less than 5 karma.
I think this is a great endeavor. I was thinking about a channel that I like watching on YouTube. They travel to exotic places by boat and film themselves, nature documentary style. To make good videos requires going to these places, a ton of filming, AND a ton of editing. They put out a video every 2 weeks or so on their trips. I imagine the editing is the hard part.
This is a long-winded way of saying that I think creators need what you're making! People who have hours of awesome footage but have to spend dozens of hours cutting it down need this. Then also people who have awesome footage but aren't good at editing or hiring an editor, same thing. I'd love to see someone solve this so that 90th percentile editing is available to all, and then it can be more about who has the interesting content, rather than who has the interesting content and editing skills.
thanks! Mosaic can already do the rough cuts for you — so you can upload all your footage from your travel, and prompt it to "make a 2 minute highlight reel of your trip to Japan", for instance.
soon, we also plan to incorporate style transfer, so you could even give it a video from the channel you enjoy watching + your raw footage, and have the agent edit your footage in the same style as the reference video.
> you can upload all your footage from your travel, and prompt it to "make a 2 minute highlight reel of your trip to Japan"
In relation to the demo requests below, I think this would be a good example of how an average person might use your platform.
for a demo, check out this one that I put together using 81 clips from a skydiving trip we took in Monterey, CA:
https://edit.mosaic.so/links/c51c0555-3114-45f4-ab8f-c25f172...
As a creator who films long form content, editing (specifically clipping for short form) is such a nightmare - this solves such a huge problem and the ui is insanely clean.
Will be using this a ton in the future
These comments real sus.
i agree, things are a bit too kind. give me some more feedback.
You can see the care in every little decision, workflow, and feature — I’ve never had this much fun editing videos.
I didn’t expect great video editing to become democratized so quickly. Kudos to the team!!
- a happy customer
This is one of those ideas that seems obvious after you hear about it, yet somehow didn't exist yet. So many potential applications. Met the founder back in SF and he's one of the coolest, down to earth dudes there is. Best of luck to the team!
thank you so much for the kind words!
Damn, you beat me to it. I was building something similar but got too caught up optimizing the context extraction. I actually ended up building a full spec for it—basically a PoC of "grep for videos."
My end goal was to let an agent make semantic changes (e.g., "remove the parts where the guy in the blue dress is seen") by simply grepping the context spec for the relevant timestamps and using ffmpeg to cut them out.
How are you extracting context from videos?
how would this be different from vector embeddings / semantic search?
Vector embeddings are fuzzy on finding boundaries. With my spec approach, my goal is to get precise start/end times for ffmpeg to do edits. The downside is that there is a lot of pre-processing of raw footage in my approach. Vectors win on zero-shot flexibility here.
if you have an example you could share, i'd be very curious to see what you mean.
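For readers wondering what "grepping a context spec" could look like in practice, here is a minimal sketch. It is not the commenter's actual spec. It assumes pre-processing has already produced a list of timestamped segment descriptions; the `Segment` type, the helper names, and the sample data are all hypothetical. It finds the segments matching a pattern, inverts them into the spans to keep, and builds an expression for ffmpeg's `select` filter.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds
    end: float         # seconds
    description: str   # hypothetical annotation produced by pre-processing


def grep_segments(spec, pattern):
    """Return (start, end) spans whose description matches the regex, merged if they touch."""
    hits = sorted(
        (s.start, s.end)
        for s in spec
        if re.search(pattern, s.description, re.IGNORECASE)
    )
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_ranges(cuts, duration):
    """Invert the spans to cut into the spans to keep."""
    keep, cursor = [], 0.0
    for start, end in cuts:
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def ffmpeg_select_expr(keep):
    """Build an expression for ffmpeg's select/aselect filters that keeps only these spans."""
    return "+".join(f"between(t,{a:g},{b:g})" for a, b in keep)


# Made-up spec for a 60 second clip.
spec = [
    Segment(0.0, 12.0, "wide shot of the harbor"),
    Segment(12.0, 20.5, "guy in the blue dress walks past the camera"),
    Segment(20.5, 60.0, "boat leaves the dock"),
]
cuts = grep_segments(spec, r"blue dress")
keep = keep_ranges(cuts, duration=60.0)
expr = ffmpeg_select_expr(keep)
print(expr)
```

The resulting expression could be passed to ffmpeg as `-vf "select='<expr>',setpts=N/FRAME_RATE/TB"` (with `aselect` and `asetpts=N/SR/TB` for the audio stream). The cut is only as precise as the segment boundaries in the spec, which is where the pre-processing cost mentioned above comes in.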
I just signed up for a Creator plan, but it looks like the automated "Thank you for being a Mosaic Creator" email going out is not configured correctly. Instead of having my company name, it referenced a different business name and description (that seems to exist/be accurate, so not a placeholder).
Hey! Thanks for calling this out — looking into what happened here & fixing right now.
This has been fixed now.
Agree this looks very promising.
thank you! if you get a chance to try it, let me know if you have any feedback
Really interesting direction. The node-based canvas feels like a more scalable abstraction for video automation than the usual chat-only interface. I’m curious how you’re handling long-form content where temporal context matters (e.g., emotional shifts, pacing, narrative cues).
Multimodal models are good at frame-level recognition, but editing requires understanding relationships between scenes. Have you found any methods that work reliably there?
hey, thanks for the comment!
we've actually found that multimodal models are surprisingly good at maintaining temporal context as well
that being said, we also do a bunch of additional processing with more traditional CV / audio analysis to extract this information (both frame-level and temporal) as part of video understanding
for example, with the mean-motion analysis — you can see how subjects move over a period of time, which can help determine where important things are happening in the video, which ultimately can lead to better placements of edits.
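To make the motion idea concrete, here is a rough sketch. It assumes "mean-motion" can be approximated by the mean absolute difference between consecutive grayscale frames, which is an assumption for illustration and not a description of Mosaic's pipeline. The function names, the threshold factor `k`, and the synthetic clip are all made up.

```python
import numpy as np


def mean_motion(frames):
    """Mean absolute difference between consecutive grayscale frames.

    frames: array of shape (T, H, W). Returns T - 1 scores, one per transition.
    """
    frames = np.asarray(frames, dtype=np.float32)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))


def high_motion_transitions(scores, k=2.0):
    """Indices of transitions whose motion exceeds mean + k * std (an arbitrary threshold)."""
    threshold = scores.mean() + k * scores.std()
    return np.flatnonzero(scores > threshold)


# Synthetic clip: 30 black frames, with a bright square appearing at frame 20.
frames = np.zeros((30, 32, 32), dtype=np.float32)
frames[20:, 8:16, 8:16] = 255.0

scores = mean_motion(frames)
peaks = high_motion_transitions(scores)
print(peaks)  # transition index i means "between frame i and frame i + 1"
```

On real footage the frames would come from a decoder, and the scores would likely need smoothing before thresholding; the high-motion transitions are then candidates for where something important happens.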
I’ve had a lot of fun with Remotion and Claude Code for CLI video editing. I’ve been impressed with how much traditional video editing I can manage.
I will be checking this out!
that's super interesting — what kind of things have you done with remotion and Claude Code?
they're very powerful; when you put them together, it almost feels like Cursor for Video Editing
I absolutely love your approach of "expert tools". If I understand your approach, you aren't just feeding a video into a multimodal LLM and asking it "what is the bounding box of the optimal caption region?" -- you have built tools with discrete algorithms (using traditional CV techniques) that use things like object detection boxes + traditional motion analysis techniques to give "expert opinions" to the LLM in the form of tool calls -- such as finding the regions of minimal saliency + minimal movement to be the best places for caption placement.
If the LLM needs to place captions, it calls one of these expert discrete-algorithm tools to determine the best place to put the captions -- you aren't just asking the LLM to do it on its own.
If I'm correct about that, then I absolutely applaud you -- it feels like THIS is a fantastic model for how agentic tools should be built, and this is absolutely the opposite of AI slop.
Kudos!
thanks for the comment, that's exactly right
we're using a mix of out-of-the-box multimodal AI capability + traditional audio / video analysis techniques as part of our video understanding pipeline, all of which become context for the agent to use during its editing process
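As an illustration of the "expert tool" pattern described in this sub-thread, here is a minimal sketch of a deterministic caption-placement helper that an agent could call instead of guessing coordinates itself. It is not Mosaic's implementation: the function name, the cost formula, the equal weighting of motion and overlap, and the sample inputs are all assumptions. The motion map and detection boxes are assumed to come from earlier analysis steps.

```python
import numpy as np


def best_caption_region(motion_map, boxes, candidates):
    """Pick the candidate region with the least motion and least overlap with detections.

    motion_map: (H, W) array of per-pixel motion energy.
    boxes: list of (x0, y0, x1, y1) object detection boxes in pixel coordinates.
    candidates: dict mapping a region name to its (x0, y0, x1, y1) rectangle.
    """
    occupied = np.zeros(motion_map.shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        occupied[y0:y1, x0:x1] = True

    peak = motion_map.max() + 1e-9  # avoid division by zero on a static clip

    def cost(region):
        x0, y0, x1, y1 = region
        motion = motion_map[y0:y1, x0:x1].mean() / peak  # normalised to [0, 1]
        overlap = occupied[y0:y1, x0:x1].mean()          # fraction covered by detections
        return motion + overlap                          # equal weighting is arbitrary

    return min(candidates, key=lambda name: cost(candidates[name]))


# Made-up 160x90 frame: motion in the bottom third, a detected face in the top third.
motion_map = np.zeros((90, 160), dtype=np.float32)
motion_map[60:90, :] = 1.0
boxes = [(60, 5, 100, 30)]
candidates = {
    "top": (0, 0, 160, 30),
    "middle": (0, 30, 160, 60),
    "bottom": (0, 60, 160, 90),
}
choice = best_caption_region(motion_map, boxes, candidates)
print(choice)
```

The point of the pattern is that the language model only decides when to call the tool and what to do with the answer, while the placement itself comes from a reproducible algorithm.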
Can we stop with the overloaded names? "Mosaic" is a well-known web browser.
naming is hard
our original name was Frame, only to realize that frame.io existed already.
we brainstormed names for a while and had several notes full of possible names
mosaic is one which stood out to us because it not only represents artwork, but also the tiles (nodes) in the canvas come together to form your mosaic — we thought that was a fitting name
Hey, this is super cool. congrats on the product and the launch!
I'm building something very similar and couldn't believe my eyes when I saw the HN post. What I'm building (chatoctopus.com) is more like a chat-first agent for video editing, and only at a prototype stage. But what you guys have achieved is insane. Wishing you lots of success.
to healthy competition!
thank you! chatoctopus looks pretty cool, I'm trying it out right now!
how did you find the chat-first interface to work out for video? what we found is that the response times can be so long that the chat UX breaks down a bit. how are you thinking about this?
I just clicked the link and encountered a non-scrollable, dark, fixed content pane with loads of flickering images and scrolling text with random font sizes without much meaning. I felt imprisoned, subjected to unexpected suffering, can't scroll away, got scared and raced for the window close button, and then breathed easy.
seems like the landing page is detracting from the main product, this is good feedback so thanks! For now, avoid the scaries and head directly to https://edit.mosaic.so to try the actual canvas interface
Since video is your thing, I feel like you need to just make a very edited demo reel and put all your energy into trying to get people to watch that video. Meaning, remove almost all text and bloat from the site and just show us all the cool stuff the product does for/to video editing. Distill it to 60-120 seconds and put that on your landing, hell put it on auto play if you want to, so long as it's clear that is the one thing I'm supposed to be paying attention to
yeah I think a demo reel of a BEFORE vs AFTER immediately somewhere in the hero even or right below it would be helpful
I've put the /edit and /docs links in the first sentence above to soften the blow as well :)
They really managed to handcraft a unique user experience, that's for sure.
we did but the landing page seems to be detracting from it — head directly to https://edit.mosaic.so to try the actual canvas interface
I had the same reaction. About what you would expect from a team steeped in the Tesla mindset.
Please don't cross into personal attack. We're trying for the opposite on this site.
https://news.ycombinator.com/newsguidelines.html
thanks for the feedback — you can head directly to https://edit.mosaic.so to try the actual canvas interface
Very cool. It definitely feels to me that the power of pro tools should be available to more people with AI.
Would have been nice if there was a killer demo on your landing page of a video made with Mosaic.
that's our perspective as well.
a lot of tooling is being built around generative AI in particular, but there's still a big gap for people that want to share their own stories / experiences / footage but aren't well-versed with pro tools.
valid feedback on the landing page — something we'll add in.
The problem is, any video demo of a tool like this is just an entirely unrelated video.
can you clarify what you mean here? check out this demo video: https://screen.studio/share/SP7DItVD
this is going to save me so much time, hell yeah guys!
thank you! let us know if you have any feedback!
Damn this is good.
Thank you! :)
Good luck. I've dabbled with this myself and ultimately decided that DaVinci Resolve would end up doing this natively. But then again they haven't yet so who knows!
Good luck with it, sincerely.
thanks! curious what you started dabbling with and if you have any thoughts to share :)
Hey, good luck with Mosaic.
Some initial feedback on the landing page: it looks great, but for me there is too much motion going on on the homepage and the use cases page. May be an unpopular opinion!
Agreed, homepage was confusing for me also. I tried to scroll around and see a demo. For a product like this that is so visual, I expected to be able to find a 30s demo clip somewhere but couldn't see one on the homepage or product page (and the scrolling on the product page was annoying for me).
the sad part is we spent so long on the product page scrolling animation haha
very valid point though — I think a demo clip of a BEFORE vs AFTER immediately somewhere in the hero even or right below it would be helpful
thanks for the feedback
valid points, thanks for the feedback. i had gone for a certain aesthetic but you're right in that it may be a bit too overwhelming.
best of luck guys!!
thank you! let us know if you have any feedback!
this is so cool, can we see some demos of edits you'd make with it?
thanks! check out the demo video here of the latest version of the interface: https://screen.studio/share/SP7DItVD
i play back parts of the cinematic edit I made of the conversation between Dwarkesh Patel and Satya Nadella (e.g. added cinematic captions, motion graphics)
i can post the full edit as well if you're interested
Mosaic team dev here. Hanging in the comments all day and pushing updates as fast as we can - really appreciate the feedback!
Is there a way to keep up to date on updates and new announcements? TIA.
yes! please join our discord https://discord.gg/26SAZzBTaP or follow us on X https://x.com/mosaic_so to keep up to date on updates
> We got frustrated trying to accomplish simple tasks in video editors like DaVinci Resolve and Adobe Premiere Pro. Features are hidden behind menus, buttons, and icons, and we often found ourselves Googling or asking ChatGPT how to do certain edits.
Hidden behind a UI? Most of the major tools like blade, trim, etc. are right there on the toolbars.
> We recorded hours of cars driving by, but got stuck on how to scrub through all this raw footage to edit it down to just the Cybertrucks.
Scrubbing is the easiest part. Mouse over the clip, it starts scrubbing!
I’m being a bit tongue in cheek and I totally agree there is a learning curve to NLEs, but those complaints were also a bit striking to me.
hey! You're right that most of the basic tools like splitting / trimming are available right in the timeline. but things like adding a keyframe to animate a counter, for instance, I had no idea where to go or how to start.
Scrubbing is easy enough when you have short footage, but imagine scrubbing through the footage we had of 5 hours of cars driving by, or maybe a bunch of assets. This quickly becomes very tedious.
I don’t need to imagine, I do it haha, but again I was being tongue in cheek. I personally would love an effective tool that can mark and favorite clips for me based on written prompts. Would save me an awful lot of time!
curious — what kind of content do you edit?
Now? Mostly long form educational content. But historically? Everything more or less! Freelancer for about 15 years until my current in-house producer role.
obligatory https://news.ycombinator.com/item?id=9224
Can you make this a desktop app?
I'm really tired of editing videos in the cloud. I'm also tired of all these AI image and video tools that make you work over a browser. Your workflow seems so second class buried amongst all the other browser tabs.
I understand that this is how to deploy quickly to customers, but it feels so gross working on "heavy" media in a browser.
we've done a ton of work to optimize the uploads / downloads / transcoding of videos to handle beefy files using proxies, and also allow you to XML export back to traditional editing tools that can link back to your "heavy" media, but I hear you and I think anything running locally on device is just going to feel faster
it does present its own set of challenges, but something we've thought about
There's plenty of great native desktop apps for video editing. And there have been for almost 30 years. I also don't understand why anyone would want to use a browser for this.
there is some friction even in downloading a new app
if our goal is to bring more people into the fold, minimizing the steps for them to start editing is something we want to optimize for
that being said, being on the browser presents its own set of challenges, many of which are rightfully mentioned in this thread
YOOOO, this is super awesome. Love this for you all. Let's make life easier for more creators.
thanks!
This is so cool. Good luck with your venture.
Thank you :)
Not related to NCSA Mosaic (RIP).
if you take a snippet of Ben Horowitz's interview out of context, he has a lot of good things to say about our product :)