This is colossal. It can create embeddings from pretty much any type of format: video, audio, documents. The context is still a bit small compared to what we are used to in text, but this seems major.
How does it compare with Qwen's open-weight multimodal embedding model? Anyone know? This seems lesser from what I read, with the drawback of being behind some API/model I don't have control over. Qwen gives great embeddings out of the gate while also being steerable, i.e. you can supply a prompt to focus the embedding on specific tasks with higher resolution, which in my tests has been mind-blowingly good. Not seeing the value add here.
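For anyone unfamiliar with the "steerable" part: the idea is that you prefix the input with a task instruction before embedding it, so the same model produces task-focused vectors. A minimal sketch of that pattern below; the exact template is an assumption here (it varies by model, so check the model card for the format it was trained with):

```python
# Sketch of instruction-steered embedding input construction.
# The "Instruct: ... / Query: ..." template is an assumption; models that
# support steering each document their own expected prompt format.
def steered_input(task: str, query: str) -> str:
    """Prefix a query with a task instruction before embedding it."""
    return f"Instruct: {task}\nQuery: {query}"

# The resulting string is what you would pass to model.encode(...)
text = steered_input(
    "Given a search query, retrieve relevant product descriptions",
    "waterproof hiking boots",
)
print(text)
```

The model then embeds the whole instruction-plus-query string, which biases the vector toward the stated task without any fine-tuning.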
What's the pricing, and how does it compare to zembed-1 for text-only embeddings?
Pricing is here: https://cloud.google.com/vertex-ai/generative-ai/pricing#emb...
Seems to be 20 cents per million text tokens and 0.012 cents per image.
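To put those rates in concrete terms, here's a quick back-of-the-envelope cost calculation. The rates are taken from the comment above (20 cents per million text tokens, 0.012 cents per image); the function and example volumes are just for illustration:

```python
# Rates from the pricing comment above (converted to dollars).
TEXT_RATE_PER_TOKEN = 0.20 / 1_000_000  # $0.20 per 1M text tokens
IMAGE_RATE = 0.012 / 100                # 0.012 cents = $0.00012 per image

def embedding_cost(text_tokens: int, images: int) -> float:
    """Estimated embedding cost in dollars for a given workload."""
    return text_tokens * TEXT_RATE_PER_TOKEN + images * IMAGE_RATE

# Example: embedding 10M text tokens and 1,000 images.
print(round(embedding_cost(10_000_000, 1_000), 2))  # -> 2.12
```

So a fairly large corpus (10M tokens plus a thousand images) comes out to about two dollars at these rates.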