Author here – a few quick notes that didn’t fit in the main post:
What this is: a semantic routing system that detects bias and directs queries to different LLMs depending on context.
Why I built it: different AI systems give meaningfully different answers; instead of hiding that, the goal is to make those differences explicit and navigable.
Technical details:
Uses BGE-base-en-v1.5 embeddings (768-dim, 512-token input limit) via transformers.js; a rough embedding sketch is included after these notes.
Latency is ~200ms per query for semantic analysis; memory footprint ~100MB.
Four detection layers: keyword matching, dog-whistle detection, semantic similarity, and benchmark-informed routing.
Goal optimization: routing decisions balance safety vs. performance. Safety/avoidance rules always take priority; if no safety issues are detected, the system tries to route to the engine with the best benchmark score for the task (sketched just below).
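Boiled down, that priority looks roughly like this (simplified TypeScript, not the exact code; the names and types are illustrative):

    // Sketch of the routing priority: safety/avoidance rules first, then best benchmark score.
    type Engine = { name: string; benchmarkScore: number };

    function route(engines: Engine[], safetyFlagged: boolean, avoid: Set<string>): Engine {
      // If any detection layer (keyword, dog whistle, semantic) flagged the query,
      // drop whatever the avoidance rules exclude.
      const candidates = safetyFlagged ? engines.filter(e => !avoid.has(e.name)) : engines;
      const pool = candidates.length ? candidates : engines; // fall back if everything was excluded
      // Otherwise pick the engine with the best benchmark score for the task.
      return pool.reduce((best, e) => (e.benchmarkScore > best.benchmarkScore ? e : best));
    }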
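And for the embedding step, the transformers.js call is roughly the following (simplified; mean pooling and the Xenova model id here stand in for the exact config):

    // Sketch: embed a query with transformers.js (runs in the browser or Node).
    import { pipeline } from '@xenova/transformers';

    const embed = await pipeline('feature-extraction', 'Xenova/bge-base-en-v1.5');
    const out = await embed('example query', { pooling: 'mean', normalize: true });
    const vector = Array.from(out.data); // 768-dim, unit-normalized

Since the vectors come back normalized, the semantic-similarity layer can treat a plain dot product as cosine similarity.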
Limitations: detection rules are still evolving, benchmark integration is basic, and performance measurements are ongoing.
Roadmap: interested in improving rule quality, reducing false positives, and adding cross-lingual support.
Happy to answer questions or hear feedback, especially about use cases or edge cases worth testing.