DeepSeek: Inference-Time Scaling for Generalist Reward Modeling

(arxiv.org)

99 points | by tim_sw 17 hours ago

17 comments

ALLTaken 4 hours ago
I'm not just impressed that every paper coming out is SOTA; DeepSeek also leads the way in being open source in the pure definition of OSS, even with permissive licensing.
Let's not confuse the company with the country by over-fitting a narrative. Popular media is reinforcing hatred or anything that sponsors them, especially toward weaker groups. Fewer repercussions and more clicks/money to be made, I guess.
While Politicians may hate each other, Scientists love to work with other aspiring Scientists who have similar ambitions and the only competition is in achieving measurable success and the reward it means to the greater public.
Without any bias, it's genuinely admirable when companies release their sources to enable faster scientific progress cycles. It's ironic that this company is dedicated to finance, yet shares its progress, while non-profits and companies dedicated purely to AI are locking away all knowledge about their findings.
Are there other companies like DeepSeek that you know of that commonly release great papers? I am following Mistral already, but I'd love to enrich my sources of publications that I consume. Highly appreciated!
- wood_spirit 4 hours ago
  When OpenAI surged ahead, Meta ended up giving away its incredibly expensive-to-make Llama model to reduce OpenAI's valuation.
  Is DeepSeek's openness in part meant to reduce the valuations of the big American tech companies?
  - ALLTaken 3 hours ago
    Correlation isn't causation. I hate to say this, but here it's really applicable. Facebook, aka Meta, has always been very open source. Let's not talk about the license though. :)
    Why do you imply malice in OSS companies? Or in for-profit companies open-sourcing their models and source code?
    - mwigdahl 3 hours ago
      Personally I don't impute any malice whatsoever -- these are soulless corporate entities -- but a for-profit company with fiduciary duty to shareholders releasing expensive, in-house-developed intellectual property for free certainly deserves some scrutiny.
      I tend to believe this is a "commoditize your complement" strategy on Meta's part, myself. No idea what Deepseek's motivation is, but it wouldn't surprise me if it was a similar strategy.
    - throwaway314155 an hour ago
      Meta is decidedly not an "OSS company" no matter how much they put out.
      - SXX an hour ago
        In this case there are very few truly "OSS companies" except for Red Hat and a few other Linux distribution maintainers. Even companies centered around open source, like GitLab, usually generate most of their revenue from proprietary products or use licenses like the BSL.
  - phoronixrly 3 hours ago
    If only totalitarian nation states used their subjects' money to undermine the dominance of US-based software vendors by releasing open-source alternatives created with slave labour... Oh wait, it can't work because software patents are here to the rescue again ... Wait, open source is communism? Always has been. /s
- Febra33 3 hours ago
  > Let's not confuse the company with the country
  What's wrong with China? They're wonderful in the OSS ecosystem.
  - echelon 3 hours ago
    It varies on a company to company basis. BOOX, for instance, are notorious GPL violators.
    There's also significant alpha in releasing open weights models. You get to slow down the market leaders to make sure they don't have runaway success. It reduces moats, slows funding, creates a wealth of competition, reduces margin. It's a really smart move if you want to make sure there's a future where you can compete with Google, OpenAI, etc. There's even a chance it makes those companies bleed a little. The value chain moves to differently shaped companies (tools, infra) leaving space for consumer and product to not necessarily be won by the "labs" companies.
- refulgentis 2 hours ago
  I love open source and the general vibe of good vibes you're bringing, but...this isn't SOTA, or close, even on the paper's own terms. (i.e. excluding models released in the last 6 months, including their own, which is a strange, yet understandable, choice given the results they report)
  Quickest way to show this:
  - Table 2, top of page 7
  - Gemma 2 27B, 0 interventions, has 94.1/56.6/60.2
  - Gemma 2 27B, with all their interventions, has 86/64/69.
  - Gemma 2 27B, with all their interventions, sampled 32 times, is at 90.4/67.2/70.3.
  - Gemma 2 27B came out in...June 2024. :/
  Quick heuristics employed here:
  - What models did they compare against? (This isn't strictly an issue; the big screaming tell is "What models did they compare against compared to their last N papers?")
  - How quickly does the paper have to move towards N samples, and how big does N get before they're happy enough to conclude? (32). How much does that improve performance on their chosen metric? (1.8%)
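  The arithmetic behind those bullets can be checked with a minimal sketch. The scores are copied from the comment as quoted, the three columns are left unnamed because the comment does not name them, and the helper `deltas` is a hypothetical name used only for illustration. It does not reproduce the 1.8% figure, since the comment does not say which metric that refers to.

```python
# Scores as quoted in the comment (Table 2 of the paper, per the commenter).
# Column meanings are not given in the comment, so they stay unnamed here.
baseline = (94.1, 56.6, 60.2)            # Gemma 2 27B, 0 interventions
with_interventions = (86.0, 64.0, 69.0)  # with all interventions
sampled_32 = (90.4, 67.2, 70.3)          # with all interventions, 32 samples


def deltas(after, before):
    """Per-column change in points, rounded to one decimal place."""
    return tuple(round(a - b, 1) for a, b in zip(after, before))


# Change from the interventions alone, then from sampling 32 times on top.
intervention_gain = deltas(with_interventions, baseline)
sampling_gain = deltas(sampled_32, with_interventions)

print("interventions vs baseline:", intervention_gain)
print("32 samples vs 1 sample:   ", sampling_gain)
```

  On these quoted numbers the first column drops with the interventions and only partly recovers with 32 samples, while the other two columns rise.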
resters 4 hours ago
DeepSeek R1 is by far the best at writing prose of any model, including Grok-3, GPT-4o, o1-pro, o3, Claude, etc.
Paste in a snippet from a book and ask the model to continue the story in the style of the snippet. It's surprising how bad most of the models are.
Grok-3 comes in a close second, likely because it is actually DeepSeek R1 with a few mods behind the scenes.
- vessenes 3 hours ago
  Why do you think that Grok-3 is DeepSeek, out of curiosity?
  - azinman2 3 hours ago
    Yes that’s a pretty giant accusation, especially given they’re buying boatloads of GPUs and have previous versions as well (it’s not like they’re starting with 3).
bilsbie 20 minutes ago
Any idea why I lost interest in DeepSeek? I used it and Grok-3 a whole bunch when they first came out, but now I’ve fallen back to Claude for everything.
mentalgear 2 hours ago
Happy to see DeepSeek using the correct (and much more idiomatic) term "inference-time scaling", instead of the grotesque construction "test-time compute" that OpenAI came up with.
ftbsqcfjm 4 hours ago
Interesting work on open-ending language models to foster imagination and narrative generation. The idea of role-playing as different characters is novel. I wonder how well it would generalize to non-fantasy domains and if the lack of grounding could lead to hallucinations. Excited to see where this research goes!
- NitpickLawyer an hour ago
  > The idea of role-playing as different characters is novel.
  It is not. I remember Karpathy being really excited about the "1 million gpt personas" dataset and highlighting it as a way to avoid reward hacking in RLAIF. That was 3-6 months ago, I believe.
  Of course paper / code / weights beats idea, and it's exciting to see how far this can go.