1 point | by WASDAai a day ago
1 comment
TL;DR: The paper shows how LLMs can be steered by manipulating prompts, hidden states, or model weights, and warns that the same techniques can be used maliciously.
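For readers unfamiliar with the middle option, below is a minimal sketch of hidden-state ("activation") steering, one of the three intervention levels the TL;DR mentions. It assumes a GPT-2-style Hugging Face model and uses a toy difference-of-means steering vector; the layer index, strength, and contrast prompts are arbitrary illustrations, not the paper's method.

```python
# Hidden-state steering sketch: add a direction to one layer's activations at inference time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"          # assumption: any small causal LM with a GPT-2 layout
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

LAYER = 6    # which transformer block to steer (hypothetical choice)
ALPHA = 4.0  # steering strength (hypothetical)

def hidden_at_layer(text: str) -> torch.Tensor:
    """Mean hidden state of `text` at LAYER, used to build a toy steering vector."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

# Toy steering direction: positive-sentiment prompt minus negative-sentiment prompt.
steer = hidden_at_layer("I love this, it is wonderful") - \
        hidden_at_layer("I hate this, it is terrible")

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tok("The movie was", return_tensors="pt")
with torch.no_grad():
    out_ids = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out_ids[0]))
handle.remove()  # remove the hook to restore normal behavior
```

The same hook-based pattern is what makes the dual-use warning concrete: anyone with access to the forward pass can nudge the model's behavior without touching the prompt or the weights.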