Decompiling the GPL violated Linux kernel using Evolutionary Algorithms

(far.chickenkiller.com)

10 points | by farooqkz 21 hours ago

14 comments

synctext 20 hours ago
This is a difficult project. The blog post seems to hint at reasonable feasibility, this stuff is hard! We build a less ambitious tool in the university lab: "ASTANA: Practical String Deobfuscation for Android Applications Using Program Slicing" [0].
Would advise to first read the reverse engineering related work. Genetic programming is just a technique best used when everything else has failed :-)
[0] https://arxiv.org/pdf/2104.02612
[-]
- farooqkz 13 hours ago
  I've read the abstract of your article. I am not much in the field of decompilers. Let alone deobfuscation. It's even hard for me to type it :))
  I think it is probably a safe assumption that the kernel binary found on Android devices is not obfuscated. Tho I probably need more research to confirm this.
- farooqkz 17 hours ago
  Thanks for the hints. Of course, it's very very difficult. But one thing I think you missed, is that I'm proposing a "byte equivalent decompilation". And after that, we should go into reading the code readable and understandable.
  If we could create a program doing all this, automatically or semi-automatically, it will be great-great because then not releasing the kernel code doesn't matter. I believe if enough effort and time is put into it, there is a good chance we could see such a thing in like 5-7 years.
  After that, we might be able to target the binary blobs, the propriety firmwares. Those might have some legal issue, of course. But as long as it is used only to write a FOSS alternative, that probably won't be an issue, I think.
temp0826 20 hours ago
Not exactly the same goal but in the ballpark, reminded me of the linux-libre patches (removes binary blobs and non-gpl code from the kernel)-
https://www.fsfla.org/ikiwiki/selibre/linux-libre/
ACCount37 20 hours ago
I thought this would be an actual process, not a blog post with idle musings.
[-]
- farooqkz 18 hours ago
  I wanted to append [idea] but it became too long. I think there should be balance "musings" and actual work. I learned very valuable lessons in my wakegp research. Even though I haven't completed it. If the one doing this research would be me, I would go for no actual work as long as I can. And when I start to do the work, I should do as little as I can and move very slowly. Like the saying "When you move slowly, you can see the path ahead clear". After all, we should know if the next step we go forward, the ground under us is solid.
  [-]
  - ACCount37 17 hours ago
    That sounds like a load of pseudo-profound bullshit to me.
    And this little idea you've described? It's exactly the kind of thing where both the fundamental issues and the specific implementation details would conspire to fuck you over hard at every single turn.
    If you actually tried to implement that idea of yours, you'd fail a lot, and maybe you'd learn something, and then it would be worth writing a post about that. About what you tried, what didn't work, what worked a little bit but not really.
    As is? You picked a domain where ideas are easy but implementation is hilariously hard, and then you had an idea but implemented nothing. You never tried anything and you never learned anything. The value is nil.
    [-]
    - farooqkz 13 hours ago
      I don't welcome rudeness towards people. Not me nor anyone else.
mmarx 20 hours ago
> Theoretically, we could also go for finding the semantically equivalent C code. However, last time I researched, checking semantic equivalency is a very complex problem. I think it was NP hard.
Already deciding whether two finite automata decide the same (regular) language is PSPACE-complete; it's undecidable for anything that can decide arbitrary context-free languages (which C programs can clearly do).
[-]
- farooqkz 18 hours ago
  Well I knew that checking if two binary circuits are equivalent is NP hard. Checking semantic equivalency of C code, of course, should be harder.
ginko 20 hours ago
I'm having a really hard time parsing this title.
[-]
- rnhmjoj 20 hours ago
  A dash between GPL and violated would probably help.
  [-]
  - farooqkz 18 hours ago
    Thanks for the feedback. I don't think I can edit the HN entry, but I'm updating me blog post.
drivingmenuts 12 hours ago
While you might be able to produce code that would re-compile into a binary resembling the original binary, how would you be able to generate the same code that was compiled to produce the original binary? Seems like there would be a significant difference between the actual original code and the decompiled version - enough that it could not be used to prove license violation.