
Decode: Neuronpedia, SAELens, and more

Open Source Interpretability Platform and Tools

[Diagram: researchers, startups, and labs fork Neuronpedia and its data, and in return contribute back to the global Neuronpedia for others to access. Bullet points: MIT-licensed code + data; standardized types/APIs; collaborate on shared tools.]

Interpretability (understanding the internals of AI models) is an unsolved problem - especially given how quickly AI capabilities are advancing, and the new architectures and developments released every month.

Is interpretability needed? While it's possible that advanced AI is somehow "naturally aligned" to be pro-human and pro-Earth, there's no benefit to assuming this is true. It seems unlikely that all advanced AI would be fully aligned across every possible scenario and edge case.

Neuronpedia's role is to accelerate understanding of AI models, so that when they become powerful enough, we have a better chance of aligning them. If we can increase the probability of a good outcome by even 0.01%, the expected value is many, many current and future lives saved - certainly a worthwhile and meaningful endeavor.

Check out our announcement post for the details.

Code, issues, and documentation are available at our GitHub.

4 TB+ of data is available in our Public Datasets.

San Francisco, CA
neuronpedia.org

Donors

  • David Chanin

    Neuronpedia is amazing! Thank you for building this
