Prof. Martin Steinegger

Proposing New Evolutionary Connections from a Database of 214 Million Predicted Protein Structures

Proteins are vital for cellular processes, and knowing their structures helps us study their function and evolution. The AlphaFold database offers 214 million predicted protein structures. We developed Foldseek cluster, a method for clustering millions of structures, and applied it to this database. It revealed 2.27 million structural clusters, 31% of which lack annotations and may represent novel structures. Evolutionary analysis suggests that most clusters are ancient in origin, while 4% appear to be species-specific. The resource also helps predict domain families and their relationships, uncovering examples of remote homology. Notably, human immune-related proteins share structural similarities with proteins from prokaryotic species, which shows the value of this resource for studying protein function and evolution across the tree of life.

