SearchSECO

The Software Heritage Graph (SHG) is a database that contains 8 billion source code files collected from the worldwide software ecosystem. This archive is a treasure trove, but extracting valuable information from it remains a major challenge.

We propose SearchSECO, a hash-based index for code fragments that enables searching source code at the method level across the worldwide software ecosystem. Currently, files in the SHG can be identified by their hashes, but the methods inside them cannot. We want to create a set of parsers that extract fragments (methods) from the code files and make them findable. By making methods from the worldwide software ecosystem findable, we can perform more reliable license checks, search for vulnerabilities, and extract call graphs from those methods.
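To make the idea of a hash-based, method-level index more concrete, here is a minimal sketch. It is not SearchSECO's implementation: it assumes a single language (Python, parsed with the standard `ast` module), treats the SHA-256 of a function's syntax-tree dump as that method's fingerprint, and keeps the index in an in-memory dictionary. The names `method_fingerprints` and `MethodIndex`, and the project and file names in the usage example, are hypothetical.

```python
import ast
import hashlib
from collections import defaultdict


def method_fingerprints(source):
    """Return (method_name, sha256_hex) pairs for every function in Python source.

    Illustrative assumption: a method's identity is the SHA-256 of its AST dump.
    ast.dump omits line/column attributes by default, and comments never reach
    the AST, so whitespace or comment changes leave the hash unchanged.
    """
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):  # also visits nested functions
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            digest = hashlib.sha256(ast.dump(node).encode("utf-8")).hexdigest()
            pairs.append((node.name, digest))
    return pairs


class MethodIndex:
    """Hypothetical in-memory index: method hash -> every place that method occurs."""

    def __init__(self):
        self._locations = defaultdict(list)

    def add_file(self, project, path, source):
        # Parse one file and record the project, file, and name of each method.
        for name, digest in method_fingerprints(source):
            self._locations[digest].append((project, path, name))

    def find(self, source):
        # For each method in the query snippet, list all indexed occurrences.
        return {
            name: list(self._locations.get(digest, []))
            for name, digest in method_fingerprints(source)
        }


if __name__ == "__main__":
    index = MethodIndex()
    index.add_file("project-a", "src/util.py", "def add(x, y):\n    return x + y\n")
    # Same method with different formatting and a comment, plus an unrelated one.
    index.add_file(
        "project-b",
        "lib/math.py",
        "# copied\ndef add(x,y):\n    return (x + y)\n\ndef sub(x, y):\n    return x - y\n",
    )
    print(index.find("def add(x, y):\n    return x+y\n"))
    # {'add': [('project-a', 'src/util.py', 'add'), ('project-b', 'lib/math.py', 'add')]}
```

Hashing a normalized syntax tree rather than raw text means that copies of a method differing only in layout or comments map to the same key, so a single lookup reveals every project and file that contains the method. An index over the whole SHG would additionally need one parser per language and persistent, distributed storage.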

We unearth the relationships between code fragments, code files, and their projects on a worldwide scale. This fine-grained data enables much richer analyses, significantly advancing the field of empirical software engineering and its subfield of repository mining.