SourceCred

A Gentle Introduction to Cred



SourceCred, as the name might suggest, is all about attributing cred. Cred is a metric that describes every contribution and contributor in a project, giving a sense of how important they were.

For example, forum posts (like this one) can earn cred. If this post earns 5 cred, and another post earns 10 cred, then that other post is considered twice as important as this one. (Note: SourceCred doesn’t load posts from this forum yet.)

The contributions are arranged in a graph, where contributions are nodes, and have edges indicating how they relate to other contributions. For example, if someone writes a reply to this post, there will be an edge connecting the reply and this post. Edges have “types” which give some information about what kind of edge they are; that edge might have an “IS_REPLY” type.

Contributors—like you or me—are also nodes in the graph, and are connected to the contributions that they create. So there is an “AUTHORS” edge connecting me (the author) to this post.
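To make the graph concrete, here is a minimal sketch of how it could be represented. The `Node`, `Edge`, and `Graph` classes, the `kind` labels, and the node ids are all made up for illustration; this is not SourceCred's actual data model. Only the "AUTHORS" and "IS_REPLY" edge types come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # hypothetical node type label, e.g. "USER" or "POST"


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    type: str  # e.g. "AUTHORS" or "IS_REPLY"


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, id, kind):
        self.nodes[id] = Node(id, kind)

    def add_edge(self, src, dst, type):
        # Both endpoints must already be in the graph.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("add both endpoints before adding the edge")
        self.edges.append(Edge(src, dst, type))

    def edges_touching(self, node_id):
        # Edges are followed in both directions, so match either endpoint.
        return [e for e in self.edges if node_id in (e.src, e.dst)]


graph = Graph()
graph.add_node("me", "USER")
graph.add_node("you", "USER")
graph.add_node("this-post", "POST")
graph.add_node("your-reply", "POST")
graph.add_edge("me", "this-post", "AUTHORS")           # author -> content
graph.add_edge("you", "your-reply", "AUTHORS")
graph.add_edge("your-reply", "this-post", "IS_REPLY")  # reply -> original post

for edge in graph.edges_touching("this-post"):
    print(edge.type, edge.src, "->", edge.dst)
```

In this toy graph, this post touches two edges: the one from its author and the one from your reply.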

You can think of each edge in the graph as being a “thank you”. Thus, this post thanks me for writing it. The “thank yous” can be bidirectional: your reply likely thanks this post for being written, and the post thanks your reply for engaging with it.

SourceCred converts this graph into a numerical score via PageRank. Basically, we assign cred to the nodes so that every node receives cred from every node that thanks it, and in turn sends its cred to every node that it thanks. This means that cred accumulates at important nodes. For example, a core maintainer is ‘thanked’ by all of the posts, comments, and issues that they’ve written, so they have a lot of cred. On the other hand, a spam post on the forum may have been thanked by no one, so it will have very little cred. (PageRank is a very interesting algorithm, and was actually the basis of Google search! If you want to learn more, I recommend the original PageRank paper.)

One important thing to remember is that the amount of cred a node receives is the same as the amount that it sends to other nodes. This means that being thanked by a high-cred node is much more valuable than being thanked by a low-cred node, especially if that high-cred node didn’t thank many other nodes.
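If it helps to see the idea in code, here is a toy version of PageRank. It is a sketch of the textbook algorithm, not SourceCred's implementation: the `pagerank` function, the `alpha` "teleport" parameter, the rule that a node thanking no one spreads its cred evenly, and the example node names are all assumptions made for illustration.

```python
def pagerank(thanks, alpha=0.15, iterations=100):
    """Toy PageRank. `thanks` maps each node to the list of nodes it thanks.

    Every node must appear as a key, even if it thanks no one.
    """
    nodes = sorted(thanks)
    cred = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share (the "teleport" term).
        nxt = {n: alpha / len(nodes) for n in nodes}
        for n in nodes:
            # A node that thanks no one spreads its cred over all nodes,
            # so no cred is lost from the system.
            targets = thanks[n] or nodes
            share = (1 - alpha) * cred[n] / len(targets)
            for t in targets:
                nxt[t] += share
        cred = nxt
    return cred


# A maintainer and three posts thank each other; nothing thanks the spam post.
thanks = {
    "maintainer": ["post-a", "post-b", "post-c"],
    "post-a": ["maintainer"],
    "post-b": ["maintainer"],
    "post-c": ["maintainer"],
    "spam": [],
}
cred = pagerank(thanks)
for node, score in sorted(cred.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {score:.3f}")
```

The total cred stays constant from one iteration to the next, because each node passes on exactly what it holds. The maintainer ends up at the top of the list and the spam post at the bottom.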

We can explore all of this in the SourceCred explorer. To get started with the explorer, load a repository and then run PageRank. It will then show all of the user nodes in the graph, sorted by their cred.

In the screenshot below, we can see a list of all the users that contributed to SourceCred, sorted by their cred.

Remember that the graph actually contains every contributor and contribution to SourceCred. The explorer defaults to showing just the user nodes, but we can use the filter select to find every node instead.

If we expand a single node, we can see how that node received its cred via its connections to other nodes. At the top level, it aggregates groups of connections based on the type of edge, and the type of node the edge connects to. The percentages show what fraction of the node’s cred came from that connection, and the numbers show how much total cred came from that connection.
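The grouping the explorer does at the top level can be sketched roughly like this. The `aggregate` function, the tuple layout, and the cred numbers are invented for the example; they only illustrate grouping by edge type and neighbor node type, and are not taken from the explorer's code.

```python
from collections import defaultdict


def aggregate(connections):
    """Group (edge_type, neighbor_type, cred) tuples and compute each group's share."""
    totals = defaultdict(float)
    for edge_type, neighbor_type, cred in connections:
        totals[(edge_type, neighbor_type)] += cred
    grand_total = sum(totals.values())
    rows = []
    for key, cred in totals.items():
        percent = 100.0 * cred / grand_total if grand_total else 0.0
        rows.append((key, cred, percent))
    # Largest contributors first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up cred contributions flowing into a single user node.
connections = [
    ("AUTHORS", "POST", 4.0),
    ("AUTHORS", "POST", 2.0),
    ("AUTHORS", "COMMENT", 1.0),
    ("IS_REPLY", "COMMENT", 1.0),
]
rows = aggregate(connections)
for (edge_type, neighbor_type), cred, percent in rows:
    print(f"{edge_type} / {neighbor_type}: {cred:.1f} cred ({percent:.0f}%)")
```

Here the two "AUTHORS" connections to posts are merged into one group, which accounts for 6 of the 8 total cred.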

Then, diving down within a particular group of connections, we can see all of the individual edges along with how much cred they contributed.

If we want to learn more about a particular edge, we can expand it to see the node that edge connects to. This gives us the ability to dive into the graph from a fresh starting point. As you go “deeper” in your exploration of the graph, the color becomes deeper as well.

Nodes and edges have weights, which make them more or less important. The effect of edge weights is straightforward: when cred is flowing out of a node to its neighbors, the amount that flows to each neighbor is proportional to its edge weight. Node weights take effect by modifying the edge weights: edges pointing to a high-weight node get more weight. (We might change how this works in the future.)
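Here is a small sketch of how that could work. It assumes that a node weight simply multiplies the weight of every edge pointing at that node; the post only says such edges "get more weight", so the multiplication, the `outflow` function, and the example names and numbers are all illustrative guesses.

```python
def outflow(cred, out_edges, node_weights):
    """Split `cred` among neighbors in proportion to effective edge weight.

    out_edges: list of (neighbor, edge_weight) pairs.
    node_weights: maps neighbor -> node weight (1.0 if missing).
    """
    effective = {}
    for neighbor, edge_weight in out_edges:
        # Assumption: a node's weight scales every edge pointing at it.
        boosted = edge_weight * node_weights.get(neighbor, 1.0)
        effective[neighbor] = effective.get(neighbor, 0.0) + boosted
    total = sum(effective.values())
    if total == 0:
        return {neighbor: 0.0 for neighbor in effective}
    return {neighbor: cred * w / total for neighbor, w in effective.items()}


# Two edges of equal weight, but the pull request node counts double.
flows = outflow(
    12.0,
    [("pull-request", 1.0), ("comment", 1.0)],
    {"pull-request": 2.0},
)
print(flows)
```

With these numbers, the pull request receives twice as much of the 12 cred as the comment does, even though both edges have the same weight.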

Right now, weights can only be configured at the type level. You can open the weight configuration in the cred explorer by clicking the button labeled “Show weight configuration”. Using the weight config, you can make certain node or edge types more important than others. For example, maybe you think authors edges should be more important than references edges, or pull request nodes should be more important than comment nodes. You can express that using the weight config.

The weight config also lets you set the “directionality” of edges. Remember, every edge technically points in both directions. The directionality lets you make it point more in one direction than the other. If the directionality is 0.5, then cred flows forward and backward in equal amounts. If the directionality is higher, say 0.9, then 90% of the cred will flow forward and only 10% will flow back. The edges are always named as verb phrases, so that the edge points from subject to object, e.g. “authors” edges point from author to content, and “has parent” edges point from child to parent.
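One way to picture directionality is as a split of a single edge weight into a forward part and a backward part. The `directed_weights` helper below is hypothetical, and treating directionality as a straight multiplier on the edge weight is my reading of the description rather than a statement of how SourceCred computes it.

```python
def directed_weights(weight, directionality):
    """Split an edge's weight into (forward, backward) parts.

    directionality is the fraction that flows forward, between 0 and 1.
    """
    if not 0.0 <= directionality <= 1.0:
        raise ValueError("directionality must be between 0 and 1")
    forward = weight * directionality
    backward = weight * (1.0 - directionality)
    return forward, backward


# An "authors" edge points from author to content.
forward, backward = directed_weights(1.0, 0.9)
print(f"author -> content: {forward:.2f}, content -> author: {backward:.2f}")

# With 0.5, both directions are weighted equally.
even_forward, even_backward = directed_weights(1.0, 0.5)
```

The two parts always add up to the original edge weight, so changing the directionality shifts the balance between the two directions without changing how heavy the edge is overall.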

In the future, we plan to add a more powerful weight configuration system called heuristics. Heuristics will provide a way of evaluating different nodes within the same type: for example, you could add a heuristic that pull requests that touch many lines of code are more important, or that forum posts that are just a link are not very important. Heuristics will be pluggable, so that projects can define their own heuristics.

One nice thing about this system is that it’s very flexible and general-purpose. The SourceCred core system provides algorithms for attributing cred, and tools for exploring and moderating cred distributions. All of the actual nodes and edges come from plugins. SourceCred is focused on creating cred for open-source, so we’ll be putting a lot of attention into the GitHub plugin, the Git plugin, and other source-code-related plugins. However, SourceCred could be used for many different applications. For example, in a music-oriented community, songs and samples could be nodes; songs could thank the samples they use, remixes could thank the originals, etc. I think that academic papers and citations are another natural domain for cred.

Let me know with a reply if anything here is unclear, or if you have improvements to suggest. You’ll earn some cred in the process :wink:

