OpenData Community Data Builder Hackathon Results
The OpenData Community hackathon in January of 2023 was judged in early February with bounties distributed by mid-February.
In this forum post, we highlight a few of the themes emerging from the hackathon made possible by the generous support of Gitcoin and our other sponsors and by a growing community of volunteers.
For a slideshow overview of the OpenData Community and the hackathon results - and our immediate plans as well - please take a look here:
ODC summary with summary of hackathon and immediate plans
You can also take a look at all of the results, available in this summary spreadsheet:
Judging Spreadsheet
The OpenData Community addresses two potentially existential threats to the future of web3:
- Centralization at the data layer
- Fraud and specifically Sybil attacks
It is crucial to the credibility and growth of web3 that we do two things. First, we should try to prevent Sybils by raising their cost of forgery through the Gitcoin Passport protocol and other approaches. Second, organizations such as the ODC should help to provide transparency during and after the rounds, so that communities are confident that their grants and funding rounds are effective. The world needs transparent funding of public goods - built upon a capture-resistant blockchain and a closely related data layer.
Theme: Sybil and Grant reviews work together
One observation that many hackathon participants and members of the FDD within Gitcoin have made repeatedly is that by looking at Grants we can learn about Sybils, and vice versa.
There are a variety of submissions that looked at these relationships, including one of the projects that the judges rated highly, titled “Grant Exploration”, available here:
When you dig into the project, you find that a high ratio of suspected Sybils among the contributors to a given Grant should itself trigger further analysis of that Grant. To that end, the project performs a detailed analysis of wallet behavior to determine which wallets are likely Sybils.
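To make the idea concrete, here is a minimal sketch of a per-Grant Sybil ratio check. It is not the contestant's code: the function name `grants_needing_review`, the input shapes, and the `threshold` and `min_donors` defaults are all assumptions chosen for illustration, and it presumes that a set of suspected Sybil wallets has already been produced by some other analysis.

```python
from collections import defaultdict


def grants_needing_review(contributions, suspected_sybils, threshold=0.3, min_donors=5):
    """Return {grant_id: sybil_ratio} for grants whose ratio meets the threshold.

    contributions: iterable of (grant_id, wallet) pairs (hypothetical input shape).
    suspected_sybils: set of wallets flagged by some earlier analysis.
    threshold, min_donors: illustrative defaults, not values from the hackathon.
    """
    # Collect the distinct donor wallets for each grant.
    donors = defaultdict(set)
    for grant_id, wallet in contributions:
        donors[grant_id].add(wallet)

    flagged = {}
    for grant_id, wallets in donors.items():
        # Skip grants with too few donors for the ratio to be meaningful.
        if len(wallets) < min_donors:
            continue
        ratio = len(wallets & suspected_sybils) / len(wallets)
        if ratio >= threshold:
            flagged[grant_id] = ratio
    return flagged
```

A flagged Grant is a candidate for human review rather than a verdict; the ratio only says where to look first.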
A self-explanatory graph from the project:
Other projects also examined the validity of grants. One looked at a variety of approaches for determining whether a wallet may be a Sybil, including the previously documented “Donor DNA” developed by the Gitcoin FDD and others. It also used the Hamming distance, which helps to identify potentially related donors and examines an enormous number of the relationships among all of the wallets.
The submission for the project includes a variety of colorful heatmaps - and also uses the signal of an unusual preponderance of Sybil contributions to call into question the integrity of particular Grants. The Jupyter notebook for this submission can be found at: https://nbviewer.org/github/timk11/odc-hackathon/blob/main/EDA_for_Sybil_attack_detection-v2.ipynb (warning - the notebook can be resource intensive to load)
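For readers unfamiliar with the Hamming distance, the sketch below shows the general technique, not the notebook's actual implementation. It assumes each wallet is represented by a fixed-length vector of 0s and 1s (for example, whether it donated to each Grant); the names `hamming_distance` and `close_pairs` and the `max_distance` default are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def close_pairs(vectors, max_distance=1):
    """Return (wallet_1, wallet_2, distance) for wallet pairs with near-identical vectors.

    vectors: dict mapping wallet -> tuple of 0/1 flags (hypothetical encoding).
    """
    pairs = []
    # Every pair of wallets is compared, so the work grows quadratically.
    for (w1, v1), (w2, v2) in combinations(sorted(vectors.items()), 2):
        distance = hamming_distance(v1, v2)
        if distance <= max_distance:
            pairs.append((w1, w2, distance))
    return pairs
```

Because every wallet is compared with every other wallet, the number of comparisons grows with the square of the number of wallets, which is one reason this kind of analysis can be resource intensive.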
However, these approaches are by design reactive. A decision to squelch a Sybil or to investigate a Grantee further is made based on the behavior of these entities during the round itself. What if we want to use the tools of the hackathon - exploratory data analysis, Lego building, and dashboards - to examine the risk of a grant or a Sybil proactively, before the round takes place?
As it happens, there were a few interesting responses that could be applied to proactive Sybil and Grant analysis, including this analysis stored on the Ocean Protocol marketplace: Grant Scoring via Title Accuracy and other features
The contestant in this submission trained machine learning models on a number of features, including the accuracy of the description of the grant project, the activity levels of the underlying Github, and other characteristics of the Grant project websites. They were able to predict which grants were more likely to be suspicious, a useful finding that could help to focus human reviewers.
Another submission focused specifically on social proof of the Grants and compared that to the votes that were submitted in support of each project. This ratio could itself be used as a Lego - a piece of analysis that takes in a breadth of data and returns a boolean, 1 or 0, indicating whether on the whole the criteria were met.
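As a rough illustration of what such a Lego could look like, the sketch below compares votes to a single social-proof number and returns 1 or 0. The function name `social_proof_lego`, the use of a follower count as the social-proof measure, and the `max_votes_per_follower` default are all assumptions made for this example; the submission may have measured social proof quite differently.

```python
def social_proof_lego(votes, followers, max_votes_per_follower=2.0):
    """Return 1 if the votes look consistent with the grant's social proof, else 0.

    votes: number of votes the grant received in the round.
    followers: a stand-in for social proof (hypothetical choice of metric).
    max_votes_per_follower: illustrative threshold, not a calibrated value.
    """
    if votes < 0 or followers < 0:
        raise ValueError("votes and followers must be non-negative")
    if followers == 0:
        # With no social proof at all, only a grant with no votes passes.
        return 1 if votes == 0 else 0
    return 1 if votes / followers <= max_votes_per_follower else 0
```

Returning a plain 1 or 0 keeps the Lego easy to combine with others, for example by requiring that several Legos all return 1 before a Grant passes without review.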
Theme: Legos are pluggable
As the prior example indicates, Legos enable round operators and other analysts to more efficiently reuse algorithms. They also are inherently more transparent - we need to adopt a “transparency all the way down” mantra within web3 and public goods funding in particular in order to protect and boost the credibility of our coordination.
There are multiple useful posts on the Forums including this one introducing the subject of Legos:
https://gov.gitcoin.co/t/public-goods-legos-roadmap/12546
In addition, the documentation for Legos prepared by the Gitcoin FDD proved to be extremely useful to participants in explaining the concept and providing concrete examples as well:
https://github.com/Fraud-Detection-and-Defense/lego-docs
A quick thank you to @j-cook who has been an amazing collaborator; as a lead technical writer for the Eth Foundation and a scientist, he brings a level of professional expertise and intellectual curiosity that reflects itself not only in the above documentation but also in many FDD Forum posts upon which much of what ODC does is based.
Many of the leading submissions embraced the Lego design pattern, often combining Legos that the contestant developed with Legos that others had developed. For example, the following project built on the contestant's experience in web2 fraud analysis to focus on behaviors such as the age of an account and other variables. The contestant also looked at whether the first transaction for multiple wallets originated from the same wallet, an approach credited to the TrustaLabs team from the October 2022 hackathon. As the contestant and a considerable body of literature on fighting Sybils point out, one goal is to increase the cost of Sybil attacks. For example, a solution that flags and potentially squelches wallets with a common seed wallet may force Sybil attackers to execute multiple transactions for each wallet before it receives its initial funds, greatly increasing the cost and complexity of their orchestration in preparation for a Sybil attack. As the potential rewards of Sybil attacks increase with the success of public goods and other funding that uses Quadratic approaches, we can anticipate more attacks as well. The infinite game continues.
Using Legos to Defend from Sybil attacks
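The common seed wallet check described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the contestant's or TrustaLabs' code: the `(timestamp, sender, recipient)` input shape, the function names, and the `min_cluster_size` default are all hypothetical.

```python
from collections import defaultdict


def first_funders(transfers):
    """Map each recipient wallet to the sender of its earliest incoming transfer.

    transfers: iterable of (timestamp, sender, recipient) tuples (hypothetical shape).
    """
    earliest = {}
    for timestamp, sender, recipient in transfers:
        if recipient not in earliest or timestamp < earliest[recipient][0]:
            earliest[recipient] = (timestamp, sender)
    return {recipient: sender for recipient, (_, sender) in earliest.items()}


def common_seed_clusters(transfers, min_cluster_size=3):
    """Group wallets by first funder and keep groups of at least min_cluster_size."""
    clusters = defaultdict(set)
    for wallet, funder in first_funders(transfers).items():
        clusters[funder].add(wallet)
    return {
        funder: wallets
        for funder, wallets in clusters.items()
        if len(wallets) >= min_cluster_size
    }
```

A shared first funder is only a signal. Many unrelated wallets receive their first funds from the same exchange withdrawal address, so a real Lego would likely need an allowlist of such addresses or additional checks before flagging anyone.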
Theme: Dashboards are valuable
Lastly, starting with this hackathon the OpenData Community is supporting the creation of useful dashboards for Sybil analysis. These dashboards are intended to be used by program operators when examining the health of their rounds and in explaining their findings to their communities. We’d especially like to thank Kevin Owocki, Gitcoin founder and current CEO of Supermodular, for suggesting the inclusion of a dashboard bounty in this hackathon and for providing some guidance as to potentially useful approaches.
The dashboards inevitably bring together the underlying analysis. One of the winning solutions, for example, allows Grants and Wallets to be quickly examined with the aid of several Legos.
A live look at this dashboard is also available here:
https://www.grantlooker.xyz/projects
This work builds off of guidance from the FDD team, who previously published a dashboard for round reviews, as explained here: https://gov.gitcoin.co/t/public-goods-legos-roadmap/12546
We plan to support one or more Dashboards as a Sandbox project.
Sandbox
While the submissions highlighted here - and dozens of others - show the promise of the OpenData Community as a source of analysis, software, and qualified teams, in order to actually rely upon the algorithms, Legos, and software for production usage a program operator likely needs more. In particular, any user of these resources would want to know that the software would be maintained and improved over time and that it would become more broadly used, thereby further accelerating the pace at which it gathers and implements improvements.
We are about to officially launch our Sandbox program. In short, through the Sandbox program we will formalize and improve our support of preexisting projects that we believe to be fundamental, such as the TrueBlocks index. We will also get behind qualified hackathon projects, and we may launch our own as well if we and our partners perceive gaps that software and analysis could usefully fill.
You can learn more about our approach to Sandbox projects in this blog post:
https://opendatacommunity.org/blog/the-odc-sandbox-launch/
Conclusion:
A special thank you to the organizations that partnered with us to make the hackathon a success. Gitcoin, POKT, Ocean Protocol, Supermodular, TrustaLabs and TrueBlocks each contributed funding - and their expertise and guidance to the hackathon participants and to the OpenData Community itself.
All of these partners joined the ODC in Twitter Spaces and were active on the Discord channels. They also suggested topics for the hackathon and helped to promote it.
We are now working closely with our founding sponsor Gitcoin and other organizations to ensure that the OpenData Community grows in a way that helps them. For example, Founding Sponsors play an informal role today - and can play a more formal role via governance in the future - in helping to select our direction, including which types of Sandbox projects we should focus on.
Please join us on our journey to protect web3 from fraudulent activity - and from the perils of centralization at the data layer. If you are a stakeholder in the success of web3 - then we likely view you as our stakeholder as well. Let’s work together.
To get started you can of course drop into our Discord:
Last but not least you can take a look at our website which also links to a variety of resources including the nascent but fast improving Landscape of useful open data tools, projects, and services:
Please do get in touch, provide us feedback, and lend a hand.