Gitcoin-Founded OpenData Community kicks off DataBuilder Hackathon

epowell101 · January 4, 2023, 3:52pm

DataBuilder Hackathon: Fight Sybils, Decentralize Data

Organized by the fast-growing OpenData Community, and sponsored by several leading DAOs and web3 organizations, the hackathon offers contestants $40,000 in bounties. The hackathon kicks off on January 5th and runs until January 31st.

Contestants will compete - and collaborate - to find insightful anti-Sybil algorithms and to package anti-Sybil algorithms into composable legos. Contestants who use decentralized technologies, including those highlighted on the OpenData Community Landscape, will be preferred. Bounties are also available for contributions to FAQs and other materials.

Interested competitors can learn more and register here:
https://gitcoin.co/hackathon/DataBuilders/onboard

Community members and judges will be discussing the hackathon on a Twitter Space on January 5th: https://twitter.com/i/spaces/1yoJMZLPdZXxQ

Protecting Public Goods Funding and Data Decentralization

DataBuilders contestants will be provided with data from the recent Unicef / Gitcoin and Fantom / Gitcoin public goods funding rounds. Gitcoin helped to found the OpenData Community in part to ensure that as Gitcoin decentralizes - it also decentralizes its expertise in defending public goods funding from Sybil attacks. More about the founding of the OpenData Community is available in this November 2022 Forum post: https://gov.gitcoin.co/t/gitcoin-starts-and-supports-the-opendata-community/11886

Many organizations are coming together to organize, promote, and fund the DataBuilders hackathon. In addition to Gitcoin and the FDD, Ocean Protocol, Pocket Network, Supermodular, TrueBlocks, and a former OpenData Community hackathon winner, TrustaLabs, are sponsors and contributors to the hackathon. TrustaLabs has contributed back the bounties it won in the inaugural OpenData Hackathon.

The DataBuilders hackathon winners will be announced in early February. Again, contestants can register and learn more about the hackathon at: https://gitcoin.co/hackathon/DataBuilders/onboard

OpenData Community:

The mission of the OpenData Community is to defend web3 from the related risks of Sybil attacks and centralization at the data layer. Current initiatives of the ODC include a Landscape of decentralized solutions including TrueBlocks and others, AMAs with project creators, how-to videos and FAQs, and hackathons such as the current DataBuilders hackathon. The ODC was founded in 2022 by Gitcoin and benefits tremendously from the expertise of Gitcoin data scientists and the support of the Gitcoin DAO and Gitcoin community. http://opendatacommunity.org/

epowell101 · January 12, 2023, 3:59pm

One week into the DataBuilders hackathon, the status is positive. We have a lot of interest, as this chart indicates. The data sets have been downloaded dozens of times, and 133 individuals have shown interest on the Gitcoin / Supermodular hackathon site we are using.

This chart also indicates that so far we have not seen much of that hackathon interest translate into help in building the OpenData Community itself - even though we have a bounty to reward that behavior. There could be a lag - and we are seeing an increase in the rate of requests for write access to our Github, which is promising. It could also be that we are not yet attracting many community builders and that most potential hackathon participants are attracted by the data analysis and lego and dashboard building.

Please feel free to drop into the ODC Discord at any time to look around yourself! I am more convinced than ever that both the focus at the ODC on Sybil identification and related data science and software development AND the emphasis on decentralization at the data layer are strategically important to our shared intents and overall mission at Gitcoin. We need to protect and grow our commons - potential capture at the data layer, and of course the hijacking of pluralism by Sybils, simply cannot be allowed.

epowell101 · February 7, 2023, 12:56am

OpenData Community ‘DataBuilders’ Hackathon: Preliminary Results

The OpenData Community, founded by Gitcoin, has just completed a hackathon. As mentioned above, contestants competed across four categories for $40,000 in bounties:

Exploratory data analysis
Packaging of algorithms into reusable Legos
Design of a dashboard for public goods operators
Hack on the Hackathon

In addition to founding sponsor Gitcoin, many other web3 organizations supported the hackathon, including Pocket Network, Ocean Protocol, Supermodular.xyz, TrustaLabs, and TrueBlocks.

The hackathon ran through January 31st. Final results will be available after the judging is completed in the next week.

This Data Builders hackathon was the first one primarily organized by the OpenData Community itself. Community participation levels grew quickly over the month of January, including:

The addition of 398 new community members, a growth of 680% from our first months of November and December
Similar growth in the number of active members, Twitter followers, pull requests, and Discord engagement

There were 52 submissions by teams to the hackathon, an increase of 62% from the inaugural OpenData Community Hackathon, which had 32 submissions.

The growth in these top-level metrics is starting to translate into the growth of materials that are useful for the entire open data and web3 community. These resources are authored and reviewed by community members, reflecting crowd-sourced expertise free from commercial influence. While only weeks old, these projects are already delivering value and are attracting a growing list of contributors. These projects include:

A Landscape of useful data analysis tools and related projects, covering node, indexing, analysis, and other components of an open data stack:

https://opendatacommunity.org/docs/landscape/

A set of FAQs that addresses common challenges faced by data analysts

https://opendatacommunity.org/docs/faq/

The community is currently working on two additional projects that are planned to be made available for broader collaboration by April 1st:

A sandbox for interesting projects - including Legos and Dashboards developed via the recent Data Builders hackathon: This sandbox will house useful projects and seek to catalyze the use of these projects and contributions to the projects as well.
An OpenData stack: This operating stack will be used by ODC members in the analysis and support of sandbox and other interesting projects.

Finally, the OpenData Community is working to achieve sustainability by mid-April. In addition to further developing a governance structure, the community is:

Planning a hackathon for April 2023
Establishing relationships with founding sponsors and partners
Considering services to be delivered via bounties sourced from community members, starting with trials in support of our founding sponsors

The mission of the OpenData Community remains the growth and protection of web3. We seek to grow an ecosystem that can better identify and prevent Sybils and perform other useful data analysis, through data science and software development and by nurturing promising open data projects. We seek to resist the centralization of the data layer and are inclusive in our approach and values.

If you are a web3 project concerned about the rising cost of data analysis or the increased threat of Sybils and other bad actors, please join our Discord and get in touch. We would be happy to partner and early supporters can assist in designing upcoming hackathons and other projects. We would very much appreciate your input and feedback.

epowell101 · February 26, 2023, 10:29pm

OpenData Community Data Builder Hackathon Results

The OpenData Community hackathon in January of 2023 was judged in early February with bounties distributed by mid-February.

In this forum post, we highlight a few of the themes emerging from the hackathon made possible by the generous support of Gitcoin and our other sponsors and by a growing community of volunteers.

For a slideshow overview of the OpenData Community and the hackathon results - and our immediate plans as well - please take a look here:
ODC summary with summary of hackathon and immediate plans

You can also take a look at all of the results, available in this summary spreadsheet:

Judging Spreadsheet

The OpenData Community addresses two potentially existential threats to the future of web3:

Centralization at the data layer
Fraud and specifically Sybil attacks

It is crucial to the credibility and growth of web3 that we both deter Sybils by raising their cost of forgery, through the Gitcoin Passport protocol and other approaches, AND that organizations such as the ODC provide transparency during and after the rounds so that communities are confident that their grants and funding rounds are effective. The world needs transparent funding of public goods - built upon a capture-resistant blockchain and closely related data layer.

Theme: Sybil and Grant reviews work together

One observation that many hackathon members and members of the FDD within Gitcoin have made repeatedly is that by looking at Grants we can learn about Sybils and vice versa.

There are a variety of submissions that looked at these relationships, including one of the projects that the judges rated highly, titled “Grant Exploration”, available here:

When you dig into the project, you find that a high ratio of suspected Sybils among the contributors to a given Grant should itself trigger further analysis of that Grant. To that end, the project does a detailed analysis of wallet behavior to determine which wallets are likely Sybils.
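The idea that a Grant's share of suspected Sybil donors can prompt a closer review could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the sample data, and the 0.5 threshold are all assumptions made for the example.

```python
from collections import defaultdict

def sybil_ratio_by_grant(contributions, suspected_sybils):
    """Return {grant_id: share of unique donors that are suspected Sybils}.

    contributions: iterable of (grant_id, wallet) pairs (hypothetical layout).
    suspected_sybils: set of wallets flagged by some earlier analysis.
    """
    donors = defaultdict(set)
    for grant_id, wallet in contributions:
        donors[grant_id].add(wallet)
    return {
        grant_id: len(wallets & suspected_sybils) / len(wallets)
        for grant_id, wallets in donors.items()
    }

def grants_to_review(ratios, threshold=0.5):
    """List grants whose suspected-Sybil share meets the (assumed) threshold."""
    return sorted(g for g, r in ratios.items() if r >= threshold)

# Made-up sample data, for illustration only.
contributions = [
    ("grant_a", "0x01"), ("grant_a", "0x02"), ("grant_a", "0x03"), ("grant_a", "0x04"),
    ("grant_b", "0x02"), ("grant_b", "0x05"), ("grant_b", "0x06"), ("grant_b", "0x07"),
]
suspected = {"0x01", "0x02", "0x03"}

ratios = sybil_ratio_by_grant(contributions, suspected)
flagged = grants_to_review(ratios)
```

Here `grant_a` has three suspected wallets among four donors and is flagged, while `grant_b` has one in four and is not. A flag of this kind only points a human reviewer at a Grant; it does not establish wrongdoing by the Grantee.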

A self-explanatory graph from the project:

Other projects that examined the validity of grants included one that looked at a variety of approaches to determine whether a wallet may be a Sybil, including the previously documented “Donor DNA” developed by the Gitcoin FDD and others, and the Hamming distance, which counts the positions at which two equal-length sequences differ. The project uses this distance to identify potentially related donors, examining an enormous number of the relationships amongst all of the wallets.

The submission for the project includes a variety of colorful heatmaps - and also uses the signal of an unusual preponderance of Sybil contributions to call into question the integrity of particular Grants. The Jupyter notebook for this submission can be found at: https://nbviewer.org/github/timk11/odc-hackathon/blob/main/EDA_for_Sybil_attack_detection-v2.ipynb (warning - the notebook can be resource intensive to load)
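As a rough illustration of how a Hamming distance can surface potentially related donors, the sketch below assumes each wallet is encoded as a 0/1 vector marking which Grants it donated to. That encoding, the function names, and the sample wallets are assumptions for this example and are not taken from the notebook.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def close_pairs(vectors, max_distance=1):
    """Return (wallet_1, wallet_2, distance) for pairs within max_distance."""
    pairs = []
    for (w1, v1), (w2, v2) in combinations(sorted(vectors.items()), 2):
        d = hamming_distance(v1, v2)
        if d <= max_distance:
            pairs.append((w1, w2, d))
    return pairs

# Hypothetical donation patterns: one position per Grant, 1 = donated.
donor_vectors = {
    "0xaaa": (1, 1, 0, 1, 0),
    "0xbbb": (1, 1, 0, 1, 1),
    "0xccc": (0, 0, 1, 0, 0),
}

related = close_pairs(donor_vectors)
```

With this data only `0xaaa` and `0xbbb` are returned, since their patterns differ in a single position. Comparing every pair grows quadratically with the number of wallets, which is one reason an analysis like this can be resource intensive.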

However, these approaches are by design reactive. A decision to squelch a Sybil or to investigate further a Grantee is made based on the behavior of these entities during a round itself. What if we want to use the tools of the hackathon - exploratory data analysis, Lego building, and dashboards - to examine the risk of a grant or a Sybil proactively, before the round transpires?

As it happens, there were a few interesting responses that could be applied to proactive Sybil and Grant analysis, including this analysis stored on the Ocean Protocol marketplace: Grant Scoring via Title Accuracy and other features
The contestant in this submission trained machine learning models on a number of features, including the accuracy of the description of the grant project, the activity levels of the underlying Github, and other characteristics of the Grant project websites. They were able to predict which grants were more likely to be suspicious, a useful finding that could help to focus human reviewers.

Another submission focused specifically on social proof of the Grants and compared that to the votes that were submitted in support of each project. This ratio could itself be used as a Lego - a piece of analysis that takes in a breadth of data and returns a boolean, 1 or 0, indicating whether on the whole the criteria were met.
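A Lego of this shape might look something like the sketch below. It is an assumption-laden illustration rather than the submission's code: the use of follower counts as the social proof signal, the function name, and the cutoff of 2.0 votes per follower are all hypothetical.

```python
def social_proof_lego(votes, followers, max_votes_per_follower=2.0):
    """Return 1 if the vote count looks plausible given social proof, else 0.

    votes: number of votes (contributions) a Grant received.
    followers: a stand-in for social proof, e.g. a follower count (assumed).
    max_votes_per_follower: hypothetical cutoff for the votes/followers ratio.
    """
    if votes < 0 or followers < 0:
        raise ValueError("votes and followers must be non-negative")
    if followers == 0:
        # No social proof at all: only plausible if there are no votes either.
        return 1 if votes == 0 else 0
    return 1 if votes / followers <= max_votes_per_follower else 0

# Made-up examples.
plausible = social_proof_lego(votes=100, followers=500)
suspicious = social_proof_lego(votes=5000, followers=50)
```

Because the function takes plain numbers and returns 1 or 0, a round operator could combine it with other Legos, for example by requiring that several of them return 1 before a Grant passes review.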

Theme: Legos are pluggable

As the prior example indicates, Legos enable round operators and other analysts to more efficiently reuse algorithms. They also are inherently more transparent - we need to adopt a “transparency all the way down” mantra within web3 and public goods funding in particular in order to protect and boost the credibility of our coordination.

There are multiple useful posts on the Forums including this one introducing the subject of Legos:
https://gov.gitcoin.co/t/public-goods-legos-roadmap/12546
In addition, the documentation for Legos prepared by the Gitcoin FDD proved to be extremely useful to participants in explaining the concept and providing concrete examples as well:

https://github.com/Fraud-Detection-and-Defense/lego-docs
A quick thank you to @j-cook who has been an amazing collaborator; as a lead technical writer for the Eth Foundation and a scientist, he brings a level of professional expertise and intellectual curiosity that reflects itself not only in the above documentation but also in many FDD Forum posts upon which much of what ODC does is based.

Many of the leading submissions embraced the Lego design pattern, often combining Legos developed by the contestant with Legos developed by others. For example, the following project built on the contestant’s experience in web2 fraud analysis to focus on behaviors such as the age of an account and other variables; additionally, the contestant looked at whether the first transaction for multiple wallets originated from the same wallet, an approach credited to the TrustaLabs team from the October 2022 hackathon.

As pointed out by the contestant and in considerable literature about fighting Sybils, one goal is to increase the cost of Sybil attacks. For example, a solution that flags and potentially squelches wallets with a common seed wallet may result in Sybil attackers needing to execute multiple transactions for each wallet before receiving their initial funds, greatly increasing the cost and complexity of their orchestration in preparation for a Sybil attack. As the potential rewards of Sybil attacks increase due to the success of public goods and other funding using Quadratic approaches, we can anticipate more attacks as well. The infinite game continues.

Using Legos to Defend from Sybil attacks
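The common seed wallet check described above could be sketched along these lines. This is a simplified illustration under stated assumptions, not the contestant's or TrustaLabs' implementation: the transaction layout, the function names, and the minimum cluster size of 3 are all hypothetical.

```python
from collections import defaultdict

def first_funders(transactions):
    """Map each wallet to the sender of its earliest incoming transaction.

    transactions: iterable of (timestamp, sender, receiver) tuples
    (a hypothetical, simplified layout).
    """
    first = {}
    for _timestamp, sender, receiver in sorted(transactions):
        first.setdefault(receiver, sender)  # keep only the earliest sender
    return first

def seed_clusters(transactions, min_size=3):
    """Group wallets by first funder; keep groups with at least min_size wallets."""
    clusters = defaultdict(set)
    for wallet, funder in first_funders(transactions).items():
        clusters[funder].add(wallet)
    return {f: sorted(w) for f, w in clusters.items() if len(w) >= min_size}

# Made-up transfers, for illustration only.
transactions = [
    (1, "0xseed", "0xa"),
    (2, "0xseed", "0xb"),
    (3, "0xseed", "0xc"),
    (4, "0xexchange", "0xd"),
    (5, "0xa", "0xd"),
]

clusters = seed_clusters(transactions)
```

In this example `0xa`, `0xb`, and `0xc` were all first funded by `0xseed` and are grouped together, while `0xd` is not. A practical version would probably need to exclude widely shared funding sources such as exchange withdrawal wallets, which would otherwise produce large false-positive clusters.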

Theme: Dashboards are valuable

Lastly, starting with this hackathon the OpenData Community is supporting the creation of useful dashboards for Sybil analysis. These dashboards are intended to be used by program operators when examining the health of their rounds and in explaining their findings to their communities. We’d especially like to thank Kevin Owocki, Gitcoin founder and current CEO of Supermodular, for suggesting the inclusion of a dashboard bounty in this hackathon and for providing some guidance as to potentially useful approaches.

The dashboards inevitably bring together the underlying analysis. One of the winning solutions, for example, allows Grants and Wallets to be quickly examined with the aid of several Legos.

A live look at this dashboard is also available here:
https://www.grantlooker.xyz/projects

This work builds off of guidance from the FDD team, who previously published a dashboard for round reviews, as explained here: https://gov.gitcoin.co/t/public-goods-legos-roadmap/12546

We plan to support one or more Dashboards as a Sandbox project.

Sandbox

While the submissions highlighted here - and dozens of others - show the promise of the OpenData Community as a source of analysis, software, and qualified teams, in order to actually rely upon the algorithms, Legos, and software for production usage a program operator likely needs more. In particular, any user of these resources would want to know that the software would be maintained and improved over time and that it would become more broadly used, thereby further accelerating the pace at which it gathers and implements improvements.

We are about to officially launch our Sandbox program. The quick summary is that via the Sandbox program we will formalize and improve our support of preexisting projects that we believe to be fundamental, such as the TrueBlocks index. We will also get behind qualified hackathon projects, and we may launch our own as well if we and our partners perceive there to be useful gaps in need of software and analysis.

You can learn more about our approach to Sandbox projects in this blog post:
https://opendatacommunity.org/blog/the-odc-sandbox-launch/

Conclusion:

A special thank you to the organizations that partnered with us to make the hackathon a success. Gitcoin, POKT, Ocean Protocol, Supermodular, TrustaLabs and TrueBlocks each contributed funding - and their expertise and guidance to the hackathon participants and to the OpenData Community itself.

All of these partners joined the ODC in Twitter Spaces and were active on the Discord channels as well. They also worked to suggest topics for the hackathon and promoted it as well.

We are now working closely with our founding sponsor Gitcoin and other organizations to ensure that the OpenData Community grows in a way that helps them. For example, Founding Sponsors play an informal role today - and can play a more formal role via governance in the future - in helping to select our direction, including which types of Sandbox projects we should focus upon.

Please join us on our journey to protect web3 from fraudulent activity - and from the perils of centralization at the data layer. If you are a stakeholder in the success of web3 - then we likely view you as our stakeholder as well. Let’s work together.

To get started you can of course drop into our Discord:

https://discord.gg/Ye3QrWf6fG
And there is our wiki which can be useful for seeing the current activities and orienting yourself:
Home · OpenDataforWeb3/Resources Wiki · GitHub

Last but not least you can take a look at our website which also links to a variety of resources including the nascent but fast improving Landscape of useful open data tools, projects, and services:

https://opendatacommunity.org/

Please do get in touch, provide us feedback, and lend a hand.