Load Grants Stack data into Python!

How to load Grants Stack data into Python?

Are you working with Python and feel the pressing need to obtain fresh Gitcoin Grants data as a nice dataframe without going through boring data-processing steps? If so, check this out:

import pandas as pd

grants_stack_rounds = pd.read_parquet('http://grant-data.xyz/rounds.parquet')
grants_stack_rounds.info()

# This uses a public IPFS gateway, so roughly 1 in 10 requests may be slow or fail.
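Since the gateway can occasionally be slow or fail, one way to cope in an interactive session is a small retry wrapper. This is a minimal sketch, not part of the data portal itself: the helper name `load_with_retry` and its `attempts` and `delay` parameters are made up for illustration.

```python
import time


def load_with_retry(loader, url, attempts=3, delay=2.0):
    """Call loader(url), retrying after a fixed pause if it raises.

    `loader` is any callable that takes a URL, e.g. pd.read_parquet.
    This helper is hypothetical and not part of pandas or the data portal.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return loader(url)
        except Exception as error:  # gateway problems can surface as many error types
            last_error = error
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"failed to load {url} after {attempts} attempts") from last_error


# Usage (commented out so the snippet does not touch the network when pasted):
# import pandas as pd
# rounds = load_with_retry(pd.read_parquet, "http://grant-data.xyz/rounds.parquet")
```

A fixed pause keeps the sketch short; for anything beyond a quick check you would probably want a longer or growing delay.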

The read_parquet call above gets you all Grants Stack rounds that you would normally obtain from the Grants Stack Indexer (v1), packaged in a single pandas dataframe ready for analysis and processed like so:

  • columns → renamed to snake_case,
  • nested metadata → extracted to columns,
  • addresses → normalized to lowercase.
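To give a feel for what those three steps mean in practice, here is a rough pandas sketch applied to a made-up record. It is not the pipeline's actual code: the field names (`roundAddress`, `metadata`, `programName`) and the rule "lowercase every column ending in address" are assumptions chosen for illustration.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Turn camelCase or dotted column names into snake_case."""
    name = re.sub(r"[.\s-]+", "_", name)                 # "metadata.name" -> "metadata_name"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # "roundAddress" -> "round_Address"
    return name.lower()


def clean_rounds(records: list[dict]) -> pd.DataFrame:
    # json_normalize flattens nested dicts into "parent.child" columns
    df = pd.json_normalize(records)
    df = df.rename(columns=to_snake_case)
    # Assumption: address columns can be recognised by their name
    for col in df.columns:
        if col.endswith("address"):
            df[col] = df[col].str.lower()
    return df


# Made-up sample record, only loosely shaped like indexer output
sample = [
    {
        "id": "1",
        "roundAddress": "0xAbC123",
        "metadata": {"name": "Test Round", "programName": "Demo"},
    }
]
cleaned = clean_rounds(sample)
```

With the hosted parquet files none of this is needed, since the cleaning has already been done for you.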

More interesting files (data refreshes weekly, on Monday!):

Buyer beware: this is a WIP, so use HTTP for now. If clicking a link doesn’t work in your browser, try pasting the entire link into the address bar, including the http:// part.

I also don’t recommend using this for anything mission-critical, but it is pretty neat for an interactive terminal session if you want to load data quickly to check something.

How does this work?

Now that you have the data, check out:

TL;DR: @davidgasquez, with some help from me, is running a Dagster pipeline that uses GitHub Actions to grab data from grants-stack-indexer, clean it, and send it to an IPFS bucket.

The domain redirect can take some extra time, so it can be faster to refer to the bucket by its IPNS name instead. Here is an example link to /projects that is equivalent to the link above:

https://cloudflare-ipfs.com/ipns/k51qzi5uqu5dhn3p5xdkp8n6azd4l1mma5zujinkeewhvuh5oq4qvt7etk9tvc/projects.parquet
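If you prefer the IPNS route from Python, the link can be assembled with a small helper. This is only a sketch: the function name `ipns_parquet_url` is made up, and it assumes other tables follow the same `<table>.parquet` naming seen for rounds and projects.

```python
# Gateway and IPNS name copied from the example link in this post
DEFAULT_GATEWAY = "https://cloudflare-ipfs.com"
IPNS_NAME = "k51qzi5uqu5dhn3p5xdkp8n6azd4l1mma5zujinkeewhvuh5oq4qvt7etk9tvc"


def ipns_parquet_url(table: str,
                     gateway: str = DEFAULT_GATEWAY,
                     ipns_name: str = IPNS_NAME) -> str:
    """Build a gateway URL for <table>.parquet inside the IPNS bucket.

    Hypothetical helper; the <table>.parquet naming is an assumption.
    """
    return f"{gateway.rstrip('/')}/ipns/{ipns_name}/{table}.parquet"


projects_url = ipns_parquet_url("projects")

# Usage (commented out to avoid a network call when pasted):
# import pandas as pd
# projects = pd.read_parquet(projects_url)
```

Keeping the gateway as a parameter makes it easy to swap in a different public gateway if this one is slow.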

Because the IPFS bucket is open to anyone, you can also use it to ask SQL questions about Grants Stack data using the DuckDB web shell, like so:

[Screenshot: DuckDB_query]
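The same idea should also work from Python with the duckdb package instead of the web shell. The sketch below is an assumption on my part rather than something taken from the screenshot: it only counts rows, so it does not rely on any particular column names, and the helper names are made up.

```python
def build_count_query(parquet_url: str) -> str:
    """Build a row-count query over a remote parquet file."""
    safe_url = parquet_url.replace("'", "''")  # escape single quotes for the SQL literal
    return f"SELECT COUNT(*) AS n_rows FROM read_parquet('{safe_url}')"


def run_query(query: str):
    """Run the query with DuckDB and return a pandas dataframe."""
    import duckdb  # imported lazily so building the query works without duckdb installed

    con = duckdb.connect()
    # httpfs is the DuckDB extension that lets read_parquet fetch files over HTTP
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    return con.execute(query).df()


query = build_count_query("http://grant-data.xyz/rounds.parquet")

# Uncomment to actually hit the network:
# print(run_query(query))
```

Swap the SELECT for whatever question you want to ask once you have looked at the columns with something like grants_stack_rounds.info().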

What now?

I am writing this post because I believe that fetching Gitcoin data for analysis of any kind is something that should be easy and effortless!

At this moment, data from the Gitcoin Grants Data Portal is in use by @ccerv1, @rohit, @umarkhaneth and some Open Data Community folks, but I would like to open the project to a wider audience to collect more feedback and encourage folks to test it out!

So, for anyone reading this, I would like to ask for some feedback and open a discussion!

  • Do you find the solution presented here useful?
  • Is there some other format in which you would like data to be served (I am thinking Excel spreadsheets for non-nerds)?
  • Is there some data about Gitcoin rounds that you would like to see, that is hard to obtain?

I would also be interested to hear about other community-run Gitcoin “data-sphere” projects that people are working on. I am currently aware of oss-observer and RegenData, but those are both high-profile projects made by Gitcoin insiders. At the risk of sounding pragmatic, with Citizens Retro Funding #3 right around the corner, this is an excellent time to surface any community contributions in this area!

This is great!! I loved getting parquet files directly. Thanks for sharing and raising awareness for this awesome resource.

This wonderful abstraction brings the onchain data into familiar territory for anyone comfortable with SQL. Thanks, @davidgasquez and @DistributedDoge!

If anyone is looking for sample queries/projects using Gitcoin Grants Data Portal, here are a few you can fork to start building upon: