GG23 Predictive Funding with the Omniacs.DAO
Our submission to the GG23 Predictive Funding Challenge combines a high-level embedding of each funded project’s page with basic feature engineering and standard best practices for modelling tabular data.
- Submission under “Omniacs.DAO” on CryptoPond with a score of 0.0425 (1st place at the time of writing).
Executive Summary
- We employed a careful yet simple feature engineering step that found a consistent set of features across both the test and training sets.
- The features were extracted from each project’s description on Gitcoin and transformed into vectors with a Nomic embedding model.
- These embeddings served as inputs to a gradient boosting machine tuned by grid search, which achieved a top score.
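The modelling step in the last bullet can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: it assumes scikit-learn's `GradientBoostingRegressor` and `GridSearchCV` as the "standard" tools, uses random vectors as a stand-in for the real embeddings, and the hyperparameter grid is purely illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data: 120 projects, 16-dim "embeddings".
# (Real embedding vectors are much wider; 16 keeps the sketch fast.)
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 16))

# Hypothetical target: each project's share of the matching pool.
raw = np.exp(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=120))
y = raw / raw.sum()

# Illustrative grid only; the values actually searched are assumptions.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
predicted_share = search.predict(X[:5])
print(best_params)
```

`GridSearchCV` refits the best configuration on the full training set by default, so `search.predict` can be called directly on the test-set embeddings.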
Approach
In a quick “cookbook” format, our approach started with grabbing the data from the CryptoPond platform. The first problem to solve was finding a consistent set of features across both the training and test sets so that, given a new project, we would be able to estimate the amount of funding it would receive. Using the updated project descriptions (example here), we scraped the text and then extracted a vectorization of it with the nomic-embed-text:v1.5 embedding model. For projects with insufficient descriptions, we either replaced the description with information from the project’s website or appended the README of its repo.

Given the time dependency between rounds within the training data, we opted for a simple mean aggregation: for each project in the training set, we averaged the matching amount, the number of contributors and the contribution amount across every round it participated in. This gave us three dependent variables that we could potentially treat as responses.

We then used a grid search to find the optimal hyperparameters for a standard gradient boosted model set to predict the percentage of the pool each project received per round. Finally, we took this value, scaled it by the rounds the project participated in, and structured everything for submission.
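As a rough sketch of the mean aggregation described above, the snippet below averages each metric per project across rounds and derives a share-of-pool target. The field names, round labels and numbers are made up for illustration, and `share_to_amount` reflects one possible reading of the final scaling step (multiplying a predicted share by a round's pool size), which is an assumption rather than our exact procedure.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical training rows: one per (project, round). All values invented.
rows = [
    {"project": "alpha", "round": "A", "matching": 1000.0,
     "contributors": 40, "donated": 500.0, "pool": 10000.0},
    {"project": "alpha", "round": "B", "matching": 3000.0,
     "contributors": 60, "donated": 700.0, "pool": 20000.0},
    {"project": "beta", "round": "B", "matching": 2000.0,
     "contributors": 10, "donated": 100.0, "pool": 20000.0},
]


def aggregate_by_project(rows):
    """Average each metric over the rounds a project took part in."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["project"]].append(row)

    summary = {}
    for project, items in grouped.items():
        summary[project] = {
            "matching": mean(r["matching"] for r in items),
            "contributors": mean(r["contributors"] for r in items),
            "donated": mean(r["donated"] for r in items),
            # Share of the matching pool, computed per round, then averaged.
            "pool_share": mean(r["matching"] / r["pool"] for r in items),
            "rounds": len(items),
        }
    return summary


def share_to_amount(pool_share, pool_size):
    """Assumed final step: turn a predicted pool share into an amount."""
    return pool_share * pool_size


summary = aggregate_by_project(rows)
print(summary["alpha"])
```

Averaging collapses the time dimension entirely, which is what makes the approach simple: each project becomes a single training row regardless of how many rounds it appeared in.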
Takeaways
Things Done Well
- Found a consistent set of features across the training and test set.
- Accounted for the time dependencies across rounds with simple averages.
- Completed everything in under 24 hours!
Things Needing Improvement
- We could have included more features measuring external popularity, such as GitHub stars and Twitter followers.
- We could have reimplemented a version of the quadratic funding (QF) formula and applied it to the ML predictions of donations, instead of using a model to estimate the matching amount as well as the donation amount.
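For reference, the textbook QF rule mentioned in the last bullet can be sketched as below. This is the basic formula only: a project's raw match is the square of the sum of the square roots of its individual donations, minus the donations themselves, rescaled so that all matches add up to the pool. Any caps or sybil-resistance adjustments used in a live round are not modelled, and the function name and example data are hypothetical.

```python
from math import sqrt


def quadratic_match(contributions, pool):
    """Split `pool` across projects with the basic QF rule.

    contributions: dict mapping project -> list of individual donations.
    """
    raw = {}
    for project, donations in contributions.items():
        root_sum = sum(sqrt(c) for c in donations)
        # Raw match: (sum of sqrt(c_i))^2 - sum(c_i), floored at zero.
        raw[project] = max(root_sum ** 2 - sum(donations), 0.0)

    total = sum(raw.values())
    if total == 0:
        return {project: 0.0 for project in raw}
    # Rescale so that the matches sum to the pool.
    return {project: pool * value / total for project, value in raw.items()}


example = {
    "alpha": [1.0, 1.0, 1.0, 1.0],  # four small donors
    "beta": [4.0],                  # one donor, same total
}
matches = quadratic_match(example, pool=100.0)
print(matches)
```

Note that the formula needs individual donation amounts. A model that predicts only the number of contributors and the total donated would need an extra assumption, for example equal-sized donations, before this function could be applied to its output.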