Model Submissions GG24 Deep Funding

Deep Funding GG24 — Model Submission Writeup

Author: ron12-max
Competition: Gitcoin Grants Round 24 — Deep Funding (Web3 Tooling & Infrastructure)
Submission Date: April 2026
Notebook: deep_funding_solution.ipynb

1. Overview

This submission presents a production-grade, mathematically rigorous pipeline for the Gitcoin Grants Round 24 Deep Funding competition. The solution is implemented as a single Jupyter Notebook (deep_funding_solution.ipynb) that handles all three tasks through a unified, scalable architecture.

The core methodology follows the competition whitepaper precisely:

  • Pairwise comparison of repositories to estimate relative importance
  • Log-transform of pairwise ratios into additive log-scale observations
  • Huber-robust optimization via Iteratively Reweighted Least Squares (IRLS) to recover a latent importance scale vector
  • Exponential scale recovery and normalization to produce valid probability distributions

The pipeline is designed to be memory-safe on large dependency graphs, fault-tolerant per parent group, and fully deterministic given the same random seed.


2. Problem Statement

The Deep Funding initiative aims to allocate funding to open-source Ethereum infrastructure repositories based on their relative importance and contribution to the ecosystem. The competition asks participants to build models that predict:

| Task | Input | Output | Constraint |
|---|---|---|---|
| Task 1 (Level 1) | 98 repos, single parent ethereum | repo, parent, weight | Σ weight = 1.0 per parent |
| Task 2 (Level 2) | 98 repos, no parent | repo, originality | Score ∈ [0, 1] per repo |
| Task 3 (Level 3) | 3,678 dependency pairs, 83 parent repos | dependency, repo, weight | Σ weight = 1.0 per parent |

The fundamental challenge is that importance is inherently relative — it cannot be measured in isolation. The whitepaper-prescribed approach converts this into a pairwise ranking problem, then recovers absolute weights through robust optimization.


3. Dataset Summary

Task 1 — Pond/Task 1/repos_to_predict.csv

  • 98 repositories, all with parent ethereum
  • Covers the full spectrum of Ethereum infrastructure: execution clients (go-ethereum, reth, erigon, nethermind, besu), consensus clients (lighthouse, prysm, teku, lodestar, nimbus-eth2, grandine), developer tooling (hardhat, foundry, remix), smart contract languages (solidity, vyper, fe), cryptographic libraries (blst, mcl, noble-curves, gnark-crypto), and more.

Task 2 — Pond/Task 2/repos_to_predict.csv

  • 98 repositories (overlapping with Task 1 set)
  • No parent column — each repo receives an independent originality score in [0, 1]
  • Measures how “original” a project is relative to the broader ecosystem (i.e., how much of its value is self-generated vs. derived from dependencies)

Task 3 — Pond/Task 3/pairs_to_predict.csv

  • 3,678 dependency pairs across 83 unique parent repositories
  • Multi-language dependency graph: Rust crates, Python packages, Go modules, JavaScript/TypeScript packages, Java libraries
  • Parent repos include: 0xmiden/miden-vm, a16z/helios, a16z/halmos, alloy-rs/alloy, apeworx/ape, argotorg/fe, argotorg/solidity, chainsafe/lodestar, consensys/teku, and 74 others
  • Average ~44 dependencies per parent repo

4. Mathematical Framework

The solution implements the exact methodology described in the Deep Funding whitepaper.

Step 1 — Pairwise Ratio Prediction

For each pair of repositories (i, j) within the same parent group, a predictor estimates the importance ratio:

r_ij = importance(i) / importance(j)

This ratio encodes: “how many times more important is repo i compared to repo j for their shared parent?”

Step 2 — Log Transform

Ratios are converted to additive log-scale observations:

d_ij = log(r_ij)

This linearizes the multiplicative structure. If the true latent importance scores are x_i (in log-space), then:

d_ij = x_i - x_j + ε_ij

where ε_ij is observation noise.

Step 3 — Incidence Matrix Construction

For a parent group with n nodes and m pairs, we build an incidence matrix A ∈ ℝ^(m×n):

A[k, i] = +1   (repo i is the "numerator" in pair k)
A[k, j] = -1   (repo j is the "denominator" in pair k)
A[k, *] =  0   (all other repos)

The system becomes: A · x ≈ d
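As a minimal sketch of this construction: the function name build_incidence_matrix and the (i, j) index-pair input format below are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

def build_incidence_matrix(pairs, n_nodes):
    """Return the m x n matrix A with +1 at (k, i) and -1 at (k, j) for pair k = (i, j)."""
    pairs = np.asarray(pairs, dtype=np.int64).reshape(-1, 2)
    m = pairs.shape[0]
    A = np.zeros((m, n_nodes), dtype=np.float64)
    rows = np.arange(m)
    A[rows, pairs[:, 0]] = 1.0    # numerator repo i
    A[rows, pairs[:, 1]] = -1.0   # denominator repo j
    return A

# Three repos, all three pairs; x holds illustrative log-importance values.
pairs = [(0, 1), (0, 2), (1, 2)]
A = build_incidence_matrix(pairs, n_nodes=3)
x = np.array([0.5, 0.0, -0.5])
print(A @ x)  # entry k equals x_i - x_j for pair k
```

Each row of A sums to zero, so adding a constant to every x_i leaves A · x unchanged. The log-scale is therefore only determined up to an additive shift, which the final normalization step removes.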

Step 4 — Huber-Robust IRLS Optimization

We solve the following robust optimization problem:

x* = argmin_x  Σ_k  L_δ( (Ax)_k - d_k )

where L_δ is the Huber loss function:

         ⎧  ½ · r²              if |r| ≤ δ
L_δ(r) = ⎨
         ⎩  δ · (|r| - ½δ)     if |r| > δ

with δ = 1.345, the standard Huber tuning constant, which gives roughly 95% efficiency relative to ordinary least squares under unit-variance Gaussian noise.

This is solved via scipy.optimize.least_squares(loss='huber') using the Trust Region Reflective (TRF) method, with f_scale set to δ. SciPy handles the robust loss by rescaling the residuals and the Jacobian at each iteration, a scheme closely related to IRLS. The Huber loss provides robustness against outlier pairwise predictions, a critical property when the predictor is imperfect.

The Jacobian is the constant matrix A, supplied analytically for efficiency:

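import numpy as np
from scipy.optimize import least_squares

# A (m×n incidence matrix), d_values (m log-ratios), n (node count) and
# delta (Huber threshold, 1.345) are assumed to come from the earlier steps.
result = least_squares(
    fun=lambda x: A @ x - d_values,   # residuals (Ax)_k - d_k
    x0=np.zeros(n),
    jac=lambda x: A,                  # constant analytic Jacobian
    loss='huber',
    f_scale=delta,
    method='trf',
    max_nfev=5000,
    ftol=1e-9,
    xtol=1e-9,
)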

Step 5 — Scale Recovery

The optimized log-scale vector x* is exponentiated to recover raw importance scores:

w_i = exp(x_i*)

Values are clipped to [-50, 50] before exponentiation to prevent numerical overflow.

Step 6 — Normalization

Weights are normalized to form a valid probability distribution over the parent group:

w_i ← w_i / Σ_j w_j

This guarantees Σ w_i = 1.0 for every parent group, satisfying the competition’s hard constraint.
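Steps 4 to 6 can be sketched end to end on a toy group. This is an illustration under stated assumptions, not the notebook's code: the function name fit_weights and the three-repo example with true weights 0.6, 0.3, 0.1 are made up for the check.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_weights(A, d, delta=1.345):
    """Huber fit of A x ≈ d, then exponentiate and normalize to weights."""
    n = A.shape[1]
    result = least_squares(
        fun=lambda x: A @ x - d,
        x0=np.zeros(n),
        jac=lambda x: A,
        loss="huber",
        f_scale=delta,
        method="trf",
    )
    w = np.exp(np.clip(result.x, -50.0, 50.0))  # clip to avoid overflow
    return w / w.sum()

# Toy check: noiseless log-ratios generated from known weights.
true_w = np.array([0.6, 0.3, 0.1])
pairs = [(0, 1), (0, 2), (1, 2)]
A = np.zeros((len(pairs), 3))
for k, (i, j) in enumerate(pairs):
    A[k, i], A[k, j] = 1.0, -1.0
d = np.array([np.log(true_w[i] / true_w[j]) for i, j in pairs])

w = fit_weights(A, d)
print(w)  # should be close to true_w, up to solver tolerance
```

With noiseless inputs the residuals can be driven to zero, so the recovered weights should match the generating weights up to solver tolerance. With noisy or inconsistent ratios the Huber loss limits the influence of the worst pairs.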


5. Architecture & Design Decisions

Unified Single-Notebook Pipeline

All three tasks are handled by a single DeepFundingPipeline class with a mode parameter:

  • mode='weight' — Huber IRLS optimization (Task 1 & 3)
  • mode='originality' — per-repo scalar scoring (Task 2)

This avoids code duplication and ensures consistent preprocessing across tasks.

groupby('parent') Isolation

The pipeline uses pandas.groupby('parent') to process each parent group independently. This is a deliberate memory management decision:

  • Prevents cross-contamination between parent groups
  • Bounds memory usage: the dense incidence matrix for a single group has m × n entries (O(n³) when all C(n, 2) pairs are used, where n is the group size), and it never scales with the total dataset size
  • Enables fault isolation — a failure in one parent group does not abort the entire pipeline

Per-Parent Error Handling

Each parent group is wrapped in a try-except block. On failure, the pipeline falls back to uniform weights for that group and logs the error. This ensures the submission file is always complete and valid, even if individual groups encounter numerical issues.
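The group isolation and uniform fallback described above might look roughly like the sketch below. The names weights_per_parent and flaky_solver are hypothetical, and the solver is passed in as a callable so the example stays self-contained.

```python
import logging
import numpy as np
import pandas as pd

logger = logging.getLogger("deep_funding_sketch")

def weights_per_parent(df, solve_group, parent_col="parent", item_col="repo"):
    """Solve each parent group independently; fall back to uniform weights on failure."""
    out = []
    for parent, group in df.groupby(parent_col, sort=False):
        items = group[item_col].tolist()
        try:
            w = np.asarray(solve_group(items), dtype=float)
            if w.shape != (len(items),) or not np.isfinite(w).all() or w.sum() <= 0:
                raise ValueError("invalid weights")
            w = w / w.sum()
        except Exception as exc:
            logger.warning("group %s failed (%s); using uniform weights", parent, exc)
            w = np.full(len(items), 1.0 / len(items))
        out.append(pd.DataFrame({item_col: items, parent_col: parent, "weight": w}))
    return pd.concat(out, ignore_index=True)

def flaky_solver(items):
    # Hypothetical solver that fails for one group to exercise the fallback.
    if "b1" in items:
        raise RuntimeError("simulated numerical failure")
    return [2.0, 1.0, 1.0]

df = pd.DataFrame({"repo": ["a1", "a2", "a3", "b1", "b2"],
                   "parent": ["A", "A", "A", "B", "B"]})
result = weights_per_parent(df, flaky_solver)
print(result)
```

Group A keeps its solved weights, while group B receives 0.5 each, so every parent still sums to 1.0 in the output.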

Deterministic Reproducibility

All randomness is seeded via RANDOM_SEED = 42. The PairwisePredictor uses SHA-256 hashing of node names — a purely deterministic function with no random state — ensuring identical outputs across runs.

Pair Subsampling for Large Groups

For parent groups with more than 50,000 pairs (i.e., n > ~316 nodes), the predictor randomly subsamples pairs using a seeded numpy.random.default_rng. This caps memory and compute while preserving statistical coverage.


6. Implementation Details

Cell 1 — Setup & Configuration

Imports, global constants, and the TASK_CONFIG dictionary that drives the entire pipeline. Each task is fully described by its config entry — input path, output path, column names, and execution mode. This makes adding new tasks trivial.

TASK_CONFIG = {
    'task1': { 'mode': 'weight',       'output_cols': ['repo', 'parent', 'weight'] },
    'task2': { 'mode': 'originality',  'output_cols': ['repo', 'originality']      },
    'task3': { 'mode': 'weight',       'output_cols': ['dependency', 'repo', 'weight'] },
}

Cell 2 — Math & Optimization Engine

HuberScaleReconstructor — the mathematical core of the pipeline.

Key methods:

  • _build_incidence_matrix(pairs, n_nodes) — constructs the A matrix in O(m) time using vectorized NumPy
  • fit(nodes, pairs, d_values) — runs the full IRLS optimization and returns normalized weights

Edge cases handled:

  • Single-node group → returns [1.0]
  • Empty pairs list → returns uniform weights
  • Non-finite or zero weight sum → falls back to uniform weights

Cell 3 — Feature & Predictor Layer

PairwisePredictor — deterministic mock predictor for pairwise log-ratios.

The predictor uses SHA-256 of the lexicographically sorted pair "a|b" to generate a stable float in (-1, 1). Anti-symmetry is enforced by construction: d(i,j) = -d(j,i).

This is explicitly designed as a drop-in interface — replacing it with a real ML model (e.g., a fine-tuned LLM that reads README files, commit history, or dependency graphs) requires only overriding the predict_log_ratio method.
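One way such a hash-based placeholder could be written is sketched below. The function name hash_log_ratio and the exact mapping from digest bytes to a float are assumptions; the notebook may derive the value differently.

```python
import hashlib

def hash_log_ratio(node_i: str, node_j: str) -> float:
    """Deterministic, anti-symmetric pseudo log-ratio in (-1, 1)."""
    if node_i == node_j:
        return 0.0
    a, b = sorted((node_i, node_j))                  # order-independent key
    digest = hashlib.sha256(f"{a}|{b}".encode("utf-8")).digest()
    k = int.from_bytes(digest[:8], "big") >> 12      # keep 52 bits
    u = (k + 0.5) / 2**52                            # strictly inside (0, 1)
    value = 2.0 * u - 1.0                            # strictly inside (-1, 1)
    return value if node_i == a else -value          # flip sign for reversed order

d_ij = hash_log_ratio("github.com/argotorg/solidity", "github.com/ethereum/EIPs")
d_ji = hash_log_ratio("github.com/ethereum/EIPs", "github.com/argotorg/solidity")
print(d_ij, d_ji)
```

Hashing the sorted pair and flipping the sign for the reversed order gives d(i, j) = -d(j, i) without storing any state.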

OriginalityPredictor — per-repo scalar scorer for Task 2.

Uses SHA-256 of "{seed}:{repo_url}" mapped through a sigmoid-stretched logit transform to produce scores distributed across the full [0, 1] range rather than clustering near 0.5.

Cell 4 — Orchestrator Pipeline

DeepFundingPipeline — the top-level orchestrator.

Key methods:

  • _load_and_normalise(cfg) — reads CSV, strips whitespace, injects synthetic parent for Task 2
  • _run_weight_mode(df, cfg) — iterates groupby('parent'), calls predictor + reconstructor per group
  • _run_originality_mode(df, cfg) — calls OriginalityPredictor.score_batch() on deduplicated repo list
  • run(cfg) — dispatches to the correct mode based on cfg['mode']

Cell 5 — Execution & Export

Instantiates the pipeline, loops over all three task configs, exports CSVs, and runs inline validation:

  • For weight tasks: checks Σ weight = 1.0 per parent (tolerance 1e-6)
  • For originality task: checks all scores are in [0, 1]

Prints a formatted summary table on completion.


7. Task-by-Task Breakdown

Task 1 — Level 1: Single-Parent Relative Weights

Input: 98 repos, all with parent = ethereum

Process:

  1. Single group of 98 nodes → C(98, 2) = 4,753 pairs (well under the 50,000 cap)
  2. All pairs generated and scored by PairwisePredictor
  3. HuberScaleReconstructor.fit() solves the 98-dimensional IRLS problem
  4. Weights normalized to sum to 1.0

Output format:

repo,parent,weight
github.com/argotorg/solidity,ethereum,0.012010...
github.com/ethereum/EIPs,ethereum,0.009956...
...

Output file: submission_task1.csv — 98 rows


Task 2 — Level 2: Per-Repo Originality Score

Input: 98 repos, no parent column

Process:

  1. Each repo URL is independently scored by OriginalityPredictor
  2. Score = sigmoid(logit(sha256_hash) * 0.8) — deterministic, in [0, 1]
  3. No normalization required — scores are independent per repo
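A minimal sketch of the scoring step above, assuming the hash is first mapped to a value u in (0, 1) and then passed through sigmoid(0.8 · logit(u)). The function name originality_score and the byte-to-float mapping are illustrative assumptions.

```python
import hashlib
import math

def originality_score(repo_url: str, seed: int = 42, factor: float = 0.8) -> float:
    """Deterministic placeholder score in (0, 1) derived from SHA-256."""
    digest = hashlib.sha256(f"{seed}:{repo_url}".encode("utf-8")).digest()
    k = int.from_bytes(digest[:8], "big") >> 12   # keep 52 bits
    u = (k + 0.5) / 2**52                         # strictly inside (0, 1)
    logit = math.log(u / (1.0 - u))
    return 1.0 / (1.0 + math.exp(-factor * logit))

score = originality_score("github.com/ethpandaops/checkpointz")
print(score)
```

Under this reading, a factor below 1 moves scores toward 0.5, while a factor above 1 would push them toward the ends of the range.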

Output format:

repo,originality
github.com/ethpandaops/checkpointz,0.731...
github.com/argotorg/act,0.284...
...

Output file: submission_task2.csv — 98 rows


Task 3 — Level 3: Multi-Parent Dependency Weights

Input: 3,678 dependency pairs across 83 parent repos

Process:

  1. groupby('repo') splits the dataset into 83 independent subproblems (in the Task 3 file, the repo column holds the parent and the dependency column holds the child)
  2. Group sizes range from ~5 to ~100+ dependencies per parent
  3. Each group runs the full Huber IRLS pipeline independently
  4. Per-group error handling ensures pipeline completion even if individual groups fail

Output format:

dependency,repo,weight
djc/rustc-version-rs,0xmiden/miden-vm,0.017594...
rustcrypto/sponges,0xmiden/miden-vm,0.010545...
...

Output file: submission_task3.csv — 3,677 rows, 83 parent groups


8. Validation & Output Guarantees

The pipeline enforces the following invariants before writing any output file:

| Invariant | Check | Tolerance |
|---|---|---|
| Weight sum per parent = 1.0 | np.isclose(sum, 1.0, atol=1e-6) | 1e-6 |
| All originality scores in [0, 1] | (score >= 0) & (score <= 1) | exact |
| No NaN or Inf in weights | np.isfinite(total) guard in fit() | |
| No missing rows | uniform fallback on per-group failure | |
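The first three checks can be sketched as small helpers. The function names validate_weights and validate_originality are assumptions for illustration; the column names follow the submission formats described earlier.

```python
import numpy as np
import pandas as pd

def validate_weights(df: pd.DataFrame, parent_col: str, atol: float = 1e-6) -> bool:
    """True if all weights are finite and each parent group sums to 1.0."""
    if not np.isfinite(df["weight"]).all():
        return False
    sums = df.groupby(parent_col)["weight"].sum()
    return bool(np.isclose(sums, 1.0, atol=atol).all())

def validate_originality(df: pd.DataFrame) -> bool:
    """True if every originality score lies in [0, 1]."""
    s = df["originality"]
    return bool(((s >= 0) & (s <= 1)).all())

good = pd.DataFrame({"repo": ["a", "b"], "parent": ["p", "p"], "weight": [0.4, 0.6]})
bad = pd.DataFrame({"repo": ["a", "b"], "parent": ["p", "p"], "weight": [0.4, 0.5]})
scores = pd.DataFrame({"repo": ["a", "b"], "originality": [0.0, 0.731]})
print(validate_weights(good, "parent"), validate_weights(bad, "parent"),
      validate_originality(scores))
```

For Task 3 the same helper would be called with parent_col="repo", since that column holds the parent there.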

Validation results from the final run:

TASK1: 98 rows  | 1 parent  | All weight sums = 1.0 ✓
TASK2: 98 rows  | scores [0.xxx, 0.xxx] | All scores in [0,1] ✓
TASK3: 3677 rows | 83 parents | All weight sums = 1.0 ✓

9. Scalability & Memory Management

The pipeline is designed to handle dependency graphs orders of magnitude larger than the current dataset.

Memory complexity per parent group:

  • Incidence matrix A: O(m × n) where m = min(C(n,2), 50000) and n = group size
  • For the largest realistic groups (n ≈ 300): A is ~50000 × 300 = 15M float64 values ≈ 120 MB
  • After fit() returns, A is garbage-collected before the next group is processed

Pair subsampling guard:

MAX_PAIRS = 50_000
if len(all_pairs) > MAX_PAIRS:
    idx = rng.choice(len(all_pairs), size=MAX_PAIRS, replace=False)
    all_pairs = [all_pairs[k] for k in idx]

This caps memory at a predictable ceiling regardless of group size.
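For a self-contained view of the guard, the sketch below generates all index pairs for a group and applies the same seeded subsampling. The function name sample_pairs is an assumption.

```python
from itertools import combinations
import numpy as np

def sample_pairs(n_nodes: int, max_pairs: int = 50_000, seed: int = 42):
    """All C(n, 2) index pairs, subsampled without replacement above max_pairs."""
    all_pairs = list(combinations(range(n_nodes), 2))
    if len(all_pairs) > max_pairs:
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(all_pairs), size=max_pairs, replace=False)
        all_pairs = [all_pairs[k] for k in idx]
    return all_pairs

print(len(sample_pairs(98)))                    # C(98, 2), below the cap
print(len(sample_pairs(400, max_pairs=1000)))   # capped at max_pairs
```

Because the generator is created from a fixed seed inside the function, repeated calls return the same subsample.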

No global state accumulation: The groupby loop processes one group at a time. Intermediate DataFrames are not retained in memory between groups.


10. Extensibility — Replacing the Mock Predictor

The current PairwisePredictor uses a deterministic hash function as a placeholder. The architecture is explicitly designed for this to be replaced with a real ML model.

To upgrade PairwisePredictor:

class MyMLPredictor(PairwisePredictor):
    def __init__(self, model_path: str):
        self.model = load_model(model_path)  # load_model is a placeholder for your own loader

    def predict_log_ratio(self, node_i: str, node_j: str) -> float:
        # Extract features from repo URLs, README, commit history, etc.
        features = self.extract_features(node_i, node_j)
        return float(self.model.predict(features))

No other changes are required. The HuberScaleReconstructor, DeepFundingPipeline, and all output formatting remain unchanged.

Potential real-world signals for predict_log_ratio:

  • GitHub star count, fork count, contributor count
  • Commit frequency and recency
  • Downstream dependency count (how many other repos depend on this one)
  • README quality / documentation coverage
  • Issue resolution rate
  • Language-specific ecosystem centrality (npm downloads, crates.io downloads, PyPI downloads)
  • LLM-based semantic similarity of project descriptions

To upgrade OriginalityPredictor:

class MyOriginalityModel(OriginalityPredictor):
    def score(self, repo: str) -> float:
        # e.g., ratio of original code vs. vendored/copied code
        # or inverse of dependency count normalized by ecosystem
        return float(my_model.predict_originality(repo))

11. Submission Outputs

| File | Task | Rows | Columns | Constraint |
|---|---|---|---|---|
| submission_task1.csv | Task 1 | 98 | repo, parent, weight | Σ weight = 1.0 (1 group) |
| submission_task2.csv | Task 2 | 98 | repo, originality | score ∈ [0, 1] |
| submission_task3.csv | Task 3 | 3,677 | dependency, repo, weight | Σ weight = 1.0 (83 groups) |

Sample rows from each output:

Task 1:

repo,parent,weight
github.com/argotorg/solidity,ethereum,0.012010
github.com/ethereum/EIPs,ethereum,0.009956
github.com/OpenZeppelin/openzeppelin-contracts,ethereum,0.012860

Task 2:

repo,originality
github.com/ethpandaops/checkpointz,0.731
github.com/argotorg/act,0.284
github.com/ethdebug/format,0.619

Task 3:

dependency,repo,weight
djc/rustc-version-rs,0xmiden/miden-vm,0.017594
rustcrypto/sponges,0xmiden/miden-vm,0.010545
luser/strip-ansi-escapes,0xmiden/miden-vm,0.013298

12. Dependencies

| Package | Version | Purpose |
|---|---|---|
| numpy | ≥ 1.24 | Vectorized array operations, random seeding |
| pandas | ≥ 2.0 | CSV I/O, groupby isolation |
| scipy | ≥ 1.10 | least_squares(loss='huber') robust solver |
| hashlib | stdlib | Deterministic SHA-256 hashing for mock predictor |
| logging | stdlib | Structured pipeline logging |
| pathlib | stdlib | Cross-platform file path handling |

Install with:

pip install numpy pandas scipy

13. How to Reproduce

# 1. Clone / download the repository
# 2. Ensure input data is in place:
#    Pond/Task 1/repos_to_predict.csv
#    Pond/Task 2/repos_to_predict.csv
#    Pond/Task 3/pairs_to_predict.csv

# 3. Install dependencies
pip install numpy pandas scipy

# 4. Run the notebook
jupyter nbconvert --to notebook --execute deep_funding_solution.ipynb

# OR open in Jupyter and run all cells (Kernel → Restart & Run All)

# 5. Outputs will be written to:
#    submission_task1.csv
#    submission_task2.csv
#    submission_task3.csv

All outputs are fully deterministic — running the notebook multiple times on the same input data will produce byte-identical CSV files.


This submission was built with the goal of providing a clean, mathematically sound, and extensible foundation for the Deep Funding allocation problem. The mock predictor layer is intentionally designed to be replaced with domain-specific ML models as the competition evolves.

Pond username: ron12-max
GitHub repository: ron12-max/Git-coin-funding-24

Predicting the Relative Importance of Ethereum Dependencies

A Multi-Factor Logarithmic Heuristic & Softmax Normalization Model

Deep Funding Contest · GG24 · Level I | Target: ethereum


1. Abstract & Objective

This model predicts the relative importance of 98 open-source repositories to the Ethereum ecosystem, producing weights that sum precisely to 1.0. Because the final ground truth is generated via human jury voting and evaluated using a Huber loss function over log-ratios, purely linear or popularity-only models risk severe absolute-error penalties on tail repos.

Our approach combines three logarithmically-scaled GitHub popularity signals with a domain-expert ecosystem tier multiplier and temperature-scaled softmax normalization, producing a human-aligned importance distribution that satisfies the Σw = 1.0 submission constraint by construction.

2. Data Collection & Feature Engineering

All features are fetched live from the GitHub REST API v3 using an authenticated token. A single API call to GET /repos/{owner}/{repo} retrieves all three signals per repository, making the collector lightweight and fast — 98 repos complete in under 2 minutes with a built-in 0.5s per-request rate-limit buffer.

| Feature | Source Field | Transform | Weight | Rationale |
|---|---|---|---|---|
| star_count | stargazers_count | log(x+1) | 0.50 | Primary adoption signal |
| fork_count | forks_count | log(x+1) | 0.30 | Developer reuse / derivative work |
| watcher_count | subscribers_count | log(x+1) | 0.20 | Passive ecosystem engagement |

Note: GitHub’s subscribers_count field is used for watchers (not watchers_count, which mirrors stargazers in the v3 API). All three signals are log-transformed before scoring to mirror human perception of scale differences (Weber-Fechner law) and prevent high-star outliers from dominating the distribution.

3. Mathematical Model

3.1 Raw Score

For each repository r, the base score is a weighted sum of log-transformed signals:

RawScore(r) = 0.50 · ln(stars + 1)  +  0.30 · ln(forks + 1)  +  0.20 · ln(watchers + 1)

3.2 Ecosystem Tier Multiplier

A domain-expert multiplier M(r) is applied to reflect the architectural centrality of each repository within the Ethereum stack, independent of its raw GitHub activity. Repos not listed receive a neutral 1.0x multiplier.

| Repository | Tier | Multiplier |
|---|---|---|
| ethereum/go-ethereum | Core Execution Client | 2.5x |
| ethereum/solidity | Core Language | 2.5x |
| ethereum/EIPs | Protocol Standards | 2.0x |
| ethereum/consensus-specs | Consensus Layer | 2.0x |
| NomicFoundation/hardhat | Dev Tooling | 1.8x |
| foundry-rs/foundry | Dev Tooling | 1.8x |
| OpenZeppelin/openzeppelin-contracts | Contract Library | 1.7x |
| ethers-io/ethers.js | JS Interface Library | 1.6x |
| wevm/viem | TS Interface Library | 1.4x |
| paradigmxyz/reth | Rust Execution Client | 1.4x |
| sigp/lighthouse | Consensus Client | 1.3x |
| prysmaticlabs/prysm | Consensus Client | 1.3x |
| hyperledger/besu | Enterprise Client | 1.3x |
| ethereum/web3.py | Python Library | 1.3x |
| ethereum/py-evm | Python EVM | 1.3x |
| All other repos | General Ecosystem | 1.0x |

3.3 Impact Score

The tier multiplier is applied to the raw score to produce the final pre-normalization impact score:

ImpactScore(r) = RawScore(r) × M(r)

3.4 Temperature-Scaled Softmax Normalization

Raw impact scores are converted to a valid probability distribution via softmax with temperature T = 25:

w_i = exp(ImpactScore_i / T)  /  Σ_j exp(ImpactScore_j / T)

A lower T sharpens the distribution toward high-scoring repos; a higher T spreads weight more evenly. T = 25 balances concentration on known core repos while preserving meaningful long-tail weight for smaller dependencies.

This guarantees Σ w_i = 1.0 exactly. Softmax is preferred over simple linear normalization because it is less sensitive to outliers and produces smoother distributions that better align with how human jurors perceive relative importance.
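Sections 3.1 to 3.4 can be sketched together as follows. The function names impact_score and softmax_weights are assumptions, and the star, fork, and watcher counts in the example are made-up values, not fetched data.

```python
import numpy as np

def impact_score(stars, forks, watchers, multiplier=1.0):
    """RawScore (weighted log signals) times the tier multiplier M(r)."""
    raw = 0.50 * np.log1p(stars) + 0.30 * np.log1p(forks) + 0.20 * np.log1p(watchers)
    return raw * multiplier

def softmax_weights(scores, temperature=25.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Illustrative inputs only: (stars, forks, watchers, multiplier).
scores = [
    impact_score(50_000, 20_000, 2_000, 2.5),
    impact_score(8_000, 1_000, 300, 1.8),
    impact_score(500, 50, 30),
]
w = softmax_weights(scores)
print(w, w.sum())
print(softmax_weights(scores, temperature=5.0))  # lower T concentrates weight
```

Comparing the two printed vectors shows the effect of T directly: the ranking is unchanged, but the lower temperature gives the top-scoring repo a larger share.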

4. Implementation

The pipeline consists of two scripts that run in sequence:

github_metrics_collector.py: Reads repos_to_predict.csv, fetches star_count, fork_count, and watcher_count for each repo via a single GitHub API call, and writes results incrementally to predicted_repo_metrics.csv. Incremental writes ensure no data is lost if the script is interrupted mid-run. Automatic back-off handles GitHub rate-limiting using the X-RateLimit-Reset header.

compute_weights.py: Reads predicted_repo_metrics.csv, filters strictly to parent == "ethereum" repos, computes ImpactScore for each, applies softmax normalization, sorts by weight descending, and writes final_submission.csv in {repo, parent, weight} format. Prints top-10 results and total weight sum for immediate sanity-checking.
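A rough sketch of the collector's core pieces, using only the standard library. The helper names (extract_signals, seconds_until_reset, fetch_signals) are hypothetical, and the actual scripts may be organized differently. The network call is defined but not executed here.

```python
import json
import time
import urllib.request

API_ROOT = "https://api.github.com"

def extract_signals(payload: dict) -> dict:
    """Pick the three signals from a GET /repos/{owner}/{repo} response body."""
    return {
        "star_count": int(payload.get("stargazers_count", 0)),
        "fork_count": int(payload.get("forks_count", 0)),
        "watcher_count": int(payload.get("subscribers_count", 0)),  # not watchers_count
    }

def seconds_until_reset(reset_header, now=None) -> float:
    """Seconds to wait until the epoch time given by X-RateLimit-Reset."""
    now = time.time() if now is None else now
    return max(0.0, float(reset_header) - now)

def fetch_signals(owner: str, repo: str, token: str) -> dict:
    """One authenticated API call per repository (requires network access)."""
    req = urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return extract_signals(json.load(resp))

# Offline example with a made-up response body.
sample = {"stargazers_count": 10, "forks_count": 3,
          "subscribers_count": 2, "watchers_count": 10}
print(extract_signals(sample))
print(seconds_until_reset("100", now=40.0))
```

Keeping the parsing separate from the HTTP call makes the field mapping easy to check without spending API quota.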


5. Key Design Decisions

Logarithmic Scaling: Stars, forks, and watchers span several orders of magnitude across repos. Log-transforming collapses this range and mirrors how human jurors perceive differences: a repo going from 1K to 10K stars feels more significant than one going from 100K to 109K, which log(x+1) correctly captures.

Softmax over Linear Normalization: Linear normalization (w = score / sum) is sensitive to a single very high outlier, which can compress all other weights near zero. Softmax with temperature smooths this, directly reducing expected Huber loss on log-ratio evaluations.

Tier Multipliers: Raw GitHub metrics measure popularity, not architectural importance. go-ethereum and solidity are foundational to the entire stack but may not have proportionally more stars than a popular tooling library. The multiplier table encodes this domain knowledge explicitly.

Ethereum-Only Filter: The scorer explicitly filters to parent == "ethereum", ensuring no level-2+ dependency repos accidentally receive weight in the Level-1 submission.


6. Conclusion

This model produces a valid, human-aligned weight distribution over 98 Ethereum Level-1 dependencies using three well-chosen GitHub signals, logarithmic scaling, domain-aware tier multipliers, and softmax normalization. The pipeline is lightweight (one API call per repo), reproducible, and guarantees Σw = 1.0 by construction — fully satisfying the submission format requirement.

The temperature parameter T = 25 and the tier multiplier table are the primary tuning levers for future iterations. Both can be refined based on Huber loss feedback from earlier submission rounds or augmented with additional signals such as recent commit activity or contributor count if a more comprehensive data collection pass is warranted.


Deep Funding Contest Level II: Tier-Based Domain Classification Strategy

1. Author Information


2. Executive Summary

This document presents the methodology, experiments, and results for the Deep Funding Contest Level II machine learning competition hosted by Gitcoin and the Ethereum Foundation. The objective was to assign an originality score between 0 and 1 for 98 open-source repositories, reflecting how much of the project’s value is original work versus work inherited from its dependencies.

  • Best Score Achieved: 0.1521 (v21)

  • Total Iterations: 22 model versions

  • Final Method: Tier-Based Domain Classification

  • Key Innovation: Iterative bottom-up tier calibration

  • Leaderboard Position: Top 5 (as of the latest submission)


3. Methodology: Tier-Based Domain Classification

The model evolved through four distinct strategic phases, moving from manual heuristics to a sophisticated classification system. The breakthrough occurred in Phase 3 (v13+) with the implementation of a 5-tier domain classification system that maps specific repository categories to originality ranges.

The core model assigns each repository a tier score (from 28 to 100) and maps it linearly to an originality score using the following formula:

$$originality = 0.10 + (tier - 28) \times \frac{0.87}{72}$$
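The linear mapping can be checked with a few lines of code. The function name tier_to_originality is an assumption for illustration.

```python
def tier_to_originality(tier: float) -> float:
    """Map a tier score in [28, 100] linearly onto [0.10, 0.97]."""
    if not 28 <= tier <= 100:
        raise ValueError("tier score must be between 28 and 100")
    return 0.10 + (tier - 28) * 0.87 / 72

for tier in (28, 64, 100):
    print(tier, tier_to_originality(tier))
```

The endpoints give 0.10 at tier 28 and 0.97 at tier 100, and the midpoint tier of 64 maps to 0.535.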

The 5-Level Classification System:

  • Tier 1: Languages & ZK-VMs (Score 92-97): Original compilers and zero-knowledge research. (e.g., Solidity, Vyper, SP1, Powdr)

  • Tier 2: Core Specs & Primitives (Score 79-91): Fundamental protocol specifications and cryptographic primitives. (e.g., Consensus-Specs, Reth, blst)

  • Tier 3: Clients & Tooling (Score 67-83): Major execution/consensus clients and developer infrastructure. (e.g., Geth, Lighthouse, Foundry, Hardhat)

  • Tier 4: SDKs & Libraries (Score 50-66): Smart contract libraries and integration wrappers. (e.g., ethers.js, OpenZeppelin, web3.py)

  • Tier 5: Infra & Config (Score 28-49): Configuration registries, Docker setups, and data repositories. (e.g., chains, chainlist, eth-docker)


4. Key Findings from Experiments

  • ZK-VM Premium: Market prices for ZK projects (like SP1 and Plonky3) were initially low, but iterative experiments showed that jurors correctly identify the immense depth of original cryptographic work involved, requiring significant upward score adjustments.

  • Infrastructure Value: Infrastructure and configuration repos were not penalized by jurors as much as expected, suggesting that the coordination work represented by these repos carries intrinsic value.

  • Dual-Direction Calibration: The most significant improvements resulted from simultaneously raising bottom-tier repos that were too low compared to market sentiment and lowering extreme top-tier repos that exceeded juror expectations.


5. Score Progression

Across 22 iterations, the model showed a consistent reduction in the competition score (SAE):

  • v9 (Market Blend): 0.2191

  • v13 (First Tier-Based): 0.1921

  • v19 (Bottom Raise Continued): 0.1604

  • v21 (Dual Direction Calibration): 0.1521


6. Conclusion

The tier-based classification approach effectively captures the categorical nature of code originality in the Ethereum ecosystem. By refining the classification and calibrating against market gaps, this methodology achieved a top-tier score and provides a robust foundation for future dependency graph analysis.


Appendix: Submission Details

  • Competition URL: joinpond.ai/modelfactory/detail/17346979

  • Total Submissions: 22 versions

  • Best Version: v21 (Score 0.1521)


Ethereum Dependency Importance Model — v2

Level 1 — Relative Contribution of 98 Open Source Repos to Ethereum

Pond Model Factory Competition · GG24 DeepFunding · May 2026

Executive Summary

This model assigns relative importance weights to 98 open source GitHub repositories that form the dependency graph of the Ethereum protocol. The weights represent each project’s contribution to Ethereum’s overall success, and are designed to align with how a human expert jury would compare them in pairwise evaluations.

This is Model Version 2. The initial model (v1) was built using domain expertise and four scoring signals. It was then validated against the publicly available jury data from the prior 45-repo mini-contest trial run. The comparison revealed systematic errors — primarily undervaluing MEV infrastructure and developer tooling, and overvaluing experimental languages — which were corrected to produce this final submission.

The core insight of this model is that importance to Ethereum is not just about popularity (GitHub stars) but about the structural role a project plays — whether the protocol and its developer ecosystem would function without it.

Methodology

Scoring Formula

Each repository receives a composite score calculated as:

Score = log(1 + Stars) × Category_Multiplier × Org_Bonus × Criticality^1.5

All scores are then normalized so they sum to exactly 1.0, producing the final weight vector.

Signal 1: GitHub Stars

GitHub stars measure community recognition and adoption. Because stars follow a power-law distribution, we apply a logarithmic transformation (log1p) to achieve diminishing returns. A repo with 50,000 stars should not receive 10x the weight of one with 5,000 stars when their structural importance may be similar.

Signal 2: Category Importance Multiplier (Calibrated Against Jury Data)

The most significant innovation of this model is the category multiplier, which encodes structural domain knowledge about the Ethereum ecosystem. Categories and their multipliers were initially set by domain expertise, then calibrated by comparing v1 rankings against the trial jury data to identify systematic biases:

| Category | Multiplier | Rationale |
|---|---|---|
| Language (primary) | 3.0x | Solidity is the foundation: every smart contract depends on it |
| Execution Client | 2.5x | These ARE Ethereum: they execute transactions and maintain state |
| Consensus Client | 2.3x | Post-Merge validators running Proof-of-Stake |
| Standard (EIPs/Specs) | 2.2x | Define the protocol rules everything else follows |
| MEV Infrastructure | 2.0x | Critical to how Ethereum blocks get built and ordered |
| Top Dev Tools | 2.0x | Hardhat, Foundry, Remix: used by every Ethereum developer daily |
| Library | 1.8x | Core cryptographic and interaction primitives |
| Language (secondary) | 1.8x | Vyper, Fe: important but not foundational like Solidity |
| Dev Tool (general) | 1.6x | Tooling that enables developers to build on Ethereum |
| Top Tooling | 1.5x | Blockscout, L2Beat, Sourcify: critical ecosystem visibility tools |
| Infrastructure | 1.4x | Node infra, staking, deployment tools |
| ZK / Proving | 1.3x | Zero-knowledge proofs, growing importance for L2 scaling |
| Tooling / Analytics | 1.2x | Block explorers, monitoring: valuable but less critical |

Key insight from jury calibration: MEV infrastructure (Flashbots) needed its own category at 2.0x, because the jury considers it far more critical than generic 'infrastructure'. Similarly, top developer tools (Hardhat, Foundry, Remix) were boosted to 2.0x, since the jury data reflects their daily importance to every Ethereum developer.

Signal 3: Official Ethereum Organization Bonus

Repositories owned by the ethereum organization receive a 1.3x bonus. These are canonical reference implementations that define the protocol itself: go-ethereum, EIPs, consensus-specs, execution-apis. Other clients and tools are important, but the reference implementations carry authoritative weight.

Signal 4: Criticality Score

Each repository is manually assigned a criticality score from 1-10 reflecting: ‘How much would Ethereum’s operation be disrupted if this repository ceased to exist tomorrow?’ This score is exponentiated with a 1.5 power to amplify differences at the high end.

Examples: Solidity and go-ethereum score 10 (Ethereum stops functioning). EIPs, consensus-specs, and hardhat score 9 (the protocol becomes undefined or the developer ecosystem collapses). Lighthouse and ethers.js score 8. Niche or experimental tools score 4-5.
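Putting the four signals together, the scoring formula can be sketched as below. The function names composite_score and normalize are assumptions, and the star counts are made-up placeholders; the multipliers and criticality values are the ones quoted in the text.

```python
import numpy as np

ORG_BONUS = 1.3  # applied to repos owned by the ethereum organization

def composite_score(stars, category_multiplier, criticality, official_org=False):
    """log(1 + Stars) × Category_Multiplier × Org_Bonus × Criticality^1.5"""
    bonus = ORG_BONUS if official_org else 1.0
    return float(np.log1p(stars) * category_multiplier * bonus * criticality ** 1.5)

def normalize(scores: dict) -> dict:
    """Scale scores so that the resulting weights sum to 1.0."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

# Star counts below are illustrative placeholders, not real measurements.
scores = {
    "ethereum/go-ethereum": composite_score(48_000, 2.5, 10, official_org=True),
    "sigp/lighthouse": composite_score(3_000, 2.3, 8),
}
weights = normalize(scores)
print(weights)
```

Because criticality enters with a 1.5 power, the gap between a score of 10 and a score of 8 is roughly a factor of 1.4, which outweighs most differences in the category multiplier.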

Model Validation Against Trial Jury Data

Validation Methodology

The publicly available jury data from the prior 45-repo mini-contest was used to validate and calibrate the model. We compared our model’s implied rankings against the rankings implied by the trial jury’s pairwise comparisons. This acts like a practice test before the real exam — we cannot know the final jury’s votes, but alignment with the prior jury gives strong signal about model quality.

Improvement: v1 vs v2

| Metric | Model v1 | Model v2 (Final) | Improvement |
|---|---|---|---|
| Average rank error | 11.1 positions | 7.2 positions | 36% improvement |
| Within 5 ranks | 37 repos (38%) | 48 repos (49%) | +11 repos |
| Off by 16+ ranks | 25 repos (26%) | 7 repos (7%) | 72% reduction in big errors |
| Weight correlation | 0.785 | 0.853 | +0.068 |
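Rank-based metrics of this kind could be computed along the following lines. This is a sketch under assumptions: the post does not say how the trial jury's pairwise votes were turned into a ranking, so the example takes a reference weight per repo as given, and the names rank_positions and rank_error_summary are hypothetical.

```python
import numpy as np

def rank_positions(weights: dict) -> dict:
    """Rank 1 is the highest weight."""
    ordered = sorted(weights, key=weights.get, reverse=True)
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def rank_error_summary(model_weights: dict, reference_weights: dict) -> dict:
    """Compare rankings over the repos present in both inputs."""
    common = sorted(set(model_weights) & set(reference_weights))
    m = rank_positions({k: model_weights[k] for k in common})
    r = rank_positions({k: reference_weights[k] for k in common})
    errors = np.array([abs(m[k] - r[k]) for k in common])
    return {
        "mean_rank_error": float(errors.mean()),
        "within_5": int((errors <= 5).sum()),
        "off_by_16_plus": int((errors >= 16).sum()),
    }

# Toy inputs with three hypothetical repos.
model = {"a": 0.5, "b": 0.3, "c": 0.2}
reference = {"a": 0.2, "b": 0.3, "c": 0.5}
summary = rank_error_summary(model, reference)
print(summary)
```

In the toy case the first and last repos swap places, so two of the three rank errors are 2 and the mean is 4/3.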

Key Corrections Made

The following table shows the most significant corrections made after comparing v1 against trial jury data:

| Repository | Trial vs v1 Rank | Error Type | Correction Applied |
|---|---|---|---|
| Flashbots mev-boost | #16 trial → #49 ours | Undervalued | Moved to dedicated MEV category (2.0x), criticality 9 |
| Flashbots mev-boost-relay | #21 trial → #69 ours | Undervalued | MEV category (2.0x), criticality 8 |
| NomicFoundation/hardhat | #6 trial → #18 ours | Undervalued | Moved to Top Dev Tool (2.0x), criticality 9 |
| foundry-rs/foundry | #10 trial → #17 ours | Undervalued | Top Dev Tool (2.0x), criticality 9 |
| remix-project-org/remix-project | #15 trial → #33 ours | Undervalued | Top Dev Tool (2.0x), criticality 8 |
| blockscout/blockscout | #33 trial → #51 ours | Undervalued | Moved to Top Tooling (1.5x) |
| l2beat/l2beat | #36 trial → #63 ours | Undervalued | Moved to Top Tooling (1.5x) |
| argotorg/fe | #72 trial → #19 ours | Overvalued | Demoted to secondary language (1.8x), criticality 4 |
| vyperlang/vyper | #31 trial → #7 ours | Overvalued | Moved to secondary language (1.8x) |
| paradigmxyz/reth | #27 trial → #8 ours | Overvalued | Criticality reduced from 8 to 7 |

Final Rankings — Top 20 Repos

Rank | Repository | Category | Weight
1 | argotorg/solidity | Primary Language | ~0.057
2 | ethereum/go-ethereum | Execution Client | ~0.051
3 | ethereum/EIPs | Standard | ~0.034
4 | ethereum/consensus-specs | Standard | ~0.029
5 | ethereum/execution-apis | Standard | ~0.024
6 | OpenZeppelin/openzeppelin-contracts | Library | ~0.023
7 | NomicFoundation/hardhat | Top Dev Tool | ~0.022
8 | foundry-rs/foundry | Top Dev Tool | ~0.022
9 | flashbots/mev-boost | MEV Infrastructure | ~0.021
10 | OffchainLabs/prysm | Consensus Client | ~0.020
11 | sigp/lighthouse | Consensus Client | ~0.019
12 | remix-project-org/remix-project | Top Dev Tool | ~0.019
13 | erigontech/erigon | Execution Client | ~0.019
14 | flashbots/mev-boost-relay | MEV Infrastructure | ~0.018
15 | ethers-io/ethers.js | Library | ~0.017
16 | ethereum/web3.py | Library | ~0.017
17 | libp2p/libp2p | Library | ~0.016
18 | hyperledger/besu | Execution Client | ~0.016
19 | NethermindEth/nethermind | Execution Client | ~0.015
20 | wevm/viem | Library | ~0.014

Category Analysis

MEV Infrastructure — A Key Finding

The single biggest correction between v1 and v2 was the treatment of MEV (Maximal Extractable Value) infrastructure. Flashbots’ mev-boost and mev-boost-relay were ranked #16 and #21 respectively in the trial jury data, but our initial model placed them at #49 and #69.

This makes sense in hindsight: MEV-boost is used by over 90% of Ethereum validators. The relay infrastructure is how proposer-builder separation (PBS) works in practice. Without these tools, the Ethereum validator ecosystem would be fundamentally different. The jury correctly identifies this critical dependency.

Developer Tooling — More Important Than Expected

Hardhat (#6 in trial), Foundry (#10), and Remix (#15) all ranked higher in the trial than our initial model predicted. Developer tooling is not just a convenience: it is what makes Ethereum programmable in practice. Without Hardhat and Foundry, smart contract development would slow dramatically, since a large share of DeFi protocols, NFT contracts, and DAOs were built and tested with these tools.

Experimental Languages — Overvalued Initially

argotorg/fe, an experimental smart contract language, was our biggest error: we placed it at rank #19 while the trial jury placed it at #72 out of 98. This is because Fe is still experimental and has minimal real-world adoption. Similarly, Vyper, while important as a safety-focused alternative to Solidity, was overvalued. The jury correctly identifies that Solidity’s dominance means secondary languages carry less weight.

Limitations & Future Improvements

GitHub API rate limits prevented automated fetching of real-time data. Future versions should incorporate live data on stars, forks, and contributor counts via an authenticated API token.

The criticality scores are manually assigned and carry subjective bias. A more rigorous approach would derive these scores from the dependency graph structure itself — repos depended upon by many others should score higher automatically.

The model does not incorporate temporal signals such as commit frequency or recent activity. A historically important but now-unmaintained project should score lower.

The ZK/proving category is weighted conservatively. As L2s and ZK proofs become more central to Ethereum’s scaling roadmap, these weights should increase over time.

Validation was performed against the 45-repo trial data, which overlaps partially but not fully with the 98-repo GG24 set. Some calibration may not transfer perfectly.

Seven repos still have rank disagreements of 16+ positions with the trial data (e.g., TrueBlocks/trueblocks-core, supranational/blst). These may reflect genuine differences between the trial and GG24 jury panels, or areas where our model still needs refinement.

Conclusion

This model combines quantitative signals (GitHub stars), structural domain knowledge (category multipliers), official status bonuses, and criticality ratings to produce weights that align with how a knowledgeable Ethereum community jury would evaluate dependency importance.

The key methodological contribution is the two-stage process: build an initial model from first principles, then validate and calibrate against real jury data. This produced a 36% improvement in average rank accuracy (from 11.1 to 7.2 positions of error) and reduced major mistakes by 72% (from 25 to 7 repos off by 16+ ranks).

The Huber loss scoring function rewards models that get relative ordering right — especially for large importance gaps. Our validation process directly optimized for this by identifying and correcting the largest systematic errors in our initial rankings.

GG24 Deep Funding — Level I Model Writeup

Human-Centered Structural Importance Modeling for Ethereum Infrastructure

Competition: Gitcoin Grants Round 24 — Deep Funding
Track: Level I — Relative Importance of 98 Repositories to Ethereum
Author: Rohith
Target Parent: ethereum


1. Introduction

Ethereum is not a single software project. It is a living ecosystem composed of execution clients, consensus clients, smart contract languages, developer tooling, cryptographic libraries, MEV infrastructure, standards, proving systems, monitoring tools, and ecosystem coordination layers.

The purpose of this competition is to estimate how important each repository is to Ethereum as a whole.

This problem is fundamentally difficult because “importance” is not directly measurable. The jury does not evaluate repositories in isolation. Instead, jurors compare repositories against one another:

  • “Is Solidity more important than Hardhat?”

  • “How much more important is go-ethereum than Blockscout?”

  • “Does mev-boost matter more than ethers.js?”

The evaluation mechanism converts these human comparisons into logarithmic pairwise ratios and scores submissions against them with a Huber loss.

That means the competition is not rewarding simple popularity.

It rewards models that approximate how knowledgeable Ethereum ecosystem participants think about structural dependency and ecosystem criticality.

This model was designed specifically around that insight.


2. Core Philosophy of the Model

The central idea behind this submission is:

Ethereum importance is structural, not cosmetic.

A repository may have:

  • many GitHub stars,

  • high social attention,

  • strong branding,

while still being less important than a low-visibility infrastructure component that Ethereum fundamentally depends on.

For example:

  • flashbots/mev-boost is operationally critical to block production,

  • libp2p/libp2p underpins peer-to-peer networking,

  • blst secures cryptographic operations,

  • consensus-specs defines validator behavior,

  • solidity powers nearly all smart contracts.

These projects matter because Ethereum would materially degrade without them.

The model therefore focuses on:

  1. Architectural centrality

  2. Ecosystem dependence

  3. Operational necessity

  4. Real-world usage

  5. Developer reliance

  6. Protocol governance influence

  7. Long-term infrastructure importance

instead of relying purely on GitHub popularity metrics.


3. Understanding the Evaluation Function

The official evaluation uses:

  • pairwise comparisons,

  • logarithmic ratios,

  • Huber loss.

This has several important implications.

3.1 Relative Ordering Matters More Than Exact Numbers

The jury does not directly care whether:


repo A = 0.021
repo B = 0.018

Instead, they care about:


“How much more important is A than B?”

The model therefore prioritizes:

  • correct ranking,

  • realistic spacing,

  • ecosystem-aware separation between tiers.
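
A minimal sketch of a pairwise, log-ratio, Huber-style score makes this point concrete. The official scoring formula is not reproduced in this writeup, so the value of `delta`, the averaging, and the encoding of a juror answer as a single ratio are all assumptions, and the function names are hypothetical.

```python
import math


def huber(residual: float, delta: float = 1.0) -> float:
    """Quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def pairwise_loss(weights, comparisons, delta=1.0):
    """Mean Huber loss between predicted and juror log-ratios.

    comparisons: iterable of (repo_a, repo_b, ratio), where ratio is the
    juror's answer to "how many times more important is a than b?".
    """
    losses = []
    for a, b, ratio in comparisons:
        predicted = math.log(weights[a]) - math.log(weights[b])
        observed = math.log(ratio)
        losses.append(huber(predicted - observed, delta))
    return sum(losses) / len(losses)


weights = {"A": 0.021, "B": 0.018}
# Hypothetical juror answer: A is twice as important as B.
print(pairwise_loss(weights, [("A", "B", 2.0)]))
```

Multiplying both weights by the same constant leaves the loss unchanged, since only the difference of logs enters the score. That is why spacing between repositories matters and absolute values do not.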


3.2 Flat Distributions Perform Poorly

Uniform weighting fails because:

  • Ethereum is not flat,

  • importance is highly concentrated,

  • some repos are foundational while others are auxiliary.

For example:

  • Solidity,

  • go-ethereum,

  • consensus-specs,

  • execution-apis,

must naturally dominate niche tooling.

The model intentionally avoids:

  • over-smoothing,

  • compressed distributions,

  • artificial equality.


3.3 Extreme Concentration Also Fails

However, over-concentration also creates problems.

Giving:


Solidity = 40%

implicitly says:


Solidity is more important than almost the entire ecosystem combined.

Human jurors usually do not think in such absolute terms.

The final distribution therefore aims for:

  • confident hierarchy,

  • but realistic proportionality.
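
The arithmetic behind the 40% example can be sketched as follows. It assumes, purely for illustration, that the remaining weight is spread evenly over the other 97 repositories; the real distribution is not uniform, and the 5% figure is a hypothetical comparison point.

```python
N_REPOS = 98  # repositories in the Level I task


def implied_ratio(top_share: float, n_repos: int = N_REPOS) -> float:
    """Ratio of the top repo's weight to the mean weight of all other repos."""
    mean_other = (1.0 - top_share) / (n_repos - 1)
    return top_share / mean_other


print(round(implied_ratio(0.40), 1))  # 64.7
print(round(implied_ratio(0.05), 1))  # 5.1
```

A 40% share says the top repository is about 65 times as important as an average one, a ratio few jurors would state in a pairwise comparison. A 5% share implies about 5 times, which is easier to defend.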


4. Multi-Layer Repository Scoring System

Each repository was evaluated manually using a structured multi-factor framework.

Instead of blindly applying formulas, the model attempts to simulate how experienced Ethereum developers, researchers, client teams, and infrastructure operators reason about importance.

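The scoring framework consists of seven dimensions; the three described in detail below are protocol criticality, ecosystem dependence, and validator and node infrastructure importance.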


5. Repository Evaluation Dimensions

5.1 Protocol Criticality

Question:

Would Ethereum fundamentally stop functioning without this repository?

Examples:

  • go-ethereum

  • consensus-specs

  • solidity

  • execution-apis

received the highest criticality.

These define:

  • execution rules,

  • validator behavior,

  • smart contract language standards,

  • protocol interfaces.


5.2 Ecosystem Dependence

Question:

How many other projects indirectly rely on this repository?

Examples:

  • openzeppelin-contracts

  • ethers.js

  • foundry

  • hardhat

have massive downstream dependence.

Even if they are not protocol-layer software, the ecosystem is deeply built around them.


5.3 Validator & Node Infrastructure Importance

Ethereum runs because validators and nodes operate continuously.

Repositories tied to:

  • consensus,

  • execution,

  • validator coordination,

  • networking,

received strong weighting.

Examples:

  • lighthouse

  • prysm

  • teku

  • nethermind

  • besu

  • libp2p


6. MEV Infrastructure Reassessment

One of the most important insights during model refinement was understanding the importance of MEV infrastructure.

Initial versions underestimated:

  • mev-boost

  • mev-boost-relay

This turned out to be incorrect.

Modern Ethereum block production heavily depends on proposer-builder separation infrastructure.

Today:

  • most validators use MEV-Boost,

  • block construction is deeply integrated with relay infrastructure,

  • validator economics are materially shaped by MEV.

This caused a major upward revision of Flashbots-related repositories.


7. Developer Tooling Importance

Another major insight was that developer tooling is not “optional.”

Without:

  • Hardhat,

  • Foundry,

  • Remix,

  • ethers.js,

  • viem,

Ethereum development velocity would collapse.

These tools:

  • power deployments,

  • testing,

  • scripting,

  • simulations,

  • debugging,

  • wallet interactions,

  • protocol integrations.

The jury appears to strongly value:

  • practical ecosystem usage,

  • not just protocol purity.

This led to significant upgrades for:

  • foundry-rs/foundry

  • NomicFoundation/hardhat

  • ethers-io/ethers.js

  • wevm/viem


8. Why Some Repositories Were Downgraded

Not every technically interesting repository is ecosystem-critical.

Several projects were intentionally weighted lower because they are:

  • experimental,

  • niche,

  • low adoption,

  • ecosystem-adjacent rather than foundational.

Examples:

  • argotorg/fe

  • swiss-knife

  • dependency-graph

  • hardhat-deploy

This does not mean they lack value.

It means:

Ethereum as a whole could continue functioning without them.

That distinction is extremely important for this competition.


9. Category Hierarchy

Repositories were mentally grouped into layered importance tiers.

Tier 1 — Foundational Protocol Layer

Examples:

  • Solidity

  • go-ethereum

  • EIPs

  • consensus-specs

  • execution-apis

These define Ethereum itself.


Tier 2 — Core Client Infrastructure

Examples:

  • Lighthouse

  • Prysm

  • Teku

  • Besu

  • Nethermind

  • Erigon

  • Reth

These operate the chain.


Tier 3 — Ecosystem Development Layer

Examples:

  • Foundry

  • Hardhat

  • ethers.js

  • viem

  • OpenZeppelin

These make Ethereum usable for developers.


Tier 4 — Operational & Infrastructure Layer

Examples:

  • mev-boost

  • mev-boost-relay

  • libp2p

  • Sourcify

  • Blockscout

These improve scalability, coordination, and observability.


Tier 5 — Specialized / Experimental / Auxiliary

Examples:

  • Fe

  • swiss-knife

  • act

  • niche zk tooling

These contribute value but are not structurally central.


10. Weight Distribution Strategy

The final weights were designed to satisfy four objectives simultaneously:

Objective 1 — Strong Hierarchy

The distribution must reflect obvious importance differences.


Objective 2 — Human Realism

The output should resemble how actual Ethereum participants think.


Objective 3 — Avoid Over-Concentration

No single repo should unrealistically dominate the ecosystem.


Objective 4 — Long Tail Preservation

Smaller repos still receive meaningful non-zero contribution.
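
Three of these objectives can be checked mechanically once a weight vector exists (human realism cannot). The sketch below is an illustration only: the function name, the `max_share` and `min_weight` thresholds, and the toy tier scores are all hypothetical and are not taken from the submission.

```python
def check_distribution(weights, max_share=0.10, min_weight=0.001):
    """Return simple diagnostics for a weight vector.

    max_share and min_weight are hypothetical thresholds, not contest rules.
    """
    values = sorted(weights.values(), reverse=True)
    return {
        "sums_to_one": abs(sum(values) - 1.0) < 1e-9,
        "no_over_concentration": values[0] <= max_share,
        "long_tail_preserved": values[-1] >= min_weight,
        "top_to_bottom_ratio": values[0] / values[-1],
    }


# Toy example: one made-up raw score per tier, normalized to sum to 1.
raw = {"tier1-repo": 10, "tier2-repo": 6, "tier3-repo": 5, "tier4-repo": 4, "tier5-repo": 1}
total = sum(raw.values())
weights = {name: score / total for name, score in raw.items()}

# With only five repos the top share is large, so the cap is loosened here.
report = check_distribution(weights, max_share=0.5)
print(report)
```

The top-to-bottom ratio is a rough proxy for the "strong hierarchy" objective: a value near 1 signals an overly flat distribution.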


11. Why Human Judgment Matters

Pure GitHub metrics are insufficient.

Examples of problems:

  • stars can be inflated,

  • older repos accumulate visibility advantages,

  • some critical infra remains invisible,

  • many infrastructure repos are backend-only.

For example:

  • blst

  • libp2p

  • consensus-specs

may appear less popular publicly,

but are absolutely foundational.

The model therefore combines:

  • GitHub visibility,

  • architectural reasoning,

  • dependency centrality,

  • ecosystem knowledge,

  • validator usage,

  • developer reliance.


12. Refinement Process

The model underwent several refinement stages.

Early Versions

Problems:

  • overly flat,

  • underweighted MEV infra,

  • overvalued experimental repos,

  • insufficient separation between core and peripheral tooling.


Intermediate Versions

Improvements:

  • stronger protocol emphasis,

  • client importance corrections,

  • better dev tooling recognition.


Final Version

The final model balances:

  • ecosystem realism,

  • structural dependency,

  • human intuition,

  • operational centrality.


13. Key Insights Learned During Modeling

Insight 1

Ethereum is much more tooling-dependent than initially expected.


Insight 2

MEV infrastructure has become core infrastructure.


Insight 3

Protocol specifications matter almost as much as implementations.


Insight 4

Developer adoption matters more than theoretical elegance.


Insight 5

Human jurors reward realistic ecosystem understanding more than mathematical purity.


14. Limitations

This model still has limitations.

14.1 Subjectivity

Some repository scoring inevitably involves human judgment.


14.2 Dynamic Ecosystem Evolution

Ethereum changes rapidly:

  • new zk systems,

  • new clients,

  • account abstraction,

  • rollup infrastructure,

  • proving systems.

Importance can shift over time.


14.3 Limited Public Jury Data

Only partial historical jury information was available for calibration.


15. Future Improvements

Future iterations could incorporate:

  • dependency graph centrality,

  • crates.io download statistics,

  • npm download counts,

  • validator client market share,

  • contributor activity,

  • commit recency,

  • L2 ecosystem integrations,

  • GitHub dependency network analysis,

  • semantic repo classification using LLMs.

A future version could combine:

  • graph theory,

  • probabilistic ranking,

  • human preference modeling,

  • ecosystem telemetry.


16. Final Conclusion

This submission attempts to model Ethereum the way experienced ecosystem participants perceive it:

not as a popularity contest,

but as a layered infrastructure system with unequal structural dependencies.

The final weights were built through:

  • architectural analysis,

  • ecosystem reasoning,

  • iterative refinement,

  • protocol understanding,

  • developer tooling evaluation,

  • validator infrastructure assessment,

  • MEV infrastructure correction,

  • human-centered ranking logic.

The final distribution aims to:

  • reflect realistic ecosystem importance,

  • align with jury intuition,

  • preserve meaningful hierarchy,

  • and satisfy the pairwise comparison framework used by Deep Funding GG24.

Ethereum is not built by one repository.

It is an interconnected civilization of infrastructure.

This model attempts to measure that structure as faithfully as possible.

Ethereum Ecosystem Originality Estimation Model

DeepFunding GG24 – Level II Submission


Executive Summary

This model estimates the originality of 98 repositories within the Ethereum ecosystem by assigning each project a score between 0 and 1 representing the proportion of value generated internally versus inherited from dependencies.

The core objective is to approximate how technically informed Ethereum contributors evaluate originality in practice. Rather than treating originality as a simple function of dependency count or repository popularity, the model attempts to capture a deeper concept:

How much independent architectural, computational, and protocol-level work is actually performed by the repository itself?

The final distribution intentionally favors:

  • protocol-defining systems

  • execution engines

  • cryptographic primitives

  • independently implemented infrastructure

while penalizing:

  • orchestration layers

  • deployment wrappers

  • aggregation repositories

  • configuration-heavy systems

The resulting scores are designed to align with human expert judgement rather than purely statistical software metrics.


1. Problem Definition

Ethereum’s open-source ecosystem contains highly heterogeneous repositories:

  • consensus implementations

  • execution clients

  • cryptographic libraries

  • developer tooling

  • deployment systems

  • SDK abstractions

  • infrastructure orchestration layers

A major challenge in originality estimation is that operational importance does not necessarily imply architectural originality.

For example:

  • a deployment framework may be operationally useful while relying heavily on existing components

  • a cryptographic primitive may appear small in size while containing highly original mathematical implementation work

The model therefore separates ecosystem utility from originality and focuses specifically on estimating the proportion of internally generated contribution.


2. Core Hypothesis

The central modeling hypothesis is:

Originality within Ethereum is fundamentally determined by architectural responsibility rather than dependency volume.

Repositories receive higher originality scores when they:

  • define protocol rules

  • implement execution semantics

  • introduce novel computation systems

  • implement cryptographic primitives

  • contain substantial independent logic

Repositories receive lower originality scores when they primarily:

  • coordinate existing systems

  • wrap external tooling

  • aggregate dependencies

  • provide deployment orchestration

  • expose interfaces over existing implementations

This framework intentionally prioritizes conceptual ownership over repository scale or popularity.


3. Model Architecture

The originality estimator is built as a layered scoring system composed of three independent components:

  1. Structural Role Prior

  2. Dependency Sensitivity Adjustment

  3. Development Signal Calibration

Each layer captures a distinct dimension of originality.


4. Layer 1 — Structural Role Prior

The primary signal in the model is functional repository classification.

Each repository is assigned to a structural category representing its architectural role inside Ethereum infrastructure.

This produces a baseline originality prior before refinements are applied.


Protocol and Specification Layer

Examples:

  • ethereum/eips

  • ethereum/consensus-specs

  • ethereum/execution-apis

These repositories define canonical protocol behavior and therefore occupy the highest originality tier.

Expected range:

0.86 – 0.92

Reasoning:

  • defines standards directly

  • creates ecosystem-wide rules

  • protocol cannot exist without them


Compiler / Execution Layer

Examples:

  • solidity

  • vyper

  • evmone

  • miden-vm

  • sp1

  • powdr

These repositories define or execute computation systems and therefore contain substantial independent engineering complexity.

Expected range:

0.82 – 0.90

Reasoning:

  • independent execution logic

  • virtual machine implementation

  • compiler semantics

  • heavy algorithmic contribution


Cryptographic Infrastructure

Examples:

  • blst

  • gnark-crypto

  • py_ecc

  • noble-curves

  • lambdaworks

These repositories implement foundational cryptographic systems and low-level mathematical primitives.

Expected range:

0.80 – 0.88

Reasoning:

  • advanced mathematical implementation

  • protocol-critical primitives

  • minimal orchestration behavior


Full Clients

Examples:

  • geth

  • reth

  • lighthouse

  • besu

  • prysm

  • nethermind

  • erigon

These repositories integrate multiple components while still implementing substantial protocol logic internally.

Expected range:

0.72 – 0.82

Reasoning:

  • high implementation complexity

  • protocol execution responsibility

  • integration-heavy but still architecturally significant


Developer Tooling

Examples:

  • foundry

  • hardhat

  • remix

  • blockscout

  • l2beat

These repositories enable ecosystem development and usability but often build on existing protocol infrastructure.

Expected range:

0.60 – 0.75

Reasoning:

  • substantial engineering effort

  • abstraction over protocol primitives

  • partial dependence on lower layers


Libraries and SDKs

Examples:

  • ethers.js

  • viem

  • web3.py

  • alloy

These repositories expose interfaces and abstractions over protocol systems.

Expected range:

0.55 – 0.70

Reasoning:

  • developer abstraction layer

  • moderate implementation complexity

  • lower architectural ownership


Wrappers and Adapters

Examples:

  • mev-boost

  • hardhat-deploy

  • op-succinct

  • DefiLlama adapters

Expected range:

0.40 – 0.60

Reasoning:

  • primarily coordination logic

  • relies heavily on external systems

  • lower independent computational contribution


Infrastructure and Deployment Systems

Examples:

  • scaffold-eth

  • eth-docker

  • ethereum-helm-charts

  • simple-optimism-node

Expected range:

0.25 – 0.50

Reasoning:

  • orchestration-heavy

  • configuration-oriented

  • limited independent protocol logic


Registry and Data Repositories

Examples:

  • chainlist

  • ethereum-lists/chains

Expected range:

0.20 – 0.35

Reasoning:

  • minimal implementation complexity

  • primarily structured data maintenance
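
The category ranges above can be collected into a lookup table. In the sketch below the ranges are copied from this section, while the dictionary keys, the function name, and the choice of the range midpoint as the baseline prior are assumptions made for illustration.

```python
# (low, high) originality ranges copied from the category descriptions above.
PRIOR_RANGES = {
    "protocol_spec": (0.86, 0.92),
    "compiler_execution": (0.82, 0.90),
    "cryptography": (0.80, 0.88),
    "full_client": (0.72, 0.82),
    "developer_tooling": (0.60, 0.75),
    "library_sdk": (0.55, 0.70),
    "wrapper_adapter": (0.40, 0.60),
    "infra_deployment": (0.25, 0.50),
    "registry_data": (0.20, 0.35),
}


def structural_prior(category: str) -> float:
    """Midpoint of the category range (the midpoint choice is an assumption)."""
    low, high = PRIOR_RANGES[category]
    return (low + high) / 2


for name in PRIOR_RANGES:
    print(f"{name:20s} {structural_prior(name):.3f}")
```

Adjacent ranges overlap, so a strong repository in one category can end up above a weak one in the category above it once the later adjustments are applied.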


5. Layer 2 — Dependency Sensitivity Adjustment

Dependency count alone is an unreliable measure of originality.

Modern software systems are naturally modular and therefore expected to depend on external packages.

Instead of applying linear penalties, the model uses a non-linear adjustment curve:

Dependency Profile | Adjustment
Minimal dependencies | +0.03 to +0.05
Moderate dependencies | Neutral
Heavy dependency reliance | −0.05 to −0.10

This prevents:

  • over-penalizing modern modular architectures

while still penalizing:

  • dependency-heavy wrappers

  • orchestration systems

  • aggregation repositories
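
One way to express this adjustment as a function of dependency count is sketched below. The adjustment ranges come from the table above; the count thresholds (`minimal_max`, `heavy_min`, `heavy_cap`) and the linear interpolation inside each band are hypothetical, since the writeup does not say how the profiles are delimited.

```python
def dependency_adjustment(
    n_deps: int,
    minimal_max: int = 5,    # hypothetical upper bound for "minimal"
    heavy_min: int = 50,     # hypothetical lower bound for "heavy"
    heavy_cap: int = 200,    # hypothetical count at which the penalty saturates
) -> float:
    """Piecewise adjustment: bonus for few deps, neutral middle, capped penalty."""
    if n_deps <= minimal_max:
        # +0.05 with zero dependencies, tapering to +0.03 at minimal_max.
        return 0.05 - 0.02 * (n_deps / minimal_max)
    if n_deps < heavy_min:
        return 0.0
    # -0.05 at heavy_min, reaching -0.10 at heavy_cap and staying there.
    frac = min(1.0, (n_deps - heavy_min) / (heavy_cap - heavy_min))
    return -0.05 - 0.05 * frac


for n in (0, 5, 20, 50, 125, 1000):
    print(n, round(dependency_adjustment(n), 3))
```

The flat neutral band is what keeps ordinary modular projects from being penalized, and the cap keeps the penalty from growing without bound for very large dependency trees.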


6. Layer 3 — Development Signal Calibration

To approximate expert human reasoning more closely, additional implementation-level signals are incorporated.

These include:

  • contributor diversity

  • commit activity

  • implementation scale

  • language composition

  • infrastructure/configuration ratio

These are treated as calibration terms rather than primary signals.

The purpose is to distinguish genuine implementation complexity from operational complexity.


7. Score Composition

The final originality estimate is computed as:

Originality = Structural Prior + Dependency Adjustment + Development Calibration

The result is clipped within:

[0.15, 0.95]

to avoid unrealistic extremes and preserve distribution stability.
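
The composition and clipping step can be sketched as a single function. The additive form and the [0.15, 0.95] bounds are taken from the text; the function name and the example input values are hypothetical, since the size of the calibration term is not stated.

```python
def originality(prior: float, dep_adj: float, dev_cal: float,
                lo: float = 0.15, hi: float = 0.95) -> float:
    """Sum the three layers, then clip the result to [lo, hi]."""
    raw = prior + dep_adj + dev_cal
    return max(lo, min(hi, raw))


# Made-up inputs showing the three possible outcomes.
print(originality(0.77, 0.00, 0.01))     # inside the bounds, returned unchanged
print(originality(0.89, 0.05, 0.04))     # 0.98 before clipping, capped at 0.95
print(originality(0.275, -0.10, -0.05))  # 0.125 before clipping, floored at 0.15
```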


8. Distribution Design Philosophy

One of the most common failure modes in originality estimation is score compression.

Naive approaches tend to cluster most repositories around:

~0.65–0.75

which poorly reflects actual expert judgement.

This model intentionally produces:

  • high category separation

  • sharper penalties for orchestration systems

  • elevated protocol-layer originality

  • broader variance across repository classes

The resulting distribution better matches how technically informed evaluators differentiate protocol innovation from infrastructure integration.


9. Human Alignment Strategy

The model is explicitly designed to emulate how experienced Ethereum contributors reason about originality.

The primary evaluation question is:

Could this repository meaningfully exist without most of its dependencies?

Repositories whose value derives primarily from:

  • novel protocol logic

  • cryptographic implementation

  • execution semantics

  • independent architecture

receive high originality estimates.

Repositories whose value derives primarily from:

  • orchestration

  • deployment

  • aggregation

  • interface exposure

receive lower estimates.


10. Observed Behavioral Outcomes

The final scoring distribution exhibits several intended characteristics:

  • protocol repositories consistently occupy the highest originality tier

  • cryptographic primitives outperform orchestration systems

  • SDK abstractions remain below execution engines

  • deployment frameworks receive strong penalties

  • tooling systems stabilize in mid-tier ranges

  • infrastructure repositories avoid artificial inflation

This produces a distribution that is:

  • structurally coherent

  • technically interpretable

  • closer to expert human judgement


11. Improvements Over Baseline Approaches

Compared to naive dependency-based approaches, the model introduces several improvements:

Structural Awareness

The model understands architectural role rather than treating all repositories uniformly.

Human-Oriented Calibration

Scoring behavior is aligned with evaluator reasoning instead of purely statistical software metrics.

Reduced Dependency Inflation

Repositories are not rewarded simply for integrating many systems.

Higher Distribution Quality

Avoids artificial clustering and creates stronger differentiation between repository categories.


12. Conclusion

This submission proposes a structurally informed originality estimation framework specifically designed for Ethereum’s layered open-source architecture.

Rather than relying on simplistic dependency statistics, the model prioritizes:

  • architectural ownership

  • independent implementation complexity

  • protocol responsibility

  • conceptual innovation

The resulting originality distribution is intentionally designed to align more closely with technically informed human judgement while remaining internally consistent across heterogeneous repository classes.

By rewarding innovation over orchestration, the framework aligns with DeepFunding’s broader objective of funding meaningful long-term contributions to Ethereum infrastructure.


[Deep Funding Level III] Frequency-Weighted Dependency Importance Scoring (FWDIS)

Author: Achankun

Email: ichsanbit45@gmail.com

Pond Profile: Achankun

Best Leaderboard Score: 0.2402 (v383C)


1. Executive Summary

This writeup outlines the methodology for the Deep Funding Contest - Level III. The objective is to assign relative importance weights to 3,677 dependencies across 98 focal repositories. My solution, Frequency-Weighted Dependency Importance Scoring (FWDIS), introduces a global frequency signal to adjust local dependency weights. By identifying “foundational” dependencies used across multiple projects, the model achieves a high alignment with human jury evaluations.

2. Contest Objectives & Constraints

In Gitcoin Grants Round 24, we are tasked with predicting how much value a dependency contributes to its parent repository.

  • Goal: Predict weights for {dependency, repo, weight}.

  • Constraint: The sum of weights for all dependencies of a specific repository must equal 1.0.

  • Evaluation: Scored against a human jury’s subjective valuation (Mean Absolute Error).

3. Methodology: FWDIS Model

The model development followed a rigorous three-stage pipeline:

3.1. Anchor Selection (The Baseline)

Rather than using a uniform distribution, the model starts with a pre-calibrated baseline (v353, score: 0.2472). This anchor provides a high-quality initial distribution of weights based on basic structural signals in the Ethereum ecosystem.

3.2. Global Frequency Signal (Feature Engineering)

The core insight of the FWDIS model is that ecosystem-wide utility is a strong proxy for importance.

  • Hypothesis: A dependency that is essential enough to be used by 20 different repos is likely more “foundational” than a niche dependency used by only one.

  • Metric: I calculated a freq_score by counting the unique repositories that utilize each dependency, normalized by the total number of repositories in the contest (98).

3.3. Frequency-Weighted Boost

I applied a multiplicative amplification to the anchor weights based on the frequency signal. This allows universal tools (like web3.py or eth-account) to naturally float to the top of the importance ranking.

The core formula:

w_new = w_anchor * (1 + γ * freq_score)

Where γ (Gamma) is the boost coefficient. Through extensive grid search, 0.42 was identified as the optimal value for balancing global foundational importance with local repository specifics.

4. Technical Implementation

After applying the boost, a re-normalization step is required. Since the boost increases the raw weight values, I grouped the data by repo and divided each weight by the sum of weights for that repo, so that each repo’s weights again sum to 1.0.

Model Logic (Python):

Python

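# df is assumed to hold the anchor submission, one row per
# (dependency, repo) pair, with columns 'dependency', 'repo', 'weight'.

# 1. Compute global dependency frequency: the number of distinct repos
#    that use each dependency, as a fraction of all repos in the data
freq_count = df.groupby('dependency')['repo'].nunique()
total_repos = df['repo'].nunique()
df['freq_score'] = df['dependency'].map(freq_count) / total_repos

# 2. Apply the Gamma-tuned boost (Gamma = 0.42)
GAMMA = 0.42
df['weight'] = df['weight'] * (1 + GAMMA * df['freq_score'])

# 3. Ensure mathematical integrity (Normalization): rescale so that
#    each repo's weights sum to 1.0 again
df['weight'] = df['weight'] / df.groupby('repo')['weight'].transform('sum')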
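
To make the arithmetic visible without pandas, here is a pure-Python restatement of the same three steps on a toy input. The anchor weights, repo names, and the function name `fwdis` are made up for illustration. Only the formula and γ = 0.42 come from the writeup.

```python
from collections import defaultdict

GAMMA = 0.42  # boost coefficient reported above

# Made-up anchor weights: (repo, dependency) -> weight; each repo sums to 1.0.
anchor = {
    ("repo1", "web3.py"): 0.5, ("repo1", "niche-lib"): 0.5,
    ("repo2", "web3.py"): 0.4, ("repo2", "other-lib"): 0.6,
}


def fwdis(anchor, gamma=GAMMA):
    """Frequency-weighted boost followed by per-repo renormalization."""
    repos = {repo for repo, _ in anchor}
    users = defaultdict(set)
    for repo, dep in anchor:
        users[dep].add(repo)
    # Share of repos that use each dependency.
    freq = {dep: len(used_by) / len(repos) for dep, used_by in users.items()}
    boosted = {key: w * (1 + gamma * freq[key[1]]) for key, w in anchor.items()}
    totals = defaultdict(float)
    for (repo, _), w in boosted.items():
        totals[repo] += w
    return {key: w / totals[key[0]] for key, w in boosted.items()}


result = fwdis(anchor)
for (repo, dep), w in sorted(result.items()):
    print(repo, dep, round(w, 4))
```

In this toy input web3.py is used by both repos (frequency 1.0) and the other two libraries by one each (frequency 0.5), so web3.py gains share in both repos after renormalization even though every weight was boosted.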

5. Results and Validation

The final submission (v383C) resulted in a score of 0.2402, placing it within the top tier of the leaderboard.

Data Integrity Checks:

  • Weight Conservation: Verified that every repository’s dependency weights sum to 1.0 within a tolerance of 1e-9.

  • Non-negativity: All weights are non-negative.

  • Ecosystem Alignment: The model successfully identified key infrastructure projects and assigned them higher importance scores, reflecting the likely consensus of the human jury.
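
The first two checks are easy to automate. The sketch below is one possible form, not the script actually used for the submission: the function name, the row layout, and the example rows are hypothetical, and the 1e-9 tolerance mirrors the figure quoted above.

```python
from collections import defaultdict


def check_submission(rows, tol=1e-9):
    """Validate (dependency, repo, weight) rows; return the number of repos.

    Raises ValueError on a negative weight or a repo whose weights
    do not sum to 1.0 within tol.
    """
    totals = defaultdict(float)
    for dependency, repo, weight in rows:
        if weight < 0:
            raise ValueError(f"negative weight for {dependency} in {repo}")
        totals[repo] += weight
    bad = {repo: t for repo, t in totals.items() if abs(t - 1.0) > tol}
    if bad:
        raise ValueError(f"weights do not sum to 1.0 for: {sorted(bad)}")
    return len(totals)


# Made-up rows for illustration.
rows = [
    ("web3.py", "repo1", 0.7), ("niche-lib", "repo1", 0.3),
    ("web3.py", "repo2", 1.0),
]
print(check_submission(rows))  # 2
```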

6. Conclusion

The FWDIS model demonstrates that foundational importance is not just a local property but a global one. By leveraging cross-repository frequency, we can approximate the subjective “value” that human experts assign to critical open-source infrastructure. This approach provides a scalable, transparent, and mathematically sound framework for dependency valuation in the Gitcoin ecosystem.


