Thanks for the meaningful and direct feedback.
Let’s say you are a sybil attacker. You make a bunch of accounts: Simona1, Simona2, Simona3… The closely related usernames are a clear signal, but we wouldn’t catch it without:
- Noticing the behavior
- Turning the insight into a data problem (a model)
- Creating a process for continually evaluating and updating the model
Here is an example of how we recently created another model to fight off exactly this attack: Fighting Simple Sybils: Levenshtein Distance - HackMD
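To make the idea concrete, here is a minimal sketch of the technique the HackMD post names: computing Levenshtein (edit) distance between usernames and flagging pairs that are only a character or two apart. The threshold and example names are illustrative, not the actual FDD parameters.

```python
from itertools import combinations

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def suspicious_pairs(usernames, threshold=2):
    """Return username pairs within `threshold` edits of each other."""
    return [(u, v) for u, v in combinations(usernames, 2)
            if levenshtein(u, v) <= threshold]

# The three Simonas pair up with each other; "alice" pairs with no one.
pairs = suspicious_pairs(["Simona1", "Simona2", "Simona3", "alice"])
```

In practice you would cluster the flagged pairs rather than eyeball them, but even this naive pairwise pass surfaces the Simona1/Simona2/Simona3 pattern immediately.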
Now we know how to detect it, but we need to either manually squelch those accounts or do it algorithmically. Algorithmic detection scales; manual review does not.
To make this “feature” a part of the algorithmic detection we must go a step further: we turn the detection of the behavior into a feature for the ML model. That model is the overall algorithm that combines many features to determine whether an account is a sybil.
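As a hedged sketch of what “turning the heuristic into a feature” means: the name-cluster flag becomes one input among many, and a model combines them into a score. The feature names, weights, and toy linear scorer below are all invented for illustration; the real model is trained, not hand-weighted.

```python
def extract_features(account: dict, flagged_names: set) -> dict:
    """Build a feature vector; `name_in_cluster` is the new heuristic."""
    return {
        "name_in_cluster": account["username"] in flagged_names,
        "account_age_days": account["age_days"],
        "donation_count": account["donations"],
    }

# Toy stand-in for a trained model: hand-picked weights, purely illustrative.
WEIGHTS = {"name_in_cluster": 0.6, "account_age_days": -0.01, "donation_count": -0.05}

def sybil_score(features: dict) -> float:
    """Higher score leans sybil; a trained classifier would replace this."""
    return sum(WEIGHTS[k] * float(v) for k, v in features.items())

acct = {"username": "Simona2", "age_days": 3, "donations": 1}
score = sybil_score(extract_features(acct, flagged_names={"Simona1", "Simona2"}))
```

The point is that the heuristic stops being a hard ban rule and becomes evidence the model weighs against everything else it knows about the account.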
But what happens when the legit user “Simona69” gets caught by the heuristic? This is why we have humans in the loop. https://medium.com/vsinghbisen/what-is-human-in-the-loop-machine-learning-why-how-used-in-ai-60c7b44eb2c0
The humans will notice that Simona69 is not like the 68 before it. That information then feeds back into training, teaching the algorithm that even a high-quality signal DOES have exceptions.
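Mechanically, the human-in-the-loop step can be as simple as reviewer labels overriding the heuristic, with the corrected labels becoming the next round’s training data. This is a minimal sketch; the function and label format are hypothetical, not FDD’s actual pipeline.

```python
def apply_reviews(heuristic_flags: dict, human_labels: dict) -> dict:
    """Human labels (username -> bool) always win over the heuristic's flags."""
    corrected = dict(heuristic_flags)
    corrected.update(human_labels)
    return corrected

flags = {"Simona1": True, "Simona2": True, "Simona69": True}
reviews = {"Simona69": False}  # a reviewer confirms Simona69 is a legit user
labels = apply_reviews(flags, reviews)  # Simona69 is now labeled not-sybil
```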
The more humans we have evaluating, the more opportunities for us to spot the algorithm being unfair to classes of users. The community model is finding new features and models to incorporate into what you might call a “meta-model” or ensemble, which is best capable of accurately identifying sybil and, more importantly, NOT SYBIL accounts.
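A crude way to picture the ensemble: each detector casts a vote, and the combined decision needs a majority, so one overzealous feature (like the name heuristic alone) can’t condemn an account by itself. Majority voting is just one assumed combiner here; the real ensemble could weight models very differently.

```python
def ensemble_is_sybil(votes: list) -> bool:
    """Combine per-model boolean verdicts by simple majority vote."""
    return sum(votes) > len(votes) / 2

# Name heuristic says sybil, two other models disagree -> account survives.
decision = ensemble_is_sybil([True, False, False])
```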
We have discussed moving the data science initiative to DAOops and maybe this is the time to do that. It helps FDD, but could be more beneficial to the DAO as a whole. The interesting part of connecting the group to FDD is that spotting sybil behavior is a lot like moving your Scrabble pieces around to see if you have a word. Sometimes we know exactly what data to request to build a feature, but spotting new behaviors is a mix of active analysis and serendipity.
We could lower the overall ask to $115k for the round-to-round improvements and ethical guardrails by transferring the data science budget to come from DAOops.