Questions about efficiency of filter list

General information, announcements and questions about the EasyList subscriptions.
Locked
Contact Form Bot
Contact Bot
Posts: 0
Joined: Fri Mar 12, 2021 1:18 am

Post by Contact Form Bot »

Name: Yuyang

Subject: Questions about efficiency of filter list
Hey EasyList authors,

I'm Yuyang (from China). I'm a student at PolyU, and I mainly focus on web performance and privacy. Recently I've been digging into ad blockers, and I found that most popular ad blockers use EasyList for filtering ads, so I'm curious about how EasyList works. However, after a lot of searching on Google and GitHub, I could find only a little information about it and no answers to my questions, so I want to reach out to you directly. Below are my questions:

1. How do you determine which rules should be included?
2. More and more rules are being added to the list; are any rules ever removed?
3. How do you evaluate the quality of rules? For example, do you apply EasyList to a sample of 10,000 websites at regular intervals and check the coverage and how many of the rules really work?

I'm really looking forward to your reply, many thanks!
fanboy
EasyList Author
Posts: 12231
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

1. How do you determine which rules should be included?
  • Whether the filter is actually needed or not
  • Knowledge of existing filters
  • History of previously removed filters
  • Can the suggested filter be improved or optimised?

2. More and more rules are being added to the list; are any rules ever removed?
See https://github.com/easylist/easylist/is ... -821931112. Since that graph was made, we're at around 360-370k.

3. How do you evaluate the quality of rules? For example, do you apply EasyList to a sample of 10,000 websites at regular intervals and check the coverage and how many of the rules really work?
Sites change, and rules will also change over time. Even so, some sites still have active rules that were committed 12 years ago. During the pruning/trimming of rules, rules can be checked against the site to see if they're still in use. It's a manual task.
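
The check described above (seeing whether a rule is still in use on a site) could be approximated in code. The following is only a minimal sketch, not how EasyList is actually maintained: it assumes you already have a list of request URLs captured while loading a site, it only understands simple `||domain^` network rules, and it ignores any path part of a rule. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def rule_domain(rule):
    """Return the domain of a simple '||domain...' network rule, else None."""
    if not rule.startswith("||"):
        return None  # cosmetic, exception, or other rule type: out of scope here
    body = rule[2:].split("$", 1)[0]  # drop options such as $third-party
    for i, ch in enumerate(body):
        if ch in "^/*":  # the domain ends at the first separator, path, or wildcard
            body = body[:i]
            break
    return body.lower() or None


def host_matches(host, domain):
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def rule_hits(rules, request_urls):
    """Count, per supported rule, how many observed request URLs its domain covers."""
    hosts = [(urlsplit(u).hostname or "").lower() for u in request_urls]
    hits = {}
    for rule in rules:
        domain = rule_domain(rule)
        if domain is None:
            continue
        hits[rule] = sum(host_matches(h, domain) for h in hosts)
    return hits
```

Rules that end up with zero hits across many visits would only be candidates for a manual look, since a site may serve the blocked resource only occasionally.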
Attachments
115134444-387fb780-a064-11eb-9945-5cea1fe47f95.png
Paul Parker
Site Member
Posts: 23
Joined: Thu May 09, 2019 5:15 pm

Post by Paul Parker »

A huge part of ad blocking performance is determined by how the ad blocking software applies the filters. For example, uBlock Origin (which grew out of "HTTP Switchboard") made its debut by dramatically reducing memory consumption in comparison to Adblock Plus. See https://github.com/gorhill/httpswitchbo ... onsumption

You might be interested in this 2020 ad blocker performance benchmark:

https://www.debugbear.com/blog/2020-chr ... d-blockers
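
To illustrate why the way filters are applied matters, here is a rough sketch of one common idea: index patterns by a token so that each URL is only compared against a few candidates instead of the whole list. This is an assumption-laden toy, not the implementation of uBlock Origin or Adblock Plus. Patterns are treated as plain substrings rather than real filter syntax, and `TokenIndex` is a hypothetical name.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


class TokenIndex:
    """Bucket substring patterns by their longest token to avoid a full scan."""

    def __init__(self, patterns):
        self.buckets = defaultdict(list)
        self.fallback = []  # patterns that cannot be bucketed safely
        for raw in patterns:
            p = raw.lower()
            if not p:
                continue
            toks = TOKEN.findall(p)
            # Only bucket patterns that start and end with a delimiter, so that
            # every token in the pattern is also a whole token in a matching URL.
            edges_ok = not TOKEN.match(p[0]) and not TOKEN.match(p[-1])
            if toks and edges_ok:
                self.buckets[max(toks, key=len)].append(p)
            else:
                self.fallback.append(p)

    def match(self, url):
        """Return the sorted patterns that occur as substrings of the URL."""
        url = url.lower()
        candidates = list(self.fallback)
        for tok in set(TOKEN.findall(url)):
            candidates.extend(self.buckets.get(tok, ()))
        return sorted(p for p in candidates if p in url)
```

With a list of several hundred thousand rules, most URLs would touch only a handful of buckets, which is the kind of saving that makes the engine matter as much as the list itself.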
Nick757
Forum Junkie
Posts: 157
Joined: Mon Feb 08, 2021 7:40 am

Post by Nick757 »

@fanboy annoyance encountered after scrolling down.
Bizdelnick
New Member
Posts: 8
Joined: Sun Oct 17, 2021 7:36 am

Post by Bizdelnick »

fanboy wrote: Tue Jun 01, 2021 2:41 am Sites change, and rules will also change over time. Even so, some sites still have active rules that were committed 12 years ago. During the pruning/trimming of rules, rules can be checked against the site to see if they're still in use. It's a manual task.
Has anyone tried to automate this process? I currently see many rules for undelegated domains and for hosts that are down or have no web server running. It would be quite easy to write a script that finds such rules and removes them. I can help with this work if you are interested.
Bizdelnick
New Member
Posts: 8
Joined: Sun Oct 17, 2021 7:36 am

Post by Bizdelnick »

fanboy wrote: Tue Jun 01, 2021 2:41 am See https://github.com/easylist/easylist/is ... -821931112. Since that graph was made, we're at around 360-370k.
Sorry, I missed that link. It's great if EasyList is cleaned up this way, but it doesn't seem to be enough. I thought about different approaches and decided that relying on whois alone is not the best idea. It is better to check DNS records. What looks reliable enough to me is querying NS records and considering a domain dead if the reply is NXDOMAIN.
There are also IP addresses listed that need to be checked. The only idea I have is to try connecting to web servers on them. This should be repeated several times in case of failure, to make sure a server is permanently down.
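
The two checks proposed above could be sketched roughly as follows. This is only an illustration under several assumptions: `query_ns` is a caller-supplied, hypothetical function that returns "NXDOMAIN", "OK", or "ERROR" for an NS lookup (the Python standard library has no NS query of its own), the ports and the failure threshold are arbitrary choices, and the failure counter is meant to be kept between runs spread over days.

```python
import ipaddress
import socket


def is_ip(host):
    """True if host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_is_dead(domain, query_ns):
    """Treat a domain as dead only on an explicit NXDOMAIN reply.

    Timeouts and server failures ("ERROR") are not taken as proof
    that the domain is gone.
    """
    return query_ns(domain) == "NXDOMAIN"


def web_server_reachable(ip, ports=(80, 443), timeout=5.0,
                         connect=socket.create_connection):
    """True if a TCP connection succeeds on any of the given ports."""
    for port in ports:
        try:
            conn = connect((ip, port), timeout=timeout)
        except OSError:  # refused, unreachable, or timed out
            continue
        conn.close()
        return True
    return False


def record_probe(failures, ip, reachable, threshold=5):
    """Update the consecutive-failure count; True once it reaches threshold."""
    failures[ip] = 0 if reachable else failures.get(ip, 0) + 1
    return failures[ip] >= threshold
```

One possible `query_ns` could wrap dnspython's `dns.resolver.resolve(domain, "NS")` and map its `dns.resolver.NXDOMAIN` exception to "NXDOMAIN". Anything flagged this way would still be a candidate for review rather than automatic removal.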
fanboy
EasyList Author
Posts: 12231
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

From https://github.com/easylist/easylist/tree/master/docs, an update on EasyList pruning from 2019-2021.
Attachments
Easylist-2019-2021-stats (1).png