Redundancy checker

Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

Hi everyone,

Just because I still had a few minutes of spare time in my schedule and I hate to be bored, I recently wrote a redundant AdBlock Plus rule checker.
I know there already is another redundancy checker available, but that one has some limitations, such as not being able to check hiding rules and having no support for the $domain=* option. This checker does have that functionality. Although support for hiding rules is limited (try ##a[href^="http://ad."] and ##a[href*="ad."] for example), it's sufficient in most cases. And, something that is very important if you just have a few minutes of spare time, it is faster than the other one in most cases :-) !
For more advanced users, there also is an option to ignore the options of blocking rules (except for the ones that have to be specified in order to have any effect, like $document and $popup), so you can find the redundancy of ||foo.com^/bar. and /bar.$domain=foo.com. But be aware, this option also causes a lot of false positives!
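
To make the "ignore options" mode concrete, here is a minimal Python sketch. It is not the checker's actual code: the function names are made up for illustration, the set of options that must be kept is assumed from the sentence above, and the redundancy test is a plain substring check that only holds for literal patterns without wildcards or anchors.

```python
# Options that are assumed to be required for a rule to have any effect
# (taken from the post above; the real checker may keep more).
REQUIRED_OPTIONS = {"document", "popup"}


def strip_optional_options(rule: str) -> str:
    """Drop every option after '$' except the required ones."""
    pattern, sep, options = rule.partition("$")
    if not sep:
        return rule  # rule has no options at all
    kept = [
        opt for opt in options.split(",")
        if opt.split("=")[0].lstrip("~").lower() in REQUIRED_OPTIONS
    ]
    return pattern + ("$" + ",".join(kept) if kept else "")


def is_made_redundant(specific: str, general: str) -> bool:
    """Hypothetical check: 'general' is a literal substring of 'specific'."""
    return general in specific


stripped = strip_optional_options("/bar.$domain=foo.com")
print(stripped)                                        # /bar.
print(is_made_redundant("||foo.com^/bar.", stripped))  # True
```

Because the $domain restriction is thrown away before comparing, /bar. appears to cover ||foo.com^/bar. on every site, which is exactly why this mode produces false positives.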

You can check it out here: https://arestwo.org/famlam/redundantRuleChecker.html (mirror http://abp.surge.sh/redundantRuleChecker/)

Kind regards,
Famlam
anonmouse
Senior Member
Posts: 51
Joined: Fri Sep 30, 2011 2:14 am

Post by anonmouse »

awesome work !
Hubird
Adversity Author
Posts: 1768
Joined: Sun Sep 30, 2007 4:31 am
Location: Australia

Post by Hubird »

Seems to be an issue with your checker

eg:

-ad-banner. has been made redundant by -banner.$domain=coastvoip.com.au|nationalturk.com|totalscifionline.com|reviewlinux.com
The above statement is one of the many redundant rules reported for Adversity however it is not true.
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

Hi, I think you enabled the option "Do not check if the options for whitelisting and blocking rules match", which strips all things behind $ (document, elemhide, donottrack, popup). I can't reproduce it with the given filters if I do not enable that option.
Hubird
Adversity Author
Posts: 1768
Joined: Sun Sep 30, 2007 4:31 am
Location: Australia

Post by Hubird »

Yes I did enable that option while trying to get the redundancy checker to properly parse $domain switches.

For example, if I add the following 2 rules (without the box ticked)

||example.com/ads/
/ads/*$domain=example.com
One rule should be marked as redundant, but this is not the case.
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

But they aren't redundant:
||example.com/ads/ can also match on foo.com (if used as a third party resource). /ads/*$domain=example.com however, won't ever match on foo.com. Therefore they aren't redundant.
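
The difference can be sketched in Python by modelling a request as a (requested URL, page domain) pair. This is a simplified illustration, not the checker's or Adblock Plus's matching code; both functions are hypothetical and hard-code the two rules from this discussion.

```python
from urllib.parse import urlsplit


def host_is(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def rule_a(url: str, page_domain: str) -> bool:
    """||example.com/ads/ : looks at the host of the requested URL only."""
    parts = urlsplit(url)
    return host_is(parts.hostname or "", "example.com") and parts.path.startswith("/ads/")


def rule_b(url: str, page_domain: str) -> bool:
    """/ads/*$domain=example.com : any URL with /ads/, but only on example.com pages."""
    return "/ads/" in url and host_is(page_domain, "example.com")


# example.com resource embedded on foo.com: only rule A matches.
print(rule_a("http://example.com/ads/x.png", "foo.com"))  # True
print(rule_b("http://example.com/ads/x.png", "foo.com"))  # False
# Foreign resource embedded on example.com: only rule B matches.
print(rule_a("http://cdn.net/ads/x.png", "example.com"))  # False
print(rule_b("http://cdn.net/ads/x.png", "example.com"))  # True
```

Each rule matches a request the other one misses, so neither makes the other redundant.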
Hubird
Adversity Author
Posts: 1768
Joined: Sun Sep 30, 2007 4:31 am
Location: Australia

Post by Hubird »

Famlam wrote:But they aren't redundant:
||example.com/ads/ can also match on foo.com (if used as a third party resource). /ads/*$domain=example.com however, won't ever match on foo.com. Therefore they aren't redundant.
:oops: True... Sorry about the false alarm.

Thanks
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

Just to mention it, as this is a bit bigger update than any before since the initial 'official' release, the hiding rules check has been improved a lot today :)

The total changelog of today would be:
Hiding rules:
- sequence independent in the last part of the tree selector (###a.b is redundant of ##.b#a, but ##.b#a > c and ###a.b > c not (yet)).
- FOO.COM##* and foo.com##* are now redundant
- ###ads and ##[id="ads"] and ##[id] and and are now redundant (except when in :not(...) selector)
- ##.ads and ##[class="ads"] and ##[class] and and are now redundant (except when in :not(...) selector)
- attribute selectors: ##[x=A], ##[x='A'], ##[x="A"] and ##[x] are redundant (except when in :not(...) selector)
- attribute selectors: ##[x] and ##[x(* or ~ or | or ^ or $)="something"] are redundant (except when in :not(...) selector)
- ##div and ##DIV are now redundant (except when in :not(...) selector)
- the attributes in attribute selectors are now redundant (##[border="a"] and ##[BoRdEr="a"]) (except when in :not(...) selector)
- ##* makes, well, everything redundant
- speed improvements (however, sequence independent checking costs more time, so actually it'll be a little bit slower)
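
As an illustration of the sequence-independent comparison, here is a small Python sketch. It is an assumption about how such a check could work, not the checker's implementation: it only handles a single compound selector (no combinators such as spaces or >), and the helper name is invented.

```python
import re

# One token per simple selector: #id, .class, [attribute...], tag name or *.
TOKEN = re.compile(r"#[\w-]+|\.[\w-]+|\[[^\]]*\]|[A-Za-z][\w-]*|\*")


def canonical_compound(selector: str) -> tuple:
    """Return an order-independent form of a compound selector."""
    tokens = TOKEN.findall(selector)
    if "".join(tokens) != selector:
        raise ValueError("not a plain compound selector: " + selector)
    # Tag names are case-insensitive in HTML; ids and classes are left alone.
    normalized = [t.lower() if t[0].isalpha() else t for t in tokens]
    return tuple(sorted(normalized))


# ###a.b is the rule prefix ## followed by the selector #a.b
print(canonical_compound("#a.b") == canonical_compound(".b#a"))        # True
print(canonical_compound("DIV.ads") == canonical_compound("div.ads"))  # True
```

Sorting the parts is what costs the extra time mentioned in the last item: every selector has to be tokenized before two rules can be compared.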

Blocking rules:
- $domain=abc,image and $dOmAiN=aBc,ImAgE are redundant
- FIXED: a line with nothing but tabs did make every resource redundant (hopefully no-one followed that suggestion :D)
- FIXED: ||foo.com and ^foo.com were reported as redundant
- small speed improvements
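
The case-insensitive option comparison from the first item could look roughly like this in Python. It is a sketch under the assumption that option names and domain values can all be lowercased safely; the function name is hypothetical.

```python
def normalize_options(options: str) -> frozenset:
    """Lowercase the part after '$' and ignore the order of the options."""
    return frozenset(
        opt.strip().lower() for opt in options.split(",") if opt.strip()
    )


print(normalize_options("domain=abc,image") == normalize_options("dOmAiN=aBc,ImAgE"))  # True
print(normalize_options("image,domain=abc") == normalize_options("domain=abc,image"))  # True
```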
Crits
Liste FR Author
Posts: 682
Joined: Sun Dec 18, 2011 6:21 pm
Location: France

Post by Crits »

What would be cool to have:
If we only input example.com#@##foobar, the script would tell us that this filter is useless because ###foobar doesn't exist anymore.

It would be interesting for supplementary subscriptions so as to detect obsolete exception hiding rules, if the hiding rule in question has been deleted from EasyList (and it's likely to happen as the deleted hiding rules are the ones that were causing many false-positives).
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

True. I'll however not implement this by default, because the tool should also work when either
1. a supplemental list only wants to find its own issues
2. a non-supplemental list may have a couple of exceptions for other filter lists that are commonly used
3. a user has #@# rules in their custom filters and they only check those (although this won't be often, especially as ABP doesn't allow creating hiding exception rules via a wizard)
4. you only check a part of the list
and I don't want to risk that less experienced users remove filters that shouldn't be removed, only because my tool says so. (The tool should find redundancies based upon the syntax, not upon the absence of filters.)
I'm however thinking of implementing a new 'Tools' tab in the future, which could contain a few of such features. (Including a request from MonztA recently, to report rules with the same rule but a different domain, like abcdef#@##ads and ghijkl#@##ads)
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

Hi everyone,
My supposed-to-be-absence ended 1.5 days earlier than I initially expected, so I've spent a couple of hours implementing the suggestions.
@Crits and @MonztA (the latter via PM): your suggestions have been implemented.
This is the complete changelog, for those who are interested:
- Add several tools:
  • A similar rules finder tool
  • Tools to ignore domains and blocking options (this replaces the checkbox)
  • A tool to use a less strict matching algorithm
  • A tool to find rules which have the same rule, but a different domain (requested by MonztA)
  • A tool to find the rules that make whitelisting rules necessary, which also displays if no rules could be found for a whitelisting rule (requested by Crits)
Note: those tools try their best, but certainly aren't perfect. So don't rely on their results.

- Fixes
  • '//' isn't a regex
  • ' !x' (with whitespace in front of it) is a comment, not a blocking rule
  • '*!x' and similar will no longer trigger 'unnecessary preceeding wildcard found' warnings, since it would become a comment without *
  • '/\|\|x/' and similar regex rules no longer make ||x redundant
- Added/Modified
  • deprecate $donottrack, since it has been removed from ABP too
  • ||x will now also be matched if both /x and .x are present
  • A lot of internal code changes certainly worth mentioning, but you wouldn't notice the difference anyway, so I'm too lazy to explain them
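
For the whitelisting tool requested by Crits, the basic idea can be sketched in Python as follows. This is only an illustration with made-up function names: it compares selectors by exact text, whereas the real tool presumably uses the checker's own matching, and it only sees the rules that were pasted in.

```python
def split_hiding_rule(rule: str):
    """Return (kind, selector) for an element hiding rule, or None otherwise."""
    if rule.startswith("!"):
        return None  # comment line
    for kind in ("#@#", "##"):  # test the exception separator first
        if kind in rule:
            _domains, selector = rule.split(kind, 1)
            return kind, selector
    return None


def exceptions_without_target(rules):
    """List #@# rules whose selector has no ## rule in the same input."""
    hidden = set()
    exceptions = []
    for rule in rules:
        parsed = split_hiding_rule(rule)
        if parsed is None:
            continue  # blocking rule or comment
        kind, selector = parsed
        if kind == "##":
            hidden.add(selector)
        else:
            exceptions.append((rule, selector))
    return [rule for rule, selector in exceptions if selector not in hidden]


rules = ["example.com#@##foobar", "###ads", "example.org#@##ads", "||ads.example.com^"]
print(exceptions_without_target(rules))  # ['example.com#@##foobar']
```

If only a supplemental list is pasted, every exception meant for EasyList shows up as unused, which is why the results should not be trusted blindly.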
Crits
Liste FR Author
Posts: 682
Joined: Sun Dec 18, 2011 6:21 pm
Location: France

Post by Crits »

Awesome, thanks!
MonztA
EasyList Author
Posts: 8121
Joined: Thu Jul 26, 2007 4:19 pm
Location: Germany

Post by MonztA »

Thanks!
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

Hi all, I'm glad to announce that I got stuck in a train that didn't continue to its destination a couple of times in a time span of a few weeks and thus found some time to update the code of the redundancy checker such that it is about 30 percent faster now. Enjoy ;)
gymka
EasyList Lithuania Author
Posts: 60
Joined: Sun May 04, 2014 10:15 am

Post by gymka »

filter list:

site1.lt##.banner
site2.lt##.banner
actual result: none, no redundancies/optimizations found
expected result: write rule "##.banner" and you'll get same result as with "site1.lt##.banner site2.lt##.banner"

Or am I wrong, and is it faster to have a few lines for the same item?
"To born stupid is not shame, just to die stupid is shameful." E. M. Remarque
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

It's not a redundancy, therefore it's not listed immediately. However, there is a tool to do so. Check "Tools" > "Search for equal rules for which only the domain differs"
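
What such a search does can be sketched in a few lines of Python. This is an illustration of the idea only, with a hypothetical function name, and it ignores exception rules and anything that is not a domain-specific hiding rule.

```python
from collections import defaultdict


def same_selector_groups(rules):
    """Group domain-specific hiding rules that share the exact same selector."""
    domains_by_selector = defaultdict(set)
    for rule in rules:
        if rule.startswith("!") or "#@#" in rule:
            continue  # skip comments and exception rules
        domains, sep, selector = rule.partition("##")
        if sep and domains:
            domains_by_selector[selector].update(domains.split(","))
    return {
        selector: sorted(domains)
        for selector, domains in domains_by_selector.items()
        if len(domains) > 1
    }


print(same_selector_groups(["site1.lt##.banner", "site2.lt##.banner", "other.lt##.ad"]))
# {'.banner': ['site1.lt', 'site2.lt']}
```

The two rules could then be combined into site1.lt,site2.lt##.banner. That is not the same as a bare ##.banner, which would hide .banner on every site instead of only on these two.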
ZolPar2
Site Member
Posts: 23
Joined: Mon Mar 31, 2014 9:13 am

Post by ZolPar2 »

Today I found some unoptimized rules; please fix them:

@@||jjcast.com^$elemhide and footstream.tv,jjcast.com,leton.tv##div[id^="timer"] : They are redundant for at least domain 'jjcast.com'
@@||imagebam.com/image/$popup! Last modified: 06 Jul 2014 08:20 UTC : Unnecessary whitespace character(s) found
vier986
New Member
Posts: 1
Joined: Sun Aug 17, 2014 3:32 am

Post by vier986 »

it's so awesome, thanks! :banana: :banana:
monnawynter
Forum Junkie
Posts: 189
Joined: Mon Sep 29, 2008 9:48 am

Post by monnawynter »

So the checker on Adblock's site is inferior to this one? Out of curiosity I checked it and it gave some results like
'||webstats.sapo.pt^' has been made redundant by '.webstats.'
'||c.bigmir.net^' has been made redundant by '||bigmir.net^$third-party'
while this one showed nothing. Are these false positives?
Famlam
EasyList Author
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

monnawynter wrote:So the checker on Adblock's site is inferior to this one? Out of curiosity I checked it and it gave some results like
'||webstats.sapo.pt^' has been made redundant by '.webstats.'
'||c.bigmir.net^' has been made redundant by '||bigmir.net^$third-party'
while this one showed nothing. Are these false positives?
They are false positives indeed:
First one is not redundant because '.webstats.' will not match 'http://webstats.sapo.pt/' (and the ||webstats.sapo.pt^ one will).
Second one is not redundant because ||c.bigmir.net^ will also match on a first-party basis, while the other filter will only match on third-party domains.
I would advise using this redundancy checker over the one on adblockplus.org, because the one on adblockplus.org has quite a few issues.
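
Both cases can be checked with a small Python sketch. The helpers are hypothetical and heavily simplified: in particular, the third-party test just compares the last two labels of the host names, while a real implementation would need the Public Suffix List.

```python
from urllib.parse import urlsplit


def matches_plain(pattern: str, url: str) -> bool:
    """A rule without anchors or wildcards is a plain substring match."""
    return pattern in url


def matches_host_anchor(domain: str, url: str) -> bool:
    """||domain^ matches the domain itself and any of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def base_domain(host: str) -> str:
    """Crude assumption: the registrable domain is the last two labels."""
    return ".".join(host.split(".")[-2:])


def is_third_party(url: str, page_host: str) -> bool:
    return base_domain(urlsplit(url).hostname or "") != base_domain(page_host)


# Case 1: there is no dot in front of "webstats" in the URL.
print(matches_plain(".webstats.", "http://webstats.sapo.pt/"))              # False
print(matches_host_anchor("webstats.sapo.pt", "http://webstats.sapo.pt/"))  # True
# Case 2: c.bigmir.net loaded from a bigmir.net page is first-party,
# so the $third-party rule does not apply there.
print(is_third_party("http://c.bigmir.net/x.js", "www.bigmir.net"))         # False
```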
alexz
Guest

Post by alexz »

i just did a check for EasyList without rules for adult sites and the result is
Finished (after 3125 seconds)! 13 redundant rules found!
discovery.com##.banner-video has been made redundant by @@||discovery.com^$elemhide
mangabird.com#@#.ad468 has been made redundant by @@||mangabird.com^$elemhide
locatetv.com##.adBlock has been made redundant by ##.adBlock
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads .left-ad has been made redundant by ##.left-ad
kclu.org,notjustok.com##.widget_openxwpwidget has been made redundant by ##.widget_openxwpwidget
vodly.to,vodly.unblocked2.co##a[href^="http://ads.integral-marketing.com/"] has been made redundant by ##a[href^="http://ads.integral-marketing.com/"]
search.yahoo.com###doc #cols #right #east has been made redundant by search.disconnect.me,search.yahoo.com###east
search.yahoo.com###ysch #doc #bd #results #cols #right #east .ads has been made redundant by search.disconnect.me,search.yahoo.com###east
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads .more-sponsors has been made redundant by yahoo.com##.more-sponsors
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads .spns has been made redundant by yahoo.com##.spns
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads has been made redundant by 1sale.com,7billionworld.com,abajournal.com,altavista.com,androidfilehost.com,arcadeprehacks.com,asbarez.com,birdforum.net,coinad.com,cuzoogle.com,cyclingweekly.co.uk,disconnect.me,domainnamenews.com,eco-business.com,energylivenews.com,facemoods.com,fcall.in,flashx.tv,foxbusiness.com,foxnews.com,freetvall.com,friendster.com,fstoppers.com,ftadviser.com,furaffinity.net,gentoo.org,gmanetwork.com,govtrack.us,gramfeed.com,gyazo.com,hispanicbusiness.com,html5test.com,hurricanevanessa.com,i-dressup.com,iheart.com,ilovetypography.com,isearch.whitesmoke.com,itar-tass.com,itproportal.com,kingdomrush.net,laptopmag.com,laweekly.com,lfpress.com,livetvcafe.net,lovemyanime.net,malaysiakini.com,manga-download.org,maps.google.com,marinetraffic.com,mb.com.ph,meaningtattos.tk,mmajunkie.com,movies-online-free.net,mugshots.com,myfitnesspal.com,mypaper.sg,nbcnews.com,news.nom.co,nsfwyoutube.com,nugget.ca,panorama.am,pastie.org,phpbb.com,playboy.com,pocket-lint.com,pokernews.com,previously.tv,radiobroadcaster.org,reason.com,ryanseacrest.com,savevideo.me,sddt.com,searchfunmoods.com,sgcarmart.com,shopbot.ca,sourceforge.net,tcm.com,tech2.com,thecambodiaherald.com,thedailyobserver.ca,thejakartapost.com,thelakewoodscoop.com,themalaysianinsider.com,theobserver.ca,thepeterboroughexaminer.com,theyeshivaworld.com,tiberium-alliances.com,tjpnews.com,today.com,tubeserv.com,turner.com,twogag.com,ultimate-guitar.com,wallpaper.com,washingtonpost.com,wdet.org,wftlsports.com,womanandhome.com,wtvz.net,yahoo.com,youthedesigner.com,yuku.com##.ads
||wikifeet.com/mgid.html has been made redundant by /mgid.html
||pitchfork.com^*/ads.css has been made redundant by /ads.css
There's also 1 warning:
The following error, warning or optimalization was encountered while checking the rules:
@@||discovery.com^$elemhide and discovery.com,freemake.com###top-advertising : They are redundant for at least domain 'discovery.com'
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

For those of you who are interested: I released an update for the redundancy checker. A newly included tool allows one to check whether domains are live, dead or redirected. Unfortunately such checks do not depend on the speed of your computer, but rather on the response times of servers. Therefore it may take a long while to process all domains. Fortunately you can export intermediate results, and by pasting them as if they were a new filter list, you can resume the check at a later point.
At this very moment the tool only works in Chrome. Opera will follow once I receive confirmation that the server can handle the necessary operations. Other browsers are unlikely to follow soon. The cause of this is that web pages themselves are not allowed to perform the necessary checks, so a browser extension is needed to aid in this process (and I'm not familiar with the Firefox APIs).
Example output (for EasyList Dutch) can be viewed here: http://pastebin.com/v1LckS5L . One should take care when interpreting the results:
  1. (resources on) subdomains may exist, even though the main domain is dead. To be sure that no resources exist on a domain, one could Google for site:thedomainyouwantto.check
  2. in the case of blocking rules where the checked domain is the domain between the || and the ^ or / (e.g. xx.xx in ||xx.xx/url$domain=yy.yy) even dead or redirected resources may still consume space on the original website. The only way to check this is by visiting the URL for which the filter was originally added.
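
To show which domains are meant here, the following Python sketch collects the candidate domains from a blocking rule: the one between || and ^ or /, and the ones in $domain=. It performs no network requests and is not the tool's code; the regular expression and the function name are assumptions for illustration.

```python
import re

# Domain between "||" and the first "^" or "/" (or the end of the rule).
HOST_ANCHOR = re.compile(r"^(?:@@)?\|\|([a-z0-9.-]+)(?=[\^/]|$)", re.IGNORECASE)


def domains_in_rule(rule: str) -> set:
    """Collect the domains of a blocking rule that a liveness check would visit."""
    found = set()
    match = HOST_ANCHOR.match(rule)
    if match:
        found.add(match.group(1).lower())
    _pattern, _sep, options = rule.partition("$")
    for option in options.split(","):
        if option.lower().startswith("domain="):
            for domain in option[len("domain="):].split("|"):
                if domain:
                    found.add(domain.lstrip("~").lower())
    return found


print(sorted(domains_in_rule("||xx.xx/url$domain=yy.yy")))  # ['xx.xx', 'yy.yy']
```

For this rule, a dead yy.yy means the filter can no longer apply anywhere, while a dead xx.xx falls under the caveat in point 2.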
Have fun!
JordanElliott
Contributor
Posts: 1369
Joined: Wed Jan 16, 2013 4:53 pm

Post by JordanElliott »

pastebin.com/xTcNmupS
Lain_13
RU AdList Author
Posts: 1041
Joined: Fri Aug 20, 2010 11:20 am

Post by Lain_13 »

Hi, the following duplicates are still in the list:

search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads .left-ad has been made redundant by ##.left-ad
search.yahoo.com###cols > #left > #main > ol > li[id^="yui_"] has been made redundant by search.yahoo.com###main > ol li[id^="yui_"]
search.yahoo.com###doc #cols #right #east has been made redundant by search.disconnect.me,search.yahoo.com###east
search.yahoo.com###ysch #doc #bd #results #cols #right #east .ads has been made redundant by search.disconnect.me,search.yahoo.com###east
search.yahoo.com###left > #main > div[id^="yui_"][class] > ul[class] > li[class] has been made redundant by search.yahoo.com###left > #main > div[id^="yui_"]
search.yahoo.com###left > #main > div[id^="yui_"][class]:first-child > div[class]:last-child has been made redundant by search.yahoo.com###left > #main > div[id^="yui_"]
search.yahoo.com###right .first > div[style="background-color:#fafaff;border-color:#FAFAFF;padding:4px 10px 12px;"] has been made redundant by search.yahoo.com###right div[style="background-color:#fafaff;border-color:#FAFAFF;padding:4px 10px 12px;"]
search.yahoo.com###right ol li[id^="yui_"] > .dd > .layoutMiddle has been made redundant by search.yahoo.com###right li[id^="yui_"] .dd > .layoutMiddle
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads .more-sponsors has been made redundant by yahoo.com##.more-sponsors
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads .spns has been made redundant by yahoo.com##.spns
search.yahoo.com###ysch #doc #bd #results #cols #left #main .ads has been made redundant by 1sale.com,7billionworld.com,abajournal.com,altavista.com,androidfilehost.com,arcadeprehacks.com,asbarez.com,birdforum.net,coinad.com,cuzoogle.com,cyclingweekly.co.uk,disconnect.me,domainnamenews.com,eco-business.com,energylivenews.com,facemoods.com,fcall.in,flashx.tv,foxbusiness.com,foxnews.com,freetvall.com,friendster.com,fstoppers.com,ftadviser.com,furaffinity.net,gentoo.org,gmanetwork.com,govtrack.us,gramfeed.com,gyazo.com,hispanicbusiness.com,html5test.com,hurricanevanessa.com,i-dressup.com,iheart.com,ilovetypography.com,irennews.org,isearch.whitesmoke.com,itar-tass.com,itproportal.com,kingdomrush.net,laptopmag.com,laweekly.com,lfpress.com,livetvcafe.net,lovemyanime.net,malaysiakini.com,manga-download.org,maps.google.com,marinetraffic.com,mb.com.ph,meaningtattos.tk,mmajunkie.com,movies-online-free.net,mugshots.com,myfitnesspal.com,mypaper.sg,nbcnews.com,news.nom.co,nsfwyoutube.com,nugget.ca,panorama.am,pastie.org,phpbb.com,playboy.com,pocket-lint.com,pokernews.com,previously.tv,radiobroadcaster.org,reason.com,ryanseacrest.com,savevideo.me,sddt.com,searchfunmoods.com,sgcarmart.com,shopbot.ca,sourceforge.net,tcm.com,tech2.com,thecambodiaherald.com,thedailyobserver.ca,thejakartapost.com,thelakewoodscoop.com,themalaysianinsider.com,theobserver.ca,thepeterboroughexaminer.com,theyeshivaworld.com,tiberium-alliances.com,tjpnews.com,today.com,tubeserv.com,turner.com,twogag.com,ultimate-guitar.com,wallpaper.com,washingtonpost.com,wdet.org,wftlsports.com,womanandhome.com,wtvz.net,yahoo.com,youthedesigner.com,yuku.com##.ads
groups.yahoo.com##.yg-mbad-row > * has been made redundant by groups.yahoo.com##.yg-mbad-row


Additionally this filter:
||topbinaryaffiliates.ck-cdn.com^$third-party
Could be replaced with this:
||ck-cdn.com^$third-party
barbaz
Postaholic
Postaholic
Posts: 204
Joined: Mon Sep 15, 2014 12:55 am

Post by barbaz »

https://arestwo.org/famlam/changelog.html wrote:Fix: whitelist rules that imply $document were made redundant by rules that do not imply $document if no options were present (@@|http:// was made redundant by @@http)
Is the behavior that was fixed now correct behavior in light of https://hg.adblockplus.org/adblockplus/rev/cc3f3887226a?
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

I'll have to check it a bit more carefully, but at first sight it seems you're right. Thanks for telling me! I'll fix it later this week!
beast
Guest

Post by beast »

My Firefox runs slowly, so I just want URL filters, not element hiding rules.

Could you add a new option to the "Redundancy checker": cut away element hiding rules?
Famlam
EasyList Author
Posts: 1782
Joined: Sun May 09, 2010 11:37 am
Location: The Netherlands

Post by Famlam »

That's not the purpose of the redundancy checker ;).
A much faster method is to install a text editor that allows you to mark lines containing a specific character ("#" in this case), then remove all marked lines (Example: Notepad++).
For just plain EasyList, there is a filter list which does not contain hiding filters: https://easylist-downloads.adblockplus. ... emhide.txt
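
If a script is preferred over a text editor, the same clean-up can be sketched in Python. This is a slight variation on the "#" approach above: it keeps comment and header lines and only drops lines that contain the ## or #@# separators. The function name is made up for this example.

```python
def drop_hiding_rules(lines):
    """Keep header, comments and blocking rules; drop element hiding rules."""
    kept = []
    for line in lines:
        is_comment = line.startswith("!") or line.startswith("[")
        if not is_comment and ("##" in line or "#@#" in line):
            continue  # hiding rule or hiding exception
        kept.append(line)
    return kept


filters = [
    "[Adblock Plus 2.0]",
    "! Title: example",
    "||ads.example.com^",
    "example.com##.banner",
    "example.com#@#.banner",
]
print(drop_hiding_rules(filters))
# ['[Adblock Plus 2.0]', '! Title: example', '||ads.example.com^']
```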
OnlyHereForTheBeer
Guest

Post by OnlyHereForTheBeer »

459 redundancies in the current Fanboy’s Ultimate list.
459 redundant rules found!
monnawynter
Forum Junkie
Posts: 189
Joined: Mon Sep 29, 2008 9:48 am

Post by monnawynter »

The 12-12-15 Fanboy’s Ultimate list contains 459 redundancies.
Finished (after 1227 seconds)! 459 redundant rules found!
[Adblock Plus 2.0]
! Checksum: 6ABuG+WEF7Dozj8VngKjQA
! Title: Fanboy+Easylist-Merged Ultimate List
! Version: 201512121200
! Last modified: 12 Dec 2015 12:00 UTC