Ever considered dropping the non-domain filters? [n/a]

Post by **Guest** » Sat Mar 31, 2007 8:54 am

I tend to not use any of the subscription lists, because they all seem to include non-domain-specific rules like "/partner/*?" and "_ban_" and "_files/*.htm" (all three obtained from a quick scan of EasyList), any of which could easily cause false positives. A quick look on this forum confirmed that most of the false positives seem to come from these general rules, not domain-specific rules.

By contrast, all of the rules I use look like this:
://ads.tripod.lycos.de/
://popinads.com/
://www.koin.com/ads/
://*.ads-click.com/
://ad.zanox.com/
These rules match particular ad sites, avoid blocking those same strings in other components of a URL (important for short, potentially ambiguous site names), and should almost never generate false positives. (I do have a small handful of non-domain-specific rules, but they all match very specific ad-serving scripts at specific points in a URL, such as "://*/adjs.php?".)
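To make the difference concrete, here is a minimal Python sketch of the matching behaviour described above. It is a simplification, not Adblock Plus's actual matcher: it assumes a rule is a plain substring in which "*" stands for any run of characters. The helper names and the example URLs (`shop.example`, `example.org`) are made up for illustration.

```python
import re

def rule_to_regex(rule: str) -> "re.Pattern[str]":
    # Assumption: '*' means "any run of characters"; everything else is literal.
    literal_parts = [re.escape(part) for part in rule.split("*")]
    return re.compile(".*".join(literal_parts))

def matches(rule: str, url: str) -> bool:
    # A rule matches if it occurs anywhere in the URL.
    return rule_to_regex(rule).search(url) is not None

# Domain-anchored rule: only matches when the string is the host part.
print(matches("://popinads.com/", "http://popinads.com/banner.gif"))
print(matches("://popinads.com/", "http://example.org/go?to=popinads.com/"))

# Wildcard subdomain rule.
print(matches("://*.ads-click.com/", "http://cdn.ads-click.com/x.js"))

# General rule: also hits an innocent (hypothetical) product image.
print(matches("_ban_", "http://shop.example/images/ray_ban_sunglasses.jpg"))
```

The leading "://" is what keeps the second URL from matching: the same site name appearing in a query string or path is left alone, whereas the general "_ban_" rule has no such anchor.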

In general, I'd rather have ten false negatives that I need to add or loosen a rule for than one false positive that causes me to miss something. This approach to blocking also means I don't run into the kinds of problems referenced in the topic about turning off adblock for online shopping; in the absence of non-domain-specific rules, I *can* distinguish between (for instance) ads and product images.

Have you ever considered dropping some of the non-domain-specific rules in favor of domain-specific rules? Do you just use the non-domain-specific approach to keep EasyList short, or do you use them because you expect them to block ads from ad servers you haven't seen yet?

Post by **Guest** » Sat Mar 31, 2007 9:04 am

Additional note: because all of my filters include domains, they almost always provide an 8-character substring, and thus help Adblock Plus go faster.
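A rough sketch of why a fixed-length literal substring can help, assuming the optimization amounts to a hash lookup keyed on 8 wildcard-free characters of each rule. This is an illustration of the idea only, not Adblock Plus's real code; all function names are hypothetical.

```python
SHORTCUT_LEN = 8  # length mentioned in the post above

def pick_shortcut(rule):
    # Use the first wildcard-free piece that is long enough as the lookup key.
    for literal in rule.split("*"):
        if len(literal) >= SHORTCUT_LEN:
            return literal[:SHORTCUT_LEN]
    return None  # rule is too short to index

def build_index(rules):
    index, slow = {}, []
    for rule in rules:
        key = pick_shortcut(rule)
        if key is None:
            slow.append(rule)  # must be tested against every URL
        else:
            index.setdefault(key, []).append(rule)
    return index, slow

def candidates(url, index, slow):
    # Slide an 8-character window over the URL and collect indexed rules
    # whose key appears; these still need a full match check afterwards.
    found = list(slow)
    for i in range(len(url) - SHORTCUT_LEN + 1):
        found.extend(index.get(url[i:i + SHORTCUT_LEN], []))
    return found

rules = ["://popinads.com/", "://*.ads-click.com/", "_ban_"]
index, slow = build_index(rules)
print(candidates("http://popinads.com/x.gif", index, slow))
```

Under this assumption a short rule like "_ban_" lands in the slow list and is checked on every request, while the domain rules are only considered when their key actually occurs in the URL.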

Post by **rick752** » Sat Mar 31, 2007 1:01 pm

I'm sure they work fine for YOU. I'm trying to do maximum blocking for one million subscribers worldwide with the least false-positives. Have I ever thought of changing my filter structure after building it for 2 years to be domain specific for every site? ... "no".

In a perfect world, all ads on all domains would have the exact same address day after day .... and they would never change their directory structure ...and they would never rotate between different adservers ... and blocking one domain specific string would not have a false-positive effect on something else in the same domain .... and blocking a known adserver would not cause something on a reputable major site not to work .... in a perfect world.

You COULD visit every domain in the world and write code to block the ads specific to that site .... and then go back to them again in a couple of months because 30% of them had changed their addressing, naming, and structuring (see the now-defunct Dutchblock subscription). Sounds easy to me ... are you volunteering?

It's funny you say that the false-positives here are mainly caused by 'general' strings. While that may happen from time to time, the biggest problems are caused by strings made specifically to target a specific domain (even though it may not say it) .... a lot of times, the false-positives occur within the same domain that the string was actually designed for. These happen a lot for webmail and video applications where a third-party ad domain is serving normal content through a reputable website. Try watching a Fox News video while blocking "ad.doubleclick.net" .... and there are MANY more "not so cut and dry" examples just like that.

These things are easier said, my friend. No offense, but I would probably take you more seriously if you were actually USING my 2 subscriptions instead of just visually analyzing them. One million steady subscribers and you are looking at my relatively small list of false-positive reports over the last year and a half. I also pride myself on whitelisting only the 'targeted' problem string of a site rather than simply whitelisting the whole thing. If you think this is easy, just think about trying to satisfy 10,000 people with your list .... then times that by 100.
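The difference between whitelisting a targeted string and whitelisting a whole site can be sketched as follows. This is a simplified model using plain substring checks on the request URL, with exceptions taking priority over blocking rules; the video path and the exception strings are hypothetical and not taken from any real list.

```python
def should_block(url, block_rules, exception_rules):
    # An exception that matches always wins over a blocking rule.
    if any(rule in url for rule in exception_rules):
        return False
    return any(rule in url for rule in block_rules)

block_rules = ["://ad.doubleclick.net/"]

# Hypothetical request URLs on the same ad domain.
video_url = "http://ad.doubleclick.net/video/player.swf"
banner_url = "http://ad.doubleclick.net/banner/468x60.gif"

# Targeted exception: only the (hypothetical) video path is let through.
targeted = ["://ad.doubleclick.net/video/"]
print(should_block(video_url, block_rules, targeted))
print(should_block(banner_url, block_rules, targeted))

# Broad exception: the whole ad domain is let through, banners included.
broad = ["://ad.doubleclick.net/"]
print(should_block(banner_url, block_rules, broad))
```

With the targeted exception the video loads and the banner stays blocked; the broad one fixes the video but gives up the blocking entirely.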

Note:
A few months ago, a lot of major sites did some major restructuring to the way that they serve info, ads, and media. A lot of the stuff that was reported and changed here was because of that. It has since quieted down here as we're catching up again ... just waiting for the next round of changes.

ps: I really don't have any major problems on shopping sites with my filters (but you never know as I can't visit them all). I tell people to disable ABP for shopping because I also don't want their own filtering or another subscription to cause problems .... it's just easier that way.

Post by **Guest** » Mon Apr 02, 2007 12:47 am

First, a clarification: I didn't mean to suggest any problem with your lists, and I do find the . I genuinely wondered, and your response describes some issues I didn't know about; now that I know about those issues, I realize that my approach doesn't work quite as easily as I thought. Thank you.

rick752 wrote:I'm sure they work fine for YOU. I'm trying to do maximum blocking for one million subscribers worldwide with the least false-positives. Have I ever thought of changing my filter structure after building it for 2 years to be domain specific for every site? ... "no".

When I said "domain-specific", I didn't mean "specific to every site users visit"; I meant "specific to every ad vendor". While still a hard problem, it involves many orders of magnitude fewer sites. I naturally wouldn't suggest trying to write filters specific to each site users visit.

rick752 wrote:In a perfect world, all ads on all domains would have the exact same address day after day .... and they would never change their directory structure ...and they would never rotate between different adservers ... and blocking one domain specific string would not have a false-positive effect on something else in the same domain .... and blocking a known adserver would not cause something on a reputable major site not to work .... in a perfect world.

Most of those seem like easy problems until you get to the last one, which completely invalidates the approach I use. Once I discover a new ad vendor, I assume they serve nothing but ads, and I block "://ad.vendor.com/". I had no idea that semi-legitimate content originated from such domains. Now that I know that, I can understand where the remaining problems come from: if you have to block only particular content from an ad vendor, rather than all content, then suddenly you become sensitive to their site structure.

rick752 wrote:You COULD visit every domain in the world and write code to block the ads specific to that site .... and then go back to them again in a couple of months because 30% of them had changed their addressing, naming, and structuring (see the now-defunct Dutchblock subscription). Sounds easy to me ... are you volunteering?

A question for clarification: when you said this, did you mean "every domain" or "every ad domain"? For the former: no, I have a life.

For the latter: sure, I plan to continue to do that, but with a little more care now thanks to your advice.

rick752 wrote:It's funny you say that the false-positives here are mainly caused by 'general' strings. While that may happen from time to time, the biggest problems are caused by strings made specifically to target a specific domain (even though it may not say it) .... a lot of times, the false-positives occur within the same domain that the string was actually designed for. These happen a lot for webmail and video applications where a third-party ad domain is serving normal content through a reputable website. Try watching a Fox News video while blocking "ad.doubleclick.net" .... and there are MANY more "not so cut and dry" examples just like that.

As I said above, I had no idea that sites would do something as brain-damaged as serving useful content from an ad domain. I also likely did not notice because I never watch embedded video; I always download it with something like UnPlug or youtube-dl, and watch it locally. Thank you for pointing out this issue; I'll have to watch my filtering more carefully in the future.

rick752 wrote:These things are easier said, my friend. No offense, but I would probably take you more seriously if you were actually USING my 2 subscriptions instead of just visually analyzing them. One million steady subscribers and you are looking at my relatively small list of false-positive reports over the last year and a half. I also pride myself on whitelisting only the 'targeted' problem string of a site rather than simply whitelisting the whole thing. If you think this is easy, just think about trying to satisfy 10,000 people with your list .... then times that by 100.

I actually maintain software libraries used by that many people, and I tend to get lots of "bug" reports from users of buggy software invoking the library. I understand the maintenance burden you describe, especially now that you have explained the subtlety required for quality blocking.

rick752 wrote:Note:
A few months ago, a lot of major sites did some major restructuring to the way that they serve info, ads, and media. A lot of the stuff that was reported and changed here was because of that. It has since quieted down here as we're catching up again ... just waiting for the next round of changes.

Now I can see why this kind of thing would matter: you have to care about structure because you have to block selectively even from ad domains.

rick752 wrote:ps: I really don't have any major problems on shopping sites with my filters (but you never know as I can't visit them all). I tell people to disable ABP for shopping because I also don't want their own filtering or another subscription to cause problems .... it's just easier that way.

Good to know.

Post by **rick752** » Mon Apr 02, 2007 1:55 am

I am not offended by your questions at all. I just thought I would explain the problems of serving filters to a massive audience vs making one for just yourself or a small group. I was feeling kind of chatty when I read your post.

I do block the domains of known adservers except for the ones that are already captured by other strings in the filter ... no sense blocking the same thing twice. I think I have most major ones blocked either through name, subdomain, or addressing structure.

When I said that you would have to visit every site in the world, it was obviously an exaggeration even though you really DO have to visit a lot of them (sometimes I will scan a thousand random sites a week).

Not all ads come 3rd-party either. There are many ads served by the site itself and ones that use their own ad programs and/or adservers or server scripting, so you really have to dig for all things possible. Some sites share ad server addresses between sister sites .... like ZDNet and CNet. Some newspaper chains also share their own adserver domains. These types are usually not considered ad companies as there is really no home website for that domain .... it is just like they use a new domain as another directory.

As far as the regular sites using places like "doubleclick" and such to run video and other things, the main reason is simply to discourage adblocking on their sites. I think places like doubleclick sell the idea to regular sites on the premise that the site features won't work unless the user whitelists the whole site area so that ALL ads will show .... obviously, anyone who has been using my filters will tell you this is a lie.

Some sites are using technology that uses the server to see if the ads are being received by the user. If not, then it blocks the page from loading. Little do they realize that all you have to do is whitelist the entire site and then "#element hide" all of the ads. The site will work fine and show no ads .... nothing can tell that you are hiding ads on your own browser. They'll come to the conclusion eventually that they shouldn't have ever wasted the time to begin with.

Things have become more complicated but can ALWAYS be beaten ....
