automate filter list generation

General information, announcements and questions about the EasyList subscriptions.
elypter
Site Member
Posts: 23
Joined: Mon Oct 12, 2015 1:28 pm

automate filter list generation

Post by elypter »

A script could crawl the internet and search for known ad content, the way spam filters scan for known spam mails. If matching content is found and the URL/domain is not in the list yet, a new filter rule would be added. If a script like that existed, an adblocker could have a simple report button that starts a scan on the reported site. So as long as new ad networks or locally hosted ads do not serve only content that has never been shown anywhere else, they could be blocked without any manual maintainer interaction.
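A minimal sketch of the matching step described above, assuming a hypothetical fingerprint database of known ad creatives (the names `KNOWN_AD_HASHES`, `SAMPLE_AD` and `suggest_rule` are invented for illustration, not part of any real tool):

```python
import hashlib

# Hypothetical sample of a known ad creative's bytes; in practice this
# database would be built from resources already blocked by the list.
SAMPLE_AD = b"\x89PNG...pink kangaroo selling toilet paper"
KNOWN_AD_HASHES = {hashlib.sha256(SAMPLE_AD).hexdigest()}

def fingerprint(content):
    """Hash a fetched resource so it can be compared to known ad content."""
    return hashlib.sha256(content).hexdigest()

def suggest_rule(url, content, existing_rules):
    """If a resource matches a known ad and its host is not yet covered,
    propose a new domain-blocking rule in Adblock Plus syntax."""
    if fingerprint(content) not in KNOWN_AD_HASHES:
        return None
    host = url.split("//", 1)[-1].split("/", 1)[0]
    rule = "||" + host + "^"
    if rule in existing_rules:
        return None
    return rule
```

For example, `suggest_rule("https://ads.example.com/banner.png", SAMPLE_AD, set())` would propose `||ads.example.com^`. Exact hashing only catches byte-identical creatives; a real system would need fuzzier matching.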
Khrin
EasyList Author
Posts: 3562
Joined: Fri Mar 26, 2010 8:50 pm

Post by Khrin »

Wouldn't that be unmanageable? How could a script distinguish between legit content and advertising? You would still need someone to compile and check the lists, and someone else to check and correct false positives...
LanikSJ
Site Owner
Posts: 1806
Joined: Thu Feb 15, 2007 7:44 am
Location: /dev/null

Post by LanikSJ »

Not to mention you'll likely get blocked by many sites for wasting their bandwidth. Unless you can pretend you're Google.

It's not a very scalable solution; it would only work for a short period of time.
"If it ain't broke don't fix it."
elypter
Site Member
Posts: 23
Joined: Mon Oct 12, 2015 1:28 pm

Post by elypter »

>How could a script distinguish between legit content and advertising?
If there is a "pink kangaroo selling toilet paper" image in the database, then the script will be able to identify it on a different ad network or website.

A manual filter would still be better in terms of speed, element hiding and correctness, but manual labor doesn't scale very well. Imagine what would happen if the ad network domain count exploded the way the spam mail count exploded some years ago. And it's not an either-or choice: the automatic list could be used as a source for the manual one.
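Recognizing "the same ad image on a different network" is usually done with a perceptual hash rather than an exact one. A toy sketch of the idea, using a simple average hash on a tiny grayscale pixel grid (all names here are illustrative; real tools use libraries with proper image decoding):

```python
def average_hash(pixels):
    """Perceptual 'average hash' of a tiny grayscale image, given as a
    list of rows of 0-255 ints: each bit records whether a pixel is
    brighter than the image's mean. Near-identical creatives served
    from different URLs produce near-identical hashes."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return sum(x != y for x, y in zip(a, b))

def same_creative(a, b, threshold=3):
    """Treat two images as the same ad if their hashes differ in at
    most `threshold` bits, so re-encoding or minor edits still match."""
    return hamming(average_hash(a), average_hash(b)) <= threshold
```

A lightly recompressed copy of an image changes a few pixel values but almost no hash bits, so it still matches, while a genuinely different image does not.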

>you'll likely get blocked by many sites for wasting their bandwidth.
Checking a site once in a while doesn't get you blocked; you don't have to crawl every page on a site like Google does.
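The "once in a while" part can be enforced with a per-host throttle plus a robots.txt check. A sketch under those assumptions (the class name `PoliteFetcher` and the 30-second interval are invented for illustration):

```python
import time
import urllib.robotparser

class PoliteFetcher:
    """Rate-limits requests per host so an occasional re-check of a
    site never looks like aggressive crawling."""

    def __init__(self, min_interval=30.0):
        self.min_interval = min_interval
        self.last_hit = {}  # host -> time of last request

    def wait_time(self, host, now=None):
        """Seconds to wait before this host may be contacted again."""
        now = time.monotonic() if now is None else now
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, self.min_interval - (now - last))

    def record(self, host, now=None):
        """Note that a request to this host was just made."""
        self.last_hit[host] = time.monotonic() if now is None else now

def allowed_by_robots(robots_url, agent, page_url):
    """Respect robots.txt, so sites that dislike bots can opt out."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()
    return rp.can_fetch(agent, page_url)
```

Combined with a crawl frequency of once per day or less per site, this stays well below the traffic of a normal visitor.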
jimmy perez uribe
Guest

Post by jimmy perez uribe »

Not to mention you'll most likely get blocked by many sites for wasting their bandwidth. Unless you can pretend you're Google. It's not a very scalable solution. It would only work for a short period of time.
LanikSJ
Site Owner
Posts: 1806
Joined: Thu Feb 15, 2007 7:44 am
Location: /dev/null

Post by LanikSJ »

elypter wrote: Checking a site once in a while doesn't get you blocked; you don't have to crawl every page on a site like Google does.
Most small sites won't care, but those that run traffic analyzers will know what you're doing. You're welcome to try it and see how quickly you lose access to some sites.
"If it ain't broke don't fix it."
elypter
Site Member
Posts: 23
Joined: Mon Oct 12, 2015 1:28 pm

Post by elypter »

If curl or wget won't do the trick, then there is still Selenium. Telling a remote-controlled browser apart from a regular one won't be that easy.
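A sketch of the escalation path: first try plain HTTP with browser-like headers (the default curl/wget/urllib user agents are what most sites key on), and only fall back to a real browser if that gets rejected. The function name and header values are illustrative:

```python
import urllib.request

def browser_like_request(url):
    """Build a request whose headers resemble a desktop Firefox, for
    sites that reject the default urllib/curl/wget user agents."""
    return urllib.request.Request(url, headers={
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) "
                      "Gecko/20100101 Firefox/102.0",
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.5",
    })

# If header spoofing is not enough, a real browser can be driven
# instead (assumes the third-party `selenium` package and a local
# geckodriver; sketch only, not tested here):
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get(url)
#   html = driver.page_source
#   driver.quit()
```

A Selenium-driven browser executes JavaScript and loads resources like a normal visitor, which is what makes it hard to tell apart from a regular one.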