FOP - Filter Orderer and Preener

General information, announcements and questions about the EasyList subscriptions.

Michael
Contributor
Posts: 4126
Joined: Sun Aug 23, 2009 8:08 pm
Reputation: 0

Post by Michael » Fri Sep 02, 2011 4:11 pm

I have recently finished writing FOP (Filter Orderer and Preener), a Python program for automatically sorting subscription files and parts of subscription files. It can also automatically commit the changes to Mercurial repositories and validate commit comments to ensure that they match the format used in EasyList repository messages.

FOP offers several advantages over other programs. It can:
  • detect the filter type in sections and sort automatically based on this information;
  • change sorting patterns when switching sections, which are identified by comments;
  • commit automatically and where appropriate;
  • validate commit comments;
  • warn when unrecognised options are used on filters, while still sorting them alphabetically and appending the domain option to the end;
  • automatically make the correct parts of element hiding rules and regular filters lower case.
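The option handling described in the list above can be sketched roughly as follows. This is a minimal illustration rather than FOP's own code: the `sort_options` function and the `KNOWN_OPTIONS` set are hypothetical, and ordering options by name while ignoring a leading "~" is an assumption.

```python
# Hypothetical subset of recognised options; the full list FOP checks against is not given here.
KNOWN_OPTIONS = {
    "script", "image", "stylesheet", "object", "xmlhttprequest",
    "object-subrequest", "subdocument", "document", "elemhide",
    "other", "third-party", "match-case", "collapse", "popup",
}

def sort_options(options):
    """Sort filter options alphabetically, moving any domain= option to the end.

    Returns the sorted options and the unrecognised ones that were warned about.
    """
    domain_options = [o for o in options if o.startswith("domain=")]
    other_options = sorted(
        (o for o in options if not o.startswith("domain=")),
        key=lambda o: o.lstrip("~"),  # ignore negation when ordering
    )
    # Unrecognised options are still sorted, but a warning is printed for each.
    unknown = [o for o in other_options if o.lstrip("~") not in KNOWN_OPTIONS]
    for option in unknown:
        print("Warning: unrecognised option {!r}".format(option))
    return other_options + domain_options, unknown
```

For example, `sort_options(["domain=example.com", "third-party", "script", "foo"])` warns about `foo` and returns the options in the order `foo`, `script`, `third-party`, `domain=example.com`.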
I also believe FOP to be more efficient than Erunno's script: it has virtually identical functionality but is written in half the number of lines using less processor-intensive instructions, and it behaves more like Adblock Plus because it uses regular expressions taken from the extension. Finally, and most importantly, FOP has a silly name.
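Commit comment validation can be as simple as a regular expression check. The sketch below assumes that a valid message starts with a one-letter change type, a colon and a space, as in the "M: Set FOP to remove unnecessary asterisks" commit quoted later in this thread; treating "A" and "P" as the other valid letters, and the name `is_valid_commit_comment`, are assumptions.

```python
import re

# Hypothetical pattern: a one-letter change type ("M" appears in this thread;
# "A" and "P" are assumed), a colon, a space, then a non-blank description.
COMMIT_PATTERN = re.compile(r"^[AMP]: \S")

def is_valid_commit_comment(comment):
    """Return True if the comment looks like an EasyList-style commit message."""
    return COMMIT_PATTERN.match(comment.strip()) is not None
```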

I intend to test this script in the EasyList repository and, with MonztA's permission, the EasyList Germany repository. The only major change that I have noticed is that, because FOP sorts the domains attached to element hiding filters without regard to whether or not they have been negated, some of the "~pregecko2" options appear before their attached domain names and some appear after, depending on where the affected domain's name falls alphabetically relative to "p".
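The ordering described above, where negation is ignored when domains are compared, can be reproduced with a sort key that strips the "~". This is a minimal sketch; `sort_domains` is a hypothetical name and the example domains are made up.

```python
def sort_domains(domain_list):
    """Sort a comma-separated element hiding domain list alphabetically.

    A leading "~" (negation) is ignored when comparing, so a negated entry
    such as "~pregecko2" lands wherever "pregecko2" falls alphabetically.
    """
    domains = domain_list.split(",")
    # The second key element only breaks ties such as "x" versus "~x".
    return ",".join(sorted(domains, key=lambda d: (d.lstrip("~"), d)))
```

With this key, `sort_domains("zeta.org,~pregecko2,example.com")` gives `example.com,~pregecko2,zeta.org`: the negated option sits between a domain before "p" and one after it.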

When I have finished testing the program I will commit it to the repository and release the first tested version under the GPL.

MonztA
EasyList Author
Posts: 8099
Joined: Thu Jul 26, 2007 4:19 pm
Reputation: 0
Location: Germany

Post by MonztA » Fri Sep 02, 2011 4:18 pm

Michael wrote: I intend to test this script in the EasyList repository and, with MonztA's permission, the EasyList Germany repository.
Sure, I don't mind at all.

Post by Michael » Fri Sep 02, 2011 4:34 pm

I have committed successfully (https://hg.adblockplus.org/easylist/rev/2d6b430492ca and https://hg.adblockplus.org/easylistgerm ... 98473bcccf), but have noticed that better detection of general element hiding rules will be required to prevent bbc.co.uk##.bbccom-advert and ##iframe[name^="AdbriteFrame"] from constantly switching places. It shouldn't take too long...

Post by Michael » Fri Sep 02, 2011 4:45 pm

After I made FOP sample the first ten rules in a section to determine the filter type, the order of mixed sections has been settled. I'll just make a couple more alterations and then commit the program to the repository.
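Sampling the start of a section to decide its type might look like the sketch below. It is an illustration under stated assumptions, not FOP's implementation: the pattern is a simplified approximation of an element hiding rule, deciding by majority vote among the sampled rules is an assumption, and `is_element_hiding_section` is a hypothetical name.

```python
import re
from itertools import islice

# Simplified pattern for element hiding rules: an optional domain list
# followed by "##" or "#@#". An approximation, not FOP's exact expression.
ELEMENT_PATTERN = re.compile(r"^[^/*|@\"!]*?#@?#")

def is_element_hiding_section(section, sample_size=10):
    """Guess a section's type from its first few rules.

    Returns True if most of the sampled rules are element hiding rules.
    """
    sample = list(islice(section, sample_size))
    if not sample:
        return False
    element_rules = sum(1 for rule in sample if ELEMENT_PATTERN.match(rule))
    return element_rules * 2 > len(sample)
```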

MonztA and Khrin, what commands do Windows users have to run in terminals for Mercurial?

Post by MonztA » Fri Sep 02, 2011 5:06 pm

Michael wrote: MonztA and Khrin, what commands do Windows users have to run in terminals for Mercurial?
To do what exactly?

Post by Michael » Fri Sep 02, 2011 5:14 pm

Anything - I think the commands are the same, but the location of the executable may not be. Your version of the update script should provide all the necessary commands, and I would therefore be grateful to know the commands contained within the file.

Post by Michael » Fri Sep 02, 2011 7:20 pm

Thanks, I have received the upload script and have found the Windows Mercurial commands are actually identical to those used in Linux, which is surprising but useful to know. I'll add the program to the EasyList combinations repository tomorrow after I've tinkered a little more and you and Khrin have been granted access - there seems to be no point in preventing other authors from using the repository when there are only three of us.

Post by Michael » Sat Sep 03, 2011 10:14 am

I've added a little more error handling and validation with regard to Mercurial, but need a few filter suggestions to test the system.

Post by Michael » Sat Sep 03, 2011 12:59 pm

FOP version 1.0 has been added to the repository. Please feel free to test the program.

Post by MonztA » Sat Sep 03, 2011 1:22 pm

"M: Set FOP to remove unnecessary asterisks" https://hg.adblockplus.org/easylist/rev/dec194e5f897

The asterisk for the filter /?addyn|* is necessary. Otherwise the pipe is interpreted by Adblock Plus as the end of the address.

Post by Michael » Sat Sep 03, 2011 1:43 pm

Interpreted in what manner?

Post by MonztA » Sat Sep 03, 2011 2:02 pm

If the asterisk is gone, ABP interprets the filter as blocking the item only if its address ends with "/?addyn" (which is not intended).
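To see why the asterisk matters, here is a simplified sketch of turning an Adblock Plus address pattern into a regular expression. It handles only "*" wildcards and single leading or trailing "|" anchors (not "||" or "^"), and `abp_to_regex` is a hypothetical name; Adblock Plus's own conversion covers much more.

```python
import re

def abp_to_regex(pattern):
    """Convert a simple Adblock Plus address pattern into a regular expression.

    Only "*" wildcards and a single leading or trailing "|" anchor are handled.
    """
    regex = re.sub(r"\*+", "*", pattern)                       # collapse repeated wildcards
    regex = re.sub(r"\W", lambda m: "\\" + m.group(0), regex)  # escape non-word characters
    regex = regex.replace("\\*", ".*")                         # wildcard -> any characters
    regex = re.sub(r"^\\\|", "^", regex)                       # leading "|" anchors the start
    regex = re.sub(r"\\\|$", "$", regex)                       # trailing "|" anchors the end
    regex = re.sub(r"^\.\*", "", regex)                        # leading wildcard is redundant
    regex = re.sub(r"\.\*$", "", regex)                        # trailing wildcard is redundant
    return regex
```

Without the asterisk, `/?addyn|` becomes `\/\?addyn$`, an end-of-address anchor; with it, `/?addyn|*` becomes `\/\?addyn\|`, so the "|" is matched as a literal character.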

Post by Michael » Sat Sep 03, 2011 2:13 pm

Sorry, I'd forgotten about that syntax - I thought you were referring to regular expressions. I'm updating FOP now to prevent this from occurring and to correct a couple of regular expressions, but feel free to fix the filter yourself. I can't commit at the moment because my version of the repository contains several changes to FOP.

Post by MonztA » Sat Sep 03, 2011 2:43 pm


Post by Michael » Sat Sep 03, 2011 3:28 pm

Version 1.1 of FOP has been released. This version fixes the aforementioned problem with "|" and can identify tags more accurately.
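A fix of this kind amounts to removing only those leading and trailing asterisks that do not change a filter's meaning. The sketch below is an assumption about how that can be done, not FOP's code; it applies to the address part of a filter (before any "$" options), and `strip_redundant_asterisks` is a hypothetical name.

```python
import re

def strip_redundant_asterisks(address):
    """Remove leading and trailing "*" wildcards that do not change a filter's meaning.

    A trailing "*" is kept when it follows "|", and a leading "*" is kept when it
    precedes "|", because removing it would turn the "|" into an anchor.
    """
    stripped = re.sub(r"^\*+(?!\|)", "", address)
    stripped = re.sub(r"(?<!\|)\*+$", "", stripped)
    # A filter consisting only of wildcards is left alone rather than emptied.
    return stripped or address
```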

Post by Michael » Sat Sep 03, 2011 3:47 pm

Version 1.2 has also been released. The program is now able to detect ":not(" to the exclusion of other, similar CSS. This is version 1.2 because 1.1.1 is not a valid number and because I want to imitate Firefox - I intend to release FOP 342.9 by the end of the week.

Post by MonztA » Sat Sep 03, 2011 3:51 pm

:lol:

Post by Michael » Sun Sep 04, 2011 6:22 pm

I've now released FOP 1.4 (there weren't many significant changes in 1.3, so it wasn't announced). In this version, repository clones can be specified on the command line (Linux examples provided):

./FOP.py

This command sets FOP to sort all subscription files in the program's current directory and all of its subdirectories.

./FOP.py /easylistgermany /easyprivacy

This command runs FOP in two successive locations: the "easylistgermany" and "easyprivacy" folders. In each location FOP sorts the subscription files and, if applicable, offers to commit the changes to the repository.

If you are on Windows and want to run the program in, say, the EasyList Germany repository, you should use the command:

C:/location/of/FOP.py C:/location/of/the/easylistgermany/repository

This command can be used as part of a script in the repository. I would advise against copying FOP to the required location, as the copy will then not be updated along with the EasyList repository.

FOP does not accept any other command line options; it automatically detects the correct sorting mechanism for a section of filters and sorts all text files not explicitly blacklisted by the program.
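The command-line behaviour described above can be sketched as follows: process the given locations, or the current directory if there are none, and collect every text file beneath each one. This is a minimal sketch under assumptions; the function names and the contents of the `IGNORED_FILES` blacklist are hypothetical, and skipping hidden directories such as ".hg" is an assumed detail.

```python
import os

# Hypothetical blacklist; the file names FOP actually skips are not listed in this thread.
IGNORED_FILES = {"requirements.txt", "readme.txt"}

def locations_from_args(args):
    """Return the directories to process: the arguments, or the current directory."""
    if not args:
        return [os.getcwd()]
    # Resolve relative references and visit each location only once.
    return sorted({os.path.abspath(arg) for arg in args})

def subscription_files(location):
    """Yield every .txt file under location, skipping blacklisted names."""
    for directory, subdirectories, files in os.walk(location):
        # Editing the list in place stops os.walk descending into hidden
        # directories such as the repository's own ".hg" folder.
        subdirectories[:] = sorted(d for d in subdirectories if not d.startswith("."))
        for name in sorted(files):
            if name.endswith(".txt") and name.lower() not in IGNORED_FILES:
                yield os.path.join(directory, name)
```

A driver would then loop over `locations_from_args(sys.argv[1:])` and sort each file yielded by `subscription_files`.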

Post by Michael » Sun Sep 04, 2011 9:33 pm

I have released FOP 1.5, which has been heavily optimised and checks the version of Python being used; FOP requires Python 3, as I probably should have mentioned earlier.

Post by Michael » Mon Sep 05, 2011 6:17 am

I've made another update to FOP to make the current version of the program run, on my machine, 20% faster than FOP 1.0, taking only one and a half seconds to reach the commit stage (I did not include this part of the process in my measurements because I had nothing to commit).

P.S. This is post 4000!

Post by Michael » Mon Sep 05, 2011 6:54 am

After further testing, I have realised that FOP 1.5 never got round to checking the Python version number, because a module not present in Python 2 was imported before the check. This has been corrected in FOP 1.6. If you're not sure which version of Python you have, just run the program: it will warn you if you do not have Python 3, along with a message stating the version of Python being used.

Post by Michael » Mon Sep 05, 2011 9:26 pm

Tonight I've released FOP 1.7, which ensures that filter files are read using the correct encoding, and FOP 1.8, which fixes an error introduced in 1.6: urllib.parse was imported locally, in the start function, rather than globally.
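The ordering problem in FOP 1.5 and the fix in 1.8 come down to where the check and the import sit at module level. A minimal sketch, assuming a plain top-of-file check; the message wording is hypothetical.

```python
import sys

# Check the interpreter version before importing anything that exists only in
# Python 3; otherwise Python 2 fails on the import and never reaches the check.
if sys.version_info[0] < 3:
    sys.exit("FOP requires Python 3, but you are using Python {0}.".format(sys.version.split()[0]))

# Imported at module level (globally) once the check has passed, so every
# function in the file can use it, not just the one that imports it.
import urllib.parse
```

The `"{0}"` format field is used instead of `"{}"` so that the warning itself still works on older Python 2 releases.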

IceDogg
Contributor
Posts: 580
Joined: Tue Mar 21, 2006 9:50 pm
Reputation: 0

Post by IceDogg » Tue Sep 06, 2011 2:35 am

Michael wrote: I intend to release FOP 342.9 by the end of the week.
I'm now starting to wonder if you meant this :D You sure are a hard worker, that is without doubt. I can't say how great your work on this is because it's above my head, but I'd be willing to bet it's top notch. Good job!!

Post by Michael » Tue Sep 06, 2011 3:21 pm

I try my best with the program, but there always seems to be some remaining niggle, hence the release of FOP 1.9, which accepts relative directory references correctly, removes duplicate filters from subscription files and only saves a file if changes have been made.
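The last two changes in FOP 1.9 can be illustrated with two small helpers. This is a sketch, not FOP's code: `remove_duplicates` and `save_if_changed` are hypothetical names, and UTF-8 as the file encoding is an assumption.

```python
def remove_duplicates(filters):
    """Drop repeated filters, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (guaranteed from Python 3.7).
    return list(dict.fromkeys(filters))

def save_if_changed(path, new_text, encoding="utf-8"):
    """Write new_text to path only if it differs from the current contents.

    Returns True if the file was rewritten.
    """
    with open(path, encoding=encoding) as f:
        if f.read() == new_text:
            return False
    with open(path, "w", encoding=encoding) as f:
        f.write(new_text)
    return True
```

Skipping unchanged files keeps their modification times intact, which avoids needless work for anything that watches the repository for changes.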

Besides, I bet I wouldn't have to make half as many releases if the original ones were free of bugs (see FOP 1.8, 1.7, 1.6, 1.1)...

arflech
Senior Member
Posts: 76
Joined: Thu Feb 24, 2011 5:49 pm
Reputation: 0

Post by arflech » Fri Sep 09, 2011 8:12 am

well the script does look dandy lol

Post by Michael » Fri Sep 09, 2011 2:53 pm

I have just released FOP 1.91, which should work with Git and SVN in addition to Mercurial. However, I only have a Mercurial repository and therefore have not tested the other types of repositories - feedback and corrections would therefore be much appreciated.
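The thread does not show how FOP tells the repository types apart. One plausible approach, sketched below as an assumption, is to look for the metadata directory each system creates (".hg", ".git" or ".svn") and build the matching commands as argument lists. The function names and the exact command choices are illustrative, though each command shown is a standard subcommand of its tool.

```python
import os

# Illustrative command sets; the exact commands FOP runs are not given in this thread.
REPOSITORY_TYPES = {
    ".hg": {"commit": ["hg", "commit", "-m"], "push": ["hg", "push"]},
    ".git": {"commit": ["git", "commit", "-a", "-m"], "push": ["git", "push"]},
    ".svn": {"commit": ["svn", "commit", "-m"], "push": None},  # svn commit already uploads
}

def detect_repository(location):
    """Return the metadata directory name (".hg", ".git" or ".svn") found in location, or None."""
    for marker in REPOSITORY_TYPES:
        if os.path.isdir(os.path.join(location, marker)):
            return marker
    return None

def commit_commands(location, message):
    """Build the argument lists needed to commit (and, if applicable, push) changes."""
    marker = detect_repository(location)
    if marker is None:
        return []
    commands = REPOSITORY_TYPES[marker]
    result = [commands["commit"] + [message]]
    if commands["push"]:
        result.append(commands["push"])
    return result
```

Each list can be passed to `subprocess.check_call(command, cwd=location)`. Because the arguments are a list rather than a shell string, the same call should work on Windows and Linux as long as the executable can be found on the PATH.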
arflech wrote: well the script does look dandy lol
I was rather wondering when someone would mention the name - I intended it to indicate that FOP can concern itself entirely with formatting while the authors do the tasks that require human intervention.

Post by Michael » Sat Sep 10, 2011 10:11 am

FOP 2.00 has just been released; it accepts completely blank files and has a few minor changes to the comments and messages.

Post by Michael » Sat Sep 10, 2011 12:08 pm

I have just released FOP 2.1, which corrects the regular expression used for finding options (thanks to MonztA for highlighting this) and does not set the filter text to lower case if the option "match-case" is present (thanks to Famlam for finding this bug).
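The "match-case" exception can be sketched as follows. It is a minimal illustration under assumptions: `lowercase_filter` is a hypothetical name, options are assumed to be lower-cased in every case, and splitting at the last "$" is a simplification, since regular-expression filters can contain "$" themselves.

```python
def lowercase_filter(rule):
    """Lower-case a blocking filter unless its options include match-case.

    Options are always lower-cased. Splitting at the last "$" is a
    simplification: regular-expression filters can contain "$" themselves.
    """
    address, separator, options = rule.rpartition("$")
    if not separator:
        # No options at all: the whole rule is the address.
        return rule.lower()
    option_list = [option.lower() for option in options.split(",")]
    if "match-case" not in option_list:
        address = address.lower()
    return address + "$" + ",".join(option_list)
```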

Post by Michael » Sat Sep 10, 2011 6:11 pm

There have been a lot of updates to FOP today, predominantly optimising the program. The latest release, FOP 2.6, is ~30% faster than FOP 1.0.

Post by Michael » Tue Sep 13, 2011 8:04 pm

I have just released FOP 2.99 (yes, it's been a while since I announced releases, but that's because there's been nothing exciting to announce). This version stores only one section of filters in memory at a time and saves it as soon as that section has been processed. This should allow the program to handle large files better and reduce memory use (not that there have been any problems up to this point). However, the more frequent saving makes this version marginally slower than previous releases, although I still think that, overall, FOP 2.99 is an improvement.
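Section-at-a-time processing can be sketched as a single pass over the file. This is an illustration under assumptions, not FOP's code: `sort_by_section` is a hypothetical name, comment lines ("!") and the "[Adblock Plus ...]" header are assumed to be the only section boundaries, and each section is simply de-duplicated and sorted alphabetically rather than with FOP's type-specific sorting.

```python
def sort_by_section(infile, outfile):
    """Sort and de-duplicate filters one section at a time.

    Comment lines ("!") and the "[Adblock Plus ...]" header act as section
    boundaries; only the current section is held in memory, and it is written
    out as soon as the next boundary (or the end of the file) is reached.
    """
    section = []

    def flush():
        # Write the finished section, then forget it to free memory.
        for rule in sorted(set(section)):
            outfile.write(rule + "\n")
        section.clear()

    for line in infile:
        line = line.rstrip("\r\n")
        if line.startswith(("!", "[")):
            flush()
            outfile.write(line + "\n")
        elif line.strip():  # blank lines are dropped
            section.append(line)
    flush()
```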
