EasyList Element Efficiency

Discussion of EasyList subscription policy
Michael
Contributor
Contributor
Posts: 4124
Joined: Sun Aug 23, 2009 8:08 pm

EasyList Element Efficiency

Post by Michael »

I was recently motivated to improve the time it takes to match patterns to items in EasyList after fanboy provided the following comparison of EasyList and his subscription based on his subscription benchmarking tool:

EasyList (24 Apr 2010 17:40 UTC):

Code: Select all

Load Time: 1632ms (Average: 163.2)   
Pattern Total: 17.531 (1.7531)
Fanboy's List (25 Apr 2010):

Code: Select all

Load Time: 2109ms  (Average: 210.9)
Pattern Total: 14.8925 (Average: 1.48925)
As far as I am aware the primary reason that EasyList is slower is due to the element-hiding filters, which are less specific in EasyList. I would therefore like to rewrite the inefficient filters in a more precise manner that is designed with the new style selectors (3.6) in mind. Unless there are any issues I have missed I will make a start immediately.
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

Maybe its not just elements?
User avatar
Hubird
Adversity Author
Adversity Author
Posts: 1768
Joined: Sun Sep 30, 2007 4:31 am
Location: Australia

Post by Hubird »

Looks like the credit for the Pattern matching benchmark has to go to Wladimir as noted in the page title (have to give credit where it is due :-) )
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

I've had the benchmarking tool for a while now, slightly modified from Wladimir's copy. The benchmark rates can vary a bit depending on the computer and browser, probably why I ran the tests a few times.
User avatar
Hubird
Adversity Author
Adversity Author
Posts: 1768
Joined: Sun Sep 30, 2007 4:31 am
Location: Australia

Post by Hubird »

Yes, I just had a play around. I noticed the variance too. I found IE was useless for running it as it kept complaining about script errors (taking too long to respond). Minefield complained about non-responsive scripts too but allows you to easily ignore the warning. I ended up using SRWare Iron which seemed to do the trick.
Michael
Contributor
Contributor
Posts: 4124
Joined: Sun Aug 23, 2009 8:08 pm

Post by Michael »

I added the blocking filters and the element filters separately to calculate the following statistics:
Blocking Filters

Code: Select all

Time to load filters: 118ms
Average time for pattern matching: 1.7005ms (Loops: 2000)
Element Filters

Code: Select all

Time to load filters: 169ms
Average time for pattern matching: 7.078ms (Loops: 2000)
I think the results are fairly conclusive about where the issue is...
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

Hubird wrote:Yes, I just had a play around. I noticed the variance too. I found IE was useless for running it as it kept complaining about script errors (taking too long to respond). Minefield complained about non-responsive scripts too but allows you to easily ignore the warning. I ended up using SRWare Iron which seemed to do the trick.
For testing I'll generally use chrome or opera, seems using Firefox the numbers vary more. Never used IE though
User avatar
Erunno
Emeritus Contributor
Emeritus Contributor
Posts: 1866
Joined: Fri Dec 05, 2008 5:21 pm

Post by Erunno »

Some more numbers:

only general id (###) filters:

Number of filters: ~465
Time to load filters: 27ms
Average time for pattern matching: 1.40525ms (Loops: 4000)

Number of filters: ~931
Time to load filters: 39ms
Average time for pattern matching: 2.11925ms (Loops: 4000)

Time for pattern matching difference: ~1.5x longer

only general class (##.) filters:

Number of filters: ~508
Time to load filters: 28ms
Average time for pattern matching: 1.279ms (Loops: 4000)

Number of filters: ~1016
Time to load filters: 43ms
Average time for pattern matching: 1.57875ms (Loops: 4000)

Time for pattern matching difference: ~1.23x longer

general hiding filters without slow ones (last rows in general hiding sublist):

Time to load filters: 68ms
Average time for pattern matching: 3.45525ms (Loops: 4000)

all general hiding filters:

Time to load filters: 69ms
Average time for pattern matching: 3.459ms (Loops: 4000)



Judging by the numbers the so-called slow general filters (i.e. those not defined via ### or ##.) seem to not contribute much to the overall matching time. It's the sheer size of our general hiding filter list which contributes to the slowdown. Now, the question is, whether this is significant or not. Is the average time just for matching all filters against 1 element or all addresses defined in the addresses1.txt and adresses2.txt?
Zombie Contributor
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

The benchmark tool is really mainly useful for filters rather than elements fyi
Michael
Contributor
Contributor
Posts: 4124
Joined: Sun Aug 23, 2009 8:08 pm

Post by Michael »

Why would some blocking filters be less efficient than others? I would have assumed that, provided there are no hash conflicts, all filters would take an equal amount of time to process.
User avatar
Hubird
Adversity Author
Adversity Author
Posts: 1768
Joined: Sun Sep 30, 2007 4:31 am
Location: Australia

Post by Hubird »

fanboy wrote:The benchmark tool is really mainly useful for filters rather than elements fyi
I was under the same impression. I remember there was a different test floating around a while back which seemed to focus more on hiding rules (site does not work any more).
User avatar
Erunno
Emeritus Contributor
Emeritus Contributor
Posts: 1866
Joined: Fri Dec 05, 2008 5:21 pm

Post by Erunno »

I'm trying to understand how this benchmark works. As far as I can see addresses1.txt and addresses2.txt provide a list of bogus addresses. This addresses are used to test the provided filters against this address list with a re-implementation of ABP's matching algorithm. But since hiding filters work on HTML documents, how are hiding filters tested at all?

EDIT: Does ABP do any matching for the hiding filters at all or does it delegate this work to Gecko completely?

EDIT 2: Didn't Wladimir write a test suite specifically for hiding filters a few months ago?
Zombie Contributor
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

User avatar
Erunno
Emeritus Contributor
Emeritus Contributor
Posts: 1866
Joined: Fri Dec 05, 2008 5:21 pm

Post by Erunno »

Wladimir to the rescue!
Zombie Contributor
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

Oh my... I recognize this webpage - but it very, very old. Back when I created it it was a "good enough" approximation of what Adblock Plus did. However, it is only an approximation, it wasn't identical to Adblock Plus implementation even then. By now what Adblock Plus really does it quite different. Note that I removed this page very soon after publishing it - the results usually lead to wrong conclusions.

If you set up a development environment for Adblock Plus - there are a bunch of performance tests, particularly chrome://mochikit/content/tests/performance/filter_fromText.html (filter parsing time), chrome://mochikit/content/tests/performance/matcher_init.html (initialization of internal data structures, blocking and exception rules only) and chrome://mochikit/content/tests/performance/matching.html (actual filter matching performance, blocking and exception rules only). These tests measure "live" Adblock Plus performance, they are running the same code that is executed whenever you use Adblock Plus. Both use a fixed list, I used some version of EasyList for that (no point for variable filter list since this is meant to measure Adblock Plus efficiency). I can add a tweak to make the filters configurable however...

Note that there is no proper way to measure element hiding performance - whatever you thought you measured, it's not it.
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

Ok, there we go - the performance tests allow changing the test data now. In case you are interested, here the results from my laptop with the current Minefield nightly:

chrome://mochikit/content/tests/performance/filter_fromText.html
EasyList: 107.7 ms +/- 3.6 ms
EasyList without element hiding: 50.1 ms +/- 1.3 ms
Fanboy's List: 127.1 ms +/- 2.7 ms

chrome://mochikit/content/tests/performance/matcher_init.html
EasyList: 35.7 ms +/- 5.2 ms
Fanboy's List: 46.5 ms +/- 0.5 ms

chrome://mochikit/content/tests/performance/matching.html
EasyList: 57.0 ms +/- 2.8 ms
Fanboy's List: 57.5 ms +/- 2.7 ms
Two filters (testtest and @@testtest): 56.9 ms +/- 2.1 ms

The first two tests are mainly determined by the number of filters. The last one - by the number of "slow" filters. So not much to "optimize" here...
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

chrome://mochikit/content/tests/performance/filter_fromText.html
EasyList: 107.7 ms +/- 3.6 ms
EasyList without element hiding: 50.1 ms +/- 1.3 ms
Fanboy's List: 127.1 ms +/- 2.7 ms
without elements for my list also?
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

I only copied the file contents as they are available on the web already - I'm not aware of any variant of your list without element hiding.
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

Ah right, just thought you were coping/pasting the lists in
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

The results aren't too bad, considering the size of the lists.. but as always, theres room for improvements :)

Code: Select all

153K 2010-05-17 17:40 easylist.txt
179K 2010-05-17 22:53 fanboy-adblocklist-current-expanded.txt
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

Yes, the results aren't bad even though in the real system more things come together. E.g. in the startup sequence with EasyList I see 200ms for parsing patterns.ini - filter_fromText test measures only a part of it. Then there is 100ms in FilterListener.startup() - again, matcher_init test measures only part of it. And there is 300ms for registering element hiding stylesheet - that part isn't measured at all (and cannot be really measured in a meaningful way given that in a running browser it depends on the number of open tabs for example).

You can see the startup sequence in the console if you are running a development build. On Windows you need to run Firefox with a "-console" command line parameter.
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

how will these speeds matter when they sandboxing for each tab in firefox.next? (if they're still doing that..)
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

I don't know when they are going to do this - almost certainly after Firefox 4, that change is too scary to be ready by autumn. We will see when it is there of course but I guess that nothing changes fundamentally as long as they only separate chrome and content rather than create a separate process per tab. When they move on to the latter Adblock Plus will most likely be in for large changes...
User avatar
Erunno
Emeritus Contributor
Emeritus Contributor
Posts: 1866
Joined: Fri Dec 05, 2008 5:21 pm

Post by Erunno »

I've attended the chat accompanying Mike Beltzner's presentation and one of Mozilla's developers confirmed that Electrolysis won't be ready for Firefox 4.

And many thanks for doing the performance tests for us, Wladimir. This is more than I actually expected when I issued my cry for help :D
Zombie Contributor
Michael
Contributor
Contributor
Posts: 4124
Joined: Sun Aug 23, 2009 8:08 pm

Post by Michael »

I also didn't expect this level of detail after my (e-mailed) "cry for help". Thanks for your all your assistance with this issue, Wladimir.
User avatar
fanboy
EasyList Author
EasyList Author
Posts: 12220
Joined: Wed Sep 05, 2007 8:17 pm

Post by fanboy »

Wladimir Palant wrote:I only copied the file contents as they are available on the web already - I'm not aware of any variant of your list without element hiding.
Now available with out elements,

http://www.fanboy.co.nz/adblock/fanboy-adblocklist.txt
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

Here we go, filter_fromText performance test again:

EasyList: 108.2 ms +/- 2.8 ms
EasyList without element hiding: 50.9 ms +/- 2.0 ms
Fanboy's List: 134.6 ms +/- 4.3 ms
Fanboy's List without element hiding: 62.8 ms +/- 2.2 ms

Note that these results can deviate from what I measured previously: not only is this a newer Firefox nightly build now, my laptop's performance also varies due to CPU stepping.
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

I created a new performance test - page_load_overhead.html. This one takes local copies of five popular pages and measures the time required to load them. Normally it takes around 2 seconds on my computer. Adblock Plus adds around 100-150ms to that, mostly due to "recording" of all requests I guess. As to filters - the difference introduced by either EasyList or Fanboy's List (with nothing being blocked) is too small to be measured, in the area of 10ms or below. This means in particular that element hiding rules used in EasyList and Fanboy's List right now are good enough, usually they won't have any noteworthy performance impact (there might be pathological cases however, these five websites are not representative for the entire web).
User avatar
Erunno
Emeritus Contributor
Emeritus Contributor
Posts: 1866
Joined: Fri Dec 05, 2008 5:21 pm

Post by Erunno »

Are the restrictions still in effect that we should stay well below 30 "slow" general hiding filters, e.g. ##a[href^="http://www.liutilities.com/products/campaigns/adv/"]?
Zombie Contributor
Wladimir Palant
Adblock Plus Author
Adblock Plus Author
Posts: 444
Joined: Thu Mar 09, 2006 1:01 pm
Location: Cologne, Germany

Post by Wladimir Palant »

I didn't set these restrictions because I don't know how the number of such filters influences performance. But given that EasyList performance right now is good this is probably a sane threshold.
Locked