H. G. Muller wrote on Thu, Sep 20, 2012 01:10 PM UTC:
> (Commoner) One situation where your empirical measurement differs from common wisdom, and you think it is probably due to a failure of your engine;
Not 'probably'. I just cannot exclude it. But I don't really expect it would make much difference if the engine took full account of mating potential; more likely it is the common wisdom that got it wrong. Still, it obviously has to be checked.
Anyway, first results of the Alibaba trial:
I played a pair of AD (replacing the Knights in the FIDE setup) against 2N - P and against N + P, about 200 games each, which makes the statistical error slightly under 3%. With Pawn odds producing an excess score of 15% (i.e. 65-35), this translates to an error of 0.2 Pawn units (1 standard deviation).
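The error estimate above can be reproduced with a short calculation. This is a minimal sketch, assuming a per-game score standard deviation of about 0.4 (somewhat below the 0.5 of a pure win/loss outcome, because draws occur); that figure and the function names are assumptions, not taken from the engine.

```python
import math

# Assumption: per-game standard deviation of the score is about 0.4
# (below the 0.5 of a pure win/loss game, because draws occur).
PER_GAME_SD = 0.4
PAWN_ODDS_EXCESS = 0.15  # excess score produced by Pawn odds (65-35)


def score_error(games: int, per_game_sd: float = PER_GAME_SD) -> float:
    """One standard deviation of the average score over `games` games."""
    return per_game_sd / math.sqrt(games)


def error_in_pawns(games: int) -> float:
    """Convert the score error to Pawn units via the Pawn-odds calibration."""
    return score_error(games) / PAWN_ODDS_EXCESS


print(f"{score_error(200):.4f}")     # 0.0283 -> slightly under 3%
print(f"{error_in_pawns(200):.3f}")  # 0.189  -> about 0.2 Pawn
```

Because the error shrinks only as the square root of the number of games, halving it would take roughly 800 games per match.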
2N - P proved superior, scoring 15.5% in excess of 50%. N + P proved inferior, with a score deficit of 6%. Translated to Pawns, that would be 1.0 and 0.4 Pawns, respectively, suggesting that the difference between 2N - P and N + P, which is N - 2P, is 1.4 P. That would make N = 3.4P, which fits very well with the Kaufman value N=325, especially considering that the combined statistical error of the two measurements is 0.28 P (28 on the Kaufman scale). It would fit even better if one recognizes that a Pawn on f2/f7 is not a particularly strong Pawn, and corrects its value to 95.
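The step from the two match scores to a Knight value can be written out explicitly. This sketch assumes a linear conversion of excess score to Pawns (15% per Pawn) and uncorrelated errors of 0.2 P per match; the variable names are illustrative. Unrounded, the figures give N - 2P = 1.43 P rather than 1.4 P, and the last two lines show one reading of the remark about valuing the Pawn at 95.

```python
import math

PAWN_ODDS_EXCESS = 15.0  # percent excess score produced by Pawn odds
ERROR_PER_MATCH = 0.2    # Pawn units, one standard deviation per ~200-game match


def to_pawns(excess_percent: float) -> float:
    """Linear conversion of an excess score (in %) to Pawn units."""
    return excess_percent / PAWN_ODDS_EXCESS


# Positive means the 2N - P or N + P side scored better than the AD pair.
diff_2n_minus_p = to_pawns(15.5)  # (2N - P) - 2AD, about +1.0
diff_n_plus_p = to_pawns(-6.0)    # (N + P) - 2AD, about -0.4

# Subtracting eliminates 2AD: (2N - P) - (N + P) = N - 2P
n_minus_2p = diff_2n_minus_p - diff_n_plus_p
knight = n_minus_2p + 2.0  # N in Pawn units
# Assumes the two match errors are independent, so they add in quadrature.
combined_error = math.hypot(ERROR_PER_MATCH, ERROR_PER_MATCH)

print(f"N - 2P = {n_minus_2p:.2f} P")       # 1.43
print(f"N = {knight:.2f} P")                # 3.43
print(f"error = {combined_error:.2f} P")    # 0.28
print(f"N at P=100: {knight * 100:.0f}")    # 343
print(f"N at P=95:  {knight * 95:.0f}")     # 326, close to Kaufman's 325
```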
It seems that an AD is worth almost exactly 1 Pawn less than a Knight (230 on the Kaufman scale). But as there will no doubt be a pair bonus, and this was measured on pairs of AD, a single AD will be worth slightly less, and the second AD slightly more. The trials were performed with an internal value of AD=270, however, and are thus not entirely self-consistent. I will now repeat them with AD=240 and P=90.
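As a consistency check, each match can be solved separately for the AD value on the Kaufman scale. This is a sketch under the assumptions N=325 and P=95 (with the Pawn-odds unit also taken as 95); it gives the per-AD value averaged over the pair and does not try to model the pair bonus. The function names are hypothetical.

```python
KNIGHT = 325.0           # Kaufman Knight value
PAWN = 95.0              # assumption: f-Pawn and Pawn-odds unit both worth 95
PAWN_ODDS_EXCESS = 15.0  # percent excess score per Pawn


def ad_from_2n_minus_p(excess_percent: float) -> float:
    """(2N - P) - 2AD = d*P  =>  AD = N - P*(1 + d)/2."""
    d = excess_percent / PAWN_ODDS_EXCESS
    return KNIGHT - PAWN * (1 + d) / 2


def ad_from_n_plus_p(excess_percent: float) -> float:
    """(N + P) - 2AD = d*P  =>  AD = (N + P*(1 - d))/2."""
    d = excess_percent / PAWN_ODDS_EXCESS
    return (KNIGHT + PAWN * (1 - d)) / 2


ad1 = ad_from_2n_minus_p(15.5)  # 2N - P scored 15.5% above 50%
ad2 = ad_from_n_plus_p(-6.0)    # N + P scored 6% below 50%
print(f"{ad1:.1f} {ad2:.1f}")   # 228.4 229.0
```

Both matches independently land close to 230, which is why the AD comes out at about one Pawn below the Knight.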