Derek Nalls wrote on Mon, May 12, 2008 07:06 PM UTC:
'You hardly have the possibility of trading it before there are open
files. So it stands to reason that you might as well use the higher value
during the entire game.'
Well, I understand and accept your reasons for leaving your lower rook
value in CRC as is. It is interesting that you thoroughly understand and
accept the reasons of others for using a higher rook value in CRC as
well. Ultimately, is not the higher rook value in CRC more practical and useful to the game by your own logic?
_____________________________
'... if we both play a Q-2R match from the opening, it is a serious
problem if we don't get the same result. But you have played only 2
games. Statistically, 2 games mean NOTHING.'
I never claimed or implied that only 2 games at 10 minutes per
move mean everything or even mean a great deal (enough to settle the
probabilities overwhelmingly). However, they mean significantly more than
nothing. I cannot accept your opinion, based upon a purely statistical
viewpoint, since it excludes another applicable mathematical viewpoint.
They definitely mean something ... although exactly how much is not
easily known or quantified (measured) mathematically.
__________________________________________________
'I don't even look at results before I have at least 100 games, because
before they are about as likely to be the reverse from what they will
eventually be, as not.'
Statistically, when dealing with speed chess games populated
exclusively with virtually random moves ... YES, I can understand and
agree with you requiring a minimum of 100 games. However, what you
are doing is at the opposite extreme from what I am doing via my
playtesting method.
Surely you would agree that IF I conducted only 2 games with perfect
play for both players that those results would mean EVERYTHING.
Unfortunately, with state-of-the-art computer hardware and chess variant
programs (such as SMIRF), this is currently impossible and will remain
impossible for centuries, if not millennia. Nonetheless, games played at 100
minutes per move (for example) have a much greater probability of
correctly determining which player has a definite, significant advantage
than games played at 10 seconds per move (for example).
Even though these 'deep games' are of nowhere near 600 times better
quality than these 'shallow games', as one might naively expect
(due to a non-linear correlation), they are far from random events
(to which statistical methods would then be fully applicable).
Instead, they occupy a middleground between perfect play games and
totally random games. [In my studied opinion, the example
'middleground games' are more similar to and closer to perfect play
games than totally random games.] To date, much is unknown to
combinatorial game theory about the nature of these 'middleground
games'.
Remember the analogy to coin flips that I gave you? Well, in fact,
the playtest games I usually run go far above and beyond such random
events in their probable significance per event.
If the SMIRF program running at 90 minutes per move chose all of its
moves randomly and without any intelligence at all (as a perfect
woodpusher), only then would my 'coin flip' analogy be fully applicable.
Therefore, when I estimate that, IF a player with a given set of piece
values loses EVERY game of a 6-game match (for example), there is only a
63/64 chance that the result is meaningful (instead of random bad luck),
I am being conservative to the extreme. The true figure is almost surely
higher than a 63/64 chance.
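For reference, the 63/64 figure can be reproduced with a short sketch. It assumes the pure 'coin flip' baseline described above: each game is an independent 50/50 event with no draws. The function name is only illustrative.

```python
from fractions import Fraction

def chance_result_is_meaningful(games: int) -> Fraction:
    """Chance that losing every one of `games` games is NOT just bad luck,
    assuming each game is an independent fair coin flip (no draws)."""
    # Probability that a fair coin comes up 'loss' in every game.
    all_losses_by_luck = Fraction(1, 2) ** games
    return 1 - all_losses_by_luck

# Six straight losses under the coin-flip assumption.
print(chance_result_is_meaningful(6))
```

Under this baseline the figure depends only on the number of games. Any real playing strength in the program would push the true figure higher, as argued above.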
By the way, if you doubt that SMIRF's level of play is intelligent and
non-random, then play a CRC variant of your choice against it at 90
minutes per move. After you lose repeatedly, you may not be able to
credit yourself with being intelligent either (although you should) ...
if you insist upon holding an impractically high standard to define the
word.
______
'If you find a discrepancy, it is enormously more likely that the result
of your 2-game match is off from its true win probability.'
For a 2-game match ... I agree. However, this may not be true for a
4-game, 6-game or 8-game match and surely is not true to the extremes
you imagine. Everything is critically dependent upon the specifications
of the match. The number of games played (of course), the playing
strength or quality of the program used, the speed of the computer and
the time or ply depth per move are the most important factors.
_________________________________________________________
'Play 100 games, and the error in the observed score is reasonable
certain (68% of the cases) to be below 4.5% ~1/3 Pawn, so 16 cP per Rook. Only then you can see with reasonable confidence if your observations differ from mine.'
It would require an estimated 20 years for me to generate 100 games with the
quality (and time controls) I am accustomed to and somewhat satisfied
with. Unfortunately, it is not that important to me just to get you to
pay attention to the results for the benefit of only your piece values
model. As a practical concern to you, everyone else who is working to
refine quality piece values models in FRC and CRC will have likely
surpassed your achievements by then IF you refuse to learn anything from
the results of others who use different yet valid and meaningful methods
for playtesting and mathematical analysis than yours.
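
As a rough check of the quoted 4.5% figure, here is a minimal sketch of the purely statistical view. It assumes a per-game standard deviation of 0.45 in the score, a value back-solved from the quoted 4.5% at 100 games and not stated anywhere in this thread, and it treats games as independent events. The function name and default are hypothetical.

```python
import math

def score_standard_error(games: int, per_game_sd: float = 0.45) -> float:
    """One-sigma (about 68%) error in the observed score fraction after
    `games` independent games. per_game_sd = 0.45 is an assumed value."""
    return per_game_sd / math.sqrt(games)

# Compare short matches with the 100-game match from the quote.
for n in (2, 6, 100):
    print(n, round(score_standard_error(n), 3))
```

This captures only the statistical viewpoint. It says nothing about how much more each game is worth when it is played at a longer time control.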