Piece Values [Subject Thread]
H. G. Muller wrote on Sat, May 24, 2008 09:49 AM UTC:
Derek:
| Conclusions drawn from playing at normal time controls are 
| irrelevant compared to extremely-long time controls.

First, that would only be true if the conclusions actually depended on the
TC. That is a totally unproven conjecture on your part, and in fact
contrary to any observation made at TCs where such observations can be
made with any accuracy (because enough games can be played). This whole
thing reminds me of my friend, who always claims that stones fall upward.
When I then drop a stone to refute him, he just shrugs and says it proves
nothing, because the stone is 'not big enough'. Very conveniently for him,
the upward falling of stones can only be observed on stones that are too
big for anyone to lift...
  But the main point is of course: if you draw a conclusion that is valid
only at a TC that no one is interested in playing, what use would such a
conclusion be?

| The chance of getting the same flip (heads or tails) twice-in-a-row 
| is 1/4.  Not impressive but a decent beginning.  Add a couple or a 
| few or several consecutive same flips and it departs 'luck' by a 
| huge margin.

Actually the chance for twice the same flip in a row is 1/2, unless you
are biased as to what the outcome of the flip should be (one-sided
testing). And indeed, 10 identical flips in a row would be unlikely, by a
large margin, to occur by luck. But that is rather academic, because you
won't see 10 identical results in a row between the subtly different
models. You will see results like 6-4 or 7-3, which will again be very
likely to be a result of luck (as that is exactly what they are the result
of, as you would realize after 10,000 games when the result is standing at
4,628-5,372).

Calculate the number of games you typically need before a 53-47 result
could not just as easily have been obtained from a 50-50 chance with a
little luck. You will be surprised...
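
To get a feel for those numbers, here is a minimal back-of-envelope
sketch (my own illustration, not part of any actual test setup), treating
each game as an independent coin flip and ignoring draws:

  import math

  # Back-of-envelope model: each game is an independent coin flip with
  # win probability p0 + edge; draws are ignored for simplicity.
  def games_needed(edge=0.03, sigmas=2.0, p0=0.5):
      # The standard deviation of the observed score fraction after n
      # games is sqrt(p0 * (1 - p0) / n); require the edge to exceed
      # 'sigmas' times that before calling it more than luck.
      return math.ceil(sigmas**2 * p0 * (1 - p0) / edge**2)

  print(games_needed(0.03))              # 53-47 at 2 sigma: ~1112 games
  print(games_needed(0.03, sigmas=3.0))  # 53-47 at 3 sigma: ~2500 games

So it takes on the order of a thousand games before a 3% score advantage
stands out above the 50-50 noise.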

| I have wondered why the performance of computer chess programs is
| unpredictable and varied even under identical controls.  Despite 
| their extraordinary complexity, I think of computer hardware, 
| operating systems and applications (such as Joker80) as deterministic.

In most engines there always is some residual indeterminism, due to
timing jitter. There are critical decision points, where the engine
decides if it should do one more iteration or not (or search one more
move vs aborting the iteration). If it took such decisions purely on
internal data, like node count, it would play 100% reproducibly. But most
engines use the system clock (to not forfeit on time if the machine is
also running other tasks), and so experience the timing jitter caused by
other processes running, or by rotational delays of the hard disk they
had been using. In multi-threaded programs this is even worse, as the
scheduling of the threads by the OS is unpredictable. Even the exact
position where the program is loaded in physical memory might have an
effect.
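
To illustrate the kind of decision point meant here, a minimal sketch of
a clock-driven iterative-deepening loop (the names and the simple
time-budget rule are my own invention, not Joker80's actual code):

  import time

  # Hypothetical sketch: because time.monotonic() is affected by OS
  # scheduling, other processes, disk delays, etc., two runs of the
  # "same" search can stop at different depths and pick different moves.
  def think(root, search_to_depth, time_budget=1.0):
      start = time.monotonic()
      depth, best_move = 1, None
      while True:
          best_move = search_to_depth(root, depth)  # one iteration deeper
          elapsed = time.monotonic() - start
          # Critical decision point: start one more iteration or not?
          # A few milliseconds of timing jitter can change the answer.
          if elapsed > 0.5 * time_budget:
              break
          depth += 1
      return best_move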

But in Joker the source of indeterminism is much less subtle: it is
programmed explicitly. Joker uses the starting time of the game as the
seed of a pseudo-random-number generator, and uses the random numbers
generated with the latter as a small addition to the evaluation, in order
to lift the degeneracy of exactly identical scores, and provide a bias for
choosing the move that leads to the widest choice of equivalent positions
later.
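
A minimal sketch of that mechanism (the names, the size of the random
term, and the use of the position's hash key are my assumptions, not
Joker's actual source):

  import random
  import time

  # Seed a PRNG with the starting time of the game, as described above.
  game_seed = int(time.time())

  def eval_jitter(position_key, jitter_cp=2):
      # Small per-position pseudo-random bonus (in centipawns), derived
      # from the position's hash key and the per-game seed: within one
      # game the same position always gets the same bonus, but a new
      # game (new seed) breaks ties between equal moves differently.
      return random.Random(position_key ^ game_seed).randint(-jitter_cp,
                                                             jitter_cp)

  def evaluate(position_key, static_score):
      # The jitter only lifts the degeneracy of exactly identical
      # scores; it is far too small to override a real difference.
      return static_score + eval_jitter(position_key)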

The non-determinism is a boon, rather than a bust, as it allows you to
play several games from an identical position, and still do a meaningful
sampling of possible games, and of the decisions that lead to their
results. If one position always led to the same game, with the same
result (as would occur if you were playing a simple end-game with the aid
of tablebases), it would not tell you anything about the relative
strength of the armies. It would only tell you that this particular
position was won / drawn, but nothing about the millions of other
positions with the same material on the board. And the value of the
material is by definition an average over all these positions. So with
deterministic play, you would be forced to sample the initial positions,
rather than using the indeterminism of the engine to create a
representative sample of positions before anything is decided.

| In fact, to the extent that your remarks are true, they will 
| support my case if my playtesting is successful that the 
| unlikelihood of achieving the same outcome (i.e., wins or 
| losses for one player) is extreme.
This sentence is too complicated for me to understand. 'Your case' is
that 'the unlikelihood of achieving the same outcome is extreme'? If the
unlikelihood is extreme, is that the same as the likelihood being
extreme? Is the 'unlikelihood to be the same' the same as the
'likelihood to be different'? What does 'extreme' mean for a likelihood?
Extremely low or extremely high? I wonder if anything is claimed here at
all...

I think you make a mistake by seeing me as a low-quality advocate. I only
advocate the minimum quantity needed to keep the results from being
inconclusive. Unfortunately, that minimum is high, despite my best efforts
to make it as low as possible through asymmetric playtesting and playing
material imbalances in pairs (e.g. two Chancellors against two
Archbishops, rather than one vs one). And that minimum quantity puts a
limit on the maximum quality that I can afford with my limited means. So
it would be more accurate to describe me as a
minimum-(significant)-quantity, maximum-(affordable)-quality advocate...
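
For what it is worth, the same back-of-envelope model as above also shows
why playing the imbalance in pairs helps: if doubling the imbalance
roughly doubles the expected score edge, the number of games needed for
the same significance drops by about a factor of four (the scaling
argument and the example edges are my own illustration):

  from math import ceil

  # Same binomial model as before: games needed scale as 1 / edge**2.
  def games_for_edge(edge, sigmas=2.0):
      return ceil(sigmas**2 * 0.25 / edge**2)

  print(games_for_edge(0.015))  # single imbalance, ~1.5% edge -> ~4445 games
  print(games_for_edge(0.030))  # paired imbalance, ~3.0% edge -> ~1112 games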