Single Comment

Piece Values[Subject Thread] [Add Response]

H. G. Muller wrote on Sun, May 25, 2008 09:14 AM UTC:

'Do not you realize that forcing Joker80 to do otherwise must reduce its
playing strength significantly from its maximum potential?'

On the contrary, it makes it stronger. The explanation is that by adding a
random value to the evaluation, branches with very many equal end leaves
have a much larger probability to have the highest random bonus amongst
them than a branch that leads to only a single end-leaf of that same
score.

The difference can be observed most dramatically when you evaluate all
positions as zero. This makes all moves totally equivalent at any search
depth. Such a program would always play the first legal move it finds, and
would spend the whole game moving its Rook back and forth between a1 and
b1, while the opponent is eating all its other pieces. OTOH, a program
that evaluates every position as a completely random number starts to play
quite reasonable ches, once the search reaches 8-10 ply. Because it is
biased to seek out moves that lead to pre-horizon nodes that have the
largest number of legal moves, which usually are the positions where the
strongest pieces are still in its possession.

It is always possible to make the random addition so small that it only
decides between moves that would otherwise have exactly equal evaluation.
But this is not optimal, as it would then prefer a move (in the root) that
could lead (after 10 ply or so) to a position of score 53 (centiPawn),
while all other choices later in the PV would lead to -250 or worse, over
a move that could lead to 20 different positions (based on later move
choices) all evaluating as 52cP. But, as the scores were just
approximations based on finite-depth search, two moves later, when it can
look ahead further, all the end-leaf scores will change from what they
were, because those nodes are now no longer end-leaves. The 53 cP might
now be 43cP because deeper search revealed it to disappoint by 10cP. But
alas, there is no choice: the alternatives in this branch might have
changed a little too, but now all range from -200 to -300. Not much help,
whe have to settle for the 43cP... 

Had it taken the root move that keeps the option open to go to any of the
20 positions of 52cP, it would now see that their scores on deeper search
would have been spread out between 32cP and 72cP, and it could now go for
the 72cP. In other words, the investment of keeping its options open
rather than greedily commit itself to going for an uncertain, only
marginally better score, typically pays off. 

To properly weight the expected pay-back of keeping options that at the
current search depth seem inferior, it must have an idea of the typical
change of a score from one search depth to the next. And match the size of
the random eval addition to that, to make sure that even sligtly (but
insignificantly) worse end-leaves still contribute to enhancing the
probability that the branch will be chosen. Playing a game in the face of
an approximate (and thus noisy) evaluation is all about contingency
planning.

As to the probability theory, you don't seem to be able to see the math
because of the formulae...

P(hh) = 0.5*0.5 = 0.25
P(tt) = 0.5*0.5 = 0.25
______________________+
P(two equal)    = 0.5