Comments/Ratings for a Single Item
'Do you think these piece values will work smoothly with Joker80 running under Winboard F yet remain true to all three models?' Yes, I think these values will not conflict in any way with any of the hard-wired value approximations that are used for pruning decisions. At least not to the point where it would lead to any observable effect on playing strength. (Prunings based on the piece values occur only close to the leaves, and engines are usually quite insensitive to how exactly you prune there.)
I believe the value of a piece should relate to its mobility first and foremost. If one were to rate pieces, one could assign a value of 1 to the most pathetic possible piece in the game and adjust everything else accordingly. How about a pawn that starts out on the second rank, moves only one square backwards, and doesn't capture? That pawn has a value of one. How much more, by contrast, is an Asian chess pawn worth that moves only one square forward and doesn't promote? To base everything on a normal chess pawn is to not provide a full solution for the variant community. Let me provide another challenge for people here regarding pawns. How much is a pawn that moves only one square forward (no initial double move) but starts on the third row instead of the second worth, in contrast to a normal chess pawn? How much is it worth alone, and then in a line of pawns that start on the third row?
'Let me provide another challenge for people here regarding pawns. How much is a pawn that moves only one space forward (not initial 2) but starts on the third row instead of second worth in contrast to a normal chess pawn? How much is it worth alone, and then in a line of pawns that start on the third row?' But this is a totally normal FIDE Pawn... It would get a pretty large positional penalty if it was alone (isolated-pawn penalty). In a complete line of pawns on the 3rd rank it would be worth a lot more, as it would not be isolated, and not be backward. All in all it would be fairly similar to having a line of Pawns on second rank, as the bonus for pushing the Pawns forward 1 square is approximately cancelled by not having Pawn control anymore over any of the squares on the 3rd rank.
It seems like a normal FIDE pawn, but by simply shifting all the pawns up one row, the value of all of them changes. In other words, their value is dependent upon their proximity to other pawns. In light of this, are pieces worth the same in every configuration of Chess960? This issue is more complicated than it appears. Take Near vs Normal Chess, for example. Which side has an advantage? The Near side moves everything up one row and drops castling, but has a back row to either drop the king back to or mobilize the rooks along. And, against this, Near can capture Normal's pawns en passant, but Normal can't do the same to Near. Because of all this, I suggest evaluating the entire configuration of pieces, rather than a single piece.
I believe that is correct [that is what programs like Fritz and Chess Master seem to do... evaluating the two configurations and giving a score for the deviation] but also I would say, evaluate the pieces within the given position. The values are relative and change with every move.
The lowly pawn about to queen is a fine example. The Knight that attacks 8 spaces compared to one that attacks 4 is another, as is the 'bad' [blockaded] Bishop.
Another concept is that of brain power. For example, the late Bobby Fischer's Knights would be much more powerful than mine... not in potential, but in reality of games played. Pieces have potential, but the amount of creative power behind them is an important factor.
Originally, I planned two 'internal playtests'. [By this self-invented term I mean playtests of the standard model of a person against a special model that I have compelling reasons to think may be superior by a provable margin.] The first planned test involves the standard CRC model of Muller against a special CRC model with a higher, closer-to-conventional rook value. Upon closer examination, I suspected that the discrepancy was possibly too small to be detected even with very long time controls. So, I announced that this test was cancelled. Notwithstanding, I may change my mind and return to this unsolved mystery if Joker80 demonstrates unusually high aptitude as a playtesting tool. This might require very deep runs of moves with a completion time of a few weeks to a few months per pair of games to achieve conclusive results. The second planned test involves the standard CRC model of Scharnagl against a special CRC model with a higher, unconventional archbishop value. Scharnagl currently assigns the archbishop a material value of appr. 77% that of the chancellor in his standard CRC model. Muller currently assigns the archbishop a material value of greater than 97% that of the chancellor in his standard CRC model. Nalls currently assigns the archbishop a material value of less than 98% that of the chancellor in his standard CRC model. I devised a special CRC model using identical material values for every piece in the standard CRC model by Scharnagl, except that it assigns the archbishop a material value of exactly 95% that of the chancellor (18% or 1.65 pawns higher). [Note that this figure is slightly more moderate than those by Muller & Nalls.] A discrepancy this large should be detectable at short-to-moderate time controls. This test is now underway.
If either of these tests is successful at establishing or implicating a probability that the special models play stronger than the standard models, then revisions to the standard models may occur. At that juncture, we would be ready to begin 'external playtests'. [By this self-invented term I mean playtests of the standard models of different persons against one another.]
Perhaps we need to look back to exactly why we need piece values. Is it to balance different armies, or just because people are curious? Is the objective to turn Chess Variants into a single balanced game, or something else? Maybe we need to think about the reason for the discussion; then perhaps you can find a way to cut the Gordian knot instead of trying to untangle it.
'Because of all this, I suggest evaluating entire configuration of pieces, rather than a single piece.' This is exactly what Chess engines do. But it is a subject that transcends piece values. Material evaluation is supposed to answer the question: 'what combination of pieces would you rather have, without knowing where they stand on the board'. Piece values are an attempt to approximate the material evaluation as a simple sum of the value of the individual pieces, making up the army. It turns out that material evaluation is by far the largest component of the total evaluation of a Chess position. And this material evaluation again can be closely approximated by a sum of piece values. The most well-known exception is the Bishop pair: having two Bishops is worth about half a Pawn more than double the value of a single Bishop. Other non-additive terms are those that make the Bishop and Rook value dependent on the number of Pawns present. To account for such effects some engines (e.g. Rybka) have tabulated the total value of all possible combinations of material (ignoring promotions) in a 'material table'. Such tables can then also account for the material component of the evaluation that gives the deviation from the sum of piece values due to cooperative effects between the various pieces. Useful as this may be, it remains true that piece values are by far the largest contribution to the total evaluation. The only positional terms that can compete with it are passed pawns (a Pawn on 7th rank is worth nearly 2.5 normal Pawns) and King Safety (having a completely exposed King in the middle game, when the opponent still has a Queen or similar super-piece, can be worth nearly a Rook).
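The additive approximation described here, plus the bishop-pair correction as the best-known non-additive term, can be sketched in a few lines. The centipawn values below are round illustrative numbers of my own choosing, not the tuned values of Rybka, Joker80 or any other engine mentioned in this thread.

```python
# Sketch: material evaluation as a sum of piece values, with the
# bishop pair as a cooperative (non-additive) correction term.
PIECE_VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 950}
BISHOP_PAIR_BONUS = 50  # ~half a Pawn, as stated in the text

def material_eval(pieces):
    """pieces: list of piece letters for one side, e.g. ['R', 'B', 'B', 'P']."""
    score = sum(PIECE_VALUES[p] for p in pieces)
    if pieces.count("B") >= 2:  # cooperative effect: the Bishop pair
        score += BISHOP_PAIR_BONUS
    return score

# Two Bishops are worth more than twice one Bishop:
print(material_eval(["B", "B"]) - 2 * material_eval(["B"]))  # 50
```

A material table, as used by Rybka, would tabulate such corrections for every piece combination instead of computing only this one special case.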
Derek Nalls:
| This might require very deep runs of moves with a completion time
| of a few weeks to a few months per pair of games to achieve
| conclusive results.
It still escapes me what you hope to prove by playing at such an excessively long Time Control. If the result would be different from playing at a more 'normal' TC, like one or two hours per game (which IMO will not be the case), it would only mean that any conclusions you draw from them would be irrelevant for playing Chess at normal TC. Furthermore, playing 2 games will be like flipping a coin. The result, whatever it is, will not prove anything, as it would be different if you would repeat the test. Experiments that do not give a fixed outcome will tell you nothing, unless you conduct enough of them to get a good impression of the probability for each outcome to occur.
I have recently been sufficiently convinced via asymmetrical playtesting (still underway) that the 2 rooks : 1 queen advantage in material values is appr. the same in CRC as in FRC. [I used to think it was higher in CRC.] Consequently, I revised my model (again) and my CRC piece values:

universal calculation of piece values
http://www.symmetryperfect.com/shots/calc.pdf

CRC material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

FRC material values of pieces
http://www.symmetryperfect.com/shots/values-chess.pdf

This change was implemented by raising the value of the queen in CRC, not by lowering the value of the rook.

revised Joker80 values, Nalls standard CRC model:
P85=268=307=518=818=835=950
'If the result would be different from playing at a more 'normal' TC, like one or two hours per game, it would only mean that any conclusions you draw on them would be irrelevant for playing Chess at normal TC.'

Conclusions drawn from playing at normal time controls are irrelevant compared to extremely-long time controls. It is desirable to see what secrets can be discovered from a rarely-viewed vantage of extremely well-played games. Are you not interested at all in analyzing, move by move, games played better than almost any pair of human players is capable of? You do not seem to understand that I, too, am discontent with the probability of a small number of wins or losses in a row. This is a compensation that reduces, to the greatest extent attainable, the chance that the games were randomly played and consequently, the winner or loser randomly determined.

'... playing 2 games will be like flipping a coin.'

Correction: playing 1 game will be like flipping a coin ... once. Playing 2 games will be like flipping a coin ... twice. The chance of getting the same flip (heads or tails) twice in a row is 1/4. Not impressive, but a decent beginning. Add a couple or a few or several consecutive same flips and it departs 'luck' by a huge margin.

'The result, whatever it is, will not prove anything, as it would be different if you would repeat the test. Experiments that do not give a fixed outcome will tell you nothing, unless you conduct enough of them to get a good impression on the probability for each outcome to occur.'

I have wondered why the performance of computer chess programs is unpredictable and varied even under identical controls. Despite their extraordinary complexity, I think of computer hardware, operating systems and applications (such as Joker80) as deterministic. The details of the differences in outcomes do not concern me.
In fact, to the extent that your remarks are true, they will support my case if my playtesting is successful that the unlikelihood of achieving the same outcome (i.e., wins or losses for one player) is extreme. I am pleased to report that I estimate it will be possible, over time, to generate enough experiments using Joker80 to have meaning for a high-quality, low-quantity advocate (such as myself) and even a moderate-quality, moderate-quantity advocate (such as Scharnagl). As for a low-quality, high-quantity advocate (such as you), you will always be disappointed as you are impossible to please.
Derek:
| Conclusions drawn from playing at normal time controls are
| irrelevant compared to extremely-long time controls.
First, that would only be true if the conclusions actually depended on the TC. Which is a totally unproven conjecture on your part, and in fact contrary to any observation made at TCs where such observations can be made with any accuracy (because enough games can be played). This whole thing reminds me of my friend, who always claims that stones fall upward. When I then drop a stone to refute him, he just shrugs and says it proves nothing because the stone is 'not big enough'. Very conveniently for him, the upward falling of stones can only be observed on stones that are too big for anyone to lift... But the main point is of course: if you draw a conclusion that is valid only at a TC that no one is interested in playing, what use would such a conclusion be?
| The chance of getting the same flip (heads or tails) twice-in-a-row
| is 1/4. Not impressive but a decent beginning. Add a couple or a
| few or several consecutive same flips and it departs 'luck' by a
| huge margin.
Actually the chance for twice the same flip in a row is 1/2, unless you are biased as to what the outcome of the flip should be (one-sided testing). And indeed, 10 identical flips in a row would be unlikely to occur by luck by a large margin. But that is rather academic, because you won't see 10 identical results in a row between the subtly different models. You will see results like 6-4 or 7-3, which will again be very likely to be a result of luck (as that is exactly what they are the result of, as you would realize after 10,000 games when the result is standing at 4,628-5,372). Calculate the number of games you typically need to get a result for a 53-47 advantage that could not just as easily have been obtained from a 50-50 chance with a little luck. You will be surprised...
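The closing challenge can be answered with a standard back-of-the-envelope estimate (my sketch, not necessarily H.G.'s own method): over n win/loss games the observed score fraction has standard deviation 0.5/sqrt(n), so resolving a 3% edge at two standard deviations requires roughly a thousand games.

```python
import math

def games_needed(edge, sigmas=2.0):
    # std. dev. of the observed score fraction over n games is 0.5 / sqrt(n);
    # require edge >= sigmas * 0.5 / sqrt(n), i.e. n >= (sigmas * 0.5 / edge)^2
    return math.ceil((sigmas * 0.5 / edge) ** 2)

print(games_needed(0.03))  # 1112 games to resolve a 53-47 advantage
```

Draws and the one-sided/two-sided distinction shift the exact number, but not its order of magnitude, which is the point being made here.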
| I have wondered why the performance of computer chess programs is
| unpredictable and varied even under identical controls. Despite
| their extraordinary complexity, I think of computer hardware,
| operating systems and applications (such as Joker80) as deterministic.
In most engines there always is some residual indeterminism, due to timing jitter. There are critical decision points, where the engine decides if it should do one more iteration or not (or search one more move vs aborting the iteration). If it took such decisions purely on internal data, like node count, it would play 100% reproducibly. But most engines use the system clock (to not forfeit on time if the machine is also running other tasks), and experience the timing jitter caused by other processes running, or rotational delays of the hard disk they had been using. In multi-threaded programs this is even worse, as the scheduling of the threads by the OS is unpredictable. Even the position where exactly the program is loaded in physical memory might have an effect. But in Joker the source of indeterminism is much less subtle: it is programmed explicitly. Joker uses the starting time of the game as the seed of a pseudo-random-number generator, and uses the random numbers generated with the latter as a small addition to the evaluation, in order to lift the degeneracy of exactly identical scores, and provide a bias for choosing the move that leads to the widest choice of equivalent positions later. The non-determinism is a boon, rather than a bust, as it allows you to play several games from an identical position, and still do a meaningful sampling of possible games, and of the decisions that lead to their results. If one position would always lead to the same game, with the same result (as would occur if you were playing a simple end-game with the aid of tablebases), it would not tell you anything about the relative strength of the armies.
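The mechanism described here (seeding a PRNG with the game's starting time and adding a tiny random term to each evaluation) can be sketched as follows. The class name, interface and 4-centipawn scale are assumptions made for illustration; this is not Joker's actual source.

```python
import random
import time

class NoisyEval:
    """Illustrative sketch only; names and scale are invented, not Joker's code."""

    def __init__(self, seed=None, noise_cp=4):
        # seed the PRNG with the game's starting time, as described above
        self.rng = random.Random(time.time() if seed is None else seed)
        self.noise_cp = noise_cp

    def evaluate(self, static_score_cp):
        # a tiny random addition lifts the degeneracy of exactly equal
        # scores, biasing the search toward branches with many equivalent
        # follow-up positions
        return static_score_cp + self.rng.randint(0, self.noise_cp)
```

Two runs started at different times draw different noise and so can pick different moves from an identical position, which is exactly what makes repeated games from one position a meaningful sample.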
It would only tell you that this particular position was won / drawn. But nothing about the millions of other positions with the same material on the board. And the value of the material is by definition an average over all these positions. So with deterministic play, you would be forced to sample the initial positions, rather than using the indeterminism of the engine to create a representative sample of positions before anything is decided.
| In fact, to the extent that your remarks are true, they will
| support my case if my playtesting is successful that the
| unlikelihood of achieving the same outcome (i.e., wins or
| losses for one player) is extreme.
This sentence is too complicated for me to understand. 'Your case' is that 'the unlikelihood of achieving the same outcome is extreme'? If the unlikelihood is extreme, is that the same as that the likelihood is extreme? Is the 'unlikelihood to be the same' the same as the 'likelihood to be different'? What does 'extreme' mean for a likelihood? Extremely low or extremely high? I wonder if anything is claimed here at all... I think you make a mistake by seeing me as a low-quality advocate. I only advocate the minimum quantity needed to not make the results inconclusive. Unfortunately, that is high, despite my best efforts to make it as low as possible through asymmetric playtesting and playing material imbalances in pairs (e.g. 2 Chancellors against 2 Archbishops, rather than one vs one). And that minimum quantity puts limits on the maximum quality that I can afford with my limited means. So it would be more accurate to describe me as a minimum-(significant)-quantity, maximum-(affordable)-quality advocate...
'Actually the chance for twice the same flip in a row is 1/2.'

Really? You obviously need a lesson on probability. Let us start with elementary stuff. Mathematical Ideas, fifth edition, Miller & Heeren, 1986. It is an old college textbook from a class I took in the mid-90's. [Yes, I passed the class.] It says interesting things such as:

'The relative frequency with which an outcome happens represents its probability.'

'In probability, each repetition of an experiment is a trial. The possible results of each trial are outcomes.'

An example of a probability experiment is 'tossing a coin'. Each 'toss' (trial of the experiment) has only two equally possible outcomes, 'heads' or 'tails' ... assuming the condition that the coin is fair (i.e., not loaded).

probability = p
heads = h
tails = t
number of tosses = x
exponentiation = ^ [a single-line substitute for writing an exponent as a superscript to the upper right of a base]

probability of heads = p(h)
probability of tails = p(t)
p(h) and p(t) are bases; x is an exponent.
p(h) = 0.5
p(t) = 0.5

What follows are examples of the chances of getting the same result upon EVERY consecutive toss.

1 time (x = 1): p(h) ^ x = 0.5 ^ 1 = 0.5; p(t) ^ x = 0.5 ^ 1 = 0.5
[Note: in this case only, p(h) + p(t) = 1.0]

2 times (x = 2): p(h) ^ x = 0.5 ^ 2 = 0.25; p(t) ^ x = 0.5 ^ 2 = 0.25

3 times (x = 3): p(h) ^ x = 0.5 ^ 3 = 0.125; p(t) ^ x = 0.5 ^ 3 = 0.125

Etc.

By a function that is the inverse of successive powers of base 2, the chance for consecutive tosses to yield the same result rapidly becomes extremely small. When this occurs, there are only two possibilities: 'random good-bad luck', or an unfair advantage-disadvantage exists (i.e., 'the coin is loaded'). The sum of these two possibilities always equals 1.
random luck (good or bad) = l
unfair (advantage or disadvantage) = u
luck (heads) = l(h); luck (tails) = l(t)
unfair (heads) = u(h); unfair (tails) = u(t)

p(h) ^ x = l(h)
p(t) ^ x = l(t)
l(h) + u(h) = 1
l(t) + u(t) = 1

Therefore, as the chances of 'random good-bad luck' become extremely low in the example, the chances become likewise extremely high that an advantage-disadvantage exists for 'one side of the coin' or (if you follow the analogy) 'one side of the gameboard' or 'one player' or 'one set of piece values'. Only if it can be proven that an advantage-disadvantage does not exist for one player can it be accepted that the extremely unlikely event occurred by 'random good-bad luck'. It is essential to understand that random good luck or random bad luck cannot be consistently relied upon. From this fact alone, firm conclusions can be responsibly drawn with a strong probability of correctness.

1 time (x = 1): p(h) ^ x = 0.5, u(h) = 0.5; p(t) ^ x = 0.5, u(t) = 0.5
2 times (x = 2): p(h) ^ x = 0.25, u(h) = 0.75; p(t) ^ x = 0.25, u(t) = 0.75
3 times (x = 3): p(h) ^ x = 0.125, u(h) = 0.875; p(t) ^ x = 0.125, u(t) = 0.875
Etc.
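Derek's table reduces to a one-line function. This just restates the arithmetic above, in its one-sided form (the favored side of the coin is fixed in advance):

```python
def p_same_side(x, p=0.5):
    # probability that x fair tosses all land on one pre-chosen side
    return p ** x

for x in (1, 2, 3):
    print(x, p_same_side(x))  # 0.5, 0.25, 0.125
```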
'... in Joker the source of indeterminism is much less subtle: it is programmed explicitly.'

This renders Joker80 totally unsuitable for my playtesting purposes. [I am just relieved that you told me this bizarre fact now, before I invested large amounts of computer time and effort.] It is critically important that any AI program attempt (to the greatest of its capability) to pinpoint the single, very best possible move in the time allowed upon every move in the game, even if this means that it would often repeat an identical move from an identical position. Do you not realize that forcing Joker80 to do otherwise must reduce its playing strength significantly from its maximum potential?
'Do you not realize that forcing Joker80 to do otherwise must reduce its playing strength significantly from its maximum potential?' On the contrary, it makes it stronger. The explanation is that, by adding a random value to the evaluation, branches with very many equal end leaves have a much larger probability to have the highest random bonus amongst them than a branch that leads to only a single end-leaf of that same score. The difference can be observed most dramatically when you evaluate all positions as zero. This makes all moves totally equivalent at any search depth. Such a program would always play the first legal move it finds, and would spend the whole game moving its Rook back and forth between a1 and b1, while the opponent is eating all its other pieces. OTOH, a program that evaluates every position as a completely random number starts to play quite reasonable chess once the search reaches 8-10 ply, because it is biased to seek out moves that lead to pre-horizon nodes that have the largest number of legal moves, which usually are the positions where the strongest pieces are still in its possession. It is always possible to make the random addition so small that it only decides between moves that would otherwise have exactly equal evaluation. But this is not optimal, as it would then prefer a move (in the root) that could lead (after 10 ply or so) to a position of score 53 (centiPawn), while all other choices later in the PV would lead to -250 or worse, over a move that could lead to 20 different positions (based on later move choices) all evaluating as 52cP. But, as the scores were just approximations based on finite-depth search, two moves later, when it can look ahead further, all the end-leaf scores will change from what they were, because those nodes are now no longer end-leaves. The 53cP might now be 43cP, because deeper search revealed it to disappoint by 10cP.
But alas, there is no choice: the alternatives in this branch might have changed a little too, but now all range from -200 to -300. Not much help; we have to settle for the 43cP... Had it taken the root move that keeps the option open to go to any of the 20 positions of 52cP, it would now see that their scores on deeper search would have been spread out between 32cP and 72cP, and it could now go for the 72cP. In other words, the investment of keeping its options open, rather than greedily committing itself to going for an uncertain, only marginally better score, typically pays off. To properly weight the expected pay-back of keeping options that at the current search depth seem inferior, it must have an idea of the typical change of a score from one search depth to the next. And match the size of the random eval addition to that, to make sure that even slightly (but insignificantly) worse end-leaves still contribute to enhancing the probability that the branch will be chosen. Playing a game in the face of an approximate (and thus noisy) evaluation is all about contingency planning.

As to the probability theory, you don't seem to be able to see the math because of the formulae...

P(hh) = 0.5 * 0.5 = 0.25
P(tt) = 0.5 * 0.5 = 0.25
P(two equal) = P(hh) + P(tt) = 0.5
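The keeping-options-open argument can be tested with a small Monte-Carlo sketch of my own (the 10 cP depth-to-depth score spread is an assumed figure, not a measured one): one root move leads to a single position scored 53 cP, the other to 20 positions all scored 52 cP; deeper search shifts each score by noise, and being able to pick the best of 20 re-scored options wins on average.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
TRIALS, NOISE = 20_000, 10.0  # NOISE: assumed depth-to-depth score change (cP)

# Root move A: commits to a single continuation currently scored 53 cP.
single = sum(53 + random.gauss(0, NOISE) for _ in range(TRIALS)) / TRIALS

# Root move B: keeps 20 continuations open, all scored 52 cP; after
# re-scoring at the deeper search we can pick the best of them.
best_of_20 = sum(
    max(52 + random.gauss(0, NOISE) for _ in range(20)) for _ in range(TRIALS)
) / TRIALS

print(round(single), round(best_of_20))  # roughly 53 vs 71: options pay off
```

The expected maximum of 20 noisy re-scores sits well above any single one of them, which is exactly why a random bonus proportional to the breadth of a branch is a rational investment.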
Harm wrote: ... 'OTOH, a program that evaluates every position as a completely random number starts to play quite reasonable ches, once the search reaches 8-10 ply. Because it is biased to seek out moves that lead to pre-horizon nodes that have the largest number of legal moves, which usually are the positions where the strongest pieces are still in its possession.' ... This is nothing but a probability based heuristic simulating a mobility evaluation component. But having a working positional evaluation, especially when also covering mobility, that randomizing method is not orthogonal to the calculated much more appropriate knowledge. Thus you will overlay a much better evaluation by a disturbing noise generator. Nevertheless this approach might have advantages through the opening, preventing some else working implementations of preinvestigated killer combinations.
Indeed, it is a stochastic way to simulate mobility evaluation. In the presence of other terms it should of course not be made so large that it dominates the total evaluation, just as explicit mobility terms should not dominate the evaluation. But its weight should not be set to zero either: properly weighted mobility might add more than 100 Elo to an engine. Joker has no explicit mobility in its evaluation, and relies entirely on this probabilistic mechanism to simulate it. The disadvantage is that, because of the probabilistic nature, it is not 100% guaranteed to always take the best decision. On rare occasions the single acceptable end leaf does draw a higher random bonus than one hundred slightly better positions in another branch. OTOH it is extremely cheap to implement, while explicit mobility is very expensive. As a result, I might gain an extra ply in search depth. And then it becomes superior to explicit mobility, as it only counts tactically sound moves, rather than just every move. So it is like safe mobility, verified by a full Quiescence Search. In my assessment, the probabilistic mobility adds more strength to Joker than changing the Rook value by 50cP would add or subtract. This can be easily verified by play-testing. It is possible to switch this evaluation term off. In fact, you have to switch it on, but WinBoard does this by default. To prevent it from being switched on, one should run WinBoard with the command-line option /firstInitString='new'. (The default setting is 'new\nrandom'. If Joker is running as second engine, you will of course have to use /secondInitString='new'.)
'Actually the chance for twice the same flip in a row is 1/2.' H.G. is correct here. - The probability of two heads in a row is 1/4. - The probability of two tails in a row is 1/4. - The probability of two same flips in a row is the sum of these two outcomes: 1/4 + 1/4 = 1/2. Another way to think about it: With two coin flips, there are 4 equally likely outcomes: HH, HT, TH, TT. In 2 of the 4 (equally likely) outcomes, the same flip result occurs twice in a row.
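The enumeration above, in code form:

```python
from itertools import product

outcomes = list(product("HT", repeat=2))      # HH, HT, TH, TT
same = [o for o in outcomes if o[0] == o[1]]  # HH, TT
print(len(same) / len(outcomes))  # 0.5
```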
Well, when you said ... 'Actually the chance for twice the same flip in a row is 1/2.' ... that was vague and misleading. I thought you meant 'heads' twice OR 'tails' twice equals a chance of 1/2 instead of the sum of 'heads' twice AND 'tails' twice equals a chance of 1/2. Since English is a second language to you, of course I will overlook this minor mis-communication and even apologize for implicitly accusing you of incompetence. However, you should expect that you will draw critical reactions from others when you have previously, falsely, explicitly accused them of incompetence in a subject matter.
I would have thought that 'twice the same flip in a row' was pretty unambiguous, especially in combination with the remark about two-sided testing. But let's not quibble about the wording. The point was that for two-sided testing, if you suspect a coin to be loaded but have no idea if it is loaded to produce tails or heads, the two flips tell you exactly nothing. They are either the same or different, and on an unbiased coin each would occur with equal probability. So the 'confidence' of any conclusion as to the fairness of the coin drawn from the two flips would be only 50%. I.e. no better than totally random: you might as well have guessed whether it was fair without flipping it at all. That would also have given you a 50% chance of guessing correctly.
The reason you have never been able to find any correlation between winning probabilities for one army and time controls [contrary to the experiences of people using other AI programs] in asymmetrical playtests using Joker80 is that you have destructively randomized the algorithm within your program to such an extent that it fails to measurably improve the quality of its moves as a function of time or plies completed. A program with serious problems of this nature may do well in speed chess, but at truly long time controls against quality programs that improve as they should with time or plies per move, it cannot consistently win. I have two useful, important pieces of news for you:

1. All of the statistical data you have generated using Joker80 (appr. 20,000+ games) is corrupt. It must all be thrown out and started over from scratch after you repair Joker80.

2. All of your material values for CRC pieces are unreliable, since they are based upon and derived from #1 (corrupt statistical data).

I hope you can handle constructive advice.
Derek: 'I hope you can handle constructive advice.'

It gives me a big laugh, that's for sure. Of course none of what you say is even remotely true. That is what happens if you jump to conclusions regarding complex matters you are not knowledgeable about, without even taking the trouble to verify your ideas. Of course I extensively TESTED how the playing strength of Joker80 (and all available other engines) varied as a function of time control. This was the purpose of several elaborate time-odds tournaments I conducted, where various versions of most engines participated that had to play their games in 36, 12, 4, 1:30, 0:40 or 0:24 min, and where handicapped engines were meeting non-handicapped ones in a full round robin. (I.e. the handicaps were factors 3, 9, 24, 54 or 90, where only the strongest engines were handicapped up to the very maximum, and the weakest only participated in an unhandicapped version.) And of course Joker80 behaves similarly to any Shannon-type engine that is reasonably free of bugs: its playing strength measured in Elo monotonically increases in a logarithmic fashion, approximately according to the formula rating = 100*ln(time). So Joker80 at 5 min/move crushes Joker80 at 1 sec per move, as you could have easily found out for yourself. So much for your nonsense about Joker80 failing to improve its move quality with time. For some discussion of one of the tournaments, see: http://www.talkchess.com/forum/viewtopic.php?t=19764&postdays=0&postorder=asc&topic_view=flat&start=34 At that time Fairy-Max still had a hash-table bug that made it hang (and subsequently forfeit on time), striking at a fixed rate per second, so that Fairy-Max started to forfeit more and more games at longer TC. Since then the bug has been identified and repaired, and now Fairy-Max also performs progressively better at longer TC. So nice try, but next time better save your breath for telling the surgeon how to do his job before he performs open-heart surgery on you.
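Taking the quoted formula at face value, each doubling of thinking time is worth about 100*ln(2) ≈ 69 Elo, and the absolute offset cancels whenever two settings are compared (so the time unit does not matter). The 5-min-vs-1-sec example works out as:

```python
import math

def rating(time_seconds):
    # the logarithmic strength growth quoted above: rating ≈ 100 * ln(time)
    return 100 * math.log(time_seconds)

# Joker80 at 5 min/move vs Joker80 at 1 sec/move:
print(round(rating(300) - rating(1)))  # ≈ 570 Elo difference
```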
Because he no doubt has much more to learn from you about cardiology than I have in the area of building Chess engines... Things are as they are, and can become known by observation and testing. Believing in misconceptions born out of ignorance is not really helpful. Or, more explicitly: if you think you know how to build better Chess engines than other people, by all means, do so. It will be fun to confront your ideas with reality. In the meantime I will continue to build them as I think best (and know is best, through extensive testing), so you should have every chance to surpass them. Lacking that, you could at least _use_ the engines of others to check whether your theories of how they behave bear any relation to reality. You don't have to depend on the time-odds tourneys and other tests I conduct. You might not even be aware of them, as the developers of Chess engines hardly ever publish the thousands of games they play to test whether their ideas work in practice.
I am slightly relieved and surprised that Joker80 measurably improves the quality of its moves as a function of time or plies completed over a range of speed-chess tournaments. Nonetheless, completing games of CRC (where a long, close, well-played game can require more than 80 moves per player) in 0:24 to 36 minutes does NOT qualify as long, or even moderate, time controls. In the case of your longest 36-minute games, with an example total of 160 moves, that allows just 13.5 seconds per move per player. In fact, that is an extremely short time by any serious standard. I consider 10 minutes per move a moderate time that produces results of marginal, unreliable quality, and 60-90 minutes per move a long time that produces results of acceptable, reliable quality. Ask Reinhard Scharnagl or ET about the longest time per move they have used testing openings with their programs playing 'Unmentionable Chess': 24 hours per move!

It is noteworthy that you are now resorting to playing dirty by using the 'exclusivist argument': essentially, 'since I am not a computer chess programmer, I cannot possibly know what I am talking about when I dare criticize an important working of your Joker80 program'. What you fail to take into account is that I am a playtester with more experience than you at truly long time controls. If you will not listen to what I am trying to tell you, then why will you not listen to Scharnagl? After all, he is also a computer chess programmer with a lot of knowledge of important subject matters (such as mathematics).

You really should not be laughing. This is a serious problem. Your sarcastic reaction does nothing to reassure me that you will competently investigate it, confirm it and fix it. Now, please do not misconstrue my remarks. My intent is not to overstate the problem. I realize Joker80 in its present form is not a totally random 'woodpusher'.
It would not be able to win any short-time-control tournaments if that were the case. In fact, I believe you when you state that you have not experienced any problems with it, but I think this is strictly because you have not done any truly long time control playtesting with it.

You must decide upon and define the best primary function for your Joker80 program:

1. To pinpoint the single, very best move available from any position. [Ideally, repeats could produce an identical move.]

OR

2. To produce a different move from any position upon most repeats. [At best, by randomly choosing amongst a short list of the best available moves.]

These two objectives are mutually exclusive; it is impossible and self-contradictory for a program to accomplish both. Virtually every AI game developer in the world except you chooses #1 over #2 by a long shot, in terms of the move quality produced on average. If you do not even commit your AI program to TRYING to find the single best move available, because you think variety is just a whole lot more interesting and fun, then it will be soft competition at truly long time controls against other quality AI programs that are frequently pinpointing the single best move available and playing it against you.
Derek Nalls:
| Nonetheless, completing games of CRC (where a long, close,
| well-played game can require more than 80 moves per player)
| in 0:24 minutes - 36 minutes does NOT qualify as long or even,
| moderate time controls. In the case of your longest 36-minute games,
| with an example total of 160 moves, that allows just 13.5 seconds per
| move per player. In fact, that is an extremely short time by any
| serious standards.

In my experience most games take only 60 moves on average (perhaps because of the large strength difference between the players). As early moves are more important for the game result than late moves (even the best moves late in the game do not help you if your position is already lost), most engines use 2.5% of the remaining time for their next move (on average, depending on how the iterations end compared to the target time). That would be nearly 54 sec/move at 36 min/game in the decisive phase of the game. That is more than you thought, but admittedly still fast. Note, however, that I also played 60-min games in the General Championship (without time odds), and that Joker80 confirmed the lead over its competitors that it manifested at faster time controls.

But I don't see the point: Joker80's strength increases with time as expected, in the range from 0.4 sec to 36 sec per move, in a regular and theoretically expected way. This is over the entire range where I tested the dependence of the scoring percentage on various material imbalances (which extended to only 15 sec/move), and found it to be independent of TC. So your 'explanation' for the latter phenomenon is just nonsense: the effect you mention is observed NOT to occur, and thus cannot explain anything that was observed to occur. Now if you want to conjecture that this will all miraculously become very different at longer TC, you are welcome to test it and show us convincing results. I am not going to waste my computer time on such a wild and expensive goose chase.
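The 2.5%-of-remaining-clock rule above can be sketched in a few lines (the function name and the simplification, ignoring the adjustment for where iterations end, are mine):

```python
def move_budget(remaining_seconds, fraction=0.025):
    """Target thinking time for the next move: a fixed fraction
    (here 2.5%, as described above) of the time still on the clock.
    A real engine would also round to the end of a search iteration."""
    return fraction * remaining_seconds

# At the start of a 36-minute game: 2.5% of 2160 s = 54 s.
print(move_budget(36 * 60))
```

Because the budget is a fixed fraction of what remains, the engine spends the most time on the early, decisive moves and automatically throttles down as the clock runs low, which is the reasoning given in the post.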
Because from the way I know the engines work, I know that they are 'scalable': their performance at 10 ply results from one ply being put in front of 9-ply search trees, and that extra ply will always help. If they have good 9-ply trees, they will have even better 10-ply trees. But you don't have to take my word for it. You have the engine, and if you don't want to believe that at 1 hour per move you will get the same win probability as at 1 sec/move, or that at 1 hour per move it won't beat 10 min/move, just play the games and you will see for yourself. It would even be appreciated if you published the games here or on your website. But, needless to say, one or two games won't convince anyone of anything.

| 'since I am not a computer chess programmer, I cannot possibly
| know what I am talking about when I dare criticize an important
| working of your Joker80 program'

Well, you certainly make it appear that way. Despite the elaborate explanation I gave of why programs derive extra strength from this technique, you still draw a conclusion that was already shown in practice to be 100% wrong. And if you think you will run into the problem you imagine at enormously longer TC, well, very simple: don't use Joker80, but use some other engine. You are on your own there, as I am not specifically interested in extremely long TC. There is always a risk in using equipment outside the range of conditions for which it was designed and tested, and that risk is entirely yours. So better tread carefully, and make sure you rule out the perceived dangers by careful testing.

| You must decide upon and define the primary function of your
| Joker80 program.

I do not see the dilemma you sketch. The purpose is to play ON AVERAGE the best possible move. If you do that, you have the best chance to win the game. If I can achieve that through a non-deterministic algorithm better than through a deterministic one, I go for the nondeterministic method.
That it also diversifies play and makes me less sensitive to prepared openings from the opponent is a win/win situation, not a compromise. As I explained, it is very easy to switch this feature off, but you should be prepared for a significant loss of strength if you do.
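One common way to realize the nondeterministic method described above is to pick at random among all moves scoring within a small margin of the best one. A minimal sketch; the function name, the list-of-pairs interface, and the 10-centipawn margin are illustrative assumptions, not Joker80's actual mechanism:

```python
import random

def pick_move(scored_moves, margin=10, rng=random):
    """Nondeterministic move choice: rather than always playing the
    single top-scoring move, pick uniformly at random among every
    move whose score lies within `margin` (centipawns) of the best.
    scored_moves is a list of (move, score) pairs."""
    best = max(score for _, score in scored_moves)
    candidates = [move for move, score in scored_moves if best - score <= margin]
    return rng.choice(candidates)

moves = [("e2e4", 32), ("d2d4", 30), ("g1f3", 25), ("a2a3", -40)]
print(pick_move(moves))  # one of e2e4 / d2d4 / g1f3; never a2a3
```

Because the candidates are all nearly equal in evaluated score, the average quality of the chosen move is close to that of the single top move, while repeated games from the same position diverge, which is the diversification benefit claimed in the post.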