Comments by HGMuller
'I cannot speak for Reinhard Scharnagl at all, though.' This is exactly the problem. 'Base value' for Pawns is a very ill-defined concept, as it is the smallest of all piece base values, while the positional terms relating to Pawns are usually the largest of all positional terms. And the whole issue of pawn-structure evaluation in Joker is so complex that I am not even sure if the average of positional terms (over all pawns and over a typical game) is positive or negative. Pawns get penalties for being doubled, or for having no Pawns next to or behind them on neighboring files. They get points for advancing, but they get penalties for creating squares that can no longer be defended by any Pawn. My guess is that in general the positional terms are slightly positive, even for non-passers not involved in King Safety. A statement like 'a Knight is worth exactly 3 Pawns' is only meaningful after exactly specifying which kind of pawn. If the Scharnagl model evaluates all non-passers exactly the same (except, perhaps, edge Pawns), then the question still arises how to most closely approximate that in Joker80, which doesn't. And simply setting the Joker80 base value equal to the single value of the Scharnagl model is very unlikely to do it. Good differentiation in Pawn evaluation is likely to impact playing strength much more than the relative value of Pawns and Pieces, as Pawns are traded for other Pawns (or such trades are declined by pushing the Pawn and locking the chains) much more often than they can be traded for Pieces.
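To make the kind of pawn-structure terms described above concrete, here is a minimal sketch in Python. The weights are invented for illustration; Joker's actual terms and values are not given in this thread.

```python
def pawn_term(file, rank, pawn_files):
    """Positional term for one pawn, in centipawns (illustrative weights only).

    file, rank: the pawn's coordinates (rank 2 = its starting rank)
    pawn_files: list of files occupied by pawns of the same color
    """
    score = 0
    if pawn_files.count(file) > 1:
        score -= 20                      # doubled-pawn penalty
    if (file - 1) not in pawn_files and (file + 1) not in pawn_files:
        score -= 15                      # isolated-pawn penalty
    score += 4 * (rank - 2)              # small bonus for advancing
    return score
```

Whether the average of such terms over a whole game comes out positive or negative depends entirely on the weights, which is exactly the point being made above.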
'Do you think these piece values will work smoothly with Joker80 running under Winboard F yet remain true to all three models?' Yes, I think these values will not conflict in any way with any of the hard-wired value approximations that are used for pruning decisions. At least not to the point where it would lead to any observable effect on playing strength. (Prunings based on the piece values occur only close to the leaves, and engines are usually quite insensitive to how exactly you prune there.)
'Let me provide another challenge for people here regarding pawns. How much is a pawn that moves only one space forward (not initial 2) but starts on the third row instead of second worth in contrast to a normal chess pawn? How much is it worth alone, and then in a line of pawns that start on the third row?' But this is a totally normal FIDE Pawn... It would get a pretty large positional penalty if it was alone (isolated-pawn penalty). In a complete line of pawns on the 3rd rank it would be worth a lot more, as it would not be isolated, and not be backward. All in all it would be fairly similar to having a line of Pawns on second rank, as the bonus for pushing the Pawns forward 1 square is approximately cancelled by not having Pawn control anymore over any of the squares on the 3rd rank.
'Because of all this, I suggest evaluating entire configuration of pieces, rather than a single piece.' This is exactly what Chess engines do. But it is a subject that transcends piece values. Material evaluation is supposed to answer the question: 'what combination of pieces would you rather have, without knowing where they stand on the board'. Piece values are an attempt to approximate the material evaluation as a simple sum of the value of the individual pieces, making up the army. It turns out that material evaluation is by far the largest component of the total evaluation of a Chess position. And this material evaluation again can be closely approximated by a sum of piece values. The most well-known exception is the Bishop pair: having two Bishops is worth about half a Pawn more than double the value of a single Bishop. Other non-additive terms are those that make the Bishop and Rook value dependent on the number of Pawns present. To account for such effects some engines (e.g. Rybka) have tabulated the total value of all possible combinations of material (ignoring promotions) in a 'material table'. Such tables can then also account for the material component of the evaluation that gives the deviation from the sum of piece values due to cooperative effects between the various pieces. Useful as this may be, it remains true that piece values are by far the largest contribution to the total evaluation. The only positional terms that can compete with it are passed pawns (a Pawn on 7th rank is worth nearly 2.5 normal Pawns) and King Safety (having a completely exposed King in the middle game, when the opponent still has a Queen or similar super-piece, can be worth nearly a Rook).
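As a minimal illustration of material evaluation as a sum of piece values plus a non-additive bishop-pair term (the centipawn numbers here are conventional textbook approximations, not Joker80's or Rybka's actual values):

```python
# Conventional approximate piece values in centipawns (illustrative only).
PIECE_VALUE = {'P': 100, 'N': 325, 'B': 325, 'R': 500, 'Q': 950}

def material_eval(pieces):
    """Material score of one army, given as a list of piece letters."""
    total = sum(PIECE_VALUE[p] for p in pieces)
    if pieces.count('B') >= 2:
        total += 50   # bishop pair: ~half a Pawn beyond twice a lone Bishop
    return total
```

A material table as used by e.g. Rybka would tabulate such totals (plus further cooperative corrections) for every material combination, but the additive sum above already captures most of it.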
Derek Nalls: | This might require very deep runs of moves with a completion time | of a few weeks to a few months per pair of games to achieve | conclusive results. It still escapes me what you hope to prove by playing at such an excessively long Time Control. If the result would be different from playing at a more 'normal' TC, like one or two hours per game (which IMO will not be the case), it would only mean that any conclusions you draw from them would be irrelevant for playing Chess at normal TC. Furthermore, playing 2 games will be like flipping a coin. The result, whatever it is, will not prove anything, as it would be different if you repeated the test. Experiments that do not give a fixed outcome tell you nothing, unless you conduct enough of them to get a good impression of the probability for each outcome to occur.
Derek: | Conclusions drawn from playing at normal time controls are | irrelevant compared to extremely-long time controls. First, that would only be true if the conclusions would actually depend on the TC. Which is a totally unproven conjecture on your part, and in fact contrary to every observation made at TCs where such observations can be made with any accuracy (because enough games can be played). This whole thing reminds me of my friend, who always claims that stones fall upward. When I then drop a stone to refute him, he just shrugs, and says it proves nothing because the stone is 'not big enough'. Very conveniently for him, the upward falling of stones can only be observed on stones that are too big for anyone to lift... But the main point is of course: if you draw a conclusion that is valid only at a TC that no one is interested in playing, what use would such a conclusion be? | The chance of getting the same flip (heads or tails) twice-in-a-row | is 1/4. Not impressive but a decent beginning. Add a couple or a | few or several consecutive same flips and it departs 'luck' by a | huge margin. Actually the chance for twice the same flip in a row is 1/2, unless you are biased as to what the outcome of the flip should be (one-sided testing). And indeed, 10 identical flips in a row would be unlikely to occur by luck by a large margin. But that is rather academic, because you won't see 10 identical results in a row between the subtly different models. You will see results like 6-4 or 7-3, which will again be very likely to be a result of luck (as that is exactly what they are the result of, as you would realize after 10,000 games when the result is standing at 4,628-5,372). Calculate the number of games you typically need to get a result for a 53-47 advantage that could not just as easily have been obtained from a 50-50 chance with a little luck. You will be surprised...
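The statistics can be checked with a few lines of Python: the exact binomial probability that a fair 50-50 process produces a result at least as lopsided as some observed score.

```python
from math import comb

def prob_at_least(wins, games, p=0.5):
    """Probability of scoring >= wins out of games when the true chance is p."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# A 7-3 result looks decisive, but pure luck gives 7 or more out of 10
# about 17% of the time:
print(prob_at_least(7, 10))    # 0.171875

# How often would a fair 50-50 matchup produce at least 53 wins in 100 games
# by luck alone? (Run it and see how weak the evidence is.)
print(prob_at_least(53, 100))
```

A result has to stand many standard deviations away from 50% before luck can be ruled out, which is why small match scores between nearly equal armies prove nothing.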
| I have wondered why the performance of computer chess programs is | unpredictable and varied even under identical controls. Despite | their extraordinary complexity, I think of computer hardware, | operating systems and applications (such as Joker80) as deterministic. In most engines there always is some residual indeterminism, due to timing jitter. There are critical decision points, where the engine decides if it should do one more iteration or not (or search one more move vs aborting the iteration). If it took such decisions purely on internal data, like node count, it would play 100% reproducibly. But most engines use the system clock (to not forfeit on time if the machine is also running other tasks), and experience the timing jitter caused by other processes running, or rotational delays of the hard disk they had been using. In multi-threaded programs this is even worse, as the scheduling of the threads by the OS is unpredictable. Even the position where exactly the program is loaded in physical memory might have an effect. But in Joker the source of indeterminism is much less subtle: it is programmed explicitly. Joker uses the starting time of the game as the seed of a pseudo-random-number generator, and uses the random numbers generated with the latter as a small addition to the evaluation, in order to lift the degeneracy of exactly identical scores, and provide a bias for choosing the move that leads to the widest choice of equivalent positions later. The non-determinism is a boon, rather than a bust, as it allows you to play several games from an identical position, and still do a meaningful sampling of possible games, and of the decisions that lead to their results. If one position would always lead to the same game, with the same result (as would occur if you were playing a simple end-game with the aid of tablebases), it would not tell you anything about the relative strength of the armies.
It would only tell you that this particular position was won / drawn. But nothing about the millions of other positions with the same material on the board. And the value of the material is by definition an average over all these positions. So with deterministic play, you would be forced to sample the initial positions, rather than using the indeterminism of the engine to create a representative sample of positions before anything is decided. | In fact, to the extent that your remarks are true, they will | support my case if my playtesting is successful that the | unlikelihood of achieving the same outcome (i.e., wins or | losses for one player) is extreme. This sentence is too complicated for me to understand. 'Your case' is that 'the unlikelihood of achieving the same outcome is extreme'? If the unlikelihood is extreme, is that the same as the likelihood being extreme? Is the 'unlikelihood to be the same' the same as the 'likelihood to be different'? What does 'extreme' mean for a likelihood? Extremely low or extremely high? I wonder if anything is claimed here at all... I think you make a mistake by seeing me as a low-quality advocate. I only advocate the minimum quantity that does not make the results inconclusive. Unfortunately, that minimum is high, despite my best efforts to make it as low as possible through asymmetric playtesting and playing material imbalances in pairs (e.g. 2 Chancellors against two Archbishops, rather than one vs one). And that minimum quantity puts limits on the maximum quality that I can afford with my limited means. So it would be more accurate to describe me as a minimum-(significant)-quantity, maximum-(affordable)-quality advocate...
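The explicitly programmed indeterminism described above can be sketched in a few lines, under the assumption of a time-based seed and a small centipawn perturbation (the constants are made up; Joker80's real code is not shown in this thread):

```python
import random
import time

# Seed once per game, from the game's starting time, as described above.
rng = random.Random(int(time.time()))

def perturbed_eval(static_cp, noise_cp=4):
    """Add a small random bonus to a static evaluation to break ties
    between otherwise exactly equal scores."""
    return static_cp + rng.randint(0, noise_cp)
```

Two games started at different times get different seeds, so identical positions can lead to different, but still sensible, move choices, which is what makes repeated games from one position a meaningful sample.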
'Do not you realize that forcing Joker80 to do otherwise must reduce its playing strength significantly from its maximum potential?' On the contrary, it makes it stronger. The explanation is that by adding a random value to the evaluation, branches with very many equal end leaves have a much larger probability to have the highest random bonus amongst them than a branch that leads to only a single end-leaf of that same score. The difference can be observed most dramatically when you evaluate all positions as zero. This makes all moves totally equivalent at any search depth. Such a program would always play the first legal move it finds, and would spend the whole game moving its Rook back and forth between a1 and b1, while the opponent is eating all its other pieces. OTOH, a program that evaluates every position as a completely random number starts to play quite reasonable Chess, once the search reaches 8-10 ply. Because it is biased to seek out moves that lead to pre-horizon nodes that have the largest number of legal moves, which usually are the positions where the strongest pieces are still in its possession. It is always possible to make the random addition so small that it only decides between moves that would otherwise have exactly equal evaluation. But this is not optimal, as it would then prefer a move (in the root) that could lead (after 10 ply or so) to a position of score 53 (centiPawn), while all other choices later in the PV would lead to -250 or worse, over a move that could lead to 20 different positions (based on later move choices) all evaluating as 52cP. But, as the scores were just approximations based on finite-depth search, two moves later, when it can look ahead further, all the end-leaf scores will change from what they were, because those nodes are now no longer end-leaves. The 53cP might now be 43cP because deeper search revealed it to disappoint by 10cP.
But alas, there is no choice: the alternatives in this branch might have changed a little too, but now all range from -200 to -300. Not much help, we have to settle for the 43cP... Had it taken the root move that keeps the option open to go to any of the 20 positions of 52cP, it would now see that their scores on deeper search would have been spread out between 32cP and 72cP, and it could now go for the 72cP. In other words, the investment of keeping its options open rather than greedily commit itself to going for an uncertain, only marginally better score, typically pays off. To properly weight the expected pay-back of keeping options that at the current search depth seem inferior, it must have an idea of the typical change of a score from one search depth to the next. And match the size of the random eval addition to that, to make sure that even slightly (but insignificantly) worse end-leaves still contribute to enhancing the probability that the branch will be chosen. Playing a game in the face of an approximate (and thus noisy) evaluation is all about contingency planning. As to the probability theory, you don't seem to be able to see the math because of the formulae...
P(hh) = 0.5 * 0.5 = 0.25
P(tt) = 0.5 * 0.5 = 0.25
P(two equal flips) = P(hh) + P(tt) = 0.25 + 0.25 = 0.5
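The pay-off of keeping options open can be checked with a small Monte-Carlo experiment: a branch leading to twenty 52cP positions, each leaf drawing its own random bonus, usually out-draws a branch leading to a single 53cP position. The 10cP noise amplitude here is an arbitrary choice for the illustration, not Joker80's actual parameter.

```python
import random

rng = random.Random(42)

def best_with_bonus(scores, noise=10.0):
    """Best score in a branch after each end-leaf draws an independent
    random bonus in [0, noise)."""
    return max(s + rng.uniform(0.0, noise) for s in scores)

# In what fraction of trials does the 20-option branch out-draw the single leaf?
wins = sum(best_with_bonus([52.0] * 20) > best_with_bonus([53.0])
           for _ in range(10_000))
print(wins / 10_000)    # roughly 0.85 with these parameters
```

So the move that preserves many nearly equal continuations gets preferred most of the time, even though its best single leaf scores one centipawn lower.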
Indeed, it is a stochastic way to simulate mobility evaluation. In the presence of other terms it should of course not be made so large that it dominates the total evaluation. Like explicit mobility terms should not dominate the evaluation. But its weight should not be set to zero either: properly weighted mobility might add more than 100 Elo to an engine. Joker has no explicit mobility in its evaluation, and relies entirely on this probabilistic mechanism to simulate it. The disadvantage is that, because of the probabilistic nature, it is not 100% guaranteed to always take the best decision. On rare occasions the single acceptable end leaf does draw a higher random bonus than one hundred slightly better positions in another branch. OTOH it is extremely cheap to implement, while explicit mobility is very expensive. As a result, I might gain an extra ply in search depth. And then it becomes superior to explicit mobility, as it only counts tactically sound moves, rather than just every move. So it is like safe mobility verified by a full Quiescence Search. In my assessment, the probabilistic mobility adds more strength to Joker than changing the Rook value by 50cP would add or subtract. This can be easily verified by play-testing. It is possible to switch this evaluation term off. In fact, you have to switch it on, but WinBoard does this by default. To prevent it from being switched on, one should run WinBoard with the command-line option /firstInitString='new'. (The default setting is 'new\nrandom'. If Joker is running as second engine, you will of course have to use /secondInitString='new'.)
I would have thought that 'twice the same flip in a row' was pretty unambiguous, especially in combination with the remark about two-sided testing. But let's not quibble about the wording. The point was that for two-sided testing, if you suspect a coin to be loaded, but have no idea if it is loaded to produce tails or heads, the two flips tell you exactly nothing. They are either the same or different, and on an unbiased coin that would occur with equal probability. So the 'confidence' of any conclusion as to the fairness of the coin drawn from the two flips would be only 50%. I.e. not better than totally random: you might as well have guessed if it was fair or not without flipping it at all. That would also have given you a 50% chance of guessing correctly.
Derek: 'I hope you can handle constructive advice.' It gives me a big laugh, that's for sure. Of course none of what you say is even remotely true. That is what happens if you jump to conclusions regarding complex matters you are not knowledgeable about, without even taking the trouble to verify your ideas. Of course I extensively TESTED how the playing strength of Joker80 (and all available other engines) varied as a function of time control. This was the purpose of several elaborate time-odds tournaments I conducted, where various versions of most engines participated that had to play their games in 36, 12, 4, 1:30, 0:40 or 0:24 min, where handicapped engines were meeting non-handicapped ones in a full round robin. (I.e. the handicaps were factors 3, 9, 24, 54 or 90, where only the strongest engines were handicapped up to the very maximum, and the weakest only participated in an unhandicapped version.) And of course Joker80 behaves similar to any Shannon-type engine that is reasonably free of bugs: its playing strength measured in Elo monotonically increases in a logarithmic fashion, approximately according to the formula rating = 100*ln(time). So Joker80 at 5 min/move crushes Joker80 at 1 sec per move, as you could have easily found out for yourself. So much for your nonsense about Joker80 failing to improve its move quality with time. For some discussion on one of the tournaments, see: http://www.talkchess.com/forum/viewtopic.php?t=19764&postdays=0&postorder=asc&topic_view=flat&start=34 At that time Fairy-Max still had a hash-table bug that made it hang (and subsequently forfeit on time), striking at a fixed rate per second, so that Fairy-Max started to forfeit more and more games at longer TC. Since then the bug has been identified and repaired, and now also Fairy-Max performs progressively better at longer TC. So nice try, but next time better save your breath for telling the surgeon how to do his job before he will perform open heart surgery on you.
Because he has no doubt much more to learn from you regarding cardiology than I have in the area of building Chess engines... Things are as they are, and can become known by observation and testing. Believing in misconceptions born out of ignorance is not really helpful. Or, more explicitly: if you think you know how to build better Chess engines than other people, by all means, do so. It will be fun to confront your ideas with reality. In the meantime I will continue to build them as I think best (and know is best, through extensive testing), so you should have every chance to surpass them. Lacking that, you could at least _use_ the engines of others to check out if your theories of how they behave have any relation to reality. You don't have to depend on the time-odds tourneys and other tests I conduct. You might not even be aware of them, as the developers of Chess engines hardly ever publish the thousands of games they play for testing whether their ideas work in practice.
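The logarithmic rating model quoted above, rating = 100*ln(time), implies a fixed Elo gain per fixed multiplication of thinking time. A quick check of the numbers:

```python
from math import log

def rating_gain(time_factor):
    """Elo gained by multiplying thinking time by time_factor,
    under the rating = 100*ln(time) model quoted above."""
    return 100 * log(time_factor)

print(rating_gain(2))      # ~69 Elo per doubling of thinking time
print(rating_gain(300))    # 1 sec/move -> 5 min/move: ~570 Elo
```

An Elo difference of several hundred points corresponds to a nearly certain win for the stronger side, which is why the 5 min/move version crushes the 1 sec/move version.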
Derek Nalls: | Nonetheless, completing games of CRC (where a long, close, | well-played game can require more than 80 moves per player) | in 0:24 minutes - 36 minutes does NOT qualify as long or even, | moderate time controls. In the case of your longest 36-minute games, | with an example total of 160 moves, that allows just 13.5 seconds per | move per player. In fact, that is an extremely short time by any | serious standards. In my experience most games on average take only 60 moves (perhaps because of the large strength difference of the players). As early moves are more important for the game result than late moves (even the best moves late in the game do not help you if your position is already lost), most engines use 2.5% of the remaining time for their next move (on average, depending on how the iterations end compared to the target time). That would be nearly 54 sec/move at 36 min/game in the decisive phase of the game. That is more than you thought, but admittedly still fast. Note, however, that I also played 60-min games in the General Championship (without time odds), and that Joker80 confirms its lead over the competitors it manifested at faster time controls. But I don't see the point: Joker80's strength increases with time as expected, in the range from 0.4 sec to 36 sec per move, in a regular and theoretically expected way. This is over the entire range where I tested the dependence of the scoring percentage of various material imbalances, which extended to only 15 sec/move, and found it to be independent of TC. So your 'explanation' for the latter phenomenon is just nonsense. The effect you mention is observed NOT to occur, and thus cannot explain anything that was observed to occur. Now if you want to conjecture that this will all miraculously become very different at longer TC, you are welcome to test it and show us convincing results. I am not going to waste my computer time on such a wild and expensive goose chase.
Because from the way I know the engines work, I know that they are 'scalable': their performance at 10 ply results from one ply being put in front of 9-ply search trees. And that extra ply will always help. If they have good 9-ply trees, they will have even better 10-ply trees. But you don't have to take my word for it. You have the engine, and if you don't want to believe that at 1 hour per move you will get the same win probability as at 1 sec/move, or that at 1 hour per move it won't beat 10 min/move, just play the games, and you will see for yourself. It would even be appreciated if you publish the games here or on your website. But, needless to say, one or two games won't convince anyone of anything. | 'since I am not a computer chess programmer, I cannot possibly | know what I am talking about when I dare criticize an important | working of your Joker80 program' Well, you certainly make it appear that way. As, despite the elaborate explanation I gave of why programs derive extra strength from this technique, you still draw a conclusion that in practice was already shown to be 100% wrong earlier. And if you think you will run into the problem you imagine at enormously longer TC, well, very simple: don't use Joker80, but use some other engine. You are on your own there, as I am not specifically interested in extremely long TC. There is always a risk in using equipment outside the range of conditions for which it was designed and tested, and that risk is entirely yours. So better tread carefully, and make sure you rule out the perceived dangers by careful testing. | You must decide upon and define the primary function of your | Joker80 program. I do not see the dilemma you sketch. The purpose is to play ON AVERAGE the best possible move. If you do that, you have the best chance to win the game. If I can achieve that through a non-deterministic algorithm better than through a deterministic one, I go for the non-deterministic method.
That it also diversifies play, and makes me less sensitive to prepared openings from the opponent, is a win/win situation. Not a compromise. As I explained, it is very easy to switch this feature off. But you should be prepared for significant loss of strength if you do that.
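The time-management rule of thumb mentioned above (spending about 2.5% of the remaining clock on each move) is easy to verify for the 36-minute example:

```python
def think_time(remaining_seconds, fraction=0.025):
    """Target thinking time for the next move, as a fixed fraction
    of the remaining clock (2.5% is the rule of thumb quoted above)."""
    return fraction * remaining_seconds

print(think_time(36 * 60))   # 54.0 seconds on the first move of a 36-min game
```

Because the budget is a fraction of what remains, the per-move time decays geometrically over the game, front-loading effort onto the early, more important moves.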
| I just cannot understand how any rational, intelligent man could | believe that introducing chaos (i.e., randomness) is beneficial | (instead of detrimental) to achieving a goal defined in terms of | filtering-out disorder to pinpoint order. It would be very educational then to get yourself acquainted with the current state of the art of Go programming, where Monte-Carlo techniques are the most successful paradigm to date... | When you reduce the power of your algorithm in any way to | filter-out inferior moves, you thereby reduce the average | quality of the moves chosen and consequently, you reduce | the playing strength of your program- esp. at long time controls. Exactly. This is why I _enhance_ the power of my algorithm to filter out inferior moves. As the inferior moves have a smaller probability to draw a large positive random bonus than the better moves. They thus have a lower probability to be chosen, which enhances the average quality of the moves, and thus playing strength. At any time control. It is a pity this suppression of inferior moves is only probabilistic, and some inferior moves by sheer luck can still penetrate the filter. But I know of no deterministic way to achieve the same thing. So something is better than nothing, and I settle for the inferior moves only getting a lower chance to pass. Even if it is not a zero chance, it is still better than letting them pass unimpeded. | In any event, the addition of the completely-unnecessary module of | code used to create the randomization effect within Joker80 that | you desire irrefutably makes your program larger, more complicated | and slower. Can that be a good thing? Everything you put into a Chess engine makes it larger and slower. Yet, taking almost everything out only leaves you with a weak engine like micro-Max 1.6. The point is that putting code in also can make the engine smarter, improve its strategic understanding, reduce its branching ratio, etc.
So whether it is a good thing or not does not depend on whether it makes the engine larger, more complicated, or slower. It depends on whether the engine still fits in the available memory, and from there produces better moves in the same time. Which larger, more complicated and slower engines often do. As always, testing is the bottom line. Actually the 'module of code' consists of only 6 instructions, as I derive the pseudo-random number from the hashKey. But the point you are missing is this: I have theoretical understanding of how Chess engines work, and am therefore able to extrapolate their behavior with high confidence from what I observe under other conditions (i.e. at fast TC). Just like I don't have to travel to the Moon and back to know its distance from the Earth, because I understand geometry and triangulation. So if including a certain evaluation term gives me more accurate scores (and thus more reliable selection of the best move) from 8-ply search trees, I know that this can only give better moves from 18-ply search trees. As the latter is nothing but millions of 8-ply search trees grafted on the tips of a mathematically exact 10-ply minimax propagation of the score from the 8-ply trees towards the root. Anyway, it is not of any interest to me to throw months of valuable CPU time at answering questions I already know the answer to.
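A hypothetical sketch of deriving the pseudo-random evaluation bonus from the position's hash key, as mentioned above (the bit count and masking are assumptions for illustration, not Joker80's actual 6 instructions):

```python
def hash_noise(hash_key, bits=3):
    """Per-position pseudo-random bonus of 0..2**bits-1 centipawns,
    taken from the low bits of an already well-mixed hash key."""
    return hash_key & ((1 << bits) - 1)
```

Because the bonus is a pure function of the position's hash, the same position always gets the same bonus within one search, keeping scores consistent across transpositions, while re-seeding the underlying Zobrist keys per game would diversify play between games.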
Derek: | The moral of the story is that randomization of move selection | reduces the growth in playing strength that normally occurs with | time and plies completed. This is not how it works. For one, you assume that at long TC there would be fewer moves to choose from, and they would be farther apart in score. This is not the case. The average distribution of move scores in the root depends on the position, not on search depth. And in cases where the scores of the best and second-best move are far apart, the random component of the score propagating from the end-leaves to the root is limited to some maximum value, and thus could never cause the second-best move to be preferred over the best move. The mechanism can only have any effect on moves that would score nearly equal (within the range of the maximum addition) in absence of the randomization. For moves that are close enough in score to have an effect on, the random contribution in the end-leaves will be filtered by minimax while trickling down to the root in such a way that it is no longer a homogeneously distributed random contribution to the root score, but on average suppresses scores of moves leading to sub-trees where the opponent had a lot of playable options, and we only a few, while on average increasing scores where we have many options, and the opponent only a few. And the latter are exactly the moves that, in the long term, will lead you to positions of the highest score.
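The bounded-noise argument above can be demonstrated directly: a random addition limited to 10cP can never promote a move that is 100cP worse, but does diversify between near-equal moves. A minimal sketch:

```python
import random

rng = random.Random(1)

def choose(scores, noise=10.0):
    """Index of the move picked after each score gets a bounded random bonus."""
    return max(range(len(scores)),
               key=lambda i: scores[i] + rng.uniform(0.0, noise))

picks_far  = {choose([100.0, 0.0]) for _ in range(1000)}   # gap >> noise
picks_near = {choose([52.0, 51.0]) for _ in range(1000)}   # gap < noise
print(picks_far)    # {0}: the clearly better move always wins
print(picks_near)   # {0, 1}: near-equal moves both get chosen sometimes
```

So the randomization only ever arbitrates between moves the evaluation cannot reliably distinguish anyway, at any search depth.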
No engine I know of prunes in the root, in any iteration. They might reduce the depth of poor moves compared to that of the best move by one ply, but they will be searched in every iteration except a very early one (where they were classified as poor) to a larger depth than they were ever searched before. So at any time their score can recover, and if it does, they are re-searched within the same iteration at the un-reduced depth. This is absolutely standard, and also how Joker80 works. Selective search, in engines that do it, is applied only very deep in the tree. Never close to the root.
It is a bit misleading to list Capablanca Random Chess (and its Modern variant) here with a fixed array. It would have been more logical to depict an empty board, with the pieces next to it... For completeness, it should at least have mentioned what the restrictions for setting up the pieces are. Note that most engines able to play Scharnagl's CRC are also capable of playing opening setups he explicitly excludes from being CRC, i.e. with undefended Pawns, with Bishops next to each other, or with Q and A on like colors. They in general consider this all the same variant, 'Capablanca Random Chess', as opening arrays in the program logic are not part of the variant definition, but are simply set by loading a FEN. CRC in some programs is considered a different variant from Capablanca, due to the different castling rules (like FRC is considered a distinct variant from normal FIDE Chess).
George Duke: | However, the reality is if one is playing many CVs, precisely | Number One, not any of the other 3, is far and away the most valuable | and reliable tool, effectively building on experience. Time is also | factor, and unless Player can adjust quickly, without extensive | playtesting, and make ballpark estimates of values, all is lost on | new enterprise. We recommend just this Method One, increasing | facility at it, for serious CV play, and in turn the designer | needs to try to keep the game somewhat out of reach for Computer. Well, I guess that it depends on what your standards are. If you are satisfied with values that are sometimes off by 2 full Pawns, (as the case of the Archbishop demonstrates to be possible), I guess method #1 will do fine for you. But, as 2 Pawns is almost a totally winning advantage, my standards are a bit higher than that. If I build an engine for a CV, I don't want it to strive for trades that are immediately losing.
Derek: | Could you please give me example lines within the 'winboard.ini' | file that would successfully do so? I need to make sure every | character is correct. Sorry for the late response; I was on holiday for the past two weeks. The best way to do it is probably to make the option dependent on the engine selection. That means you have to write it behind the engine name in the list of pre-selectable engines like: /firstChessProgramNames={... 'C:/engines/joker/joker80.exe 23' /firstInitString='new\n' ... } And something similar for the second engine, using /secondInitString. The path name of the joker80 executable would of course have to be where you installed it on your computer; the argument '23' sets the hash-table size. You could add other arguments, e.g. for setting the piece values, there as well. Note the executable name and all engine arguments are enclosed by the first set of quotes (which are double quotes, but these for some reason refuse to print in this forum), and everything after this first syntactical unit on the line is interpreted as WinBoard arguments that should be used with this engine when it gets selected. Note that string arguments are C-style strings, enclosed in double quotes, and making use of escape sequences like '\n' for newline. The default value for the init strings is 'new\nrandom\n'.
George Duke: | Has initial array positioning already entered discussion for | value determinations? No, it hasn't, and I don't think it should, as this discussion is about Piece Values, and not about positional play. Piece values are by definition averages over all positions, and thus independent of the placement of pieces on the board. Note furthermore that the heuristic of evaluation is only useful for strategic characteristics of a position, i.e. characteristics that tend to be persistent, rather than volatile. Piece placement can be such a trait, but not always. In particular, in the opening phase, pieces are not locked in the places where they start, but can find plenty of better places to migrate to, as the center of the board is still complete no-man's land. Therefore, in the opening phase, the concept of 'tempo' becomes important: if you waste too much time, the opponent gets the chance to conquer space, and prevent the pieces that were badly positioned in the array from developing properly. I did some asymmetric playtesting for positional values in normal Chess, swapping Knights and Bishops for one side, or Knights and Rooks. I was not able to detect any systematic advantage the engines might have been deriving from this. In my piece-value testing I eliminate positional influences by playing from positions that are as symmetric as possible given the material imbalance. And the effect of starting the pieces involved in the imbalance in different places is averaged out by playing from shuffled arrays, so that each piece is tried in many different locations.
Well, never mind. The symmetrical playtesting would not have given any conclusive results with anything less than 2000 games anyway. The asymmetrical playtesting sounds more interesting. I am not completely sure what Smirf bug you are talking about, but in the Battle of the Goths Championship it happened that Smirf played a totally random move when it could give mate in 3 (IIRC) according to both programs (Fairy-Max was the lucky opponent). This move blundered away the Queen with which Smirf was supposed to mate, after which Fairy-Max had no trouble winning with an Archbishop against some five Pawns. This seems to happen when Smirf has seen the mate, and stored the tree leading to it completely in its hash table. It is then no longer searching, and it reports score and depth zero, playing the stored moves (at least, that was the intention). I have never seen any such behavior when Smirf was reporting a non-zero search depth, and in particular, the last non-zero-depth score before such an occurrence (a mate score) seemed to be correct. So I don't think there is much chance of an error when you believe the mate announcement and call the game. Of course you could also use Joker80 or TJchess10x8, which do not suffer from such problems.
| However, TJChess cannot handle my favorite CRC opening setup, | Embassy Chess, without issuing false 'illegal move' warnings and | stopping the game. Remarkable. I played this opening setup too, in Battle of the Goths, and never noticed any problems with TJchess. It might have been another version, though. If you have somehow saved the game, be sure to send it to Tony, so he can fix the problem.
OK, I see the problem now. I forgot that the Embassy array is a mirrored one, with the King starting on e1, rather than f1. And that to avoid any problems with it in Battle of the Goths, I did not really play Embassy, but the fully equivalent mirrored Embassy. And with that one, none of the engines had problems, of course. Actually it seems that it is not TJchess that is in error here: e1b1 does seem to be a legal castling in Embassy. It is WinBoard_F which unjustly rejects the move, most likely because the FEN reader ignores specified castling rights for which it does not find a King on f1 and a Rook in the indicated corner. The fact that you don't have this problem with Joker80 is because Joker80 is buggy. (Well, not really; it is merely outside its specs. Joker80 considers all castlings with a non-corner Rook and King not in the f-file as CRC castlings, which are only allowed in variant caparandom, but not in variants capablanca or *UNSPEAKABLE*. And Joker80 does not support caparandom yet.) So the fact that you don't see any problems with Joker80 is because it will never castle when you feed it the Embassy setup, so that WinBoard doesn't get the chance to reject the castling as illegal. And if the opponent castles, WinBoard would reject it as illegal, and not pass it on to Joker80. I guess the fundamental fix will have to wait until I implement variant caparandom in WinBoard; I think that both WinBoard and Joker80 are correct in identifying the Embassy opening position as not belonging to Capablanca Chess, but needing the CRC extension of castling. (Even if it is only a limited extension, as the Rooks are still in the corner.) And after I fix it in WinBoard, I still would have to equip Joker80 with CRC capability before you could use it to play the Embassy setup. It is not very high on my list of priorities, though, as I see little reason to play Embassy rather than mirrored Embassy.
I am still contemplating how to generalize the castling in Joker80. There are two issues there: how to communicate the move from and to the GUI, and how to indicate the existence of the rights. Currently WinBoard protocol has two mechanisms to set up a position: by loading the engine with a FEN, or (obsolete) through an edit command to enter a list of piece+square combinations. The latter mode does not support transmission of castling rights at all, and is only a legacy for backward compatibility with old engines. So for loading a position, we only have to provide a mechanism for indicating castling rights in a FEN. The FRC-type notation only indicates the position of the Rook. The King does not need to be indicated in games where there is only one King, and the positioning of the Rook w.r.t. the King implies where both will end up. This means we would have to devise some other notation for cases where the King ends elsewhere. I am not sure if it would make sense to generalize so much as to allow castlings where the Rook does not end up next to, and on the other side of, the King. There is of course no limit to the craziness of moves that could be called a castling, but one would have to put a limit somewhere, to not fall victim to the 'maximum-flexibility, minimum-usefulness' principle. I would probably implement it like this: in the castling-rights field of the FEN, the letter indicating the file of the Rook that can castle (which does not necessarily have to be an orthodox Rook, as the FEN makes it obvious what piece is standing there) can be followed by a digit, indicating the number of squares the King ends up away from the corner. The final position of the Rook would be implied by this. Example: normal King-side castling rights could be indicated by H1. The 1 would be the default (on an 8x8 board), and could be omitted for upward compatibility with Shredder-FEN. In Capablanca Chess the opening would have castling rights A2J1a2j1, equivalent to AJaj (or KQkq).
Symmetric castling rights like in Janus Chess would be indicated as A1J1a1j1, or A1Ja1j when deleting the redundant defaults. Multiple castling rights to the same side could exist next to each other: A2A1J2J1a2a1j2j1 would allow short as well as long castling in both directions. For transmitting the castling moves, one could use King captures own Rook. In games where the same Rook could be used for castlings with multiple King destinations, one could give the King step to its final destination instead. If this could also be a normal King move, one could append an r as 5th character to identify it as a castling, using the syntax that would otherwise be used for promotions. In PGN one could use similar strategies to indicate non-orthodox castlings, and use the suffix =R on a King move to specify castling. I think this covers most cases encountered in practice. Problems occur only if there would be multiple castlings with the same Rook, and at the same time castlings with a Rook on the left would have the same King destination as those with a Rook on the right, because the move notation cannot then indicate which Rook to use and at the same time specify where the King should go. But this seems too outlandish to worry about. To cover cases where K and R do not end up next to each other, we could put a second digit in the FEN castling-rights specifier for the final position of the Rook w.r.t. the corner. (I.e. normal King-side castling = h12.) This obviously could lead to problems on very wide boards, which require multiple digits to specify the distance to the corner. So perhaps it is better to separate King and Rook destination by a period (h1.2). Indicating the move would be a problem, as two destinations might have to be specified to unambiguously identify the move (e.g. if all castlings are allowed where the King steps any number of squares >=2 towards a Rook, and then the Rook can go to any square the King passed over). One could just specify King and Rook final squares (i.e.
O-O = g1f1), but in FRC there is no guarantee that this cannot be a normal move. In that case the 'r' could again be used as 5th character, to indicate castling. In PGN we could reserve a character used instead of the piece indicator for castlings, say 'O'. Conclusion: it is difficult to design a notation that would be general and universal; different games seem to need different ways to specify the moves and rights.
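To make the proposed rights encoding concrete, here is a small hypothetical sketch (Python for illustration, not actual WinBoard code) of a reader for such a castling-rights field. It assumes a single digit for the King's distance from the corner, an optional period-separated distance for the Rook, and the defaults described above (King one square from the corner, Rook directly next to the King):

```python
import re

# One token = Rook file letter (upper case = White, lower case = Black),
# an optional single digit giving the King's final distance from the corner
# (default 1), and an optional '.'-separated final distance for the Rook
# (default: next to the King, i.e. one square further from the corner).
TOKEN = re.compile(r'([A-Ja-j])(\d)?(?:\.(\d+))?')

def parse_rights(field):
    rights = []
    for rook_file, king_dist, rook_dist in TOKEN.findall(field):
        king = int(king_dist) if king_dist else 1
        rook = int(rook_dist) if rook_dist else king + 1
        side = 'white' if rook_file.isupper() else 'black'
        rights.append((side, rook_file.lower(), king, rook))
    return rights

# 'H' alone then reads as normal White K-side castling: King one square from
# the h-corner (g1), Rook next to it on the corner side (f1, distance 2).
```

With this reading, 'A2J1a2j1' yields the Capablanca rights from the text above, and 'h1.2' spells out ordinary Black K-side castling with the Rook destination made explicit.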
Well, one has to think ahead a little bit to keep the road to future extensions open, and not paint oneself into a corner. This is why I tackle a fairly large number of cases at once. I don't see the uniqueness of the FEN strings as a serious problem; if the logic behind the various systems would allow a certain castling to be described in multiple ways, one can supply an additional rule to specify which method should be used preferentially. For example, if K or H could both be used to unambiguously specify King-side castling, one should use K. In the FEN reader I would not even pay attention to that, and have it understand both, as this is usually easier. An important issue is how much effort one should put into maintaining a unified approach, in which both game state and played variant are unambiguously specified by the FEN. One might wonder if it is sensible to require, say, that a position from Janus Chess and a position from Capablanca Chess should be considered as different positions from the same variant, 'fullchess'. This puts a lot of extra burden on the FEN: for indicating game state, the castling rights have to indicate only which pieces moved. Wanting the FEN to specify the castling method, or other aspects of the rules (e.g. if Pawns can promote to Chancellors, or not), might just be asking for trouble. So perhaps I was overdoing it. It might be more useful to consider variants like Janus or Grotesque as distinct from Capablanca. KQkq could then be used to indicate castling rights in all three cases. Games with more than 2 Rooks could use the Shredder-FEN system without any problem, as long as there is only one King (so that all rights disappear once this King moves). Only in games with multiple Kings AND multiple Rooks would there be a problem. This only leaves move notation, in particular in variants where a castling to a particular side can be performed in more than one way, like in Grotesque.
A very general way to solve this in PGN would be to provide a mechanism to specify moves that displace more than one piece, by joining the moves with an &. So an alternative to write h-side castling in Grotesque could be Ke1-i1&Rj1-h1 (or in short, Ki1&Rh1). In WinBoard protocol, the moves between engine and GUI are not transmitted in SAN, but simply as FROM and TO square appended to each other, with an optional 5th character to indicate promotion piece (e.g. e7e8q). Perhaps the best system there would be to encode variable castlings by using k or q as the 'promotion' character, to indicate if the K-side or Q-side Rook is to be used, and make the squares indicate the to-square of King and Rook, respectively. These notations would always be recognizable as not indicating promotions, as both the mentioned squares would be on the same rank.
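As an illustration of the suggested engine-GUI encoding (a sketch under the assumptions above, not existing WinBoard-protocol code), a decoder could separate the three cases by looking at the optional 5th character and the ranks of the two squares:

```python
def decode_move(m):
    # 'e7e8q' -> ordinary promotion to Queen
    # 'g1f1k' -> castling with the K-side Rook: King to g1, Rook to f1
    #            (recognizable as a castling because both squares share a rank,
    #             which can never happen in a promotion)
    # 'e2e4'  -> ordinary move
    frm, to, suffix = m[0:2], m[2:4], m[4:5]
    if suffix in ('k', 'q') and frm[1] == to[1]:
        return {'type': 'castle', 'king_to': frm, 'rook_to': to,
                'rook_side': suffix}
    if suffix:
        return {'type': 'promotion', 'from': frm, 'to': to, 'piece': suffix}
    return {'type': 'normal', 'from': frm, 'to': to}
```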
'If the effort isn't too big' is a big if. Normal chess, Capablanca, FRC and CRC are similar enough not to cause too much trouble, although I consider it already a nasty trait that some of the rules have to be implied by the board size, such as to which pieces a Pawn may promote. If a board width of 10 is taken to imply that Chancellor and Archbishop are allowed, I would consider the problems with Janus Chess or Chancellor Chess already pretty bad. To unify those with Capablanca/CRC would require different letters for their Pawns. In Janus Chess you would have to indicate the deviating castling mode amongst the rights. In the O-i-h system, would you always be able to deduce whether the K-side or Q-side Rook is to be used from the ordering of the King and Rook destinations? I guess we could indeed consider it a defining property of a castling that it swaps the order of King and Rook. I am not aware of any exceptions to this; even in FRC/CRC the King is required to be between the two Rooks. So I guess your system is acceptable for use in the PGN, with the additional preference rule that if there is only one castling possible to the given side, it would be written as O-O or O-O-O. The way the move is transmitted between engine and GUI in WB protocol is a matter specific to the WinBoard GUI. And WinBoard does generate the list of allowed moves itself; there is no way in WB protocol to request it from the engine. As this type of castling with multiple King and Rook destinations is about as crazy as they get, anticipating this format would probably be enough to cover everything. Even normal castling requires the GUI to recognize castlings, and know which Rook to move and where. (This caused problems when I had Fairy-Max play Cylinder Chess, as a King crossing the side edge was considered a castling, and led to the displacement of a second piece on the display board!)
In fact, with the assumption that the relative orientation of King and Rook destination squares implies which Rook has to be used (and even if there are several Rooks on that side, only the one nearest to the King could be involved in castling), there is no need to convey any information in the 5th character other than that it is a castling. So an O here would be quite convenient, as promotion pieces have to be lower case in WB protocol. For a really dumb interface (like my battle-of-the-Goth javascript viewer) it is necessary to fully specify from- and to-square of each piece that is moved separately. So there I transmit O-O as e1g1h1f1 and e4xf5 e.p. as e4f5e4f4.
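For such a fully spelled-out format, decoding is trivial. A sketch of how the viewer's move strings could be split back into per-piece moves (a hypothetical helper, mirroring the examples above):

```python
def split_pairs(move):
    # Cut the string into 2-character squares, then pair them up as
    # (from, to) per displaced piece:
    #   'e1g1h1f1' -> [('e1', 'g1'), ('h1', 'f1')]  (O-O: King move, Rook move)
    squares = [move[i:i + 2] for i in range(0, len(move), 2)]
    return list(zip(squares[0::2], squares[1::2]))
```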