H. G. Muller wrote on Tue, Mar 12 11:29 AM UTC in reply to Kevin Pacey from 12:26 AM:

Systematic errors can never be estimated. There is no limit to how inaccurate a method of measurement can be. The only recourse is to design the method as well as you can. But what you mention is a statistical error, not a systematic one. Of course the weaker side can win any match with a finite number of games by a fluke. Conventional statistics tells you how large the probability for that is. The probability of being off by 2 standard deviations or more (in either direction) is about 5%; for 3 standard deviations it is about 0.27%. It quickly tails off, but to halve the standard deviation you need 4 times as many games.
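The tail probabilities quoted above can be checked with a few lines of Python. This is only a sketch that assumes the match score is approximately normally distributed; the function name `two_sided_tail` is made up for illustration.

```python
import math

def two_sided_tail(k):
    """Probability that a normally distributed result is off by k or more
    standard deviations, in either direction (normal approximation assumed)."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3):
    print(f"{k} standard deviations: {two_sided_tail(k):.4%}")
```

The values for 2 and 3 standard deviations come out at roughly 4.6% and 0.27%, in line with the "about 5%" and "about 0.27%" figures.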

So it depends on how much weaker the weak side is. To demonstrate with only a one-in-a-million probability for a fluke that a Queen is stronger than a Pawn wouldn't require very many games. The 20-0 result that you would almost certainly get would only have a one-in-a-million probability if the Queen were not better, but equal. OTOH, to show that a certain material imbalance provides a 1% better result with 95% 'confidence' (i.e. only 5% chance it is a fluke), you will need 6400 games (taking about 40% as the standard deviation of a single game result, 40%/sqrt(6400) = 40%/80 = 0.5%, so a 51% outcome is two standard deviations away from equality).
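The arithmetic in this paragraph can be reproduced as follows. It is a rough sketch, not a full statistical treatment: the 40% per-game standard deviation is the figure used in the comment, draws are ignored in the 20-0 case, and the function names are hypothetical.

```python
import math

PER_GAME_SD = 0.40  # assumed standard deviation of a single game result (the 40% above)

def standard_error(games, per_game_sd=PER_GAME_SD):
    """Standard deviation of the average score over a match of `games` games."""
    return per_game_sd / math.sqrt(games)

def games_needed(excess_score, sigmas=2.0, per_game_sd=PER_GAME_SD):
    """Games required for `excess_score` to lie `sigmas` standard deviations from equality."""
    exact = (sigmas * per_game_sd / excess_score) ** 2
    return math.ceil(exact - 1e-9)  # small guard against floating-point round-off

def sweep_probability(games):
    """Chance that one given side wins every game if both sides are equal (draws ignored)."""
    return 0.5 ** games

print(games_needed(0.01))     # 1% excess score at two standard deviations
print(standard_error(6400))   # standard error of a 6400-game match
print(sweep_probability(20))  # fluke probability of a 20-0 result between equals
```

With these assumptions a 1% excess needs 6400 games, and a 20-0 sweep between equal sides has a probability of a little under one in a million.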

My aim is usually to determine piece values with a standard deviation of about 0.1 Pawn. Since Pawn odds typically result in a 65-70% score, 0.1 Pawn would result in a 1.5-2% excess score, and 400-700 games would achieve that (40%/sqrt(400) = 2%). I consider it questionable whether it makes sense to strive for more accurate values, because piece values in themselves are already averages over various material combinations, and the actual material that is present might affect them by more than 0.1 Pawn.
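The 400-700 figure follows from the same square-root rule. The sketch below assumes the excess score scales linearly with the material difference for small imbalances (an assumption implicit in the "0.1 Pawn gives 1.5-2%" step) and again uses 40% as the per-game standard deviation; `games_for_resolution` is a hypothetical helper name.

```python
def games_for_resolution(pawn_resolution, pawn_odds_score, per_game_sd=0.40):
    """Games needed so that one standard deviation of the match score
    corresponds to `pawn_resolution` Pawns.

    pawn_odds_score: average score obtained with a full Pawn advantage (e.g. 0.65-0.70).
    Assumes excess score is proportional to the material difference.
    """
    excess_per_pawn = pawn_odds_score - 0.5
    target_sd = pawn_resolution * excess_per_pawn
    return (per_game_sd / target_sd) ** 2

for score in (0.70, 0.65):
    print(f"Pawn odds = {score:.0%}: about {games_for_resolution(0.1, score):.0f} games")
```

A 70% Pawn-odds score gives about 400 games and a 65% score about 710, which matches the quoted range.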

I am not sure what you want to say in your first paragraph. You still argue as if there were an 'absolute truth' in piece values. But there isn't. The only thing that is absolute is the distance to checkmate. Piece values are only a heuristic used by fallible players who cannot calculate far enough ahead to see the checkmate. (Together with other heuristics for judging positional aspects.) If the checkmate is beyond your horizon you go for the material you think is strongest (i.e. gives the best prospects for winning), and hope for the best. If material gain is beyond the horizon you go for the position with the current material that you consider best. Above a certain level of play piece values become meaningless, and positions will be judged by criteria other than what material is present. And below that level they cannot be 'absolute truth', because it is not the ultimate level.

I never claimed that statistics of computer-generated games provide incontestable proof of piece values. But they provide evidence. If a program that human players rated around 2000 Elo have difficulty beating in orthodox Chess hardly does better with a Chancellor as Queen replacement than with an Archbishop (say 54%), it seems very unlikely that the Archbishop would be two Pawns less valuable, as that same engine would have very little trouble converting other uncontested 2-Pawn advantages (such as R vs N, or 2N+P vs R) to a 90% score. It would require pretty strong evidence to the contrary to dismiss that as irrelevant, plus an explanation for why the program systematically blundered that advantage away. But there doesn't seem to be any such evidence at all. That a high-rated player thinks it is different is not evidence, especially if the rating is only based on games where neither A nor C participates. That the average number of moves of A on an empty board is smaller than that of C is not evidence, as it was never proven that piece values only depend on average mobility. (And counterexamples on which everyone would agree can easily be given.) That A is a compound of pieces that are known to be weaker than the pieces C is a compound of is not evidence, as it was never proven that the value of a piece is equal to the sum of its components. (The Queen is an accepted counterexample.)

As to the draw margin: I usually took that as 1.5 Pawn, but that is very close to 4/3, and my only reason to pick it was that it is somewhere between 1 and 2 Pawns. An advantage of 1 Pawn is often not enough, 2 usually is. But 'decisive' is a relative notion. At lower levels games with a two-Pawn advantage can still be lost. GMs would probably not stand much chance against Stockfish even if they were allowed to start with a two-Pawn advantage. At high levels a Pawn advantage was reported by Kaufman to be equivalent to a 200-Elo rating advantage.

