That describes the 'policy head' of the NN, which biases the move choice while walking the tree from root to leaf to find the next leaf to expand. (Apart from that bias, the choice depends on the move's visit count, the total visit count of the node, and the move scores.) But my understanding was that once the leaf is chosen and expanded, all daughters should receive a score from the 'evaluation head' of the NN, applied to the position after the move, rather than just inheriting their policy weight from the position before the move. These scores are then back-propagated towards the root by including them in the average score of every node on the path to the expanded leaf.
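To make the two steps concrete, here is a minimal sketch of policy-biased selection and of score averaging during back-propagation. It assumes an AlphaZero-style PUCT formula, which may not be exactly what the engine under discussion uses. The names `Node`, `select_child`, `backpropagate` and the constant `c_puct` are hypothetical, and the sign flip assumes each node stores its score from the viewpoint of the player who just moved into it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                  # policy-head weight, taken from the parent's position
    visits: int = 0
    total_score: float = 0.0      # sum of all scores back-propagated through this node
    children: dict = field(default_factory=dict)

    def mean_score(self) -> float:
        # Average score; unvisited nodes are treated as neutral (assumption).
        return self.total_score / self.visits if self.visits else 0.0


def select_child(node: Node, c_puct: float = 1.5):
    """Return (move, child) maximizing mean score plus a policy-weighted bonus."""
    def puct(child: Node) -> float:
        # The bonus grows with the parent's visits and shrinks with the child's.
        bonus = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.mean_score() + bonus
    return max(node.children.items(), key=lambda item: puct(item[1]))


def backpropagate(path: list, score: float) -> None:
    """Fold a new score into the running average of every node on the path."""
    for node in reversed(path):
        node.visits += 1
        node.total_score += score
        score = -score            # switch to the other player's viewpoint
```

With priors of 0.8 and 0.2 on two unvisited daughters, the first walk follows the higher prior; after a bad score of -1.0 is back-propagated through that daughter, the next walk switches to the other move.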