... though I still don't fully trust computer analysis to give reliable piece values for a given board size (e.g. the values may well depend, at least to some extent, on what else is on the board and where exactly it is placed in the setup a given computer study uses).
This is why serious computer studies always use a number of different mixes of opponent pieces, and average over many shuffles of those as the initial setup. E.g. if you want to compare the value of Queen, Archbishop and Chancellor, you don't just play these against each other (e.g. in a FIDE setup where one player starts with A or C instead of a Q), but also against, say, R+B, R+N, R+N+P, 2B+N or B+2N (deleting these for the player that has Q, C or A, and deleting the Q of the other player), to see which of the super-pieces does better, and by how much.
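To make the setup scheme above concrete, here is a minimal Python sketch that only builds the test armies and random back-rank shuffles; it does not play any games. The piece letters (A for Archbishop, C for Chancellor), the function names, and the choice to shuffle both sides independently with no castling or Bishop-colour constraints are all my own assumptions for illustration, not how any particular study does it.

```python
import random
from collections import Counter

# Standard FIDE army, counted by piece letter (K, Q, R, B, N, P).
FIDE_ARMY = Counter({"K": 1, "Q": 1, "R": 2, "B": 2, "N": 2, "P": 8})

# Super-pieces from the post: Queen, Archbishop (A), Chancellor (C).
SUPER_PIECES = ["Q", "A", "C"]
# Compensation sets from the post; "P" is a Pawn.
COMBOS = ["RB", "RN", "RNP", "BBN", "BNN"]


def imbalance_armies(super_piece: str, combo: str) -> tuple[Counter, Counter]:
    """Armies for one 'super-piece vs combo' test.

    The first player has `super_piece` in place of its Queen but loses the
    pieces in `combo`; the second player loses its Queen. The net material
    imbalance is therefore super_piece versus combo.
    """
    first = FIDE_ARMY.copy()
    first["Q"] -= 1
    first[super_piece] += 1
    first.subtract(Counter(combo))
    second = FIDE_ARMY.copy()
    second["Q"] -= 1
    # Unary plus drops pieces whose count fell to zero.
    return +first, +second


def shuffled_back_rank(army: Counter, rng: random.Random, files: int = 8) -> str:
    """Random back-rank arrangement of the army's non-Pawn pieces ('.' = empty).

    Simplification (hypothetical): no castling or Bishop-colour constraints.
    """
    pieces = [p for p, n in army.items() if p != "P" for _ in range(n)]
    squares = pieces + ["."] * (files - len(pieces))
    rng.shuffle(squares)
    return "".join(squares)


def test_setups(n_shuffles: int, seed: int = 0):
    """Yield (super_piece, combo, first_rank, second_rank) for every test."""
    rng = random.Random(seed)
    for x in SUPER_PIECES:
        for combo in COMBOS:
            first, second = imbalance_armies(x, combo)
            for _ in range(n_shuffles):
                yield x, combo, shuffled_back_rank(first, rng), shuffled_back_rank(second, rng)
```

With 3 super-pieces, 5 compensation sets and, say, 10 shuffles each, `test_setups(10)` yields 150 starting positions. The score of each super-piece would then be averaged over all of them, so that no single placement or opponent mix dominates the estimate.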
To test an Alibaba (which I apparently did once), you would replace 2N, N+B, 2B or R by two Alibabas (and give the opponent Pawn odds to get closer to equality), and just a single N or B by one Alibaba.
How does your estimate take account of the severe color binding of the Alibaba? Because of that it seems a very weak piece to me. For instance, it cannot act against half the Pawns.
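To put a number on that color binding, the Python sketch below counts the squares a leaper can ever reach from a corner of an 8x8 board by breadth-first search. It treats the Alibaba as a combined (2,0)+(2,2) leaper. For comparison it uses repeated Ferz steps as a stand-in for the Bishop (they reach exactly the same set of squares, though not in the same number of moves) and includes the Knight. The function and constant names are illustrative only.

```python
from collections import deque

# Leap vectors as unsigned offsets; all sign and orientation variants are generated.
ALIBABA = [(2, 0), (2, 2)]  # Dabbaba + Alfil leaps
FERZ = [(1, 1)]             # same reachable squares as a Bishop
KNIGHT = [(1, 2)]


def reachable(leaps, start=(0, 0), size=8):
    """Set of squares a leaper can ever reach from `start` on a size x size board."""
    vectors = {(sx * dx, sy * dy) for dx, dy in leaps
               for sx in (1, -1) for sy in (1, -1)}
    vectors |= {(dy, dx) for dx, dy in vectors}  # add the swapped orientation
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in vectors:
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen


for name, leaps in [("Alibaba", ALIBABA), ("Bishop (via Ferz steps)", FERZ), ("Knight", KNIGHT)]:
    print(name, len(reachable(leaps)))
# Alibaba 16
# Bishop (via Ferz steps) 32
# Knight 64
```

Every Alibaba leap changes file and rank by an even amount, so it keeps both the file parity and the rank parity of its starting square. It is therefore confined to a quarter of the board, whereas a Bishop is confined to half.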
Ancient Shatranj theory indeed values different Pawns differently. In Shatranj an Alfil is considered slightly better than an average Pawn. But you should keep in mind that a FIDE Pawn is worth significantly more than a Shatranj Pawn, because it has a game-deciding promotion, while in Shatranj an extra Ferz is often not helpful at all. And I suspect a lot of the value of the Alfil is that, even if tactically worthless, it acts as insurance against loss by baring when only weak pieces are left.