I wrote "there are no simulations" a while ago, using a chess-playing robot as an example. To explain this better, let's consider what the "reality" is for a chess engine.
Is the reality the abstract game with the FIDE rules, so that a physical chess piece sitting slightly off-center on a square is a "simulation artifact"? Because we can only represent the game physically with imperfect fidelity, we could argue that the abstract game is the reality and that its representations on different media are imperfect projections of it.
There can be a physical chess board, or a board represented as pixels on a screen. There can also be a chess-playing robot moving and sensing physical pieces. These are all projections of the game itself, which is played in the space of rules and not in the space of representations.
The knowledge and skills for the game of chess are embodied in different agents, human or machine.
This applies to all "simulations" actually. There are no simulations, only games, interfaces and embodiments.
It doesn't matter if a flight simulator for pilot training isn't totally photorealistic. Pilots were trained successfully with very crude simulators before GPUs were a thing. What matters is how the skills and knowledge are represented in the games, so that they are transferable across different embodiments: from flight simulators to planes of different kinds.
Striving for ever more photorealism in AI training makes little sense; the gains diminish, especially if these "improvements" mean lower volumes of training data. What we need to strive towards is a scaling sweet spot: for a given compute budget, split the compute between fidelity and volume so that the skills we want trained are exercised as fully as possible.
In any real-world training we would often trade fidelity for volume, simply because volume means the agents can try many more different actions, policies and strategies. Volume is more important than fidelity as long as fidelity is just enough to exercise the relevant skills.
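To make the fidelity-versus-volume trade-off concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the cost of an episode growing as a power of fidelity, the saturating "skill coverage" curve, the logarithmic return on episode count, and all function and parameter names. It is not a measured scaling law, only a way to show that under a fixed budget the best fidelity can sit somewhere in the middle.

```python
import math


def episodes_affordable(budget: float, fidelity: float,
                        base_cost: float = 1.0, cost_exponent: float = 3.0) -> float:
    """Episodes we can run, ASSUMING per-episode cost = base_cost * fidelity ** cost_exponent."""
    return budget / (base_cost * fidelity ** cost_exponent)


def skill_coverage(fidelity: float, sufficient_fidelity: float = 2.0) -> float:
    """ASSUMED fraction of the relevant skill an environment exercises.

    It saturates: beyond "just enough" fidelity, extra detail adds little.
    """
    return 1.0 - math.exp(-fidelity / sufficient_fidelity)


def training_value(budget: float, fidelity: float) -> float:
    """Toy objective: coverage times a diminishing (log) return on volume."""
    return skill_coverage(fidelity) * math.log1p(episodes_affordable(budget, fidelity))


def best_fidelity(budget: float, candidates: list[float]) -> float:
    """Pick the candidate fidelity with the highest toy training value."""
    return max(candidates, key=lambda f: training_value(budget, f))


if __name__ == "__main__":
    candidates = [0.5, 1, 2, 4, 8, 16, 32]
    for f in candidates:
        print(f, round(episodes_affordable(1e6, f)), round(training_value(1e6, f), 2))
    print("best:", best_fidelity(1e6, candidates))
```

With these made-up curves the optimum is an interior point: the lowest fidelity wastes volume on an environment that barely exercises the skill, and the highest fidelity spends almost the whole budget on a handful of episodes.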
Thinking in terms of "simulations" is the wrong framing because it makes one focus on matching the real world, when what actually matters is creating games which allow training the skills, physical or cognitive, which are relevant and transferable to the target context, rather than making ever heavier, ever more photorealistic high-fidelity simulations which trade volume for beauty.
It is always possible to construct curriculums of game environments where the highest-fidelity environments are saved for the final fine-tuning stages, while the bulk of the skills and knowledge is trained in high-volume environments before that.
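A curriculum like this can be sketched as a simple budget plan. The stage names, per-episode costs and budget shares below are hypothetical numbers chosen only to illustrate the shape of the idea: cheap environments run first and get most of the episodes, and the most expensive environment runs last with very few.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str                # hypothetical label for an environment
    cost_per_episode: float  # assumed compute units per episode (proxy for fidelity)
    budget_share: float      # fraction of the total compute budget given to this stage


def plan_curriculum(stages: list[Stage], total_budget: float) -> list[tuple[str, int]]:
    """Order stages from cheapest to most expensive and count affordable episodes."""
    if abs(sum(s.budget_share for s in stages) - 1.0) > 1e-9:
        raise ValueError("budget shares must sum to 1")
    ordered = sorted(stages, key=lambda s: s.cost_per_episode)
    return [
        (s.name, int(total_budget * s.budget_share / s.cost_per_episode))
        for s in ordered
    ]


if __name__ == "__main__":
    stages = [
        Stage("high-fidelity fine-tuning", cost_per_episode=10_000.0, budget_share=0.25),
        Stage("abstract game", cost_per_episode=1.0, budget_share=0.5),
        Stage("mid-fidelity interface", cost_per_episode=100.0, budget_share=0.25),
    ]
    for name, episodes in plan_curriculum(stages, total_budget=1_000_000.0):
        print(name, episodes)
```

Even when the high-fidelity stage receives a quarter of the compute, it contributes only a tiny fraction of the episodes, which is why it makes sense to spend it last, on skills that the cheaper stages have already mostly trained.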