Human and machine approaches to solving the treasure hunt game have similarities and differences that stem from the biological makeup of humans and from the hardware and algorithms of machines.
Humans are rational thinkers and can devise systematic strategies to solve the treasure hunt game. For example, a common human strategy is to follow the walls of the maze using the right-hand or left-hand rule until the treasure is found. This approach might not be as efficient as some machine approaches, but it is easier for a human than visually scanning the maze to determine the correct path.
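The right-hand rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 4x4 grid, the start and treasure positions, and the names `MAZE`, `is_free`, and `right_hand_walk` are all hypothetical and do not come from the original game.

```python
# Hypothetical 4x4 maze: 0 = open cell, 1 = wall.
MAZE = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
START, TREASURE = (0, 0), (3, 3)

# Headings in clockwise order as (row, col) deltas: up, right, down, left.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def is_free(maze, r, c):
    """Return True if (r, c) is inside the grid and not a wall."""
    return 0 <= r < len(maze) and 0 <= c < len(maze[0]) and maze[r][c] == 0


def right_hand_walk(maze, start, treasure, max_steps=1000):
    """Follow the right-hand wall; return the visited cells or None."""
    pos, heading = start, 1  # start facing right (an arbitrary choice)
    path = [pos]
    for _ in range(max_steps):
        if pos == treasure:
            return path
        # Prefer turning right, then going straight, then left, then back.
        for turn in (1, 0, -1, 2):
            d = (heading + turn) % 4
            nr, nc = pos[0] + DIRS[d][0], pos[1] + DIRS[d][1]
            if is_free(maze, nr, nc):
                pos, heading = (nr, nc), d
                path.append(pos)
                break
        else:
            return None  # no neighbouring cell is open
    return path if pos == treasure else None


path = right_hand_walk(MAZE, START, TREASURE)
```

On this particular grid the walk visits 13 cells, including a detour into a dead end, while the shortest route covers 7, which illustrates the efficiency gap mentioned above. Wall following is also only reliable when the treasure can be reached along walls connected to the starting wall.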
Unlike humans, machines can leverage many algorithms, including reinforcement learning, to solve the treasure hunt game. With reinforcement learning, a machine can learn the optimal path to the treasure through a system of rewards and punishments applied to each action the intelligent agent takes.
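As a rough sketch of what such a system of rewards and punishments might look like, the function below scores a single move. The specific values and the names `reward`, `blocked`, and `visited` are assumptions chosen for illustration, not the scheme used by the original game.

```python
# Hypothetical reward values; a real project would tune these.
TREASURE_REWARD = 1.0    # reaching the treasure
STEP_PENALTY = -0.04     # small cost for every ordinary move
REVISIT_PENALTY = -0.25  # wandering back onto a cell already visited
WALL_PENALTY = -0.75     # trying to move into a wall or off the grid


def reward(next_cell, treasure, blocked, visited):
    """Score one move given where it leads and what the agent has seen."""
    if blocked:
        return WALL_PENALTY
    if next_cell == treasure:
        return TREASURE_REWARD
    if next_cell in visited:
        return REVISIT_PENALTY
    return STEP_PENALTY
```

Making every non-winning move slightly negative is one way to push the agent toward shorter paths, since longer routes accumulate more penalty before the treasure is reached.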
In Q-learning, the agent starts with no knowledge of the maze, of its objective inside the maze, or of which actions lead to rewards or punishments.
As the agent plays the game, it learns that it is punished for wandering and for taking paths that do not lead to a reward.
The agent records the outcome of each action it takes in a particular state as a numerical value in its Q-table. Each value estimates the cumulative reward the agent can expect from taking that action in that state, so actions with higher values are more likely to be chosen again in the future. After the agent has played the game many times and the values in its Q-table have stabilized, the agent will typically have discovered the optimal path to the treasure.
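The learning loop described above can be illustrated with a minimal tabular Q-learning sketch. Everything specific here is an assumption made for the example: the 4x4 maze, the reward values, the hyperparameters (`alpha`, `gamma`, `epsilon`), and the helper names `step`, `train`, and `greedy_path`.

```python
import random

# Hypothetical 4x4 maze: 0 = open cell, 1 = wall.
MAZE = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
SIZE = len(MAZE)
START, TREASURE = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or MAZE[r][c] == 1:
        return state, -0.75, False  # blocked move: stay put, large penalty
    if (r, c) == TREASURE:
        return (r, c), 1.0, True    # treasure found: reward, episode ends
    return (r, c), -0.04, False     # ordinary move: small penalty


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Fill a Q-table by playing the game repeatedly."""
    rng = random.Random(seed)
    # The Q-table starts at zero: the agent knows nothing about the maze.
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in range(4)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the length of one game
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore a random action
            else:
                action = max(range(4), key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in range(4))
            # Move the stored value toward reward + discounted future value.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from the start cell."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == TREASURE:
            break
        action = max(range(4), key=lambda a: q[(state, a)])
        state, _, _ = step(state, action)
        path.append(state)
    return path


q_table = train()
learned_path = greedy_path(q_table)
```

Once the values have stabilized, following the highest-valued action from each cell should trace the shortest route to the treasure on this small grid. The occasional random action (`epsilon`) is one common way to keep the agent trying moves it has not yet evaluated.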