The exploration-exploitation tradeoff is a fascinating duality. On the one hand, we seek new information and possibilities through risk-taking and experimentation. On the other hand, we refine and apply what we already know, which is how we maximize our results in the short term.
The duality is nowhere more pronounced than in the restaurant dilemma. When going on a trip, we are usually in the mood to try new foods. On the last night of the trip, though, most of us go back to a restaurant we already liked. We'd rather not risk disappointment.
Another example is the dating market. Should you settle for someone familiar from your past? Or should you go on different dates to find a deeper connection? Once you notice the pattern, you can see it everywhere. It also applies to careers and investment choices.
I came across the tradeoff in 2019 while learning how computers learn to play games and beat humans. In that setting, the tradeoff is quantitative: a number controls how often the system tries something new versus acting on what it has already learned.
In computer systems, the tradeoff is commonly controlled by an exploration rate, often written as epsilon. (This is not the same thing as the discount factor, which weighs future rewards against immediate ones.) The exploration rate starts high and is lowered as training progresses. Initially, the system should explore as much as possible. As it begins to form a model of reality, it should start exploiting its knowledge of the system. This means acting on the model it has built, sometimes refining the model and other times reaping the reward.
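The schedule described above, with heavy exploration at first and a gradual shift toward exploitation, can be sketched with an epsilon-greedy strategy on a toy "slot machine" problem. This is only an illustrative sketch, not how any particular game-playing system works: the function names, the decay constants, and the simulated rewards are all assumptions made up for the example. Here epsilon is the probability of making a random choice instead of the best-known one.

```python
import random

def epsilon_at(step, start=1.0, end=0.05, decay=0.99):
    """Exploration rate: starts high, shrinks each step, never drops below `end`.
    The constants are illustrative assumptions, not recommended values."""
    return max(end, start * decay ** step)

def run_bandit(true_means, steps=2000, seed=0):
    """Epsilon-greedy on a toy multi-armed bandit with simulated Gaussian rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each option was tried
    estimates = [0.0] * n_arms   # running average reward per option

    for step in range(steps):
        if rng.random() < epsilon_at(step):
            # Explore: pick any option at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the option with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)  # noisy simulated payoff
        counts[arm] += 1
        # Incremental update of the running average.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

if __name__ == "__main__":
    estimates, counts = run_bandit([0.0, 5.0])
    print("estimated rewards:", estimates)
    print("times chosen:", counts)
```

The floor on epsilon (`end`) keeps a small amount of exploration going forever, which is one common design choice; letting it decay all the way to zero would mean the system eventually stops trying anything new.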
This behavior of computer systems mirrors our personal experience. We explore and learn more about the world when we're young. But as we age, we prefer to stick to what we know and capitalize on our knowledge.
This pattern betrays an understandable bias. We miss new opportunities because learning something new feels uncomfortable. Instead, we rely on past experience and justify it as exploitation.
Now, next time you make a decision, ask yourself: Are you following the best-known policy, or are you trying something new to map the trajectory better for future adventures?