
Using ML to Play Pokémon Showdown

The Pandemic

It was sticky. It was hot and constrictive and relentless. It was the summer of 2020. I was trapped in a tiny New York apartment without very good air-conditioning, aging. I needed a win. America needed a win.

I played a lot of Pokémon Showdown during the pandemic. It's a website that lets you build your own team of Pokémon and battle others. You can experience all the fun of Pokémon battling with different teams without doing all the work of building those teams in-game. You can also opt to just be assigned a random team of Pokémon and battle others with random teams, which is what I normally do.

When I took my first machine learning class in college I joked about training my computer to make me better at Showdown. Now, in my darkest hour, playing the game constantly, the idea returned to me, half-seriously this time. Did I finally have the programming and machine learning skills to make this happen? What if I failed? Could I really be the one to pull this off? I decided to risk it all and go for it.

The API

My goal for this project was always to stand up a bot capable of autonomously playing a generation 1 (the simplest generation) Pokémon battle, using ML to make its decisions. Simple ML would be fine. I would make the program modular, so swapping in a different ML algorithm would be no harder than setting the project up with that algorithm in the first place. Adding new predictors would be slightly harder, but I was fine starting with a manageable set of them.

Therefore, the most difficult part became dealing with the Showdown interface. I found an incomplete API wrapper and used that to start. Thank you ckw017. The API wrapper was good at initiating battle rooms and sending messages, but it was not able to parse more complex battle information. When the wrapper didn't know what to do with some information, it just dumped the JSON into the terminal. I wrote my own child object in Python to extend the API. I played a lot of battles, and every time something fell through the cracks I wrote another few lines of code to handle that outcome. Now, I think I have an API wrapper that can parse any outcome that can happen in a gen 1 Pokémon battle. I also wrote an object that contains the entire state of the battle at any time, which my API wrapper manipulates. Finally, I just needed to write a method that tells the API wrapper what move to make next given the state of the battle. I also had to learn how to use async/await properly.
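To give a sense of the pattern, here's a minimal sketch of the "handle what you know, dump what you don't" approach. The class, method, and state-object names here are hypothetical illustrations, not ckw017's actual wrapper API or the code in my repo; only the message types come from Showdown's protocol.

```python
import json

class BattleState:
    """Minimal stand-in for the object that tracks the full battle state."""
    def __init__(self):
        self.hp = {}          # pokemon name -> current HP fraction
        self.finished = False

class Gen1Client:
    """Hypothetical sketch: dispatch the protocol messages we know how to
    parse, and dump anything unrecognized so a handler can be written for
    it after the battle."""
    def __init__(self):
        self.state = BattleState()
        self.handlers = {
            "-damage": self.on_damage,
            "win": self.on_win,
        }

    def handle(self, msg_type, payload):
        handler = self.handlers.get(msg_type)
        if handler is None:
            # Something fell through the cracks: print it, then write a
            # few more lines of code to cover it next time.
            print(f"unhandled {msg_type}: {json.dumps(payload)}")
            return
        handler(payload)

    def on_damage(self, payload):
        self.state.hp[payload["pokemon"]] = payload["hp_fraction"]

    def on_win(self, payload):
        self.state.finished = True

client = Gen1Client()
client.handle("-damage", {"pokemon": "Snorlax", "hp_fraction": 0.62})
client.handle("-status", {"pokemon": "Snorlax", "status": "par"})  # dumped
```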

The ML

What was I really trying to optimize for? A win. But most individual moves don't immediately lead to a win. I needed a simple (at least at first) metric to optimize that could apply to any specific move. Inspired by finance, I created the "net present win" metric: it's zero if no win is achieved, 1 if a turn's move results in a win, and decreases with each additional turn it takes to reach the win. So after each battle, for each turn, I can calculate a net present win score.
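As a concrete sketch, the score for a given turn might look like the function below. The geometric discounting and the 0.9 discount factor are illustrative assumptions, not necessarily what the repo uses; the post only specifies that the score is 0 with no win, 1 on the winning turn, and smaller the further the turn is from the win.

```python
def net_present_win(turn, win_turn, discount=0.9):
    """Score the move made on `turn`.

    Returns 0 if the battle was never won, 1 if this turn's move produced
    the win, and a discounted value for each additional turn it took to
    win after this one. The 0.9 discount is an illustrative assumption.
    """
    if win_turn is None:              # battle was lost or never finished
        return 0.0
    return discount ** (win_turn - turn)

# Example: a battle won on turn 12
print(net_present_win(turn=12, win_turn=12))    # 1.0
print(net_present_win(turn=8,  win_turn=12))    # 0.9**4 ~= 0.656
print(net_present_win(turn=8,  win_turn=None))  # 0.0
```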

I also wanted to represent the state of the game at each turn in a small number of columns, so it would take less data for the model to get oriented. I lost a lot of detail in the process, but I broke each turn down into things like expected damage done, expected damage received, status probability, outspeed probability, etc. So now, given a game state and the set of possible moves, I can calculate those predictors for each possible action, use a model fit on previous battles to predict a net present win score for each one, and select the move with the highest predicted score, functionally optimizing for the fastest win. The model currently uses KNN to make its predictions, but it's very easy to swap the model object out for anything else, such as regression or forests. My friend Jake helped me design and code some of the predictors by opening PRs into the repo.
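Here's a minimal sketch of that decision loop, assuming scikit-learn's KNeighborsRegressor (the post doesn't name a library, so that choice is mine). The `featurize` stub, `choose_move`, and the fake training data are illustrative stand-ins; only the predictor names come from the list above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Illustrative predictor columns -- the real feature set differs in detail.
FEATURES = ["expected_damage_done", "expected_damage_received",
            "status_probability", "outspeed_probability"]

def featurize(state, action):
    """Collapse a (game state, candidate action) pair into a small feature
    vector. Stubbed with random numbers for this sketch."""
    return np.random.rand(len(FEATURES))

# Fit on features and net-present-win scores logged from previous battles
# (fake data here, just to make the sketch runnable).
X_train = np.random.rand(500, len(FEATURES))
y_train = np.random.rand(500)
model = KNeighborsRegressor(n_neighbors=15).fit(X_train, y_train)

def choose_move(state, legal_actions):
    """Score every legal action and pick the one with the highest predicted
    net present win -- functionally optimizing for the fastest win."""
    X = np.array([featurize(state, a) for a in legal_actions])
    scores = model.predict(X)
    return legal_actions[int(np.argmax(scores))]

print(choose_move(object(), ["thunderbolt", "surf", "switch to Tauros"]))
```

Because the model is just an object with fit/predict, swapping KNN for a regression or a forest only means constructing a different estimator.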

Other Bells and Whistles

The tool reconnects and reorients itself if anything disrupts the connection. It can handle an arbitrary number of battles at once with no cross-talk. It also has a "training mode" where it defers to a human operator on all decisions and learns from them. This is really the only way to build up a good kernel of data - otherwise the bot just loses every battle and is never able to start identifying actions associated with a good net present win score. The training mode has also been where I've found the tool most useful. Not that many people play gen 1 battles, and I don't want to inconvenience anyone by having the model play endless battles sub-optimally, so I prefer to battle in training mode, let the model learn, and then battle the model myself. This sometimes teaches me things about my own battle strategy, though it's not that nuanced due to the information loss I mentioned above.
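A rough sketch of what "defer to the human and learn from it" can look like; the `decide` hook, the experience log, and the random fallback are all hypothetical, not the repo's actual code.

```python
import random

def decide(state, legal_actions, training_mode, experience_log):
    """Hypothetical decision hook: in training mode, ask the operator which
    move to make and record the choice as a training example; otherwise
    fall back to the model (stubbed here with a random pick)."""
    if training_mode:
        for i, action in enumerate(legal_actions):
            print(f"  [{i}] {action}")
        choice = legal_actions[int(input("Your move? "))]
        # Logged (state, action) pairs get scored with net present win
        # after the battle and added to the model's training data.
        experience_log.append((state, choice))
        return choice
    return random.choice(legal_actions)  # stand-in for the model's pick

log = []
# decide({"turn": 1}, ["thunderbolt", "body slam"], True, log)
```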

Video

Here's what it looks like to battle against the model! Here is the code.