nareshkarthigeyan

Building an Unbeatable Tic Tac Toe

Dec 17, 2025

Tic Tac Toe is like the holy grail of "hello world" projects for understanding adversarial search concepts and simple game theory. It's also frustratingly fun to have a bot refuse to let you win at a game you usually play in the classroom to kill boredom.

I spent around four hours building a working XO (I’m calling it that because “Tic Tac Toe” is unnecessarily long). What makes XO perfect for learning AI concepts is that it hits a sweet spot:

  • A limited state space
  • Clear rules and outcomes
  • And it’s actually fun!

Before writing any code, I needed to understand why the solution works, not just how.

You can play the game here:

(Interactive board embedded here. You can never win! Choose your side and try anyway.)

What is a Search Problem?

Search problems are everywhere, from Google Maps to recommendation systems to social networks and network routing protocols. Often, we need to find "paths": efficient routes from one point to another. The usual suspects, BFS and DFS, are uninformed searches. These are the algorithms we use when we have NO extra information about the graph, which is the best we can do given the constraints: we traverse the graph one node at a time, knowing only the current node's neighbors and the route we took so far, not the full graph!
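To make that concrete, here's a minimal sketch of an uninformed search: breadth-first search over a plain adjacency-list graph. (The dictionary-of-neighbors representation is just an assumption for the example, not anything specific to the game.)

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Uninformed search: expand nodes level by level, knowing only
    each node's neighbors and the path taken so far."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path                     # first path found is a shortest one
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(path + [neighbor])
    return None                             # no route exists
```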

More modern and sophisticated (in a sense) algorithms bring cost into the picture. Uniform Cost Search expands the cheapest path first, and A* is a proper informed search: on top of the cost so far, we have an estimate, a heuristic, of how far each node is from the goal, so we can prioritize the most promising routes. Things like path costs, estimates, heuristic functions and Manhattan distances make their way into informed searches.
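And here's a rough sketch of the informed version: A* on a small grid, where the Manhattan distance to the goal acts as the heuristic estimate. (The 0-for-free, 1-for-wall grid encoding is just an assumption for the example.)

```python
import heapq

def manhattan(a, b):
    """Heuristic: estimated remaining cost from cell a to cell b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(grid, start, goal):
    """Informed search: always expand the node with the lowest
    cost-so-far plus heuristic estimate."""
    frontier = [(manhattan(start, goal), 0, start, [start])]
    seen = set()
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                heapq.heappush(frontier, (cost + 1 + manhattan((nr, nc), goal),
                                          cost + 1, (nr, nc), path + [(nr, nc)]))
    return None
```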

But none of these quite capture what’s happening in games.

Let's bring our attention to Adversarial Search: a search where one player's move tries to oppose the other's. In other words, a search through multi-agent scenarios with conflicting goals.

Adversarial Searches and Tic Tac Toe

Let's define that shit again. Adversarial search is a technique for decision-making in competitive, multi-agent scenarios where players have conflicting goals. Like, "X" and "O".

In the game of XO: "X" is trying to find the best move to get its row, column or diagonal, while making sure "O" does not get them.

and

"O" is trying to find the best move to get it's row, column or diagonal, while making sure "X" does not get them.

More structurally:

  • X is trying to complete a row, column, or diagonal.
  • O is trying to do the same — while preventing X from succeeding.

So that's TWO opponents playing against each other's best interests. In such cases, what is the most optimal "route" (the route here being a move)?

Every move a player makes influences not only the current state of the board, but also the future decisions of the opponent. This takes us from the world of pathfinding and routing into the world of strategic game theory!!

And the best move in a position is simply the one that plays against the best move of the opponent. The opponent won't let you win, so there is an inherent assumption that the opponent will also play the most optimal move possible. With that information in mind, the move that counters, or minimizes, the opponent's optimal utility is going to be your optimal move!

Scoring & Setting the Stage

To make the computation actually possible, we assign scores to the board states where a game terminates. If a board does not end the game, we simply play out all possible scenarios until we reach a terminal state. The scores are then propagated back up the tree. The scores:

  • X wins: +1
  • O wins: -1
  • Draw: 0

These values form a utility function. The AI’s job is to choose actions that maximize its expected utility.
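In code, that utility function is tiny. Here's one way it could look for a flat 9-cell board; the list-of-'X'/'O'/None representation is just how I'd sketch it, not necessarily how the actual game stores state:

```python
# All eight winning lines on a 3x3 board, stored as index triples.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """board is a list of 9 cells, each 'X', 'O', or None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def is_terminal(board):
    return winner(board) is not None or all(cell is not None for cell in board)

def utility(board):
    """Score a finished game from X's point of view."""
    w = winner(board)
    if w == 'X':
        return +1
    if w == 'O':
        return -1
    return 0
```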

minimax

I'm soo excited to talk about this because minimax is so coool!

Minimax, like the name tells us, is a recursive, adversarial search algorithm that minimizes and maximizes: it picks the maximum over options the opponent has already minimized (or vice versa, depending on the player).
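Written out as the standard minimax recurrence, where value(s) is the score of a board state s under optimal play and result(s, a) is the state reached by playing move a:

\[
\text{value}(s) =
\begin{cases}
\text{utility}(s) & \text{if } s \text{ is terminal}\\
\max_{a \in \text{moves}(s)} \text{value}(\text{result}(s, a)) & \text{if X is to move}\\
\min_{a \in \text{moves}(s)} \text{value}(\text{result}(s, a)) & \text{if O is to move}
\end{cases}
\]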

Visually,

  • If you are X, your best move is to play a move that prevents the best move of O.
  • So you take into account ALL of the games that can follow from your current state, every move O could play, and for each of those, every valid reply, and so on....
  • Until there's a terminal state, i.e. the game ends in a WIN, LOSS or TIE and a utility value is assigned.

Now:

  • Take ALL of the terminal games and their utilities (starting with the ones X won, i.e. utility = +1) and propagate each value up to its parent (did I mention it's a tree, where each node is a game state and its children are the games reachable from that position!!).
  • Then the parent node takes the max of all its children (because the intuition of "X" is to get the best move possible, i.e. maximize the utility).
  • Then it propagates that value up to its parent, which now takes the min of all its children (each of which was the max of its own children), because the intuition of "O" is to get the best move possible for itself, i.e. minimize the utility.
  • The process repeats alternately until we arrive at the current move.
  • Here, simply, the AI will have eliminated all possibilities where it (X) loses, might have a few where it (X) ties, and the best move is the one that leads to a win.
  • The bot (X) chooses the best possible move with a win (or a tie if there's no winning route).

I hope that made sense,

because it's the opposite for O, who plays in a similar way but tries to reach the lowest score (-1), minimizing over options that X has already maximized.

A clear assumption is that both players (X and O) play optimally, i.e., they always make the best move possible.
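Putting it all together, here's a minimal minimax sketch in Python that reuses the winner / is_terminal / utility helpers from earlier. It's an illustration of the idea, not the exact code behind the embedded game:

```python
def moves(board):
    """All empty cells, i.e. the legal moves from this position."""
    return [i for i, cell in enumerate(board) if cell is None]

def minimax(board, player):
    """Return (best score, best move) for the player to move.
    X maximizes the utility, O minimizes it."""
    if is_terminal(board):
        return utility(board), None

    best_move = None
    if player == 'X':
        best_score = -2                      # below any real utility
        for m in moves(board):
            board[m] = 'X'
            score, _ = minimax(board, 'O')
            board[m] = None                  # undo the move
            if score > best_score:
                best_score, best_move = score, m
    else:
        best_score = +2                      # above any real utility
        for m in moves(board):
            board[m] = 'O'
            score, _ = minimax(board, 'X')
            board[m] = None
            if score < best_score:
                best_score, best_move = score, m
    return best_score, best_move

# X to move on an empty board: optimal play from both sides is a draw.
print(minimax([None] * 9, 'X'))              # -> (0, <some opening square>)
```

Run it from any position and the score tells you the best outcome that side can force, which is exactly why the bot can be tied but never beaten.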
