Self-learning 2048 bot: tuning the evaluation function with self-play

In Part 1, we built a 2048 bot using expectimax plus a hand-crafted evaluation function that rewards empty cells, smoothness, monotonicity, and keeping the biggest tile in a corner. The bot plays decently, but one problem remains: all the weights were chosen by intuition. In Part 2, we give the bot a little self-learning ability: it tunes the weights of the evaluation function automatically by playing against itself (self-play), measuring performance, and updating the weights step by step. It’s not full-blown reinforcement learning yet, but it’s a clear step from a static AI to one that can adapt.

1. The problem with hand-tuned evaluation functions

In Part 1, our score looked like this:

score(board) =
    w_empty   * emptyCells(board)
  + w_mono    * monotonicity(board)
  + w_smooth  * smoothness(board)
  + w_corner  * maxTileCorner(board)
  + w_max     * maxTile(board)

Problems:

  • The w_* weights are chosen by “feel”: tweak, reload, watch, repeat.
  • Weights that work well for depth = 4 may be bad for depth = 3 or 5.
  • Whenever you add a new feature (e.g. a penalty for large tiles in the middle), you have to retune everything manually.

In other words, our “AI” does not learn from data at all. Let’s fix that by letting it play many games, measure how good a weight vector is, and adjust automatically.

2. Turning the evaluation into feature vector + weight vector

First, separate features and weights clearly:

  • Features: measurements from the board (empty cells, smoothness, monotonicity, corner, max tile).
  • Weights: a vector of real numbers, one per feature.

We rewrite the evaluation like this:

function extractFeatures(board) {
  const emptyCount   = getEmptyCells(board).length;
  const smoothness   = computeSmoothness(board);   // negative number
  const monotonicity = computeMonotonicity(board); // positive number
  const cornerBonus  = maxTileInCorner(board) ? 1 : 0;
  const maxTile      = getMaxTile(board);

  return {
    emptyCount,
    smoothness,
    monotonicity,
    cornerBonus,
    maxTile,
  };
}

function evaluateWithWeights(board, w) {
  const f = extractFeatures(board);

  return (
    w.wEmpty  * f.emptyCount   +
    w.wSmooth * f.smoothness   +
    w.wMono   * f.monotonicity +
    w.wCorner * f.cornerBonus  +
    w.wMax    * f.maxTile
  );
}

Then, in expectimax, just pass the weight vector w:

function expectimax(board, depth, isPlayerTurn, w) {
  if (depth === 0 || isGameOver(board)) {
    return evaluateWithWeights(board, w);
  }
  // ... rest is the same as Part 1, except that evaluate(...) becomes
  // evaluateWithWeights(board, w) and w is passed along in every recursive call
}
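To show how w travels through the recursion, here is a minimal sketch of the whole function with the game logic injected. It is not the Part 1 code: expectimaxSketch, game.boardsAfterMoves, game.spawnOutcomes, and the toy game used to exercise it are hypothetical stand-ins, and the convention of decreasing depth on every ply is an assumption.

```javascript
// Sketch only: `game` and `evaluate` are hypothetical stand-ins for the Part 1 helpers.
function expectimaxSketch(board, depth, isPlayerTurn, w, game, evaluate) {
  if (depth === 0 || game.isGameOver(board)) {
    return evaluate(board, w);
  }
  if (isPlayerTurn) {
    // Max node: the player picks the move with the highest value.
    let best = -Infinity;
    for (const next of game.boardsAfterMoves(board)) {
      best = Math.max(best, expectimaxSketch(next, depth - 1, false, w, game, evaluate));
    }
    return best;
  }
  // Chance node: average over the random tile spawns, weighted by probability.
  let expected = 0;
  for (const { board: next, prob } of game.spawnOutcomes(board)) {
    expected += prob * expectimaxSketch(next, depth - 1, true, w, game, evaluate);
  }
  return expected;
}

// Toy game (not 2048): the "board" is a number, a move adds 1 or 2,
// and a "spawn" either leaves it alone or subtracts 1 with equal probability.
const toyGame = {
  isGameOver: (board) => board >= 100,
  boardsAfterMoves: (board) => [board + 1, board + 2],
  spawnOutcomes: (board) => [
    { board, prob: 0.5 },
    { board: board - 1, prob: 0.5 },
  ],
};
const toyEvaluate = (board, w) => w.scale * board;

const toyValue = expectimaxSketch(0, 2, true, { scale: 10 }, toyGame, toyEvaluate);
console.log(toyValue);
```

The only point of the sketch is that the same w reaches every leaf, so changing w changes which move the max nodes prefer.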

Conceptually, we now have a simple model:
score = w · f(board) (dot product of weights and feature vector).
Learning becomes: find a good weight vector w.
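To make the dot-product view concrete, here is a small sketch that lines features and weights up as plain arrays in a fixed order. FEATURE_ORDER, WEIGHT_ORDER, toVector, and dot are illustrative names rather than part of the Part 1 code, and the feature values are made up for the example.

```javascript
// Hypothetical helper names; the ordering is an arbitrary choice for this sketch.
const FEATURE_ORDER = ["emptyCount", "smoothness", "monotonicity", "cornerBonus", "maxTile"];
const WEIGHT_ORDER  = ["wEmpty", "wSmooth", "wMono", "wCorner", "wMax"];

// Turn an object into an array following the given key order.
function toVector(obj, order) {
  return order.map((key) => obj[key]);
}

// Plain dot product of two equal-length arrays.
function dot(a, b) {
  if (a.length !== b.length) throw new Error("length mismatch");
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Made-up feature values for one board, plus an example weight vector.
const exampleFeatures = { emptyCount: 6, smoothness: -20, monotonicity: 12, cornerBonus: 1, maxTile: 256 };
const exampleWeights  = { wEmpty: 350, wSmooth: 3, wMono: 10, wCorner: 300, wMax: 1 };

const exampleScore = dot(
  toVector(exampleWeights, WEIGHT_ORDER),
  toVector(exampleFeatures, FEATURE_ORDER)
);
console.log(exampleScore); // 2100 - 60 + 120 + 300 + 256 = 2716
```

Keeping the two orderings side by side makes it harder to pair a weight with the wrong feature when a new feature is added later.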

3. Self-play: letting the bot grade its own weights

Idea: for each weight vector w, let the bot:

  • Play several full 2048 games (or up to some move limit).
  • Record the average score (or average max tile).
  • Treat that as the “fitness” of this w.

So we get a fitness function:

function evaluateWeights(w) {
  const NUM_GAMES = 3;
  let totalScore = 0;

  for (let g = 0; g < NUM_GAMES; g++) {
    const score = playOneGameWithWeights(w);
    totalScore += score;
  }

  return totalScore / NUM_GAMES;
}

Where playOneGameWithWeights(w):

  • Initializes a board.
  • Uses expectimax with the given w until game over or target reached.
  • Returns the game score.

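The higher the average score, the better that weight vector is. Because tile spawns are random, an average over only three games is a noisy estimate; raising NUM_GAMES gives a steadier signal at the cost of training time.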
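The game loop itself can be sketched as follows. Since the Part 1 helpers are not shown here, the game logic is passed in as an engine object whose method names (newBoard, isGameOver, bestMove, applyMove, addRandomTile) are assumptions, as are the extra engine and maxMoves parameters; the version called by evaluateWeights(w) would close over the real helpers instead.

```javascript
// Sketch of the self-play loop. `engine` bundles hypothetical helpers standing in
// for the Part 1 game code; their names and signatures are assumptions.
function playOneGameWithWeights(w, engine, maxMoves = 5000) {
  let board = engine.newBoard();
  let score = 0;

  for (let move = 0; move < maxMoves && !engine.isGameOver(board); move++) {
    // bestMove is expected to run expectimax with the weight vector w.
    const direction = engine.bestMove(board, w);
    if (direction === null) break; // no legal move left

    const result = engine.applyMove(board, direction); // { board, gained }
    score += result.gained;
    board = engine.addRandomTile(result.board);
  }
  return score;
}

// Toy stand-in engine (not 2048): the "board" is a counter that ends at 10,
// and every move gains 4 points. It only exercises the loop above.
const toyEngine = {
  newBoard: () => 0,
  isGameOver: (board) => board >= 10,
  bestMove: (board, w) => "left",
  applyMove: (board, direction) => ({ board: board + 1, gained: 4 }),
  addRandomTile: (board) => board,
};

console.log(playOneGameWithWeights({}, toyEngine));
```

The maxMoves cap corresponds to the “move limit” mentioned earlier and keeps a single evaluation from running unboundedly long.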

4. A lightweight hill-climbing algorithm to tune the weights

We use a very simple algorithm: stochastic hill-climbing.

Idea:

  1. Start from an initial “intuitive” weight set w_best.
  2. Compute its fitness: fitness_best = evaluateWeights(w_best).
  3. Repeat many times:
    • Randomly perturb each weight:
      w_candidate[i] = w_best[i] * (1 + noise), where noise is uniform in [-0.2, 0.2].
    • Compute fitness_candidate.
    • If fitness_candidate > fitness_best:
      • w_best = w_candidate
      • fitness_best = fitness_candidate

Pseudocode:

let bestWeights = {
  wEmpty: 350,
  wSmooth: 3,
  wMono: 10,
  wCorner: 300,
  wMax: 1,
};

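// Step 2 of the algorithm: score the starting weights first, so the first
// candidate has to beat them instead of being accepted unconditionally.
let bestFitness = evaluateWeights(bestWeights);

function mutateWeights(w) {
  // Multiplicative noise: each factor is uniform in [0.8, 1.2), so a weight
  // keeps its sign, and a weight of exactly 0 would stay 0.
  const factor = () => 1 + (Math.random() * 0.4 - 0.2);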
  return {
    wEmpty:  w.wEmpty  * factor(),
    wSmooth: w.wSmooth * factor(),
    wMono:   w.wMono   * factor(),
    wCorner: w.wCorner * factor(),
    wMax:    w.wMax    * factor(),
  };
}

function trainOneStep() {
  const candidate = mutateWeights(bestWeights);
  const fitness   = evaluateWeights(candidate); // self-play a few games

  if (fitness > bestFitness) {
    bestFitness = fitness;
    bestWeights = candidate;
    console.log("New best fitness =", fitness, "weights =", bestWeights);
  }
}

This is not gradient descent, not full RL, but:

  • It’s simple and intuitive.
  • It can run directly in the browser.
  • You can see the weights drift and watch the bot improve (or sometimes get worse!) in real time.
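To try the loop without running any games, the same hill-climbing can be written as a function that takes the fitness function as a parameter. This is a sketch under assumptions: hillClimb, toyFitness, and the optional rng parameter are hypothetical names, and the toy fitness is a smooth made-up function, not real self-play.

```javascript
// Generic version of the hill-climbing loop. `fitnessFn` stands in for evaluateWeights;
// `rng` defaults to Math.random but can be swapped for a seeded generator.
function hillClimb(startWeights, fitnessFn, steps, rng = Math.random) {
  let best = { ...startWeights };
  let bestFitness = fitnessFn(best); // baseline for the starting weights

  for (let step = 0; step < steps; step++) {
    // Perturb every weight by a factor in [0.8, 1.2).
    const candidate = {};
    for (const key of Object.keys(best)) {
      candidate[key] = best[key] * (1 + (rng() * 0.4 - 0.2));
    }
    const fitness = fitnessFn(candidate);
    if (fitness > bestFitness) {
      best = candidate;
      bestFitness = fitness;
    }
  }
  return { weights: best, fitness: bestFitness };
}

// Toy, noise-free fitness with its peak at wEmpty = 400, wCorner = 250.
// Real self-play fitness is noisy, so it behaves less smoothly than this.
function toyFitness(w) {
  return -((w.wEmpty - 400) ** 2) - (w.wCorner - 250) ** 2;
}

const toyStart = { wEmpty: 350, wCorner: 300 };
const toyResult = hillClimb(toyStart, toyFitness, 200);
console.log(toyResult.weights, toyResult.fitness);
```

Calling hillClimb(bestWeights, evaluateWeights, 10) would correspond to ten training steps with real self-play, assuming evaluateWeights is wired to the game.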

5. Demo: 2048 bot that self-tunes its weights (mini self-play trainer)

The demo below has two parts:

  • A 2048 board so you can watch the bot play with the current weights.
  • A small trainer:
    • “Train 10 steps (self-play)”: the bot plays many games with mutated weights; if they are better, it keeps them.
    • Displays the current weights and an estimated “fitness” (average score).

You can:

  • Click “Auto-play with current weights” to watch a game using the latest weights.
  • Click “Train 10 steps” a few times, then auto-play again to see whether the bot gets better.

[Interactive demo: 2048 Self-Learning AI Demo (Part 2). Expectimax + linear heuristic, weights tuned via self-play hill-climbing. The widget shows the current score, max tile, status, current weights, and estimated fitness (average score).]

6. Limitations & what’s coming in Part 3

What we’ve built is still quite “rough”:

  • The evaluation is still a linear combination of a few hand-crafted features.
  • The learning algorithm is just random hill-climbing: no gradient, no full “state–action–reward” setup like proper RL.
  • Evaluating the fitness is expensive (many games per weight set), which doesn’t scale well if you want deeper searches.

But it’s also very practical:

  • Easy to run in the browser, no backend required.
  • Highly visual: you can see the weights change and the bot’s play style evolve.
  • A great stepping stone towards Part 3:
    • Using a neural network to learn the value function.
    • TD-learning / Q-learning style RL for 2048.
    • MCTS + learned value function (a tiny AlphaZero-like setup for 2048).

In Part 3, we will move from “tuning linear weights” to learning a non-linear value function, pushing our 2048 bot closer to modern game AI techniques.
