AI & 2048: Learning the Evaluation Function with TensorFlow.js

In Part 1, we built a 2048 bot using expectimax plus a hand-crafted evaluation function. In Part 2, we let the bot tune that evaluation function itself through simple self-play hill-climbing. In Part 3, we take the next step: we use TensorFlow.js to train a neural network that learns a value function V(board) directly in the browser, and we use that network to pick moves. Everything runs client-side in JavaScript and TensorFlow.js: no Python, no backend.

1. Goal: learn a value function V(board)

We want a function:

V(board) ≈ “how good this state is”

Then, at each move:

  1. Simulate all 4 directions: Up, Right, Down, Left.
  2. Apply the move, get the resulting board.
  3. Use the neural network to predict V(board_next).
  4. Pick the move with the highest predicted value.
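
Steps 1 and 2 need a move simulator that returns the resulting board and whether anything moved; the chooser in section 5 relies on a `moveBoard(board, move)` with that shape, but its implementation is not shown here. The following is a minimal sketch, assuming boards are 4×4 arrays of tile values with 0 for empty cells and hypothetical numeric direction constants:

```javascript
// Hypothetical direction constants; the demo's actual values are not shown.
const UP = 0, RIGHT = 1, DOWN = 2, LEFT = 3;

// Coordinates of the k-th cell of line i, where k = 0 is the edge
// the tiles slide toward for the given move.
function cellAt(move, i, k) {
  if (move === LEFT) return [i, k];
  if (move === RIGHT) return [i, 3 - k];
  if (move === UP) return [k, i];
  return [3 - k, i]; // DOWN
}

// Apply one move without mutating the input board.
// Returns { board, moved, score }, where score is the sum of merged tiles.
function moveBoard(board, move) {
  const next = board.map((row) => row.slice());
  let score = 0;
  for (let i = 0; i < 4; i++) {
    // Collect the non-empty tiles of this line in travel order.
    const tiles = [];
    for (let k = 0; k < 4; k++) {
      const [r, c] = cellAt(move, i, k);
      if (board[r][c] !== 0) tiles.push(board[r][c]);
    }
    // Merge equal neighbours; each tile merges at most once per move.
    const merged = [];
    for (let j = 0; j < tiles.length; j++) {
      if (j + 1 < tiles.length && tiles[j] === tiles[j + 1]) {
        merged.push(tiles[j] * 2);
        score += tiles[j] * 2;
        j++; // skip the partner tile
      } else {
        merged.push(tiles[j]);
      }
    }
    // Write the line back, padding with empty cells.
    for (let k = 0; k < 4; k++) {
      const [r, c] = cellAt(move, i, k);
      next[r][c] = k < merged.length ? merged[k] : 0;
    }
  }
  const moved = next.some((row, r) => row.some((v, c) => v !== board[r][c]));
  return { board: next, moved, score };
}
```

For example, moving LEFT on a board whose top row is [2, 2, 4, 0] yields [4, 4, 0, 0] with score 4 (the new 4 does not merge again in the same move), while UP on that same board changes nothing, so `moved` is false and the move is skipped.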

In Parts 1–2, this value was computed by a linear heuristic.
Now, V is a non-linear function represented by a neural network trained from data.

2. Encoding the board as neural network input

The neural network works with numeric vectors, not 4×4 grids.
We encode the board into a vector of length 16:

  • Empty cell (0) → 0
  • Tile 2 → log2(2) = 1
  • Tile 4 → 2
  • Tile 8 → 3
  • … and so on

We then divide each exponent by 16 to keep the inputs in a small range (roughly 0 to 1; a 65536 tile maps to exactly 1):

function encodeBoard(board) {
  const arr = [];
  for (let r = 0; r < 4; r++) {
    for (let c = 0; c < 4; c++) {
      const v = board[r][c];
      if (v === 0) {
        arr.push(0);
      } else {
        arr.push(Math.log2(v) / 16); // simple scaling
      }
    }
  }
  return arr; // length = 16
}

This gives us a fixed-size numerical representation of the board that can be fed into a dense neural network.
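
Because every tile is a power of two, the exponent can also be read off with integer bit operations instead of `Math.log2`, which the ECMAScript specification only requires to be an approximation. A small variant of `encodeBoard` (the name `encodeBoardExact` is hypothetical), with a worked example board:

```javascript
// Exponent of a power-of-two tile: 2 -> 1, 4 -> 2, 2048 -> 11.
// Assumes v is a positive power of two below 2^31 (true for every 2048 tile).
function tileExponent(v) {
  return 31 - Math.clz32(v);
}

// Same layout and scaling as encodeBoard: row-major, exponent / 16, empty -> 0.
function encodeBoardExact(board, scale = 16) {
  const arr = [];
  for (const row of board) {
    for (const v of row) {
      arr.push(v === 0 ? 0 : tileExponent(v) / scale);
    }
  }
  return arr; // length = 16
}

const example = [
  [2, 4, 0, 0],
  [0, 2048, 0, 0],
  [0, 0, 0, 0],
  [0, 0, 0, 65536],
];
console.log(encodeBoardExact(example));
```

Here the 2 and 4 in the first row become 1/16 = 0.0625 and 2/16 = 0.125, the 2048 at index 5 becomes 11/16 = 0.6875, and the 65536 in the last cell becomes 16/16 = 1.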

3. Neural network architecture in TensorFlow.js

We use a simple fully-connected network:

  • Input layer: 16 units (the encoded board)
  • Hidden layer 1: 64 units, ReLU
  • Hidden layer 2: 64 units, ReLU
  • Output layer: 1 unit (scalar value of the board)

In TensorFlow.js:

function createModel() {
  const model = tf.sequential();
  model.add(tf.layers.dense({
    inputShape: [16],
    units: 64,
    activation: 'relu',
  }));
  model.add(tf.layers.dense({
    units: 64,
    activation: 'relu',
  }));
  model.add(tf.layers.dense({
    units: 1, // scalar value V(board)
  }));
  model.compile({
    optimizer: tf.train.adam(0.001),
    loss: 'meanSquaredError',
  });
  return model;
}

This is deliberately small and simple, so it can train quickly in the browser.
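
To put "deliberately small" in numbers: a dense layer with n inputs and m units has n·m weights plus m biases. A quick sketch that tallies the layers of `createModel()`:

```javascript
// Parameters in a dense layer: one weight per (input, unit) pair plus one bias per unit.
function denseParams(inputs, units) {
  return inputs * units + units;
}

// Layer sizes from createModel(): 16 -> 64 -> 64 -> 1.
const sizes = [16, 64, 64, 1];
let total = 0;
for (let i = 1; i < sizes.length; i++) {
  const p = denseParams(sizes[i - 1], sizes[i]);
  console.log(`dense ${sizes[i - 1]} -> ${sizes[i]}: ${p} params`);
  total += p;
}
console.log(`total: ${total}`); // total: 5313
```

That is 1,088 + 4,160 + 65 = 5,313 trainable parameters; calling `model.summary()` in TensorFlow.js should report the same total. A network this size trains in well under a second per batch on most machines, which is what makes in-browser training practical.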

4. Where do training targets come from?

A neural network needs target values to learn from. For this part, we do not use “pure RL” yet.
Instead, we let our hand-crafted heuristic from Parts 1–2 act as a teacher.

Training pipeline:

  1. Generate many board states via random self-play (short rollouts).
  2. For each board, compute a heuristic score: target = heuristicValue(board).
  3. Train the neural network to predict that heuristic score from the encoded board.
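
The three steps above can be sketched in plain JavaScript. This is a minimal, self-contained version under several assumptions: `heuristicValue` here is a hypothetical stand-in (empty cells plus the exponent of the largest tile), not the tuned heuristic from Parts 1–2; rollouts pick uniformly random legal moves; a new tile is a 2 with probability 0.9 and a 4 otherwise; and a seeded PRNG (mulberry32) makes runs reproducible.

```javascript
const UP = 0, RIGHT = 1, DOWN = 2, LEFT = 3; // assumed direction encoding

// Seeded PRNG (mulberry32): returns floats in [0, 1), reproducible per seed.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same encoding as encodeBoard in section 2 (row-major, log2 / 16).
function encodeBoard(board) {
  return board.flat().map((v) => (v === 0 ? 0 : Math.log2(v) / 16));
}

// Compact move logic: cell k of line i, where k = 0 is the edge tiles slide toward.
const cellAt = (move, i, k) =>
  move === LEFT ? [i, k] : move === RIGHT ? [i, 3 - k] : move === UP ? [k, i] : [3 - k, i];

function moveBoard(board, move) {
  const next = board.map((row) => row.slice());
  let score = 0;
  for (let i = 0; i < 4; i++) {
    const tiles = [];
    for (let k = 0; k < 4; k++) {
      const [r, c] = cellAt(move, i, k);
      if (board[r][c] !== 0) tiles.push(board[r][c]);
    }
    const merged = [];
    for (let j = 0; j < tiles.length; j++) {
      if (tiles[j] === tiles[j + 1]) { merged.push(2 * tiles[j]); score += 2 * tiles[j]; j++; }
      else merged.push(tiles[j]);
    }
    for (let k = 0; k < 4; k++) {
      const [r, c] = cellAt(move, i, k);
      next[r][c] = k < merged.length ? merged[k] : 0;
    }
  }
  const moved = next.some((row, r) => row.some((v, c) => v !== board[r][c]));
  return { board: next, moved, score };
}

// Place a 2 (90%) or a 4 (10%) on a random empty cell; mutates the board.
function addRandomTile(board, rng) {
  const empty = [];
  board.forEach((row, r) => row.forEach((v, c) => { if (v === 0) empty.push([r, c]); }));
  if (empty.length === 0) return;
  const [r, c] = empty[Math.floor(rng() * empty.length)];
  board[r][c] = rng() < 0.9 ? 2 : 4;
}

// Hypothetical stand-in teacher: number of empty cells + log2(max tile).
function heuristicValue(board) {
  const cells = board.flat();
  const empty = cells.filter((v) => v === 0).length;
  return empty + Math.log2(Math.max(2, ...cells));
}

// Play short random rollouts; record (encoded board, heuristic target) pairs.
function generateTrainingData(numGames, maxSteps, rng) {
  const xs = [];
  const ys = [];
  for (let g = 0; g < numGames; g++) {
    let board = Array.from({ length: 4 }, () => [0, 0, 0, 0]);
    addRandomTile(board, rng);
    addRandomTile(board, rng);
    for (let step = 0; step < maxSteps; step++) {
      xs.push(encodeBoard(board));
      ys.push(heuristicValue(board));
      const legal = [UP, RIGHT, DOWN, LEFT]
        .map((m) => moveBoard(board, m))
        .filter((res) => res.moved);
      if (legal.length === 0) break; // game over
      board = legal[Math.floor(rng() * legal.length)].board;
      addRandomTile(board, rng);
    }
  }
  return { xs, ys };
}

const { xs, ys } = generateTrainingData(20, 50, mulberry32(1));
console.log(xs.length, ys.length);
```

The resulting arrays can then be wrapped as `tf.tensor2d(xs)` and `tf.tensor2d(ys, [ys.length, 1])` and passed to `model.fit`, which is roughly what one "Train 1 batch" step amounts to.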

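This is supervised learning from a pseudo-expert. As long as the targets come from the heuristic, the network can at best imitate its teacher.
Once it approximates the heuristic well, we can keep training it on fresh self-play boards or, later, replace the heuristic targets with TD-learning updates computed from actual play, which is what gives a learned evaluator a chance to go beyond the hand-crafted one.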
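
To make the TD-learning option concrete: one common formulation for 2048 learns values of afterstates (the board right after our move, before the random tile appears). Its TD(0) target for afterstate t is the merge score of the next move plus the current estimate for the next afterstate, and 0 when the game ends. A minimal sketch of computing these targets for one recorded game; `tdTargets` is a hypothetical helper, and its output would take the place of `heuristicValue(board)` as the training target:

```javascript
// TD(0) targets for the afterstates of one finished game.
// values[t]      : current network estimate V(afterstate_t), t = 0..T-1
// nextRewards[t] : merge score of the move played from afterstate_t (after the
//                  random tile) that produced afterstate_{t+1}; length T - 1.
// The last afterstate is followed by game over, so its target is 0.
function tdTargets(values, nextRewards) {
  if (nextRewards.length !== values.length - 1) {
    throw new Error("expected one reward per transition");
  }
  return values.map((_, t) =>
    t + 1 < values.length ? nextRewards[t] + values[t + 1] : 0
  );
}

// Example: three afterstates, two transitions.
console.log(tdTargets([10, 20, 5], [4, 8])); // [ 24, 13, 0 ]
```

Fitting the network to these targets with the existing mean-squared-error loss, while treating the targets as constants, gives a semi-gradient TD(0) update.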

5. Using the neural network to pick moves

Given a model and a board, the NN-based evaluation is:

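function evaluateBoardNN(board, model) {
  // tf.tidy disposes the intermediate tensors (input, output) when the callback returns.
  return tf.tidy(() => {
    const input = tf.tensor2d([encodeBoard(board)], [1, 16]); // a batch of one board
    const output = model.predict(input);                      // shape [1, 1]
    const value = output.dataSync()[0]; // synchronous read-back of the scalar
    return value;
  });
}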

To choose a move:

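function chooseBestMoveWithNN(board, model) {
  // UP, RIGHT, DOWN, LEFT and moveBoard come from the game logic;
  // moveBoard returns the resulting board and whether any tile moved.
  const moves = [UP, RIGHT, DOWN, LEFT];
  let bestMove = null;
  let bestValue = -Infinity;

  for (const move of moves) {
    const result = moveBoard(board, move);
    if (!result.moved) continue; // illegal move: the board would not change
    const value = evaluateBoardNN(result.board, model);
    if (value > bestValue) {
      bestValue = value;
      bestMove = move;
    }
  }

  return bestMove; // null when no move is legal (game over)
}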

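If the model has not been trained yet, we can fall back to the classic heuristic so the bot still plays reasonably. Note that this chooser is a greedy one-ply search: it scores the board immediately after our move and ignores the random tile that appears next.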

6. Live demo: 2048 + TensorFlow.js neural value function

The demo below includes:

  • A 4×4 2048 board rendered in HTML.
  • Buttons to:
    • Auto-play with NN – the bot uses the neural network to choose moves.
    • Train 1 batch (NN) – generate training data and update the network once.
    • Train 10 batches (NN) – same as above, repeated 10 times.
    • Reset model – discard the current neural network and create a new one.
    • Reset board – start a fresh 2048 game.
  • Live info:
    • Whether the model is initialized.
    • How many training batches have been run.
    • The loss of the last training batch.

All training happens in the browser using TensorFlow.js loaded from a CDN.


7. Conclusion and next steps

In Part 3, we:

  • Encoded the 2048 board as a fixed-size numeric vector.
  • Built a neural network in TensorFlow.js to approximate a value function V(board).
  • Used the hand-crafted heuristic as a teacher to generate supervised training data.
  • Used the trained neural network to drive a 2048 bot directly in the browser.

From here, there are several natural upgrades:

  • Replace heuristic targets with TD-learning or Q-learning style updates.
  • Combine expectimax with the neural value function (NN as a learned evaluator instead of a manual heuristic).
  • Add replay buffers, learning rate schedules, and better exploration strategies.
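
The expectimax upgrade can be sketched without TensorFlow.js: a depth-limited expectimax that accepts any `evaluate(board)` function, so the same search can run with the Part 1 heuristic or with `(b) => evaluateBoardNN(b, model)`. This is a minimal sketch under stated assumptions: directions are encoded as 0..3, a new tile is a 2 with probability 0.9 and a 4 otherwise, and `GAME_OVER_VALUE` is an arbitrary hypothetical penalty. With `depth = 1` it reduces to the greedy chooser from section 5.

```javascript
const UP = 0, RIGHT = 1, DOWN = 2, LEFT = 3; // assumed direction encoding
const GAME_OVER_VALUE = -1e6; // hypothetical penalty for a board with no legal moves

// Compact move logic: cell k of line i, where k = 0 is the edge tiles slide toward.
const cellAt = (move, i, k) =>
  move === LEFT ? [i, k] : move === RIGHT ? [i, 3 - k] : move === UP ? [k, i] : [3 - k, i];

function moveBoard(board, move) {
  const next = board.map((row) => row.slice());
  let score = 0;
  for (let i = 0; i < 4; i++) {
    const tiles = [];
    for (let k = 0; k < 4; k++) {
      const [r, c] = cellAt(move, i, k);
      if (board[r][c] !== 0) tiles.push(board[r][c]);
    }
    const merged = [];
    for (let j = 0; j < tiles.length; j++) {
      if (tiles[j] === tiles[j + 1]) { merged.push(2 * tiles[j]); score += 2 * tiles[j]; j++; }
      else merged.push(tiles[j]);
    }
    for (let k = 0; k < 4; k++) {
      const [r, c] = cellAt(move, i, k);
      next[r][c] = k < merged.length ? merged[k] : 0;
    }
  }
  const moved = next.some((row, r) => row.some((v, c) => v !== board[r][c]));
  return { board: next, moved, score };
}

// Max node: best value over legal moves, `depth` player moves ahead.
function maxValue(board, depth, evaluate) {
  let best = -Infinity;
  for (const m of [UP, RIGHT, DOWN, LEFT]) {
    const res = moveBoard(board, m);
    if (res.moved) best = Math.max(best, chanceValue(res.board, depth, evaluate));
  }
  return best === -Infinity ? GAME_OVER_VALUE : best;
}

// Chance node: average over every empty cell and both possible new tiles.
function chanceValue(afterstate, depth, evaluate) {
  if (depth <= 1) return evaluate(afterstate);
  const empty = [];
  afterstate.forEach((row, r) => row.forEach((v, c) => { if (v === 0) empty.push([r, c]); }));
  if (empty.length === 0) return evaluate(afterstate);
  let total = 0;
  for (const [r, c] of empty) {
    for (const [tile, p] of [[2, 0.9], [4, 0.1]]) {
      const next = afterstate.map((row) => row.slice());
      next[r][c] = tile;
      total += p * maxValue(next, depth - 1, evaluate);
    }
  }
  return total / empty.length;
}

// Root: the legal move with the highest expectimax value (null = game over).
function chooseMoveExpectimax(board, depth, evaluate) {
  let bestMove = null;
  let bestValue = -Infinity;
  for (const m of [UP, RIGHT, DOWN, LEFT]) {
    const res = moveBoard(board, m);
    if (!res.moved) continue;
    const value = chanceValue(res.board, depth, evaluate);
    if (value > bestValue) {
      bestValue = value;
      bestMove = m;
    }
  }
  return bestMove;
}

// Usage with a toy evaluator (number of empty cells).
const emptyCells = (b) => b.flat().filter((v) => v === 0).length;
const board = [[2, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]];
console.log(chooseMoveExpectimax(board, 2, emptyCells));
```

Each chance node calls `evaluate` once per leaf, so with a neural evaluator it would pay to collect the leaf boards and score them in one batched `model.predict` call rather than one call per board.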

But even with just these three parts, the “AI & 2048” series has gone from human heuristics → expectimax → self-play tuning → neural value functions in TensorFlow.js, which is enough to build a surprisingly strong 2048 bot that runs entirely in the browser.
