- 1. Goal: learn a value function V(board)
- 2. Encoding the board as neural network input
- 3. Neural network architecture in TensorFlow.js
- 4. Where do training targets come from?
- 5. Using the neural network to pick moves
- 6. Live demo: 2048 + TensorFlow.js neural value function
- 7. Conclusion and next steps
1. Goal: learn a value function V(board)
We want a function:
V(board) ≈ “how good this state is”
Then, at each move:
- Simulate all 4 directions: Up, Right, Down, Left.
- Apply the move, get the resulting board.
- Use the neural network to predict V(board_next).
- Pick the move with the highest predicted value.
In Parts 1–2, this value was computed by a linear heuristic.
Now, V is a non-linear function represented by a neural network trained from data.
2. Encoding the board as neural network input
The neural network works with numeric vectors, not 4×4 grids.
We encode the board into a vector of length 16:
- Empty cell (0) → 0
- Tile 2 → log2(2) = 1
- Tile 4 → 2
- Tile 8 → 3
- … and so on
We also divide by 16 to keep the inputs small, roughly between 0 and 1 (a 2048 tile, for example, becomes 11/16 ≈ 0.69):
function encodeBoard(board) {
  const arr = [];
  for (let r = 0; r < 4; r++) {
    for (let c = 0; c < 4; c++) {
      const v = board[r][c];
      if (v === 0) {
        arr.push(0);
      } else {
        arr.push(Math.log2(v) / 16); // simple scaling
      }
    }
  }
  return arr; // length = 16
}
This gives us a fixed-size numerical representation of the board that can be fed into a dense neural network.
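As a quick check of the encoding, the sketch below runs a compact copy of encodeBoard on a made-up position. The names sampleBoard and decodeCell are assumptions introduced for illustration; they are not part of the demo.

```javascript
// Compact copy of encodeBoard, repeated so this snippet runs on its own.
// board.flat() walks the grid row by row, like the nested loops do.
function encodeBoard(board) {
  return board.flat().map((v) => (v === 0 ? 0 : Math.log2(v) / 16));
}

// Hypothetical helper: invert the encoding for a single cell.
function decodeCell(x) {
  return x === 0 ? 0 : Math.round(2 ** (x * 16));
}

// Made-up sample position.
const sampleBoard = [
  [2, 4, 8, 16],
  [0, 0, 2, 4],
  [0, 0, 0, 2],
  [0, 0, 0, 0],
];

const encoded = encodeBoard(sampleBoard);
console.log(encoded.length);                      // 16
console.log(encoded.slice(0, 4));                 // [ 0.0625, 0.125, 0.1875, 0.25 ]
console.log(encoded.map(decodeCell).slice(0, 4)); // [ 2, 4, 8, 16 ]
```

Each doubling of a tile adds the same step (1/16) to the input, so the network sees tile ranks rather than raw tile values.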
3. Neural network architecture in TensorFlow.js
We use a simple fully-connected network:
- Input layer: 16 units (the encoded board)
- Hidden layer 1: 64 units, ReLU
- Hidden layer 2: 64 units, ReLU
- Output layer: 1 unit (scalar value of the board)
In TensorFlow.js:
function createModel() {
  const model = tf.sequential();
  model.add(tf.layers.dense({
    inputShape: [16],
    units: 64,
    activation: 'relu',
  }));
  model.add(tf.layers.dense({
    units: 64,
    activation: 'relu',
  }));
  model.add(tf.layers.dense({
    units: 1, // scalar value V(board)
  }));
  model.compile({
    optimizer: tf.train.adam(0.001),
    loss: 'meanSquaredError',
  });
  return model;
}
This is deliberately small and simple, so it can train quickly in the browser.
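To put a number on "small", the sketch below counts the trainable parameters of the 16 → 64 → 64 → 1 network by hand, assuming every dense layer has a bias term (the default in tf.layers.dense). The helper denseParams is a name introduced here for illustration.

```javascript
// Trainable parameters in a dense layer: one weight per input-output pair
// plus one bias per output unit.
function denseParams(inputs, units) {
  return inputs * units + units;
}

const layerSizes = [16, 64, 64, 1]; // input, hidden 1, hidden 2, output
let totalParams = 0;
for (let i = 1; i < layerSizes.length; i++) {
  totalParams += denseParams(layerSizes[i - 1], layerSizes[i]);
}
console.log(totalParams); // 5313  (1088 + 4160 + 65)
```

Calling model.summary() on the model returned by createModel should report the same total.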
4. Where do training targets come from?
A neural network needs target values to learn from. For this part, we do not use “pure RL” yet.
Instead, we let our hand-crafted heuristic from Parts 1–2 act as a teacher.
Training pipeline:
- Generate many board states via random self-play (short rollouts).
- For each board, compute a heuristic score: target = heuristicValue(board).
- Train the neural network to predict that heuristic score from the encoded board.
This is supervised learning from a pseudo-expert.
Once the network can approximate the heuristic well, we can optionally keep training it, or later replace the targets with TD-learning updates.
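The pipeline above could be sketched roughly as follows. This is not the demo's actual code: the env object, env.moves, env.addRandomTile, the function names collectSamples and trainOnSamples, and the batch size of 32 are all assumptions made for illustration. encodeBoard, moveBoard and heuristicValue are expected to be the functions from this series, passed in through env.

```javascript
// Collect (encoded board, heuristic score) pairs from one short random rollout.
// `env` bundles the game helpers; its shape is an assumption of this sketch.
function collectSamples(startBoard, env, maxSteps, rng = Math.random) {
  const xs = [];
  const ys = [];
  let board = startBoard;
  for (let step = 0; step < maxSteps; step++) {
    xs.push(env.encodeBoard(board));
    ys.push(env.heuristicValue(board)); // label supplied by the teacher
    const legal = env.moves
      .map((move) => env.moveBoard(board, move))
      .filter((result) => result.moved);
    if (legal.length === 0) break; // no legal move: game over
    const pick = legal[Math.floor(rng() * legal.length)];
    board = env.addRandomTile(pick.board); // hypothetical tile-spawn helper
  }
  return { xs, ys };
}

// One supervised pass over the collected samples. `tf` is the TensorFlow.js
// namespace and `model` a compiled model such as the one from createModel().
async function trainOnSamples(tf, model, xs, ys) {
  const x = tf.tensor2d(xs, [xs.length, 16]);
  const y = tf.tensor2d(ys, [ys.length, 1]);
  const history = await model.fit(x, y, { epochs: 1, batchSize: 32, shuffle: true });
  x.dispose();
  y.dispose();
  return history.history.loss[0]; // mean squared error of this pass
}
```

Passing the helpers in through env keeps the data collection independent of TensorFlow.js, so it can be checked separately from the training step.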
5. Using the neural network to pick moves
Given a model and a board, the NN-based evaluation is:
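function evaluateBoardNN(board, model) {
  // tf.tidy disposes the intermediate tensors once the callback returns.
  return tf.tidy(() => {
    const input = tf.tensor2d([encodeBoard(board)], [1, 16]);
    const output = model.predict(input);
    const value = output.dataSync()[0]; // read the single scalar synchronously
    return value;
  });
}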
To choose a move:
function chooseBestMoveWithNN(board, model) {
  const moves = [UP, RIGHT, DOWN, LEFT];
  let bestMove = null;
  let bestValue = -Infinity;
  for (const move of moves) {
    const result = moveBoard(board, move);
    if (!result.moved) continue; // skip moves that do not change the board
    const value = evaluateBoardNN(result.board, model);
    if (value > bestValue) {
      bestValue = value;
      bestMove = move;
    }
  }
  return bestMove; // null if no move is possible
}
If the model is not trained yet, we can fall back to the classic heuristic so the bot still plays something reasonable.
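One way to wire up that fallback is sketched below. The state object with model and batchesTrained fields and the name makeEvaluator are assumptions for illustration; the stand-in evaluators in the example would be replaced by evaluateBoardNN and the heuristic from Parts 1–2.

```javascript
// Build a board evaluator that uses the network once it has seen at least one
// training batch, and the hand-crafted heuristic otherwise.
function makeEvaluator(state, evaluateBoardNN, heuristicValue) {
  return function evaluate(board) {
    const ready = state.model !== null && state.batchesTrained > 0;
    return ready ? evaluateBoardNN(board, state.model) : heuristicValue(board);
  };
}

// Example with stand-in evaluators that return fixed values.
const state = { model: null, batchesTrained: 0 };
const evaluate = makeEvaluator(state, () => 1, () => -1);
const beforeTraining = evaluate([]); // -1: heuristic fallback

state.model = {}; // pretend a model was created and trained once
state.batchesTrained = 1;
const afterTraining = evaluate([]); // 1: network value

console.log(beforeTraining, afterTraining);
```

Because the evaluator reads state on every call, the bot switches to the network as soon as the first batch has been trained, without rebuilding anything.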
6. Live demo: 2048 + TensorFlow.js neural value function
The demo below includes:
- A 4×4 2048 board rendered in HTML.
- Buttons:
  - Auto-play with NN – the bot uses the neural network to choose moves.
  - Train 1 batch (NN) – generate training data and update the network once.
  - Train 10 batches (NN) – same as above, repeated 10 times.
  - Reset model – discard the current neural network and create a new one.
  - Reset board – start a fresh 2048 game.
- Live info:
  - Whether the model is initialized.
  - How many training batches have been run.
  - The loss of the last training batch.
All training happens in the browser using TensorFlow.js loaded from a CDN.
[Interactive demo: 2048 Neural Value Function Demo (TensorFlow.js), with an "NN training info" panel]
7. Conclusion and next steps
In Part 3, we:
- Encoded the 2048 board as a fixed-size numeric vector.
- Built a neural network in TensorFlow.js to approximate a value function V(board).
- Used the hand-crafted heuristic as a teacher to generate supervised training data.
- Used the trained neural network to drive a 2048 bot directly in the browser.
From here, there are several natural upgrades:
- Replace heuristic targets with TD-learning or Q-learning style updates.
- Combine expectimax with the neural value function (NN as a learned evaluator instead of a manual heuristic).
- Add replay buffers, learning rate schedules, and better exploration strategies.
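For the first upgrade, the training target would change from the heuristic score to a one-step TD target. The sketch below shows the standard form of that target. The function name tdTarget, the choice of reward (for example, the points gained by a move) and the default discount of 1.0 are assumptions, not something this series has fixed yet.

```javascript
// One-step TD target: immediate reward plus the discounted value the network
// currently predicts for the next state. Terminal states have no future value.
function tdTarget(reward, nextValue, isTerminal, gamma = 1.0) {
  return isTerminal ? reward : reward + gamma * nextValue;
}

console.log(tdTarget(4, 10, false, 0.5)); // 9
console.log(tdTarget(4, 10, true));       // 4
```

In that setup nextValue would come from evaluateBoardNN on the board after the move, and the target would take the place of heuristicValue(board) in the training data.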
But even with just these three parts, the “AI & 2048” series has gone from human heuristics → expectimax → self-play tuning → neural value functions in TensorFlow.js — enough to build a surprisingly smart 2048 bot entirely inside the browser.