- 1. The problem with hand-tuned evaluation functions
- 2. Turning the evaluation into feature vector + weight vector
- 3. Self-play: letting the bot grade its own weights
- 4. A lightweight hill-climbing algorithm to tune the weights
- 5. Demo: 2048 bot that self-tunes its weights (mini self-play trainer)
- 6. Limitations & what’s coming in Part 3
1. The problem with hand-tuned evaluation functions
In Part 1, our score looked like this:
score(board) =
    w_empty  * emptyCells(board)
  + w_mono   * monotonicity(board)
  + w_smooth * smoothness(board)
  + w_corner * maxTileCorner(board)
  + w_max    * maxTile(board)
Problems:
- The w_* weights are chosen by “feel”: trial, error, refresh.
- Weights that work well for depth = 4 may be bad for depth = 3 or 5.
- Whenever you add a new feature (e.g. a penalty for large tiles in the middle), you have to retune everything manually.
In other words, our “AI” does not learn from data at all. Let’s fix that by letting it play many games, measure how good a weight vector is, and adjust automatically.
2. Turning the evaluation into feature vector + weight vector
First, separate features and weights clearly:
- Features: measurements from the board (empty cells, smoothness, monotonicity, corner, max tile).
- Weights: a vector of real numbers, one per feature.
We rewrite the evaluation like this:
function extractFeatures(board) {
  const emptyCount = getEmptyCells(board).length;
  const smoothness = computeSmoothness(board); // negative number
  const monotonicity = computeMonotonicity(board); // positive number
  const cornerBonus = maxTileInCorner(board) ? 1 : 0;
  const maxTile = getMaxTile(board);
  return {
    emptyCount,
    smoothness,
    monotonicity,
    cornerBonus,
    maxTile,
  };
}

function evaluateWithWeights(board, w) {
  const f = extractFeatures(board);
  return (
    w.wEmpty * f.emptyCount +
    w.wSmooth * f.smoothness +
    w.wMono * f.monotonicity +
    w.wCorner * f.cornerBonus +
    w.wMax * f.maxTile
  );
}
Then, in expectimax, just pass the weight vector w:
function expectimax(board, depth, isPlayerTurn, w) {
  if (depth === 0 || isGameOver(board)) {
    return evaluateWithWeights(board, w);
  }
  // ... rest is the same as Part 1: only the call to evaluate(...) changes,
  // and every recursive call to expectimax must also pass w along
}
Conceptually, we now have a simple model:
score = w · f(board) (dot product of weights and feature vector).
Learning becomes: find a good weight vector w.
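To make the dot-product view concrete, here is a minimal sketch that treats features and weights as plain objects with matching entries. The `FEATURE_TO_WEIGHT` table, the `dot` helper, and the example feature values are invented for this illustration; they are not part of the Part 1 code.

```javascript
// Hypothetical mapping from feature names (as returned by extractFeatures)
// to the weight names used in evaluateWithWeights.
const FEATURE_TO_WEIGHT = {
  emptyCount: "wEmpty",
  smoothness: "wSmooth",
  monotonicity: "wMono",
  cornerBonus: "wCorner",
  maxTile: "wMax",
};

// Dot product w · f over the shared feature list.
function dot(w, f) {
  let sum = 0;
  for (const [featureName, weightName] of Object.entries(FEATURE_TO_WEIGHT)) {
    sum += w[weightName] * f[featureName];
  }
  return sum;
}

// Made-up feature values for a single board, for illustration only.
const exampleFeatures = {
  emptyCount: 6,
  smoothness: -12,
  monotonicity: 20,
  cornerBonus: 1,
  maxTile: 256,
};
const exampleWeights = { wEmpty: 350, wSmooth: 3, wMono: 10, wCorner: 300, wMax: 1 };

// 350*6 + 3*(-12) + 10*20 + 300*1 + 1*256
console.log(dot(exampleWeights, exampleFeatures)); // prints 2820
```

Keeping the mapping in one table means a new feature needs one new entry rather than another hand-written term in the sum.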
3. Self-play: letting the bot grade its own weights
Idea: for each weight vector w, let the bot:
- Play several full 2048 games (or up to some move limit).
- Record the average score (or average max tile).
- Treat that as the “fitness” of this w.
So we get a fitness function:
function evaluateWeights(w) {
  const NUM_GAMES = 3;
  let totalScore = 0;
  for (let g = 0; g < NUM_GAMES; g++) {
    const score = playOneGameWithWeights(w);
    totalScore += score;
  }
  return totalScore / NUM_GAMES;
}
Where playOneGameWithWeights(w):
- Initializes a board.
- Uses expectimax with the given w until game over or target reached.
- Returns the game score.
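The higher the average score, the better that weight vector is. Keep in mind that tile spawns are random, so an average over only NUM_GAMES = 3 games is a noisy estimate.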
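The steps above can be sketched as a small self-contained game loop. This is only an illustration under stated assumptions: the board engine (`slideRowLeft`, `applyMove`, `spawnTile`) is a compact stand-in written for this example, and `greedyChooseMove` is a one-move lookahead that replaces the expectimax search from Part 1 so the block can run on its own. The extra `chooseMove`, `rng`, and `maxMoves` parameters are also hypothetical additions.

```javascript
// Slide one row to the left, merging each pair of equal neighbours once.
function slideRowLeft(row) {
  const tiles = row.filter((v) => v !== 0);
  const out = [];
  let gained = 0;
  for (let i = 0; i < tiles.length; i++) {
    if (tiles[i] === tiles[i + 1]) {
      out.push(tiles[i] * 2);
      gained += tiles[i] * 2;
      i++; // the next tile was consumed by the merge
    } else {
      out.push(tiles[i]);
    }
  }
  while (out.length < row.length) out.push(0);
  return { row: out, gained };
}

const transpose = (b) => b[0].map((_, c) => b.map((r) => r[c]));
const mirror = (b) => b.map((r) => [...r].reverse());
const identity = (b) => b;

// For each direction: [turn the board so the move becomes "slide left", turn it back].
const DIRECTIONS = {
  left: [identity, identity],
  right: [mirror, mirror],
  up: [transpose, transpose],
  down: [(b) => mirror(transpose(b)), (b) => transpose(mirror(b))],
};

// Returns { board, gained, moved } without mutating the input board.
function applyMove(board, dir) {
  const [toLeft, back] = DIRECTIONS[dir];
  let gained = 0;
  const slid = toLeft(board).map((row) => {
    const res = slideRowLeft(row);
    gained += res.gained;
    return res.row;
  });
  const next = back(slid);
  const moved = next.some((row, r) => row.some((v, c) => v !== board[r][c]));
  return { board: next, gained, moved };
}

// Put a 2 (90%) or a 4 (10%) on a random empty cell, in place.
function spawnTile(board, rng) {
  const empty = [];
  board.forEach((row, r) => row.forEach((v, c) => { if (v === 0) empty.push([r, c]); }));
  if (empty.length === 0) return;
  const [r, c] = empty[Math.floor(rng() * empty.length)];
  board[r][c] = rng() < 0.9 ? 2 : 4;
}

// Stand-in for expectimax: greedy one-move lookahead using only the
// empty-cell feature. Swap in the real search from Part 1 here.
function greedyChooseMove(board, w) {
  let best = null;
  for (const dir of Object.keys(DIRECTIONS)) {
    const res = applyMove(board, dir);
    if (!res.moved) continue;
    const emptyCount = res.board.flat().filter((v) => v === 0).length;
    const value = w.wEmpty * emptyCount + res.gained;
    if (best === null || value > best.value) best = { dir, value };
  }
  return best ? best.dir : null; // null means no legal move: game over
}

function playOneGameWithWeights(w, chooseMove = greedyChooseMove, rng = Math.random, maxMoves = 5000) {
  let board = Array.from({ length: 4 }, () => Array(4).fill(0));
  spawnTile(board, rng);
  spawnTile(board, rng);
  let score = 0;
  for (let move = 0; move < maxMoves; move++) {
    const dir = chooseMove(board, w);
    if (dir === null) break; // game over
    const res = applyMove(board, dir);
    if (!res.moved) break; // treat an illegal move as the end of the game
    board = res.board;
    score += res.gained; // standard 2048 scoring: sum of merged tile values
    spawnTile(board, rng);
  }
  return score;
}

console.log("score:", playOneGameWithWeights({ wEmpty: 350 }));
```

Passing `rng` in as a parameter is a deliberate choice: with a seeded generator, two weight vectors can be compared on the same sequence of tile spawns, which removes some of the luck from the comparison.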
4. A lightweight hill-climbing algorithm to tune the weights
We use a very simple algorithm: stochastic hill-climbing.
Idea:
- Start from an initial “intuitive” weight set w_best.
- Compute its fitness: fitness_best = evaluateWeights(w_best).
- Repeat many times:
  - Randomly perturb each weight: w_candidate[i] = w_best[i] * (1 + noise), where noise is uniform in [-0.2, 0.2].
  - Compute fitness_candidate.
  - If fitness_candidate > fitness_best: set w_best = w_candidate and fitness_best = fitness_candidate.
Pseudocode:
let bestWeights = {
  wEmpty: 350,
  wSmooth: 3,
  wMono: 10,
  wCorner: 300,
  wMax: 1,
};
let bestFitness = -Infinity; // not measured yet

function mutateWeights(w) {
  // Random factor in [0.8, 1.2), i.e. 1 + noise with noise in [-0.2, 0.2).
  // Multiplicative noise never flips a weight's sign or moves a weight away from 0.
  const factor = () => 1 + (Math.random() * 0.4 - 0.2);
  return {
    wEmpty: w.wEmpty * factor(),
    wSmooth: w.wSmooth * factor(),
    wMono: w.wMono * factor(),
    wCorner: w.wCorner * factor(),
    wMax: w.wMax * factor(),
  };
}

function trainOneStep() {
  // Score the starting weights once; otherwise the first candidate always wins.
  if (bestFitness === -Infinity) {
    bestFitness = evaluateWeights(bestWeights);
  }
  const candidate = mutateWeights(bestWeights);
  const fitness = evaluateWeights(candidate); // self-play a few games
  if (fitness > bestFitness) {
    bestFitness = fitness;
    bestWeights = candidate;
    console.log("New best fitness =", fitness, "weights =", bestWeights);
  }
}
This is not gradient descent, not full RL, but:
- It’s simple and intuitive.
- It can run directly in the browser.
- You can see the weights drift and watch the bot improve (or sometimes get worse!) in real time.
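One way to check the hill-climbing loop on its own is to replace the expensive self-play fitness with a cheap toy function whose optimum is known. The sketch below does that; `makeRng`, `hillClimb`, `toyFitness`, and `TARGET` are illustrative names, the two-weight vector is made up, and `mutateWeights` here takes an extra `rng` argument (unlike the version above) so that runs are repeatable.

```javascript
// Simple seeded linear congruential generator so runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Multiply every weight by an independent random factor in [0.8, 1.2).
function mutateWeights(w, rng) {
  const out = {};
  for (const key of Object.keys(w)) {
    out[key] = w[key] * (1 + (rng() * 0.4 - 0.2));
  }
  return out;
}

// Stochastic hill-climbing: keep a candidate only if it scores strictly better.
function hillClimb(initialWeights, fitnessFn, steps, rng) {
  let best = initialWeights;
  let bestFitness = fitnessFn(best); // score the starting point first
  const history = [bestFitness];
  for (let i = 0; i < steps; i++) {
    const candidate = mutateWeights(best, rng);
    const fitness = fitnessFn(candidate);
    if (fitness > bestFitness) {
      best = candidate;
      bestFitness = fitness;
    }
    history.push(bestFitness);
  }
  return { best, bestFitness, history };
}

// Toy fitness standing in for self-play: highest (zero) exactly at TARGET.
const TARGET = { wEmpty: 200, wCorner: 500 };
function toyFitness(w) {
  return -((w.wEmpty - TARGET.wEmpty) ** 2 + (w.wCorner - TARGET.wCorner) ** 2);
}

const result = hillClimb({ wEmpty: 350, wCorner: 300 }, toyFitness, 200, makeRng(42));
console.log("best weights:", result.best, "fitness:", result.bestFitness);
```

The toy fitness is deterministic, so the recorded best fitness can only go up. With real self-play the fitness is noisy, and a candidate can be accepted because of a lucky run; averaging over more games per evaluation is one way to reduce that, at the cost of more compute.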
5. Demo: 2048 bot that self-tunes its weights (mini self-play trainer)
The demo below has two parts:
- A 2048 board so you can watch the bot play with the current weights.
- A small trainer:
  - “Train 10 steps (self-play)”: the bot plays many games with mutated weights; if they are better, it keeps them.
  - Displays the current weights and an estimated “fitness” (average score).
You can:
- Click “Auto-play with current weights” to watch a game using the latest weights.
- Click “Train 10 steps” a few times, then auto-play again to see whether the bot gets better.
[Interactive demo: 2048 Self-Learning AI Demo (Part 2), with the game board and a “Current weights” panel]
6. Limitations & what’s coming in Part 3
What we’ve built is still quite “rough”:
- The evaluation is still a linear combination of a few hand-crafted features.
- The learning algorithm is just random hill-climbing: no gradient, no full “state–action–reward” setup like proper RL.
- Evaluating the fitness is expensive (many games per weight set), which doesn’t scale well if you want deeper searches.
But it’s also very practical:
- Easy to run in the browser, no backend required.
- Highly visual: you can see the weights change and the bot’s play style evolve.
- A great stepping stone towards Part 3:
  - Using a neural network to learn the value function.
  - TD-learning / Q-learning style RL for 2048.
  - MCTS + learned value function (a tiny AlphaZero-like setup for 2048).
In Part 3, we will move from “tuning linear weights” to learning a non-linear value function, pushing our 2048 bot closer to modern game AI techniques.