Word search puzzles are easy for humans and trivially easy for computers - until you start asking more interesting questions. A brute-force solver takes a dozen lines of Python. An efficient multi-word solver using a trie takes a few more. But framing the problem as a graph search, and applying A* in multiple dimensions, opens up genuinely novel territory: simultaneous multi-word search, Boggle-style free-path solving, and even puzzle *generation* as an optimisation problem.
This article walks through all three levels, with working code at each stage.
Level 1: Brute Force
The standard word search is simple to solve exhaustively. For each word, try every cell as a starting point in every direction:
DIRECTIONS = [
(0, 1), # right
(0, -1), # left
(1, 0), # down
(-1, 0), # up
(1, 1), # down-right
(1, -1), # down-left
(-1, 1), # up-right
(-1, -1), # up-left
]
def find_word(grid, word):
rows, cols = len(grid), len(grid[0])
for row in range(rows):
for col in range(cols):
for dr, dc in DIRECTIONS:
cells = []
match = True
for depth, char in enumerate(word):
nr, nc = row + dr * depth, col + dc * depth
if not (0 <= nr < rows and 0 <= nc < cols):
match = False
break
if grid[nr][nc] != char:
match = False
break
cells.append((nr, nc))
if match:
return cells
return None
def solve(grid, words):
return {word: find_word(grid, word) for word in words}
Complexity: For a grid of R rows and C columns, W words each of average length L, the cost is O(R × C × 8 × L × W). For our 12x12 grid with 12 words averaging 7 characters, that is around 96,768 operations - effectively instantaneous.
The brute-force approach is perfectly adequate for standard word searches. There is no practical reason to optimise it. But things get interesting when the problem changes shape.
Level 2: Multi-Word Search with a Trie
If you want to find all possible words from a dictionary hidden in a grid (a problem that arises in procedural puzzle generation or in Boggle-style games), checking each word separately is wasteful. Many words share prefixes - BIRD, BIRDS, BIRDCAGE all start with B-I-R-D. A trie (prefix tree) lets you prune the search the moment a path stops matching any word prefix.
class TrieNode:
def __init__(self):
self.children = {}
self.word = None # non-None at word-end nodes
def build_trie(words):
root = TrieNode()
for word in words:
node = root
for char in word:
node = node.children.setdefault(char, TrieNode())
node.word = word
return root
def find_all_words(grid, trie_root):
rows, cols = len(grid), len(grid[0])
found = {}
for start_row in range(rows):
for start_col in range(cols):
for dr, dc in DIRECTIONS:
node = trie_root
row, col = start_row, start_col
cells = []
while 0 <= row < rows and 0 <= col < cols:
char = grid[row][col]
if char not in node.children:
break
node = node.children[char]
cells.append((row, col))
if node.word:
found[node.word] = list(cells)
row += dr
col += dc
return found
This visits each cell once per direction and prunes immediately when no trie child matches. For a 12x12 grid searching a dictionary of 10,000 words, it is orders of magnitude faster than the naive approach.
Level 3: A* Search Across Multiple Dimensions
Here is where it gets genuinely interesting. Suppose you want to solve a Boggle-style variant: words are hidden along any connected path through the grid (not just straight lines), and you can move to any adjacent cell in any of the eight directions at each step. This is a graph search problem.
Define the state space:
- Node:
(row, col, depth)- current grid position and how many characters of the target word have been matched so far. - Start state:
(start_row, start_col, 0)for every cell whose first character matches the word's first character. - Goal state: any node where
depth == len(word). - Edges: from
(row, col, depth)to(row + dr, col + dc, depth + 1)for each direction, ifgrid[row + dr][col + dc] == word[depth].
The heuristic for A* is the number of remaining characters: h(state) = len(word) - depth. This is admissible (never overestimates the true cost) because each step can match at most one character. One consequence worth noting: f = g + h = depth + (len(word) - depth) = len(word) is constant for every state at every depth level, so the priority queue provides no meaningful ordering - A* with this heuristic is equivalent to BFS. This is why it offers no practical advantage over brute force for standard word search, but the framing becomes useful once the search space gains real structure (see the multi-dimensional extension below).
import heapq
def astar_find_word(grid, word):
rows, cols = len(grid), len(grid[0])
# Priority queue: (f_score, g_score, row, col, depth, path)
heap = []
for row in range(rows):
for col in range(cols):
if grid[row][col] == word[0]:
g = 1
h = len(word) - g
heapq.heappush(heap, (g + h, g, row, col, 1, [(row, col)]))
while heap:
f, g, row, col, depth, path = heapq.heappop(heap)
if depth == len(word):
return path
path_set = set(path) # cells already used in this path
for dr, dc in DIRECTIONS:
nr, nc = row + dr, col + dc
if not (0 <= nr < rows and 0 <= nc < cols):
continue
if (nr, nc) in path_set: # Boggle rule: no cell reuse within a path
continue
if grid[nr][nc] != word[depth]:
continue
new_g = g + 1
new_h = len(word) - new_g
heapq.heappush(
heap,
(new_g + new_h, new_g, nr, nc, depth + 1, path + [(nr, nc)])
)
return None # word not found
For a standard (straight-line) word search, A* offers no advantage over brute force - the heuristic is tight but the search space is already small. Where it becomes genuinely useful is in the free-path variant: by expanding cheapest-first, A* finds the first valid path without exhaustively exploring dead-end branches.
The Multi-Dimensional Extension
The user who suggested this article's title had in mind something even more ambitious: searching for multiple words simultaneously as a joint optimisation problem. In this framing, the state space becomes multi-dimensional - one axis per word.
A state is now (depth_0, depth_1, ..., depth_n) representing simultaneous progress through n target words. The goal is a state where all depths equal their respective word lengths. This is the structure of an n-dimensional grid search, and A* with an appropriate joint heuristic (sum of remaining characters across all words) can navigate it efficiently.
In practice, this formulation is most useful for puzzle generation: you want to place W words in a grid such that they overlap maximally (sharing letters at crossing points reduces grid density). Framing placement as a joint A* search over the space of word positions finds high-overlap configurations far more efficiently than random trial-and-error.
The complexity grows exponentially with the number of words, so in practice a beam search variant (keeping only the K most promising states at each step) is used for large word sets - the same approach used in machine translation and speech recognition.
Further Reading
- Russell & Norvig, Artificial Intelligence: A Modern Approach - the canonical treatment of A* and heuristic search.
- Amit Patel's guide to A* - a superb visual introduction to the algorithm.
- Sedgewick & Wayne, Algorithms (4th ed.) - for trie implementation and analysis.
See how these generation principles apply to Word Puzzle Hub's own puzzle system in How Our Puzzles Are Made.