Why does the gradient descent prompt work better in 2D than 3D?

2D loss surfaces are easier to read and easier to render. Manim's 3D camera works but the shaded paraboloid often obscures the path. Most teachers default to 2D for the same reason. Use 3D only when the goal is to show the surface itself.

Can the AI animate a neural net training, not just a forward pass?

Training animations require many frames per epoch and accumulate too many mobjects for a 30-to-60 second clip. The forward pass on a fixed network is the right scope for a single AI prompt. For training, generate the data offline and feed it as a sequence of forward-pass scenes.

What is the trick that makes the Pythagorean prompt succeed?

Naming the squares. The classic visual proof has three squares on the sides of a right triangle. If you ask for the proof without naming the squares, the model produces a triangle with arrows. Name the squares (square on side a, square on side b, square on hypotenuse) and the visual snaps into place.

Are these prompts copy-paste ready?

Yes. They are the same prompts we ship in the Madio templates library. You can paste them into the editor as written, and they render on Free tier (5 credits, 720p, watermarked).

What if the rendered output looks ugly?

Three usual causes. The default font does not have the symbol you used (rare, but happens with non-Latin characters). The palette collapsed to two colors that look the same. The scene is too dense and labels overlap. The fixes are: ASCII variable names, explicit four-color palette, and fewer labels per frame.

Animating gradient descent, neural nets, and Pythagoras with AI

May 10, 2026 · Sanatan Sharma

The gap between an AI math animation that works and one that does not comes down to specificity. This post walks through three worked examples: gradient descent, a neural network forward pass, and the Pythagorean theorem. Each example shows the prompt, what the rendered frame looks like, and why this version succeeds where vaguer versions fail.

The prompts are tested on Madio's Gemini 3 Flash path against Manim Community v0.18.1, and they render on Free tier. They also work with raw Manim, Grant Sanderson's library, when fed to other LLMs, with the usual caveat that you do not get the retry loop.

Why these three examples

Gradient descent, a forward pass, and Pythagoras are deliberately chosen.

Gradient descent is the most-requested math animation on Madio after sine waves. It also breaks the most often, because the loss surface, the gradient arrow, and the descent path are three things at once.
A neural net forward pass is structural. The visual is a graph with weights and activations. The prompt has to specify how many layers and how many nodes per layer, or the model picks something arbitrary.
Pythagoras is the classic visual proof. It tests whether the prompt can produce a labeled, multi-color geometric figure, which is the core skill for any explanatory math animation.

If the same prompt patterns produce all three reliably, they generalize.

Example 1: gradient descent

The prompt

Animate gradient descent on a 2D loss curve in 4 steps. Step 1: plot the loss curve y = (x - 2)^2 + 1 from x = -2 to x = 6 in blue, with x and y axes labeled. Step 2: place a red dot at x = -1, y = 10 and label it "start". Step 3: animate the dot moving down the curve in 5 discrete steps, each step shorter than the last, with a small arrow showing the gradient direction at each position. Step 4: stop at the minimum (2, 1) and label it "minimum". Hold the final frame for 2 seconds.

The output

The rendered scene starts with a clean 2D plot. The blue parabola is centered, the axes are gray and labeled x and y(x). A red dot appears in the upper-left and the word "start" floats next to it. Five animated jumps follow. Each jump is shorter than the previous because the gradient is smaller as the dot approaches the minimum. A small arrow at each step points down the slope. At the bottom, the dot stops, and a green label "minimum" appears next to it.
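The shrinking jumps follow directly from the update rule. Here is a minimal sketch in plain Python, assuming a learning rate of 0.25 (the prompt does not specify one; the function names are ours, not Madio's). With this rate the dot lands near the minimum after 5 steps rather than exactly on it, so the rendered scene presumably snaps the last position to (2, 1).

```python
def loss(x):
    """Loss curve from the prompt: y = (x - 2)^2 + 1."""
    return (x - 2) ** 2 + 1


def grad(x):
    """Derivative of the loss: dy/dx = 2 * (x - 2)."""
    return 2 * (x - 2)


def descend(x0, learning_rate, steps):
    """Return every x position visited, starting at x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - learning_rate * grad(xs[-1]))
    return xs


# Assumed learning rate; the prompt only asks for 5 shrinking steps.
path = descend(x0=-1.0, learning_rate=0.25, steps=5)
jumps = [abs(b - a) for a, b in zip(path, path[1:])]

for x, jump in zip(path[1:], jumps):
    print(f"x = {x:.5f}, loss = {loss(x):.5f}, jump = {jump:.5f}")
```

Each jump is half the previous one here because the gradient is proportional to the distance from x = 2, and a rate of 0.25 removes half of that distance per step.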

Total runtime: about 9 seconds at 480p preview, 12 seconds at 1080p final. According to Madio's logs, it renders on the first attempt 88 percent of the time.

What works

Three things make this prompt reliable.

The number of steps is named. "5 discrete steps, each shorter than the last" stops the model from animating a continuous slide that obscures the iterative nature of gradient descent. Iteration is the point of the visual, and naming it forces the right structure.

The function is concrete. "y = (x - 2)^2 + 1" gives the model a closed-form curve to plot. A vague "a parabola" works sometimes, but the model will pick y = x^2 and the minimum at the origin overlaps the y-axis label.
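A quick check that the numbers in the prompt agree with each other, which matters because a start point that sits off the curve would force the model to guess:

```latex
\begin{align*}
f(x)  &= (x - 2)^2 + 1 \\
f(-1) &= (-3)^2 + 1 = 10 \\
f'(x) &= 2(x - 2), \qquad f'(x) = 0 \iff x = 2 \\
f(2)  &= 1
\end{align*}
```

So the start point (-1, 10) lies on the curve and the minimum is at (2, 1), two units to the right of the y-axis.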

The colors are named. Blue curve and red dot are fixed by the prompt, and the render used green for the minimum label. Four colors maximum, the main ones named, no ambiguity. This is the palette guard pattern applied locally.

What fails

The same idea, prompted vaguely, fails about half the time:

Animate gradient descent.

This produces, depending on the random seed, a 3D paraboloid with a vague descent path, a 2D curve with no axis labels, or text that says "gradient descent" with arrows. None of those are wrong, but none are the visual you wanted.

Example 2: neural network forward pass

The prompt

Animate a forward pass through a 3-layer neural network. The network has 3 input nodes (x1, x2, x3), 4 hidden nodes (h1 to h4), and 2 output nodes (o1, o2). Step 1: draw the network as circles connected by lines, input layer on the left, hidden in the middle, output on the right, all in light gray. Step 2: light up the input nodes in blue, one at a time, showing the values 0.5, 0.2, 0.9 next to each. Step 3: animate the activation flowing left to right by lighting up each hidden node in green as it receives signal, with a brief number flash showing the activation value. Step 4: light up the output nodes in red with their final values 0.7, 0.3. Hold for 3 seconds.

The output

The frame opens with a static graph: 3 circles on the left, 4 in the middle, 2 on the right, all in light gray, connected by thin gray lines representing weights. As step 2 plays, the leftmost circles fill with blue one at a time and a number appears next to each. As step 3 plays, the hidden circles fill with green left to right, with a brief number flash on each. The output circles fill red and their numbers stabilize.

Runtime: 11 seconds at preview, 15 seconds at final. First-try success rate: 81 percent. The most common failure is the model drawing fully connected lines but forgetting to draw the lighting animation, which produces a static frame after step 1.
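For readers who want to see what the activation flow stands for numerically, here is a small sketch of a 3-4-2 forward pass in plain Python. The weights, biases, and the sigmoid activation are all made up for illustration; the prompt specifies none of them, and the values 0.7 and 0.3 in the prompt are display labels, so this code will not reproduce them.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One dense layer: each row of `weights` feeds one output node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


inputs = [0.5, 0.2, 0.9]  # x1, x2, x3 from the prompt

# Hypothetical parameters, chosen only so the example runs.
hidden_weights = [
    [0.2, -0.4, 0.1],
    [0.7, 0.3, -0.5],
    [-0.6, 0.8, 0.2],
    [0.1, 0.1, 0.9],
]
hidden_biases = [0.0, 0.1, -0.1, 0.0]
output_weights = [
    [0.5, -0.3, 0.8, 0.2],
    [-0.4, 0.6, 0.1, -0.7],
]
output_biases = [0.0, 0.0]

hidden = layer(inputs, hidden_weights, hidden_biases)  # h1 to h4
outputs = layer(hidden, output_weights, output_biases)  # o1, o2

print("hidden:", [round(h, 3) for h in hidden])
print("output:", [round(o, 3) for o in outputs])
```

The two calls to `layer` correspond to steps 3 and 4 of the prompt: hidden nodes light up first, then the outputs.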

What works

Named layer sizes. "3 input nodes (x1, x2, x3), 4 hidden nodes, 2 output nodes" removes ambiguity. Without the explicit count the model picks anything from 2 to 6 nodes per layer.

Named entities per node. x1 to x3, h1 to h4, o1 to o2. The model can label each circle without inventing names.
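The layer sizes and node names translate into a simple layout. The sketch below is an illustration of that idea, not Madio's layout code: the function name, the spacing values, and the coordinate convention (layers left to right, each column centered on y = 0) are assumptions.

```python
def layout(sizes, prefixes=("x", "h", "o"), x_gap=3.0, y_gap=1.0):
    """Return {name: (x, y)} with layers left to right, centered on y = 0."""
    positions = {}
    x_offset = (len(sizes) - 1) * x_gap / 2
    for layer_index, (size, prefix) in enumerate(zip(sizes, prefixes)):
        x = layer_index * x_gap - x_offset
        for node_index in range(size):
            # Top node first, so names count downward within a column.
            y = ((size - 1) / 2 - node_index) * y_gap
            positions[f"{prefix}{node_index + 1}"] = (x, y)
    return positions


positions = layout([3, 4, 2])  # 3 inputs, 4 hidden, 2 outputs
for name, (x, y) in positions.items():
    print(f"{name}: ({x:+.1f}, {y:+.1f})")
```

Because every node has a name and a fixed position, the labels in the prompt map one-to-one onto circles in the frame.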

Sequential lighting. "One at a time" and "left to right" force a temporal structure, which is the point of an animation. Without those phrases the model often lights everything at once.

What fails

Animate a neural network learning.

The word "learning" is the trap. Training requires many forward passes, a loss function, and a backprop visualization. That is three animations, not one. The model attempts all three, runs out of scene budget, and produces something incoherent. The fix is to scope the prompt to a single concept: forward pass, or backprop on one example, or weight updates over 3 epochs. Pick one.

Example 3: Pythagorean theorem

The prompt

Animate the visual proof of the Pythagorean theorem in 4 steps. Step 1: draw a right triangle with legs of length 3 and 4 and hypotenuse 5, oriented with the right angle in the bottom-left, labeled a, b, and c. Step 2: draw three squares: a blue square on side a (3 by 3), a red square on side b (4 by 4), a green square on side c (5 by 5). Step 3: show the area of each square as a number floating inside it: 9, 16, 25. Step 4: animate the equation a^2 + b^2 = c^2 appearing below the figure, with 9 + 16 = 25 underneath it. Hold for 3 seconds.

The output

A clean right triangle in the center of the frame, oriented as specified. The three squares attach to each side without overlap. Each square fills with its color and the area number appears in the middle. The equations animate in below the figure, first the symbolic form, then the numeric form, in matching colors.

Runtime: 8 seconds at preview, 10 seconds at final. First-try success: 92 percent. This is one of the most reliable prompts because every entity is named and the geometric layout is constrained.

What works

Naming the squares. "Square on side a", "square on side b", "square on side c" tells the model where to draw each square. Without those names, the model often draws three floating squares that do not attach to the triangle.
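"Attached to the triangle" has a precise geometric meaning: each square shares one side with the triangle and sits on the far side from the opposite vertex. A sketch of that construction in plain Python follows. The coordinates are an assumption (right angle at the origin, side a vertical, side b horizontal), and the helper names are ours.

```python
def outward_square(p, q, third):
    """Vertices of the square on side p->q, on the side away from `third`."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    nx, ny = dy, -dx  # one of the two normals, same length as the side
    # Flip the normal if it points toward the triangle's third vertex.
    if nx * (third[0] - p[0]) + ny * (third[1] - p[1]) > 0:
        nx, ny = -nx, -ny
    return [p, q, (q[0] + nx, q[1] + ny), (p[0] + nx, p[1] + ny)]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Assumed placement: right angle at A, bottom-left of the figure.
A, B, C = (0, 0), (4, 0), (0, 3)

square_a = outward_square(A, C, B)  # on the leg of length 3
square_b = outward_square(A, B, C)  # on the leg of length 4
square_c = outward_square(B, C, A)  # on the hypotenuse
areas = [polygon_area(s) for s in (square_a, square_b, square_c)]

print(areas)
print(areas[0] + areas[1] == areas[2])
```

The three areas are the numbers the prompt asks the model to float inside the squares in step 3.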

Specific lengths. "3, 4, 5" is the smallest Pythagorean triple. The model knows it. The areas are nice integers (9, 16, 25) which display well.

Equation as a separate step. Drawing the formula in step 4 keeps the geometric phase clean. Adding the formula in step 1 sometimes causes the model to put the equation inside the triangle.

What fails

Show the Pythagorean theorem.

This produces, depending on the seed, just the equation a^2 + b^2 = c^2 with no figure, or a triangle with no squares, or a right triangle with arrows but no proof structure. The "visual proof" framing is what triggers the squares.

What these examples share

Read the three prompts back to back. Five things repeat.

Numbered steps. Every prompt is a list of 4 numbered steps. The model treats numbered steps as scene transitions, which is exactly what we want.
Named entities. Every object has a name: a, b, c, x1, h1, blue square, red dot. Naming kills ambiguity.
Concrete numbers. 3-4-5 triangle, loss curve y = (x - 2)^2 + 1, input values 0.5 / 0.2 / 0.9. Real numbers pin the math.
Single concept. Each prompt animates exactly one thing. Not gradient descent and the loss landscape and the optimizer schedule. Just one.
Final hold. "Hold for 2 to 3 seconds" gives the viewer time to read the result. Without the hold instruction, the scene cuts immediately and feels unfinished.

These are the same five rules from the broader 12 prompt patterns post, grounded in three concrete examples.
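If you write many prompts in this shape, the structure is easy to template. The helper below is a hypothetical sketch, not part of Madio: it only enforces the numbered steps and the final hold, and leaves naming entities, picking concrete numbers, and sticking to one concept to the author of each step.

```python
def build_prompt(subject, steps, hold_seconds):
    """Assemble a prompt with numbered steps and a final hold instruction."""
    numbered = " ".join(
        f"Step {number}: {step.rstrip('.')}."
        for number, step in enumerate(steps, start=1)
    )
    return (
        f"Animate {subject} in {len(steps)} steps. {numbered} "
        f"Hold the final frame for {hold_seconds} seconds."
    )


prompt = build_prompt(
    subject="gradient descent on a 2D loss curve",
    steps=[
        "plot the loss curve y = (x - 2)^2 + 1 in blue",
        "place a red dot at x = -1, y = 10",
    ],
    hold_seconds=2,
)
print(prompt)
```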

Common failure modes

A short list of the mistakes we see most often in Madio's logs when users try to animate these topics.

3D when 2D would do. Gradient descent on a 3D paraboloid sounds cooler. It renders less well. The model often hides the path under the surface mesh.
Too many neurons. A network with 50 input nodes does not fit on screen at a readable size. Cap at about 6 per layer for a 1080p frame.
Floating-point coordinates. "Triangle with sides 3.14159 and 2.71828" is a flex that produces ugly labels. Round to 1 decimal at most.
Dynamic equations. Asking for "the equation animates as the dot moves" combines two timelines. Render the equation as a static element with the final values, not as a tracker.
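Three of these failure modes can be caught mechanically before you spend a credit. The checker below is a rough sketch under our own assumptions: the function name, the regular expressions, and the exact wording it looks for ("N input nodes", "3D") are illustrative. It does not try to detect dynamic equations, which need a human read.

```python
import re

MAX_NODES_PER_LAYER = 6  # cap suggested above for a 1080p frame


def lint_prompt(prompt):
    """Return a list of warnings for common failure modes in a prompt."""
    warnings = []
    if re.search(r"\b3D\b", prompt, flags=re.IGNORECASE):
        warnings.append("mentions 3D; a 2D plot usually renders more cleanly")
    node_counts = re.findall(r"(\d+)\s+(?:input|hidden|output)\s+nodes", prompt)
    for count in node_counts:
        if int(count) > MAX_NODES_PER_LAYER:
            warnings.append(f"{count} nodes in one layer; cap at about 6")
    # Any number with two or more digits after the decimal point.
    for number in re.findall(r"\d+\.\d{2,}", prompt):
        warnings.append(f"{number} has more than 1 decimal place; round it")
    return warnings


for warning in lint_prompt("Triangle with sides 3.14159 and 2.71828 in 3D"):
    print(warning)
```

A prompt that passes these checks can still fail for other reasons; the point is only to catch the cheap mistakes early.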

Where to go next

If you want the patterns these examples use written out as a checklist, the 12 patterns post has the templates. If you want to know what happens after you click Generate, the prompt-to-MP4 pipeline post walks through the backend.

The three prompts in this article live in the templates library under "Math Foundations". Loading a template populates the editor and you can render it on Free without retyping. The output sits next to other rendered examples in the gallery. Pricing covers what you can render at each tier.

Open the editor and try one. The first one is on the house.