How to Code a Neural Network from Scratch (Step-by-Step Guide)

Neural networks are at the core of modern artificial intelligence, powering applications such as image recognition, natural language processing, and recommendation systems. While many developers rely on high-level libraries, understanding how a neural network works under the hood is a major advantage for any software engineer.


In this guide, you’ll learn how to code a simple neural network from scratch, understand the math behind it, and see how training actually works.


What Is a Neural Network?

A neural network is a computational model inspired by the human brain. It consists of layers of interconnected units called neurons, where each neuron:

  1. Receives inputs
  2. Applies weights
  3. Adds a bias
  4. Passes the result through an activation function

The goal is to learn weights and biases that minimize prediction error.


Core Components of a Neural Network

Before writing code, let’s understand the building blocks.

1. Neurons and Weights

Each neuron computes:

z = (w1 * x1) + (w2 * x2) + ... + b

Where:

  • x = input
  • w = weight
  • b = bias
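As a quick sketch, the weighted sum above is just a dot product plus a bias in NumPy (the input, weight, and bias values here are illustrative):

```python
import numpy as np

x = np.array([0.5, -1.2])   # inputs
w = np.array([0.8, 0.3])    # weights
b = 0.1                     # bias

z = np.dot(w, x) + b        # z = (w1 * x1) + (w2 * x2) + b
print(z)                    # ≈ 0.14
```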

2. Activation Function

The activation function introduces non-linearity.

Common examples:

  • Sigmoid
  • ReLU
  • Tanh

We’ll use Sigmoid for simplicity:

sigmoid(x) = 1 / (1 + e^(-x))
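For comparison, all three activations are one-liners in NumPy (ReLU and tanh are shown only for reference; the rest of this guide sticks with sigmoid):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)   # clips negative values to zero

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # squashes values into (0, 1)
print(relu(x))      # [0. 0. 2.]
print(np.tanh(x))   # squashes values into (-1, 1)
```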

3. Loss Function

The loss function measures how wrong the predictions are.

For simple regression tasks, a common choice is Mean Squared Error (MSE):

MSE = mean((y_true - y_pred)^2)
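MSE is a one-liner in NumPy; the sample targets and predictions below are illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    # Average of the squared prediction errors
    return np.mean(np.square(y_true - y_pred))

y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.9, 0.8, 0.2])
print(mse(y_true, y_pred))  # ≈ 0.025
```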

4. Backpropagation

Backpropagation computes the gradient of the loss with respect to every weight and bias; gradient descent then uses those gradients to adjust the parameters in the direction that reduces the loss.
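The idea behind gradient descent is easiest to see on a one-variable toy function before applying it to a whole network (the function and step count here are illustrative):

```python
# Minimize f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3).

w = 0.0
learning_rate = 0.1

for _ in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step against the gradient, downhill

print(w)  # converges toward the minimum at w = 3
```

Backpropagation does the same thing, except the gradients are computed layer by layer with the chain rule.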


Step 1: Define the Neural Network Structure

We’ll build a single hidden-layer neural network using Python and NumPy.

import numpy as np

Step 2: Activation Function

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Assumes x is already a sigmoid output:
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))
    return x * (1 - x)

Step 3: Initialize the Network

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Random values in [0, 1) break symmetry between neurons
        self.weights_input_hidden = np.random.rand(input_size, hidden_size)
        self.weights_hidden_output = np.random.rand(hidden_size, output_size)

        self.bias_hidden = np.random.rand(hidden_size)
        self.bias_output = np.random.rand(output_size)

Step 4: Forward Propagation

    def forward(self, X):
        # Input layer -> hidden layer
        self.hidden_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = sigmoid(self.hidden_input)

        # Hidden layer -> output layer
        self.output_input = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.output = sigmoid(self.output_input)

        return self.output

Step 5: Backpropagation

    def backward(self, X, y, learning_rate):
        # Error and delta at the output layer
        output_error = y - self.output
        output_delta = output_error * sigmoid_derivative(self.output)

        # Propagate the error back to the hidden layer
        hidden_error = output_delta.dot(self.weights_hidden_output.T)
        hidden_delta = hidden_error * sigmoid_derivative(self.hidden_output)

        # Gradient-descent updates for weights and biases
        self.weights_hidden_output += self.hidden_output.T.dot(output_delta) * learning_rate
        self.weights_input_hidden += X.T.dot(hidden_delta) * learning_rate

        self.bias_output += np.sum(output_delta, axis=0) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0) * learning_rate

Step 6: Train the Neural Network

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            self.forward(X)
            self.backward(X, y, learning_rate)

            if epoch % 1000 == 0:
                loss = np.mean(np.square(y - self.output))
                print(f"Epoch {epoch}, Loss: {loss}")

Step 7: Test with Sample Data

X = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])

y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

print(nn.forward(X))

This example trains the network to solve the XOR problem, a classic test that simple linear models cannot solve.
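You can check this claim directly: the best least-squares linear fit to the XOR data predicts 0.5 for every input, no better than guessing (this quick check is illustrative):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Append a bias column, then solve for the best linear fit
A = np.hstack([X, np.ones((4, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

print(A @ w)  # ≈ [0.5 0.5 0.5 0.5] -- a line cannot separate XOR
```

The hidden layer plus the non-linear activation is what lets the network get past this limitation.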


What You Just Built

✔ A neural network from scratch
✔ Forward propagation
✔ Backpropagation
✔ Gradient descent training
✔ Non-linear decision making

Understanding this gives you deep insight into how modern AI frameworks work internally.


How This Scales in Real Applications

In real-world systems:

  • Libraries like TensorFlow and PyTorch handle gradients automatically
  • GPUs accelerate matrix operations
  • Networks grow deeper and wider
  • Regularization prevents overfitting

But the core logic is the same as what you just coded.


Final Thoughts

Coding a neural network from scratch is one of the best ways to understand AI beyond buzzwords. Even if you later rely on high-level frameworks, this knowledge helps you:

  • Debug AI models more effectively
  • Design better architectures
  • Integrate AI into production systems confidently

If you’re a software engineer looking to move deeper into AI, this foundational understanding is non-negotiable.


TensorFlow Version (Simple Example)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(4, activation='sigmoid', input_shape=(2,)),
    Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='mean_squared_error'
)

X = [[0,0], [0,1], [1,0], [1,1]]
y = [[0], [1], [1], [0]]

# XOR can take a few thousand epochs to converge with Adam's default learning rate
model.fit(X, y, epochs=2000, verbose=0)

print(model.predict(X))

When to Use TensorFlow

  • Production-grade systems
  • Mobile / edge deployment
  • Large-scale training

PyTorch Version (Simple Example)

import torch
import torch.nn as nn
import torch.optim as optim

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.sigmoid(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x

model = NeuralNet()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

X = torch.tensor([[0.,0.],[0.,1.],[1.,0.],[1.,1.]])
y = torch.tensor([[0.],[1.],[1.],[0.]])

for _ in range(5000):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()

with torch.no_grad():  # disable gradient tracking for inference
    print(model(X))

When to Use PyTorch

  • Research & experimentation
  • Custom model architectures
  • Full control over training loops

Frequently Asked Questions

What is the easiest way to code a neural network?

The easiest way is using libraries like TensorFlow or PyTorch, but coding one from scratch helps you understand how backpropagation, weights, and gradients actually work.

Do I need advanced math to build a neural network?

Basic linear algebra and calculus concepts are enough to start. Libraries handle most mathematical complexity in real-world applications.

Is Python required to code neural networks?

Python is the most common language due to its ecosystem, but neural networks can also be implemented in C++, Java, and JavaScript.

What is the difference between machine learning and neural networks?

Neural networks are a subset of machine learning models designed to learn complex non-linear patterns.

When should I use TensorFlow or PyTorch instead of scratch code?

Use scratch implementations for learning. Use TensorFlow or PyTorch for production systems, scalability, and performance.

Can neural networks be used in backend applications?

Yes. Neural networks are commonly integrated into APIs, microservices, and data pipelines.

Is neural network coding relevant for software engineers?

Absolutely. Understanding neural networks improves problem-solving skills, system design, and AI integration capabilities.