what is white noise in stationary time series? part 1: introduction

Sirine Amrane
7 min read · Feb 9, 2025


white noise is unique among time series because it is completely random and lacks any structure. unlike other time series that may exhibit trends, cycles, or autocorrelation, white noise is unpredictable and chaotic.
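formally, a white noise process (ε_t) satisfies three standard conditions:

E[ε_t] = μ,  Var(ε_t) = σ²,  Cov(ε_t, ε_s) = 0 for t ≠ s

with μ = 0 in the classical case.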

what distinguishes white noise from other time series?

a. stationary series

  • white noise is stationary: its statistical properties (mean, variance) do not change over time.

b. no exploitable structure

  • no trend (no consistent increase or decrease).
  • no seasonality (no repeating pattern over time).
  • no autocorrelation (each point is independent of previous values).

c. constant variance

  • the dispersion of values does not change over time.

d. constant mean

  • for classical white noise, the mean is zero.
  • for generalized white noise, the mean can be a nonzero constant.

e. impossible to predict

  • any attempt to model white noise will result in purely random residuals.
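to make these properties concrete, here is a minimal sketch (using numpy, with a hypothetical simulated series) that empirically checks the constant mean, constant variance, and absence of lag-1 autocorrelation:

import numpy as np

np.random.seed(42)
noise = np.random.normal(loc=0, scale=1, size=1000)

# constant mean close to zero and constant variance close to one
print(f"mean: {noise.mean():.3f}, std: {noise.std():.3f}")

# lag-1 autocorrelation should be close to zero (no time dependence)
lag1_corr = np.corrcoef(noise[:-1], noise[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1_corr:.3f}")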

why is this important?

white noise is the baseline reference in time series analysis:

  • if a time series behaves like white noise, it is unpredictable.
  • if a model’s residuals resemble white noise, the model is well-fitted (since the remaining part is purely random).
  • if a time series is almost white noise but shows a slight structure, there may be a small opportunity for modeling (example: arbitrage in finance or anomaly detection in cybersecurity).

how to determine if a sequence is white noise?

there are several statistical tests and visual methods to check if a time series is white noise.
for a time series to be considered white noise, it must satisfy all of these conditions simultaneously.

1️⃣ check for autocorrelation: absence of time dependence

if a sequence is white noise, then there should be no correlation between successive values.

test: autocorrelation function (acf)

  • idea: if values are independent, the acf plot should show only values close to zero (indicating no time dependence).
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Generate white noise
np.random.seed(42)
bruit_blanc = np.random.normal(loc=0, scale=1, size=100)

# Plot the ACF
plot_acf(bruit_blanc, lags=20)
plt.show()

interpretation:

  • if all acf bars are close to zero (within the confidence interval) → the series is white noise (no exploitable pattern).
  • if some acf bars are significantly different from zero → there is an exploitable structure in the data.

2️⃣ check for stationarity: constant mean and variance

white noise is stationary: its mean and variance must remain constant over time.

test: augmented dickey-fuller (adf)

  • null hypothesis (h0): the series is not stationary.
  • alternative hypothesis (h1): the series is stationary (so, potentially white noise).
from statsmodels.tsa.stattools import adfuller

# ADF test
result = adfuller(bruit_blanc)
print(f"p-value: {result[1]}")
if result[1] < 0.05:
    print("The series is stationary (reject H0)")
else:
    print("The series is not stationary (cannot conclude)")

interpretation:

  • p-value < 0.05 → reject h0 → the series is stationary.
  • p-value > 0.05 → fail to reject h0 → the series is not stationary.

3️⃣ check if the residuals of a model are white noise: absence of exploitable structure

if we apply a statistical model (e.g., arima) and there is still structure in the residuals, then the residuals are not white noise.

test: ljung-box

  • null hypothesis (h0): the series is white noise.
  • alternative hypothesis (h1): the series is not white noise.
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test
ljung_box_result = acorr_ljungbox(bruit_blanc, lags=[10], return_df=True)
print(ljung_box_result)

interpretation:

  • p-value > 0.05 → fail to reject h0 → the series is white noise.
  • p-value < 0.05 → reject h0 → there is an exploitable structure.
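in practice, this test is run on the residuals of a fitted model. a minimal sketch (with a hypothetical arima(1, 0, 1) order, chosen purely for illustration):

from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# Fit an ARIMA model, then test whether its residuals are white noise
model = ARIMA(bruit_blanc, order=(1, 0, 1)).fit()
residual_test = acorr_ljungbox(model.resid, lags=[10], return_df=True)
print(residual_test)

# p-value > 0.05 → the residuals look like white noise → the model
# has captured all the exploitable structure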

4️⃣ check for normal distribution (for gaussian white noise): plot and histogram

  • plot the series → it should be chaotic, oscillating around zero with no visible structure.
  • plot a histogram of the values → it should resemble a normal distribution.
import seaborn as sns

# Plot the series
plt.figure(figsize=(10, 5))
plt.plot(bruit_blanc, marker='o', linestyle='-', alpha=0.7)
plt.axhline(y=0, color='red', linestyle='--')
plt.title("Time Series")
plt.show()

# Histogram of the values
sns.histplot(bruit_blanc, bins=20, kde=True)
plt.title("Distribution of Values")
plt.show()

interpretation

the series fluctuates randomly around zero, which is a characteristic of white noise.

key observations:

  • if the series follows no discernible pattern → likely white noise.
  • if the histogram resembles a normal distribution centered at zero → likely white noise.
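the visual check can be backed by a formal normality test. a minimal sketch using the jarque-bera test from scipy (which tests h0: the data is normally distributed):

from scipy.stats import jarque_bera

stat, p_value = jarque_bera(bruit_blanc)
print(f"p-value: {p_value:.3f}")
if p_value > 0.05:
    print("Fail to reject H0: compatible with gaussian white noise")
else:
    print("Reject H0: the distribution is not normal")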

summary table

check                 | test             | white noise if
----------------------|------------------|------------------------------------
autocorrelation       | acf plot         | all bars close to zero
stationarity          | adf test         | p-value < 0.05 (reject h0)
exploitable structure | ljung-box test   | p-value > 0.05 (fail to reject h0)
normality (gaussian)  | plot + histogram | bell curve centered at zero

common example of white noise in quantitative finance

one of the most common examples of white noise in quantitative finance is the return of assets in a perfectly efficient market.

example: daily returns of a stock

in a perfectly efficient market (efficient market hypothesis — emh), the returns of a stock follow white noise because all available information is already reflected in the price.

in other words, the price movements of an asset are purely random and contain no exploitable trend.

if returns R_t are independent and identically distributed (iid), then they form white noise.

real-world example: daily returns of the s&p 500

let’s analyze the daily returns of the s&p 500 to check whether they exhibit white noise behavior.

day | closing price | return (%)
----|---------------|-----------
1   | 4500.12       |  0.23
2   | 4512.45       |  0.27
3   | 4498.76       | -0.30
4   | 4520.10       |  0.47
5   | 4531.78       |  0.26
6   | 4545.62       |  0.31
7   | 4529.50       | -0.35
8   | 4535.12       |  0.12
9   | 4508.30       | -0.59
10  | 4520.50       |  0.27

here, the returns seem to fluctuate randomly around zero, which is a characteristic of white noise.
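for reference, the return column can be recomputed from the closing prices with pandas (day 1 needs the previous close, which is not shown, so it comes out as NaN here):

import pandas as pd

# Closing prices from the table above
prices = pd.Series([4500.12, 4512.45, 4498.76, 4520.10, 4531.78,
                    4545.62, 4529.50, 4535.12, 4508.30, 4520.50])

# Daily simple return in percent: R_t = (P_t - P_{t-1}) / P_{t-1}
# (day 1 is NaN because the previous close is not in the table)
daily_returns = prices.pct_change().mul(100).round(2)
print(daily_returns)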

why it matters and its limitations: deviations from white noise

if an asset’s returns are pure white noise, then no model can predict them.
but in reality, markets are not always perfectly efficient! sometimes, deviations from white noise occur, and quants try to exploit them for profit. examples of such deviations include market inefficiencies, arbitrage opportunities, and hidden factors.

counter-example: deviations from white noise with a demonstration

hybrid strategy: white noise tests + machine learning

  1. step 1 → test whether the series is white noise (adf, ljung-box, acf).
  2. step 2 → if it is not white noise, use machine learning to exploit deviations and uncover potential structure.

objective:

testing returns with classical methods (adf, ljung-box, normality tests)
this verifies whether returns are random (white noise) or if they exhibit exploitable signals (trends, statistical anomalies).

if a deviation is detected, use machine learning (random forest, xgboost, lstm) to find an exploitable strategy
this step aims to identify the best signals and predict future price movements.

automated feature engineering, optimized to extract the indicators that best capture hidden patterns

  • automatically creates features from historical returns
  • adds them to the dataset for ml model training
  • avoids irrelevant features and selects the most predictive ones to prevent overfitting
  • examples of indicators: lag1, lag2, sma_5, volatility, etc.

automatic optimization of models

  • tests multiple models, compares them, and selects the best-performing one

coming up in the next article: reinforcement learning integration in our script.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
from scipy.stats import jarque_bera
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Load data without column headers
file_path = "synthetic_returns.csv"
df = pd.read_csv(file_path, header=None, names=["Return"])

# Generate a dummy date column
df["Date"] = pd.date_range(start="2024-01-01", periods=len(df), freq="D")

# Extract returns
returns = df["Return"].values

# Statistical tests for white noise detection
adf_pvalue = adfuller(returns)[1]
ljung_pvalue = acorr_ljungbox(returns, lags=[10], return_df=True)['lb_pvalue'].values[0]
jb_pvalue = jarque_bera(returns)[1]

# Print test results
print("\n White Noise Test Results:")
print(f"ADF Test p-value: {adf_pvalue}")
print(f"Ljung-Box Test p-value: {ljung_pvalue}")
print(f"Jarque-Bera Test p-value: {jb_pvalue}")

# Check if the data deviates from white noise
if ljung_pvalue < 0.05 or adf_pvalue > 0.05:
    print("\nDeviation detected! Applying advanced ML models.")

    # Feature engineering: lagged returns, rolling mean, rolling volatility
    df["Lag1"] = df["Return"].shift(1)
    df["Lag2"] = df["Return"].shift(2)
    df["SMA_5"] = df["Return"].rolling(window=5).mean()
    df["Volatility"] = df["Return"].rolling(window=5).std()
    df.dropna(inplace=True)

    # Create target variable (1 if next return is positive, 0 otherwise)
    df["Target"] = (df["Return"].shift(-1) > 0).astype(int)

    # Split data chronologically (shuffle=False preserves the time order,
    # so the test set and the date axis used for plotting stay aligned)
    X = df[["Return", "Lag1", "Lag2", "SMA_5", "Volatility"]]
    y = df["Target"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    # Define ML models
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "XGBoost": XGBClassifier(eval_metric="logloss"),
        "SVM": SVC(kernel="rbf", C=1.0, gamma="scale")
    }

    results = {}
    predictions = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions[name] = model.predict(X_test)
        results[name] = accuracy_score(y_test, predictions[name])

    # LSTM for deep learning (each sample is treated as a sequence of length 1)
    X_train_LSTM = np.expand_dims(X_train.values, axis=1)
    X_test_LSTM = np.expand_dims(X_test.values, axis=1)

    lstm_model = Sequential([
        LSTM(50, activation="relu", input_shape=(1, X_train.shape[1])),
        Dense(1, activation="sigmoid")
    ])
    lstm_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    lstm_model.fit(X_train_LSTM, y_train, epochs=10, batch_size=16, verbose=0)
    results["LSTM"] = lstm_model.evaluate(X_test_LSTM, y_test, verbose=0)[1]

    # Display results
    results_df = pd.DataFrame.from_dict(results, orient="index", columns=["Accuracy"])
    print("\nML Model Accuracy:\n", results_df)

    # Plot Random Forest predictions against the actual targets
    plt.figure(figsize=(12, 5))
    plt.plot(df["Date"].iloc[-len(y_test):], y_test.values, label="Actual", color="blue")
    plt.plot(df["Date"].iloc[-len(y_test):], predictions["Random Forest"],
             label="Predictions (Random Forest)", linestyle="dashed", color="red")
    plt.title("Predictions vs Actual")
    plt.legend()
    plt.show()

else:
    print("\nNo exploitable signal: The series appears to be pure white noise.")

interpretation:

  • adf test p-value ≈ 0 → the series is stationary (it does not have a unit root)
  • ljung-box test p-value = 0.825 (high) → there is no significant autocorrelation: returns are not autocorrelated, meaning there is no exploitable trend or mean reversion, and no predictable pattern in the data
  • jarque-bera test p-value = 0.684 (high) → the returns follow a normal distribution (no heavy tails or anomalies), which suggests no extreme market moves or exploitable irregularities

conclusion:

  • our data does not show any learnable patterns for an ml model
  • any trading strategy based on this data will not be profitable, as the returns behave unpredictably (random walk)
  • this is pure white noise, which is expected given that we only have one feature (a single column of returns). additionally, daily returns often resemble white noise

solution:

  1. use higher-frequency data → minute-level or hourly data may reveal more exploitable patterns
  2. add more features → increase data complexity by incorporating volume, rsi (relative strength index), moving averages, or other technical indicators
  3. model volatility instead of returns → instead of predicting returns, test a volatility model such as garch (generalized autoregressive conditional heteroskedasticity), which could capture patterns in market fluctuations
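for the third option, a minimal garch(1, 1) sketch (assuming the third-party arch package and a percent-return series named returns, as in the script above):

from arch import arch_model

# GARCH(1, 1) models volatility clustering rather than the returns themselves
garch = arch_model(returns, vol="GARCH", p=1, q=1)
garch_fit = garch.fit(disp="off")
print(garch_fit.summary())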

Sirine Amrane
