
Introduction to Python Zipf Distribution

Python Zipf Distribution

The Zipf distribution is a discrete probability distribution in which the frequency of an element is inversely proportional to a power of its rank. Its shape is controlled by the exponent s: the larger s is, the faster the frequencies fall off as the rank increases.

The probability mass function (PMF) of the Zipf distribution is given by:

P(k) = 1 / (k^s * H(N, s)),  for k = 1, 2, ..., N

where k is the rank of the element, N is the total number of elements, and H(N, s) = 1/1^s + 1/2^s + ... + 1/N^s is the generalized harmonic number, which normalizes the probabilities so that they sum to 1.
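The formula can be checked directly with a few lines of plain Python. This is a minimal sketch: the function names generalized_harmonic and zipf_pmf, and the values N = 10 and s = 1, are illustrative choices rather than part of any library.

```python
def generalized_harmonic(n, s):
    """Return H(n, s), the sum of 1 / i**s for i = 1..n."""
    return sum(1.0 / i**s for i in range(1, n + 1))


def zipf_pmf(k, n, s):
    """Return P(k) for rank k among n elements with exponent s."""
    if not 1 <= k <= n:
        return 0.0  # ranks outside 1..n have zero probability
    return 1.0 / (k**s * generalized_harmonic(n, s))


# Hypothetical example values: 10 elements, exponent 1
N, s = 10, 1.0
probabilities = [zipf_pmf(k, N, s) for k in range(1, N + 1)]

print(probabilities[0])    # probability of the top-ranked element
print(sum(probabilities))  # sums to 1, up to floating-point rounding
```

With s = 1, the element at rank 2 is half as likely as the element at rank 1, the element at rank 3 a third as likely, and so on.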

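In Python, you can work with the Zipf distribution using scipy.stats.zipf from the SciPy library. This is a distribution object rather than a module; its methods, such as pmf, cdf, and rvs, evaluate the distribution and generate random numbers from it. Note that scipy.stats.zipf implements the unbounded form of the distribution (N going to infinity), in which the Riemann zeta function ζ(s) replaces H(N, s) and the exponent must satisfy s > 1. The finite-N form given above is available as scipy.stats.zipfian.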
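Before sampling, it can help to evaluate the distribution directly. The sketch below uses the pmf and cdf methods of scipy.stats.zipf and compares the PMF against a manual calculation. It relies on scipy.stats.zipf being the unbounded form, normalized by the Riemann zeta function (scipy.special.zeta); the exponent s = 2 and the ranks 1 to 5 are arbitrary example values.

```python
import numpy as np
from scipy.special import zeta
from scipy.stats import zipf

s = 2  # exponent; scipy.stats.zipf requires s > 1
ranks = np.arange(1, 6)

pmf_values = zipf.pmf(ranks, s)  # P(X = k) for k = 1..5
cdf_values = zipf.cdf(ranks, s)  # P(X <= k) for k = 1..5

# Manual PMF for the unbounded form: 1 / (k^s * zeta(s))
manual_pmf = 1.0 / (ranks**s * zeta(s))

print(pmf_values)
print(cdf_values)
print(np.allclose(pmf_values, manual_pmf))
```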

As an example:

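import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zipf

# Define the exponent (scipy.stats.zipf requires s > 1)
s = 2

# Generate random numbers from a Zipf distribution
size = 1000
random_numbers = zipf.rvs(s, size=size)

# The tail is heavy, so a few samples can be very large and bins='auto'
# may create an impractical number of bins. Show only ranks 1..max_rank,
# with one bin centered on each integer rank.
max_rank = 20
bins = np.arange(1, max_rank + 2) - 0.5

# Weight each sample by 1/size so bar heights are fractions of all samples
plt.hist(random_numbers, bins=bins, weights=np.full(size, 1 / size))
plt.xlabel('Rank')
plt.ylabel('Relative frequency')
plt.title('Zipf Distribution (s={})'.format(s))
plt.show()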

In this example:

  • zipf.rvs(s, size=size) draws 1000 random samples from a Zipf distribution with an exponent of 2.
  • random_numbers holds the samples as positive integers (ranks), with small values such as 1 and 2 occurring most often.
  • The histogram shows how the relative frequency falls off as the rank increases.
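
As a numerical complement to the histogram, the empirical frequencies of the first few ranks can be compared with the theoretical PMF. This is a sketch under assumptions: the seed 0 and the choice of ranks 1 to 5 are arbitrary, and the exact numbers printed will vary with the seed and the SciPy version.

```python
import numpy as np
from scipy.stats import zipf

s = 2
size = 1000
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
samples = zipf.rvs(s, size=size, random_state=rng)

for k in range(1, 6):
    empirical = np.mean(samples == k)  # fraction of samples equal to k
    theoretical = zipf.pmf(k, s)       # exact probability of rank k
    print(f"rank {k}: empirical={empirical:.3f}, theoretical={theoretical:.3f}")
```

With 1000 samples the two columns should agree roughly, and the agreement tightens as size grows.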

Tip: The Zipf distribution is often used to model phenomena with a highly skewed distribution of frequencies, such as word frequencies in natural language, the popularity of websites, and city populations.