Using the value_counts() Method in Pandas

In this tutorial, we'll explore the value_counts() method in Pandas, which counts how often each unique value occurs in a Series, such as a single column of a DataFrame. We'll cover its basic usage and the arguments you can use to customize its behavior.

1. Introduction to value_counts()

The value_counts() method in Pandas is used to count the occurrences of unique values in a Series. It's particularly useful when you want to analyze the distribution of categorical data within a DataFrame.

1.1 Syntax:

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

1.2 Parameters:

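  • normalize: If True, returns the relative frequency (proportion) of each unique value instead of its count. Defaults to False.
  • sort: If True (the default), sorts the result by count, in descending order unless ascending=True.
  • ascending: If True, sorts the counts in ascending order instead. Defaults to False.
  • bins: Groups the values into bins instead of counting each distinct value; an integer gives that many equal-width bins. Only works with numerical data.
  • dropna: If True (the default), NaN values are excluded from the result. Set it to False to include a count of NaN values.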

Now, let's dive into each parameter and see how it influences the behavior of value_counts().

2. Basic Usage

2.1 Example Data

Let's assume we have a DataFrame containing information about the richest people in the world. We'll focus on the 'industry' column for our examples.

import pandas as pd

# Assuming 'df' is our DataFrame containing the data.
# value_counts() orders the result by count; chaining sort_index()
# reorders it alphabetically by industry name instead.
industry_counts = df['industry'].value_counts().sort_index()
print(industry_counts)

Output:

Fashion and retail    18
Finance               13
Technology            15
Name: industry, dtype: int64
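
The snippet above assumes df already exists. As a self-contained sketch, the version below builds a small stand-in DataFrame whose rows are hypothetical and chosen only to match the counts shown in the output, then compares the default ordering with the sort_index() ordering.

```python
import pandas as pd

# Hypothetical sample data (not the real dataset): 18 + 15 + 13 = 46 rows,
# chosen only so the counts match the output shown above.
df = pd.DataFrame({
    "industry": ["Fashion and retail"] * 18
                + ["Technology"] * 15
                + ["Finance"] * 13
})

# Default call: the most frequent value comes first.
by_count = df["industry"].value_counts()

# sort_index() reorders the result alphabetically by label.
by_label = df["industry"].value_counts().sort_index()

print(by_count)
print(by_label)
```

Sorting by label is handy when you want a stable, readable order; the default ordering is better for spotting the most common categories.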

3. Understanding Parameters

3.1 normalize

Setting normalize=True returns the relative frequencies of unique values instead of counts.

industry_normalized = df['industry'].value_counts(normalize=True)
print(industry_normalized)

Output:

Fashion and retail    0.391304
Technology            0.326087
Finance               0.282609
Name: industry, dtype: float64
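
To make the meaning of normalize=True concrete, the sketch below checks that it gives the same result as dividing each count by the total number of counted values. The Series is hypothetical sample data, built only for illustration.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
industry = pd.Series(
    ["Fashion and retail"] * 18 + ["Technology"] * 15 + ["Finance"] * 13
)

counts = industry.value_counts()
shares = industry.value_counts(normalize=True)

# Dividing each count by the total gives the same proportions.
manual = counts / counts.sum()

print(shares.round(3))
print((shares * 100).round(1))  # the same figures expressed as percentages
```

The proportions always sum to 1, so multiplying by 100 is a quick way to report percentages.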

3.2 sort and ascending

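By default, sort=True orders the result by count, highest first. Pass ascending=True to reverse the order so that the least frequent value comes first. The call below spells out the default values explicitly, so it is equivalent to calling value_counts() with no arguments.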

industry_sorted = df['industry'].value_counts(sort=True, ascending=False)
print(industry_sorted)

Output:

Fashion and retail    18
Technology            15
Finance               13
Name: industry, dtype: int64
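
For contrast with the default ordering, here is a short sketch of the non-default settings, using hypothetical sample data. With sort=False the result is not ordered by count, and the exact order you get may depend on your pandas version, so the example does not rely on it.

```python
import pandas as pd

# Hypothetical sample data used only for illustration.
industry = pd.Series(
    ["Finance"] * 13 + ["Fashion and retail"] * 18 + ["Technology"] * 15
)

# ascending=True: the least frequent value comes first.
least_first = industry.value_counts(ascending=True)

# sort=False: skip sorting by count entirely.
unsorted_counts = industry.value_counts(sort=False)

print(least_first)
print(unsorted_counts)
```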

3.3 bins

The bins parameter divides numerical data into equal-width intervals and counts the occurrences within each interval. This is useful for analyzing distributions.

# sort=False keeps the bins in interval order instead of ordering them by count
age_bins = df['age'].value_counts(bins=5, sort=False)
print(age_bins)

Output:

(29.9, 42.8]     5
(42.8, 55.6]    17
...             ...
Name: age, dtype: int64
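
The sketch below illustrates binning on a small, made-up list of ages (hypothetical values, not taken from the dataset). It also shows that, as far as the counts go, bins=4 behaves like cutting the values with pd.cut first and then counting the resulting intervals.

```python
import pandas as pd

# Hypothetical ages used only for illustration (minimum 30, maximum 94).
ages = pd.Series([30, 35, 41, 44, 48, 50, 52, 55, 61, 67, 72, 78, 85, 94])

# Four equal-width bins, ordered by count (the default).
by_count = ages.value_counts(bins=4)

# The same bins, kept in interval order.
in_bin_order = ages.value_counts(bins=4, sort=False)

# Equivalent counts via pd.cut: bin the values first, then count per interval.
via_cut = pd.cut(ages, bins=4, include_lowest=True).value_counts().sort_index()

print(by_count)
print(in_bin_order)
print(via_cut)
```

Interval order is usually easier to read as a histogram, while the default order highlights the most populated bin.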

3.4 dropna

By default, dropna=True excludes NaN values from the result. Set it to False if you want to include NaN values.

industry_with_na = df['industry'].value_counts(dropna=False)
print(industry_with_na)

Output:

Fashion and retail    18
Technology            15
Finance               13
NaN                    3
Name: industry, dtype: int64
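
As a quick check of how dropna changes the totals, the sketch below uses hypothetical sample data with three missing entries and compares both settings against isna().sum().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with three missing entries, for illustration only.
industry = pd.Series(
    ["Fashion and retail"] * 18
    + ["Technology"] * 15
    + ["Finance"] * 13
    + [np.nan] * 3
)

without_na = industry.value_counts()           # NaN excluded (default)
with_na = industry.value_counts(dropna=False)  # NaN counted as its own entry

# isna().sum() gives the number of missing values directly.
missing = int(industry.isna().sum())

print(without_na.sum(), with_na.sum(), missing)
print(with_na)
```

Comparing the two totals is a simple way to notice missing data before interpreting the counts.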