Try Skimpy Instead of Pandas describe() for Quick Data Summaries

Apr 10, 2025 By Alison Perry

When it comes to exploratory data analysis (EDA), the Pandas describe() function has long been the go-to tool for quick numerical summaries in Python. It's fast, it's familiar, and it's part of the core Pandas toolkit. But here’s the thing—describe() has its limitations. It tends to play favorites with numeric data, leaving non-numeric columns with minimal information and requiring extra steps for anything beyond the basics.

That’s where Skimpy steps in—a modern, intelligent, and visually enriched alternative to describe() that caters to all data types, not just numbers. Whether you’re wrangling text, dealing with categorical variables, or spotting missing data, Skimpy has your back with a comprehensive snapshot that’s clean, insightful, and presentation-ready. This post will explore why Skimpy is becoming a must-have in the EDA toolbox, how to get started with it, and what makes it a game-changer over describe(). Let’s dive in.

Why Pandas’ describe() Isn’t Always Enough

The describe() function in Pandas provides metrics like mean, standard deviation, and percentiles—but only for numeric columns by default. You have to pass extra arguments to include other data types, and even then, the summary is very minimal.

Let’s illustrate:

import pandas as pd

# Small sample dataset with two text columns and two numeric columns
records = {
    "Employee": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "Location": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
}

df = pd.DataFrame(records)
print(df.describe())

The output will only summarize Age and Income, excluding string columns like Employee and Location.

If we explicitly include all data types:

print(df.describe(include='all'))

You’ll notice the non-numeric columns provide only basic statistics—such as unique value count, most frequent entry, and its frequency. No insight into string lengths, missing data, or distribution patterns.
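To make that gap concrete, here is a minimal sketch that inspects the describe(include="all") result for the same four-row dataset. It rebuilds the DataFrame so it can run on its own; the variable names text_rows and numeric_rows_for_text are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "Location": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
})

summary = df.describe(include="all")

# Text columns only populate the count/unique/top/freq rows...
text_rows = summary.loc[["count", "unique", "top", "freq"], ["Employee", "Location"]]

# ...while the numeric rows are left empty (NaN) for them.
numeric_rows_for_text = summary.loc[["mean", "std", "min", "max"], ["Employee", "Location"]]

print(text_rows)
print("numeric rows empty for text columns:", numeric_rows_for_text.isna().all().all())
```

Nothing in that output tells you how long the strings are or how many values are missing, which is the part you would otherwise have to compute by hand.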

Skimpy: The All-in-One Data Summary Tool

Skimpy is a Python library purpose-built to simplify and enrich the process of data summarization. With a single function—skim()—it provides a rich, unified overview of your dataset, combining numeric, categorical, and text column summaries into a single, easy-to-read output.

Let’s start by installing it.

Installation

pip install skimpy

Once installed, you can verify it by importing:

from skimpy import skim
print("Skimpy is ready to roll!")
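If you would rather check for the package without triggering an ImportError, a small sketch using only the standard library looks like this. The helper name is_installed is made up for this example and is not part of Skimpy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


skimpy_available = is_installed("skimpy")
print("skimpy installed:", skimpy_available)
```

This is handy in shared notebooks, where you may want to fall back to describe() when Skimpy is not available.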

What Makes Skimpy So Special?

Skimpy isn't just about making your data look prettier—it offers real, tangible improvements over describe() in several key areas.

1. Unified Summary for All Data Types

Unlike describe(), which prioritizes numerical data, Skimpy treats all columns with equal attention. Whether it’s numbers, strings, or categorical variables, everything is summarized in one cohesive table.

from skimpy import skim
import pandas as pd

# Mixed-type dataset; Rating contains one missing value
dataset = {
    "FullName": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "City": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
    "Rating": [4.2, None, 4.7, 4.9],
}

df = pd.DataFrame(dataset)
skim(df)

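This single call prints a formatted report: a short overview of the DataFrame (rows, columns, and column types), followed by a summary table for each data type that covers missing values, basic statistics, and text details.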

2. Automatic Detection of Missing Data

Skimpy highlights missing data without any extra commands. For instance, in the Rating column above, the missing value is automatically quantified both as a count and a percentage.

This saves you from having to write additional lines like:

df.isna().sum()
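For comparison, getting both the count and the percentage by hand takes a few more lines of pandas. The sketch below uses a trimmed copy of the dataset above; the column names missing_count and missing_pct are my own labels, not Skimpy's.

```python
import pandas as pd

df = pd.DataFrame({
    "FullName": ["Anna", "Ben", "Cara", "Dan"],
    "Rating": [4.2, None, 4.7, 4.9],
})

# isna() marks missing cells; sum() counts them, mean() gives the fraction.
missing = pd.DataFrame({
    "missing_count": df.isna().sum(),
    "missing_pct": df.isna().mean() * 100,
})

print(missing)
```

One missing value out of four rows shows up as a count of 1 and 25 percent for Rating.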

3. Deep Statistical Insights

For numeric columns, Skimpy reports the usual suspects (mean, standard deviation, and percentiles from minimum to maximum, including the median) alongside a small inline histogram. Together these show the shape and spread of each distribution at a glance, which is vital when choosing models or deciding whether normalization is needed. As far as the default skim() report goes, skewness, kurtosis, and explicit outlier flags are not included, so compute those separately with pandas when you need them.
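When you do want shape statistics or a quick outlier check, pandas covers both. The sketch below uses hypothetical income values with one deliberately extreme entry. The 1.5 × IQR fence is a common convention I am assuming here, not something taken from Skimpy.

```python
import pandas as pd

# Hypothetical income values; the last one is deliberately extreme.
income = pd.Series([62000, 75000, 69000, 81000, 250000], name="Income")

skewness = income.skew()  # sample skewness; positive means a long right tail
kurtosis = income.kurt()  # excess kurtosis (0 for a normal distribution)

# Flag values outside the 1.5 * IQR fences (assumed rule of thumb).
q1, q3 = income.quantile(0.25), income.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = income[(income < lower) | (income > upper)]

print(f"skewness={skewness:.2f}, kurtosis={kurtosis:.2f}")
print(outliers)
```

With only a handful of rows these statistics are noisy, so treat them as a prompt to look closer rather than a verdict.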

4. Rich Summaries for Text and Categorical Data

This is where Skimpy really shines. Text columns get a detailed breakdown that, depending on the Skimpy version, includes items such as:

Number of unique values
Most frequent value (mode) and its frequency
Average, minimum, and maximum string lengths

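For example, computed by hand for the two text columns in the dataset above (every value appears once, so the most frequent entry is a tie):

Column	Unique Values	Most Frequent	Mode Count	Avg Length
FullName	4	Anna (tie)	1	3.50
City	4	Miami (tie)	1	6.00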

This helps when preparing text for NLP, checking for consistent formats, or spotting anomalies such as overly short or long entries.
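If you want those text-length figures as numbers you can reuse, a rough pandas equivalent is short. The helper text_profile below is a hypothetical name for this sketch, and it assumes the column has at least one non-missing string.

```python
import pandas as pd

df = pd.DataFrame({
    "FullName": ["Anna", "Ben", "Cara", "Dan"],
    "City": ["Miami", "Seattle", "Austin", "Denver"],
})


def text_profile(column: pd.Series) -> dict:
    """Summarize a text column: distinct values and string-length statistics."""
    lengths = column.str.len()
    return {
        "unique": column.nunique(),
        "min_length": int(lengths.min()),
        "max_length": int(lengths.max()),
        "avg_length": float(lengths.mean()),
    }


# One row per text column, one column per statistic.
profile = pd.DataFrame({name: text_profile(df[name]) for name in ["FullName", "City"]}).T
print(profile)
```

An unusually small min_length or large max_length is often the quickest sign of truncated or concatenated entries.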

5. Visual Appeal and Readability

Skimpy uses color-coded tables and clean formatting, which makes summaries much easier to digest—especially for large datasets. Whether you're presenting results to stakeholders or conducting solo analysis, these visuals reduce cognitive load and improve storytelling.

6. Support for Categorical Variables

Working with demographics or survey data? Skimpy handles categorical data beautifully, showing you category distributions, mode values, and frequency proportions—all without any extra work on your part. This level of detail is critical for understanding class imbalance, designing better visualizations, or selecting encoding strategies for modeling.
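One practical note: pandas only treats a column as categorical once it has the category dtype, so plain strings usually need converting first. The sketch below uses made-up survey responses and computes the distribution figures mentioned above directly in pandas, which is useful for cross-checking whatever your Skimpy version prints.

```python
import pandas as pd

# Hypothetical survey responses, used only for illustration.
responses = pd.Series(
    ["Agree", "Agree", "Neutral", "Disagree", "Agree", "Neutral"],
    name="Response",
).astype("category")

counts = responses.value_counts()                # frequency per category
shares = responses.value_counts(normalize=True)  # proportion per category
mode_value = responses.mode().iloc[0]            # most frequent category

print(counts)
print(shares.round(2))
print("mode:", mode_value)
```

A heavily lopsided shares table is an early warning of class imbalance before any modeling starts.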

Using Skimpy Effectively: A Quick Guide

Step 1: Load Your Dataset

import pandas as pd
from skimpy import skim

# Feedback contains one missing value
sample_data = {
    "Customer": ["Ella", "Leo", "Nina", "Sam"],
    "Age": [23, 31, 27, 36],
    "Location": ["Boston", "Chicago", "Phoenix", "Dallas"],
    "Spend": [500, 620, 570, 690],
    "Feedback": [4.6, None, 4.9, 5.0],
}

df = pd.DataFrame(sample_data)

Step 2: Run skim()

skim(df)

Done. That’s it. You now have a full, professional-grade summary of your data.

Step 3: Customize if Needed

Want to focus only on numerical columns?

skim(df[["Age", "Spend"]])
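Hard-coding column names works for a small table, but on wider datasets it is easier to let pandas pick the columns by dtype. Here is a minimal sketch using select_dtypes on the same sample data. Note that it also picks up Feedback, since that column is numeric too.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["Ella", "Leo", "Nina", "Sam"],
    "Age": [23, 31, 27, 36],
    "Location": ["Boston", "Chicago", "Phoenix", "Dallas"],
    "Spend": [500, 620, 570, 690],
    "Feedback": [4.6, None, 4.9, 5.0],
})

# Split the frame by dtype instead of listing columns by hand.
numeric_df = df.select_dtypes(include="number")
text_df = df.select_dtypes(exclude="number")

print(list(numeric_df.columns))
print(list(text_df.columns))
```

Either subset can then be passed to skim() in place of the full DataFrame.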

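Interested only in missing data? skim() prints its report to the console and does not return a table you can index, so pull that figure straight from pandas:

df.isna().mean() * 100

Because skim() accepts any DataFrame, you can slice and dice the input just like any other table before summarizing it.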

Why Choose Skimpy Over describe()?

Here’s a quick side-by-side comparison:

Feature	Pandas describe()	Skimpy
Numeric data summary	Yes	Yes
Non-numeric data summary	Limited	Comprehensive
Missing data handling	Requires extra code	Built-in support
Visual presentation	Plain text output	Clean, formatted table
Text and string analysis	Not supported	Detailed insights
Inline histograms for numeric columns	Not available	Included
Unified summary for all columns	No	Yes

Conclusion

Skimpy is a practical and efficient alternative to Pandas’ describe() function, especially for users who want deeper insights at a glance. It handles all column types, displays missing values, and even includes mini histograms for quick visual analysis. Unlike describe() with its default settings, it doesn’t restrict summaries to numerical data, which makes it a good fit for datasets with mixed types. Skimpy’s clean formatting and terminal-friendly output work well in Jupyter notebooks and quick data reviews. It is not a replacement for advanced statistics or full plotting, but it serves its purpose as a fast, readable profiler.
