Skimpy: A Cleaner, Faster Alternative to Pandas describe() in Python

Apr 10, 2025 By Alison Perry

When it comes to exploratory data analysis (EDA), the Pandas describe() function has long been the go-to tool for quick numerical summaries in Python. It's fast, it's familiar, and it's part of the core Pandas toolkit. But here’s the thing—describe() has its limitations. It tends to play favorites with numeric data, leaving non-numeric columns with minimal information and requiring extra steps for anything beyond the basics.

That’s where Skimpy steps in—a modern, intelligent, and visually enriched alternative to describe() that caters to all data types, not just numbers. Whether you’re wrangling text, dealing with categorical variables, or spotting missing data, Skimpy has your back with a comprehensive snapshot that’s clean, insightful, and presentation-ready. This post will explore why Skimpy is becoming a must-have in the EDA toolbox, how to get started with it, and what makes it a game-changer over describe(). Let’s dive in.

Why Pandas’ describe() Isn’t Always Enough

The describe() function in Pandas provides metrics like mean, standard deviation, and percentiles—but only for numeric columns by default. You have to pass extra arguments to include other data types, and even then, the summary is very minimal.

Let’s illustrate:

import pandas as pd

records = {
    "Employee": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "Location": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
}

df = pd.DataFrame(records)

print(df.describe())

The output will only summarize Age and Income, excluding string columns like Employee and Location.

If we explicitly include all data types:

print(df.describe(include='all'))

You’ll notice the non-numeric columns get only basic statistics: the unique value count, the most frequent entry (top), and its frequency (freq). Missing values show up only indirectly, as a lower count, and there is no insight into string lengths or distribution patterns.

Skimpy: The All-in-One Data Summary Tool

Skimpy is a Python library purpose-built to simplify and enrich the process of data summarization. With a single function—skim()—it provides a rich, unified overview of your dataset, combining numeric, categorical, and text column summaries into a single, easy-to-read output.

Let’s start by installing it.

Installation

pip install skimpy

Once installed, you can verify it by importing:

from skimpy import skim

print("Skimpy is ready to roll!")

What Makes Skimpy So Special?

Skimpy isn't just about making your data look prettier—it offers real, tangible improvements over describe() in several key areas.

1. Unified Summary for All Data Types

Unlike describe(), which prioritizes numerical data, Skimpy treats all columns with equal attention. Whether it’s numbers, strings, or categorical variables, everything is summarized in a single report, with one table per data type.

import pandas as pd
from skimpy import skim

dataset = {
    "FullName": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "City": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
    "Rating": [4.2, None, 4.7, 4.9],
}

df = pd.DataFrame(dataset)

skim(df)

This single call prints a report covering everything from missing values to numeric statistics and text-column details.
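Part of what skim() reports is table-level context: how many rows and columns there are and how many columns of each data type. If you want those numbers as plain values in your own code, a rough pandas equivalent for the dataset above might look like the sketch below. It is a manual approximation, not how Skimpy builds its report.

```python
import pandas as pd

df = pd.DataFrame({
    "FullName": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "City": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
    "Rating": [4.2, None, 4.7, 4.9],
})

# Table-level facts: overall size
print(f"rows={len(df)}, columns={df.shape[1]}")

# How many columns share each dtype (e.g. int64, float64, object)
type_counts = df.dtypes.astype(str).value_counts()
print(type_counts)
```

The exact dtype name for the text columns depends on your pandas version, which is why the sketch counts dtypes by their string names rather than hard-coding them.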

2. Automatic Detection of Missing Data

Skimpy highlights missing data without any extra commands. For instance, in the Rating column above, the missing value is automatically quantified both as a count and a percentage.

It saves you from having to write additional lines like:

df.isna().sum()
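If you want the count-and-percentage pair as data rather than a printed report, for instance to drop columns above a missing-value threshold, plain pandas gets you there in a few lines. This sketch reuses the dataset from the skim() example; the missing and missing_pct column names are our own choice.

```python
import pandas as pd

df = pd.DataFrame({
    "FullName": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "City": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
    "Rating": [4.2, None, 4.7, 4.9],
})

# Missing values per column, as a count and as a percentage of all rows
missing = pd.DataFrame({
    "missing": df.isna().sum(),
    "missing_pct": df.isna().mean() * 100,
})
print(missing)
```

Taking the mean of a boolean mask gives the fraction of True values, so multiplying by 100 turns it directly into a percentage.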

3. Distribution Shape at a Glance

For each numeric column, Skimpy reports the missing-value count and percentage, the mean and standard deviation, and percentiles from the minimum through the median to the maximum, followed by a small inline histogram. That histogram gives a quick read on the shape of each distribution, such as a pronounced skew or a long tail that may hide outliers, which matters when choosing models or deciding whether normalization is needed.
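If you need skewness, kurtosis, or an explicit outlier check as numbers you can act on in code, pandas has the pieces built in. The sketch below uses Series.skew() (sample skewness) and Series.kurt() (excess kurtosis, where a normal distribution scores 0), plus the 1.5 * IQR rule, a common outlier heuristic chosen here for illustration; it is not necessarily how any profiling tool flags outliers. The extra 250000 income value is hypothetical, added so there is something to flag.

```python
import pandas as pd

# Income values from the example, plus one hypothetical extreme value (250000)
income = pd.Series([62000, 75000, 69000, 81000, 250000], name="Income")

skewness = income.skew()  # sample skewness; > 0 means a longer right tail
kurtosis = income.kurt()  # excess kurtosis (Fisher's definition, normal == 0)

# 1.5 * IQR rule: flag values far outside the middle 50% of the data
q1, q3 = income.quantile([0.25, 0.75])
iqr = q3 - q1
mask = (income < q1 - 1.5 * iqr) | (income > q3 + 1.5 * iqr)
outliers = income[mask]

print(f"skew={skewness:.2f}, kurtosis={kurtosis:.2f}")
print(outliers.tolist())
```

The IQR rule is used here because quartiles are barely moved by a single extreme value, whereas a rule based on the mean and standard deviation is pulled toward the very outlier it is trying to detect.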

4. Rich Summaries for Text and Categorical Data

This is where Skimpy really shines. Text columns get a detailed breakdown that includes:

  • Number of unique values
  • Most frequent value (mode) and its frequency
  • Average, minimum, and maximum string lengths

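For example, a text-column summary for a hypothetical dataset with Name and City columns might look like this (illustrative values):

Column | Unique Values | Most Frequent | Mode Count | Avg Length
------ | ------------- | ------------- | ---------- | ----------
Name   | 4             | Alice         | 1          | 5.25
City   | 4             | New York      | 1          | 7.50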

These details help when preparing text for NLP: they let you check that formats are consistent and spot anomalies such as unusually short or long entries.
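To see roughly how a text summary like this can be computed, here is a minimal pandas sketch. The text_profile() helper is hypothetical, not part of Skimpy or pandas, and it simply treats every non-numeric column as text; it runs on the Employee and Location columns from the describe() example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def text_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: unique count, mode, and string lengths per text column."""
    rows = []
    for col in df.columns:
        if is_numeric_dtype(df[col]):
            continue  # simplification: every non-numeric column is treated as text
        values = df[col].dropna().astype(str)
        counts = values.value_counts()  # most frequent value comes first
        lengths = values.str.len()
        rows.append({
            "Column": col,
            "Unique Values": values.nunique(),
            "Most Frequent": counts.index[0],
            "Mode Count": int(counts.iloc[0]),
            "Avg Length": lengths.mean(),
            "Min Length": lengths.min(),
            "Max Length": lengths.max(),
        })
    return pd.DataFrame(rows)


records = pd.DataFrame({
    "Employee": ["Anna", "Ben", "Cara", "Dan"],
    "Age": [29, 34, 28, 41],
    "Location": ["Miami", "Seattle", "Austin", "Denver"],
    "Income": [62000, 75000, 69000, 81000],
})

print(text_profile(records))
```

Because every value in this tiny dataset appears exactly once, Mode Count is 1 and Most Frequent just picks one of the tied values; on real data with repeated entries, those two columns become meaningful.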

5. Visual Appeal and Readability

Skimpy uses color-coded tables and clean formatting, which makes summaries much easier to digest—especially for large datasets. Whether you're presenting results to stakeholders or conducting solo analysis, these visuals reduce cognitive load and improve storytelling.

6. Support for Categorical Variables

Working with demographics or survey data? Skimpy handles categorical data beautifully, showing you category distributions, mode values, and frequency proportions—all without any extra work on your part. This level of detail is critical for understanding class imbalance, designing better visualizations, or selecting encoding strategies for modeling.
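To check class imbalance in your own code, value_counts() with normalize=True gives each category's share of the rows, and idxmax() picks out the mode. The survey data below is hypothetical.

```python
import pandas as pd

# Hypothetical survey answers stored as a pandas categorical column
survey = pd.DataFrame({
    "Plan": pd.Categorical(["Basic", "Basic", "Pro", "Basic", "Enterprise", "Basic"]),
})

counts = survey["Plan"].value_counts()                      # rows per category
shares = survey["Plan"].value_counts(normalize=True) * 100  # percent of rows per category

print(pd.DataFrame({"count": counts, "pct": shares.round(1)}))
print("mode:", counts.idxmax())
```

Here Basic accounts for two thirds of the rows, the kind of imbalance worth knowing about before choosing an encoding or splitting data for modeling.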

Using Skimpy Effectively: A Quick Guide

Step 1: Load Your Dataset

import pandas as pd
from skimpy import skim

sample_data = {
    "Customer": ["Ella", "Leo", "Nina", "Sam"],
    "Age": [23, 31, 27, 36],
    "Location": ["Boston", "Chicago", "Phoenix", "Dallas"],
    "Spend": [500, 620, 570, 690],
    "Feedback": [4.6, None, 4.9, 5.0],
}

df = pd.DataFrame(sample_data)

Step 2: Run skim()

skim(df)

Done. That’s it. You now have a full, professional-grade summary of your data.

Step 3: Customize if Needed

Want to focus only on numerical columns?

skim(df[["Age", "Spend"]])
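Listing column names by hand works for a small table. For wider ones, pandas' select_dtypes() can pick out every numeric column for you. Note that it also keeps Feedback, which the hand-picked list above leaves out. The skim() call is left as a comment so the sketch runs even without Skimpy installed.

```python
import pandas as pd

sample_data = {
    "Customer": ["Ella", "Leo", "Nina", "Sam"],
    "Age": [23, 31, 27, 36],
    "Location": ["Boston", "Chicago", "Phoenix", "Dallas"],
    "Spend": [500, 620, 570, 690],
    "Feedback": [4.6, None, 4.9, 5.0],
}
df = pd.DataFrame(sample_data)

# Keep every numeric column (ints and floats) without naming them one by one
numeric_only = df.select_dtypes(include="number")
print(list(numeric_only.columns))  # ['Age', 'Spend', 'Feedback']

# With Skimpy installed: from skimpy import skim; skim(numeric_only)
```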

Interested only in columns with missing data? Filter them first:

skim(df.loc[:, df.isna().any()])

Because skim() works on an ordinary DataFrame, you can slice and filter your data with the usual pandas tools before summarizing it.

Why Choose Skimpy Over describe()?

Here’s a quick side-by-side comparison:

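Feature | Pandas describe() | Skimpy
------- | ----------------- | ------
Numeric data summary | Yes | Yes
Non-numeric data summary | Limited (count, unique, top, freq) | Comprehensive
Missing data handling | Indirect (non-null count only) | Built-in count and percentage
Visual presentation | Plain DataFrame output | Color-coded, formatted tables
Text and string analysis | No string-length details | String-length details
Distribution shape | Mean, std, and percentiles | Mean, sd, percentiles, and an inline histogram
Unified summary for all columns | No (numeric only by default) | Yes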

Conclusion

Skimpy is a practical and efficient alternative to Pandas’ describe() function, especially for users who want deeper insights at a glance. It handles all column types, displays missing values, and even includes mini histograms for quick visual analysis. Unlike describe(), it doesn’t restrict summaries to numerical data by default, which makes it a strong fit for datasets with mixed types. Skimpy’s clean formatting and terminal-friendly output work well in Jupyter notebooks and quick data reviews. It doesn’t replace a full profiling report with detailed statistics and full-size plots, but it serves its purpose well as a fast, readable profiler.
