Handling Missing Data | Shreni Singh

Welcome back to the Data Science Series.
Handling missing data is one of the essential steps in preparing any real-world dataset for analysis. Missing values usually appear because information was not collected, was entered incorrectly, or was lost during processing. In Pandas, these gaps are typically represented as:

None: the Python null object, mostly seen in object or string columns

NaN: a special floating-point value (np.nan in NumPy) that marks missing numeric entries; Pandas converts None to NaN when it appears in a numeric column

Before analyzing or modeling data, we must identify and handle these missing values correctly.
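To make the two markers concrete, here is a small sketch. The Series names (names, ages) are made up for illustration.

```python
import pandas as pd

# In a string column, None marks the missing entry.
names = pd.Series(["A", "B", None])

# In a numeric column, None is converted to NaN, so the column becomes float64.
ages = pd.Series([25, None, 30])
print(ages.dtype)  # float64

# Pandas treats both markers as missing.
print(names.isna().tolist())  # [False, False, True]
print(ages.isna().tolist())   # [False, True, False]
```

Because NaN is a float, a column of whole numbers turns into floats as soon as it contains a gap.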

Detecting Missing Data in Pandas

Pandas offers simple and effective functions to locate missing entries.

1. Using isnull()

isnull() returns a DataFrame of the same shape as the original, holding True where a value is missing and False where it is present.

Example:

import pandas as pd
df = pd.DataFrame({"name": ["A","B",None], "age": [25,None,30]})
df.isnull()

2. Using isna()

isna() does exactly the same thing: isnull() is an alias of isna(), so the two can be used interchangeably.

df.isna()

3. Using notnull()

notnull() flips the logic: it returns True for valid, non-missing values and False for missing ones. notna() is an equivalent alias.

df.notnull()
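These Boolean results are most useful as filters. A minimal sketch, reusing the small df from the first example (the variable names has_age and missing_age are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", None], "age": [25, None, 30]})

# Keep only the rows where "age" is present.
has_age = df[df["age"].notnull()]
print(has_age.index.tolist())  # [0, 2]

# Keep only the rows where "age" is missing.
missing_age = df[df["age"].isna()]
print(missing_age.index.tolist())  # [1]
```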

You can also count the missing values in each column:

df.isna().sum()
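The same idea extends to a quick missing-data summary. The sketch below uses the example df again; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", None], "age": [25, None, 30]})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Percentage of missing values in each column (the mean of True/False is a share).
missing_share = df.isna().mean() * 100

# Number of rows that contain at least one missing value.
rows_with_missing = df.isna().any(axis=1).sum()

# Total number of missing cells in the whole DataFrame.
total_missing = df.isna().sum().sum()

print(missing_per_column.to_dict())  # {'name': 1, 'age': 1}
print(rows_with_missing)             # 2
print(total_missing)                 # 2
```

Checking the share per column before choosing a strategy helps decide whether a column is worth filling or better dropped.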

Filling Missing Values

Instead of removing missing entries, we can replace them, which keeps the rest of each row available for analysis.

1. Using fillna()

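fillna() replaces missing values with a constant or with a computed value such as a column mean. It returns a new DataFrame, so assign the result if you want to keep it.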

df.fillna({"name": "Unknown", "age": 0})
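Filling with a computed value could look like the sketch below, which uses the column mean. Whether the mean is a sensible substitute depends on your data, so treat the choice as an assumption.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", None], "age": [25, None, 30]})

# mean() skips missing values by default, so this is the mean of 25 and 30.
mean_age = df["age"].mean()

# Fill each column with its own replacement value.
filled = df.fillna({"name": "Unknown", "age": mean_age})
print(filled["age"].tolist())   # [25.0, 27.5, 30.0]
print(filled["name"].tolist())  # ['A', 'B', 'Unknown']
```

The original df still contains its gaps, because fillna() returned a new DataFrame.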

2. Using replace()

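replace() substitutes specific values, which makes it useful for missing-value markers such as NaN and for placeholders such as "N/A". Pass the value to match explicitly, because to_replace=None is not treated as a value to match.

import numpy as np
df.replace(np.nan, "Missing")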
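A common use of replace() goes the other way: turning placeholder values into real missing values so that isna(), fillna(), and dropna() can see them. This is a sketch with made-up data, where "N/A" and -1 are assumed to be placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which "N/A" and -1 were used as placeholders.
raw = pd.DataFrame({"city": ["Pune", "N/A", "Delhi"], "age": [25, -1, 30]})

# Nested dict: for each column, map the placeholder to NaN.
cleaned = raw.replace({"city": {"N/A": np.nan}, "age": {-1: np.nan}})
print(cleaned.isna().sum().to_dict())  # {'city': 1, 'age': 1}
```

Limiting each placeholder to its own column avoids accidentally wiping out a legitimate -1 somewhere else.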

3. Using interpolate()

interpolate() estimates missing data from the surrounding values, which is useful for ordered numeric or time-series data.

df["age"].interpolate(method="linear")
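To see what linear interpolation does, here is a small sketch with made-up readings. With method="linear", the values are treated as equally spaced.

```python
import pandas as pd

# Hypothetical sensor readings with two consecutive gaps.
readings = pd.Series([10.0, None, None, 40.0])
smoothed = readings.interpolate(method="linear")
print(smoothed.tolist())  # [10.0, 20.0, 30.0, 40.0]

# The same call on the "age" column from the earlier example.
df = pd.DataFrame({"name": ["A", "B", None], "age": [25, None, 30]})
age_filled = df["age"].interpolate(method="linear")
print(age_filled.tolist())  # [25.0, 27.5, 30.0]
```

Interpolation only makes sense when row order carries meaning, for example measurements over time.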

Dropping Missing Values

Sometimes missing values are too extensive or unreliable, and removing them becomes the best option.

1. Dropping rows with at least one missing value

df.dropna()

2. Dropping rows where all values are missing

df.dropna(how="all")

3. Dropping columns with at least one missing value

df.dropna(axis=1)
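dropna() can also be more selective through its subset and thresh parameters. The sketch below applies them to the example df; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", None], "age": [25, None, 30]})

# Drop a row only when "age" is missing; gaps in other columns are tolerated.
by_subset = df.dropna(subset=["age"])
print(by_subset.index.tolist())  # [0, 2]

# Keep only the rows that have at least 2 non-missing values.
by_thresh = df.dropna(thresh=2)
print(by_thresh.index.tolist())  # [0]

# Caution: every column here has a gap, so axis=1 removes all of them.
print(df.dropna(axis=1).shape)  # (3, 0)
```

The last line shows why it pays to count missing values before dropping columns.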

4. Dropping missing values directly when loading a CSV

df = pd.read_csv("data.csv").dropna()
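If the file uses its own placeholder for missing data, read_csv() can be told about it through na_values before dropna() runs. This is a sketch: the CSV text stands in for a file on disk, and "?" is an assumed placeholder.

```python
import io

import pandas as pd

# Stand-in for a file on disk. "N/A" is recognized as missing by default;
# "?" is not, so it is listed in na_values.
csv_text = "name,age\nA,25\nB,N/A\n?,30\nD,41\n"

loaded = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
rows_before = len(loaded)

# Drop every row that still has a gap after loading.
complete = loaded.dropna()
print(rows_before, len(complete))  # 4 2
print(complete["name"].tolist())   # ['A', 'D']
```

Comparing the row count before and after is a cheap way to notice when dropna() removes more data than expected.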

Final Thoughts

Missing values are unavoidable, but Pandas provides clear tools to detect, fill, or remove them. Understanding when to impute values and when to drop them helps ensure your dataset is clean, consistent, and ready for deeper analysis or machine learning.

If you have questions or want help, feel free to reach out. I am always happy to support your learning journey.