Missing values can derail your analysis. In pandas, you can use the .dropna()
method to remove rows or columns containing null values (in other words, missing data) so you can work with clean DataFrames. In this tutorial, you’ll learn how this method’s parameters give you fine-grained control over exactly which data gets removed and how much of your data to clean.
Dealing with null values is essential for keeping datasets clean and avoiding the issues they can cause. Missing entries can lead to misinterpreted column data types, inaccurate conclusions, and errors in calculations. Simply put, nulls can cause havoc if they find their way into your calculations.
By the end of this tutorial, you’ll understand that:
- You can use .dropna() to remove rows and columns from a pandas DataFrame.
- You can remove rows and columns based on the content of a subset of your DataFrame.
- You can remove rows and columns based on the volume of null values within your DataFrame.
To get the most out of this tutorial, it’s recommended that you already have a basic understanding of how to create pandas DataFrames from files.
You’ll use the Python REPL along with a file named sales_data_with_missing_values.csv
, which contains several null values you’ll deal with during the exercises. Before you start, extract this file from the downloadable materials by clicking the link at the end of this section.
The sales_data_with_missing_values.csv
file is based on the publicly available and complete sales data file from Kaggle. Understanding the file’s content isn’t essential for this tutorial, but you can explore the Kaggle link above for more details if you’d like.
You’ll also need to install both the pandas and PyArrow libraries to make sure all code examples work in your environment:
It’s time to refine your pandas skills by learning how to handle missing data in a variety of ways.
You’ll find all code examples and the sales_data_with_missing_values.csv
file in the materials for this tutorial, which you can download by clicking the link below:
Get Your Code: Click here to download the free sample code that you’ll use to learn how to drop null values in pandas.
Take the Quiz: Test your knowledge with our interactive “How to Drop Null Values in pandas” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz: How to Drop Null Values in pandas
Quiz yourself on pandas .dropna(): remove nulls, clean missing data, and prepare DataFrames for accurate analysis.
How to Drop Rows Containing Null Values in pandas
Before you start dropping rows, it’s helpful to know what options .dropna()
gives you. This method supports six parameters that let you control exactly what’s removed:
- axis: Specifies whether to remove rows or columns containing null values.
- thresh and how: Define how many missing values to remove or retain.
- subset: Limits the removal of null values to specific parts of your DataFrame.
- inplace: Determines whether the operation modifies the original DataFrame or returns a new copy.
- ignore_index: Resets the DataFrame index after removing rows.
Don’t worry if any of these parameters don’t make sense to you just yet—you’ll learn why each is used during this tutorial. You’ll also get the chance to practice your skills.
Note: Although this tutorial teaches you how pandas DataFrames use .dropna()
, DataFrames aren’t the only pandas objects that use it.
Series objects also have their own .dropna()
method. However, the Series version contains only four parameters—axis
, inplace
, how
, and ignore_index
—instead of the six supported by the DataFrame version. Of these, only inplace
and ignore_index
are used, and they work the same way as in the DataFrame method. The rest are kept for compatibility with DataFrame, but have no effect.
Indexes also have a .dropna()
method for removing missing index values, and it contains just one parameter: how
.
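To illustrate the difference, here's a minimal sketch of the Series and Index variants, using made-up values:

```python
import pandas as pd

# Series.dropna() removes missing values but keeps the surviving index labels
s = pd.Series([1.0, None, 3.0])
print(s.dropna())  # values 1.0 and 3.0, at index labels 0 and 2

# Index.dropna() accepts only the how parameter
idx = pd.Index([10.0, None, 30.0])
print(idx.dropna(how="any"))  # Index([10.0, 30.0], dtype='float64')
```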
Before using .dropna()
to drop rows, you should first find out whether your data contains any null values:
>>> import pandas as pd
>>> pd.set_option("display.max_columns", None)
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70041 <NA> Carmine Priestnall
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
9 <NA> <NA> <NA>
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 <NA> <NA> 150.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
9 <NA> <NA> <NA>
10 Garlic Extra Virgin Olive Oil True 936.0
To make sure all columns appear on your screen, you call pd.set_option("display.max_columns", None)
. Passing None as the value removes the limit on how many columns pandas displays.
You read the sales_data_with_missing_values.csv
file into a DataFrame using the pandas read_csv()
function, then view the data. The order dates are in the "%d/%m/%Y"
format in the file, so to make sure the order_date
data is read correctly, you use both the parse_dates
and date_format
parameters. The output reveals there are eleven rows and six columns of data in your file.
Note: By default, pandas uses the NumPy library for its back-end data types. In the future—starting with pandas 3, which is under development—the default back end will be the more efficient PyArrow. You can still get the benefits of this new back end and its data types by configuring pandas using the .convert_dtypes(dtype_backend="pyarrow")
method on your DataFrame.
Using PyArrow also means the null values are displayed consistently as <NA>
. If you use the default NumPy back end, then the null value in order_date
would display as NaT
, for not a time, while the rest would be shown as NaN
, for not a number. Fortunately, .dropna()
treats them all as null values.
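You can verify this with a small made-up DataFrame that mixes both NumPy-style markers:

```python
import numpy as np
import pandas as pd

# With the NumPy back end, missing datetimes show as NaT and missing
# numbers as NaN, but .isna() and .dropna() treat both as null
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2025-02-09", None]),  # second value is NaT
    "sale_price": [135.0, np.nan],                       # second value is NaN
})
print(df.isna().sum().sum())  # 2 nulls in total
print(len(df.dropna()))       # only the complete first row survives: 1
```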
In a real data analysis, your DataFrame may be too large to allow you to see everything that’s missing. To solve this, you use the DataFrame’s .isna()
and .sum()
methods together:
>>> sales_data.isna().sum()
order_number 2
order_date 2
customer_name 2
product_purchased 3
discount 3
sale_price 2
dtype: int64
As you can see, each of your columns is missing data.
When you use sales_data.isna()
, you create a Boolean DataFrame the same size as sales_data
, but with null values replaced with True
and everything else replaced with False
. The .sum() method then adds up the True values, returning a count of missing entries for each column.
Note: You may see .isnull()
being used. This is an alias for .isna()
, and both work the same way. In addition, pandas provides .notna()
and .notnull()
, which you can use to reveal how many values in each column don’t contain nulls.
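Here's a quick sketch of both counts on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": [None, None, 6]})

print(df.isna().sum())   # nulls per column: a -> 1, b -> 2
print(df.notna().sum())  # non-nulls per column: a -> 2, b -> 1
```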
To remove the rows with null values, you use .dropna()
:
>>> sales_data.dropna()
order_number order_date customer_name \
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
As you can see, rows with null values are nowhere to be seen.
You might think the rows containing the null values from sales_data
have been deleted. After all, that’s what the output shows. However, this isn’t true. When you use sales_data.dropna()
, the results are placed into a new DataFrame, leaving the original unchanged. If you want to retain this second DataFrame, then you need to assign the output from .dropna()
to a new variable:
>>> clean_sales_data = sales_data.dropna()
Now the clean_sales_data
DataFrame contains a second copy of the original sales_data
DataFrame, but without those rows that contained one or more null values. Of course, this is wasteful of memory, so a better option would be to update the original DataFrame. To do this, you pass inplace=True
to .dropna()
:
>>> sales_data.dropna(inplace=True)
The original sales_data
DataFrame will no longer contain rows with null values, and any future analysis will occur against this new version.
Note: Even though inplace=True
allows you to save memory, it offers little performance benefit during processing because a temporary copy is still made in the background. Also, applying some of your processing to your original DataFrame and some to a second copy can make finding errors difficult. A pandas enhancement proposal (PDEP) calling for a deprecation was created because of these issues.
Setting inplace=True
is most useful when you want to make a one-time permanent change to your data and then use that version in all future analysis.
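The contrast between the two approaches fits in a few lines (the DataFrame here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3]})

clean = df.dropna()  # returns a new DataFrame...
print(len(df))       # ...so the original still has all 3 rows

df.dropna(inplace=True)  # updates df directly and returns None
print(len(df))           # now 2 rows
```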
You’ve now seen the basic use case for .dropna()
, where you drop rows with at least one missing value. But the method also includes several other parameters worth exploring.
How to Drop Columns Containing Null Values in pandas
So far, you’ve seen how to drop rows containing missing values, but it’s also possible to drop incomplete columns. You do this using the axis
parameter. By default, axis
is set to 0
, or "index"
, which means it operates on rows of data. If you want to apply it across columns, then you set axis
to 1
or "columns"
.
Suppose you want to remove incomplete columns containing null values from the original sales_data
. Here’s how you do it:
>>> import pandas as pd
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data.dropna(axis="columns")
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Because every column has at least one missing value, they’ve all been removed, leaving the DataFrame empty except for its index.
So far, you’ve taken a fairly aggressive approach by removing entire rows and columns. Next, you’ll look at some less destructive alternatives.
How to Work With a Part of Your Data
Earlier, you used .dropna()
to remove all null values from your DataFrame. While this is a common use case, you can also remove nulls from specific parts of the DataFrame or based on a certain quantity threshold. You’ll learn how in this section.
Removing Data Based on Specific Rows or Columns
You can restrict removal of data based on the rows or columns where the null values appear. This is where the subset
parameter comes in handy.
By passing a column label or a sequence of labels to subset
, you tell .dropna()
which columns to check for null values and remove their associated rows. Similarly, by passing subset
an index position or a sequence of index positions and setting axis="columns"
, you remove the column or columns that contain null values at the specified index positions.
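Here's a minimal sketch of the column-based case, using a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, None, 3],
    "b": [4, 5, 6],
    "c": [None, 8, 9],
})

# Drop every column that has a null value in the rows labeled 0 or 1.
# Column "a" is null in row 1 and column "c" is null in row 0, so only
# column "b" remains.
print(df.dropna(axis="columns", subset=[0, 1]))
```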
For example, suppose you want to remove rows containing null values in either the discount
or sale_price columns
. If there are null values in other columns besides discount
or sale_price
, then those rows will remain untouched. Here’s how you do it:
>>> import pandas as pd
>>> pd.set_option("display.max_columns", None)
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data.dropna(axis=0, subset=["discount", "sale_price"])
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
This time, you’ve removed four rows. To clarify that you’re removing rows, you explicitly set axis
to 0
. However, you can omit this argument since 0
is the default value for the axis
parameter. To restrict your analysis to the two columns you want, you pass the Python list ["discount", "sale_price"]
as the subset
parameter.
Removing Data Based on the Quantity of Missing Values
It’s possible to instruct .dropna()
to remove rows or columns based on a specified quantity of data. Suppose you’re concerned about the volume of data missing in your DataFrame. You’re worried about rows with more than one missing value. In other words, you want to keep only those rows containing at least five pieces of data in their six columns.
To do this, you can set this threshold using the thresh
parameter:
>>> sales_data.dropna(thresh=5)
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
Rows with indices 1
and 9
have been removed since they contain multiple missing values. By setting thresh
to 5
, you tell .dropna()
to keep all rows that contain at least five values. Since there are six columns, all remaining rows have no more than one missing value.
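The thresh parameter works the same way across columns. A minimal sketch with a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, None, None],  # 1 non-null value
    "b": [4, 5, None],     # 2 non-null values
    "c": [7, 8, 9],        # 3 non-null values
})

# Keep only columns that contain at least two non-null values
print(df.dropna(axis="columns", thresh=2))  # columns "b" and "c" survive
```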
Removing Empty Rows or Columns
Sometimes you might want to remove only rows or columns that are completely empty, rather than those with just some missing values. You can control this with the how
parameter.
For example, suppose you want to remove rows where all values are missing:
>>> sales_data.dropna(how="all")
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70041 <NA> Carmine Priestnall
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 <NA> <NA> 150.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
As you can see, the blank row at index position 9
has been removed. By passing how="all"
to .dropna()
and using the default value axis=0
, you remove rows containing only null values. Alternatively, you could pass how="any"
to remove rows or columns that have at least one null value. You rarely need to do this, however, because "any"
is the default value.
Note: The how
and thresh
parameters are mutually exclusive. In other words, you can use one or the other, but not both at the same time. It isn’t logical to specify "any"
or "all"
and then also set a threshold.
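The how="all" argument also combines with axis="columns" to drop entirely empty columns, as this sketch with made-up data shows:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [None, None, None],  # completely empty column
    "b": [1, None, 3],        # partially filled column
})

# Drop only the columns where every value is missing
print(df.dropna(axis="columns", how="all"))  # keeps column "b"
```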
Adding a Finishing Touch: Reset the Index
Finally, you may have noticed that each time you remove a row—both here and throughout this tutorial—the index doesn’t update. As a finishing touch, passing ignore_index=True
will reindex your DataFrame sequentially:
>>> sales_data.dropna(thresh=5, ignore_index=True)
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70042 2025-02-09 00:00:00 <NA>
2 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
3 70044 2025-02-10 00:00:00 Tann Angear
4 70045 2025-02-10 00:00:00 Skipton Fealty
5 70046 2025-02-11 00:00:00 Far Pow
6 70047 2025-02-11 00:00:00 Hill Group
7 70048 2025-02-11 00:00:00 Devlin Nock
8 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 Rosemary Olive Oil Candle False 78.0
2 <NA> True 19.5
3 Vanilla and Olive Oil Candle <NA> 13.98
4 Basil Extra Virgin Olive Oil True <NA>
5 Chili Extra Virgin Olive Oil False 150.0
6 Chili Extra Virgin Olive Oil True 135.0
7 Lavender and Olive Oil Lotion False 39.96
8 Garlic Extra Virgin Olive Oil True 936.0
This time, the index starts at 0
and ends at 8
, and there are only nine rows remaining.
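For reference, the ignore_index parameter was added to .dropna() in pandas 2.0. On older versions, chaining .reset_index(drop=True) achieves the same result:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3]})

one_step = df.dropna(ignore_index=True)        # pandas 2.0 and later
two_step = df.dropna().reset_index(drop=True)  # equivalent chain
print(one_step.equals(two_step))  # True
print(list(one_step.index))       # [0, 1]
```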
To wrap up what you’ve covered so far, it’s time to test your new skills.
Practice Your Skills
To practice, begin by loading the contents of grades.csv
into a DataFrame:
>>> import pandas as pd
>>> grades = pd.read_csv(
... "grades.csv",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> grades
Subject S1 S2 S3 S4 S5 S6
0 math 18 <NA> 15 20 17 18
1 science 26 35 19 <NA> 33 <NA>
2 art 15 <NA> 9 17 18 14
3 music 14 20 12 20 13 18
4 history 18 19 <NA> 17 <NA> 18
5 sport 20 17 20 17 18 <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
The DataFrame shows results for six students in six subjects. In this case, <NA>
indicates that an exam wasn’t taken.
Now, see if you can use .dropna()
to answer each of the following questions:
- Use .dropna() so that it permanently drops the row in the DataFrame containing only null values. Use this version from this point on.
- Display the rows for the exams that all students completed.
- Display any columns with no missing data.
- Display the exams taken by at least five students.
- Identify who else completed every exam that both S2 and S4 took.
You’ll find the answers in the downloadable materials for this tutorial.
Conclusion
During this tutorial, you learned to use the DataFrame’s .dropna()
method to remove null values from your DataFrames.
By using the various parameters, you’ve learned how:
- The axis parameter allows you to remove rows or columns containing null values.
- The thresh and how parameters allow you to define the quantity of what gets removed.
- The subset parameter lets you restrict the removal to part of your DataFrame.
- The inplace parameter allows you to update either the original DataFrame or a copy after deletions.
- The ignore_index parameter allows you to reset the DataFrame index after rows of data have been removed.
Although you now have a solid understanding of .dropna()
, there’s still lots to explore.
Review what you’ve learned here to identify which techniques best fit your needs for handling missing values. You may also want to apply these skills to the .dropna()
methods for pandas Series and Index objects. Doing so is a great way to continue your learning journey.
Frequently Asked Questions
Now that you have some experience with dropping null values in pandas, you can use the questions and answers below to check your understanding and recap what you’ve learned.
These FAQs are related to the most important concepts you’ve covered in this tutorial.
You use the .dropna()
method to remove rows or columns with null values from your DataFrame.
You remove empty values by using the .dropna()
method, which lets you drop rows or columns containing null values.
You use the thresh
parameter to specify the minimum number of non-null values a row or column must have to avoid being dropped.
You set the axis
parameter to 1
or "columns"
in the .dropna()
method to drop columns containing null values.
When you set inplace=True
, you update the original DataFrame directly, removing rows or columns with null values without creating a new DataFrame.