Missing values can derail your analysis. In pandas, you can use the .dropna()
method to remove rows or columns containing null values (in other words, missing data) so you can work with clean DataFrames. In this tutorial, you’ll learn how this method’s parameters give you fine-grained control over exactly which data gets removed and how much of your data to clean.
Dealing with null values is essential for keeping datasets clean and avoiding the issues they can cause. Missing entries can lead to misinterpreted column data types, inaccurate conclusions, and errors in calculations. Simply put, nulls can cause havoc if they find their way into your calculations.
By the end of this tutorial, you’ll understand that:
- You can use .dropna() to remove rows and columns from a pandas DataFrame.
- You can remove rows and columns based on the content of a subset of your DataFrame.
- You can remove rows and columns based on the volume of null values within your DataFrame.
To get the most out of this tutorial, it’s recommended that you already have a basic understanding of how to create pandas DataFrames from files.
You’ll use the Python REPL along with a file named sales_data_with_missing_values.csv
, which contains several null values you’ll deal with during the exercises. Before you start, extract this file from the downloadable materials by clicking the link at the end of this section.
The sales_data_with_missing_values.csv
file is based on the publicly available and complete sales data file from Kaggle. Understanding the file’s content isn’t essential for this tutorial, but you can explore the Kaggle link above for more details if you’d like.
You’ll also need to install both the pandas and PyArrow libraries to make sure all code examples work in your environment:
It’s time to refine your pandas skills by learning how to handle missing data in a variety of ways.
You’ll find all code examples and the sales_data_with_missing_values.csv
file in the materials for this tutorial, which you can download by clicking the link below:
Get Your Code: Click here to download the free sample code that you’ll use to learn how to drop null values in pandas.
Take the Quiz: Test your knowledge with our interactive “How to Drop Null Values in pandas” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz: How to Drop Null Values in pandas
Quiz yourself on pandas .dropna(): remove nulls, clean missing data, and prepare DataFrames for accurate analysis.
How to Drop Rows Containing Null Values in pandas
Before you start dropping rows, it’s helpful to know what options .dropna()
gives you. This method supports six parameters that let you control exactly what’s removed:
- axis: Specifies whether to remove rows or columns containing null values.
- thresh and how: Define how many missing values to remove or retain.
- subset: Limits the removal of null values to specific parts of your DataFrame.
- inplace: Determines whether the operation modifies the original DataFrame or returns a new copy.
- ignore_index: Resets the DataFrame index after removing rows.
Don’t worry if any of these parameters don’t make sense to you just yet—you’ll learn why each is used during this tutorial. You’ll also get the chance to practice your skills.
Note: Although this tutorial teaches you how pandas DataFrames use .dropna()
, DataFrames aren’t the only pandas objects that use it.
Series objects also have their own .dropna()
method. However, the Series version contains only four parameters—axis
, inplace
, how
, and ignore_index
—instead of the six supported by the DataFrame version. Of these, only inplace
and ignore_index
are used, and they work the same way as in the DataFrame method. The rest are kept for compatibility with DataFrame, but have no effect.
Indexes also have a .dropna()
method for removing missing index values, and it contains just one parameter: how
.
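To illustrate the difference, here's a minimal sketch of the Series and Index variants, using made-up values:

```python
import pandas as pd

# Series.dropna() removes missing values but keeps the surviving index labels
s = pd.Series([1.0, None, 3.0])
print(s.dropna())  # values 1.0 and 3.0, at index labels 0 and 2

# Index.dropna() accepts only the how parameter
idx = pd.Index([10.0, None, 30.0])
print(idx.dropna(how="any"))  # Index([10.0, 30.0], dtype='float64')
```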
Before using .dropna()
to drop rows, you should first find out whether your data contains any null values:
>>> import pandas as pd
>>> pd.set_option("display.max_columns", None)
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70041 <NA> Carmine Priestnall
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
9 <NA> <NA> <NA>
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 <NA> <NA> 150.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
9 <NA> <NA> <NA>
10 Garlic Extra Virgin Olive Oil True 936.0
To make sure all columns appear on your screen, you call pd.set_option("display.max_columns", None)
. Passing None as the value removes the limit on how many columns pandas displays.
You read the sales_data_with_missing_values.csv
file into a DataFrame using the pandas read_csv()
function, then view the data. The order dates are in the "%d/%m/%Y"
format in the file, so to make sure the order_date
data is read correctly, you use both the parse_dates
and date_format
parameters. The output reveals there are eleven rows and six columns of data in your file.
Note: By default, pandas uses the NumPy library for its back-end data types. In the future—starting with pandas 3, which is under development—the default back end will be the more efficient PyArrow. You can still get the benefits of this new back end and its data types by configuring pandas using the .convert_dtypes(dtype_backend="pyarrow")
method on your DataFrame.
Using PyArrow also means the null values are displayed consistently as <NA>
. If you use the default NumPy back end, then the null value in order_date
would display as NaT
, for not a time, while the rest would be shown as NaN
, for not a number. Fortunately, .dropna()
treats them all as null values.
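You can verify this with a small made-up DataFrame that mixes both NumPy-style markers:

```python
import numpy as np
import pandas as pd

# With the NumPy back end, missing datetimes show as NaT and missing
# numbers as NaN, but .isna() and .dropna() treat both as null
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2025-02-09", None]),  # second value is NaT
    "sale_price": [135.0, np.nan],                       # second value is NaN
})
print(df.isna().sum().sum())  # 2 nulls in total
print(len(df.dropna()))       # only the complete first row survives: 1
```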
In a real data analysis, your DataFrame may be too large to allow you to see everything that’s missing. To solve this, you use the DataFrame’s .isna()
and .sum()
methods together:
>>> sales_data.isna().sum()
order_number 2
order_date 2
customer_name 2
product_purchased 3
discount 3
sale_price 2
dtype: int64
As you can see, each of your columns is missing data.
When you use sales_data.isna()
, you create a Boolean DataFrame the same size as sales_data
, but with null values replaced with True
and everything else replaced with False
. The .sum() method then adds up the True values, returning a count of missing entries for each column.
Note: You may see .isnull()
being used. This is an alias for .isna()
, and both work the same way. In addition, pandas provides .notna()
and .notnull()
, which you can use to reveal how many values in each column don’t contain nulls.
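Here's a quick sketch of both counts on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3], "b": [None, None, 6]})

print(df.isna().sum())   # nulls per column: a -> 1, b -> 2
print(df.notna().sum())  # non-nulls per column: a -> 2, b -> 1
```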
To remove the rows with null values, you use .dropna()
:
>>> sales_data.dropna()
order_number order_date customer_name \
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
As you can see, rows with null values are nowhere to be seen.
You might think the rows containing the null values from sales_data
have been deleted. After all, that’s what the output shows. However, this isn’t true. When you use sales_data.dropna()
, the results are placed into a new DataFrame, leaving the original unchanged. If you want to retain this second DataFrame, then you need to assign the output from .dropna()
to a new variable:
>>> clean_sales_data = sales_data.dropna()
Now the clean_sales_data
DataFrame contains a second copy of the original sales_data
DataFrame, but without those rows that contained one or more null values. Of course, this is wasteful of memory, so a better option would be to update the original DataFrame. To do this, you pass inplace=True
to .dropna()
:
>>> sales_data.dropna(inplace=True)
The original sales_data
DataFrame will no longer contain rows with null values, and any future analysis will occur against this new version.
Note: Even though inplace=True
allows you to save memory, it offers little performance benefit during processing because a temporary copy is still made in the background. Also, applying some of your processing to your original DataFrame and some to a second copy can make finding errors difficult. A pandas enhancement proposal (PDEP) calling for a deprecation was created because of these issues.
Setting inplace=True
is most useful when you want to make a one-time permanent change to your data and then use that version in all future analysis.
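The contrast between the two approaches fits in a few lines (the DataFrame here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3]})

clean = df.dropna()  # returns a new DataFrame...
print(len(df))       # ...so the original still has all 3 rows

df.dropna(inplace=True)  # updates df directly and returns None
print(len(df))           # now 2 rows
```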
You’ve now seen the basic use case for .dropna()
, where you drop rows with at least one missing value. But the method also includes several other parameters worth exploring.
How to Drop Columns Containing Null Values in pandas
So far, you’ve seen how to drop rows containing missing values, but it’s also possible to drop incomplete columns. You do this using the axis
parameter. By default, axis
is set to 0
, or "index"
, which means it operates on rows of data. If you want to apply it across columns, then you set axis
to 1
or "columns"
.
Suppose you want to remove incomplete columns containing null values from the original sales_data
. Here’s how you do it:
>>> import pandas as pd
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data.dropna(axis="columns")
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Because every column has at least one missing value, they’ve all been removed, leaving the DataFrame empty except for its index.
So far, you’ve taken a fairly aggressive approach by removing entire rows and columns. Next, you’ll look at some less destructive alternatives.
How to Work With a Part of Your Data
Earlier, you used .dropna()
to remove all null values from your DataFrame. While this is a common use case, you can also remove nulls from specific parts of the DataFrame or based on a certain quantity threshold. You’ll learn how in this section.
Removing Data Based on Specific Rows or Columns
You can restrict removal of data based on the rows or columns where the null values appear. This is where the subset
parameter comes in handy.
By passing a column label or a sequence of labels to subset
, you tell .dropna()
which columns to check for null values and remove their associated rows. Similarly, by passing subset
an index position or a sequence of index positions and setting axis="columns"
, you remove the column or columns that contain null values at the specified index positions.
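Here's a minimal sketch of the column-based case, using a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, None, 3],
    "b": [4, 5, 6],
    "c": [None, 8, 9],
})

# Drop every column that has a null value in the rows labeled 0 or 1.
# Column "a" is null in row 1 and column "c" is null in row 0, so only
# column "b" remains.
print(df.dropna(axis="columns", subset=[0, 1]))
```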
For example, suppose you want to remove rows containing null values in either the discount
or sale_price columns
. If there are null values in other columns besides discount
or sale_price
, then those rows will remain untouched. Here’s how you do it:
>>> import pandas as pd
>>> pd.set_option("display.max_columns", None)
>>> sales_data = pd.read_csv(
... "sales_data_with_missing_values.csv",
... parse_dates=["order_date"],
... date_format="%d/%m/%Y",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> sales_data.dropna(axis=0, subset=["discount", "sale_price"])
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
This time, you’ve removed four rows. To clarify that you’re removing rows, you explicitly set axis
to 0
. However, you can omit this argument since 0
is the default value for the axis
parameter. To restrict your analysis to the two columns you want, you pass the Python list ["discount", "sale_price"]
as the subset
parameter.
Removing Data Based on the Quantity of Missing Values
It’s possible to instruct .dropna()
to remove rows or columns based on a specified quantity of data. Suppose you’re concerned about the volume of data missing in your DataFrame. You’re worried about rows with more than one missing value. In other words, you want to keep only those rows containing at least five pieces of data in their six columns.
To do this, you can set this threshold using the thresh
parameter:
>>> sales_data.dropna(thresh=5)
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
Rows with indices 1
and 9
have been removed since they contain multiple missing values. By setting thresh
to 5
, you tell .dropna()
to keep all rows that contain at least five values. Since there are six columns, all remaining rows have no more than one missing value.
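The thresh parameter works the same way across columns. A minimal sketch with a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, None, None],  # 1 non-null value
    "b": [4, 5, None],     # 2 non-null values
    "c": [7, 8, 9],        # 3 non-null values
})

# Keep only columns that contain at least two non-null values
print(df.dropna(axis="columns", thresh=2))  # columns "b" and "c" survive
```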
Removing Empty Rows or Columns
Sometimes you might want to remove only rows or columns that are completely empty, rather than those with just some missing values. You can control this with the how
parameter.
For example, suppose you want to remove rows where all values are missing:
>>> sales_data.dropna(how="all")
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70041 <NA> Carmine Priestnall
2 70042 2025-02-09 00:00:00 <NA>
3 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
4 70044 2025-02-10 00:00:00 Tann Angear
5 70045 2025-02-10 00:00:00 Skipton Fealty
6 70046 2025-02-11 00:00:00 Far Pow
7 70047 2025-02-11 00:00:00 Hill Group
8 70048 2025-02-11 00:00:00 Devlin Nock
10 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 <NA> <NA> 150.0
2 Rosemary Olive Oil Candle False 78.0
3 <NA> True 19.5
4 Vanilla and Olive Oil Candle <NA> 13.98
5 Basil Extra Virgin Olive Oil True <NA>
6 Chili Extra Virgin Olive Oil False 150.0
7 Chili Extra Virgin Olive Oil True 135.0
8 Lavender and Olive Oil Lotion False 39.96
10 Garlic Extra Virgin Olive Oil True 936.0
As you can see, the blank row at index position 9
has been removed. By passing how="all"
to .dropna()
and using the default value axis=0
, you remove rows containing only null values. Alternatively, you could pass how="any"
to remove rows or columns that have at least one null value. You rarely need to do this, however, because "any"
is the default value.
Note: The how
and thresh
parameters are mutually exclusive. In other words, you can use one or the other, but not both at the same time. It isn’t logical to specify "any"
or "all"
and then also set a threshold.
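The how="all" argument also combines with axis="columns" to drop entirely empty columns, as this sketch with made-up data shows:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [None, None, None],  # completely empty column
    "b": [1, None, 3],        # partially filled column
})

# Drop only the columns where every value is missing
print(df.dropna(axis="columns", how="all"))  # keeps column "b"
```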
Adding a Finishing Touch: Reset the Index
Finally, you may have noticed that each time you remove a row—both here and throughout this tutorial—the index doesn’t update. As a finishing touch, passing ignore_index=True
will reindex your DataFrame sequentially:
>>> sales_data.dropna(thresh=5, ignore_index=True)
order_number order_date customer_name \
0 <NA> 2025-02-09 00:00:00 Skipton Fealty
1 70042 2025-02-09 00:00:00 <NA>
2 70043 2025-02-10 00:00:00 Lanni D'Ambrogi
3 70044 2025-02-10 00:00:00 Tann Angear
4 70045 2025-02-10 00:00:00 Skipton Fealty
5 70046 2025-02-11 00:00:00 Far Pow
6 70047 2025-02-11 00:00:00 Hill Group
7 70048 2025-02-11 00:00:00 Devlin Nock
8 70049 2025-02-12 00:00:00 Swift Inc
product_purchased discount sale_price
0 Chili Extra Virgin Olive Oil True 135.0
1 Rosemary Olive Oil Candle False 78.0
2 <NA> True 19.5
3 Vanilla and Olive Oil Candle <NA> 13.98
4 Basil Extra Virgin Olive Oil True <NA>
5 Chili Extra Virgin Olive Oil False 150.0
6 Chili Extra Virgin Olive Oil True 135.0
7 Lavender and Olive Oil Lotion False 39.96
8 Garlic Extra Virgin Olive Oil True 936.0
This time, the index starts at 0
and ends at 8
, and there are only nine rows remaining.
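For reference, the ignore_index parameter was added to .dropna() in pandas 2.0. On older versions, chaining .reset_index(drop=True) achieves the same result:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, None, 3]})

one_step = df.dropna(ignore_index=True)        # pandas 2.0 and later
two_step = df.dropna().reset_index(drop=True)  # equivalent chain
print(one_step.equals(two_step))  # True
print(list(one_step.index))       # [0, 1]
```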
To wrap up what you’ve covered so far, it’s time to test your new skills.
Practice Your Skills
To practice, begin by loading the contents of grades.csv
into a DataFrame:
>>> import pandas as pd
>>> grades = pd.read_csv(
... "grades.csv",
... ).convert_dtypes(dtype_backend="pyarrow")
>>> grades
Subject S1 S2 S3 S4 S5 S6
0 math 18 <NA> 15 20 17 18
1 science 26 35 19 <NA> 33 <NA>
2 art 15 <NA> 9 17 18 14
3 music 14 20 12 20 13 18
4 history 18 19 <NA> 17 <NA> 18
5 sport 20 17 20 17 18 <NA>
6 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
The DataFrame shows results for six students in six subjects. In this case, <NA>
indicates that an exam wasn’t taken.
Now, see if you can use .dropna()
to answer each of the following questions:
- Use .dropna() so that it permanently drops the row in the DataFrame containing only null values. Use this version from this point on.
- Display the rows for the exams that all students completed.
- Display any columns with no missing data.
- Display the exams taken by at least five students.
- Identify who else completed every exam that both S2 and S4 took.
You’ll find the answers in the downloadable materials for this tutorial.
Conclusion
During this tutorial, you learned to use the DataFrame’s .dropna()
method to remove null values from your DataFrames.
By using the various parameters, you’ve learned how:
- The axis parameter allows you to remove rows or columns containing null values.
- The thresh and how parameters allow you to define the quantity of what gets removed.
- The subset parameter lets you restrict the removal to part of your DataFrame.
- The inplace parameter allows you to update either the original DataFrame or a copy after deletions.
- The ignore_index parameter allows you to reset the DataFrame index after rows of data have been removed.
Although you now have a solid understanding of .dropna()
, there’s still lots to explore.
Review what you’ve learned here to identify which techniques best fit your needs for handling missing values. You may also want to apply these skills to the .dropna()
methods for pandas Series and Index objects. Doing so is a great way to continue your learning journey.
Frequently Asked Questions
Now that you have some experience with dropping null values in pandas, you can use the questions and answers below to check your understanding and recap what you’ve learned.
These FAQs are related to the most important concepts you’ve covered in this tutorial.
You use the .dropna()
method to remove rows or columns with null values from your DataFrame.
You remove empty values by using the .dropna()
method, which lets you drop rows or columns containing null values.
You use the thresh
parameter to specify the minimum number of non-null values a row or column must have to avoid being dropped.
You set the axis
parameter to 1
or "columns"
in the .dropna()
method to drop columns containing null values.
When you set inplace=True
, you update the original DataFrame directly, removing rows or columns with null values without creating a new DataFrame.