Skill - Filter DataFrame rows with conditional statements
Skills Required
- Setup python development environment
- Basic Printing in Python
- Commenting in Python
- Managing Variables in python
- Pandas DataFrame Basics
Please make sure to have all the skills mentioned above to understand and execute the code mentioned below. Go through the above skills if necessary for reference or revision
Pandas is a python library.
DataFrame is a data structure provided by the pandas library.
Please go through Pandas DataFrame Basics to learn the basics of pandas DataFrame.
In this post, we will learn how to filter the rows of a DataFrame based on our desired criteria
Instructions to run the codes below
- Create a folder and place the csv file used in this post from here
- Open the folder in Visual Studio Code
- Create and work on python files in this folder
The excel files should look like the image below
Suppose in a DataFrame df
we want to get rows with values in column ‘A’ greater than 20, then we use df[df['A']>20]
Example
import pandas as pd
# create dataframe from excel
df = pd.read_csv('gen_schedules.csv')
print('Number of rows in df = {0}'.format(df.shape[0]))
# this prints Number of rows in df = 100
# filter the rows with CGPL values greater than 2200
filteredDf = df[df['CGPL']>2200]
print('Number of rows in filteredDf = {0}'.format(filteredDf.shape[0]))
# this prints Number of rows in filteredDf = 65
# filter the rows with CGPL values greater than 2200 and KSTPS7 less than 450
# we can achieve this via the logical operator '&'
filteredDf2 = df[(df['CGPL']>2200) & (df['KSTPS7']<450)]
print('Number of rows in filteredDf2 = {0}'.format(filteredDf2.shape[0]))
# this prints Number of rows in filteredDf2 = 10
Filter DataFrame rows based on null / Nan values
import pandas as pd
df = pd.read_csv("gen_schdules.csv")
# filter rows which have only null values in "Time Block" column
dfWithNa = df[df["Time Block"].isna()]
# filter rows which have only non null values in "Time Block" column
dfWithoutNa = df[~df["Time Block"].isna()]
print(dfWithoutNa)
Video
Video for this post can be found here
Online Interpreter
Although we recommend to practice the above examples in Visual Studio Code, you can run these examples online at https://www.tutorialspoint.com/execute_python_online.php
References
- Official tutorial - https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/03_subset_data.html#min-tut-03-subset
Comments
Post a Comment