Advanced DataFrame rows filtering with lambda function

Skills Required

Please make sure you have all the skills mentioned above so that you can understand and run the code below. Go through those skills first if you need a reference or a refresher.


Pandas is a Python library.
DataFrame is a data structure provided by the pandas library.

Please go through Pandas DataFrame Basics to learn the basics of pandas DataFrame.

In this post, we will learn how to filter the rows of a DataFrame based on our desired criteria using lambda functions. The filtering criterion is defined as a lambda function that receives one row at a time, so it can access all the columns of that row. This is why the approach can accommodate complex filtering criteria that combine several columns.
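As a minimal sketch of this idea, the snippet below builds a small in-memory DataFrame and keeps the rows that satisfy a criterion combining several columns of the same row. The column names unit, scheduled and capacity and their values are hypothetical and do not come from the CSV file used in this post.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the technique
df = pd.DataFrame({
    "unit": ["A", "B", "C", "D"],
    "scheduled": [90, 300, 80, 250],
    "capacity": [200, 310, 100, 240],
})

# The lambda receives one row at a time (axis=1), so it can combine
# any of that row's columns in an ordinary Python expression.
# Here: keep rows scheduled above 50% of capacity, excluding unit "C".
mask = df.apply(
    lambda row: (row["scheduled"] > 0.5 * row["capacity"]) and (row["unit"] != "C"),
    axis=1,
)

# The boolean Series returned by apply is used as a row mask
filtered = df[mask]
print(filtered)
```

Here row A fails the capacity condition and row C is excluded by name, so only rows B and D remain.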


Instructions to run the codes below

  • Create a folder and place in it the CSV file used in this post, available from here
  • Open the folder in Visual Studio Code
  • Create and work on Python files in this folder

When opened in Excel, the CSV file should look like the image below.
(Image: illustration of the CSV file contents)

Example

import pandas as pd
# create a DataFrame from the CSV file
df = pd.read_csv('gen_schedules.csv')

print('Number of rows in df = {0}'.format(df.shape[0]))
# this prints Number of rows in df = 100

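# keep only the rows where VSTPS_III is greater than VSTPS_IV:
# the lambda receives each row (axis=1) and returns True or False,
# and the resulting boolean Series is used as a row mask
filteredDf = df[df.apply(lambda x: x["VSTPS_III"] > x["VSTPS_IV"], axis=1)]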

print('Number of rows in filteredDf = {0}'.format(filteredDf.shape[0]))
# this prints Number of rows in filteredDf = 34
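For a simple comparison between two columns like the one above, a vectorized boolean mask selects the same rows. It is usually faster on large DataFrames, because apply with axis=1 calls the Python lambda once per row, while the vectorized comparison works on whole columns at once. The lambda approach pays off when the criterion is hard to express column-wise. The sketch below checks the equivalence on a small DataFrame; only the column names VSTPS_III and VSTPS_IV come from the post, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical values; only the column names come from the post's CSV file
df = pd.DataFrame({
    "VSTPS_III": [400, 350, 420, 300],
    "VSTPS_IV": [380, 360, 410, 300],
})

# Row-wise lambda filter, same form as in the example above
lambda_df = df[df.apply(lambda x: x["VSTPS_III"] > x["VSTPS_IV"], axis=1)]

# Equivalent vectorized boolean mask comparing whole columns at once
vector_df = df[df["VSTPS_III"] > df["VSTPS_IV"]]

# Both approaches keep the same rows (here, rows 0 and 2)
print(lambda_df.equals(vector_df))
```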

Video

Video for this post can be found here

