Skill - Advanced DataFrame row filtering with lambda functions
Skills Required
- Set up Python development environment
- Basic Printing in Python
- Commenting in Python
- Managing Variables in Python
- Pandas DataFrame Basics
- Filter DataFrame rows with conditional statements
Please make sure you have all the skills listed above before working through the code below. Revisit those posts if you need a reference or a refresher.
Pandas is a Python library.
A DataFrame is a data structure provided by the pandas library.
Please go through Pandas DataFrame Basics to learn the basics of pandas DataFrame.
In this post, we will learn how to filter the rows of a DataFrame using lambda functions. Because the filtering criteria are defined as a lambda function that is evaluated once per row, and that function can access every column of the row, this approach can accommodate complex criteria that combine several columns.
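As a minimal sketch of the mechanism, assuming a small made-up DataFrame (the column names a and b are hypothetical and are not from the CSV file used in this post), apply with axis=1 calls the lambda once per row and collects the results into a boolean Series, which is then used to select rows:

```python
import pandas as pd

# made-up sample data; the column names "a" and "b" are illustrative only
df = pd.DataFrame({"a": [5, 2, 9], "b": [3, 4, 1]})

# axis=1 passes each row to the lambda as a Series indexed by column name
mask = df.apply(lambda row: row["a"] > row["b"], axis=1)
print(mask.tolist())  # [True, False, True]

# indexing the DataFrame with the boolean Series keeps only the True rows
filtered = df[mask]
print('Number of rows in filtered = {0}'.format(filtered.shape[0]))
```

Only the rows where the lambda returned True remain in filtered, so the row count drops from 3 to 2 here.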
Instructions to run the code below
- Create a folder and place in it the CSV file used in this post, available from here
- Open the folder in Visual Studio Code
- Create and work on Python files in this folder
The CSV file should look like the image below
Example
import pandas as pd
# create dataframe from the csv file
df = pd.read_csv('gen_schedules.csv')
print('Number of rows in df = {0}'.format(df.shape[0]))
# this prints Number of rows in df = 100
# keep only the rows where the VSTPS_III value is greater than the VSTPS_IV value
# axis=1 makes apply call the lambda once per row, with x holding that row
filteredDf = df[df.apply(lambda x: x["VSTPS_III"]>x["VSTPS_IV"], axis=1)]
print('Number of rows in filteredDf = {0}'.format(filteredDf.shape[0]))
# this prints Number of rows in filteredDf = 34
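The same pattern extends to criteria that are awkward to write as a single comparison. The sketch below is only an illustration: it uses made-up values rather than the contents of gen_schedules.csv, and the helper name is_selected and the 10 percent threshold are hypothetical. It also shows that, for the simple comparison used in the example above, a plain conditional filter selects the same rows.

```python
import pandas as pd

# made-up values for illustration; not taken from gen_schedules.csv
df = pd.DataFrame({
    "VSTPS_III": [480, 450, 500, 300],
    "VSTPS_IV": [470, 460, 400, 0],
})

# hypothetical criterion: VSTPS_III exceeds VSTPS_IV by more than 10 percent
def is_selected(row):
    # reject rows where VSTPS_IV is zero to avoid dividing by zero
    if row["VSTPS_IV"] == 0:
        return False
    return bool((row["VSTPS_III"] - row["VSTPS_IV"]) / row["VSTPS_IV"] > 0.10)

# a named function can be passed to apply in the same way as a lambda
filteredDf = df[df.apply(is_selected, axis=1)]
print('Number of rows in filteredDf = {0}'.format(filteredDf.shape[0]))

# for a simple comparison, a conditional filter gives the same rows as the lambda
with_lambda = df[df.apply(lambda x: x["VSTPS_III"] > x["VSTPS_IV"], axis=1)]
vectorized = df[df["VSTPS_III"] > df["VSTPS_IV"]]
print(with_lambda.equals(vectorized))
```

The row-by-row approach is the more flexible of the two, while the conditional filter is usually the simpler choice when the criterion is a single column comparison.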
Video
Video for this post can be found here
References
- Stack Overflow post - https://stackoverflow.com/questions/11418192/pandas-complex-filter-on-rows-of-dataframe
- DataFrame apply function documentation - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html