Working with Excel and pandas DataFrames

dataframe_and_excel

Skill - Working with excel files and pandas DataFrames

Table of Contents

Skills Required

Please make sure to have all the skills mentioned above to understand and execute the code mentioned below. Go through the above skills if necessary for reference or revision

Pandas is a python library.
DataFrame is a data structure provided by the pandas library.

Please go through Pandas DataFrame Basics to learn the basics of pandas DataFrame.

In this post, we will learn common excel input output scenarios with pandas DataFrames

Instructions to run the codes below

  • Create a folder and place the csv file used in this post from here and excel file used in this post from here
  • Open the folder in Visual Studio Code
  • Create and work on python files in this folder

The excel files should look like the image below
excel_file_illustration

Creating a DataFrame from excel files using ‘read_csv’ or ‘read_excel’

read the documentation of read_csv here
read the documentation of read_excel here

import pandas as pd

dfCsv = pd.read_csv('gen_schedules.csv')
print(dfCsv)

dfExcel = pd.read_excel('gen_schedules.xlsx')
print(dfExcel)

read data from a specific excel sheet using “sheet_name” option

sheet_name can be the name of the sheet or the position of the sheet (zero-based)

import pandas as pd

dfExcel = pd.read_excel('gen_schedules.xlsx', sheet_name='Sheet1')
print(dfExcel)

import excel/csv data without header using “header” option

import pandas as pd

dfExcel = pd.read_excel('gen_schedules.xlsx', header=None)
print(dfExcel)

dfCsv = pd.read_csv('gen_schedules.csv', header=None)
print(dfExcel)

import only specific columns with “usecols” option

import pandas as pd

# read only three columns with headers CGPL,GADARWARA-I,GANDHAR-APM
dfExcel = pd.read_excel('gen_schedules.xlsx', usecols=['CGPL','GADARWARA-I','GANDHAR-APM'])
print(dfExcel)

# read only columns with positions(zero-based) 2,6,8
dfCsv = pd.read_csv('gen_schedules.csv', usecols=[2,6,8])
print(dfExcel)

# read only columns A,C,E:H
dfExcel = pd.read_excel('gen_schedules.xlsx', usecols='A,C,E:H')
print(dfExcel)

skip rows at the top during import using “skiprows” option

If the main excel/csv data starts after some rows, this option will be useful

import pandas as pd

# skip first 5 rows and then read the excel data
dfExcel = pd.read_excel('gen_schedules.xlsx', skiprows=5)

print(dfExcel)

skip rows at the bottom during import using “skipfooter” option

If the main excel/csv data has unnecessary data at the end of the file, this option will be useful

import pandas as pd

# skip the bottom 2 rows and then read the excel data
dfExcel = pd.read_excel('gen_schedules.xlsx', skipfooter=2)

print(dfExcel)

import only specific number of rows using “nrows” option

import pandas as pd

# read only 50 rows
dfExcel = pd.read_excel('gen_schedules.xlsx', nrows=50)
print(dfExcel)

export DataFrame to excel / csv using "to_excel " and “to_csv”

for more options to export data, read the official docs of to_excel and to_csv

import pandas as pd

x = [
        {"Name": "Harris", "Age": 22, "Sex": "male"},
        {"Name": "Mr. William Henry", "Age": 35, "Sex": "male"},
        {"Name": "Miss. Elizabeth", "Age": 58, "Sex": "female"},
    ]

# export DataFrame to a file named out.csv
x.to_csv('out.csv')

# export DataFrame to a file named output.xlsx
x.to_excel('output.xlsx')

# export DataFrame to  C:\Users\Nagasudhir\Documents\Python Projects\taming_python\data.xlsx
x.to_excel(r'C:\Users\Nagasudhir\Documents\Python Projects\taming_python\data.xlsx')

Video

Video for this post can be found here

Online Interpreter

Although we recommend to practice the above examples in Visual Studio Code, you can run these examples online at https://www.tutorialspoint.com/execute_python_online.php

References


Table of Contents

Comments