Pandas is one of the most important packages to grasp when you’re starting to learn Python.
It is known for a very useful data structure called the pandas DataFrame. It also allows Python developers to easily deal with tabular data (like spreadsheets) within a Python script.
In this post, you will find frequently used Pandas features and I hope that you can use them to build data-driven Python applications today.
To use Pandas, you first have to import the library in your script:
import pandas as pd
For demonstration, I will be using the ‘iris’ dataset and loading it into a dataframe.
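The loading code is not shown here, so the following is only a minimal sketch of one way to get a dataframe to experiment with. It assumes a tiny hand-typed sample of six iris rows instead of the full 150-row dataset, and it assumes the column names sepal_width, petal_length and petal_width alongside the sepal_length and species columns used in this post.

```python
import pandas as pd

# Assumption: a six-row sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.9],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Quick look at the first rows of the dataframe.
print(df.head())
```

If you have the full dataset saved as a CSV file, pd.read_csv with the path to that file would give you the same kind of dataframe.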
Now let’s play with the dataset using Pandas features.
1. Accessing/selecting rows and columns in the dataset
The loc and iloc indexers are used to select rows and columns from the dataset based on labels or positions.
- loc: select by labels
- iloc: select by positions
To access the first element of all columns, that is, the first row, we can select index 0 with loc or iloc.
This will return a Pandas Series holding the first row of the dataframe.
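As a rough sketch of what this looks like, the snippet below selects the first row both by position and by label. The six-row sample dataframe is an assumption made for illustration; with the default integer index, label 0 and position 0 point to the same row.

```python
import pandas as pd

# Assumption: a six-row sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.9],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

first_by_position = df.iloc[0]  # row at position 0
first_by_label = df.loc[0]      # row whose index label is 0

# Both are Series whose index is the dataframe's column names.
print(first_by_position)
```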
Similarly, we can get the first 5 elements of column sepal_length by slicing the rows and naming that column.
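One possible way to write this is sketched below, again on an assumed six-row sample. Note that loc slices by label and includes the end label, so :4 returns five rows when the index is the default 0, 1, 2, and so on.

```python
import pandas as pd

# Assumption: a small sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Labels 0 through 4 inclusive, column selected by name.
first_five = df.loc[:4, "sepal_length"]

# An equivalent alternative using head().
first_five_alt = df["sepal_length"].head(5)

print(first_five)
```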
To get the first 4 elements of the first 5 columns, we can use iloc.
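A minimal sketch of that positional selection follows. The sample dataframe and its column names other than sepal_length and species are assumptions for illustration. Unlike loc, iloc excludes the end position, so :4 means positions 0 to 3.

```python
import pandas as pd

# Assumption: a six-row sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.9],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# First 4 rows (positions 0-3) of the first 5 columns (positions 0-4).
subset = df.iloc[:4, :5]

print(subset)
```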
2. Groupby function
Pandas has a built-in groupby function that allows you to group together rows based on a column and perform an aggregate function on the grouped dataset.
For example, you could calculate the mean of every numeric column for each group using groupby.
It is similar to the GROUP BY clause in SQL.
I have applied the groupby function on the column species.
The result of groupby is not a dataframe but a Pandas GroupBy object (a DataFrameGroupBy).
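To illustrate, the sketch below groups an assumed six-row sample by species and inspects what comes back. Nothing is computed at this point; the object only records how the rows are split into groups.

```python
import pandas as pd

# Assumption: a small sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 3.0],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

grouped = df.groupby("species")

print(type(grouped))    # a DataFrameGroupBy object, not a dataframe
print(grouped.ngroups)  # number of distinct species found
```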
Now we can apply aggregate functions on this object to get the required results.
Similarly, we can apply other functions like min, std, etc. on any of the columns.
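Here is a small sketch of applying aggregate functions to the grouped object. The sample rows are an assumption for illustration, and the variable names are hypothetical.

```python
import pandas as pd

# Assumption: a small sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 3.0],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Mean of every remaining column, one row per species.
mean_by_species = df.groupby("species").mean()

# Several aggregates at once on a single column.
length_stats = df.groupby("species")["sepal_length"].agg(["min", "std"])

print(mean_by_species)
print(length_stats)
```

The grouping column becomes the index of the result, so a single value can be read with something like mean_by_species.loc["setosa", "sepal_length"].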
3. Map function
The map function applies a given function to every element of a column.
Here I am extracting the integer value before the decimal point from the column sepal_length using the split function.
The extraction needs to be done on all the rows, so instead of iterating over the entire dataframe I can use the map function and assign the output to a new column in the dataframe.
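The following is one way this could look, assuming a small sample dataframe and a hypothetical new column name sepal_length_int. Each value is converted to a string, split on the decimal point, and the first part is converted back to an integer.

```python
import pandas as pd

# Assumption: a small sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

def integer_part(value):
    # "5.1" -> ["5", "1"] -> 5
    return int(str(value).split(".")[0])

# map calls integer_part once per element; no explicit loop is needed.
df["sepal_length_int"] = df["sepal_length"].map(integer_part)

print(df[["sepal_length", "sepal_length_int"]])
```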
4. Shape and Size
The shape attribute returns a tuple holding the size of the dataframe in each dimension; the length of that tuple is the number of dimensions.
Since dataframes are two-dimensional, what shape returns is the number of rows and columns.
The size attribute, as the name suggests, returns the total number of elements in a dataframe, which is the number of rows multiplied by the number of columns.
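A quick sketch on an assumed six-row, five-column sample shows how the two relate. Both shape and size are attributes, so they are read without parentheses.

```python
import pandas as pd

# Assumption: a six-row sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.9],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 2.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

rows, columns = df.shape

print(df.shape)  # (rows, columns)
print(df.size)   # rows * columns
```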
5. Identifying missing values
Identifying missing values is very important in Pandas, as they can cause errors or miscalculations in further processing.
To check whether the dataframe has any missing (null/NA) values, we can use the isnull() or isna() functions; the two are aliases of each other and return the same boolean mask.
On these functions, you can apply additional sum() or all()/any() functions to get the statistics of the missing values.
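As a sketch of these checks, the example below uses a hypothetical four-row dataframe, df_missing, with two values deliberately left out. The iris data itself is not assumed to contain gaps.

```python
import pandas as pd

# Assumption: a made-up sample with two values deliberately missing.
df_missing = pd.DataFrame({
    "sepal_length": [5.1, None, 7.0, 6.4],
    "species": ["setosa", "setosa", None, "versicolor"],
})

mask = df_missing.isna()  # True wherever a value is missing

missing_per_column = mask.sum()  # count of missing values per column
any_missing = mask.any()         # True if a column has at least one gap
all_missing = mask.all()         # True only if a column is entirely empty
total_missing = int(mask.sum().sum())

print(missing_per_column)
print(total_missing)
```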
You can replace the NA values by using the fillna() function.
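A minimal sketch of fillna follows. The sample dataframe and the replacement values (the column mean for sepal_length, the string "unknown" for species) are assumptions chosen for illustration; the right fill value depends on your data.

```python
import pandas as pd

# Assumption: a made-up sample with two values deliberately missing.
df_missing = pd.DataFrame({
    "sepal_length": [5.1, None, 7.0, 6.4],
    "species": ["setosa", "setosa", None, "versicolor"],
})

# A dict lets each column get its own replacement value.
filled = df_missing.fillna({
    "sepal_length": df_missing["sepal_length"].mean(),  # mean skips NaN
    "species": "unknown",
})

print(filled)
```

fillna returns a new dataframe here, so df_missing itself is left unchanged.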
6. Querying the data
Pandas can also filter the dataset based on a condition. To query the data we can pass the filter condition directly to loc.
Here I want to filter rows where sepal_length is greater than 7.
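The filter could be written roughly as below, on an assumed six-row sample. The comparison produces a boolean Series, and loc keeps only the rows where it is True.

```python
import pandas as pd

# Assumption: a small sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Strictly greater than 7, so a row with exactly 7.0 is excluded.
long_sepals = df.loc[df["sepal_length"] > 7]

print(long_sepals)
```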
7. Sorting the data
In Pandas we can sort the data by the values in one or more columns (or, with axis=1, by the values in a row) using the sort_values() function.
Here, I have sorted the dataframe on column sepal_length and printed its top 5 rows.
The default mode for sorting is ascending. You can change it by passing ascending=False to sort_values, which sorts the dataframe in descending order.
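Both sort orders are sketched below on an assumed six-row sample. sort_values returns a new, sorted dataframe and keeps the original index labels attached to their rows.

```python
import pandas as pd

# Assumption: a small sample stands in for the full iris dataset.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 7.1],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Ascending is the default; head(5) keeps the top 5 rows of the result.
ascending_top5 = df.sort_values("sepal_length").head(5)

# Descending order via ascending=False.
descending_top5 = df.sort_values("sepal_length", ascending=False).head(5)

print(ascending_top5)
print(descending_top5)
```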
I tried to collate the Pandas functions that are used on a day-to-day basis. I hope you will find something useful here. Thank you for reading till the end.