
Project - 8 | Data Analysis with Python | #DataScience | Netflix Dataset

Welcome to Data Science Lovers. In this project, we will analyze the Netflix dataset, and we will use these commands and functions: the head function, tail function, shape, size, columns, dtypes, info function, value_counts function, unique function and nunique function. To remove the duplicates, we will use the drop_duplicates function. To find the null values, we will use the isnull function. To drop the null values (records), we will use the dropna function. We will use the isin function, str.contains, str.split, the to_datetime function, dt.year.value_counts, the groupby function to make groups, the count plot to draw a bar graph, and the min, max and mean functions. We will also learn how to create new columns and new data frames. We will apply filtering on a single column as well as on multiple columns, and we will use the 'AND' operator as well as the 'OR' operator in the filtering. We will also draw the bar graph using the Seaborn library. So let's start.

As we have got 5000 subscribers on this channel, I have a reward for you, you can say a free gift, that I will announce in the middle of this video. The next reward will be after completing 10000 subscribers. So stay tuned.

This Netflix dataset has information about the TV shows and movies that are available on Netflix till 2021. This dataset is collected from Flixable, which is a third-party Netflix search engine, and it is also available on Kaggle for free.

First of all, we will import our dataset into this Jupyter notebook. To import the dataset, we have to use the Pandas library, so first we will import it: import pandas as pd. Run it. The library has been successfully imported. Now we will import our dataset. The command is pd.read_ followed by the file type or file extension. My file is in csv format, so I will write pd.read_csv. Now I will pass the path of my file, where it is located on my system. Here is my csv file (it opens in Excel). I will copy its path from Properties and paste it there. Ok. Now, to avoid the unicode error caused by the backslashes in the path, we have to write a small r before the opening quote, which makes it a raw string. I will save this dataset in a data frame, and I am naming it 'data' only for the easiness of this project. So, this is my syntax to load the csv file. If I run this, the file has been successfully imported. Now, if we want to have a look at our data, I will simply write 'data' here, because that is the name we used above. If I run this, then this is my data.
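The loading step described above can be sketched as follows. This is only a minimal illustration: the two-row sample, its column names and the Windows path are made up so the snippet runs without the real file, and they are not taken from the video.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for the real Netflix CSV file.
sample_csv = io.StringIO(
    "Show_Id,Category,Title\n"
    "s1,TV Show,Example Show\n"
    "s2,Movie,Example Movie\n"
)
data = pd.read_csv(sample_csv)

# With a real file on Windows, the r prefix keeps the backslashes literal,
# so "\U" in "C:\Users" is not read as the start of a unicode escape.
windows_path = r"C:\Users\name\netflix.csv"  # hypothetical path
# data = pd.read_csv(windows_path)

print(data)
```

Forward slashes in the path would avoid the problem as well, so the r prefix is one option rather than the only one.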
Here are 11 columns - Show Id, Category, Title, Director, Cast, Country, Release Date, Rating, Duration, Type and Description. So these are the columns and these are the rows. These are the top 5 rows, from index 0 to 4, and the bottom 5 rows, from index 7784 to 7788. It is also showing the shape of this dataset, which means there are 7789 rows and 11 columns in this dataset. Okay.

Now we will try to get some basic information about this dataset using some small functions or commands. Our first function is the head function. How do we use the head function? We have to write our data frame name and then the function: data.head(). It is used to show the top 5 records of the dataset. If I run this, it will show only the top 5 records of this dataset. Similarly, we can use the tail function, which is used to show the bottom 5 records of the dataset. So we write data.tail(), run this, and it will show the bottom 5 records of the dataset.

Next we have shape. Shape is used to show the number of rows and the number of columns. The syntax is our data frame name, then .shape. If you notice, there are no parentheses after the shape command, because shape is an attribute, but head and tail are functions, so they need these brackets or parentheses. You have to remember these things. If I run this, it will show the total number of rows, 7789, and the total number of columns, 11, from the dataset, as it is also shown above here, like this. Okay.

Next we can use size. The size command is used to show the total number of elements in the dataset. When I run data.size, it shows 85679, which means there are 85679 elements in total in this dataset, that is 7789 rows multiplied by 11 columns. What are the elements of the dataset? This is our first element, this is our second element, this is the third, fourth, fifth and so on. So in total there are 85679 elements in this dataset, which can be shown by size.
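The four commands above can be tried on a small frame. This is a sketch only: the 7-row, 2-column data frame is a made-up stand-in for the Netflix data, which would give (7789, 11) and 85679 instead.

```python
import pandas as pd

# Hypothetical 7-row, 2-column frame standing in for the Netflix data.
data = pd.DataFrame({
    "Title": [f"Title {i}" for i in range(7)],
    "Category": ["Movie", "TV Show", "Movie", "Movie",
                 "TV Show", "Movie", "TV Show"],
})

top = data.head()        # method: first 5 rows, needs parentheses
bottom = data.tail()     # method: last 5 rows
rows, cols = data.shape  # attribute: (rows, columns), no parentheses
total = data.size        # attribute: rows * columns

print(rows, cols, total)
```

Both head and tail also accept a number, for example data.head(3), when fewer or more than 5 rows are wanted.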



I really like that you highlight the functions and methods you cover in the tutorial. This helps provide technical learning objectives many other videos do not cover. Great job.


One of the best channels I have ever come across. Q.10, add this line of code to get accurate results>> data['Minutes'] = data['Minutes'].astype(float)


All these projects helped me build the most important part of data analysis/science which is to 'think questions' in data and finding the solutions using tools like pandas and python language. Thanks for providing the content DSL.


for question 8, we can convert the datatype of Cast column from object datatype to String and then use the following syntax to search for Tom Cruise: dataframe['Cast'] = dataframe['Cast'].astype(str) after that, dataframe[dataframe['Cast'].str.contains('Tom Cruise')]
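As an alternative to converting the column with astype(str), the na parameter of str.contains can treat missing cast values as non-matches. A minimal sketch, assuming a made-up three-row frame (the titles and cast entries are hypothetical):

```python
import pandas as pd

# Hypothetical rows; the middle one has a missing Cast value.
dataframe = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Cast": ["Tom Cruise, Someone Else", None, "Another Actor"],
})

# na=False makes rows with a missing Cast count as "no match",
# so the boolean mask can be used for filtering directly.
mask = dataframe["Cast"].str.contains("Tom Cruise", na=False)
result = dataframe[mask]

print(result)
```

This keeps the missing values as missing in the original column, whereas astype(str) turns them into text.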


great efforts taken in making of this video thank you sir


This page is very underrated. Others don't provide such great content. I have done all the projects of this channel related to Data analysis. Looking forward to a more advanced project that will help me to enhance my Python skills in Visualization. Data Science Lover Please provide more content related to seaborn, plotly, matplotlib, numpy.


Thank you for making this detailed video, it helped me practice Python and pandas skills.


Very practical and great content, thank you!


Thank you so much sir, for such amazing content. God bless you


I appreciate the video, but there are a few mistakes. For example, in Q.10 (maximum duration movie):
df.loc[df['Category'] == 'Movie'].groupby('Category')['Numeric_duration'].max().reset_index()
gives
  Category  Numeric_duration
0    Movie               312
Similarly, there are other mistakes. In Q.13, either the question or the solution is not framed properly, because the two do not match.


Do more real world project end to end


10:03 can u explain me how these 2 rows are duplicate??


Hello Sir, thanks for the great job. For question 10, we are missing two important facts. First, the Duration column has values with two types of units, seasons and mins, so we cannot just find the max of the column after applying the split function. Second, after we do the split it is good to change the column with number values to int. We then have to find the max by filtering on each Category type (Movie and TV Show). Here is my query:
netflix[['Number', 'Unit']] = netflix["Duration"].apply(lambda x: pd.Series(str(x).split(" ")))
netflix["Number"] = netflix["Number"].astype(int)
netflix_TV_Show = netflix[(netflix["Category"] == "TV Show")]
netflix_TV_Show[(netflix_TV_Show["Number"] == netflix_TV_Show["Number"].max())]
netflix_Movie = netflix[(netflix["Category"] == "Movie")]
netflix_Movie[(netflix_Movie["Number"] == netflix_Movie["Number"].max())]
Thanks.
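The same idea can be sketched with a vectorised split and one groupby. The four rows below are made up for illustration, and the Number and Unit column names follow the comment above rather than the video.

```python
import pandas as pd

# Hypothetical rows mixing the two units found in the Duration column.
netflix = pd.DataFrame({
    "Title": ["M1", "M2", "S1", "S2"],
    "Category": ["Movie", "Movie", "TV Show", "TV Show"],
    "Duration": ["90 min", "120 min", "2 Seasons", "1 Season"],
})

# Split "120 min" into a numeric part and a unit part.
parts = netflix["Duration"].str.split(" ", n=1, expand=True)
netflix["Number"] = parts[0].astype(int)
netflix["Unit"] = parts[1]

# idxmax gives the row label of the largest Number within each Category,
# so minutes are only compared with minutes and seasons with seasons.
longest = netflix.loc[netflix.groupby("Category")["Number"].idxmax()]

print(longest)
```

Note that idxmax keeps only the first row when several titles tie for the maximum, while the filtering query in the comment returns all of them.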


Can you do a video on Transport Optimization using the Pulp library. Using DHL or any other data set. I would love to learn it .


We can't directly apply .max() to a string column, because strings are compared character by character: as numbers 90 < 120, but as strings '90' > '120'. We need to change the column to integer first and then apply max.
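A small sketch of the point above, using three made-up duration values. Python's built-in max is used for the text case, on the assumption that a string column is ordered the same way.

```python
import pandas as pd

# Hypothetical duration values stored as text.
values = ["90", "120", "45"]

# Text is compared character by character, so "9" beats "1".
text_max = max(values)

# After converting to integers the comparison is numeric.
numeric_max = pd.Series(values).astype(int).max()

print(text_max, numeric_max)
```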


Hi, when I read or load the dataset it does not appear the way it appears in your videos. What can I do? Please help.


sir thank you so much , learnt a lot


Great video, thanks


Hey could you pls make a proj that can be put in resume..really looking fwd to it.. Love your channel :)