Project - 8 | Data Analysis with Python | #DataScience | Netflix Dataset

Download Source Code of this project (Rs.29) - https://rzp.io/l/project8sourcecode
Download - Complete Course Notes - Data Analyst Self Study Material (Rs.250) - https://datasciencelovers.graphy.com/products/Python---Data-Analytics-Study-Material-64d7b0bdfd6efd7c4587e233?dgps_s=dsh&dgps_u=c&dgps_uid=64cb5694e4b000cf748a30c2&dgps_t=cp_m
Download Dataset File - https://shorturl.at/aot24

Enrol in our Udemy courses:
1. Python Data Analytics Projects - https://www.udemy.com/course/bigdata-analysis-python/?referralCode=F75B5F25D61BD4E5F161
2. Python For Data Science - https://www.udemy.com/course/python-for-data-science-real-time-exercises/?referralCode=9C91F0B8A3F0EB67FE67
3. Numpy For Data Science - https://www.udemy.com/course/python-numpy-exercises/?referralCode=FF9EDB87794FED46CBDF

Download Free Core Python Notes - https://datasciencelovers.graphy.com/products/Core-Python-Notes-64d116f7c7a9985f7b99a5bf?dgps_s=dsh&dgps_u=c&dgps_uid=64cb5694e4b000cf748a30c2&dgps_t=cp_m
Download - Python Pandas Notes (Rs.50) - http://bit.ly/3KxMpgA

Watch demo of Self Study Material - https://youtu.be/LoFpsODdkuo
Outside India, PayPal for Self Study Material ($4) - datasciencelovers@gmail.com
Contact Mail Id : datasciencelovers@gmail.com

In this video, you will learn how to work on a real project of Data Analysis with Python. Questions are given in the project and then solved with the help of Python. It is a project of Data Analysis with Python, or you can say, Data Science with Python.

The commands that we used in this project:
* head() - It shows the first N rows in the data (by default, N=5).
* tail() - It shows the last N rows in the data (by default, N=5).
* shape - It shows the total no. of rows and no. of columns of the dataframe.
* size - To show the total no. of values (elements) in the dataset.
* columns - To show each column name.
* dtypes - To show the data-type of each column.
* info() - To show indexes, columns, data-types of each column and memory usage at once.
* value_counts() - In a column, it shows all the unique values with their count. In this project it is applied to a single column.
* unique() - It shows all the unique values of the series.
* nunique() - It shows the total no. of unique values in the series.
* duplicated() - To check row-wise and detect the duplicate rows.
* isnull() - To show where a null value is present.
* dropna() - It drops the rows that contain missing values (by default, any missing value; with how='all', only rows where every value is missing).
* isin() - To show all records whose value is one of the given elements.
* str.contains() - To get all records that contain a given string.
* str.split() - It splits a column's string into parts (into different columns with expand=True).
* to_datetime() - Converts the data-type of a date-time column into the datetime64[ns] datatype.
* dt.year.value_counts() - It counts the occurrence of all individual years in the date-time column.
* groupby() - Groupby is used to split the data into groups based on some criteria.
* sns.countplot(x=df['Col_name']) - To show the count of all unique values of any column in the form of a bar graph.
* max(), min() - It shows the maximum/minimum value of the series.
* mean() - It shows the mean value of the series.

You will learn these things also:
* Creating New Columns & Dataframe
* Filtering (Single Column & Multiple Columns)
* Filtering with AND and OR
* Seaborn Library - Bar Graphs

Task. 1) Is there any Duplicate Record in this dataset ? If yes, then remove the duplicate records.
Task. 2) Is there any Null Value present in any column ? Show with Heat-map.
Q. 1) For 'House of Cards', what is the Show Id and who is the Director of this show ?
Q. 2) In which year was the highest number of TV Shows & Movies released ? Show with Bar Graph.
Q. 3) How many Movies & TV Shows are in the dataset ? Show with Bar Graph.
Q. 4) Show all the Movies that were released in year 2000.
Q. 5) Show only the Titles of all TV Shows that were released in India only.
Q. 6) Show the Top 10 Directors, who gave the highest number of TV Shows & Movies to Netflix.
Q. 7) Show all the Records, where "Category is Movie and Type is Comedies" or "Country is United Kingdom".
Q. 8) In how many movies/shows was Tom Cruise cast ?
Q. 9) What are the different Ratings defined by Netflix ?
Q. 9.1) How many Movies got the 'TV-14' rating, in Canada ?
Q. 9.2) How many TV Shows got the 'R' rating, after year 2018 ?
Q. 10) What is the maximum duration of a Movie/Show on Netflix ?
Q. 11) Which individual country has the Highest No. of TV Shows ?
Q. 12) How can we sort the dataset by Year ?
Q. 13) Find all the instances where: Category is 'Movie' and Type is 'Dramas' or Category is 'TV Show' & Type is 'Kids' TV'.

#python #dataanalytics #datascience #project
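
The duplicate and null-value commands listed above can be sketched as follows. This is only a minimal illustration on a small made-up DataFrame; the column names and values are assumptions, not rows from the Netflix file.

```python
import numpy as np
import pandas as pd

# Made-up sample rows (not from the Netflix dataset); the third row repeats the second.
df = pd.DataFrame(
    {
        "Show_Id": ["s1", "s2", "s2", "s3"],
        "Title": ["A", "B", "B", "C"],
        "Director": ["X", np.nan, np.nan, np.nan],
    }
)

# duplicated() flags every row that fully repeats an earlier row.
dup_mask = df.duplicated()
print(dup_mask.sum())

# drop_duplicates() returns a copy without the flagged rows.
deduped = df.drop_duplicates()

# isnull().sum() counts the missing values in each column.
null_counts = deduped.isnull().sum()
print(null_counts)

# dropna() uses how="any" by default; how="all" drops only rows that are entirely empty.
any_dropped = deduped.dropna()
all_dropped = deduped.dropna(how="all")
print(len(any_dropped), len(all_dropped))
```

For the heat-map in Task 2, one common approach is to pass the boolean frame from isnull() to seaborn's heatmap function; that step is left out here so the sketch needs only pandas and NumPy.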

DATA SCIENCE LOVERS

2 years ago

Welcome to Data Science Lovers. In this project, we will analyze the Netflix dataset, and we will use these commands and functions: the head function, tail function, shape, size, columns, dtypes, info function, value_counts function, unique function and nunique function. To detect the duplicates, we will use the duplicated function. To find the null values, we will use the isnull function. To drop the null values (records), we will use the dropna function. We will use the isin function, the str.contains function, str.split, the to_datetime function, the dt.year.value_counts function, the groupby function to make the groups, the count plot to draw the bar graph, and the min, max and mean functions. We will also learn how to create new columns and new data frames. We will apply filtering on a single column as well as on multiple columns, and we will use the 'AND' operator as well as the 'OR' operator in the filtering. We will also draw the bar graph using the Seaborn library. So let's start.

As we have got 5000 subscribers on this channel, I now have a reward for you, you can say a free gift, that I will announce in the middle of this video. The next reward will be after completing 10000 subscribers. So stay tuned.

This Netflix dataset has information about the TV Shows and Movies that are available on Netflix till 2021. This dataset is collected from Flixable, which is a third-party Netflix search engine, and it is also available on Kaggle for free.

First of all, we will import our dataset in this Jupyter notebook. To import the dataset, we have to use the Pandas library, so first we will import the pandas library: import pandas as pd. Run it. The library has been successfully imported. Now we will import our dataset. To import the dataset, the command is pd.read_ and here we have to write the file type or file extension. My file is in csv format, so I will write pd.read_csv. Now I will pass the path of my file, where it is located on my system. Here is my csv file. I will copy its path from Properties and paste it there. Now, to remove the unicode error, we have to write a small r before the path string, so that the backslashes in the path are not treated as escape sequences. I will save this dataset in a data frame, and I am naming it 'data' only for the ease of this project. So, this is my syntax to load the csv file. If I run this, the file has been successfully imported.

Now, if we want to have a look at our data, I will simply write 'data' here, because that is the name we used above. If I run this, then this is my data. Here are 11 columns - Show Id, Category, Title, Director, Cast, Country, Release Date, Rating, Duration, Type and Description. So these are the columns and these are the rows. These are the top 5 rows, from index 0 to 4, and the bottom 5 rows, from index 7784 to 7788. It is also showing the shape of this dataset, which means there are 7789 rows and 11 columns in this dataset.

Now we will try to get some basic information about this dataset using some small functions or commands. Our first function is the head function. How to use the head function ? We have to write our dataframe name, data, then .head(). It is used to show the top 5 records of the dataset. If I run this, it will show only the top 5 records of this dataset. Similarly, we can use the tail function. The tail function is used to show the bottom 5 records of the dataset. So we can write data.tail(), run this, and it will show the bottom 5 records of the dataset.

Next we have shape. Shape is used to show the number of rows and number of columns. The syntax is our data frame name followed by .shape. If you notice, there are no parentheses after the shape command, but head and tail have these brackets or parentheses. So you have to remember these things. If I run this, it will show the total number of rows, 7789, and the total number of columns, 11, of the dataset, as it is also shown above here.

Next we can use size. The size command is used to show the total number of elements in the dataset. If I run data.size, it shows 85679, which means this dataset has 85679 elements in total (7789 rows x 11 columns). What are the elements of the dataset ? This is one element, this is the second element, this is the third, fourth, fifth and so on. So in total there are 85679 elements in this dataset, which can be shown by size.
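
The loading and inspection steps described above can be sketched roughly as below. The CSV text, column names and the commented file path are made up for illustration, since the real file location is not given; with the actual dataset the same calls would report 7789 rows and 11 columns.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the real Netflix file.
csv_text = """Show_Id,Category,Title
s1,TV Show,Alpha
s2,Movie,Beta
s3,Movie,Gamma
s4,TV Show,Delta
s5,Movie,Epsilon
s6,Movie,Zeta
"""

# With a file on disk, a raw string keeps Windows backslashes literal, e.g.
# pd.read_csv(r"C:\some\folder\file.csv")  (hypothetical path)
data = pd.read_csv(io.StringIO(csv_text))

top = data.head()       # first 5 rows by default
bottom = data.tail()    # last 5 rows by default
rows, cols = data.shape  # attribute, so no parentheses

print(data.shape)
print(data.size)         # total elements = rows * cols
```

Note that head and tail are methods and need parentheses, while shape and size are attributes and do not.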

Comments

@badmice1

I really like that you highlight the functions and methods you cover in the tutorial. This helps provide technical learning objectives many other videos do not cover. Great job.

@samuel_muly

One of the best channels I have ever come across. Q.10, add this line of code to get accurate results>> data['Minutes'] = data['Minutes'].astype(float)

@nverma2002

All these projects helped me build the most important part of data analysis/science which is to 'think questions' in data and finding the solutions using tools like pandas and python language. Thanks for providing the content DSL.

@paulaji

for question 8, we can convert the datatype of Cast column from object datatype to String and then use the following syntax to search for Tom Cruise: dataframe['Cast'] = dataframe['Cast'].astype(str) after that, dataframe[dataframe['Cast'].str.contains('Tom Cruise')]
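
An alternative to converting the column, sketched here on made-up rows, is to pass na=False to str.contains so that missing Cast values simply count as non-matches. The sample titles and cast strings are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up rows; the second one has a missing Cast value.
dataframe = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Show C"],
        "Cast": ["Tom Cruise, Someone Else", np.nan, "Another Actor"],
    }
)

# na=False treats missing values as "no match", so the mask is purely boolean
# and the Cast column itself is left unchanged.
mask = dataframe["Cast"].str.contains("Tom Cruise", na=False)
matches = dataframe[mask]
print(len(matches))
```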

@vinay_bhagat9075

great efforts taken in making of this video thank you sir

@riteshpardeshi9866

This page is very underrated. Others don't provide such great content. I have done all the projects of this channel related to Data analysis. Looking forward to a more advanced project that will help me to enhance my Python skills in Visualization. Data Science Lover Please provide more content related to seaborn, plotly, matplotlib, numpy.

@pranitaz9055

Thankyou for making this detailed video, it helped practice python and pandas skills.

@ossimisylvestre7484

Very practical and great content, thank you !

@partabparmar5537

thank you so much sir, for such amazing content. God bless you

@prajjwaljaiswal3419

Appreciate the video, but there are a few mistakes, like in Q.10 (maximum duration of a movie):
df.loc[df['Category'] == 'Movie'].groupby('Category')['Numeric_duration'].max().reset_index()
  Category  Numeric_duration
0    Movie               312
Similarly, there are other mistakes. In Q.13, either the question or the solution is not framed properly; there is a mismatch.

@asp4628

Do more real world project end to end

@vijaymulimath6519

10:03 can u explain me how these 2 rows are duplicate??

@cococnk388

Hello Sir, thanks for the great job. For question 10, we are missing some important facts:
* The Duration column has values with two types of units: seasons and mins. We cannot just find the max of the column after applying the split function.
* After we do the split, it is good to change the column with number values to int.
* We have to find the max by filtering on each Category type (Movie and TV Show).
Here is my query:
netflix[['Number', 'Unit']] = netflix["Duration"].apply(lambda x: pd.Series(str(x).split(" ")))
netflix["Number"] = netflix["Number"].astype(int)
netflix_TV_Show = netflix[(netflix["Category"] == "TV Show")]
netflix_TV_Show[(netflix_TV_Show["Number"] == netflix_TV_Show["Number"].max())]
netflix_Movie = netflix[(netflix["Category"] == "Movie")]
netflix_Movie[(netflix_Movie["Number"] == netflix_Movie["Number"].max())]
Thanks.
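
The idea in this comment can also be sketched with str.split(expand=True) and a groupby, as below. The rows are made up, and the sketch assumes every Duration value has the form "<number> <unit>" with no missing values; the real column may need extra cleaning.

```python
import pandas as pd

# Made-up rows mixing the two units found in the Duration column.
netflix = pd.DataFrame(
    {
        "Category": ["Movie", "Movie", "TV Show", "TV Show"],
        "Duration": ["90 min", "120 min", "1 Season", "3 Seasons"],
    }
)

# Split once on the first space: column 0 is the number, column 1 is the unit.
parts = netflix["Duration"].str.split(" ", n=1, expand=True)
netflix["Number"] = parts[0].astype(int)  # numeric, so max() compares values
netflix["Unit"] = parts[1]

# Minutes and seasons are not comparable, so take the maximum per Category.
max_by_category = netflix.groupby("Category")["Number"].max()
print(max_by_category)
```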

@drdeepurahulful

Can you do a video on Transport Optimization using the Pulp library. Using DHL or any other data set. I would love to learn it .

@panther_.gaming

We can't directly apply .max() to a string column, because as numbers 90 < 120, but as strings '90' > '120'. We need to change it to integer first and then apply max.
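
A tiny sketch of this point, on made-up values: the same two durations give different maxima depending on whether they are stored as text or as integers.

```python
import pandas as pd

# Made-up durations stored as text, as they are after a plain string split.
minutes_text = pd.Series(["90", "120"])

# Strings compare character by character, so "9" beats "1".
text_max = minutes_text.max()

# After converting to int, the comparison is numeric.
numeric_max = minutes_text.astype(int).max()

print(text_max, numeric_max)
```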

@tshepomaila184

Hi, when I read or load the dataset, it does not appear the way it appears in your videos. What can I do? Please help.

@moseenmd46

sir thank you so much , learnt a lot

@aryansfunzone

Great video,Thanks

@nimratkaur6153

Hey could you pls make a proj that can be put in resume..really looking fwd to it.. Love your channel :)