
Posts

Showing posts with the label pandas

Value Attribute of pandas.DataFrame and pandas.Series in Python

Make a copy of a DataFrame

If you want to make a copy, you might reach for the values attribute of pandas.DataFrame and pandas.Series, like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6]])
    a = df.values

Note that, for a single-dtype DataFrame like this one, it does not give a copy; it gives just a view.

    print(type(a))
    print(np.shares_memory(df, a))
    # <class 'numpy.ndarray'>
    # True

Use the .to_numpy() method to make a copy of a DataFrame or Series. But there is one thing to remember: set the copy parameter to True. The default is False, which can return a view, as it does here.

    view = df.to_numpy()
    print(type(view))
    print(np.shares_memory(df, view))
    # <class 'numpy.ndarray'>
    # True

    copy = df.to_numpy(copy=True)
    print(type(copy))
    print(np.shares_memory(df, copy))
    # <class 'numpy.ndarray'>
    # False
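To make the difference concrete, here is a minimal sketch that writes into the array returned by to_numpy(copy=True) and checks that the DataFrame is left untouched. The variable name arr is only for illustration. Writing into a view is deliberately not shown, because whether that is allowed depends on the pandas version and its copy-on-write settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6]])

# copy=True guarantees an array that is independent of the DataFrame.
arr = df.to_numpy(copy=True)
arr[0, 0] = 100  # modify the copy only

print(df.iloc[0, 0])  # 1 (the DataFrame keeps its original value)
print(arr[0, 0])      # 100
print(np.shares_memory(df.to_numpy(), arr))  # False
```

If you need a copied DataFrame rather than a NumPy array, df.copy() is the usual choice.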

Useful Pandas Functions for Titanic Kaggle Competition

Pandas is widely used for data science, data analysis and machine learning. Here I will show tips for using Pandas to process the data set for the Titanic Kaggle Competition. The pandas module has many useful methods and functions.

First, use pandas.read_csv to read a comma-separated values (csv) file into a DataFrame.

    import pandas as pd

    train_data = pd.read_csv('/kaggle/input/titanic/train.csv')
    test_data = pd.read_csv('/kaggle/input/titanic/test.csv')

When you want to check the contents of the data, use pandas.DataFrame.head, which returns the first 5 rows of the data frame by default.

    test_data.head()

       PassengerId  Pclass                              Name     Sex   Age  SibSp  Parch  Ticket    Fare  Cabin  Embarked
    0          892       3                  Kelly, Mr. James    male  34.5      0      0  330911  7.8292    NaN         Q
    1          893       3  Wilkes, Mrs. James (Ellen Needs)  female  47.0      1      0  363272  7.0000    NaN         S
    2          894       2         Myles, Mr. Thomas Francis    male  62.0      0      0  240276  9.6875    NaN         Q
    3          895       3                  Wirz, Mr. Albert    male  27.0      0      0  315154  8.6625    NaN         S
    4          896       3  Hirv...
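If you are not on Kaggle, the same two calls can be tried without the competition files. The sketch below is only an illustration: it reads a small in-memory CSV built from the first three rows and five of the columns shown in the excerpt above, and the names csv_text, sample and first_two are hypothetical. It also shows that head accepts the number of rows to return.

```python
import io

import pandas as pd

# A tiny stand-in for the CSV file, using rows from the excerpt above.
# Names contain commas, so they are quoted.
csv_text = '''PassengerId,Pclass,Name,Sex,Age
892,3,"Kelly, Mr. James",male,34.5
893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0
894,2,"Myles, Mr. Thomas Francis",male,62.0
'''

# read_csv accepts a file-like object as well as a path.
sample = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; n defaults to 5.
first_two = sample.head(2)
print(first_two)
print(sample.shape)  # (3, 5)
```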

Select all numeric types from Pandas DataFrame in Python

To extract numeric types from a Pandas DataFrame, use the select_dtypes method.

    DataFrame.select_dtypes(include=None, exclude=None)

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['one', 'two'] * 3})

To select all numeric types, set the parameter 'include' to np.number or 'number'.

    df.select_dtypes(include='number')
    df.select_dtypes(include=np.number)

       a    c
    0  1  1.0
    1  2  2.0
    2  1  1.0
    3  2  2.0
    4  1  1.0
    5  2  2.0

You can also use the Python built-in types int and float, or the strings 'int' and 'float'.

    df.select_dtypes(include=int)
    df.select_dtypes(include='int')

       a
    0  1
    1  2
    2  1
    3  2
    4  1
    5  2

    df.select_dtypes(include=float)
    df.select_dtypes(include='float')

         c
    0  1.0
    1  2.0
    2  1.0
    3  2.0
    4  1.0
    5  2.0

To select strings you...
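The signature above also has an exclude parameter, and both parameters accept a list of types. As a small sketch using the same df (the names non_numeric and bool_and_float are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2] * 3,
                   'b': [True, False] * 3,
                   'c': [1.0, 2.0] * 3,
                   'd': ['one', 'two'] * 3})

# Drop every numeric column instead of keeping it.
non_numeric = df.select_dtypes(exclude='number')
print(list(non_numeric.columns))  # ['b', 'd']

# Pass a list to select several types at once.
bool_and_float = df.select_dtypes(include=['bool', 'float'])
print(list(bool_and_float.columns))  # ['b', 'c']
```

Note that the boolean column 'b' is not treated as numeric, which matches the output of include='number' shown earlier.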

Convert string to datetime in Pandas DataFrame, python

Convert string to datetime in Pandas

A time string is converted to the datetime data type in a pandas DataFrame.

Construct Pandas DataFrame

    import pandas as pd

    df = pd.DataFrame(
        data={
            'time': ['2012.11.27 20:00', '2012.11.28 00:00'],
            'open': [82.131, 82.141],
            'high': [82.156, 82.200],
            'low': [82.129, 81.781],
            'close': [82.137, 81.862],
            'volume': [177, 8163],
        },
    )
    display(df)
    print(df.dtypes)

                   time    open    high     low   close  volume
    0  2012.11.27 20:00  82.131  82.156  82.129  82.137     177
    1  2012.11.28 00:00 ...
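The excerpt is cut off before the conversion step, so the following is only a sketch of one common way to do it, not necessarily the method used in the full post. It assumes pd.to_datetime with an explicit format string matching the 'YYYY.MM.DD HH:MM' values above, and it uses a reduced DataFrame with only the time and close columns to stay short.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        'time': ['2012.11.27 20:00', '2012.11.28 00:00'],
        'close': [82.137, 81.862],
    },
)

# An explicit format avoids ambiguity and makes parsing errors obvious.
df['time'] = pd.to_datetime(df['time'], format='%Y.%m.%d %H:%M')

print(df.dtypes)           # 'time' is now a datetime64 column
print(df.loc[0, 'time'])   # 2012-11-27 20:00:00
```

Once the column is a datetime type, accessors such as df['time'].dt.year become available.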