Skip to main content

Convert string to datetime in Pandas DataFrame, python

Convert_str_to_Timestamp_type

Convert string to datetime in Pandas

Time string is converted to datetim data type in pandas DataFrame.

Construct Pandas DataFrame

In [1]:
import pandas as pd

df = pd.DataFrame(
    data={'time': ['2012.11.27 20:00', '2012.11.28 00:00'], 
          'open': [82.131, 82.141], 
          'high': [82.156, 82.200],
          'low': [82.129, 81.781],
          'close': [82.137, 81.862],
          'volume': [177, 8163],
         },
)

display(df)
print(df.dtypes)
time open high low close volume
0 2012.11.27 20:00 82.131 82.156 82.129 82.137 177
1 2012.11.28 00:00 82.141 82.200 81.781 81.862 8163
time       object
open      float64
high      float64
low       float64
close     float64
volume      int64
dtype: object

Column 'time' has string elements to be converted in datetime data type.

Convert string to datetime by pandas.to_datetime

In [3]:
dt_df = pd.to_datetime(df['time'], format='%Y.%m.%d %H:%M')
print(dt_df.dtypes)
print(type(dt_df[0]))
datetime64[ns]
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

A column of the datetime converted by pandas.to_datetime becomes 'datetime64[ns]' data type, and each element in the column is Timestamp.

In [4]:
import datetime
print(issubclass(pd.Timestamp, datetime.datetime))
True

Timestamp is the pandas equivalent of python's Datatime.

Set DatetimeIndex

In [5]:
df['time']=pd.to_datetime(df['time'], format='%Y.%m.%d %H:%M')
df.set_index('time', inplace=True)
display(df)
open high low close volume
time
2012-11-27 20:00:00 82.131 82.156 82.129 82.137 177
2012-11-28 00:00:00 82.141 82.200 81.781 81.862 8163
In [5]:
pd.to_datetime?
Signature:
pd.to_datetime(
    arg,
    errors='raise',
    dayfirst=False,
    yearfirst=False,
    utc=None,
    format=None,
    exact=True,
    unit=None,
    infer_datetime_format=False,
    origin='unix',
    cache=True,
)
Docstring:
Convert argument to datetime.

Parameters
----------
arg : int, float, str, datetime, list, tuple, 1-d array, Series DataFrame/dict-like
    The object to convert to a datetime.
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception.
    - If 'coerce', then invalid parsing will be set as NaT.
    - If 'ignore', then invalid parsing will return the input.
dayfirst : bool, default False
    Specify a date parse order if `arg` is str or its list-likes.
    If True, parses dates with the day first, eg 10/11/12 is parsed as
    2012-11-10.
    Warning: dayfirst=True is not strict, but will prefer to parse
    with day first (this is a known bug, based on dateutil behavior).
yearfirst : bool, default False
    Specify a date parse order if `arg` is str or its list-likes.

    - If True parses dates with the year first, eg 10/11/12 is parsed as
      2010-11-12.
    - If both dayfirst and yearfirst are True, yearfirst is preceded (same
      as dateutil).

    Warning: yearfirst=True is not strict, but will prefer to parse
    with year first (this is a known bug, based on dateutil behavior).
utc : bool, default None
    Return UTC DatetimeIndex if True (converting any tz-aware
    datetime.datetime objects as well).
format : str, default None
    The strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse
    all the way up to nanoseconds.
    See strftime documentation for more information on choices:
    https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
exact : bool, True by default
    Behaves as:
    - If True, require an exact format match.
    - If False, allow the format to match anywhere in the target string.

unit : str, default 'ns'
    The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
    integer or float number. This will be based off the origin.
    Example, with unit='ms' and origin='unix' (the default), this
    would calculate the number of milliseconds to the unix epoch start.
infer_datetime_format : bool, default False
    If True and no `format` is given, attempt to infer the format of the
    datetime strings, and if it can be inferred, switch to a faster
    method of parsing them. In some cases this can increase the parsing
    speed by ~5-10x.
origin : scalar, default 'unix'
    Define the reference date. The numeric values would be parsed as number
    of units (defined by `unit`) since this reference date.

    - If 'unix' (or POSIX) time; origin is set to 1970-01-01.
    - If 'julian', unit must be 'D', and origin is set to beginning of
      Julian Calendar. Julian day number 0 is assigned to the day starting
      at noon on January 1, 4713 BC.
    - If Timestamp convertible, origin is set to Timestamp identified by
      origin.
cache : bool, default True
    If True, use a cache of unique, converted dates to apply the datetime
    conversion. May produce significant speed-up when parsing duplicate
    date strings, especially ones with timezone offsets. The cache is only
    used when there are at least 50 values. The presence of out-of-bounds
    values will render the cache unusable and may slow down parsing.

    .. versionadded:: 0.23.0

    .. versionchanged:: 0.25.0
        - changed default value from False to True.

Returns
-------
datetime
    If parsing succeeded.
    Return type depends on input:

    - list-like: DatetimeIndex
    - Series: Series of datetime64 dtype
    - scalar: Timestamp

    In case when it is not possible to return designated types (e.g. when
    any element of input is before Timestamp.min or after Timestamp.max)
    return will have datetime.datetime type (or corresponding
    array/Series).

See Also
--------
DataFrame.astype : Cast argument to a specified dtype.
to_timedelta : Convert argument to timedelta.
convert_dtypes : Convert dtypes.

Examples
--------
Assembling a datetime from multiple columns of a DataFrame. The keys can be
common abbreviations like ['year', 'month', 'day', 'minute', 'second',
'ms', 'us', 'ns']) or plurals of the same

>>> df = pd.DataFrame({'year': [2015, 2016],
...                    'month': [2, 3],
...                    'day': [4, 5]})
>>> pd.to_datetime(df)
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]

If a date does not meet the `timestamp limitations
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
#timeseries-timestamp-limits>`_, passing errors='ignore'
will return the original input instead of raising any exception.

Passing errors='coerce' will force an out-of-bounds date to NaT,
in addition to forcing non-dates (or non-parseable dates) to NaT.

>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
datetime.datetime(1300, 1, 1, 0, 0)
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
NaT

Passing infer_datetime_format=True can often-times speedup a parsing
if its not an ISO8601 format exactly, but in a regular format.

>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
>>> s.head()
0    3/11/2000
1    3/12/2000
2    3/13/2000
3    3/11/2000
4    3/12/2000
dtype: object

>>> %timeit pd.to_datetime(s, infer_datetime_format=True)  # doctest: +SKIP
100 loops, best of 3: 10.4 ms per loop

>>> %timeit pd.to_datetime(s, infer_datetime_format=False)  # doctest: +SKIP
1 loop, best of 3: 471 ms per loop

Using a unix epoch time

>>> pd.to_datetime(1490195805, unit='s')
Timestamp('2017-03-22 15:16:45')
>>> pd.to_datetime(1490195805433502912, unit='ns')
Timestamp('2017-03-22 15:16:45.433502912')

.. warning:: For float arg, precision rounding might happen. To prevent
    unexpected behavior use a fixed-width exact type.

Using a non-unix epoch origin

>>> pd.to_datetime([1, 2, 3], unit='D',
...                origin=pd.Timestamp('1960-01-01'))
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

Comments