Concatenate strings from two or more columns into one column in pandas.DataFrame¶

Concatanete multiple columns into one column by¶

using pandas.Series.str.cat
using + operator

Here come with Constructing DataFrame¶

import datetime
import pandas as pd

df = pd.DataFrame(
    data={'A': ['2012.11.27', '2012.11.28'], 
          'B': ['20:00', '00:00'], 
          'C': [82.131, 82.141], 
          'D': [82.156, 82.200],
          'E': [82.129, 81.781],
          'F': [82.137, 81.862],
          'G': [177, 8163],
         },
)

display(df)

Column A and Column B are date and time in string. It is pratical to read the strings from these two columns at each row together and convert them into datetime dtype. In this case to concatenate columns must be helpful.

Use str.cat method¶

When you want to combin columns, you can use pandas.Series.str.cat.

df['H'] = df['A'].str.cat(df['B'], sep=' ')
df

Columns A and B are combined into new column H. The parameter 'sep' specifies a separator.

Use '+' operator¶

df['H'] = df['A'] + ' ' + df['B']

'+' operator gives the same result.

pd.Series.str.cat?

Signature: pd.Series.str.cat(self, others=None, sep=None, na_rep=None, join='left')
Docstring:
Concatenate strings in the Series/Index with given separator.

If `others` is specified, this function concatenates the Series/Index
and elements of `others` element-wise.
If `others` is not passed, then all values in the Series/Index are
concatenated into a single string with a given `sep`.

Parameters
----------
others : Series, Index, DataFrame, np.ndarray or list-like
    Series, Index, DataFrame, np.ndarray (one- or two-dimensional) and
    other list-likes of strings must have the same length as the
    calling Series/Index, with the exception of indexed objects (i.e.
    Series/Index/DataFrame) if `join` is not None.

    If others is a list-like that contains a combination of Series,
    Index or np.ndarray (1-dim), then all elements will be unpacked and
    must satisfy the above criteria individually.

    If others is None, the method returns the concatenation of all
    strings in the calling Series/Index.
sep : str, default ''
    The separator between the different elements/columns. By default
    the empty string `''` is used.
na_rep : str or None, default None
    Representation that is inserted for all missing values:

    - If `na_rep` is None, and `others` is None, missing values in the
      Series/Index are omitted from the result.
    - If `na_rep` is None, and `others` is not None, a row containing a
      missing value in any of the columns (before concatenation) will
      have a missing value in the result.
join : {'left', 'right', 'outer', 'inner'}, default 'left'
    Determines the join-style between the calling Series/Index and any
    Series/Index/DataFrame in `others` (objects without an index need
    to match the length of the calling Series/Index). To disable
    alignment, use `.values` on any Series/Index/DataFrame in `others`.

    .. versionadded:: 0.23.0
    .. versionchanged:: 1.0.0
        Changed default of `join` from None to `'left'`.

Returns
-------
str, Series or Index
    If `others` is None, `str` is returned, otherwise a `Series/Index`
    (same type as caller) of objects is returned.

See Also
--------
split : Split each string in the Series/Index.
join : Join lists contained as elements in the Series/Index.

Examples
--------
When not passing `others`, all values are concatenated into a single
string:

>>> s = pd.Series(['a', 'b', np.nan, 'd'])
>>> s.str.cat(sep=' ')
'a b d'

By default, NA values in the Series are ignored. Using `na_rep`, they
can be given a representation:

>>> s.str.cat(sep=' ', na_rep='?')
'a b ? d'

If `others` is specified, corresponding values are concatenated with
the separator. Result will be a Series of strings.

>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',')
0    a,A
1    b,B
2    NaN
3    d,D
dtype: object

Missing values will remain missing in the result, but can again be
represented using `na_rep`

>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
0    a,A
1    b,B
2    -,C
3    d,D
dtype: object

If `sep` is not specified, the values are concatenated without
separation.

>>> s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')
0    aA
1    bB
2    -C
3    dD
dtype: object

Series with different indexes can be aligned before concatenation. The
`join`-keyword works as in other methods.

>>> t = pd.Series(['d', 'a', 'e', 'c'], index=[3, 0, 4, 2])
>>> s.str.cat(t, join='left', na_rep='-')
0    aa
1    b-
2    -c
3    dd
dtype: object
>>>
>>> s.str.cat(t, join='outer', na_rep='-')
0    aa
1    b-
2    -c
3    dd
4    -e
dtype: object
>>>
>>> s.str.cat(t, join='inner', na_rep='-')
0    aa
2    -c
3    dd
dtype: object
>>>
>>> s.str.cat(t, join='right', na_rep='-')
3    dd
0    aa
4    -e
2    -c
dtype: object

MeePythoner

Search This Blog

Concatenate strings from two or more columns into one column in pandas.DataFrame

Concatenate strings from two or more columns into one column in pandas.DataFrame¶

Concatanete multiple columns into one column by¶

Here come with Constructing DataFrame¶

Use str.cat method¶

Use '+' operator¶

Labels

Comments

Post a Comment

Popular posts from this blog

Schemdraw AttributeError: 'Text' object has no property 'math_fontfamily'

Iterator object in Python

Convert string to datetime in Pandas DataFrame, python

scipy.optimize.curve_fit() in Python

DOT Language with Graphviz to render in Pyhton

Downsampling using convolutional neural networks

	A	B	C	D	E	F	G
0	2012.11.27	20:00	82.131	82.156	82.129	82.137	177
1	2012.11.28	00:00	82.141	82.200	81.781	81.862	8163