Skip to main content

Concatenate strings from two or more columns into one column in pandas.DataFrame

Concatenate in pandas

Concatenate strings from two or more columns into one column in pandas.DataFrame

Concatanete multiple columns into one column by

  • using pandas.Series.str.cat
  • using + operator

Here come with Constructing DataFrame

In [1]:
import datetime
import pandas as pd

df = pd.DataFrame(
    data={'A': ['2012.11.27', '2012.11.28'], 
          'B': ['20:00', '00:00'], 
          'C': [82.131, 82.141], 
          'D': [82.156, 82.200],
          'E': [82.129, 81.781],
          'F': [82.137, 81.862],
          'G': [177, 8163],
         },
)

display(df)
A B C D E F G
0 2012.11.27 20:00 82.131 82.156 82.129 82.137 177
1 2012.11.28 00:00 82.141 82.200 81.781 81.862 8163

Column A and Column B are date and time in string. It is pratical to read the strings from these two columns at each row together and convert them into datetime dtype. In this case to concatenate columns must be helpful.

Use str.cat method

When you want to combin columns, you can use pandas.Series.str.cat.

In [2]:
df['H'] = df['A'].str.cat(df['B'], sep=' ')
df
Out[2]:
A B C D E F G H
0 2012.11.27 20:00 82.131 82.156 82.129 82.137 177 2012.11.27 20:00
1 2012.11.28 00:00 82.141 82.200 81.781 81.862 8163 2012.11.28 00:00

Columns A and B are combined into new column H. The parameter 'sep' specifies a separator.

Use '+' operator

In [3]:
df['H'] = df['A'] + ' ' + df['B']

'+' operator gives the same result.

In [5]:
pd.Series.str.cat?
Signature: pd.Series.str.cat(self, others=None, sep=None, na_rep=None, join='left')
Docstring:
Concatenate strings in the Series/Index with given separator.

If `others` is specified, this function concatenates the Series/Index
and elements of `others` element-wise.
If `others` is not passed, then all values in the Series/Index are
concatenated into a single string with a given `sep`.

Parameters
----------
others : Series, Index, DataFrame, np.ndarray or list-like
    Series, Index, DataFrame, np.ndarray (one- or two-dimensional) and
    other list-likes of strings must have the same length as the
    calling Series/Index, with the exception of indexed objects (i.e.
    Series/Index/DataFrame) if `join` is not None.

    If others is a list-like that contains a combination of Series,
    Index or np.ndarray (1-dim), then all elements will be unpacked and
    must satisfy the above criteria individually.

    If others is None, the method returns the concatenation of all
    strings in the calling Series/Index.
sep : str, default ''
    The separator between the different elements/columns. By default
    the empty string `''` is used.
na_rep : str or None, default None
    Representation that is inserted for all missing values:

    - If `na_rep` is None, and `others` is None, missing values in the
      Series/Index are omitted from the result.
    - If `na_rep` is None, and `others` is not None, a row containing a
      missing value in any of the columns (before concatenation) will
      have a missing value in the result.
join : {'left', 'right', 'outer', 'inner'}, default 'left'
    Determines the join-style between the calling Series/Index and any
    Series/Index/DataFrame in `others` (objects without an index need
    to match the length of the calling Series/Index). To disable
    alignment, use `.values` on any Series/Index/DataFrame in `others`.

    .. versionadded:: 0.23.0
    .. versionchanged:: 1.0.0
        Changed default of `join` from None to `'left'`.

Returns
-------
str, Series or Index
    If `others` is None, `str` is returned, otherwise a `Series/Index`
    (same type as caller) of objects is returned.

See Also
--------
split : Split each string in the Series/Index.
join : Join lists contained as elements in the Series/Index.

Examples
--------
When not passing `others`, all values are concatenated into a single
string:

>>> s = pd.Series(['a', 'b', np.nan, 'd'])
>>> s.str.cat(sep=' ')
'a b d'

By default, NA values in the Series are ignored. Using `na_rep`, they
can be given a representation:

>>> s.str.cat(sep=' ', na_rep='?')
'a b ? d'

If `others` is specified, corresponding values are concatenated with
the separator. Result will be a Series of strings.

>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',')
0    a,A
1    b,B
2    NaN
3    d,D
dtype: object

Missing values will remain missing in the result, but can again be
represented using `na_rep`

>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
0    a,A
1    b,B
2    -,C
3    d,D
dtype: object

If `sep` is not specified, the values are concatenated without
separation.

>>> s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')
0    aA
1    bB
2    -C
3    dD
dtype: object

Series with different indexes can be aligned before concatenation. The
`join`-keyword works as in other methods.

>>> t = pd.Series(['d', 'a', 'e', 'c'], index=[3, 0, 4, 2])
>>> s.str.cat(t, join='left', na_rep='-')
0    aa
1    b-
2    -c
3    dd
dtype: object
>>>
>>> s.str.cat(t, join='outer', na_rep='-')
0    aa
1    b-
2    -c
3    dd
4    -e
dtype: object
>>>
>>> s.str.cat(t, join='inner', na_rep='-')
0    aa
2    -c
3    dd
dtype: object
>>>
>>> s.str.cat(t, join='right', na_rep='-')
3    dd
0    aa
4    -e
2    -c
dtype: object

Comments