Concatenate strings from two or more columns into one column in pandas.DataFrame¶
Concatanete multiple columns into one column by¶
- using pandas.Series.str.cat
- using + operator
Here come with Constructing DataFrame¶
In [1]:
import datetime
import pandas as pd
df = pd.DataFrame(
data={'A': ['2012.11.27', '2012.11.28'],
'B': ['20:00', '00:00'],
'C': [82.131, 82.141],
'D': [82.156, 82.200],
'E': [82.129, 81.781],
'F': [82.137, 81.862],
'G': [177, 8163],
},
)
display(df)
Column A and Column B are date and time in string. It is pratical to read the strings from these two columns at each row together and convert them into datetime dtype. In this case to concatenate columns must be helpful.
Use str.cat method¶
When you want to combin columns, you can use pandas.Series.str.cat.
In [2]:
df['H'] = df['A'].str.cat(df['B'], sep=' ')
df
Out[2]:
Columns A and B are combined into new column H. The parameter 'sep' specifies a separator.
Use '+' operator¶
In [3]:
df['H'] = df['A'] + ' ' + df['B']
'+' operator gives the same result.
In [5]:
pd.Series.str.cat?
Signature: pd.Series.str.cat(self, others=None, sep=None, na_rep=None, join='left')
Docstring:
Concatenate strings in the Series/Index with given separator.
If `others` is specified, this function concatenates the Series/Index
and elements of `others` element-wise.
If `others` is not passed, then all values in the Series/Index are
concatenated into a single string with a given `sep`.
Parameters
----------
others : Series, Index, DataFrame, np.ndarray or list-like
Series, Index, DataFrame, np.ndarray (one- or two-dimensional) and
other list-likes of strings must have the same length as the
calling Series/Index, with the exception of indexed objects (i.e.
Series/Index/DataFrame) if `join` is not None.
If others is a list-like that contains a combination of Series,
Index or np.ndarray (1-dim), then all elements will be unpacked and
must satisfy the above criteria individually.
If others is None, the method returns the concatenation of all
strings in the calling Series/Index.
sep : str, default ''
The separator between the different elements/columns. By default
the empty string `''` is used.
na_rep : str or None, default None
Representation that is inserted for all missing values:
- If `na_rep` is None, and `others` is None, missing values in the
Series/Index are omitted from the result.
- If `na_rep` is None, and `others` is not None, a row containing a
missing value in any of the columns (before concatenation) will
have a missing value in the result.
join : {'left', 'right', 'outer', 'inner'}, default 'left'
Determines the join-style between the calling Series/Index and any
Series/Index/DataFrame in `others` (objects without an index need
to match the length of the calling Series/Index). To disable
alignment, use `.values` on any Series/Index/DataFrame in `others`.
.. versionadded:: 0.23.0
.. versionchanged:: 1.0.0
Changed default of `join` from None to `'left'`.
Returns
-------
str, Series or Index
If `others` is None, `str` is returned, otherwise a `Series/Index`
(same type as caller) of objects is returned.
See Also
--------
split : Split each string in the Series/Index.
join : Join lists contained as elements in the Series/Index.
Examples
--------
When not passing `others`, all values are concatenated into a single
string:
>>> s = pd.Series(['a', 'b', np.nan, 'd'])
>>> s.str.cat(sep=' ')
'a b d'
By default, NA values in the Series are ignored. Using `na_rep`, they
can be given a representation:
>>> s.str.cat(sep=' ', na_rep='?')
'a b ? d'
If `others` is specified, corresponding values are concatenated with
the separator. Result will be a Series of strings.
>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',')
0 a,A
1 b,B
2 NaN
3 d,D
dtype: object
Missing values will remain missing in the result, but can again be
represented using `na_rep`
>>> s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
0 a,A
1 b,B
2 -,C
3 d,D
dtype: object
If `sep` is not specified, the values are concatenated without
separation.
>>> s.str.cat(['A', 'B', 'C', 'D'], na_rep='-')
0 aA
1 bB
2 -C
3 dD
dtype: object
Series with different indexes can be aligned before concatenation. The
`join`-keyword works as in other methods.
>>> t = pd.Series(['d', 'a', 'e', 'c'], index=[3, 0, 4, 2])
>>> s.str.cat(t, join='left', na_rep='-')
0 aa
1 b-
2 -c
3 dd
dtype: object
>>>
>>> s.str.cat(t, join='outer', na_rep='-')
0 aa
1 b-
2 -c
3 dd
4 -e
dtype: object
>>>
>>> s.str.cat(t, join='inner', na_rep='-')
0 aa
2 -c
3 dd
dtype: object
>>>
>>> s.str.cat(t, join='right', na_rep='-')
3 dd
0 aa
4 -e
2 -c
dtype: object
Comments
Post a Comment