Date and time handling seem to always be a tiny bit frustrating. This is not
only because getting it right is a really hard thing to do but also because
there are many different ways of looking at the problem. In python, there are
lots of differen types and modules for handling dates and times, including the
builtin datetime
module, numpy
's datetime64
system and the many different
tools and abstractions that pandas
provides on top.
import numpy as np import pandas as pd import datetime
This datetime.datetime
objects is initialized based on seconds.
For example, this will work:
dt64 = np.datetime64('2015-02-19T15:20:15') datetime_cast = dt64.astype(datetime.datetime) datetime_cast
datetime.datetime(2015, 2, 19, 14, 20, 15)
However, if you start using types and time bases other than second and minutes,
the conversions will just not work, and return long
objects.
dt64 = np.datetime64('2015-02-19T15:20:15','ns') # nanoseconds datetime_cast = dt64.astype(datetime.datetime) datetime_cast
1424355615000000000L
I stumbled across this when working with pandas
timeseries.
pd.Datetimeindex
objects use datetime64[ns]
objects under the hood to that
if for some reason you were to create a timerange using pandas and work with
the underlying datetime64[ns]
array. A conversion to datetime.datetime
objects will no longer work directly.
Let's say we have this construct:
dates = pd.date_range('2012-1-1', periods=5, freq='D') dates
<class 'pandas.tseries.index.DatetimeIndex'> [2012-01-01, ..., 2012-01-05] Length: 5, Freq: D, Timezone: None
dates_dt64 = dates.values
dates_dt64
array(['2012-01-01T01:00:00.000000000+0100', '2012-01-02T01:00:00.000000000+0100', '2012-01-03T01:00:00.000000000+0100', '2012-01-04T01:00:00.000000000+0100', '2012-01-05T01:00:00.000000000+0100'], dtype='datetime64[ns]')
and we want see represent a value as a datetime.datetime
object.
As long as we stay within pandas, we're fine:
ts = dates[0] # scalar timestamp ts
Timestamp('2012-01-01 00:00:00', offset='D')
ts.to_datetime()
datetime.datetime(2012, 1, 1, 0, 0)
If we use the underlying values however:
dt64 = dates_dt64[0] # scalar datetime64 object dt = dt64.astype(datetime.datetime) # try cast dt
1325376000000000000L
We get stuck with a long
because dates_dt64
base is in nanoseconds.
ts.dtype
'org_babel_python_eoe'
The fix is easy when you know this is going on:
dates_dt64_s = dates_dt64.astype('datetime64[s]') dt64_s = dates_dt64_s[0] # single datetime63 object dt_seconds = dt64_s.astype(datetime.datetime) dt_seconds
datetime.datetime(2012, 1, 1, 0, 0)
Thank god that for most use cases, pandas handles all of these weird casts and conversions beautifully so that you don't need to know any of the above in 99% of your uses.