Given the following CSV:
+-------------------------------+-------------+--------------------+--------------+
| Timestamp | DoublePoint | HexPoint | BooleanPoint |
+-------------------------------+-------------+--------------------+--------------+
| 07/23/2019 16:53:12.523-07:00 | 0.0 | 0x0000000000000001 | True |
| 07/23/2019 16:53:14.519-07:00 | 0.0 | 0x0000000000000002 | False |
| 07/23/2019 16:53:16.516-07:00 | 0.25 | 0x0000000000000003 | true |
| 07/23/2019 16:53:18.513-07:00 | 0.25 | 0x00000004 | false |
| 07/23/2019 16:53:20.526-07:00 | 0.0 | 0x00000005 | True |
| 07/23/2019 16:53:22.522-07:00 | 0.50 | 0x00000006 | False |
| 07/23/2019 16:53:24.519-07:00 | 0.5 | 0x00000007 | True |
| 07/23/2019 16:53:26.516-07:00 | 0.9999 | 0x00000008 | False |
+-------------------------------+-------------+--------------------+--------------+
I need to read it with the pandas library and get a DataFrame where all the columns, except the first one, are float. For plain numbers this should be automatic, but for other kinds of input, such as HexPoint and BooleanPoint, I need to provide a conversion function that turns them into numbers.
In this example, the HexPoint values should be converted from hexadecimal to decimal, and the BooleanPoint values should map True/true to 1 and False/false to 0.
So the resulting DataFrame should look like this:
+-------------------------------+-------------+----------+--------------+
| Timestamp | DoublePoint | HexPoint | BooleanPoint |
+-------------------------------+-------------+----------+--------------+
| 07/23/2019 16:53:12.523-07:00 | 0.0 | 1.0 | 1.0 |
| 07/23/2019 16:53:14.519-07:00 | 0.0 | 2.0 | 0.0 |
| 07/23/2019 16:53:16.516-07:00 | 0.25 | 3.0 | 1.0 |
| 07/23/2019 16:53:18.513-07:00 | 0.25 | 4.0 | 0.0 |
| 07/23/2019 16:53:20.526-07:00 | 0.0 | 5.0 | 1.0 |
| 07/23/2019 16:53:22.522-07:00 | 0.50 | 6.0 | 0.0 |
| 07/23/2019 16:53:24.519-07:00 | 0.5 | 7.0 | 1.0 |
| 07/23/2019 16:53:26.516-07:00 | 0.9999 | 8.0 | 0.0 |
+-------------------------------+-------------+----------+--------------+
Important considerations:
Is there a way to tell pandas to read this CSV and try to convert all columns (except the first one) to float, and, when it can't do that natively, to run a custom function that takes the value and returns its numeric representation as described above?
This should do the trick.
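import pandas as pd

def convert_to_float(value):
    text = value.strip().lower()
    if text in ("true", "false"):
        return float(text == "true")
    try:
        return float(text)
    except ValueError:
        return float(int(text, 16))  # hex strings such as 0x00000004

filename = "data.csv"
converters = {col: convert_to_float for col in pd.read_csv(filename, nrows=1).columns[1:]}
df = pd.read_csv(filename, converters=converters)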
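The converters approach can be tried without a file on disk by feeding a few sample rows through io.StringIO. This is only a minimal sketch: it assumes the real file is comma-separated (the table in the question is just a rendering), and the helper names hex_to_float and bool_to_float are made up for illustration.

```python
import io
import pandas as pd

# Assumption: the actual file is comma-separated; these rows mirror the sample.
SAMPLE = """Timestamp,DoublePoint,HexPoint,BooleanPoint
07/23/2019 16:53:12.523-07:00,0.0,0x0000000000000001,True
07/23/2019 16:53:16.516-07:00,0.25,0x0000000000000003,true
07/23/2019 16:53:18.513-07:00,0.25,0x00000004,false
"""

def hex_to_float(text):
    # Hypothetical helper: int(..., 16) accepts an optional "0x" prefix.
    return float(int(text.strip(), 16))

def bool_to_float(text):
    # Hypothetical helper: "True"/"true" -> 1.0, anything else -> 0.0.
    return float(text.strip().lower() == "true")

# One converter per column; DoublePoint is left to pandas' own parsing.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    converters={"HexPoint": hex_to_float, "BooleanPoint": bool_to_float},
)
print(df)
print(df.dtypes)
```

Naming the converter per column keeps each rule explicit, at the cost of having to know the column names up front.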
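Once pandas has parsed the file, the double and boolean columns can be cast to float directly with astype(float), provided pandas recognized the boolean column as such. The hex strings cannot be cast that way, so they need int(x, 16) first.
Try this: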
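import pandas as pd
df = pd.read_csv("data.csv")
# Every column except the timestamp should end up as float.
column_names = df.columns.tolist()
column_names.remove("Timestamp")
print(df)
print(df.dtypes)
print(type(df["DoublePoint"]))
for name in column_names:
    try:
        # Works for columns pandas already parsed as numbers or booleans.
        df[name] = df[name].astype(float)
    except ValueError:
        # Fallback for hex strings such as 0x00000004.
        df[name] = df[name].apply(lambda x: float(int(x, 16)))
print(df)
print(df.dtypes)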
Also, in your input I see true/false written in lowercase in two rows, while the other rows use True/False. If that is how the data really looks, consider normalizing them to one spelling.
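If you would rather not depend on how pandas guesses each column's type, one option is to read everything as strings and convert afterwards, handling the True/true spellings explicitly. The following is a rough sketch under the same comma-separated assumption; column_to_float is a hypothetical helper name, not a pandas function.

```python
import io
import pandas as pd

# Assumption: comma-separated input; two rows taken from the sample data.
SAMPLE = """Timestamp,DoublePoint,HexPoint,BooleanPoint
07/23/2019 16:53:20.526-07:00,0.0,0x00000005,True
07/23/2019 16:53:22.522-07:00,0.50,0x00000006,false
"""

def column_to_float(series):
    # Hypothetical helper: convert one string column to floats.
    lowered = series.str.strip().str.lower()
    # Boolean column in any capitalization: true -> 1.0, false -> 0.0.
    if lowered.isin(["true", "false"]).all():
        return (lowered == "true").astype(float)
    try:
        # Plain decimal numbers.
        return lowered.astype(float)
    except ValueError:
        # Hex strings; int(..., 16) accepts the "0x" prefix.
        return lowered.map(lambda s: float(int(s, 16)))

# dtype=str keeps every value as text so nothing is guessed by the parser.
raw = pd.read_csv(io.StringIO(SAMPLE), dtype=str)
result = raw.copy()
for name in raw.columns[1:]:
    result[name] = column_to_float(raw[name])
print(result)
```

Reading as strings first costs an extra pass over the data, but the conversion rules then live in one place and do not change with the parser's defaults.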