Reading and Writing Data (CSV, Excel, JSON)
Reading a CSV file
CSV is the most common format in analytics.
Read CSV
import pandas as pd
df = pd.read_csv("data/sales.csv")
print(df.head())Read CSV
import pandas as pd
df = pd.read_csv("data/sales.csv")
print(df.head())Useful read_csv()read_csv() options
read_csv options
import pandas as pd
df = pd.read_csv(
"data/sales.csv",
sep=",", # delimiter
encoding="utf-8", # encoding
na_values=["NA", "", "null"],
)
print(df.info())read_csv options
import pandas as pd
df = pd.read_csv(
"data/sales.csv",
sep=",", # delimiter
encoding="utf-8", # encoding
na_values=["NA", "", "null"],
)
print(df.info())Writing a CSV file
Write to CSV
import pandas as pd
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_csv("output/cleaned.csv", index=False)Write to CSV
import pandas as pd
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_csv("output/cleaned.csv", index=False)Reading Excel files
Excel needs an engine (often openpyxlopenpyxl).
Read Excel
import pandas as pd
df = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")
print(df.head())Read Excel
import pandas as pd
df = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")
print(df.head())Writing Excel
Write Excel
import pandas as pd
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_excel("output/cleaned.xlsx", index=False)Write Excel
import pandas as pd
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_excel("output/cleaned.xlsx", index=False)Reading and writing JSON
Reading JSON
Read JSON
import pandas as pd
df = pd.read_json("data/users.json")
print(df.head())Read JSON
import pandas as pd
df = pd.read_json("data/users.json")
print(df.head())Writing JSON
Write JSON
import pandas as pd
df = pd.DataFrame({"user": ["a", "b"], "score": [10, 20]})
df.to_json("output/users.json", orient="records", indent=2)Write JSON
import pandas as pd
df = pd.DataFrame({"user": ["a", "b"], "score": [10, 20]})
df.to_json("output/users.json", orient="records", indent=2)Practical checklist for file IO
- Always confirm shape:
df.shapedf.shape - Inspect columns:
df.columnsdf.columns - Preview:
df.head()df.head()anddf.sample(5)df.sample(5) - Validate types:
df.dtypesdf.dtypes - Watch for missing values:
df.isna().sum()df.isna().sum()
๐งช Try It Yourself
Exercise 1 โ Create a DataFrame
Exercise 2 โ Select a Column
Exercise 3 โ Filter Rows
If this helped you, consider buying me a coffee โ
Buy me a coffeeWas this page helpful?
Let us know how we did
