Skip to content

Reading and Writing Data (CSV, Excel, JSON)

Reading a CSV file

CSV is the most common format in analytics.

Read CSV
import pandas as pd
 
df = pd.read_csv("data/sales.csv")
print(df.head())
Read CSV
import pandas as pd
 
df = pd.read_csv("data/sales.csv")
print(df.head())

Useful read_csv()read_csv() options

read_csv options
import pandas as pd
 
df = pd.read_csv(
    "data/sales.csv",
    sep=",",              # delimiter
    encoding="utf-8",     # encoding
    na_values=["NA", "", "null"],
)
 
print(df.info())
read_csv options
import pandas as pd
 
df = pd.read_csv(
    "data/sales.csv",
    sep=",",              # delimiter
    encoding="utf-8",     # encoding
    na_values=["NA", "", "null"],
)
 
print(df.info())

Writing a CSV file

Write to CSV
import pandas as pd
 
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
 
df.to_csv("output/cleaned.csv", index=False)
Write to CSV
import pandas as pd
 
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
 
df.to_csv("output/cleaned.csv", index=False)

Reading Excel files

Excel needs an engine (often openpyxlopenpyxl).

Read Excel
import pandas as pd
 
df = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")
print(df.head())
Read Excel
import pandas as pd
 
df = pd.read_excel("data/sales.xlsx", sheet_name="Sheet1")
print(df.head())

Writing Excel

Write Excel
import pandas as pd
 
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
 
df.to_excel("output/cleaned.xlsx", index=False)
Write Excel
import pandas as pd
 
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
 
df.to_excel("output/cleaned.xlsx", index=False)

Reading and writing JSON

Reading JSON

Read JSON
import pandas as pd
 
df = pd.read_json("data/users.json")
print(df.head())
Read JSON
import pandas as pd
 
df = pd.read_json("data/users.json")
print(df.head())

Writing JSON

Write JSON
import pandas as pd
 
df = pd.DataFrame({"user": ["a", "b"], "score": [10, 20]})
 
df.to_json("output/users.json", orient="records", indent=2)
Write JSON
import pandas as pd
 
df = pd.DataFrame({"user": ["a", "b"], "score": [10, 20]})
 
df.to_json("output/users.json", orient="records", indent=2)

Practical checklist for file IO

  • Always confirm shape: df.shapedf.shape
  • Inspect columns: df.columnsdf.columns
  • Preview: df.head()df.head() and df.sample(5)df.sample(5)
  • Validate types: df.dtypesdf.dtypes
  • Watch for missing values: df.isna().sum()df.isna().sum()

๐Ÿงช Try It Yourself

Exercise 1 โ€“ Create a DataFrame

Exercise 2 โ€“ Select a Column

Exercise 3 โ€“ Filter Rows

If this helped you, consider buying me a coffee โ˜•

Buy me a coffee

Was this page helpful?

Let us know how we did