Data
Data types
ipyvizzu
currently supports two types of data series: dimensions and measures.
Dimensions slice the data cube ipyvizzu
uses, whereas measures are values
within the cube.
Dimensions are categorical series that can contain strings and numbers, but both
will be treated as strings. Temporal data such as dates or timestamps should
also be added as dimensions. By default, ipyvizzu
will draw the elements on
the chart in the order they are provided in the data set. Thus we suggest adding
temporal data in a sorted format from oldest to newest.
Measures at the moment can only be numerical.
Adding data
There are multiple ways you can add data to ipyvizzu
.
- Using pandas DataFrame
- Specify data by series - column after column if you think of a spreadsheet
- Specify data by records - row after row
- Using data cube form
- Using JSON
Tip
You should set the data in the first animate call.
chart.animate(data)
Genres | Kinds | Popularity |
---|---|---|
Pop | Hard | 114 |
Rock | Hard | 96 |
Jazz | Hard | 78 |
Metal | Hard | 52 |
Pop | Smooth | 56 |
Rock | Experimental | 36 |
Jazz | Smooth | 174 |
Metal | Smooth | 121 |
Pop | Experimental | 127 |
Rock | Experimental | 83 |
Jazz | Experimental | 94 |
Metal | Experimental | 58 |
Using pandas
DataFrame
Use
add_df
method for adding pandas
DataFrame to
Data
.
import pandas as pd
from ipyvizzu import Data
data = {
"Genres": [
"Pop",
"Rock",
"Jazz",
"Metal",
"Pop",
"Rock",
"Jazz",
"Metal",
"Pop",
"Rock",
"Jazz",
"Metal",
],
"Kinds": [
"Hard",
"Hard",
"Hard",
"Hard",
"Smooth",
"Experimental",
"Smooth",
"Smooth",
"Experimental",
"Experimental",
"Experimental",
"Experimental",
],
"Popularity": [
114,
96,
78,
52,
56,
36,
174,
121,
127,
83,
94,
58,
],
}
df = pd.DataFrame(data)
data = Data()
data.add_df(df)
Note
There is a max_rows
limit of 100k
for dataframes to prevent potential
browser memory issues. If your dataframe surpasses this limit, it will be
randomly sampled down. The default value can be adjusted via the max_rows
parameter of the add_df
function.
df = pd.DataFrame(data)
data = Data()
data.add_df(df, max_rows=110000)
Info
ipyvizzu
makes a difference between two types of data, numeric (measure)
and not numeric (dimension). A column's dtype
specifies that the column is
handled as a measure or as a dimension.
It is also possible to add the data frame's index as a series column while adding the data frame
import pandas as pd
from ipyvizzu import Data
df = pd.DataFrame(
{"Popularity": [114, 96, 78]}, index=["x", "y", "z"]
)
data = Data()
data.add_df(df, include_index="IndexColumnName")
or later with the
add_df_index
method.
import pandas as pd
from ipyvizzu import Data
df = pd.DataFrame(
{"Popularity": [114, 96, 78]}, index=["x", "y", "z"]
)
data = Data()
data.add_df_index(df, column_name="IndexColumnName")
data.add_df(df)
Note
If you want to work with pandas
DataFrame and ipyvizzu
, you need to
install pandas
or install it as an extra:
pip install ipyvizzu[pandas]
Using csv
Download music_data.csv
here.
import pandas as pd
from ipyvizzu import Data
df = pd.read_csv(
"https://ipyvizzu.vizzuhq.com/0.17/assets/data/music_data.csv"
)
data = Data()
data.add_df(df)
Using Excel spreadsheet
Download music_data.xlsx
here.
import pandas as pd
from ipyvizzu import Data
df = pd.read_excel(
"https://ipyvizzu.vizzuhq.com/0.17/assets/data/music_data.xlsx"
)
data = Data()
data.add_df(df)
Using Google Sheets
import pandas as pd
from ipyvizzu import Data
google_sheet_id = "<Google Sheet id>"
worksheet_name = "<Worksheet name>"
df = pd.read_csv(
f"https://docs.google.com/spreadsheets/d/{google_sheet_id}/gviz/tq?tqx=out:csv&sheet={worksheet_name}"
)
data = Data()
data.add_df(df)
For example if the url is
https://docs.google.com/spreadsheets/d/abcd1234/edit#gid=0
then
google_sheet_id
here is abcd1234
.
Using SQLite
import pandas as pd
import sqlite3
from ipyvizzu import Data
# establish a connection to the SQLite database
conn = sqlite3.connect("mydatabase.db")
# read data from a SQLite table into a pandas DataFrame
df = pd.read_sql("SELECT * FROM mytable", conn)
# close the connection
conn.close()
data = Data()
data.add_df(df)
Note
You'll need to adjust the SQL query and the database connection parameters to match your specific use case.
Using MySQL
import pandas as pd
import mysql.connector
from ipyvizzu import Data
# establish a connection to the MySQL database
conn = mysql.connector.connect(
user="myusername",
password="mypassword",
host="myhost",
database="mydatabase",
)
# read data from a MySQL table into a pandas DataFrame
df = pd.read_sql("SELECT * FROM mytable", con=conn)
# close the connection
conn.close()
data = Data()
data.add_df(df)
Note
You'll need to adjust the SQL query and the database connection parameters to match your specific use case.
Using PostgreSQL
import pandas as pd
import psycopg2
from ipyvizzu import Data
# establish a connection to the PostgreSQL database
conn = psycopg2.connect(
user="myusername",
password="mypassword",
host="myhost",
port="5432",
database="mydatabase",
)
# read data from a PostgreSQL table into a pandas DataFrame
df = pd.read_sql("SELECT * FROM mytable", con=conn)
# close the connection
conn.close()
data = Data()
data.add_df(df)
Note
You'll need to adjust the SQL query and the database connection parameters to match your specific use case.
Using Microsoft SQL Server
import pandas as pd
import pyodbc
from ipyvizzu import Data
# establish a connection to the Microsoft SQL Server database
conn = pyodbc.connect(
"Driver={SQL Server};"
"Server=myserver;"
"Database=mydatabase;"
"UID=myusername;"
"PWD=mypassword"
)
# read data from a SQL Server table into a pandas DataFrame
df = pd.read_sql("SELECT * FROM mytable", con=conn)
# close the connection
conn.close()
data = Data()
data.add_df(df)
Note
You'll need to adjust the SQL query and the database connection parameters to match your specific use case.
Using pyspark
DataFrame
Use
add_df
method for adding pyspark
DataFrame to
Data
.
from pyspark.sql import SparkSession
from pyspark.sql.types import (
StructType,
StructField,
StringType,
IntegerType,
)
from ipyvizzu import Data
spark = SparkSession.builder.appName("ipyvizzu").getOrCreate()
spark_schema = StructType(
[
StructField("Genres", StringType(), True),
StructField("Kinds", StringType(), True),
StructField("Popularity", IntegerType(), True),
]
)
spark_data = [
("Pop", "Hard", 114),
("Rock", "Hard", 96),
("Jazz", "Hard", 78),
("Metal", "Hard", 52),
("Pop", "Smooth", 56),
("Rock", "Experimental", 36),
("Jazz", "Smooth", 174),
("Metal", "Smooth", 121),
("Pop", "Experimental", 127),
("Rock", "Experimental", 83),
("Jazz", "Experimental", 94),
("Metal", "Experimental", 58),
]
df = spark.createDataFrame(spark_data, spark_schema)
data = Data()
data.add_df(df)
Note
If you want to work with pyspark
DataFrame and ipyvizzu
, you need to
install pyspark
or install it as an extra:
pip install ipyvizzu[pyspark]
Using numpy
Array
Use
add_np_array
method for adding numpy
Array to
Data
.
import numpy as np
from ipyvizzu import Data
numpy_array = np.array(
[
["Pop", "Hard", 114],
["Rock", "Hard", 96],
["Jazz", "Hard", 78],
["Metal", "Hard", 52],
["Pop", "Smooth", 56],
["Rock", "Experimental", 36],
["Jazz", "Smooth", 174],
["Metal", "Smooth", 121],
["Pop", "Experimental", 127],
["Rock", "Experimental", 83],
["Jazz", "Experimental", 94],
["Metal", "Experimental", 58],
]
)
data = Data()
data.add_np_array(
numpy_array,
column_name={0: "Genres", 1: "Kinds", 2: "Popularity"},
column_dtype={2: int},
)
Info
- Arrays with dimensions higher than 2 are not supported.
- If
column_name
dictionary is not added, column indices will be used as names. - If
column_dtype
dictionary is not added, every column will usenumpy_array.dtype
.
Note
If you want to work with numpy
Array and ipyvizzu
, you need to install
numpy
or install it as an extra:
pip install ipyvizzu[numpy]
Specify data by series
When you specify the data by series or by records, it has to be in first normal form. Here is an example of that:
from ipyvizzu import Data
data = Data()
data.add_series(
"Genres",
[
"Pop",
"Rock",
"Jazz",
"Metal",
"Pop",
"Rock",
"Jazz",
"Metal",
"Pop",
"Rock",
"Jazz",
"Metal",
],
type="dimension",
)
data.add_series(
"Kinds",
[
"Hard",
"Hard",
"Hard",
"Hard",
"Smooth",
"Experimental",
"Smooth",
"Smooth",
"Experimental",
"Experimental",
"Experimental",
"Experimental",
],
type="dimension",
)
data.add_series(
"Popularity",
[114, 96, 78, 52, 56, 36, 174, 121, 127, 83, 94, 58],
type="measure",
)
Specify data by records
from ipyvizzu import Data
data = Data()
data.add_series("Genres", type="dimension")
data.add_series("Kinds", type="dimension")
data.add_series("Popularity", type="measure")
record = ["Pop", "Hard", 114]
data.add_record(record)
records = [
["Rock", "Hard", 96],
["Jazz", "Hard", 78],
["Metal", "Hard", 52],
["Pop", "Smooth", 56],
["Rock", "Experimental", 36],
["Jazz", "Smooth", 174],
["Metal", "Smooth", 121],
["Pop", "Experimental", 127],
["Rock", "Experimental", 83],
["Jazz", "Experimental", 94],
["Metal", "Experimental", 58],
]
data.add_records(records)
Where records can be lists as shown above or dictionaries:
records = [
{
"Genres": "Pop",
"Kinds": "Hard",
"Popularity": 114,
},
{
"Genres": "Rock",
"Kinds": "Hard",
"Popularity": 96,
},
# ...
]
Using data cube form
Note
In the example below, the record Rock,Experimental,36
has been replaced
with Rock,Smooth,36
in order to illustrate that only data with same
dimensions can be used in the data cube form.
Genres | |||||
---|---|---|---|---|---|
Pop | Rock | Jazz | Metal | ||
Kinds | Hard | 114 | 96 | 78 | 52 |
Smooth | 56 | 36 | 74 | 121 | |
Experimental | 127 | 83 | 94 | 58 | |
Popularity |
from ipyvizzu import Data
data = Data()
data.add_dimension("Genres", ["Pop", "Rock", "Jazz", "Metal"])
data.add_dimension("Kinds", ["Hard", "Smooth", "Experimental"])
data.add_measure(
"Popularity",
[
[114, 96, 78, 52],
[56, 36, 174, 121],
[127, 83, 94, 58],
],
)
Using JSON
Download music_data.json
here (in this
example the data stored in the data cube form).
from ipyvizzu import Data
data = Data.from_json("../assets/data/music_data.json")