Loading experimental data with GlassPy

Author: Daniel R. Cassar

Introduction

GlassPy can load experimental data through its glasspy.data subpackage. Currently, SciGlass is the only available data source.

Basic Usage

The minimal example below loads SciGlass data into a pandas DataFrame using the default configuration, which includes most of the available data and metadata.

[1]:

from glasspy.data import SciGlass

source = SciGlass()
df = source.data

The first run may take a while, as GlassPy performs several computations to prepare the data. Subsequent runs will be significantly faster, since the data is cached locally on your machine.

[2]:

df

[2]:

	elements										...	property					metadata
	H	Li	Be	B	C	N	O	F	Na	Mg	...	SurfaceTensionAboveTg	SurfaceTension1173K	SurfaceTension1473K	SurfaceTension1573K	SurfaceTension1673K	ChemicalAnalysis	Author	Year	NumberElements	NumberCompounds
ID
20400020000	0.0	0.0	0.0	0.000000	0.0	0.0	0.666667	0.0	0.000000	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Volarovich M.P.	1936	2	1
20500020001	0.0	0.0	0.0	0.000000	0.0	0.0	0.579213	0.0	0.196815	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Hoj J.W.	1992	5	4
20500020002	0.0	0.0	0.0	0.000000	0.0	0.0	0.580869	0.0	0.193449	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Hoj J.W.	1992	5	4
20500020003	0.0	0.0	0.0	0.000000	0.0	0.0	0.581986	0.0	0.187167	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Hoj J.W.	1992	5	4
20500020004	0.0	0.0	0.0	0.000000	0.0	0.0	0.583672	0.0	0.183080	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Hoj J.W.	1992	5	4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4493300611694	0.0	0.0	0.0	0.000000	0.0	0.0	0.625485	0.0	0.000000	0.049125	...	NaN	NaN	NaN	NaN	NaN	False	Murata T.	2019	7	6
4493300611695	0.0	0.0	0.0	0.001948	0.0	0.0	0.637540	0.0	0.000000	0.009932	...	NaN	NaN	NaN	NaN	NaN	False	Murata T.	2019	10	9
4493300611696	0.0	0.0	0.0	0.000000	0.0	0.0	0.635921	0.0	0.000000	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Murata T.	2019	8	7
4493300611697	0.0	0.0	0.0	0.014544	0.0	0.0	0.622226	0.0	0.035890	0.000000	...	NaN	NaN	NaN	NaN	NaN	False	Murata T.	2019	9	8
4493300611698	0.0	0.0	0.0	0.041532	0.0	0.0	0.634462	0.0	0.000000	0.000487	...	NaN	NaN	NaN	NaN	NaN	False	Murata T.	2019	7	6

283102 rows × 793 columns

To avoid naming conflicts and simplify navigation, the DataFrame columns are organized into two levels. The first level groups the columns into elements, compounds, property, and metadata; the second level holds the individual column names within each group.

[3]:

print(df.columns.levels[0])

Index(['elements', 'compounds', 'property', 'metadata'], dtype='str')
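For readers new to hierarchical columns, the sketch below builds a tiny stand-in table with the same kind of two-level column layout and shows two ways of selecting from it. The `toy` DataFrame and all of its values are made up for illustration; only the first-level names mirror the output above.

```python
import pandas as pd

# Hypothetical miniature table mimicking the two-level column layout.
columns = pd.MultiIndex.from_tuples(
    [
        ("elements", "Si"),
        ("elements", "O"),
        ("property", "Tg"),
        ("metadata", "Year"),
    ]
)
toy = pd.DataFrame(
    [
        [1 / 3, 2 / 3, 1475.0, 1936],
        [0.25, 0.60, float("nan"), 1992],
    ],
    columns=columns,
    index=pd.Index([1, 2], name="ID"),
)

# A first-level key returns a DataFrame holding that group's columns.
elements = toy["elements"]

# A (first level, second level) tuple selects a single column as a Series.
tg = toy[("property", "Tg")]

print(list(elements.columns))
print(tg.isna().sum())
```

Selecting with a tuple is equivalent to chaining two lookups such as `toy["property"]["Tg"]`, but does it in a single step.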

To explore the chemical composition data, select the elements or compounds group from the first level.

[4]:

els = df["elements"]

els

[4]:

	H	Li	Be	B	C	N	O	F	Na	Mg	...	W	Re	Pt	Au	Hg	Tl	Pb	Bi	Th	U
ID
20400020000	0.0	0.0	0.0	0.000000	0.0	0.0	0.666667	0.0	0.000000	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
20500020001	0.0	0.0	0.0	0.000000	0.0	0.0	0.579213	0.0	0.196815	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
20500020002	0.0	0.0	0.0	0.000000	0.0	0.0	0.580869	0.0	0.193449	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
20500020003	0.0	0.0	0.0	0.000000	0.0	0.0	0.581986	0.0	0.187167	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
20500020004	0.0	0.0	0.0	0.000000	0.0	0.0	0.583672	0.0	0.183080	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4493300611694	0.0	0.0	0.0	0.000000	0.0	0.0	0.625485	0.0	0.000000	0.049125	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4493300611695	0.0	0.0	0.0	0.001948	0.0	0.0	0.637540	0.0	0.000000	0.009932	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4493300611696	0.0	0.0	0.0	0.000000	0.0	0.0	0.635921	0.0	0.000000	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4493300611697	0.0	0.0	0.0	0.014544	0.0	0.0	0.622226	0.0	0.035890	0.000000	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4493300611698	0.0	0.0	0.0	0.041532	0.0	0.0	0.634462	0.0	0.000000	0.000487	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

283102 rows × 76 columns
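Most cells in the elements table are zero, so it often helps to look only at the elements that are actually present. The sketch below uses a small hand-made table as a stand-in for `els` (three columns, two rows, values chosen for illustration) and relies only on standard pandas operations.

```python
import pandas as pd

# Stand-in for the elements table: two glasses, three element columns.
els = pd.DataFrame(
    {"O": [0.666667, 0.579213], "Na": [0.0, 0.196815], "Mg": [0.0, 0.0]},
    index=pd.Index([20400020000, 20500020001], name="ID"),
)

# Elements with a nonzero fraction in one glass, looked up by its ID.
row = els.loc[20500020001]
present = row[row > 0]

# Keep only the element columns that are nonzero in at least one row.
used = els.loc[:, (els > 0).any(axis=0)]

print(list(present.index))
print(list(used.columns))
```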

The example below shows how to retrieve \(T_g\) data from the property level.

[5]:

Tg = df["property"]["Tg"]

Tg

[5]:

ID
20400020000          NaN
20500020001      1017.15
20500020002      1096.15
20500020003      1013.15
20500020004      1013.15
                  ...
4493300611694        NaN
4493300611695        NaN
4493300611696        NaN
4493300611697        NaN
4493300611698        NaN
Name: Tg, Length: 283102, dtype: float64

As the output shows, not all entries have a value for \(T_g\); missing values appear as NaN.
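A common next step is to keep only the entries where the property was measured. The sketch below uses a short made-up Series in place of `Tg`; the IDs and values are illustrative, and `dropna` and `isna` are standard pandas methods.

```python
import pandas as pd

# Stand-in for the Tg column: two measured values and two missing ones.
tg = pd.Series(
    [float("nan"), 1017.15, 1096.15, float("nan")],
    index=pd.Index(
        [20400020000, 20500020001, 20500020002, 4493300611694], name="ID"
    ),
    name="Tg",
)

# Keep only the rows with a measured value.
measured = tg.dropna()

# Fraction of entries without a value.
fraction_missing = tg.isna().mean()

print(len(measured), fraction_missing)
```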

To list all properties available through the SciGlass class, run:

[6]:

print(SciGlass.available_properties())

['T0', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9', 'T10', 'T11', 'T12', 'Viscosity773K', 'Viscosity873K', 'Viscosity973K', 'Viscosity1073K', 'Viscosity1173K', 'Viscosity1273K', 'Viscosity1373K', 'Viscosity1473K', 'Viscosity1573K', 'Viscosity1673K', 'Viscosity1773K', 'Viscosity1873K', 'Viscosity2073K', 'Viscosity2273K', 'Viscosity2473K', 'Tg', 'Tmelt', 'Tliquidus', 'TLittletons', 'TAnnealing', 'Tstrain', 'Tsoft', 'TdilatometricSoftening', 'AbbeNum', 'RefractiveIndex', 'RefractiveIndexLow', 'RefractiveIndexHigh', 'MeanDispersion', 'Permittivity', 'TangentOfLossAngle', 'TresistivityIs1MOhm.m', 'Resistivity293K', 'Resistivity373K', 'Resistivity423K', 'Resistivity573K', 'Resistivity1073K', 'Resistivity1273K', 'Resistivity1473K', 'Resistivity1673K', 'YoungModulus', 'ShearModulus', 'Microhardness', 'PoissonRatio', 'Density293K', 'Density1073K', 'Density1273K', 'Density1473K', 'Density1673K', 'ThermalConductivity', 'ThermalShockRes', 'CTEbelowTg', 'CTE328K', 'CTE373K', 'CTE433K', 'CTE483K', 'CTE623K', 'Cp293K', 'Cp473K', 'Cp673K', 'Cp1073K', 'Cp1273K', 'Cp1473K', 'Cp1673K', 'NucleationTemperature', 'NucleationRate', 'TMaxGrowthVelocity', 'MaxGrowthVelocity', 'CrystallizationPeak', 'CrystallizationOnset', 'SurfaceTensionAboveTg', 'SurfaceTension1173K', 'SurfaceTension1473K', 'SurfaceTension1573K', 'SurfaceTension1673K']
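Because the result is a plain Python list of strings, related properties can be picked out by name. A minimal sketch, using a hand-copied subset of the list above as stand-in data:

```python
# A hand-copied subset of the list printed above, used as stand-in data.
properties = [
    "Tg",
    "Tmelt",
    "Viscosity773K",
    "Viscosity873K",
    "Density293K",
    "Cp293K",
]

# Keep only the names that start with a given prefix.
viscosity_props = [name for name in properties if name.startswith("Viscosity")]

print(viscosity_props)
```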

If you are unfamiliar with pandas DataFrames, refer to the pandas documentation.

Controlling the Initial Data Load

Loading the complete SciGlass dataset can be time-consuming, so it is advisable to load only the data you need. You can control what is loaded by passing configuration dictionaries to the SciGlass class.

For example, suppose you want to exclude glasses containing silver or gold, retrieve only glass transition temperature data, and omit compound information (here done by passing an empty dictionary for the compounds configuration). You can do so as follows:

[7]:

all_properties_except_Tg = SciGlass.available_properties()
all_properties_except_Tg.remove("Tg")

config_el = {
    "drop": ["Ag", "Au"],
}

config_prop = {
    "keep": ["Tg"],
    "drop": all_properties_except_Tg,
}

config_comp = {}

source = SciGlass(
    elements_cfg=config_el,
    properties_cfg=config_prop,
    compounds_cfg=config_comp,
)

df = source.data
df

[7]:

	elements																property	metadata
	H	Li	Be	B	C	N	O	F	Na	Mg	...	Tl	Pb	Bi	Th	U	Tg	ChemicalAnalysis	Author	Year	NumberElements
ID
20500020001	0.0	0.000000	0.0	0.000000	0.0	0.0	57.921249	0.0	19.681530	0.0	...	0.000000	0.0	0.000000	0.0	0.0	1017.15	False	Hoj J.W.	1992	5
20500020002	0.0	0.000000	0.0	0.000000	0.0	0.0	58.086941	0.0	19.344940	0.0	...	0.000000	0.0	0.000000	0.0	0.0	1096.15	False	Hoj J.W.	1992	5
20500020003	0.0	0.000000	0.0	0.000000	0.0	0.0	58.198601	0.0	18.716690	0.0	...	0.000000	0.0	0.000000	0.0	0.0	1013.15	False	Hoj J.W.	1992	5
20500020004	0.0	0.000000	0.0	0.000000	0.0	0.0	58.367241	0.0	18.308001	0.0	...	0.000000	0.0	0.000000	0.0	0.0	1013.15	False	Hoj J.W.	1992	5
20500020005	0.0	0.000000	0.0	0.000000	0.0	0.0	58.282768	0.0	18.264561	0.0	...	0.000000	0.0	0.000000	0.0	0.0	978.15	False	Hoj J.W.	1992	5
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4493200611415	0.0	7.250638	0.0	2.368801	0.0	0.0	59.389221	0.0	0.000000	0.0	...	8.964828	0.0	5.536447	0.0	0.0	543.15	False	Jung Woo Man	2019	9
4493200611416	0.0	7.445931	0.0	2.358826	0.0	0.0	59.595871	0.0	0.000000	0.0	...	6.650183	0.0	5.808963	0.0	0.0	545.15	False	Jung Woo Man	2019	9
4493200611417	0.0	6.593068	0.0	10.288480	0.0	0.0	59.600090	0.0	0.000000	0.0	...	10.782570	0.0	0.000000	0.0	0.0	532.15	False	Jung Woo Man	2019	9
4493200611418	0.0	5.919064	0.0	1.936039	0.0	0.0	64.014076	0.0	0.000000	0.0	...	7.322553	0.0	0.000000	0.0	0.0	506.15	False	Jung Woo Man	2019	9
4493200611419	0.0	6.371798	0.0	2.019926	0.0	0.0	63.761761	0.0	0.000000	0.0	...	7.882636	0.0	0.000000	0.0	0.0	522.15	False	Jung Woo Man	2019	9

91738 rows × 78 columns
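When more than one property should be kept, the drop list can be derived with a list comprehension instead of repeated `remove` calls. The helper below is hypothetical and not part of GlassPy; it only assembles a dictionary with the same "keep" and "drop" keys used in the example above, and the `available` list is a stand-in subset.

```python
def build_property_config(available, keep):
    """Return a dict with 'keep' and 'drop' lists; raise on unknown names."""
    unknown = [name for name in keep if name not in available]
    if unknown:
        raise ValueError(f"Unknown properties: {unknown}")
    keep_set = set(keep)
    return {
        "keep": list(keep),
        # Everything not explicitly kept goes to the drop list, in order.
        "drop": [name for name in available if name not in keep_set],
    }


# Stand-in subset of property names (assumption for this sketch).
available = ["Tg", "Tmelt", "Tliquidus", "Density293K"]
config_prop = build_property_config(available, keep=["Tg", "Tliquidus"])

print(config_prop)
```

With GlassPy installed, `available` would presumably come from `SciGlass.available_properties()`, and the resulting dictionary would be passed as `properties_cfg`, as in the cell above.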

See the documentation of the SciGlass class for more information on configuring the initial data load.