Overview

Dataset statistics

Number of variables: 3
Number of observations: 208
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.2 KiB
Average record size in memory: 20.6 B

Variable types

NUM: 2
DATE: 1

Reproduction

Analysis started: 2020-08-18 00:53:31.669443
Analysis finished: 2020-08-18 00:53:36.822824
Duration: 5.15 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

df_index has unique values (Unique)
Predicted Recoveries has unique values (Unique)
Recoveries has 146 (70.2%) zeros (Zeros)
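
The warnings above follow from simple counts reported later for each variable. The sketch below rebuilds the two warning messages from those counts. The function name `describe_warnings` and the trigger rules (flag a column when every value is distinct, or when it contains any zero at all) are assumptions made for illustration; the report does not state the thresholds the profiler actually uses.

```python
def describe_warnings(name, n_rows, n_distinct, n_zeros):
    """Return warning strings for one column (hypothetical helper).

    Assumed rules, not taken from the profiler itself:
    - "unique" when every row holds a distinct value
    - "zeros" when the column contains at least one zero
    """
    warnings = []
    if n_distinct == n_rows:
        warnings.append(f"{name} has unique values")
    if n_zeros > 0:
        pct = 100 * n_zeros / n_rows
        warnings.append(f"{name} has {n_zeros} ({pct:.1f}%) zeros")
    return warnings


# Counts copied from the Variables section of this report.
print(describe_warnings("Predicted Recoveries", n_rows=208, n_distinct=208, n_zeros=0))
print(describe_warnings("Recoveries", n_rows=208, n_distinct=63, n_zeros=146))
```

With the counts from this report, 146 zeros out of 208 rows gives the 70.2% shown in the warning.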

Variables

df_index
Date

UNIQUE

Distinct count: 208
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB
Minimum: 2020-01-22 00:00:00
Maximum: 2020-08-16 00:00:00
Histogram

Predicted Recoveries
Real number (ℝ≥0)

UNIQUE

Distinct count: 208
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3908618.776227494
Minimum: 28.0
Maximum: 16135672.557126561
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.8 KiB

Quantile statistics

Minimum: 28
5th percentile: 223.6087156
Q1: 70721.85654
Median: 1945895.912
Q3: 6646414.435
95th percentile: 13697659.64
Maximum: 16135672.56
Range: 16135644.56
Interquartile range (IQR): 6575692.578

Descriptive statistics

Standard deviation: 4610091.863
Coefficient of variation (CV): 1.179468279
Kurtosis: -0.02410527068
Mean: 3908618.776
Median Absolute Deviation (MAD): 1935489.208
Skewness: 1.066448496
Sum: 812992705.5
Variance: 2.125294698e+13
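
Several of the figures above are derived from the others: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, the CV is the standard deviation divided by the mean, the range is maximum minus minimum, and the IQR is Q3 minus Q1. The following sketch checks those relationships against the values reported for Predicted Recoveries. The helper name `check_consistency` and the tolerance are assumptions chosen for illustration, and the inputs are the rounded figures printed in this report rather than the raw data.

```python
import math

# Rounded values copied from the Predicted Recoveries statistics in this report.
reported = {
    "n": 208,
    "sum": 812992705.5,
    "mean": 3908618.776,
    "std": 4610091.863,
    "variance": 2.125294698e13,
    "cv": 1.179468279,
    "min": 28.0,
    "max": 16135672.56,
    "q1": 70721.85654,
    "q3": 6646414.435,
    "range": 16135644.56,
    "iqr": 6575692.578,
}


def check_consistency(stats, rel_tol=1e-6):
    """Compare each derived statistic with its reported value (hypothetical helper)."""
    derived = {
        "mean": stats["sum"] / stats["n"],
        "variance": stats["std"] ** 2,
        "cv": stats["std"] / stats["mean"],
        "range": stats["max"] - stats["min"],
        "iqr": stats["q3"] - stats["q1"],
    }
    # True means the derived value agrees with the reported one within rel_tol.
    return {
        name: math.isclose(value, stats[name], rel_tol=rel_tol)
        for name, value in derived.items()
    }


for name, ok in check_consistency(reported).items():
    print(f"{name}: {'consistent' if ok else 'mismatch'}")
```

The tolerance is loose on purpose, because the report prints values rounded to about ten significant digits.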
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
3130541.015         1      0.5%
1138837.074         1      0.5%
47865.56968         1      0.5%
32433.87226         1      0.5%
16926.54146         1      0.5%
4404448.151         1      0.5%
35292.3312          1      0.5%
18878.38356         1      0.5%
181478.1654         1      0.5%
8798900.749         1      0.5%
7110558.993         1      0.5%
1408564.187         1      0.5%
1075079.574         1      0.5%
122274.0613         1      0.5%
10500163.9          1      0.5%
14943173            1      0.5%
2130076.475         1      0.5%
566336.5946         1      0.5%
1618808.294         1      0.5%
12046358.81         1      0.5%
3785438.086         1      0.5%
61549.18678         1      0.5%
6616461.442         1      0.5%
8052395.155         1      0.5%
1982852.905         1      0.5%
Other values (183)  183    88.0%

Smallest 10 values

Value   Count  Frequency (%)
28      1      0.5%
30      1      0.5%
36      1      0.5%
39      1      0.5%
52      1      0.5%
61      1      0.5%
107     1      0.5%
126     1      0.5%
143     1      0.5%
171.84  1      0.5%

Largest 10 values

Value        Count  Frequency (%)
16135672.56  1      0.5%
15891623.7   1      0.5%
15650347.6   1      0.5%
15412377.3   1      0.5%
15176905.45  1      0.5%
14943173     1      0.5%
14707088.59  1      0.5%
14470478.93  1      0.5%
14234930.73  1      0.5%
14003488.58  1      0.5%

Recoveries
Real number (ℝ≥0)

ZEROS

Distinct count: 63
Unique (%): 30.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9523.95673076923
Minimum: 0
Maximum: 98334
Zeros: 146
Zeros (%): 70.2%
Memory size: 960.0 B

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 237.5
95th percentile: 69576.55
Maximum: 98334
Range: 98334
Interquartile range (IQR): 237.5

Descriptive statistics

Standard deviation: 22778.61355
Coefficient of variation (CV): 2.391717455
Kurtosis: 4.999526121
Mean: 9523.956731
Median Absolute Deviation (MAD): 0
Skewness: 2.473596109
Sum: 1980983
Variance: 518865235.4
Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
0                  146    70.2%
52                 1      0.5%
107                1      0.5%
3946               1      0.5%
36711              1      0.5%
22886              1      0.5%
1124               1      0.5%
23394              1      0.5%
91499              1      0.5%
852                1      0.5%
4683               1      0.5%
12583              1      0.5%
61                 1      0.5%
55865              1      0.5%
2616               1      0.5%
87256              1      0.5%
10865              1      0.5%
39                 1      0.5%
36                 1      0.5%
45602              1      0.5%
30                 1      0.5%
28                 1      0.5%
60694              1      0.5%
48228              1      0.5%
14352              1      0.5%
Other values (38)  38     18.3%

Smallest 10 values

Value  Count  Frequency (%)
0      146    70.2%
28     1      0.5%
30     1      0.5%
36     1      0.5%
39     1      0.5%
52     1      0.5%
61     1      0.5%
107    1      0.5%
126    1      0.5%
143    1      0.5%

Largest 10 values

Value  Count  Frequency (%)
98334  1      0.5%
97704  1      0.5%
91499  1      0.5%
87256  1      0.5%
84854  1      0.5%
83207  1      0.5%
80840  1      0.5%
78088  1      0.5%
76034  1      0.5%
72624  1      0.5%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
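
As a minimal sketch of that calculation, the function below divides the covariance by the product of the standard deviations using only the Python standard library. The name `pearson_r` and the sample values are illustrative assumptions and are not taken from the profiled dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up illustrative values.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))  # 0.8
# Shifting and rescaling either variable (by a positive factor) leaves r unchanged.
print(pearson_r([10 * x + 7 for x in xs], [0.5 * y - 3 for y in ys]))
```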

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
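
The rank-based calculation can be sketched as follows: replace each value by its rank, then apply the Pearson formula to the ranks. Giving tied values the average of their ranks is a common convention that is assumed here, since the report does not say how ties are handled. The function names and sample values are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the rank variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# A cubic relationship is nonlinear but perfectly monotonic, so rho is 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # 1.0
```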

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
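
A direct sketch of that pair-counting definition is shown below. It implements the variant usually called tau-a, in which tied pairs count as neither concordant nor discordant but still appear in the denominator. Whether the profiler uses this variant or a tie-corrected one is not stated in the report, so treat the choice as an assumption. The function name and sample values are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y; the pair is counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up values: 5 concordant pairs and 1 discordant pair out of 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop visits every pair, so the cost grows quadratically with the number of observations. That is fine for a dataset of this size (208 rows) but slow for large ones.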

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

   df_index    Predicted Recoveries  Recoveries
0  2020-01-22  28.00                 28
1  2020-01-23  30.00                 30
2  2020-01-24  36.00                 36
3  2020-01-25  39.00                 39
4  2020-01-26  52.00                 52
5  2020-01-27  61.00                 61
6  2020-01-28  107.00                107
7  2020-01-29  126.00                126
8  2020-01-30  143.00                143
9  2020-01-31  171.84                222

Last rows

     df_index    Predicted Recoveries  Recoveries
198  2020-08-07  1.400349e+07          0
199  2020-08-08  1.423493e+07          0
200  2020-08-09  1.447048e+07          0
201  2020-08-10  1.470709e+07          0
202  2020-08-11  1.494317e+07          0
203  2020-08-12  1.517691e+07          0
204  2020-08-13  1.541238e+07          0
205  2020-08-14  1.565035e+07          0
206  2020-08-15  1.589162e+07          0
207  2020-08-16  1.613567e+07          0