Pandas Profiling Report

Dataset statistics

Number of variables	3
Number of observations	208
Missing cells	146
Missing cells (%)	23.4%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	5.0 KiB
Average record size in memory	24.6 B
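
The percentage and per-record figures above follow from the raw counts. The short check below is only an illustration, not part of the generated report; the variable names are made up for this sketch, and the byte total is taken from the already rounded "5.0 KiB" figure, so the last digit may differ slightly from the tool's internal value.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 3
n_observations = 208
missing_cells = 146
total_size_bytes = 5.0 * 1024  # "5.0 KiB", already rounded in the report

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average memory footprint of one row.
avg_record_bytes = total_size_bytes / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```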

Variable types

NUM	2
DATE	1

Reproduction

Analysis started	2020-08-18 00:53:46.889499
Analysis finished	2020-08-18 00:53:50.186742
Duration	3.3 seconds
Version	pandas-profiling v2.8.0
Command line	`pandas_profiling --config_file config.yaml [YOUR_FILE.csv]`
Download configuration	config.yaml

Warnings

`Recoveries` has 146 (70.2%) missing values	Missing
`df_index` has unique values	Unique
`Predicted Recoveries` has 9 (4.3%) zeros	Zeros
`Recoveries` has 23 (11.1%) zeros	Zeros

df_index
Date

UNIQUE

Distinct count	208
Unique (%)	100.0%
Missing	0
Missing (%)	0.0%
Memory size	1.8 KiB

Minimum	2020-01-22 00:00:00
Maximum	2020-08-16 00:00:00

Histogram

Predicted Recoveries
Real number (ℝ_≥0)

ZEROS

Distinct count	200
Unique (%)	96.2%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%

Mean	1059029.6129939533
Minimum	0.0
Maximum	4164682.9299011193
Zeros	9
Zeros (%)	4.3%
Memory size	1.8 KiB

Quantile statistics

Minimum	0
5-th percentile	0.18079005
Q1	34.41636553
median	596634.8526
Q3	1852301.542
95-th percentile	3548071.754
Maximum	4164682.93
Range	4164682.93
Interquartile range (IQR)	1852267.126

Descriptive statistics

Standard deviation	1204392.14
Coefficient of variation (CV)	1.137260116
Kurtosis	-0.2883747944
Mean	1059029.613
Median Absolute Deviation (MAD)	596630.5754
Skewness	0.9170950852
Sum	220278159.5
Variance	1.450560427e+12
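
Several of the figures above are derived from one another. As a rough consistency check (an illustration only, using the rounded values printed in this report, so agreement is limited to the printed precision), the relationships can be sketched as:

```python
# Rounded figures copied from the tables above for "Predicted Recoveries".
minimum = 0.0
maximum = 4164682.93
q1 = 34.41636553
q3 = 1852301.542
mean = 1059029.613
std = 1204392.14

# Range is the spread between the extremes.
value_range = maximum - minimum

# Interquartile range is the spread of the middle 50% of the data.
iqr = q3 - q1

# Coefficient of variation is the standard deviation relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

print(f"Range: {value_range:.2f}")
print(f"IQR: {iqr:.3f}")
print(f"CV: {cv:.9f}")
print(f"Variance: {variance:.9e}")
```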

Histogram with fixed size bins (bins=10)

Common values

Value	Count	Frequency (%)
0	9	4.3%
1036824.947	1	0.5%
454.2400848	1	0.5%
167.1978957	1	0.5%
161563.1322	1	0.5%
9.77493558	1	0.5%
2430537.93	1	0.5%
12.211249	1	0.5%
5.788534164	1	0.5%
1425749.206	1	0.5%
41325.44024	1	0.5%
1061974.901	1	0.5%
1803096.364	1	0.5%
504022.112	1	0.5%
787.4692494	1	0.5%
10.14069009	1	0.5%
1309058.45	1	0.5%
3509548.854	1	0.5%
3689515.546	1	0.5%
1846762.308	1	0.5%
3930409.207	1	0.5%
2152336.101	1	0.5%
1087297.548	1	0.5%
13093.04521	1	0.5%
2392486.603	1	0.5%
Other values (175)	175	84.1%

Minimum values

Value	Count	Frequency (%)
0	9	4.3%
0.07	1	0.5%
0.1351	1	0.5%
0.265643	1	0.5%
0.38704799	1	0.5%
0.7099546307	1	0.5%
1.010257807	1	0.5%
1.28953976	1	0.5%
1.549271977	1	0.5%
1.790822939	1	0.5%

Maximum values

Value	Count	Frequency (%)
4164682.93	1	0.5%
4106193.634	1	0.5%
4047680.532	1	0.5%
3989256.163	1	0.5%
3930409.207	1	0.5%
3871463.879	1	0.5%
3811496.601	1	0.5%
3750596.947	1	0.5%
3689515.546	1	0.5%
3628886.554	1	0.5%

Recoveries
Real number (ℝ_≥0)

MISSING
ZEROS

Distinct count	9
Unique (%)	14.5%
Missing	146
Missing (%)	70.2%
Infinite	0
Infinite (%)	0.0%

Mean	6.887096774193548
Minimum	0.0
Maximum	178.0
Zeros	23
Zeros (%)	11.1%
Memory size	1.8 KiB

Quantile statistics

Minimum	0
5-th percentile	0
Q1	0
median	3
Q3	7
95-th percentile	12
Maximum	178
Range	178
Interquartile range (IQR)	7

Descriptive statistics

Standard deviation	22.49962101
Coefficient of variation (CV)	3.266923894
Kurtosis	57.36243821
Mean	6.887096774
Median Absolute Deviation (MAD)	3
Skewness	7.442931096
Sum	427
Variance	506.2329455

Histogram with fixed size bins (bins=10)

Common values

Value	Count	Frequency (%)
0	23	11.1%
3	12	5.8%
7	11	5.3%
12	4	1.9%
5	4	1.9%
6	3	1.4%
17	2	1.0%
8	2	1.0%
178	1	0.5%
(Missing)	146	70.2%
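
Because `Recoveries` has only nine distinct values, its summary statistics can be rebuilt from the value counts above. The sketch below uses Python's standard `statistics` module rather than the profiling tool itself and assumes the sample (n - 1) convention for variance and standard deviation, which is what reproduces the reported figures; the variable names are illustrative.

```python
import statistics

# Value -> count, copied from the table above (the 146 missing rows are excluded).
value_counts = {0: 23, 3: 12, 7: 11, 12: 4, 5: 4, 6: 3, 17: 2, 8: 2, 178: 1}

# Expand the counts back into the list of observed values.
values = [value for value, count in value_counts.items() for _ in range(count)]

n = len(values)
total = sum(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)          # sample standard deviation
cv = std / mean

# Median absolute deviation: median distance of each value from the median.
mad = statistics.median([abs(value - median) for value in values])

print(f"n={n} sum={total} mean={mean:.9f} median={median}")
print(f"variance={variance:.7f} std={std:.8f} cv={cv:.9f} mad={mad}")
```

The single value of 178 accounts for most of the variance, which is consistent with the high skewness and kurtosis reported for this variable.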

Minimum values

Value	Count	Frequency (%)
0	23	11.1%
3	12	5.8%
5	4	1.9%
6	3	1.4%
7	11	5.3%
8	2	1.0%
12	4	1.9%
17	2	1.0%
178	1	0.5%

Maximum values

Value	Count	Frequency (%)
178	1	0.5%
17	2	1.0%
12	4	1.9%
8	2	1.0%
7	11	5.3%
6	3	1.4%
5	4	1.9%
3	12	5.8%
0	23	11.1%

Predicted Recoveries
Recoveries

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
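
As a minimal sketch of this calculation in plain Python (the function name `pearson_r` and the sample data are made up for illustration and are not taken from the report or from any library):

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)

# Shifting and rescaling y (by a positive factor) leaves r unchanged.
r_shifted = pearson_r(x, [10 * v + 7 for v in y])

print(f"r = {r:.4f}, after rescaling y: {r_shifted:.4f}")
```

The n - 1 factors cancel in the ratio, so using population rather than sample formulas would give the same r.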

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
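
In other words, ρ is Pearson's r applied to ranks. A minimal sketch, assuming tied values receive the average of their positions (the helper names and the sample data are hypothetical):

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

On this data ρ reaches its maximum because the ordering of x and y agrees exactly, while r stays below 1 because the relationship is not a straight line.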

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
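
The pair counting can be sketched directly from this description. The function below implements the formula exactly as stated, with no correction for ties; library implementations commonly apply a tie correction (the tau-b variant), so results can differ when ties are present. The function name and the sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, without tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A product of zero means a tie, counted as neither.
    return (concordant - discordant) / len(pairs)

# Hypothetical data with one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau(x, y)
print(f"tau = {tau:.4f}")
```

Here five of the six pairs are concordant and one is discordant, giving τ = (5 - 1) / 6.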

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

First rows

	df_index	Predicted Recoveries
0	2020-01-22	0.00
1	2020-01-23	0.00
2	2020-01-24	0.00
3	2020-01-25	0.00
4	2020-01-26	0.00
5	2020-01-27	0.00
6	2020-01-28	0.00
7	2020-01-29	0.00
8	2020-01-30	0.00
9	2020-01-31	0.07

Last rows

	df_index	Predicted Recoveries	Recoveries
198	2020-08-07	3.628887e+06	NaN
199	2020-08-08	3.689516e+06	NaN
200	2020-08-09	3.750597e+06	NaN
201	2020-08-10	3.811497e+06	NaN
202	2020-08-11	3.871464e+06	NaN
203	2020-08-12	3.930409e+06	NaN
204	2020-08-13	3.989256e+06	NaN
205	2020-08-14	4.047681e+06	NaN
206	2020-08-15	4.106194e+06	NaN
207	2020-08-16	4.164683e+06	NaN
