Dataset statistics
Number of variables | 3 |
---|---|
Number of observations | 208 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 4.2 KiB |
Average record size in memory | 20.6 B |
Variable types
NUM | 2 |
---|---|
DATE | 1 |
Reproduction
Analysis started | 2020-08-18 00:53:31.669443 |
---|---|
Analysis finished | 2020-08-18 00:53:36.822824 |
Duration | 5.15 seconds |
Version | pandas-profiling v2.8.0 |
Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Download configuration | config.yaml |
Warnings
df_index has unique values | Unique |
Predicted Recoveries has unique values | Unique |
Recoveries has 146 (70.2%) zeros | Zeros |
df_index
Distinct count | 208 |
---|---|
Unique (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 1.8 KiB |
Minimum | 2020-01-22 00:00:00 |
---|---|
Maximum | 2020-08-16 00:00:00 |
Histogram
Predicted Recoveries
Distinct count | 208 |
---|---|
Unique (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 3908618.776227494 |
---|---|
Minimum | 28.0 |
Maximum | 16135672.557126561 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 1.8 KiB |
Quantile statistics
Minimum | 28 |
---|---|
5-th percentile | 223.6087156 |
Q1 | 70721.85654 |
median | 1945895.912 |
Q3 | 6646414.435 |
95-th percentile | 13697659.64 |
Maximum | 16135672.56 |
Range | 16135644.56 |
Interquartile range (IQR) | 6575692.578 |
Descriptive statistics
Standard deviation | 4610091.863 |
---|---|
Coefficient of variation (CV) | 1.179468279 |
Kurtosis | -0.02410527068 |
Mean | 3908618.776 |
Median Absolute Deviation (MAD) | 1935489.208 |
Skewness | 1.066448496 |
Sum | 812992705.5 |
Variance | 2.125294698e+13 |
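The derived figures in the quantile and descriptive statistics above follow from a few simple formulas. As a rough cross-check, the sketch below recomputes them from the reported values only; the variable names are illustrative and the raw data is not needed.

```python
# Values copied from the statistics tables above (no raw data involved).
n = 208                      # number of observations
total = 812992705.5          # Sum
std = 4610091.863            # Standard deviation
q1, q3 = 70721.85654, 6646414.435
minimum, maximum = 28.0, 16135672.56

mean = total / n             # Mean = Sum / number of observations
cv = std / mean              # Coefficient of variation = std / mean
value_range = maximum - minimum
iqr = q3 - q1                # Interquartile range = Q3 - Q1
variance = std ** 2          # Variance = std squared

print(f"mean     = {mean:.3f}")
print(f"CV       = {cv:.9f}")
print(f"range    = {value_range:.2f}")
print(f"IQR      = {iqr:.3f}")
print(f"variance = {variance:.9e}")
```

A CV above 1 means the standard deviation exceeds the mean, which is consistent with the positive skewness reported for this variable.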
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%) | |
3130541.015 | 1 | 0.5% | |
1138837.074 | 1 | 0.5% | |
47865.56968 | 1 | 0.5% | |
32433.87226 | 1 | 0.5% | |
16926.54146 | 1 | 0.5% | |
4404448.151 | 1 | 0.5% | |
35292.3312 | 1 | 0.5% | |
18878.38356 | 1 | 0.5% | |
181478.1654 | 1 | 0.5% | |
8798900.749 | 1 | 0.5% | |
7110558.993 | 1 | 0.5% | |
1408564.187 | 1 | 0.5% | |
1075079.574 | 1 | 0.5% | |
122274.0613 | 1 | 0.5% | |
10500163.9 | 1 | 0.5% | |
14943173 | 1 | 0.5% | |
2130076.475 | 1 | 0.5% | |
566336.5946 | 1 | 0.5% | |
1618808.294 | 1 | 0.5% | |
12046358.81 | 1 | 0.5% | |
3785438.086 | 1 | 0.5% | |
61549.18678 | 1 | 0.5% | |
6616461.442 | 1 | 0.5% | |
8052395.155 | 1 | 0.5% | |
1982852.905 | 1 | 0.5% | |
Other values (183) | 183 | 88.0% |
Minimum 10 values
Value | Count | Frequency (%) | |
28 | 1 | 0.5% | |
30 | 1 | 0.5% | |
36 | 1 | 0.5% | |
39 | 1 | 0.5% | |
52 | 1 | 0.5% | |
61 | 1 | 0.5% | |
107 | 1 | 0.5% | |
126 | 1 | 0.5% | |
143 | 1 | 0.5% | |
171.84 | 1 | 0.5% |
Maximum 10 values
Value | Count | Frequency (%) | |
16135672.56 | 1 | 0.5% | |
15891623.7 | 1 | 0.5% | |
15650347.6 | 1 | 0.5% | |
15412377.3 | 1 | 0.5% | |
15176905.45 | 1 | 0.5% | |
14943173 | 1 | 0.5% | |
14707088.59 | 1 | 0.5% | |
14470478.93 | 1 | 0.5% | |
14234930.73 | 1 | 0.5% | |
14003488.58 | 1 | 0.5% |
Recoveries
Distinct count | 63 |
---|---|
Unique (%) | 30.3% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 9523.95673076923 |
---|---|
Minimum | 0 |
Maximum | 98334 |
Zeros | 146 |
Zeros (%) | 70.2% |
Memory size | 960.0 B |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
median | 0 |
Q3 | 237.5 |
95-th percentile | 69576.55 |
Maximum | 98334 |
Range | 98334 |
Interquartile range (IQR) | 237.5 |
Descriptive statistics
Standard deviation | 22778.61355 |
---|---|
Coefficient of variation (CV) | 2.391717455 |
Kurtosis | 4.999526121 |
Mean | 9523.956731 |
Median Absolute Deviation (MAD) | 0 |
Skewness | 2.473596109 |
Sum | 1980983 |
Variance | 518865235.4 |
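The median and the Median Absolute Deviation (MAD) of 0 reported above are a direct consequence of more than half of the observations being zero. The sketch below illustrates this with the reported counts; the 62 nonzero values are hypothetical placeholders, since only their number matters for this result.

```python
from statistics import median

n_zeros, n_total = 146, 208   # counts taken from the report

# Hypothetical stand-ins for the nonzero observations (assumption: any
# positive values would give the same median and MAD here).
nonzero = list(range(1, n_total - n_zeros + 1))
values = [0] * n_zeros + nonzero

med = median(values)                             # middle of the sorted values
mad = median([abs(v - med) for v in values])     # median of absolute deviations
zeros_pct = 100 * n_zeros / n_total

print(f"zeros: {zeros_pct:.1f}%  median: {med}  MAD: {mad}")
```

Because the MAD collapses to 0 for such zero-heavy columns, the standard deviation and the IQR are the more informative spread measures here.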
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%) | |
0 | 146 | 70.2% | |
52 | 1 | 0.5% | |
107 | 1 | 0.5% | |
3946 | 1 | 0.5% | |
36711 | 1 | 0.5% | |
22886 | 1 | 0.5% | |
1124 | 1 | 0.5% | |
23394 | 1 | 0.5% | |
91499 | 1 | 0.5% | |
852 | 1 | 0.5% | |
4683 | 1 | 0.5% | |
12583 | 1 | 0.5% | |
61 | 1 | 0.5% | |
55865 | 1 | 0.5% | |
2616 | 1 | 0.5% | |
87256 | 1 | 0.5% | |
10865 | 1 | 0.5% | |
39 | 1 | 0.5% | |
36 | 1 | 0.5% | |
45602 | 1 | 0.5% | |
30 | 1 | 0.5% | |
28 | 1 | 0.5% | |
60694 | 1 | 0.5% | |
48228 | 1 | 0.5% | |
14352 | 1 | 0.5% | |
Other values (38) | 38 | 18.3% |
Minimum 10 values
Value | Count | Frequency (%) | |
0 | 146 | 70.2% | |
28 | 1 | 0.5% | |
30 | 1 | 0.5% | |
36 | 1 | 0.5% | |
39 | 1 | 0.5% | |
52 | 1 | 0.5% | |
61 | 1 | 0.5% | |
107 | 1 | 0.5% | |
126 | 1 | 0.5% | |
143 | 1 | 0.5% |
Maximum 10 values
Value | Count | Frequency (%) | |
98334 | 1 | 0.5% | |
97704 | 1 | 0.5% | |
91499 | 1 | 0.5% | |
87256 | 1 | 0.5% | |
84854 | 1 | 0.5% | |
83207 | 1 | 0.5% | |
80840 | 1 | 0.5% | |
78088 | 1 | 0.5% | |
76034 | 1 | 0.5% | |
72624 | 1 | 0.5% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
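The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the profiling tool runs; `pearson_r` and the sample data are made up for the example.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y_up = [2 * v + 10 for v in x]    # increasing linear function of x
y_down = [-3 * v for v in x]      # decreasing linear function of x

print(pearson_r(x, y_up))
print(pearson_r(x, y_down))
```

The two linear functions differ in slope and offset, yet the magnitude of r is the same for both, which is the location and scale invariance mentioned above.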
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
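As a minimal sketch of that recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. The helper names are illustrative, and assigning tied values their average rank is an assumption made for this example.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear

print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

For the cubic relationship, the rank-based coefficient treats the association as perfect while Pearson's r falls below 1, which illustrates the difference between monotonic and linear correlation.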
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
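The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; `kendall_tau_a` is an illustrative name, and statistical libraries often default to a tie-corrected variant (tau-b), so results can differ when the data contains ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie and counts as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # one swapped pair out of six

print(kendall_tau_a(x, y))
```

With four observations there are six pairs; five are concordant and one is discordant, so the result is (5 - 1) / 6.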
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
First rows
df_index | Predicted Recoveries | Recoveries | |
---|---|---|---|
0 | 2020-01-22 | 28.00 | 28 |
1 | 2020-01-23 | 30.00 | 30 |
2 | 2020-01-24 | 36.00 | 36 |
3 | 2020-01-25 | 39.00 | 39 |
4 | 2020-01-26 | 52.00 | 52 |
5 | 2020-01-27 | 61.00 | 61 |
6 | 2020-01-28 | 107.00 | 107 |
7 | 2020-01-29 | 126.00 | 126 |
8 | 2020-01-30 | 143.00 | 143 |
9 | 2020-01-31 | 171.84 | 222 |
Last rows
df_index | Predicted Recoveries | Recoveries | |
---|---|---|---|
198 | 2020-08-07 | 1.400349e+07 | 0 |
199 | 2020-08-08 | 1.423493e+07 | 0 |
200 | 2020-08-09 | 1.447048e+07 | 0 |
201 | 2020-08-10 | 1.470709e+07 | 0 |
202 | 2020-08-11 | 1.494317e+07 | 0 |
203 | 2020-08-12 | 1.517691e+07 | 0 |
204 | 2020-08-13 | 1.541238e+07 | 0 |
205 | 2020-08-14 | 1.565035e+07 | 0 |
206 | 2020-08-15 | 1.589162e+07 | 0 |
207 | 2020-08-16 | 1.613567e+07 | 0 |