Dataset statistics
| Number of variables | 3 |
|---|---|
| Number of observations | 208 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 4.2 KiB |
| Average record size in memory | 20.6 B |
Variable types
| NUM | 2 |
|---|---|
| DATE | 1 |
Reproduction
| Analysis started | 2020-08-18 00:53:31.669443 |
|---|---|
| Analysis finished | 2020-08-18 00:53:36.822824 |
| Duration | 5.15 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
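The same kind of report can also be produced from Python instead of the command line. The following is a minimal sketch, not the configuration used for this run: the helper name `build_report`, the report title, the output path and the assumption that the CSV holds a date column called `df_index` are all illustrative.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile a CSV file and write the HTML report to output_path."""
    # Imported lazily so that defining this helper does not require the
    # third-party packages to be installed.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Assumption: the CSV has a date column named "df_index".
    df = pd.read_csv(csv_path, parse_dates=["df_index"])

    # The title is illustrative; default settings are used otherwise.
    profile = ProfileReport(df, title="Recoveries profile")
    profile.to_file(output_path)
    return output_path
```

Calling `build_report("YOUR_FILE.csv")` would write `report.html` next to the script, provided pandas and pandas-profiling are installed.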
Warnings
| df_index has unique values | Unique |
|---|---|
| Predicted Recoveries has unique values | Unique |
| Recoveries has 146 (70.2%) zeros | Zeros |
df_index
| Distinct count | 208 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.8 KiB |
| Minimum | 2020-01-22 00:00:00 |
| Maximum | 2020-08-16 00:00:00 |
Histogram
Predicted Recoveries
| Distinct count | 208 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3908618.776227494 |
| Minimum | 28.0 |
| Maximum | 16135672.557126561 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 1.8 KiB |
Quantile statistics
| Minimum | 28 |
|---|---|
| 5-th percentile | 223.6087156 |
| Q1 | 70721.85654 |
| median | 1945895.912 |
| Q3 | 6646414.435 |
| 95-th percentile | 13697659.64 |
| Maximum | 16135672.56 |
| Range | 16135644.56 |
| Interquartile range (IQR) | 6575692.578 |
Descriptive statistics
| Standard deviation | 4610091.863 |
|---|---|
| Coefficient of variation (CV) | 1.179468279 |
| Kurtosis | -0.02410527068 |
| Mean | 3908618.776 |
| Median Absolute Deviation (MAD) | 1935489.208 |
| Skewness | 1.066448496 |
| Sum | 812992705.5 |
| Variance | 2.125294698e+13 |
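Several of the figures above are derived from others in the same tables, so they can be cross-checked with simple arithmetic. The sketch below recomputes them from the rounded values printed in this report (the variable names are illustrative), which is why the comparison uses a small relative tolerance.

```python
import math

# Values copied from the Predicted Recoveries tables above.
n = 208
total = 812992705.5
std = 4610091.863
minimum, maximum = 28.0, 16135672.56
q1, q3 = 70721.85654, 6646414.435

mean = total / n
derived = {
    "mean": mean,                  # sum divided by number of observations
    "cv": std / mean,              # coefficient of variation
    "variance": std ** 2,          # square of the standard deviation
    "range": maximum - minimum,
    "iqr": q3 - q1,                # interquartile range
}

# Figures as printed in the report.
reported = {
    "mean": 3908618.776,
    "cv": 1.179468279,
    "variance": 2.125294698e13,
    "range": 16135644.56,
    "iqr": 6575692.578,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.10g} reported={reported[name]:.10g} match={ok}")
```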
Histogram with fixed size bins (bins=10)
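As a rough illustration of what "fixed size bins" means, the sketch below splits the observed range of Predicted Recoveries into 10 equal-width intervals and assigns a value to one of them. It assumes the bins span exactly the reported minimum and maximum; the function names are illustrative and this is not the plotting code used by the report.

```python
def fixed_width_edges(minimum, maximum, bins=10):
    """Return the bins + 1 edges of equal-width bins over [minimum, maximum]."""
    width = (maximum - minimum) / bins
    return [minimum + i * width for i in range(bins + 1)]


def bin_index(value, minimum, maximum, bins=10):
    """Return the 0-based bin a value falls into; the maximum goes in the last bin."""
    if not minimum <= value <= maximum:
        raise ValueError("value lies outside the binned range")
    width = (maximum - minimum) / bins
    return min(int((value - minimum) / width), bins - 1)


# Minimum and maximum as reported for Predicted Recoveries.
LOW, HIGH = 28.0, 16135672.56

edges = fixed_width_edges(LOW, HIGH)
print(f"bin width: {edges[1] - edges[0]:.3f}")
print("bin of the median:", bin_index(1945895.912, LOW, HIGH))
```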
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3130541.015 | 1 | 0.5% | |
| 1138837.074 | 1 | 0.5% | |
| 47865.56968 | 1 | 0.5% | |
| 32433.87226 | 1 | 0.5% | |
| 16926.54146 | 1 | 0.5% | |
| 4404448.151 | 1 | 0.5% | |
| 35292.3312 | 1 | 0.5% | |
| 18878.38356 | 1 | 0.5% | |
| 181478.1654 | 1 | 0.5% | |
| 8798900.749 | 1 | 0.5% | |
| 7110558.993 | 1 | 0.5% | |
| 1408564.187 | 1 | 0.5% | |
| 1075079.574 | 1 | 0.5% | |
| 122274.0613 | 1 | 0.5% | |
| 10500163.9 | 1 | 0.5% | |
| 14943173 | 1 | 0.5% | |
| 2130076.475 | 1 | 0.5% | |
| 566336.5946 | 1 | 0.5% | |
| 1618808.294 | 1 | 0.5% | |
| 12046358.81 | 1 | 0.5% | |
| 3785438.086 | 1 | 0.5% | |
| 61549.18678 | 1 | 0.5% | |
| 6616461.442 | 1 | 0.5% | |
| 8052395.155 | 1 | 0.5% | |
| 1982852.905 | 1 | 0.5% | |
| Other values (183) | 183 | 88.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 28 | 1 | 0.5% | |
| 30 | 1 | 0.5% | |
| 36 | 1 | 0.5% | |
| 39 | 1 | 0.5% | |
| 52 | 1 | 0.5% | |
| 61 | 1 | 0.5% | |
| 107 | 1 | 0.5% | |
| 126 | 1 | 0.5% | |
| 143 | 1 | 0.5% | |
| 171.84 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 16135672.56 | 1 | 0.5% | |
| 15891623.7 | 1 | 0.5% | |
| 15650347.6 | 1 | 0.5% | |
| 15412377.3 | 1 | 0.5% | |
| 15176905.45 | 1 | 0.5% | |
| 14943173 | 1 | 0.5% | |
| 14707088.59 | 1 | 0.5% | |
| 14470478.93 | 1 | 0.5% | |
| 14234930.73 | 1 | 0.5% | |
| 14003488.58 | 1 | 0.5% |
Recoveries
| Distinct count | 63 |
|---|---|
| Unique (%) | 30.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9523.95673076923 |
| Minimum | 0 |
| Maximum | 98334 |
| Zeros | 146 |
| Zeros (%) | 70.2% |
| Memory size | 960.0 B |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 237.5 |
| 95-th percentile | 69576.55 |
| Maximum | 98334 |
| Range | 98334 |
| Interquartile range (IQR) | 237.5 |
Descriptive statistics
| Standard deviation | 22778.61355 |
|---|---|
| Coefficient of variation (CV) | 2.391717455 |
| Kurtosis | 4.999526121 |
| Mean | 9523.956731 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.473596109 |
| Sum | 1980983 |
| Variance | 518865235.4 |
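The zero median and zero MAD follow directly from the share of zeros: once more than half of the observations are 0, the median is 0, and so is the median of the absolute deviations from it. The sketch below reproduces the reported percentages from the counts above and then shows the effect on a small hypothetical sample (the toy values are for illustration only and are not the dataset).

```python
import statistics

# Counts copied from the Recoveries tables above.
n, zeros, distinct = 208, 146, 63
print(f"zeros:  {zeros / n:.1%}")
print(f"unique: {distinct / n:.1%}")

# Hypothetical toy sample with the same character: 70% zeros.
toy = [0] * 7 + [28, 30, 36]
median = statistics.median(toy)
mad = statistics.median(abs(x - median) for x in toy)
print("toy median:", median, "toy MAD:", mad)
```

For a column like this, mean-based summaries such as the mean and the coefficient of variation are driven by the minority of non-zero rows.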
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 146 | 70.2% | |
| 52 | 1 | 0.5% | |
| 107 | 1 | 0.5% | |
| 3946 | 1 | 0.5% | |
| 36711 | 1 | 0.5% | |
| 22886 | 1 | 0.5% | |
| 1124 | 1 | 0.5% | |
| 23394 | 1 | 0.5% | |
| 91499 | 1 | 0.5% | |
| 852 | 1 | 0.5% | |
| 4683 | 1 | 0.5% | |
| 12583 | 1 | 0.5% | |
| 61 | 1 | 0.5% | |
| 55865 | 1 | 0.5% | |
| 2616 | 1 | 0.5% | |
| 87256 | 1 | 0.5% | |
| 10865 | 1 | 0.5% | |
| 39 | 1 | 0.5% | |
| 36 | 1 | 0.5% | |
| 45602 | 1 | 0.5% | |
| 30 | 1 | 0.5% | |
| 28 | 1 | 0.5% | |
| 60694 | 1 | 0.5% | |
| 48228 | 1 | 0.5% | |
| 14352 | 1 | 0.5% | |
| Other values (38) | 38 | 18.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 146 | 70.2% |
| 28 | 1 | 0.5% | |
| 30 | 1 | 0.5% | |
| 36 | 1 | 0.5% | |
| 39 | 1 | 0.5% | |
| 52 | 1 | 0.5% | |
| 61 | 1 | 0.5% | |
| 107 | 1 | 0.5% | |
| 126 | 1 | 0.5% | |
| 143 | 1 | 0.5% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 98334 | 1 | 0.5% | |
| 97704 | 1 | 0.5% | |
| 91499 | 1 | 0.5% | |
| 87256 | 1 | 0.5% | |
| 84854 | 1 | 0.5% | |
| 83207 | 1 | 0.5% | |
| 80840 | 1 | 0.5% | |
| 78088 | 1 | 0.5% | |
| 76034 | 1 | 0.5% | |
| 72624 | 1 | 0.5% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
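The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report uses; the function name and the sample data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)

# Shifting and rescaling one variable (by a positive factor) leaves r unchanged.
r_shifted = pearson_r(xs, [3 * y + 7 for y in ys])
print(round(r, 4), round(r_shifted, 4))
```

Rescaling by a negative factor flips the sign of r, which is why the invariance holds only for positive scale changes.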
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
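A minimal sketch of this calculation: replace each value by its rank (ties share the average rank), then apply the Pearson formula to the ranks. The helper names and the sample data are illustrative assumptions, not part of the report.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # monotonic but not linear
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

On this monotonic but nonlinear example ρ reaches its maximum while r stays below it, which is the difference the paragraph above describes.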
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
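The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name and data are illustrative, and statistical libraries often report the tie-corrected tau-b instead, so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(xs, ys)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, giving (5 - 1) / 6.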
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | df_index | Predicted Recoveries | Recoveries |
|---|---|---|---|
| 0 | 2020-01-22 | 28.00 | 28 |
| 1 | 2020-01-23 | 30.00 | 30 |
| 2 | 2020-01-24 | 36.00 | 36 |
| 3 | 2020-01-25 | 39.00 | 39 |
| 4 | 2020-01-26 | 52.00 | 52 |
| 5 | 2020-01-27 | 61.00 | 61 |
| 6 | 2020-01-28 | 107.00 | 107 |
| 7 | 2020-01-29 | 126.00 | 126 |
| 8 | 2020-01-30 | 143.00 | 143 |
| 9 | 2020-01-31 | 171.84 | 222 |
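To compare the two numeric columns row by row, one can compute the absolute and relative difference between the prediction and the recorded value. The sketch below does this for the ten rows shown above. Skipping rows whose Recoveries value is 0 is an assumption made for this example (it treats 0 as "no value recorded"); the report itself only flags those cells as zeros.

```python
# (df_index, Predicted Recoveries, Recoveries) copied from the first rows above.
first_rows = [
    ("2020-01-22", 28.00, 28),
    ("2020-01-23", 30.00, 30),
    ("2020-01-24", 36.00, 36),
    ("2020-01-25", 39.00, 39),
    ("2020-01-26", 52.00, 52),
    ("2020-01-27", 61.00, 61),
    ("2020-01-28", 107.00, 107),
    ("2020-01-29", 126.00, 126),
    ("2020-01-30", 143.00, 143),
    ("2020-01-31", 171.84, 222),
]

errors = [
    (date, abs(predicted - actual), abs(predicted - actual) / actual)
    for date, predicted, actual in first_rows
    if actual != 0  # assumption: a zero means no actual value was recorded
]

worst_date, worst_abs, worst_rel = max(errors, key=lambda e: e[1])
print(worst_date, round(worst_abs, 2), f"{worst_rel:.1%}")
```

In these ten rows the two columns agree exactly until 2020-01-31, the first date on which the prediction and the recorded value diverge.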
Last rows
| | df_index | Predicted Recoveries | Recoveries |
|---|---|---|---|
| 198 | 2020-08-07 | 1.400349e+07 | 0 |
| 199 | 2020-08-08 | 1.423493e+07 | 0 |
| 200 | 2020-08-09 | 1.447048e+07 | 0 |
| 201 | 2020-08-10 | 1.470709e+07 | 0 |
| 202 | 2020-08-11 | 1.494317e+07 | 0 |
| 203 | 2020-08-12 | 1.517691e+07 | 0 |
| 204 | 2020-08-13 | 1.541238e+07 | 0 |
| 205 | 2020-08-14 | 1.565035e+07 | 0 |
| 206 | 2020-08-15 | 1.589162e+07 | 0 |
| 207 | 2020-08-16 | 1.613567e+07 | 0 |