Genomic Prediction
Last updated
Genomic Prediction interface
Choose BED: Click to select a BED file containing genotype data for genomic prediction.
Choose phenotype: Click to select a file with phenotype data that will be used in the genomic prediction analysis.
Algorithm dropdown: Select the algorithm for genomic prediction.
Run button: Initiate the genomic prediction process with the selected BED file, phenotype data, and algorithm.
A regression plot with correlation analysis is used in genomic prediction to assess the accuracy of prediction models. The plot compares predicted values against observed values, where a strong linear relationship indicates high predictive ability. The correlation coefficient quantifies this relationship, with values close to 1.0 signaling strong predictive accuracy.
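As a rough illustration of the numbers behind such a plot, the sketch below computes a Pearson correlation coefficient and a least-squares regression line from paired observed and predicted values. The function names `pearson_r` and `fit_line` and the example values are hypothetical, and placing the observed values on the x-axis is an assumption; the page does not say how the tool computes or orients its plot.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Made-up example values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(observed, predicted)
slope, intercept = fit_line(observed, predicted)  # observed on x (assumption)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

A coefficient near 1.0 together with a slope near 1 and an intercept near 0 would correspond to predictions that track the observed values closely.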
Data points (blue dots): Each blue dot represents a pair of predicted and real values for an individual instance. The x-coordinate of a dot is the mean of the predicted and real values for that instance, and the y-coordinate is their difference (Predicted_value - Real_value). This arrangement shows how much each prediction deviates from the real value and whether this deviation is consistent across different levels of measurement.
Mean difference (red dashed line): This line represents the average difference between the predicted and actual values. In a perfect prediction scenario, this line would be at zero, indicating no difference between predicted and real values. The position of this line (above or below zero) can indicate a systematic bias in the predictions; for instance, if it is above zero, the predictions are generally higher than the actual values.
Limits of agreement (green dashed lines): These lines mark the range within which most differences between predicted and actual values lie. They are calculated as the mean difference plus and minus 1.96 times the standard deviation of the differences. This assumes that the differences are normally distributed, so roughly 95% of the data points should lie between these lines. The farther apart the two lines are, the larger the variability in the differences.
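The quantities described for this plot (a layout commonly known as a Bland-Altman plot) can be sketched as follows. The function name `agreement_stats` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption; the page does not specify which estimator the tool uses.

```python
import math


def agreement_stats(predicted, real):
    """Return per-point means and differences, the mean difference,
    and the lower and upper limits of agreement."""
    n = len(predicted)
    if n != len(real) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    # x-coordinates: mean of predicted and real value for each instance
    means = [(p + r) / 2 for p, r in zip(predicted, real)]
    # y-coordinates: Predicted_value - Real_value
    diffs = [p - r for p, r in zip(predicted, real)]
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the differences (assumption: n - 1)
    sd = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    lower = mean_diff - 1.96 * sd
    upper = mean_diff + 1.96 * sd
    return means, diffs, mean_diff, lower, upper


# Made-up example values, for illustration only.
means, diffs, mean_diff, lower, upper = agreement_stats(
    predicted=[1.0, 2.0, 3.0], real=[1.0, 1.0, 1.0]
)
print(mean_diff, lower, upper)
```

In this toy input the differences are 0, 1, and 2, so the mean difference is 1 and the predictions are on average higher than the real values.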
5837 5837 1
6009 6009 1
6898 6898 1
6900 6900 0
6901 6901 0
6903 6903 1
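The rows above look like a whitespace-separated phenotype listing with three columns. The sketch below reads such rows under the assumption, which this page does not confirm, that the columns are a family ID, an individual ID, and a phenotype value, as in PLINK-style phenotype files. The function name `parse_phenotypes` is hypothetical.

```python
def parse_phenotypes(text):
    """Map (family_id, individual_id) to a numeric phenotype value.

    Assumes three whitespace-separated columns per row; the column
    meanings are an assumption, not documented on this page.
    """
    phenotypes = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, got {len(fields)}"
            )
        family_id, individual_id, value = fields
        phenotypes[(family_id, individual_id)] = float(value)
    return phenotypes


sample = """5837 5837 1
6009 6009 1
6898 6898 1
6900 6900 0
6901 6901 0
6903 6903 1
"""

phenotypes = parse_phenotypes(sample)
print(len(phenotypes), "individuals loaded")
```

IDs are kept as strings so that they can be matched against the sample IDs of the genotype data without any numeric conversion.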