Define the probability of an outcome as the proportion of times the outcome would occur if we observed the random process that gives rise to it an infinite number of times
P(A and B) = 0Define complementary outcomes as mutually exclusive outcomes of the same random process whose probabilities add up to 1: if A and B are complementary, then P(A) + P(B) = 1
P(A or B) = P(A) + P(B) − P(A and B)P (A or B) = P (A) + P (B) (since for mutually exclusive events P(A and B) = 0)P(A and B) = P(A) × P(B)P(A and B) = P(A|B) × P(B)P(A|B) = P(A and B) / P(B) (thus, P(A and B) = P(A|B) * P(B) (see above))
Z = (X − μ) / σ
Z(μ) = 0pnorm(X, μ, σ) to detect the percentile (i.e., the probability that any event from this outcome lies below event X)qnorm(P, μ, σ) to detect the value of event X, which is the upper border for all the events from this outcome, which occur with the probability Plower.tail optionn, is fixed.p, is the same for each trial.rbinom(n, SIZE, p) (compare to rnorm(SIZE, μ, σ))k successes in n: choose(n, k)pbinom(k-1, n, p, lower.tail = F) or sum(dbinom(k:n, n, p))μ = n * p and its standard deviation σ = sqrt(n * p * (1 − p)) (see bitono function)bitono function)SE = σ / sqrt(n) where σ is the population standard deviation
σ is not known (almost always), the standard error SE can be estimated using the sample standard deviation s, so that SE = s / sqrt(n)σ or s) and standard error (SE): standard deviation measures the variability in the data, while standard error measures the variability in point estimates (aka sample statistics) from different samples of the same size and from the same population, i.e. measures the sampling variabilityx_bar ~ N (mean = μ, SE = σ / sqrt(n)point_estimate ± z⋆ * SE (note that z⋆ is always positive ; see SI.confidence_intreval function)
x_bar ± z⋆ * σ / sqrt(n)Define margin of error as the distance required to travel in either direction away from the point estimate when constructing a confidence interval, i.e. z⋆ * σ / sqrt(n)
Finally, interpret a confidence interval as “We are XX% confident that the true population parameter is in this interval”, where XX% is the desired confidence level
μ) and not the sample statistics (e.g. sample mean, x_bar). Note that the population parameter is unknown while the sample statistic is measured using the observed data and hence there is no point in hypothesizing about it.p−value = P(observed or more extreme sample statistic | H0 true)Z = (point estimate − null value) / SE or Z = (x_bar - μ) / SE (see SI.z.score and SI.pvalue functions)
- Formulate the framework for statistical inference using hypothesis testing and nearly normal point estimates:
- Set up the hypotheses first in plain language and then using appropriate notation
- Identify the appropriate sample statistic that can be used as a point estimate for the parameter of interest
- Verify that the conditions for the CLT hold (if the conditions necessary for the CLT to hold are not met, note this and do not go forward with the analysis)
- Compute the SE, sketch the sampling distribution, and shade area(
s) representing thep-value- Using the sketch and the normal model, calculate the p-value and determine if the null hypothesis should be rejected or not, and state your conclusion in context of the data and the research question
SI.bootSI.boot.confidence_interval):
x_boot ± z * SE_bootSE = sqrt( (sd_1^2 / num_1) + (sd_2^2 / num_2) ) and use this standard error in hypothesis testing and confidence intervals comparing means of independent groupsby)SI.t.score, SI.t.confidence_interval and SI.t.pvalue)df = n−1 for inference for a population mean using data from a small sample: CI: x_bar ± t_df * SE, HT: T_df = (x_bar − μ) / SE, where SE = sd / sqrt(n) (see function SI.t.confidence_interval and SI.t.pvalue)df = min(n_1 - 1, n_2 − 1) for inference for difference between means of two population means using data from two small samples, where SE = sqrt( (sd_1^2 / n_1) + (sd_2^2 / n_2) )s_pooled = sqrt ( (s_1^2 * (n_1 - 1) + s_2^2 * (n_2 -1)) / (n_1 + n_2 -2) )pt(T, df = n - 1) (e.g. pt(1.75, 19, lower.tail = F))qt((1 - (1 - CI)/2), df = n - 1)H0: μ1=μ2=...=μk and HA: At least one mean is differentSI.anova functionα⋆ = α/K, where K is the number of comparisons being considered, K = k * (k - 1) / 2, where k is the number of groups; see combn function) to combat inflating this error rateSI.anova.pairwise function)p (parameter) and sample proportion p_bar (point estimate)SE = sqrt(p * (1−p) / n), where p is the population proportion
p is not known (almost always), this can be estimated using the sample proportion, p_bar, SE = sqrt(p_bar * (1 - p_bar) / n).SI.prop.standart_errorn * p ≥ 10 and n * (1 − p) ≥ 10)p_bar ~ N(mean = p, SE = sqrt(p * (1−p) / n)Besides, note that if the CLT doesn’t apply and the sample proportion is low (close to 0) the sampling distribution will likely be right skewed, if the sample proportion is high (close to 1) the sampling distribution will likely be left skewed
point estimate ± margin of error (feel free to use usual function SI.confidence_intreval)Remember that est tstatistics are calculated as: test statistic = (point estimate - null value) / standard error (again, feel free to use usual function SI.pvalue, see examples)
p-value = P(observed or more extreme test statistic | H0 true))
p_bar (aka observed sample proportion) when calculating the standard error and checking the success/failure condition: SE = sqrt(p_bar * (1 - p_bar) / n)p0 (aka null value) when calculating the standard error and checking the success/failure condition: SE = sqrt(p0 * (1 - p0) / n)SI.prop.standart_error function):
H0: p_1 − p_2 = some value other than 0: SE(p_bar_1 − p_bar_2) = sqrt((p_bar_1* (1 − p_bar_1) / n_1) + (p_bar_2 * (1 − p_bar_2) / n_2))H0: p1 − p2 = 0: SE(p_bar_1 − p_bar_2) = sqrt((p_pool * (1 − p_pool) / n_1) + (p_pool * (1 − p_pool) / n_2)) where p_pool is the overall rate of success: (number of successes in group 1 + number of successes in group 2) / n_1 + n_2 (note that in order to calculate pooled SE with SI.prop.standart_error we provide it with real numbers, not proportions, and set switch pool to TRUE; see examples)expected = n * ratioχ2 = sum( (observed count − expected count)^2 / expected count) (use SI.chisq function)df = k − 1, where k is the number of cellsSI.chisq.pvalue function to obtain p-value, given the chi-square statistic and the number of levelsE = (row total * column total) / grand totaldf = (R − 1) * (C − 1), where R is the number of rows in a two-way table, and C is the number of columnsSI.chisq.independence_test function to obtain p-value for the independence test, given the chi-square statistic and the number of levelsy = β_0 + β_1 * x, where where β_0 is the intercept, and β_1 is the slope (note that the point estimates (estimated from observed data) for β_0 and β_1 are b_0 and b_1, respectivelye as the difference between the observed y and predicted y_bar values of the response variable:e_i = y_i − y_bar_i (or use resid function in connection with lm function)b_1 as b_1 = R * s_y / s_x, where R is the correlation coefficient, s_y is the standard deviation of the response variable, and s_x is the standard deviation of the explanatory variable (or use SI.corr.slope function)x, we would expect y to be lower/higher on average by |b_1| units|b_1| units higher/lower between the baseline level and the other level of the explanatory variable.b_1(mean_x, mean_y)b_0 as b_0 = mean_y − b_1 * mean_x, where b_1 is the slope, mean_y is the average of the response variable, and mean_x is the average of explanatory variable (or use SI.corr.intercept function)x = 0, we would expect y to equal, on average, b_0b_0x, by plugging in x in the linear model: y = b_0 + b_1 * x
R^2 (aka R-squared) as the percentage of the variability in the response variable explained by the explanatory variable
H0: β_1 = 0, and recognize that the standard software output yields the p-value for the two-sided alternative hypothesis (use summary function for the lm output)
β_1 = 0 means the regression line is horizontal, hence suggesting that there is no relationship between the explanatory and response variablesT_df = (b_1 − null value) / SE_b1 with df = n − 2 (note that the T score has n − 2 degrees of freedom since we lose one degree of freedom for each parameter we estimate, and in this case we estimate the intercept and the slope)b_1 ± t_df * SE_b1, where df = n − 2 and t_df is the critical score associated with the given confidence level at the desired degrees of freedom, and the standard error of the slope estimate SE_b1 can be found on the regression output (again, see summary of the lm output)y_hat =β_0 + β_1 * x_1 + β_2 * x_2 + … + β_k * x_k where there are k predictors (explanatory variables)β_0 as the expected value of y when all predictors are equal to 0, on averageβ_1) as “All else held constant, for each unit increase in x_1, we would expect y to be higher/lower on average by β_1”R^2 will increase with each explanatory variable added to the model, regardless of whether or not the added variable is a meaningful predictor of the response variable. Therefore we use adjusted R^2, which applies a penalty for the number of predictors included in the model, to better assess the strength of a multiple linear regression model: R_adj^2 = 1 − (SSE / (n − k − 1)) / SST / (n − 1), where n is the number of cases and k is the number of predictorsR_adj^2 will only increase if the added variable has a meaningful contribution to the amount of explained variability in y, i.e. if the gains from adding the variable exceeds the penaltyH_0: β_1 = β_2 = … = β_kH_A: At least one β_i ≠ 0df = n − k − 1summary(model))H_0: β_1 = 0, given all other variables are included in the modelH_A: β_1 ≠ 0, given all other variables are included in the modeln − k − 1 degrees of freedomb_i ± t * SE_bi, where t is a t statistic computed with n - k - 1 degrees of freedomR^2 (choose the model with higher adjusted R^2)R^2 method:
R^2R^2 is reachedR^2 method:
R^2R^2R^2 is reachedR^2 method is more computationally intensive, but it is more reliable, since it doesn’t depend on an arbitrary significance levelpairs.panel function from psych library), and residuals plots of residuals vs. each x (looking for a complete random scatter around zero): plot(model$residuals ~ initial_data$x)hist(model$residuals), qqplot(model$residuals) ; qqline(model$residuals)y_hat aka the predicted values of the response variable (we want the residuals to be randomly scattered in a band with a constant width around zero): plot(model$residuals ~ model$fitted) (no “fan shape” is expected here)plot(abs(model$residuals) ~ model$fitted) (no “triangle” is expected here)plot(model$residuals) (no pattern of any kind are expected)# SKEWED NORMAL DISTRIBUTION ####
# Generates a skewed normal distribution
SI.rnorm_skewed <- function(coef = 1, direction = "right", mu = 0, sigma = 1) {
# generates a skewed normal distribution given the coef of skewness and its direction
direction <- switch (direction,
right = 1,
left = -1)
RNORM_SKEWED <- rnorm(5000, mu, sigma) + 3 * sigma
RNORM_SKEWED <- RNORM_SKEWED[RNORM_SKEWED > 0 & RNORM_SKEWED < 6 * sigma]
RNORM_SKEWED <- direction * (RNORM_SKEWED ^ (coef + 1))
hist(RNORM_SKEWED)
abline(v=mean(RNORM_SKEWED), col="red")
abline(v=median(RNORM_SKEWED), col="blue")
return(RNORM_SKEWED)
}
# BINOMIAL DISTRIBUTIONS ####
# Converts a Binomial Distribution to a Normal Distribution
SI.bitono <- function(n, p) {
# converts a binomial distribution to a normal distribution
# given its success probability and a number of trials
mu <- n * p
si <- sqrt(mu * (1 - p))
vec <- c(mu, si)
return(vec)
}
# STANDARD ERROR ####
# Caclulates the standard error for a normal distribution
SI.standart_error <- function(sigma, sample_size) {
# caclulates the standard error given the variance and the size of a sample
tmp <- sigma^2/sample_size
if (length(sigma) > 1) {
se <- sqrt(sum(tmp))
} else {
se <- sqrt(tmp)
}
return(se)
}
# Caclulates the standard error for the distribution of proportions
SI.prop.standart_error <- function(population_proportion, sample_size, pool = FALSE) {
# caclulates the standard error given the proportion of a population and the size of a sample
if (pool) {
pooled_proportion <- sum(population_proportion) / sum(sample_size)
return( sqrt(
(pooled_proportion * (1 - pooled_proportion) / sample_size[1]) +
(pooled_proportion * (1 - pooled_proportion) / sample_size[2])
) )
}
tmp <- (population_proportion * (1 - population_proportion)) / sample_size
if (length(population_proportion) > 1) {
se <- sqrt(sum(tmp))
} else {
se <- sqrt(tmp)
}
return(se)
}
# CONFIDENCE INTERVALS ####
# Important Reminder:
#
# Commonly used confidence levels (CLs) are 90%, 95%, 98%, and 99%
# "...With the confidence level A we state that the answer lies in B..."
#
# Calculates the CI for a population
SI.confidence_intreval <- function(confidence_level, mu, standard_error) {
# calculates the CI of the population given the CL and the mean and the standard error of a sample
value <- (1 - confidence_level) / 2
return(c(qnorm(value, mu, standard_error), qnorm(confidence_level + value, mu, standard_error)))
}
# Calculates the needed sample size
SI.sample_size <- function(margin_of_error, sigma, confidence_level) {
# calculates the needed sample size, given the standard deviation of a population,
# the desired margin of error and the confidence level of a projected sample
z <- qnorm((1 - confidence_level) / 2)
size <- ((z * sigma) / margin_of_error) ^ 2
return(ceiling(size))
}
SI.prop.sample_size <- function(margin_of_error, population_proportion, confidence_level) {
# calculates the needed sample size, given the standard deviation of a population,
# the desired margin of error and the confidence level of a projected sample
z <- qnorm((1 - confidence_level) / 2)
return(ceiling(z^2 * population_proportion * (1 - population_proportion) / margin_of_error ^ 2))
}
# Important Reminder:
#
# pnorm() takes values from a distribution as an input, and returns the probability with which that value
# occurs in a given distribution
# qnorm() - on the contrary - takes the probability as an input, and returns the value occurring with
# that probability in a given distribution
#
# HYPOTHESIS TESTING ####
# Calculates the z-score
SI.z.score <- function(test_value, mu, standard_error) {
# calculates the z-score, given the proposed test value abd the mean and the standard error of a sample
return((test_value - mu)/standard_error)
}
# Calculates the p-value
SI.pvalue <- function (null_hypothesis, standard_error, test_value_left = NULL, test_value_right = NULL) {
# calculates the p-value, given the null hypothesis, the standard error and the test values
if (is.null(test_value_left) && is.null(test_value_right)) {
stop("No values to test provided")
}
if (!is.null(test_value_left) && is.null(test_value_right)) {
return(pnorm((test_value_left - null_hypothesis)/standard_error))
}
if (is.null(test_value_left) && !is.null(test_value_right)) {
return(pnorm((test_value_right - null_hypothesis)/standard_error, lower.tail = F))
}
if (!is.null(test_value_left) && !is.null(test_value_right)) {
return( pnorm((test_value_left - null_hypothesis)/standard_error, lower.tail = T)
+ pnorm((test_value_right - null_hypothesis)/standard_error, lower.tail = F) )
}
}
# Calculates p-value using standard deviation
SI.pvalue.sd <- function (null_hypothesis, sigma, sample_size,
test_value_left = NULL, test_value_right = NULL) {
# calculates p-value, given the mean and the variance of a population and the size of a sample
if (is.null(test_value_left) && is.null(test_value_right)) {
stop("No values to test provided")
}
if (!is.null(test_value_left) && is.null(test_value_right)) {
return(pnorm((test_value_left - null_hypothesis)/(sigma/sqrt(sample_size))))
}
if (is.null(test_value_left) && !is.null(test_value_right)) {
return(pnorm((test_value_right - null_hypothesis)/(sigma/sqrt(sample_size)), lower.tail = F))
}
if (!is.null(test_value_left) && !is.null(test_value_right)) {
return( pnorm((test_value_left - null_hypothesis)/(sigma/sqrt(sample_size)), lower.tail = T)
+ pnorm((test_value_right - null_hypothesis)/(sigma/sqrt(sample_size)), lower.tail = F) )
}
}
# BOOTSTRAPPING ####
# Generates a bootstrapping distribution
SI.boot <- function(data, boot_size, statistic) {
# generates a bootstrapping distribution of a statistic given the original data and the required size
boot <- rep(NA, boot_size)
for (i in 1:boot_size) {
boot[i] <- match.fun(statistic)(sample(data, length(data), replace = T))
}
return(boot)
}
# Calculates the CI for a bootstrapping distribution
SI.boot.confidence_interval <- function(data, confidence_interval = 0.9, method = "percentile") {
# calculates the CI given the bootstrapping distribution and the required interval
value <- (1 - confidence_interval) / 2
if (method == "percentile") {
quantile(data, c(value, 1-value))
} else {
if (method == "se") {
mean <- mean(data)
sd <- sd(data)
v1 <- mean + qnorm(value) * sd
v2 <- mean + qnorm(1-value) * sd
rez <- c(v1, v2)
names(rez) <- c(paste(value * 100, "%", sep=""), paste((1-value) * 100, "%", sep=""))
return(rez)
} else {
stop("Invaid method")
}
}
}
# STUDENT'S T DISTRIBUTION ####
# Calculates the t-score
SI.t.score <- function(confidence_level = NULL, confidence_interval = NULL, sample_size) {
# calculates the t-score, given the confidence level or the confidence interval & the size of a sample
if (is.null(confidence_level) && is.null(confidence_interval)) {
stop("Please, provide thr confidence level or the confidence interval")
}
if (is.null(confidence_interval)) {
return( qt((1 - (1 - confidence_level) / 2), sample_size - 1) )
}
if (is.null(confidence_level)) {
return( qt((1 - (1 - confidence_interval) / 2), sample_size - 1) )
}
}
# Calculates the CI for a population
SI.t.confidence_intreval <- function(confidence_level, mean, standard_error, sample_size) {
# calculates the CI of the population given the CL and the mean, the sd and the size of a sample
tscore <- qt((1 - confidence_level) / 2, sample_size - 1)
return( c(mean + tscore * standard_error, mean - tscore * standard_error ) )
}
# Calculates p-value
SI.t.pvalue <- function(null_hypothesis, standard_error, sample_size,
test_value_left = NULL, test_value_right = NULL) {
# calculates the p-value, given the null hypothesis, the standard error and the test values
if (is.null(test_value_left) && is.null(test_value_right)) {
stop("No values to test provided")
}
df <- min(sample_size - 1)
if (!is.null(test_value_left) && is.null(test_value_right)) {
return(pt((test_value_left - null_hypothesis)/standard_error, df))
}
if (is.null(test_value_left) && !is.null(test_value_right)) {
return(pt((test_value_right - null_hypothesis)/standard_error, df, lower.tail = F))
}
if (!is.null(test_value_left) && !is.null(test_value_right)) {
return( pt((test_value_left - null_hypothesis)/standard_error, df, lower.tail = T)
+ pt((test_value_right - null_hypothesis)/standard_error, df, lower.tail = F) )
}
}
# ANOVA ####
# Compute analysis of variance (or deviance) tables for one or more fitted model objects
SI.anova <- function(list_of_distributions) {
# compute analysis of variance tables given a list of distributions
# analyze the data
lst <- list_of_distributions
len <- length(lst)
dat <- unlist(lst)
lab <- c()
# transform the data according to the analysis
for (i in 1:len) {
vec <- rep(labels(lst)[i], length(lst[[i]]))
lab <- append(lab, vec)
}
# create a model
fit <- lm(dat ~ lab)
# supply the model to `anova` function
return(anova(fit))
}
SI.anova_pairwise <- function(list_of_distributions, significance_level = 0.05) {
lst <- list_of_distributions
len <- length(lst)
# number of comparisons: (number of groups) * (number of groups - 1) / 2
num <- len * (len - 1) / 2
# correction Bonferroni: (significance level) / (number of comparisons)
BC <- significance_level / num
# to calculate SE for pairwise comparisons we need to calculate MSE:
mean_list <- lapply(lst, mean); mean_list
# calculate the mean for all elements
mean_all <- mean(unlist(lst)) ; mean_all
# calculate SST
st <- lapply(lst, function(x) x <- (x - mean_all)^2)
# sapply applies a function (sum) to a list (st), returns its results as a vector, not as list (lapply)
sst <- sum(sapply(st, sum)) ; sst
# calculate SSG
sg <- lapply(mean_list, function(x) x <- (x - mean_all)^2)
counts <- lapply(lst, length)
ssg <- sum(unlist(sg) * unlist(counts))
ssg
# calculate SSE
sse <- sst - ssg
# calculate degrees of freedom
dft <- length(unlist(lst)) - 1
dfg <- length(lst) - 1
dfe <- dft - dfg
# calculate means squares
msg <- ssg / dfg
mse <- sse / dfe ; mse
# get names for every pair of distributions
names <- combn(c(1:len), 2, simplify = F) ; names
names <- as.character(names) ; names
names <- gsub(":", ", ", gsub("c|\\(|\\)", "", names)) ; names
# get list of lengths for every pair of distributions
len_pairs <- combn(sapply(lst, length), 2, simplify = F)
# separate it
len_pairs1 <- lapply(len_pairs, "[[", 1)
len_pairs2 <- lapply(len_pairs, "[[", 2)
# produce a vector of SEs for every pair of distributions
ses <- sqrt(sapply(len_pairs1, function(x) mse/x) + sapply(len_pairs2, function(x) mse/x))
# produce a vector of the corresponding mean differences
mean_pairs <- combn(sapply(lst, mean), 2, simplify = F)
mean_diffs <- sapply(lapply(mean_pairs, rev), diff)
# produce a vector of T statistics
t <- mean_diffs / ses ;
# convert negative values to positive in place them to the right tail
t <- abs(t)
# calculate p-values for these pairs (for degrees of freedom we use dfe)
p <- 2 * pt(t, dfe, lower.tail = F)
# assign corresponding names
names(p) <- names ; p
# compare to Bonferroni correction
fail <- p[p > BC]
succ <- p[p < BC]
print("The following list of distributions was provided")
print(str(lst))
if (!is.null(succ)) {
print("Null hyphothesis successfully rejected for the pair(s): ")
print(succ)
}
if (!is.null(succ)) {
print("Failed to reject null hyphothesis for the pair(s): ")
print(fail)
}
}
# CHI-SQUARE ####
# Calculates the chi-square
SI.chisq <- function(observed, expected) {
# calculates the chi-square given the vectors of observed and expected outcomes
if (length(observed) != length(expected)) {
stop("Check data, please")
}
return(sum( (observed - expected)^2 / expected ))
}
# Calculates the p-value for a chi-square distribution
SI.chisq.pvalue <- function (chisq, number_of_levels) {
# calculates the p-value given the chi-square statistic and the number of categorical levels
# goodness of fit: one categorical variable, more than two levels
return(pchisq(chisq, number_of_levels - 1, lower.tail = F))
}
SI.chisq.independence_test <- function (chisq, number_of_levels) {
df <- (number_of_levels[1] - 1) * (number_of_levels[2] - 1)
return(pchisq(chisq, df, lower.tail = F))
}
# MULTIPLE LINEAR REGRESSION MODELS ####
# Fits separate models for separate categories of a categorical variable on a scatterplot
SI.model.cat_separated <- function(model, ...){
# fits separate models for separate categories of a categorical variable on a scatterplot
# the code provided by DASI Team
if(class(model)!="lm"){
warning("Model must be the output of the function lm()")
}
if(length(model$xlevels)!=1){
warning("Model must contain exactly one categorical predictor")
}
if(length(model$coef)-length(model$xlevels[[1]])!=1){
warning("Model must contain exactly one non-categorical predictor")
}
palette <- c("#E69F00", "#56B4E9", "#D55E00", "#009E73", "#CC79A7", "#F0E442", "#0072B2")
baseIntercept <- model$coef[1]
nLines <- length(model$xlevels[[1]])
intercepts <- c(baseIntercept, rep(0, nLines-1))
indicatorInd <- c(1, rep(0, nLines)) # used to find slope parameter by process of elimination
for(i in 1:(nLines-1)){
indicatorName <- paste(names(model$contrasts),model$xlevels[[1]][1+i], sep = "")
intercepts[i+1] <- baseIntercept + model$coef[names(model$coef)==indicatorName]
indicatorInd <- indicatorInd + (names(model$coef)==indicatorName)
}
slope <- model$coef[!indicatorInd]
num_pred = which(names(model$model[,-1]) != names(model$xlevels)) + 1
cat_pred = which(names(model$model[,-1]) == names(model$xlevels)) + 1
model$model$COL = NA
model$model$PCH = NA
for(i in 1:nLines){
model$model$COL[model$model[,cat_pred] == levels(model$model[,cat_pred])[i]] = adjustcolor(palette[i],0.40)
model$model$PCH[model$model[,cat_pred] == levels(model$model[,cat_pred])[i]] = i+14
}
plot(model$model[,1] ~ jitter(model$model[,num_pred]), col = model$model$COL, pch = model$model$PCH,
ylab = names(model$model)[1],
xlab = names(model$model)[num_pred])
for(j in 1:nLines){
abline(intercepts[j], slope, col = palette[j], lwd = 2, ...)
}
if(slope > 0){legend_pos = "bottomright"}
if(slope < 0){legend_pos = "topleft"}
legend(legend_pos, col = palette[1:nLines], lty = 1, legend = levels(model$model[,cat_pred]), lwd = 2)
}