sktidy package

Submodules

sktidy.sktidy module

sktidy.sktidy.augment_kmeans(model, X)

This function returns a dataframe of the original samples with their assigned clusters, based on predictions made by an instance of scikit-learn’s implementation of KMeans clustering.

Parameters
  • model (sklearn.cluster.KMeans) – The model to extract the cluster specific information from

  • X (pandas dataframe) – The data to which the KMeans object has been fitted

Returns

df – A dataframe with k rows, where k is the number of examples in X, and 2 columns: the data points in X and their corresponding predicted labels

Return type

pandas dataframe

Examples

>>> # Importing packages
>>> from sklearn.cluster import KMeans
>>> from sklearn import datasets
>>> import pandas as pd
>>> from sktidy.sktidy import augment_kmeans
>>> # Extracting data and training the clustering algorithm
>>> df = datasets.load_iris(return_X_y=True, as_frame=True)[0]
>>> kmeans_clusterer = KMeans()
>>> kmeans_clusterer.fit(df)
>>> # Getting cluster assignment for each data point
>>> augment_kmeans(model=kmeans_clusterer, X=df)
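As a rough illustration of the kind of frame described above, the following sketch attaches cluster labels to a small synthetic dataset using only scikit-learn and pandas. It is not sktidy's implementation; the column name `cluster` and the toy data are assumptions made for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Two well-separated groups of points, so the clustering is unambiguous.
X = pd.DataFrame({
    "x1": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
    "x2": [0.0, 0.2, 0.1, 10.0, 10.2, 10.1],
})

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Assumption: the label column is called "cluster" here; sktidy may name it differently.
augmented = X.assign(cluster=model.predict(X))
print(augmented)
```

Each row keeps its original feature values and gains one predicted label, so the result has as many rows as X.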
sktidy.sktidy.augment_lr(model, X, y)

Adds two columns, predictions and residuals, to the original data used with a scikit-learn linear regression model.

Parameters
  • model (sklearn.linear_model.LinearRegression object) – The fitted sklearn LinearRegression model

  • X (pandas.core.frame.DataFrame) – A dataframe of explanatory variables to predict on. Shaped n observations by m features.

  • y (pandas.core.series.Series) – A pandas series holding the response variable. Shaped n observations by 1.

Returns

df – A dataframe with the original data plus two additional columns for predictions and residuals. Shaped n observations by m features + 2.

Return type

pandas.core.frame.DataFrame

Examples

>>> # Importing packages
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn import datasets
>>> import pandas as pd
>>> from sktidy.sktidy import augment_lr
>>> # Extracting data and training the linear regression model
>>> X, y = datasets.load_iris(return_X_y=True, as_frame=True)
>>> lr_model = LinearRegression()
>>> lr_model.fit(X, y)
>>> # Getting the original data augmented with predictions and residuals
>>> augment_lr(model=lr_model, X=X, y=y)
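To make the two added columns concrete, here is a minimal sketch that computes predictions and residuals by hand with scikit-learn and pandas. It is not taken from sktidy: the column names `predictions` and `residuals`, the toy data, and the sign convention (observed minus predicted) are all assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data: one feature (m = 1) and four observations (n = 4).
X = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]})
y = pd.Series([1.0, 3.0, 5.0, 8.0], name="y")

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

# Assumption: residual = observed - predicted; sktidy may use another convention.
augmented = X.assign(predictions=predictions, residuals=y - predictions)
print(augmented)
```

The result is shaped n observations by m + 2 columns, matching the description above. Because the model includes an intercept, the residuals sum to approximately zero.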
sktidy.sktidy.tidy_kmeans(model, X)

Return a tidy df of cluster information for a k-means clustering algorithm.

This function delivers diagnostic information about each cluster defined by an instance of scikit-learn’s implementation of k-means clustering, including the total inertia of each cluster, the cluster center, and the total number of points associated with each cluster.

Parameters
  • model (sklearn.cluster.KMeans) – The model to extract the cluster specific information from

  • X (pandas dataframe) – The data to which the KMeans object has been fitted

Returns

df – A dataframe with k rows, where k is the number of clusters, and 3 columns describing, respectively, the center of the cluster, the sum of inertia of the cluster, and the number of data points associated with the cluster.

Return type

pandas dataframe

Examples

>>> # Importing packages
>>> from sklearn.cluster import KMeans
>>> from sklearn import datasets
>>> import pandas as pd
>>> from sktidy.sktidy import tidy_kmeans
>>> # Extracting data and training the clustering algorithm
>>> df = datasets.load_iris(return_X_y=True, as_frame=True)[0]
>>> kmeans_clusterer = KMeans()
>>> kmeans_clusterer.fit(df)
>>> # Getting the tidy df of cluster information
>>> tidy_kmeans(model=kmeans_clusterer, X=df)
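The per-cluster quantities described above can be derived from a fitted KMeans object's `labels_` and `cluster_centers_` attributes. The sketch below shows one way to do so; it is an illustration under assumptions, not sktidy's code, and the column names `cluster_center`, `inertia`, and `n_points` are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Two well-separated groups of points.
X = pd.DataFrame({
    "x1": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2],
    "x2": [0.0, 0.2, 0.1, 10.0, 10.2, 10.1],
})

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
points = X.to_numpy()

rows = []
for k, center in enumerate(model.cluster_centers_):
    members = points[model.labels_ == k]  # points assigned to cluster k
    rows.append({
        "cluster_center": center.tolist(),
        # Per-cluster inertia: sum of squared distances to the cluster center.
        "inertia": float(((members - center) ** 2).sum()),
        "n_points": int(len(members)),
    })

# Assumption: these column names are illustrative only.
summary = pd.DataFrame(rows)
print(summary)
```

As a sanity check, the per-cluster inertias add up to the model's overall `inertia_`, and the point counts add up to the number of rows in X.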
sktidy.sktidy.tidy_lr(model, X, y)

Returns a tidy dataframe for a fitted sklearn LinearRegression model with feature names, coefficients/intercept and p-values.

Parameters
  • model (sklearn.linear_model.LinearRegression) – The fitted sklearn LinearRegression model

  • X (pandas.core.frame.DataFrame) – The feature dataframe, with m rows and n columns, to which the LinearRegression object was fitted

  • y (pandas.core.series.Series) – The target pandas Series, with m rows, to which the LinearRegression object was fitted

Returns

df – A pandas dataframe with n+1 rows, where n is the number of columns (features) in the input dataframe X that was fitted to the model, and 3 columns describing feature names, coefficients/intercept and p-values

Return type

pandas.core.frame.DataFrame

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from sklearn import datasets
>>> import pandas as pd
>>> from sktidy.sktidy import tidy_lr
>>> # Loading data and training the linear regression model
>>> X, y = datasets.load_iris(return_X_y=True, as_frame=True)
>>> my_lr = LinearRegression()
>>> my_lr.fit(X, y)
>>> # Get tidy output for the trained sklearn LinearRegression model
>>> tidy_lr(model=my_lr, X=X, y=y)
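scikit-learn's LinearRegression does not report p-values itself, so a table like the one described above has to derive them. The documentation does not say how sktidy does this; the sketch below assumes the classical OLS two-sided t-test and uses hypothetical column names (`feature`, `coefficient`, `p_value`) on synthetic data.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data: y depends on x1 but not on x2.
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
y = pd.Series(2.0 + 3.0 * X["x1"] + rng.normal(scale=0.5, size=50))

model = LinearRegression().fit(X, y)

# Design matrix with a leading column of ones for the intercept.
design = np.column_stack([np.ones(len(X)), X.to_numpy()])
estimates = np.concatenate([[model.intercept_], model.coef_])

# Classical OLS standard errors (assumed method, not confirmed for sktidy).
residuals = y.to_numpy() - model.predict(X)
dof = design.shape[0] - design.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))

# Two-sided p-values from the t distribution.
t_stats = estimates / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), dof)

tidy = pd.DataFrame({
    "feature": ["intercept", *X.columns],
    "coefficient": estimates,
    "p_value": p_values,
})
print(tidy)
```

The table has n+1 rows (one per feature plus the intercept) and 3 columns, in line with the return value described above.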

Module contents