Python: module inchlib

inchlib_clust

#coding: utf-8

Modules

argparse
copy
csv
fastcluster
scipy.cluster.hierarchy
json
numpy
os
sklearn.preprocessing
re
scipy
sklearn
scipy.spatial
urllib2

Classes



Cluster
Dendrogram

class Cluster

    Class for data clustering

Methods defined here:

__cluster_columns__(self, column_distance, column_linkage)

__impute_missing_values__(self, data)

__init__(self)

__reorder_data__(self, data, order)

__return_missing_values__(self, data, missing_values_indexes)

cluster_data(self, row_distance='euclidean', row_linkage='single', axis='row', column_distance='euclidean', column_linkage='ward')
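Performs clustering according to the given parameters.
@row_distance/column_distance - one of the distances listed in the DISTANCES variable for the data's datatype (numeric/binary), which is set when the data are read with read_csv/read_data
@row_linkage/column_linkage - one of the linkages listed in the LINKAGES variable
@axis - row (cluster rows only) or both (cluster rows and columns)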
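The parameter rules above can be checked before clustering. The sketch below is a hypothetical helper, not part of inchlib_clust: it copies the DISTANCES and LINKAGES tables from this module's Data section and rejects a distance that does not belong to the chosen datatype.

```python
# Hypothetical helper: the tables mirror this module's Data section;
# validate_clustering_params itself is not part of inchlib_clust.
DISTANCES = {
    "binary": ["dice", "hamming", "jaccard", "kulsinski", "matching",
               "rogerstanimoto", "russellrao", "sokalmichener",
               "sokalsneath", "yule"],
    "numeric": ["braycurtis", "canberra", "chebyshev", "cityblock",
                "correlation", "cosine", "euclidean", "mahalanobis",
                "minkowski", "seuclidean", "sqeuclidean"],
}
LINKAGES = ["single", "complete", "average", "centroid", "ward",
            "median", "weighted"]


def validate_clustering_params(datatype, distance, linkage, axis="row"):
    """Raise ValueError if a parameter is not allowed by the tables above."""
    if datatype not in DISTANCES:
        raise ValueError("unknown datatype: %r" % datatype)
    if distance not in DISTANCES[datatype]:
        raise ValueError("%r is not a %s distance" % (distance, datatype))
    if linkage not in LINKAGES:
        raise ValueError("unknown linkage: %r" % linkage)
    if axis not in ("row", "both"):
        raise ValueError("axis must be 'row' or 'both'")
    return True
```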

normalize_data(self, feature_range=(0, 1), write_original=False)
Normalizes the data to the range given by feature_range (0 to 1 by default). When write_original is set to True, the normalized data are clustered, but the original data are written to the heatmap.
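A minimal sketch of min-max normalization into feature_range, assuming the scaling is applied per column as in scikit-learn's MinMaxScaler (the module imports sklearn.preprocessing, but the exact call it makes is not documented here). The function name normalize_columns is hypothetical; keeping a separate copy of the original rows illustrates the write_original idea of clustering scaled values while reporting the raw ones.

```python
import copy


def normalize_columns(rows, feature_range=(0, 1)):
    """Min-max scale each column of rows into feature_range.

    Assumption: scaling is per column (per feature). In this sketch a
    constant column is mapped to the lower bound.
    """
    low, high = feature_range
    scaled_columns = []
    for col in zip(*rows):
        cmin, cmax = min(col), max(col)
        span = cmax - cmin
        if span == 0:
            scaled_columns.append([float(low)] * len(col))
        else:
            scaled_columns.append(
                [low + (v - cmin) * (high - low) / span for v in col])
    # transpose back from columns to rows
    return [list(row) for row in zip(*scaled_columns)]


data = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
original = copy.deepcopy(data)   # independent copy of the raw values
scaled = normalize_columns(data)  # values that would be clustered
```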

read_csv(self, filename, delimiter=',', header=False, missing_value=False, datatype='numeric')
Reads data from a CSV file.
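The header and missing_value parameters can be illustrated with a small parser built on the standard csv module. This is a hypothetical sketch: the function name, the assumption that the first column holds row IDs, and the use of None for missing cells are illustrative choices, not confirmed behaviour of read_csv.

```python
import csv
import io


def read_numeric_csv(text, delimiter=",", header=False, missing_value=False):
    """Parse CSV text into (header, ids, rows).

    Assumptions: the first column holds row IDs, the remaining columns are
    numeric, and cells equal to missing_value become None.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    lines = [line for line in reader if line]  # skip blank lines
    head = lines.pop(0) if header else None
    ids, rows = [], []
    for line in lines:
        ids.append(line[0])
        rows.append([
            None if (missing_value is not False and cell == missing_value)
            else float(cell)
            for cell in line[1:]
        ])
    return head, ids, rows


head, ids, rows = read_numeric_csv("id,a,b\nr1,1,2\nr2,NA,4\n",
                                   header=True, missing_value="NA")
```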

read_data(self, rows, header=False, missing_value=False, datatype='numeric')
Reads data in the form of a list of lists (or tuples).
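read_data accepts rows as lists or tuples, and the private methods __impute_missing_values__ and __return_missing_values__ suggest that missing cells are filled in for clustering and put back afterwards. The sketch below assumes column-mean imputation (the actual strategy is not documented above); all three function names are hypothetical.

```python
def impute_column_means(rows):
    """Replace None cells with the mean of the present values in that column.

    Assumption: column-mean imputation; a column with no values becomes 0.0.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in rows if row[j] is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def missing_indexes(rows):
    """Record (row, column) positions of missing cells before imputation."""
    return [(i, j) for i, row in enumerate(rows)
            for j, v in enumerate(row) if v is None]


def restore_missing(rows, indexes):
    """Put None back at the recorded positions, e.g. before writing output."""
    restored = [list(row) for row in rows]
    for i, j in indexes:
        restored[i][j] = None
    return restored


rows = [(1.0, 2.0), (None, 4.0), (3.0, None)]  # tuples work as well as lists
gaps = missing_indexes(rows)
filled = impute_column_means(rows)     # values used for clustering
shown = restore_missing(filled, gaps)  # missing cells reinstated
```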

class Dendrogram

    Generates the cluster heatmap format from clustered data. It takes as input a Cluster instance containing clustered data.

Methods defined here:

__add_column_metadata_to_data__(self)

__adjust_node_counts__(self)

__check_column_metadata_length__(self)

__compress_data__(self)

__connect_additional_data_to_data__(self, additional_data, compressed_value)

__connect_metadata_to_data__(self)

__get_cluster_heatmap__(self, write_data)

__get_column_dendrogram__(self)

__get_distance_treshold__(self, cluster_count)

__get_most_frequent__(self, col)

__init__(self, clustering)

__read_alternative_data__(self, alternative_data)

__read_alternative_data_file__(self, alternative_data_file, delimiter)

__read_metadata__(self, metadata, header)

__read_metadata_file__(self, metadata_file, delimiter, header)

__reorder_alternative_data__(self, alternative_data)

add_alternative_data(self, alternative_data, header, alternative_data_compressed_value)
Adds alternative data in the form of a list of lists (or tuples).

add_alternative_data_from_file(self, alternative_data_file, delimiter, header, alternative_data_compressed_value)
Adds alternative data from a CSV file.

add_column_metadata(self, column_metadata, header=True)
Adds column metadata in the form of a list of lists (or tuples). Column metadata has no header row; the first item in each row is used as its label instead.

add_column_metadata_from_file(self, column_metadata_file, delimiter=',', header=True)
Adds column metadata from a CSV file. Column metadata has no header row; the first item in each row is used as its label.
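Since each column metadata row is a label followed by one value per data column, its length can be validated against the data, much as the name of the private __check_column_metadata_length__ method implies. A hypothetical sketch, with check_column_metadata as an illustrative name:

```python
def check_column_metadata(column_metadata, n_data_columns):
    """Split column metadata rows into {label: values} and check their length.

    Hypothetical sketch: each row is [label, value_for_col1, value_for_col2, ...],
    so every row must carry exactly one value per data column.
    """
    result = {}
    for row in column_metadata:
        label, values = row[0], list(row[1:])
        if len(values) != n_data_columns:
            raise ValueError("row %r has %d values, expected %d"
                             % (label, len(values), n_data_columns))
        result[label] = values
    return result


meta = check_column_metadata(
    [["group", "ctrl", "ctrl", "treated"], ("batch", 1, 1, 2)], 3)
```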

add_metadata(self, metadata, header=True, metadata_compressed_value='median')
Adds metadata in the form of a list of lists (or tuples). metadata_compressed_value specifies how the values of merged rows are combined when the data are compressed (median/mean/frequency).

add_metadata_from_file(self, metadata_file, delimiter, header=True, metadata_compressed_value='median')
Adds metadata from a CSV file. metadata_compressed_value specifies how the values of merged rows are combined when the data are compressed (median/mean/frequency).
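The median/mean/frequency options describe how the values of rows merged by compression are combined. The hypothetical compress_values function below sketches that rule with the standard statistics and collections modules. Falling back to the most frequent value for text follows the create_cluster_heatmap description; resolving frequency ties by first occurrence is an assumption of this sketch (Counter.most_common keeps insertion order for equal counts).

```python
import statistics
from collections import Counter


def compress_values(values, method="median"):
    """Combine the values of rows that were merged into one heatmap row.

    Numeric values use the median or mean. If any value is text (nominal
    data), or method is 'frequency', the most frequent value is returned;
    ties go to the value seen first.
    """
    if method == "frequency" or any(isinstance(v, str) for v in values):
        return Counter(values).most_common(1)[0][0]
    if method == "median":
        return statistics.median(values)
    if method == "mean":
        return statistics.mean(values)
    raise ValueError("unknown method: %r" % method)
```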

create_cluster_heatmap(self, compress=False, compressed_value='median', write_data=True)
Creates a cluster heatmap representation in the inchlib format. Setting the compress parameter to a target row count cuts the dendrogram at the corresponding distance, reducing the heatmap to that number of rows. When compressing, the compressed_value parameter (median, mean) determines the value of merged rows; for nominal metadata (text values), the most frequent value is used instead. Setting write_data to False omits the data features from the resulting format.
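Cutting a dendrogram to a target row count amounts to choosing a distance threshold: with n rows there are n - 1 merges, so keeping k clusters means applying the n - k lowest merges. The sketch below computes that threshold from a list of merge heights; its function names are hypothetical, and it is not the module's __get_distance_treshold__ implementation.

```python
def distance_threshold(merge_heights, cluster_count):
    """Return the cut height that leaves cluster_count clusters.

    merge_heights holds the n - 1 merge distances of a dendrogram over n rows
    (e.g. the third column of a SciPy linkage matrix). Returns None when no
    merge is needed. With tied heights the count is only approximate.
    """
    n_rows = len(merge_heights) + 1
    if not 1 <= cluster_count <= n_rows:
        raise ValueError("cluster_count must be between 1 and %d" % n_rows)
    merges_needed = n_rows - cluster_count
    if merges_needed == 0:
        return None
    return sorted(merge_heights)[merges_needed - 1]


def clusters_after_cut(merge_heights, threshold):
    """Count clusters left after applying every merge at or below threshold."""
    n_rows = len(merge_heights) + 1
    return n_rows - sum(1 for h in merge_heights if h <= threshold)


heights = [0.5, 1.0, 2.0, 4.0]        # 5 rows, 4 merges
cut = distance_threshold(heights, 2)  # keep 2 compressed rows
```

Sorting the heights only matches the merge order when the linkage is monotone; centroid and median linkage can produce inversions, so treat the result as approximate for those methods.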

export_cluster_heatmap_as_html(self, htmldir='.')
Exports a simple HTML page with the embedded cluster heatmap and its dependencies to the given directory.

export_cluster_heatmap_as_json(self, filename=None)
Returns the cluster heatmap in JSON format, or exports it to the file specified by the filename parameter.
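The return-or-write behaviour of export_cluster_heatmap_as_json can be sketched with the json module. export_as_json is a hypothetical stand-in; the dictionary passed to it below is toy data, not the inchlib format, and returning None after writing a file is an assumption.

```python
import json


def export_as_json(heatmap, filename=None):
    """Return heatmap as a JSON string, or write it to filename if given.

    Hypothetical stand-in: returns None after writing to a file.
    """
    text = json.dumps(heatmap, indent=4)
    if filename is None:
        return text
    with open(filename, "w") as fh:
        fh.write(text)
    return None


as_text = export_as_json({"rows": [1, 2]})  # toy data only
```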

Data

DISTANCES = {'binary': ['dice', 'hamming', 'jaccard', 'kulsinski', 'matching', 'rogerstanimoto', 'russellrao', 'sokalmichener', 'sokalsneath', 'yule'], 'numeric': ['braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'euclidean', 'mahalanobis', 'minkowski', 'seuclidean', 'sqeuclidean']}
LINKAGES = ['single', 'complete', 'average', 'centroid', 'ward', 'median', 'weighted']
RAW_LINKAGES = ['ward', 'centroid']
