Data Analysis & Visualization

This page documents the data analysis and visualization functions in the package.

Visualization Utilities

pyBiodatafuse.analyzer.utils.plot_pie_chart(template_df: DataFrame, fig_size: tuple = (10, 10)) → matplotlib.pyplot

Plot a pie chart.

Parameters:
  • template_df – A dataframe with two columns: “label” and “value”

  • fig_size – A tuple with the size of the figure

Returns:

A pie chart
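The plotting helpers in this section all take a template with a “label” and a “value” column. As a minimal sketch of how such a template could be prepared, the snippet below counts hypothetical node types with the standard library and emits one row per label. The input list and the helper name `build_template_rows` are assumptions for illustration and are not part of the package.

```python
from collections import Counter

# Hypothetical input: one node-type string per node (not from the package).
node_types = ["Gene", "Disease", "Gene", "Compound", "Gene", "Disease"]


def build_template_rows(items):
    """Count items and return rows with the "label" and "value" keys."""
    counts = Counter(items)
    # most_common() orders by descending count.
    return [{"label": label, "value": value} for label, value in counts.most_common()]


rows = build_template_rows(node_types)
print(rows)

# With pandas and pyBiodatafuse installed, the rows could then be plotted:
#   import pandas as pd
#   from pyBiodatafuse.analyzer.utils import plot_pie_chart
#   plot_pie_chart(pd.DataFrame(rows), fig_size=(6, 6))
```

The same rows should also suit the bar plot and Plotly variants, since they document the same two-column template.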

pyBiodatafuse.analyzer.utils.plot_hbarplot_chart(template_df: DataFrame, x_label: str = 'Label', y_label: str = 'Value', fig_size: tuple = (10, 10)) → matplotlib.pyplot

Plot a horizontal bar plot.

Parameters:
  • template_df – A dataframe with two columns: “label” and “value”

  • x_label – The x-axis label

  • y_label – The y-axis label

  • fig_size – A tuple with the size of the figure

Returns:

A horizontal bar plot

pyBiodatafuse.analyzer.utils.plotly_pie_chart(template_df: DataFrame, fig_size: tuple = (10, 10)) → plotly.express

Plot a pie chart using Plotly.

Parameters:
  • template_df – A dataframe with two columns: “label” and “value”

  • fig_size – A tuple with the size of the figure

Returns:

A plotly pie chart

pyBiodatafuse.analyzer.utils.plotly_barplot_chart(template_df: DataFrame, x_label: str = 'Label', y_label: str = 'Value', fig_size: tuple = (10, 10)) → plotly.express

Plot a bar plot using Plotly.

Parameters:
  • template_df – A dataframe with two columns: “label” and “value”

  • x_label – The x-axis label

  • y_label – The y-axis label

  • fig_size – A tuple with the size of the figure

Returns:

A Plotly bar plot

Literature Explorer

pyBiodatafuse.analyzer.explorer.literature.get_wikidata_gene_literature(bridgedb_df: DataFrame) → Dict[str, Set[str]]

Get PubMed articles linked to a gene or its encoded protein.

Parameters:

bridgedb_df – BridgeDb output for creating the list of gene ids to query

Returns:

A dictionary with the NCBI gene ID as the key and the set of PMIDs as the value.
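The return type, Dict[str, Set[str]], is easy to post-process with plain Python. The sketch below works on a made-up dictionary of that shape (the ids are placeholders, not real NCBI gene ids or PMIDs) and shows two common follow-ups: counting publications per gene and inverting the mapping to see which genes share an article. Both helper names are hypothetical.

```python
from typing import Dict, Set

# Made-up placeholder ids in the documented shape: gene id -> set of PMIDs.
gene_literature: Dict[str, Set[str]] = {
    "gene_1": {"pmid_a", "pmid_b", "pmid_c"},
    "gene_2": {"pmid_b", "pmid_d"},
    "gene_3": set(),
}


def count_publications(literature: Dict[str, Set[str]]) -> Dict[str, int]:
    """Return the number of linked PMIDs per gene id."""
    return {gene: len(pmids) for gene, pmids in literature.items()}


def invert_literature(literature: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each PMID to the set of gene ids it is linked to."""
    by_pmid: Dict[str, Set[str]] = {}
    for gene, pmids in literature.items():
        for pmid in pmids:
            by_pmid.setdefault(pmid, set()).add(gene)
    return by_pmid


publication_counts = count_publications(gene_literature)
genes_by_pmid = invert_literature(gene_literature)
print(publication_counts)
print(sorted(genes_by_pmid["pmid_b"]))
```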

Patent Explorer

pyBiodatafuse.analyzer.explorer.patent.get_patent_from_pubchem(bridgedb_df: DataFrame) → dict

Get patent data summary from PubChem compounds.

The output is the following: {CID: [“US: X”, “EP: X”, “WO: X”, “Others: X”]}

Parameters:

bridgedb_df – A dataframe with the BridgeDb or PubChem harmonized output

Returns:

A dictionary with the PubChem Compound ID as key and the patent counts as value
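Because each value is a list of “Region: X” strings, a small parsing step helps before doing arithmetic on the counts. The following is a sketch under the assumption that X is always an integer; the compound id and the numbers are made up, and `parse_patent_counts` is a hypothetical helper rather than part of the package.

```python
from typing import Dict, List


def parse_patent_counts(entries: List[str]) -> Dict[str, int]:
    """Turn ["US: 3", "EP: 1", ...] into {"US": 3, "EP": 1, ...}."""
    counts: Dict[str, int] = {}
    for entry in entries:
        # Split on the first colon only: "Others: 7" -> ("Others", " 7").
        region, _, number = entry.partition(":")
        counts[region.strip()] = int(number.strip())
    return counts


# Made-up compound id and counts, for illustration only.
patent_summary = {"cid_1": ["US: 3", "EP: 1", "WO: 0", "Others: 7"]}

parsed = {cid: parse_patent_counts(entries) for cid, entries in patent_summary.items()}
totals = {cid: sum(counts.values()) for cid, counts in parsed.items()}
print(parsed)
print(totals)
```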

Graph Summary

class pyBiodatafuse.analyzer.summarize.BioGraph(graph=None, graph_path=None, graph_format='pickle', disease_df=None)

Bases: MultiDiGraph

BioGraph class to analyze the graph.

Initialize the BioGraph class.

Parameters:
  • graph – networkx graph object

  • graph_path – path to the graph file

  • graph_format – format of the graph file

  • disease_df – disease dataframe to build the graph

Raises:

ValueError – if graph_format is not ‘pickle’ or ‘gml’

get_graph_summary() → str

Display graph summary.

count_nodes_by_type(plot: bool = False, interactive: bool = False) → DataFrame | None

Count the different node types in the graph.

count_edge_by_type(plot: bool = False, interactive: bool = False) → DataFrame | None

Count the different edge types in the graph.

count_nodes_by_data_source(plot: bool = False) → DataFrame | None

Get the count of nodes by data source.

count_edge_by_data_source(plot: bool = False) → DataFrame | None

Get the count of edges by data source.

get_all_nodes_by_labels() → Dict[str, Any]

Get all nodes with their label type.

get_all_nodes_by_type(label: str) → list

Get all nodes by specific label type.

get_nodes_by_label(label: str) → list | None

Get all nodes by specific label type.

get_publications_for_genes()

Get publications for genes.

get_patents_for_compounds()

Get patents for compounds.

node_in_graph(node_type: str)

Check if the node is in the graph.

get_source_interactions()

Get interactions of a source.

get_chemical_metatdata()

Get metadata of a chemical.

get_subgraph(node_types: list)

Get subgraph of the graph.

Parameters:

node_types – list of node types

Raises:

AssertionError – if node type not in the graph

Returns:

subgraph with the given node types
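The idea behind filtering a graph by node type can be illustrated without the package. The sketch below is not the BioGraph implementation: it uses plain dictionaries and edge tuples, assumes each node stores its type under a hypothetical "type" key, and mirrors the documented behaviour of raising AssertionError when a requested type is absent.

```python
def subgraph_by_node_types(nodes, edges, node_types):
    """Keep nodes of the requested types and the edges between them."""
    # The "type" attribute key is an assumption made for this sketch.
    present = {attrs["type"] for attrs in nodes.values()}
    missing = set(node_types) - present
    assert not missing, f"Node types not in graph: {sorted(missing)}"

    kept_nodes = {n: attrs for n, attrs in nodes.items() if attrs["type"] in node_types}
    # An edge survives only if both of its endpoints were kept.
    kept_edges = [(u, v) for u, v in edges if u in kept_nodes and v in kept_nodes]
    return kept_nodes, kept_edges


# Made-up toy graph, for illustration only.
nodes = {
    "g1": {"type": "Gene"},
    "g2": {"type": "Gene"},
    "d1": {"type": "Disease"},
    "c1": {"type": "Compound"},
}
edges = [("g1", "d1"), ("g1", "g2"), ("c1", "g1")]

sub_nodes, sub_edges = subgraph_by_node_types(nodes, edges, ["Gene", "Disease"])
print(sorted(sub_nodes))
print(sub_edges)
```

Checking for missing types up front keeps a misspelled type from silently producing an empty subgraph.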