2 - Colab

Shortcuts

  • Show shortcuts: ctrl + m, h
  • Insert code cell above: ctrl + m, a
  • Insert code cell below: ctrl + m, b
  • Delete cell/selection: ctrl + m, d
  • Convert to code cell: ctrl + m, y
  • Convert to markdown cell: ctrl + m, m

More

  • render pandas dataframes into interactive tables: %load_ext google.colab.data_table

Download Files from Colab

from google.colab import files
files.download('<file_to_download>')

Upload Files to Colab

from google.colab import files
uploaded = files.upload()

Mount Google Drive

from google.colab import drive
drive.mount('/gdrive')
ls -la /gdrive

3 - Conda

Manage Environments

  • create environment: conda create --name <new_env_name>
  • create environment with python 3.9: conda create --name <new_env_name> python=3.9
  • activate environment: conda activate <env_name>
  • deactivate (leave) environment: conda deactivate
  • list available environments: conda info --envs
  • remove environment: conda remove --name <env_name> --all

Updates

Other Commands

Rename Conda Environment

Rename <src_env> to <target_env>:

conda create --name <target_env> --clone <src_env>
conda remove --name <src_env> --all

Installation

Conda installation on Linux

  • download Miniconda (not Anaconda): https://conda.io/en/latest/miniconda.html#linux-installers
  • download the 64-bit Miniconda3 installer with the highest Python version available for your architecture
  • change install file to executable: chmod +x Miniconda3-latest-Linux-x86_64.sh
  • start installation: ./Miniconda3-latest-Linux-x86_64.sh
  • use default settings
  • log out and back in to activate new settings

Windows Install

  • download Miniconda (not Anaconda): https://conda.io/en/latest/miniconda.html#windows-installers
  • download Miniconda3 for the highest Python version
  • preferably the 64 bit version
  • proxy setup
    • add the following content to the .condarc file
    • located at C:\Users\<User>
    • <user> and <pass> are optional
    • some proxies expect the http:// scheme for the https entry as well, so try http:// there if https:// does not work
proxy_servers:
  http: http://[<user>:<pass>@]corp.com:8080
  https: https://[<user>:<pass>@]corp.com:8080

5 - Docstrings

Description

Python docstrings can be written in many different formats. An overview of different methods can be found in the following Stack Overflow entry: What is the standard Python docstring format?

A sensible choice is the docstring format used by SciPy and NumPy, since these packages are very popular. A guide to the NumPy docstring format is here: numpydoc docstring guide. There is also a reStructuredText (reST) syntax cheat sheet for Sphinx.
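A minimal sketch of the numpydoc layout. The function scale_values and its parameters are hypothetical and only serve to show the usual section headings (Parameters, Returns, Raises, Examples):

```python
from typing import List


def scale_values(values: List[float], factor: float = 1.0) -> List[float]:
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Input values to scale.
    factor : float, optional
        Multiplier applied to each value. Default is 1.0.

    Returns
    -------
    list of float
        New list with the scaled values; the input list is not modified.

    Raises
    ------
    ValueError
        If `values` is empty.

    Examples
    --------
    >>> scale_values([1.0, 2.0], factor=2.0)
    [2.0, 4.0]
    """
    if not values:
        raise ValueError("values must not be empty")
    return [v * factor for v in values]
```

The underlined section names are what numpydoc-aware tools such as Sphinx parse, so their spelling and order matter more than the wording inside them.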

6 - Exceptions

Most Important Exceptions

  • TypeError: Raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch.
  • NotImplementedError: In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method, or while the class is being developed to indicate that the real implementation still needs to be added.
  • ValueError: Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.
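A small sketch of when each of the three exceptions fits. The classes Shape and Square are hypothetical examples, not part of any library:

```python
class Shape:
    """Hypothetical base class; subclasses must override area()."""

    def area(self) -> float:
        # Abstract method: the real implementation lives in subclasses.
        raise NotImplementedError("subclasses must implement area()")


class Square(Shape):
    def __init__(self, side: float):
        # Wrong type: for example a string cannot be a side length.
        if not isinstance(side, (int, float)):
            raise TypeError(f"side must be a number, got {type(side).__name__}")
        # Right type, but an unusable value.
        if side <= 0:
            raise ValueError(f"side must be positive, got {side}")
        self.side = side

    def area(self) -> float:
        return self.side ** 2
```

Square("2") raises TypeError, Square(-1) raises ValueError, and Shape().area() raises NotImplementedError.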

Logging Exceptions

import logging

logger = logging.getLogger(__name__)

try:
    something()  # placeholder for the code that may fail
except Exception:
    logger.error("something bad happened", exc_info=True)
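A runnable sketch of the same pattern that writes to an in-memory buffer, so the logged traceback can be inspected. The logger name and the failing function something() are made up for the demo. It uses logger.exception(), which logs at ERROR level and attaches the traceback, just like logger.error(..., exc_info=True):

```python
import io
import logging

# Send log records to an in-memory buffer so the output can be inspected.
buffer = io.StringIO()
logger = logging.getLogger("exception_demo")  # hypothetical logger name
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(buffer))
logger.propagate = False  # keep the demo output out of the root logger


def something() -> None:
    """Hypothetical stand-in for code that fails."""
    raise ValueError("invalid input")


try:
    something()
except Exception:
    # Logs the message followed by the full traceback.
    logger.exception("something bad happened")

log_output = buffer.getvalue()
print(log_output)
```

logger.exception() is only meaningful inside an except block, because it reads the exception that is currently being handled.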

Also see https://www.loggly.com/blog/exceptional-logging-of-exceptions-in-python/

7 - Iterate in Python

Iterate keys of dict

d = {'x': 1, 'y': 2, 'z': 3}
for key in d:
    print(key, 'corresponds to', d[key])

Iterate keys and values of dict

d = {'x': 1, 'y': 2, 'z': 3}
for key, value in d.items():
    print(key, 'corresponds to', value)

8 - Joblib

Commands

9 - Jupyter & JupyterLab

Install JupyterLab

  • conda install jupyterlab nb_conda_kernels

View Jupyter Notebook online

This can be done here: https://nbviewer.jupyter.org/

Add Conda Environment to Jupyter Lab

python -m ipykernel install --user --name <conda_env_name> --display-name "<conda_env_name>"

Environment settings for Jupyter

Environment settings for Jupyter are not read from .bashrc. You have to specify them in a .py file at ~/.ipython/profile_default/startup/

For example:

echo -e "import os\n\nos.environ[\"SOME_URL\"] = \"http://mlflow.company.tld:5000\"" > ~/.ipython/profile_default/startup/set_env.py
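The escaped quotes in the echo command are easy to get wrong. As an alternative, here is a sketch that generates the same startup file from Python. The helper write_startup_env is hypothetical; the file name set_env.py and the SOME_URL value are taken from the example above:

```python
from pathlib import Path
from typing import Dict


def write_startup_env(startup_dir: Path, variables: Dict[str, str]) -> Path:
    """Write a startup script that sets the given environment variables."""
    startup_dir.mkdir(parents=True, exist_ok=True)
    lines = ["import os", ""]
    for name, value in variables.items():
        # repr() produces a correctly quoted Python string literal.
        lines.append(f"os.environ[{name!r}] = {value!r}")
    script = startup_dir / "set_env.py"
    script.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return script


# Typical call (writes into the default IPython profile):
# write_startup_env(
#     Path.home() / ".ipython" / "profile_default" / "startup",
#     {"SOME_URL": "http://mlflow.company.tld:5000"},
# )
```

Restart the kernel afterwards; startup scripts only run when a kernel starts.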

Install Jupyter on Server for Remote Access

Clean the Trash

When you delete files with the “File Browser” of Jupyter Lab, they are not deleted but moved to ~/.local/share/Trash. Empty that folder to delete them permanently.

11 - Pandas

Create Dataframe

import pandas as pd

data = {"col1": [1, 2], "col2": [3, 4]}
df = pd.DataFrame(data=data)

Load and Save

  • save to CSV: df.to_csv("path_or_buffer")
  • save to CSV (without row names / index): df.to_csv("path_or_buffer", index=False)
  • load from CSV:
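import csv

import pandas as pd

df = pd.read_csv(
    "path_or_buffer",
    sep=";",
    encoding="us-ascii",
    usecols=col_list,  # placeholder: list of column names to load
    nrows=number_of_rows_to_read,  # placeholder: maximum number of rows to read
    low_memory=False,
    quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
)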
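A self-contained round trip as a sketch: the toy DataFrame and the in-memory buffer (standing in for a file path) are assumptions for illustration.

```python
import csv
import io

import pandas as pd

df = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": ["a", "b", "c"], "col3": [0.5, 1.5, 2.5]}
)

# Write without the index, using ";" as separator.
buffer = io.StringIO()
df.to_csv(buffer, sep=";", index=False)
buffer.seek(0)

# Read back only two columns and only the first two data rows.
loaded = pd.read_csv(
    buffer,
    sep=";",
    usecols=["col1", "col2"],
    nrows=2,
    quoting=csv.QUOTE_NONE,
)
print(loaded)
```

Note that sep must match on both sides; to_csv writes "," by default.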

Display Data

  • count values in column (without NaN values): df["col_name"].value_counts()
  • count values in column (with NaN values): df["col_name"].value_counts(dropna=False)
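A short sketch of the difference between the two calls, using a made-up column with two missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_name": ["a", "b", "a", np.nan, "a", np.nan]})

# NaN entries are skipped by default.
counts = df["col_name"].value_counts()

# With dropna=False, NaN gets its own row in the result.
counts_with_nan = df["col_name"].value_counts(dropna=False)

print(counts)
print(counts_with_nan)
```

Here counts has two entries (for "a" and "b"), while counts_with_nan has a third entry for the missing values.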

Delete Data

  • delete column inline
    • df.drop("column_name", axis=1, inplace=True)
    • column_name can also be a list of str
  • remove rows on condition: df.drop(df[df["col_name"] == condition].index, inplace=True)
  • remove duplicates
    • keep first (inplace): df.drop_duplicates(inplace=True, keep="first")
    • only consider certain columns to identify duplicates, keep first (inplace): df.drop_duplicates(list_of_cols, inplace=True, keep="first")
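The delete operations above, applied step by step to a toy DataFrame (column names and values are invented for the example):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["a", "b", "b", "c", "c"],
        "value": [1, 2, 2, 3, 4],
        "tmp": [0, 0, 0, 0, 0],
    }
)

# Delete a column in place.
df.drop("tmp", axis=1, inplace=True)

# Remove all rows where "name" equals "a".
df.drop(df[df["name"] == "a"].index, inplace=True)

# Drop fully identical rows, keeping the first occurrence.
df.drop_duplicates(inplace=True, keep="first")
rows_after_full_dedup = len(df)

# Only compare the "name" column when looking for duplicates.
df.drop_duplicates(["name"], inplace=True, keep="first")
print(df)
```

The surviving rows keep their original index labels; call df.reset_index(drop=True) if a contiguous index is needed afterwards.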

Modify Data

  • sort Data
    • low to high values: df.sort_values("column_name", inplace=True)
    • high to low values: df.sort_values("column_name", ascending=False, inplace=True)

Combine Data

Stack two Dataframes

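Always pass ignore_index=True. Otherwise each DataFrame keeps its original index labels, the result contains duplicate index values, and label-based operations such as df.loc can later return more rows than expected.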

df = pd.concat([df_01, df_02], ignore_index=True)
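A sketch of what goes wrong without ignore_index, using two tiny made-up DataFrames:

```python
import pandas as pd

df_01 = pd.DataFrame({"col1": [1, 2]})
df_02 = pd.DataFrame({"col1": [3, 4]})

# Without ignore_index the labels 0 and 1 each appear twice.
stacked_dup = pd.concat([df_01, df_02])
rows_for_label_0 = stacked_dup.loc[0]  # a DataFrame with two rows, not one row

# With ignore_index=True the result gets a fresh index 0..3.
stacked = pd.concat([df_01, df_02], ignore_index=True)

print(stacked_dup.index.tolist())
print(stacked.index.tolist())
```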

Display Settings

Examples for display settings:

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.max_colwidth", None)

Filter nan Values

nan == nan is always False, so == cannot be used to check for NaN values. Use pd.isnull(obj), where obj is a scalar or array-like, or the isnull() method instead. Examples:

df.loc[pd.isnull(df["col"])]
df[df["col"].isnull()]
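A runnable sketch with a made-up column that contains one NaN. It also shows notnull(), the complement of isnull():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [1.0, np.nan, 3.0]})

# Comparing with == never matches, because nan != nan.
matches_with_eq = (df["col"] == np.nan).sum()

# isnull() finds the missing value; notnull() selects the complement.
missing_rows = df[df["col"].isnull()]
present_rows = df[df["col"].notnull()]

print(matches_with_eq)
print(missing_rows)
print(present_rows)
```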

Other

  • rename columns: df.rename(columns={"a": "x"}, inplace=True)

12 - PIP

Install Packages

List Packages

  • list outdated packages: pip list -o
  • list packages in requirements.txt format: pip list --format freeze

Other Commands

  • delete package cache: pip cache purge

Install and update Packages from a File

For pip you can create so-called requirements files. These files list one package per line. Packages from such a file can be installed with pip install -r <requirements.txt> and updated with pip install -r <requirements.txt> -U. The update only makes sense when you do not pin a version number for the package.

Since pip does not support an “update all” mechanism, this is a good way to install and update the needed packages.

To add a package from Git, just add git+<https_git_clone_link> instead of the normal package name.

Build PyPI Packages

14 - tqdm

Usage

Simple usage:

from tqdm import tqdm

for i in tqdm(range(10000)):
    pass  # replace with the loop body

Manual usage:

from time import sleep

from tqdm import tqdm

with tqdm(total=100) as pbar:
    for i in range(10):
        sleep(0.1)  # simulate work
        pbar.update(10)  # advance the bar by 10 of the 100 total units

15 - Typing

Types

  • Any: Special type indicating an unconstrained type.
  • Optional: Optional[X] is equivalent to Union[X, None], i.e. the value is either of type X or None.
  • Dict
    • Dict[str, int]
    • Dict[str, List[int]]
  • Callable
    • Callable[[str], Dict[str, List[int]]]

Example

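from typing import Callable, Dict, List, Optional


class Trainer:  # hypothetical class name, for illustration only
    def __init__(
        self,
        tokenizer_func: Callable[[str], Dict[str, List[int]]],
        augmentation_func: Callable[[str], str],
        train_data_sampling_callback: Optional[Callable[[List[str]], List[int]]] = None,
    ):
        ...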
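A sketch of how such annotated callables are passed and called. The function encode_texts and the toy helpers are hypothetical; only the type shapes follow the example above. A parameter that defaults to None is wrapped in Optional:

```python
from typing import Callable, Dict, List, Optional


def encode_texts(
    texts: List[str],
    tokenizer_func: Callable[[str], Dict[str, List[int]]],
    sampling_callback: Optional[Callable[[List[str]], List[int]]] = None,
) -> List[Dict[str, List[int]]]:
    """Tokenize the texts selected by the optional sampling callback."""
    # Without a callback, keep every text.
    indices = sampling_callback(texts) if sampling_callback else range(len(texts))
    return [tokenizer_func(texts[i]) for i in indices]


def toy_tokenizer(text: str) -> Dict[str, List[int]]:
    """Toy stand-in: one id per word, equal to the word length."""
    return {"input_ids": [len(word) for word in text.split()]}


def every_second(texts: List[str]) -> List[int]:
    """Toy sampler: indices 0, 2, 4, ..."""
    return list(range(0, len(texts), 2))


all_encoded = encode_texts(["a bc", "def", "gh i"], toy_tokenizer)
sampled = encode_texts(["a bc", "def", "gh i"], toy_tokenizer, every_second)
```

Type hints are not enforced at runtime; a checker such as mypy is needed to catch a callable with the wrong signature.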