1. Requirements
    1. Exercise data
    2. Conda & Mamba
    3. Docker
        1. Basic usage
2. Optional reading
    1. Git
    2. Snakemake

This section provides the requisites for the exercises, as well as some optional tools to accelerate your analysis.

Requirements

Exercise data

Exercise data can be accessed at https://github.com/yanhui09/MAC2023-extra.

Conda & Mamba

Conda is a package manager that allows you to install, run, and update packages and their dependencies. It is a very useful tool to manage your analysis environment.

Mamba is a reimplementation of the conda package manager in C++. It resolves dependencies and installs packages much faster than conda, so it is introduced here as a fast alternative, especially for large environments.

Assuming you have installed conda in previous sessions, here we only introduce how to install mamba. If you prefer to keep using conda, you can replace mamba with conda in the commands.
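The swap between the two tools can be scripted. The following is a minimal sketch that picks mamba when it is available and falls back to conda otherwise. The variable name PKG_MGR and the environment name mac2023 are assumptions made for illustration; they are not part of the course material.

```shell
# Pick mamba when it is on the PATH, otherwise fall back to conda.
if command -v mamba >/dev/null 2>&1; then
    PKG_MGR=mamba
elif command -v conda >/dev/null 2>&1; then
    PKG_MGR=conda
else
    PKG_MGR=""
fi
echo "Package manager: ${PKG_MGR:-none found}"

# Illustrative only: "mac2023" is a hypothetical environment name.
# "$PKG_MGR" create -n mac2023 -c conda-forge -c bioconda seqkit
```

The create command is left commented out so that running the sketch does not change your system.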

To install mamba, please refer to the official documentation.

Here we recommend a fresh install rather than installing mamba with conda install. The Mambaforge distribution makes this easy: find the right installer for your system, download it, and run it.

On an x86_64 Linux platform, you can use the following commands to install mamba:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh
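For other platforms, the installer name can be derived from the system itself. The sketch below only builds and prints the URL; it assumes the installers follow the same naming pattern as the command above, with the operating system and architecture as reported by uname. The function installer_url is a hypothetical helper. Newer Miniforge releases may ship mamba in the Miniforge3 installer instead of a separate Mambaforge one, so passing Miniforge3 as the first argument is worth trying if the Mambaforge download fails; check this against the official documentation.

```shell
# installer_url is a hypothetical helper, not part of conda or mamba.
# It prints the installer URL for a distribution, OS and architecture.
installer_url() {
    # $1: distribution name, e.g. Mambaforge (as in the text) or Miniforge3
    # $2: operating system as printed by `uname -s`, e.g. Linux
    # $3: CPU architecture as printed by `uname -m`, e.g. x86_64
    printf 'https://github.com/conda-forge/miniforge/releases/latest/download/%s-%s-%s.sh\n' "$1" "$2" "$3"
}

# Build the URL for the current machine (nothing is downloaded here).
url=$(installer_url Mambaforge "$(uname -s)" "$(uname -m)")
echo "Installer URL: $url"

# To download and install, uncomment:
# wget "$url"
# bash "$(basename "$url")"
# Then restart the shell and check the installation:
# mamba --version
```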

Docker

Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.

You can think of a container as a lightweight virtual machine: it shares the kernel of the host system, so it starts faster and uses fewer resources than a full virtual machine.

To install docker, please refer to the official documentation.

Due to the software dependencies (e.g., singularity, seqkit), LACA and NART are built and tested on linux/amd64 only. If you are using other systems, please use docker to run the pipelines.
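To find out whether this applies to your machine, you can check the platform from the command line. This is a minimal sketch, assuming that a linux/amd64 host reports Linux for uname -s and x86_64 for uname -m. The function needs_docker is a hypothetical helper written for this example.

```shell
# needs_docker is a hypothetical helper: it succeeds (exit status 0)
# when the given platform is anything other than linux/amd64.
needs_docker() {
    # $1: operating system (uname -s), $2: architecture (uname -m)
    if [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]; then
        return 1
    fi
    return 0
}

if needs_docker "$(uname -s)" "$(uname -m)"; then
    echo "Not linux/amd64: run the pipelines with docker."
else
    echo "linux/amd64 detected: the pipelines can run natively."
fi
```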

Basic usage

To run a docker container, you simply need two steps:

  1. Pull the image from Docker Hub.
  2. Run a container with the downloaded image.

Here we use LACA as an example.

To pull a docker image, you can use the following command:

docker pull yanhui09/laca

To start a docker container, you need to mount your data directory, e.g., the current directory (pwd), to /home in the container.

Assuming your data is in the current directory pwd, you can run a docker container:

docker run -it -v `pwd`:/home --privileged yanhui09/laca

To exit a docker container, you can use Ctrl + D or the exit command.
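If your data is not in the current directory, the same command works with any other directory, provided the host path is absolute; docker treats a relative name given to -v as a named volume rather than a directory. The sketch below prints the command for a given directory without running it. The function laca_run_cmd is a hypothetical helper written for this example.

```shell
# laca_run_cmd is a hypothetical helper that prints the docker run
# command for a given data directory without executing it.
laca_run_cmd() {
    # Resolve the directory to an absolute path; fail if it does not exist.
    data_dir=$(cd "$1" 2>/dev/null && pwd) || return 1
    printf 'docker run -it -v "%s":/home --privileged yanhui09/laca\n' "$data_dir"
}

# Print the command for the current directory.
laca_run_cmd .

# To actually start the container, uncomment:
# eval "$(laca_run_cmd .)"
```

Printing the command first lets you confirm which directory will be mounted before the container starts.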

Optional reading

Git

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. It is very useful to manage your code and analysis.

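Here we mainly use git clone to download the exercise data. The --depth 1 option creates a shallow clone that contains only the most recent commit, which keeps the download small.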

git clone --depth 1 https://github.com/yanhui09/MAC2023-extra.git
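To see what --depth 1 does without touching the network, you can try it on a throwaway local repository. This is a minimal sketch; the repository, commit messages and demo identity are made up for the illustration and have nothing to do with the exercise data.

```shell
# Create a throwaway repository with two commits (no network needed).
workdir=$(mktemp -d)
git init -q "$workdir/origin"
for msg in first second; do
    # Empty commits stand in for a project history; the identity is a placeholder.
    git -C "$workdir/origin" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$msg"
done

# A file:// URL is used because git ignores --depth for plain local paths.
git clone -q --depth 1 "file://$workdir/origin" "$workdir/shallow"

# Count the commits reachable in the shallow clone.
n_commits=$(git -C "$workdir/shallow" rev-list --count HEAD)
echo "Commits in the shallow clone: $n_commits"

# Clean up the temporary files.
rm -rf "$workdir"
```

The origin repository has two commits, while the shallow clone should report only the latest one.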

Snakemake

Snakemake is a workflow management system that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern specification language in Python style.

LACA and NART are two pipelines based on snakemake. If you are familiar with snakemake, you will have a better understanding of the philosophy behind these pipelines.

You can read more about snakemake in the official documentation.


© 2023 Yan Hui. Released under the CC BY-SA license