This section provides the requisites for the exercises, as well as some optional tools to accelerate your analysis.
Requirements
Exercise data
Exercise data can be accessed at https://github.com/yanhui09/MAC2023-extra.
Conda & Mamba
Conda is a package manager that allows you to install, run, and update packages and their dependencies. It is a very useful tool to manage your analysis environment.
Mamba is a reimplementation of the conda package manager in C++. It is much faster than conda, and is recommended for large-scale analysis.
To speed up the analysis, mamba is introduced here as a fast alternative to conda.
Assuming you installed conda in a previous session, we only cover how to install mamba here. If you prefer to keep using conda, you can replace mamba with conda in the commands.
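As a minimal sketch of that substitution, the snippet below picks mamba when it is on the PATH and falls back to conda otherwise. The function name pick_manager, the variable PKG_MANAGER, and the environment name mac2023 are hypothetical and only serve the illustration.

```shell
# Pick mamba when it is available on PATH, otherwise fall back to conda.
# pick_manager and PKG_MANAGER are illustrative names, not part of any tool.
pick_manager() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

PKG_MANAGER="$(pick_manager)"
echo "Using package manager: $PKG_MANAGER"

# Example usage, left commented out because it needs network access.
# "mac2023" is a hypothetical environment name.
# "$PKG_MANAGER" create -n mac2023 -c conda-forge -c bioconda seqkit
```

Because both tools accept the same subcommands for everyday use, the rest of a script can call "$PKG_MANAGER" without caring which one was found.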
To install mamba, please refer to the official documentation.
Here we recommend a fresh install rather than installing mamba through conda install. It is easy to install with the Mambaforge distribution: find the right installer for your system, then download and run it.
For the x86_64 Linux platform, you can use the following commands to install mamba:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
bash Mambaforge-Linux-x86_64.sh
Docker
Docker is a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries and configuration files; they can communicate with each other through well-defined channels.
You can think of a container as a virtual machine, but it starts much faster and is far more lightweight than one.
To install docker, please refer to the official documentation.
Due to the software dependencies (e.g., singularity, seqkit), LACA and NART are built and tested on linux/amd64 only. If you are using other systems, please use docker to run the pipelines.
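If you are unsure which case applies to your machine, a quick check along these lines may help. It is only a sketch: the function name run_mode is hypothetical, and it assumes that linux/amd64 corresponds to uname reporting Linux as the system and x86_64 as the architecture.

```shell
# Decide how to run the pipelines from the OS name and CPU architecture.
# Assumption: linux/amd64 means `uname -s` = Linux and `uname -m` = x86_64.
run_mode() {
    os="$1"
    arch="$2"
    if [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]; then
        echo "native"
    else
        echo "docker"
    fi
}

# Apply the check to the current machine.
mode="$(run_mode "$(uname -s)" "$(uname -m)")"
echo "Suggested way to run LACA/NART on this machine: $mode"
```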
Basic usage
To run a docker container, you simply need two steps:
- Pull the image from Docker Hub.
- Run a container with the downloaded image.
Here we use LACA as an example.
To pull a docker image, you can use the following command:
docker pull yanhui09/laca
To start a docker container, you need to mount your data directory, e.g., the current directory (pwd), to /home in the container.
Assuming your data is in the current directory (pwd), you can run a docker container:
docker run -it -v `pwd`:/home --privileged yanhui09/laca
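If your data lives somewhere other than the current directory, only the host side of the -v mount changes. The helper below is a hypothetical dry-run sketch: laca_docker_cmd is not part of LACA or docker, it merely prints the command for a given data directory so you can inspect it before running it. It assumes the path contains no spaces.

```shell
# Print (do not run) the docker command that mounts a data directory at /home.
# laca_docker_cmd is an illustrative helper; /data/run1 is a made-up path.
laca_docker_cmd() {
    data_dir="${1:-$(pwd)}"   # default to the current directory
    printf 'docker run -it -v %s:/home --privileged yanhui09/laca\n' "$data_dir"
}

# Show the command for a specific directory and for the current one.
laca_docker_cmd /data/run1
laca_docker_cmd
```

Inside the container, the files from the mounted directory then appear under /home.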
To exit a docker container, you can use Ctrl + D or the exit command.
Optional reading
Git
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. It is very useful to manage your code and analysis.
Here we mainly use git clone to download the exercise data.
git clone --depth 1 https://github.com/yanhui09/MAC2023-extra.git
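The --depth 1 option requests a shallow clone, which fetches only the most recent commit instead of the full history and so keeps the download small. The sketch below illustrates the effect offline on a throwaway repository; the directory names, commit messages, and demo identity are all made up for the example.

```shell
# Build a small local repository with three commits in a temporary directory.
tmp="$(mktemp -d)"
git init -q "$tmp/src"
cd "$tmp/src"
for i in 1 2 3; do
    echo "$i" > file.txt
    git add file.txt
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done
cd "$tmp"

# A file:// URL is used so that git honours --depth for a local source.
git clone -q --depth 1 "file://$tmp/src" shallow

# Count the commits visible in each copy.
full_count="$(git -C src rev-list --count HEAD)"
shallow_count="$(git -C shallow rev-list --count HEAD)"
echo "full history: $full_count commits, shallow clone: $shallow_count commit(s)"
```

The shallow copy should report a single commit while the source repository reports three, which is why a shallow clone is sufficient when you only need the current exercise files.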
Snakemake
Snakemake is a workflow management system that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern specification language in python style.
LACA and NART are two pipelines based on snakemake. If you are familiar with snakemake, you will have a better understanding of the philosophy behind these pipelines.
You can read more about snakemake in the official documentation.