# Simulate data

# Step 0. Install `msprime`

[msprime download instructions](https://msprime.readthedocs.io/en/stable/installation.html).

Some people might need to activate conda environment via `conda config --set auto_activate_base True`. You can turn it off once simulation is done by executing `conda config --set auto_activate_base False`.


### Step 1. Simulate data in terminal
#
## To activate this environment, use
##
##     $ source activate msprime-env
##
## To deactivate an active environment, use
##
##     $ conda deactivate

```
python3 msprime_script.py 1002000 10000 10000000 2e-8 2e-8 2020 > /mnt/AppRun/msprime/full3.vcf
```

Arguments: 
+ Number of haplotypes = 1002000
+ Effective population size = 10000 ([source](https://www.the-scientist.com/the-nutshell/ancient-humans-more-diverse-43556))
+ Sequence length = 10 million (same as Beagle 5's choice)
+ Rrecombination rate = 2e-8 (default)
+ mutation rate = 2e-8 (default)
+ seed = 2020
