IMAX2/3 Docs/Tutorials
Download IMAX2/3
- IMAX3 document (jpn), IMAX3 document (eng)
- IMAX2 document (jpn), IMAX2 document (eng)
- IMAX2/3 all-in-one kit including document, compiler, simulator, examples, FPGA bin-files, and Vivado-projects (42GB in total)
- IMAX2/3 supplemental kit for CentOS (300MB)
Introduction to IMAX3: Amazing Dataflow-Centric Gen4-CGLA (non-CGRA) (CGLA: Coarse-Grained Linear Array)
Introductory slides with synthesizable notes
Expert-level slides with synthesizable notes
IMAX2 Kit
ZCU102 (8 units) ... Vivado project is included.
- 250MHz, IMAX2 8 cores, 320 operations / 4 cycles, Cache/core 128KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-ZCU102
- proj-arm64/fpga/ZCU102-step4000-20221020_IP.tgz
- proj-arm64/fpga/ZCU102-step4000-20221020.tgz
- proj-arm64/fpga/ZCU102-step4000-20221020.img.gz
- linux# zcat ZCU102-step4000-20221020.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- zcu102# insert SDcard
- zcu102# boot from SDcard
- zcu102# create users
- zcu102% extract proj-arm64.tgz (NFS is recommended)
- zcu102% proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-8st (matrix multiplication)
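The SD card preparation steps above can be sketched as a small shell helper. This is a minimal sketch and not part of the kit: the function name `write_image` is hypothetical, and `gzip -dc` is used as a portable equivalent of `zcat`. The target device (`/dev/mmcblk0` in the steps above) depends on the host, so it is passed explicitly rather than hard-coded.

```shell
#!/bin/sh
# write_image IMAGE.gz TARGET
# Hypothetical helper (not shipped with the kit): decompress IMAGE.gz and
# write it to TARGET with the same dd block size used in the steps above.
write_image() {
    img=$1
    target=$2
    if [ ! -r "$img" ]; then
        echo "write_image: cannot read image: $img" >&2
        return 1
    fi
    if [ -z "$target" ]; then
        echo "write_image: no target given" >&2
        return 1
    fi
    # Verify the gzip stream first so a truncated download is not written.
    if ! gzip -t "$img"; then
        echo "write_image: image failed gzip integrity check: $img" >&2
        return 1
    fi
    gzip -dc "$img" | dd bs=64k of="$target" 2>/dev/null || return 1
    sync   # flush buffered writes before the card is removed
}

# Example (as root, after confirming the device name, e.g. with lsblk):
#   write_image ZCU102-step4000-20221020.img.gz /dev/mmcblk0
```

Checking the archive before writing is a design choice of this sketch: `dd` overwrites the target unconditionally, so it seems safer to fail before any data reaches the card.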
ZCU111 (16 units)
- 250MHz, IMAX2 16 cores, 640 operations / 4 cycles, Cache/core 128KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/ZCU111-step4000-20220301.img.gz
- linux# zcat ZCU111-step4000-20220301.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- zcu111# insert SDcard
- zcu111# boot from SDcard
- zcu111# create users
- zcu111% extract proj-arm64.tgz (NFS is recommended)
- zcu111% proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (matrix multiplication)
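The "replace root-password in /mnt/etc/shadow" step can be done by hand in an editor or scripted. The following is a minimal sketch under stated assumptions: `set_root_hash` is a hypothetical helper name, and the new password hash is assumed to be generated separately (for example with `openssl passwd -6`, available in OpenSSL 1.1.1 or later).

```shell
#!/bin/sh
# set_root_hash SHADOW_FILE HASH
# Hypothetical helper: replace the password field (field 2) of the root
# entry in a shadow-format file, leaving all other entries untouched.
set_root_hash() {
    shadow=$1
    hash=$2
    tmp=$(mktemp) || return 1
    awk -F: -v OFS=: -v h="$hash" \
        '$1 == "root" { $2 = h } { print }' "$shadow" > "$tmp" &&
        cat "$tmp" > "$shadow"   # overwrite in place to keep owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root, with the card's second partition mounted on /mnt):
#   set_root_hash /mnt/etc/shadow "$(openssl passwd -6)"
```

Writing back with `cat` instead of `mv` keeps the original file's ownership and permissions, which matters for `/etc/shadow`.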
ZU19EG (16 units) ... Vivado project is included.
- 250MHz, IMAX2 16 cores, 640 operations / 4 cycles, Cache/core 128KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-ZU19EG
- proj-arm64/fpga/ZU19EG-step4000-20241111_IP.tgz
- proj-arm64/fpga/ZU19EG-step4000-20241111.tgz
- proj-arm64/fpga/ZU19EG-step4000-20241111.img.gz
- linux# zcat ZU19EG-step4000-20241111.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- zu19eg# insert SDcard
- zu19eg# boot from SDcard (dhcp)
- linux% ssh -Y [email protected] (with X11 forwarding)
- zu19eg% zcat proj-arm64.tgz|tar xpf -
- zu19eg% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (matrix multiplication)
- passwd: temppwd
- localhost:11.0: Cannot open display (if this error appears, copy the X authority file to root as follows, then retry)
- zu19eg% cp ~/.Xauthority /tmp/111
- zu19eg% sudo cp /tmp/111 /root/.Xauthority
- zu19eg% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma-16st (retry)
- <<<ORIG>>>
- usec: ARM:2098589 DRAIN:0 CONF:0 REGV:0 RANGE:0 LOAD:0 EXEC:0 total:2098589 (usec)
- <<<IMAX>>>
- usec: ARM:426 DRAIN:1224 CONF:105 REGV:1041 RANGE:663 LOAD:14861 EXEC:24324 total:42647 (usec)
- zu19eg% cd proj-arm64/sample/mm_cnn_lf
- zu19eg% make -f Makefile-zynq.emax6+dma mm-zynq.emax6+dma-16st (how to build)
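The two `usec:` lines in the sample output above report the ARM-only run (ORIG) and the IMAX run, so the overall speedup follows from their `total:` fields. A minimal sketch of that arithmetic, using a hypothetical helper named `total_usec`:

```shell
#!/bin/sh
# total_usec LINE: extract the number after "total:" from a usec report line.
# Hypothetical helper, written against the output format shown above.
total_usec() {
    printf '%s\n' "$1" | sed -n 's/.*total:\([0-9][0-9]*\).*/\1/p'
}

orig='usec: ARM:2098589 DRAIN:0 CONF:0 REGV:0 RANGE:0 LOAD:0 EXEC:0 total:2098589 (usec)'
imax='usec: ARM:426 DRAIN:1224 CONF:105 REGV:1041 RANGE:663 LOAD:14861 EXEC:24324 total:42647 (usec)'

# Ratio of the ORIG total to the IMAX total, rounded to one decimal place.
speedup=$(awk -v o="$(total_usec "$orig")" -v i="$(total_usec "$imax")" \
    'BEGIN { printf "%.1f", o / i }')
echo "speedup: ${speedup}x"   # prints "speedup: 49.2x"
```

In the IMAX line, LOAD and EXEC together account for most of the total, so the ratio reflects data transfer as well as compute time.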
ZCU102+VU440 (64/128/192/256/512 units) ... Vivado project is included.
- 130MHz, IMAX2 64-512 cores, 2560-20480 operations / 4 cycles, Cache/core 64KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-VU440
- proj-arm64/fpga/ZCU102-step4000-20201010.img.gz
- proj-arm64/fpga/VU440-step4000-20221020.tgz
- proj-arm64/fpga/VU440-step4000-20221020-V24.1-78.125+78.125+48+260+130+48-CRYPTO-SPU.bin
- vu440# connect with zcu102 (see figure)
- vu440# write VU440-step4000-20221020-V24.1-78.125+78.125+48+260+130+48-CRYPTO-SPU.bin to SDcard
- vu440# insert SDcard
- linux# zcat ZCU102-step4000-20201010.img.gz | dd bs=64k of=/dev/mmcblk0 (16GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- zcu102# insert SDcard
- zcu102# boot from SDcard
- zcu102# create users
- zcu102% extract proj-arm64.tgz (NFS is recommended)
- zcu102% sudo proj-arm64/sample/mm_cnn_lf/mm-zynq.emax6+dma (matrix multiplication)
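Each board entry above lists the same ratio of 40 operations per unit per 4 cycles (320 for 8 units, 640 for 16, and 2560 to 20480 for 64 to 512). The sketch below reproduces those figures and, as a clearly labeled assumption, converts them into a theoretical peak rate by multiplying by the clock frequency. The helper names are hypothetical, and the peak value is an upper bound derived only from the listed numbers, not a measured result.

```shell
#!/bin/sh
# ops_per_4_cycles UNITS: operations issued per 4 cycles (40 per unit,
# the ratio implied by the figures listed above).
ops_per_4_cycles() {
    echo $(( $1 * 40 ))
}

# peak_mops UNITS MHZ: ASSUMED theoretical peak in million operations per
# second, i.e. (UNITS * 40 / 4 cycles) * clock. Not a benchmark result.
peak_mops() {
    echo $(( $1 * 40 * $2 / 4 ))
}

ops_per_4_cycles 8      # ZCU102: 320
ops_per_4_cycles 512    # ZCU102+VU440, largest configuration: 20480
peak_mops 8 250         # ZCU102 at 250MHz: 20000
peak_mops 512 130       # ZCU102+VU440 at 130MHz: 665600
```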
IMAX3 Kit
VMK180 (32 units) ... Vivado project is included.
- 180MHz, IMAX3 32 cores, 1280 operations / 4 cycles, Cache/core 64KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-VPK180
- proj-arm64/fpga/VMK180-step4000-20230410_IP.tgz
- proj-arm64/fpga/VMK180-step4000-20230410.tgz
- proj-arm64/fpga/VMK180-step4000-20230410.img.gz
- linux# zcat VMK180-step4000-20230410.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- vmk180# insert SDcard
- vmk180# boot from SDcard
- vmk180# create users
- vmk180% extract proj-arm64.tgz
- vmk180% proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma-32st (matrix multiplication)
VMK180 (32*2 units) ... Vivado project is included.
- 180MHz, IMAX3 64 cores, 2560 operations / 4 cycles, Cache/core 64KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-VPK180
- proj-arm64/fpga/VMK180-step4200-MASTER.tgz
- proj-arm64/fpga/VMK180-step4200-SLAVE.tgz
- proj-arm64/fpga/VMK180-step4200-MASTER.img.gz (**NEW**)
- proj-arm64/fpga/VMK180-step4200-SLAVE.img.gz (**NEW**)
- linux# zcat VMK180-step4200-MASTER.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the master board)
- linux# zcat VMK180-step4200-SLAVE.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard for the slave board)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- vmk180# connect the two boards with a QSFP28-AOC cable
- vmk180# insert SDcard
- vmk180# boot from SDcard
- vmk180# create users
- vmk180% extract proj-arm64.tgz
- vmk180% proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma-32st (matrix multiplication)
- vmk180% proj-arm64/sample/test/test025-acap.emax7+dma-32st (dual matrix multiplication)
- vmk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
- vmk180% ./tsim-vmk180.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv1+fc inference)
- vmk180% ./tsim-vmk180.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv1+fc training)
- vmk180% ./tsim-vmk180.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv3+fc inference)
- vmk180% ./tsim-vmk180.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv3+fc training)
- vmk180% ./tsim-vmk180.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
- vmk180% ./tsim-vmk180.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
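The six `tsim` invocations above pair an inference run (`-x -i -r`) and a training run (`-x -t`) for each of three network configurations. A minimal sketch that generates the same command lines as a dry run; the function name `tsim_commands` is hypothetical, and only flags that appear in the list above are used:

```shell
#!/bin/sh
# tsim_commands BINARY: print the inference and training command lines
# for the three configurations listed above (dry run, nothing is executed).
tsim_commands() {
    bin=$1
    for cfg in "-I0 -C1 -F1" "-I0 -C3 -F1" "-I1 -C6 -F2"; do
        echo "$bin -x -i -r $cfg"   # inference
        echo "$bin -x -t $cfg"      # training
    done
}

tsim_commands ./tsim-vmk180.emax7+dma
```

Printing the commands instead of running them keeps the sketch safe to try off the board; piping the output to `sh` from inside proj-arm64/sample/tsim would run them in sequence.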
VPK180 (64*2 units)
- 170MHz, IMAX3 128 cores, 5120 operations / 4 cycles, Cache/core 512KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-VPK180
- proj-arm64/fpga/VPK180-step4000-20240930_IP.tgz
- proj-arm64/fpga/VPK180-step4000-20240930.tgz
VPK180 (64*8 units)
- 170MHz, IMAX3 512 cores, 20480 operations / 4 cycles, Cache/core 512KB
- 32-load/8-store, quad-sparse-load, 3-cascaded octa-int/media, octa-single-float FMA, 32-stochastic FMA, Dual addr-synchronizer
- proj-arm64/fpga/README-VPK180
- proj-arm64/fpga/VPK180-step4800-MASTER.tgz
- proj-arm64/fpga/VPK180-step4800-SLAVE.tgz
- proj-arm64/fpga/alice120-step4800-master.img.gz (**NEW**)
- proj-arm64/fpga/alice122-step4800-slave1.img.gz (**NEW**)
- proj-arm64/fpga/alice124-step4800-slave2.img.gz (**NEW**)
- proj-arm64/fpga/alice126-step4800-slave3.img.gz (**NEW**)
- linux# zcat alice120-step4800-master.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# zcat alice122-step4800-slave1.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# zcat alice124-step4800-slave2.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# zcat alice126-step4800-slave3.img.gz | dd bs=64k of=/dev/mmcblk0 (32GB SDcard)
- linux# mount /dev/mmcblk0p2 /mnt
- linux# replace root-password in /mnt/etc/shadow
- linux# umount /mnt
- vpk180# connect four boards with QSFPDD-DAC cable
- vpk180# insert SDcard
- vpk180# boot from SDcard
- vpk180# create users
- vpk180% extract proj-arm64.tgz
- vpk180% sudo proj-arm64/sample/mm_cnn_lf/mm-acap.emax7+dma (matrix multiplication)
- vpk180% sudo proj-arm64/sample/test/test025-acap.emax7+dma (dual matrix multiplication)
- vpk180% cd proj-arm64/sample/tsim (MNIST/CIFAR10)
- vpk180% ./tsim-acap.emax7+dma -x -i -r -I0 -C1 -F1 (MNIST conv1+fc inference)
- vpk180% ./tsim-acap.emax7+dma -x -t -I0 -C1 -F1 (MNIST conv1+fc training)
- vpk180% ./tsim-acap.emax7+dma -x -i -r -I0 -C3 -F1 (MNIST conv3+fc inference)
- vpk180% ./tsim-acap.emax7+dma -x -t -I0 -C3 -F1 (MNIST conv3+fc training)
- vpk180% ./tsim-acap.emax7+dma -x -i -r -I1 -C6 -F2 (CIFAR10 conv6+fc2 inference)
- vpk180% ./tsim-acap.emax7+dma -x -t -I1 -C6 -F2 (CIFAR10 conv6+fc2 training)
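In the VPK180 (64*8 units) setup, the four `zcat` lines all write to the same device, which suggests that each image goes onto its own SD card, one per board, with the card swapped between writes. That reading is an assumption rather than something the list states. Under it, a dry-run planner (the name `plan_cards` is hypothetical) can print the write command for each card before anything is executed:

```shell
#!/bin/sh
# plan_cards DEVICE IMAGE...: print one write command per image.
# Dry run only: the commands are echoed, never executed.
plan_cards() {
    dev=$1
    shift
    n=1
    for img in "$@"; do
        echo "card $n: zcat $img | dd bs=64k of=$dev"
        n=$((n + 1))
    done
}

plan_cards /dev/mmcblk0 \
    alice120-step4800-master.img.gz \
    alice122-step4800-slave1.img.gz \
    alice124-step4800-slave2.img.gz \
    alice126-step4800-slave3.img.gz
```

Labeling each card as it is written helps keep the master image and the three slave images matched to the intended boards.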