<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Tensorflow on My Blog</title><link>/tags/tensorflow/</link><description>Recent content in Tensorflow on My Blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sat, 10 Mar 2018 00:00:00 +0000</lastBuildDate><atom:link href="/tags/tensorflow/index.xml" rel="self" type="application/rss+xml"/><item><title>horovod</title><link>/2018/03/10/horovod/</link><pubDate>Sat, 10 Mar 2018 00:00:00 +0000</pubDate><guid>/2018/03/10/horovod/</guid><description>&lt;!-- toc --&gt;
&lt;h1 id="官方介绍"&gt;官方介绍&lt;/h1&gt;
&lt;p&gt;Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed Deep Learning fast and easy to use.&lt;/p&gt;
&lt;p&gt;Official benchmark results:&lt;/p&gt;
&lt;p&gt;&lt;img alt="training" loading="lazy" src="/2018/03/10/horovod/horovod.png"&gt;&lt;/p&gt;
&lt;h1 id="running-horovod"&gt;Running Horovod&lt;/h1&gt;
&lt;p&gt;The example commands below show how to run distributed training. See the &lt;a href="https://github.com/uber/horovod/blob/master/docs/running.md"&gt;Running Horovod&lt;/a&gt; page for more instructions, including RoCE/InfiniBand tweaks and tips for dealing with hangs.&lt;/p&gt;
&lt;h2 id="1-单机4卡"&gt;1. 单机4卡:&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# docker
nvidia-docker run -it 172.16.10.10:5000/horovod:0.12.1-tf1.8.0-py3.5
mpirun -np 4 -H localhost:4 python keras_mnist_advanced.py
# singularity
singularity shell --nv /scratch/containers/ubuntu.simg
mpirun -np 4 -H localhost:4 python keras_mnist_advanced.py
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="2-多机多卡"&gt;2. 多机多卡:&lt;/h2&gt;
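&lt;p&gt;In both the single-node and the multi-node commands, &lt;code&gt;-np&lt;/code&gt; sets the total number of processes and each &lt;code&gt;-H&lt;/code&gt; entry has the form &lt;code&gt;host:slots&lt;/code&gt;. The Python sketch below is illustrative only: it assumes Open MPI places ranks by filling every slot on one host before moving to the next, and the helper names &lt;code&gt;parse_hosts&lt;/code&gt; and &lt;code&gt;layout&lt;/code&gt; are hypothetical. It lists which rank would land on which host together with its local rank. Horovod scripts typically pin one GPU per process with &lt;code&gt;hvd.local_rank()&lt;/code&gt;, so the local rank is the GPU index each process would use.&lt;/p&gt;

```python
# Illustrative sketch (not part of Horovod or Open MPI): how
# "mpirun -np N -H host1:slots,host2:slots,..." would place ranks, ASSUMING
# Open MPI fills every slot on one host before moving on to the next host.


def parse_hosts(spec):
    """Parse an -H value such as "server1:4,server2:4" into (host, slots) pairs."""
    hosts = []
    for entry in spec.split(","):
        name, _, slots = entry.partition(":")
        # In this sketch a bare host name without ":slots" counts as one slot.
        hosts.append((name, int(slots) if slots else 1))
    return hosts


def layout(np_total, spec):
    """Return (rank, host, local_rank) for each of np_total processes."""
    placement = []
    rank = 0
    for host, slots in parse_hosts(spec):
        for local_rank in range(slots):
            if rank == np_total:
                return placement  # all requested processes are placed
            placement.append((rank, host, local_rank))
            rank += 1
    if rank != np_total:
        # More processes requested than slots listed in -H.
        raise ValueError("not enough slots for %d processes" % np_total)
    return placement


if __name__ == "__main__":
    # Section 1: mpirun -np 4 -H localhost:4 ...
    for row in layout(4, "localhost:4"):
        print(row)
    # Section 2: mpirun -np 16 -H server1:4,server2:4,server3:4,server4:4 ...
    for rank, host, local_rank in layout(16, "server1:4,server2:4,server3:4,server4:4"):
        print("rank", rank, "on", host, "local_rank", local_rank)
```

&lt;p&gt;Under that assumption, the 16-process command would run ranks 0-3 on server1, ranks 4-7 on server2, and so on, with local ranks 0-3 (one per GPU) on every server.&lt;/p&gt;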
&lt;pre tabindex="0"&gt;&lt;code&gt;$ mpirun -np 16 \
-H server1:4,server2:4,server3:4,server4:4 \
...
python train.py
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="3--完整-docker-使用horovod"&gt;3. &lt;a href="https://github.com/uber/horovod/blob/master/docs/docker.md"&gt;完整 Docker 使用horovod &lt;/a&gt;&lt;/h2&gt;
</description></item></channel></rss>