Elixir - app startup and runit

We’ve gone through a few posts and have a working application now. But if we’re going to fire this up on many nodes, we’ll have to automate the startup. The other problem is that we want the storage node to have the code on it without running any workers. We’ll set up some configuration for all of this and add it to the startup.

Code is at https://github.com/tjheeta/elixir_web_crawler. Check out step-5.

git clone https://github.com/tjheeta/elixir_web_crawler.git
cd elixir_web_crawler
git checkout step-5

First things first, we want to ditch the iex console and just get the VM running. We have defined the default application in mix.exs. So we modify elixir_web_crawler.ex and define a start function, the Application callback, which fires up start_link for the Worker:

$ cat lib/elixir_web_crawler.ex 
defmodule ElixirWebCrawler do
  use Application
  def start(_type, _args) do
    ElixirWebCrawler.Worker.start_link
  end
end
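
One thing worth knowing about this callback: start/2 is expected to return {:ok, pid}, usually the pid of a top-level supervisor, so calling the Worker's start_link directly only works if it returns that shape. Below is a minimal sketch of the more common supervised layout. The Sketch.* module names are hypothetical and are not part of the repository, and the worker is an empty GenServer standing in for the real crawler.

```elixir
# Hypothetical stand-in for ElixirWebCrawler.Worker: a GenServer that does nothing.
defmodule Sketch.Worker do
  use GenServer

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, :ok, opts)
  end

  @impl true
  def init(:ok) do
    {:ok, %{}}
  end
end

# Hypothetical application module: start/2 boots a supervisor that owns the worker.
defmodule Sketch.App do
  use Application

  @impl true
  def start(_type, _args) do
    # `use GenServer` gives Sketch.Worker a default child_spec/1,
    # so it can be listed as a {module, argument} tuple.
    children = [{Sketch.Worker, []}]

    # Returns {:ok, supervisor_pid}, which is what the Application behaviour expects.
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

With this layout, a crashing worker is restarted by the supervisor inside the VM, and runit only has to step in when the whole VM goes down.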

Defining start/2 isn’t quite enough for the application to fire up at startup; we also need to point to the module in mix.exs. Here’s the diff:

diff --git a/mix.exs b/mix.exs
index 4fd68bb..6c24c28 100644
--- a/mix.exs
+++ b/mix.exs
@@ -13,7 +13,9 @@ defmodule ElixirWebCrawler.Mixfile do
   #
   # Type `mix help compile.app` for more information
   def application do
-    [applications: [:logger, :eredis, :mix, :confort, :ibrowse, :ssl]]
+    [applications: [:logger, :eredis, :mix, :confort, :ibrowse, :ssl],
+          mod: {ElixirWebCrawler, []}
+    ]
   end

And finally, we need to modify startup.sh to use elixir --no-halt instead of iex.

diff --git a/ansible/roles/crawler_worker/templates/startup.j2 b/ansible/roles/crawler_worker/templates/startup.j2
index f2a0f4f..b844f83 100644
--- a/ansible/roles/crawler_worker/templates/startup.j2
+++ b/ansible/roles/crawler_worker/templates/startup.j2
@@ -6,5 +6,5 @@ cp {{ homedir }}/main.conf  ${APPDIR}/config/
 cd ${APPDIR}
 mix deps.get
 mix deps.compile
-iex  --name "node@{{ ansible_default_ipv4['address'] }}" -S mix
-#elixir  --name "node@{{ ansible_default_ipv4['address'] }}" -S mix
+#iex  --name "node@{{ ansible_default_ipv4['address'] }}" -S mix
+elixir --no-halt --name "node@{{ ansible_default_ipv4['address'] }}" -S mix

Now startup.sh will start the application without any intervention, and we can start it up with runit. For those who don’t know about runit, it’s a process supervisor that was inspired by djb’s daemontools. It monitors whatever is executed by its run script. Note that if the run script exits, runit will restart it. We will create the directory runit_crawlr, in which we will place the run script. The directory layout looks like this:

erlang@worker1:~$ find runit_crawlr/ -type f
runit_crawlr/log/run
runit_crawlr/run

# This directory is then symlinked to /etc/service/
erlang@worker1:~$ ls -l /etc/service
lrwxrwxrwx 1 root root 25 Dec  7 16:21 runit_crawlr -> /home/erlang/runit_crawlr

More information about this, including how to use sv to control the process itself, can be found on the runit website. We’ve modified the ansible playbook to create these files and directories. The logs are in /etc/service/runit_crawlr/log/main/current, and there we can see that the crawler exits after downloading 100 items. runit then keeps restarting it and it keeps exiting, which is how it is meant to go: the limit is a safety measure against running errant code. If you want to adjust the parameters, feel free to modify ansible/playbook.yml. By default, the storage node does not start the crawler, and each worker runs 3 concurrent loops that stop after downloading 100 items from a website, a count that is tracked in redis. Eventually, we will be bound by IO on the storage node, and we’ll have to add multiple nodes and shard the data.
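
To make the "storage node runs no workers" idea concrete, here is a rough sketch of how a node could decide what to start from its configuration. This is not the code from the repository: the module name Sketch.Startup and the config keys :start_workers and :concurrency are assumptions made up for illustration, and an Agent stands in for a real crawler loop.

```elixir
# Hypothetical helper: turns node configuration into a list of child specs.
defmodule Sketch.Startup do
  # `config` is a keyword list. Both keys are assumed names, not the project's:
  #   :start_workers - false on the storage node, true on workers
  #   :concurrency   - number of concurrent crawler loops (defaults to 3)
  def worker_specs(config) do
    enabled = Keyword.get(config, :start_workers, false)
    count = Keyword.get(config, :concurrency, 3)

    if enabled and count > 0 do
      Enum.map(1..count, fn n ->
        # An Agent holding its loop number stands in for a real crawler loop.
        %{id: {:crawler_loop, n}, start: {Agent, :start_link, [fn -> n end]}}
      end)
    else
      # Storage node: the code is loaded, but nothing is started.
      []
    end
  end
end
```

The result can be handed straight to a supervisor, for example Supervisor.start_link(Sketch.Startup.worker_specs(start_workers: true), strategy: :one_for_one). On a storage node the list is empty, so the supervisor comes up with no children.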

tl;dr - create a start function in the main application, adjust mix.exs, and use elixir --no-halt with runit to get a permanently running app.
