
Elixir - Saving files on a remote node

This is where erlang/elixir really shine. If I had decided to, let's say, write the workers in python and use rabbitmq to handle the messages, I'd have had to set up some sort of ssh or ftp or sftp or nfs transport. Or something else. And that would just be painful. However, as demonstrated in an earlier post where erlang nodes are connected together, there is some magic happening. That magic includes the ability to call any code on a remote node seamlessly.
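That magic can be demonstrated straight from iex: once two nodes share a cookie and are connected, :rpc.call will run a function on the other node and hand back the result. A minimal sketch, assuming a second node named node@10.0.3.180 is already running (the node name here is an assumption):

Node.connect(:"node@10.0.3.180")
# => true
:rpc.call(:"node@10.0.3.180", String, :upcase, ["hello"])
# => "HELLO"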

Code is at https://github.com/tjheeta/elixir_web_crawler. Check out step-3:

git clone https://github.com/tjheeta/elixir_web_crawler.git
git checkout step-3

Let's say we want to write a save function that takes two arguments, the url and the body.

def save_file(url, body) do
  path = generate_path(url)
  File.mkdir_p(Path.dirname(path))
  {:ok, file} = File.open path, [:write]
  IO.binwrite file, :zlib.gzip(body)
  File.close(file)
  {:ok}
end

Now we can also make a remote save function. In this case we are using the master_ip that has already been set up, and we're assuming that the module name is ElixirWebCrawler.File.

def save_remote(url, body) do
  ip = Confort.get(:master_ip)
  node_name = :"node@#{ip}"
  {:ok} = :rpc.call(node_name, ElixirWebCrawler.File, :save_file, [url, body])
end

I cannot express how much easier this is than setting up some sort of transport. The read functions are slightly more complex: since the files are gzipped, a read needs to check whether the file exists locally, and if it doesn't, fetch it from the master.

file.ex - a module to read and save files remotely and locally

defmodule ElixirWebCrawler.File do
  def generate_path(url) do
    uri = URI.parse(to_string(url))
    # could have malicious urls with ../../../, but this is demo code
    base = Path.join("/home/erlang/dl", uri.host)
    path =
      cond do
        uri.path == nil ->
          Path.join(base, "index.html")
        String.ends_with?(uri.path, "/") ->
          Path.join([base, uri.path, "index.html"])
        true ->
          Path.join(base, uri.path)
      end
    path <> ".gz"
  end

  def save_remote(url, body) do
    ip = Confort.get(:master_ip)
    node_name = :"node@#{ip}"
    {:ok} = :rpc.call(node_name, ElixirWebCrawler.File, :save_file, [ url, body])
  end

  def save_file(url, body) do
    path = generate_path(url)
    File.mkdir_p(Path.dirname(path))
    {:ok, file} = File.open path, [:write]
    IO.binwrite file, :zlib.gzip(body)
    File.close(file)
    IO.puts "SAVED #{url} to #{path}"
    {:ok}
  end

  def read_remote(url) do
    ip = Confort.get(:master_ip)
    node_name = :"node@#{ip}"
    {:ok, body} = :rpc.call(node_name, ElixirWebCrawler.File, :read_file, [ url ])
    {:ok, body}
  end

  def read_file(url) do
    # callers match on {:ok, body}, so this will crash on {:error, _} -
    # make sure it's in a processing queue somewhere
    path = generate_path(url)
    case File.read(path) do
      {:ok, body} ->
        {:ok, :zlib.gunzip(body)}
      {:error, _reason} ->
        if Node.self() != :"node@#{Confort.get(:master_ip)}" do
          read_remote(url)
        else
          {:error, "File doesn't exist on master"}
        end
    end
  end
end
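For reference, here is how generate_path maps urls onto the local disk (a hypothetical iex session; a url with no path or a trailing slash gets index.html appended, and everything gets a .gz suffix):

iex> ElixirWebCrawler.File.generate_path("http://test.com")
"/home/erlang/dl/test.com/index.html.gz"
iex> ElixirWebCrawler.File.generate_path("http://test.com/a/b/")
"/home/erlang/dl/test.com/a/b/index.html.gz"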

Before testing this out, make sure the app has been started on the storage node as well, and that the ip in config/main.conf is that of the storage node.
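For example, the two nodes can be started with matching names and cookies. The exact node names and cookie below are assumptions; use whatever your setup expects:

# on the storage/master node (10.0.3.180 in this sketch)
iex --name node@10.0.3.180 --cookie secret -S mix

# on the worker node
iex --name node@10.0.3.179 --cookie secret -S mix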

Testing the local save:

iex(node@10.0.3.179)1> ElixirWebCrawler.File.save_file("http://test.com/local.html", "12345")
SAVED http://test.com/local.html to /home/erlang/dl/test.com/local.html.gz
{:ok}
iex(node@10.0.3.179)4> ElixirWebCrawler.File.read_file("http://test.com/local.html") 
{:ok, "12345"}

The remote save:

iex(node@10.0.3.179)2> ElixirWebCrawler.File.save_remote("http://test.com/remote.html", "saved on remote node")
SAVED http://test.com/remote.html to /home/erlang/dl/test.com/remote.html.gz
{:ok}
iex(node@10.0.3.179)3> ElixirWebCrawler.File.read_file("http://test.com/remote.html")                        
{:ok, "saved on remote node"}

And finally, a non-existent file:

iex(node@10.0.3.179)5> ElixirWebCrawler.File.read_file("http://test.com/nonexist.html")
** (MatchError) no match of right hand side value: {:error, "File doesn't exist on master"}
    (elixir_web_crawler) lib/elixir_web_crawler/file.ex:42: ElixirWebCrawler.File.read_remote/1
    (elixir_web_crawler) lib/elixir_web_crawler/file.ex:56: ElixirWebCrawler.File.read_file/1

This code is a little more complicated because we've allowed the file to be saved both locally and on the master node; if we only saved on the central node, it would be much shorter. Note that an erlang cluster assumes every connected node is fully trusted. Even so, the whole module is only 65 lines, and we've saved a great deal of time by not having to set up a file-transfer mechanism.

tl;dr - rpc allows us to save files on the remote node
