Exile is an alternative to [ports](https://hexdocs.pm/elixir/Port.html) for running external programs. It provides back-pressure and non-blocking IO, and tries to fix issues with ports.
Exile is built around the idea of demand-driven, asynchronous interaction with an external process. Think of streaming a video through `ffmpeg` to serve a web request. Internally, Exile uses a NIF; see [Rationale](#rationale) for details. It also provides a stream abstraction for interacting with an external program. For example, getting audio out of a stream is as simple as
See the `Exile.stream!/2` documentation for more details about handling stderr and other options.
`Exile.stream!/2` is a convenience wrapper around `Exile.Process`. Prefer using `Exile.stream!` over using `Exile.Process` directly.
Exile requires OTP v22.1 and above.
Exile is based on a NIF; please understand the consequences of that before using it. For basic use cases, use [ExCmd](https://github.com/akash-akya/ex_cmd) instead.
+ ```
+ iex> input_stream = Stream.repeatedly(fn -> "A" end)
+ iex> binary =
+ ...> Exile.stream!(~w(cat), input: input_stream, ignore_epipe: true) # we need to ignore epipe since we are terminating the program before the input completes
+ ...> |> Stream.take(2) # we must limit since the input stream is infinite
+ ...> |> Enum.into("")
+ iex> is_binary(binary)
+ true
+ iex> "AAAAA" <> _ = binary
+ ```
+
+ Run a command with input Collectable
+
+ ```
+ # Exile calls the callback with a sink where the process can push the data
+ iex> Exile.stream!(~w(cat), input: fn sink ->
+ ...> Stream.map(1..10, fn num -> "#{num} " end)
+ ...> |> Stream.into(sink) # push to the external process
+ ...> |> Stream.run()
+ ...> end)
+ ...> |> Stream.take(100)
+ ...> |> Enum.into("")
+ "1 2 3 4 5 6 7 8 9 10 "
+ ```
+
+ When the command waits for the input stream to close
+
+ ```
+ # the base64 command waits for the input to close, then writes all data to stdout at once
+ ```
+
+ For more details about the stream API, see `Exile.stream!/2`.
+
+ For more details about the inner workings, please check the `Exile.Process`
+ documentation.
+
+
## Rationale
### Existing approaches
#### Port
Port is the default way of executing external commands. This is okay when you have control over the external program's implementation and the interaction is minimal. Port has several important issues.
* it can end up creating [zombie processes](https://hexdocs.pm/elixir/Port.html#module-zombie-operating-system-processes)
* it cannot selectively close stdin. This is required when the external program acts on EOF from stdin
* it sends command output as messages to the BEAM process. This puts no back-pressure on the external program and can exhaust VM memory
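The back-pressure point comes down to a property of OS pipes: the kernel buffer is bounded (commonly 64 KiB on Linux), and once it is full a writer cannot push more until the reader drains it. Below is a minimal POSIX C sketch of that property, not Exile's code; `fill_pipe_until_full` is a hypothetical helper that makes the write end non-blocking so that a full pipe shows up as `EAGAIN` instead of suspending the writer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sketch, not Exile's implementation (hypothetical helper).
 * Writes into a pipe that nobody reads from, with a non-blocking write end.
 * Returns how many bytes the kernel accepted before write() failed with
 * EAGAIN/EWOULDBLOCK (the pipe is full), or -1 on any other error. */
long fill_pipe_until_full(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    /* Non-blocking, so a full pipe reports EAGAIN instead of suspending us. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[4096];
    memset(chunk, 'A', sizeof chunk);

    long total = 0;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;              /* still room in the pipe buffer */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            break;                   /* buffer full: the writer must wait for a reader */
        } else {
            total = -1;              /* unexpected error */
            break;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

A blocking writer, such as an external program writing to its stdout, would simply be suspended at that point. That suspension is the back-pressure that a port's message-based delivery does not provide, because the port keeps draining the pipe regardless of whether the BEAM process keeps up.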
#### Middleware based solutions
Libraries such as [Porcelain](https://github.com/alco/porcelain/), [Erlexec](https://github.com/saleyn/erlexec), [Rambo](https://github.com/jayjun/rambo), etc. solve the first two issues associated with ports: zombie processes and selectively closing STDIN. They do not solve the third issue, back-pressure. At a high level, these libraries work around port issues by spawning an external middleware program, which in turn spawns the program we want to run. Internally, they still use a port for reading the output and writing the input. Note that these libraries solve different subsets of issues and have different functionality; please check the relevant project page for details.
**Cons**
* no back-pressure
* an additional OS process (middleware) for every execution of your program
* in a few cases, such as Porcelain, the user has to install this middleware program explicitly
* might not be suitable when the program requires constant communication between the BEAM process and the external program
On the plus side, unlike Exile, bugs in the implementation do not bring down the whole BEAM VM.
#### [ExCmd](https://github.com/akash-akya/ex_cmd)
This is my other attempt at solving the back-pressure issue with external programs. It implements a demand-driven protocol using [odu](https://github.com/akash-akya/odu). Since ExCmd is also a port-based solution, the concerns mentioned above apply to ExCmd too.
## Exile
Internally, Exile uses non-blocking asynchronous system calls to interact with the external process. It does not use port's message-based communication; instead, it does raw stdio using a NIF. Most of the system calls are non-blocking, so it should not block the BEAM schedulers, and it makes use of dirty schedulers for IO.
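Readiness-based IO can be illustrated with plain `poll()`, which here stands in for the BEAM's `enif_select` that the NIF source calls; the substitution is an assumption made for illustration. The sketch below uses hypothetical helpers `wait_readable` and `readiness_demo` to show that an empty pipe is reported as "not readable" instead of blocking the caller, and that a `read()` with a 65535-byte buffer returns only the bytes currently available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Illustrative sketch, not Exile's implementation (hypothetical helpers).
 * Returns 1 if fd becomes readable within timeout_ms, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r == -1) return -1;
    return (r > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Before any data is written, poll() with a zero timeout reports "not readable".
 * After 5 bytes are written, a read() with a 65535-byte buffer returns just
 * those 5 bytes instead of waiting for a full buffer.
 * Returns the size of that read, or -1 on error. */
long readiness_demo(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    long result = -1;
    char buf[65535];

    if (wait_readable(fds[0], 0) != 0) goto out;     /* nothing to read yet */
    if (write(fds[1], "hello", 5) != 5) goto out;
    if (wait_readable(fds[0], 1000) != 1) goto out;  /* data arrived: readable */
    result = (long)read(fds[0], buf, sizeof buf);    /* short read: 5 bytes */

out:
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The short read is also why stream chunks can be smaller than the configured maximum chunk size: a reader gets whatever the pipe holds at that moment.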
**Highlights**
* Back pressure
* no middleware program
* no additional OS process, so no extra performance or resource cost
* no need to install any external command
* tries to handle zombie processes by attempting to clean up the external process. *But* as there is no middleware involved with Exile, it is still possible to end up with a zombie process if the program misbehaves.
* stream abstraction
* selectively consume stdout and stderr streams
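A zombie is a child that has exited but has not yet been reaped with `waitpid()`. One common cleanup pattern, sketched below in POSIX C, is to send `SIGTERM`, allow a grace period, escalate to `SIGKILL`, and always reap the child. The NIF source declares `SIGTERM` and `SIGKILL` atoms, but this sketch illustrates the general technique and is not necessarily Exile's exact strategy; `terminate_and_reap`, `spawn_sleeper` and the 10 ms polling interval are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Illustrative sketch, not Exile's implementation (hypothetical helpers).
 * Sends SIGTERM, waits up to grace_ms for the child to exit, escalates to
 * SIGKILL if it is still running, and always reaps it with waitpid() so no
 * zombie is left behind. Returns the raw wait status, or -1 on error. */
int terminate_and_reap(pid_t pid, int grace_ms) {
    if (pid <= 0) return -1;        /* kill() with 0 or -1 would hit a whole group */
    if (kill(pid, SIGTERM) == -1) return -1;

    int status;
    struct timespec tick = { .tv_sec = 0, .tv_nsec = 10L * 1000 * 1000 };  /* 10 ms */
    for (int waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return status;  /* exited and reaped */
        if (r == -1) return -1;
        nanosleep(&tick, NULL);
    }

    kill(pid, SIGKILL);               /* SIGKILL cannot be caught or ignored */
    if (waitpid(pid, &status, 0) == -1) return -1;
    return status;
}

/* Stand-in for a long-running command: a child that just waits for signals. */
pid_t spawn_sleeper(void) {
    pid_t pid = fork();
    if (pid == 0) {
        signal(SIGTERM, SIG_DFL);     /* make sure SIGTERM terminates it */
        for (;;) pause();
    }
    return pid;                       /* -1 if fork() failed */
}
```

Without the final `waitpid()`, the terminated child would remain in the process table as a zombie until its parent exits.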
If you are executing a huge number of external programs **concurrently** (more than a few hundred), you might have to increase the open file descriptors limit (`ulimit -n`), since each running command holds open pipe file descriptors.
Non-blocking IO can be used for other interesting things, such as reading named pipe (FIFO) files. `Exile.stream!(~w(cat data.pipe))` does not block schedulers, so you can open hundreds of FIFO files, unlike default `file`-based IO.
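The reason FIFOs need care: a plain blocking `open()` for reading suspends the caller until some process opens the other end for writing. The POSIX C sketch below shows the non-blocking alternative with a hypothetical helper that creates a temporary `data.pipe`. It is not how Exile handles FIFOs (Exile runs `cat` and reads its stdout, as in the example above); it only illustrates the blocking behaviour that non-blocking IO avoids.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative sketch, not how Exile reads FIFOs (hypothetical helper).
 * Creates data.pipe in a fresh temporary directory and opens it for reading
 * with O_NONBLOCK. The open() returns immediately even though no writer
 * exists (a blocking open() would wait for one), and read() then returns 0,
 * i.e. end-of-file, because nothing has the FIFO open for writing.
 * Returns that read() result, or -1 on error. */
long open_fifo_without_writer(void) {
    char dir[] = "/tmp/fifo-demo-XXXXXX";
    if (mkdtemp(dir) == NULL) return -1;

    char path[64];
    snprintf(path, sizeof path, "%s/data.pipe", dir);

    long result = -1;
    if (mkfifo(path, 0600) == 0) {
        int fd = open(path, O_RDONLY | O_NONBLOCK);
        if (fd != -1) {
            char buf[16];
            result = (long)read(fd, buf, sizeof buf);
            close(fd);
        }
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```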
#### TODO
* add benchmarks results
### 🚨 Obligatory NIF warning
As with any NIF-based solution, bugs or issues in the Exile implementation **can bring down the BEAM VM**. But the NIF implementation is comparatively small and mostly uses POSIX system calls. Also, spawned external processes are still completely isolated at the OS level.
If all you want is to run a command with no communication, then sticking with `System.cmd` is a better option.
### License
Copyright (c) 2020 Akash Hiremath.
Exile source code is released under Apache License 2.0. Check [LICENSE](LICENSE.md) for more information.
diff --git a/c_src/exile.c b/c_src/exile.c
index ed493e3..5e1645c 100644
--- a/c_src/exile.c
+++ b/c_src/exile.c
@@ -1,351 +1,361 @@
#include "erl_nif.h"
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include "utils.h"
#ifdef ERTS_DIRTY_SCHEDULERS
#define USE_DIRTY_IO ERL_NIF_DIRTY_JOB_IO_BOUND
#else
#define USE_DIRTY_IO 0
#endif
static const int UNBUFFERED_READ = -1;
static const int PIPE_BUF_SIZE = 65535;
static const int FD_CLOSED = -1;
static ERL_NIF_TERM ATOM_TRUE;
static ERL_NIF_TERM ATOM_FALSE;
static ERL_NIF_TERM ATOM_OK;
static ERL_NIF_TERM ATOM_ERROR;
static ERL_NIF_TERM ATOM_UNDEFINED;
static ERL_NIF_TERM ATOM_INVALID_FD;
static ERL_NIF_TERM ATOM_SELECT_CANCEL_ERROR;
static ERL_NIF_TERM ATOM_EAGAIN;
+static ERL_NIF_TERM ATOM_EPIPE;
-static ERL_NIF_TERM ATOM_SIGKILL;
static ERL_NIF_TERM ATOM_SIGTERM;
+static ERL_NIF_TERM ATOM_SIGKILL;
+static ERL_NIF_TERM ATOM_SIGPIPE;
static void close_fd(int *fd) {
if (*fd != FD_CLOSED) {
close(*fd);
*fd = FD_CLOSED;
}
}
static int cancel_select(ErlNifEnv *env, int *fd) {
int ret;
if (*fd != FD_CLOSED) {
ret = enif_select(env, *fd, ERL_NIF_SELECT_STOP, fd, NULL, ATOM_UNDEFINED);
- If the input in a function with arity 1, Exile will call that function with a `Collectable` as the argument. The function must *push* input to this collectable. Return value of the function is ignored.
+ If the input is a function with arity 1, Exile will call that function
+ with a `Collectable` as the argument. The function must *push* input to this
+ collectable. Return value of the function is ignored.
- By defaults no input will be given to the command
+ By default, no input is sent to the command.
- * `exit_timeout` - Duration to wait for external program to exit after completion before raising an error. Defaults to `:infinity`
+ * `exit_timeout` - Duration to wait for external program to exit after completion
+ (when stream ends). Defaults to `:infinity`
- * `max_chunk_size` - Maximum size of each iodata chunk emitted by stream. Chunk size will be variable depending on the amount of data available at that time. Defaults to 65535
+ * `max_chunk_size` - Maximum size of each iodata chunk emitted by the stream.
+ Chunk size can be less than the `max_chunk_size` depending on the amount of
+ data available to be read. Defaults to `65_535`
- * `use_stderr` - When set to true, stream will contain stderr output along with stdout output. Element of the stream will be of the form `{:stdout, iodata}` or `{:stderr, iodata}` to differentiate different streams. Defaults to false. See example below
+ * `enable_stderr` - When set to true, output stream will contain stderr data along
+ with stdout. Stream data will be of the form `{:stdout, iodata}` or `{:stderr, iodata}`
+ to differentiate different streams. Defaults to false. See example below
+ * `ignore_epipe` - When set to true, the reader can stop consuming output early
+ without an error being raised. Normally the writer gets an `EPIPE` error on write
+ when the program terminates prematurely. With `ignore_epipe` set to true, this error
+ is ignored, which matches the default UNIX shell behaviour. `EPIPE` is the error raised
+ when writing to a pipe whose read end is closed, e.g. when the reader finishes reading
+ and closes the output pipe before the command completes. Defaults to `false`.
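At the OS level, `EPIPE` is what `write()` reports when every read end of the pipe has been closed, provided `SIGPIPE` is ignored; with the default disposition the writing process is killed by `SIGPIPE` instead (the NIF source declares both `EPIPE` and `SIGPIPE` atoms). The POSIX C sketch below uses a hypothetical helper to show the error; how Exile maps it to `ignore_epipe` is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sketch, not Exile's implementation (hypothetical helper).
 * Writes to a pipe whose read end is already closed and returns the errno
 * observed (EPIPE is expected), 0 if the write somehow succeeded, or -1 if
 * setup failed. SIGPIPE is ignored for the duration of the call; with the
 * default disposition the process would be killed by SIGPIPE instead. */
int write_after_reader_closed(void) {
    struct sigaction ignore, previous;
    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    if (sigaction(SIGPIPE, &ignore, &previous) == -1) return -1;

    int err = -1;
    int fds[2];
    if (pipe(fds) == 0) {
        close(fds[0]);                                /* the reader goes away early */
        err = (write(fds[1], "data", 4) == -1) ? errno : 0;
        close(fds[1]);
    }

    sigaction(SIGPIPE, &previous, NULL);              /* restore caller's disposition */
    return err;
}
```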
- All other options are passed to `Exile.Process.start_link/2`
+ Remaining options are passed to `Exile.Process.start_link/2`
- `Exile.stream!/1` should be preferred over using this. Use this only if you need more control over the life-cycle of IO streams and OS process.
+ Prefer `Exile.stream!/1` over using this. Use this only if you are
+ familiar with the life-cycle and need more control over the IO streams
+ and OS process.
## Comparison with Port
- * it is demand driven. User explicitly has to `read` the command output, and the progress of the external command is controlled using OS pipes. Exile never load more output than we can consume, so we should never experience memory issues
+ * it is demand driven. User explicitly has to `read` the command
+ output, and the progress of the external command is controlled
+ using OS pipes. Exile never loads more output than we can consume,
+ so we should never experience memory issues
+
* it can close stdin while consuming output
- * tries to handle zombie process by attempting to cleanup external process. Note that there is no middleware involved with exile so it is still possible to endup with zombie process.
- * selectively consume stdout and stderr streams
- Internally Exile uses non-blocking asynchronous system calls to interact with the external process. It does not use port's message based communication, instead uses raw stdio and NIF. Uses asynchronous system calls for IO. Most of the system calls are non-blocking, so it should not block the beam schedulers. Make use of dirty-schedulers for IO
+ * tries to handle zombie processes by attempting to clean up the
+ external process. Note that there is no middleware involved
+ with Exile, so it is still possible to end up with a zombie process.
+
+ * selectively consume stdout and stderr
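Closing stdin matters because a reader only sees EOF on a pipe once every write end is closed: after that, `read()` returns 0. Commands that act on EOF wait for exactly this before finishing. The POSIX C sketch below is a hypothetical helper, not Exile's code, in which one process plays both the writer and the reader.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Illustrative sketch, not Exile's implementation (hypothetical helper).
 * Plays both sides of a command's stdin: writes some input, closes the write
 * end ("close stdin"), then reads until read() returns 0, i.e. EOF.
 * Returns the number of bytes read before EOF, or -1 on error. */
long read_until_eof_after_close(void) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    if (write(fds[1], "abc", 3) != 3) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                 /* no writer left: the reader will see EOF */

    char buf[64];
    long total = 0;
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;

    close(fds[0]);
    return n == 0 ? total : -1;    /* n == 0 means EOF, n == -1 means error */
}
```

If another process still held a copy of the write end (for example, one inherited across `fork()` and never closed), the reader would never see EOF, which is why unused pipe ends are usually closed after forking.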
+
+ Internally, Exile uses non-blocking asynchronous system calls
+ to interact with the external process. It does not use port's
+ message-based communication; instead, it uses raw stdio and a NIF.
+ Most of the system calls are non-blocking, so it should not
+ block the BEAM schedulers. It makes use of dirty schedulers for IO.
+
+ ## Introduction
+
+ `Exile.Process` is a process based wrapper around the external
+ process. It is similar to `port` as an entity but the interface is
+ different. All communication with the external process must happen
+ via the `Exile.Process` interface.
+
+ The Exile process life-cycle is tied to the external process and its owner.
+ All system resources, such as open file descriptors and the external
+ process, are cleaned up when the `Exile.Process` dies.
+
+ ### Owner
+
+ Each `Exile.Process` has an owner: the process which created it
+ (via `Exile.Process.start_link/2`). The process owner cannot
+ be changed.
+
+ The owner process is linked to the `Exile.Process`, so when the
+ Exile process dies abnormally the owner is killed too, and
+ vice versa. The owner process should avoid trapping exits; if
+ you want to avoid the caller getting killed, create a separate process
+ as the owner to run the command, and monitor that process.
+
+ Only the owner can get the exit status of the command, using
+ `Exile.Process.await_exit/2`. All Exile processes **MUST** be
+ awaited. The exit status or reason is **ALWAYS** sent to the owner. This
+ is similar to [`Task`](https://hexdocs.pm/elixir/Task.html). If the
+ owner exits without calling `await_exit`, the Exile process will be killed,
+ but if the owner continues without calling `await_exit`, the Exile
+ process will linger around until the owner exits.
+ iex> :timer.sleep(500) # wait for the reader and writer to change pipe owner, otherwise `await_exit` will close the pipes before we change pipe owner
+ iex> Process.await_exit(p, :infinity) # let the reader and writer take indefinite time to finish
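`await_exit/2` reports how the command ended. At the OS level that information comes from `waitpid()`, whose status encodes either a normal exit code or the signal that terminated the process. The POSIX C sketch below decodes it using the common shell convention of `128 + signal number` for signal deaths; that convention and the helper names are assumptions for illustration, and Exile may represent signal deaths differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative sketch (hypothetical helpers). Decodes a raw waitpid() status
 * using the shell convention: the exit code for a normal exit, 128 + signal
 * number if the process was killed by a signal, and -1 otherwise. */
int decode_wait_status(int status) {
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return -1;
}

/* Forks a child that exits with `code`, waits for it, and decodes its status. */
int run_child_exiting_with(int code) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) _exit(code);        /* child */

    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return decode_wait_status(status);
}
```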
Starts external program using `cmd_with_args` with options `opts`
- `cmd_with_args` must be a list containing command with arguments. example: `["cat", "file.txt"]`.
+ `cmd_with_args` must be a list containing the command followed by its
+ arguments, for example: `["cat", "file.txt"]`.
### Options
+
* `cd` - the directory to run the command in
- * `env` - a list of tuples containing environment key-value. These can be accessed in the external program
- * `use_stderr` - when set to true, exile connects stderr stream for the consumption. Defaults to false. Note that when set to true stderr must be consumed to avoid external program from blocking
+
+ * `env` - a list of `{key, value}` tuples of environment variables.
+ These can be accessed in the external program
+
+ * `enable_stderr` - when set to true, Exile connects the stderr
+ pipe for consumption. Defaults to false. Note that when set
+ to true, stderr must be consumed to prevent the external program from blocking.
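At the OS level, options like `cd` and `env` typically become a `chdir()` in the forked child followed by an `execve()` with an explicit environment array. The POSIX C sketch below is illustrative only: `run_in_dir_with_env` is a hypothetical helper that runs `/bin/sh -c` and passes *only* the given variables. Whether Exile's `env` option replaces or extends the inherited environment is not stated here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative sketch, not Exile's implementation (hypothetical helper).
 * Runs `/bin/sh -c script` in directory `dir` with exactly the environment in
 * `envp` (a NULL-terminated array of "KEY=VALUE" strings).
 * Returns the child's exit code, or -1 on error. */
int run_in_dir_with_env(const char *dir, char *const envp[], const char *script) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        if (chdir(dir) == -1) _exit(126);            /* the `cd` step */
        char *const argv[] = { "sh", "-c", (char *)script, NULL };
        execve("/bin/sh", argv, envp);               /* the `env` step */
        _exit(127);                                  /* only reached if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Doing `chdir()` in the child after `fork()` keeps the parent's working directory untouched, which matters for a long-lived VM.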
+
+ The caller becomes the owner of the Exile process
+ and the default owner of all opened pipes.
+
+ Please check the module documentation for more details.
"""
- @type process :: pid
@spec start_link(nonempty_list(String.t()),
cd: String.t(),
env: [{String.t(), String.t()}],
- use_stderr: boolean()
- ) :: {:ok, process} | {:error, any()}
+ enable_stderr: boolean()
+ ) :: {:ok, t} | {:error, any()}
def start_link(cmd_with_args, opts \\ []) do
opts = Keyword.merge(@default_opts, opts)
- with {:ok, args} <- normalize_args(cmd_with_args, opts) do
- GenServer.start(__MODULE__, args)
+ case Exec.normalize_exec_args(cmd_with_args, opts) do