diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index 938a33a..a0a2b81 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -1,31 +1,32 @@
image: elixir:1.7
variables:
MIX_ENV: test
+ GIT_SUBMODULE_STRATEGY: recursive
cache:
key: ${CI_COMMIT_REF_SLUG}
paths:
- deps
- _build
stages:
- test
- publish
before_script:
- mix local.hex --force
- mix local.rebar --force
- mix deps.get
- mix compile --force
lint:
stage: test
script:
- mix format --check-formatted
unit-testing:
stage: test
coverage: '/(\d+\.\d+\%) \| Total/'
script:
- mix test --trace --preload-modules --cover
diff --git a/lib/myhtmlex.ex b/lib/myhtmlex.ex
index f5e76e7..d612716 100644
--- a/lib/myhtmlex.ex
+++ b/lib/myhtmlex.ex
@@ -1,176 +1,176 @@
defmodule Myhtmlex do
@moduledoc """
A module to decode html into a tree structure.
Based on [Alexander Borisov's myhtml](https://github.com/lexborisov/myhtml),
this binding gains the properties of being html-spec compliant and very fast.
## Example
iex> Myhtmlex.decode("<h1>Hello world</h1>")
{"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}
Benchmark results (Nif calling mode) on various file sizes on a 2.5 GHz Core i7:
Settings:
duration: 1.0 s
## FileSizesBench
[15:28:42] 1/3: github_trending_js.html 341k
[15:28:46] 2/3: w3c_html5.html 131k
[15:28:48] 3/3: wikipedia_hyperlink.html 97k
Finished in 7.52 seconds
## FileSizesBench
benchmark name iterations average time
wikipedia_hyperlink.html 97k 1000 1385.86 µs/op
w3c_html5.html 131k 1000 2179.30 µs/op
github_trending_js.html 341k 500 5686.21 µs/op
## Configuration
The module you are calling into is always `Myhtmlex` and depending on your application configuration,
it chooses between the underlying implementations `Myhtmlex.Safe` (default) and `Myhtmlex.Nif`.
Erlang interoperability is a tricky minefield.
You can call into C directly using natively implemented functions (NIFs), but this comes with the risk
that if anything goes wrong within the C implementation, your whole VM will crash.
No more supervisor cushions from here on, just violent crashes.
That is why the default mode of operation keeps your VM safe and happy.
If you need ultimate parsing speed, or you can simply tolerate VM-level crashes, read on.
### Call into C-Node (default)
This is the default mode of operation.
If your application cannot tolerate VM-level crashes, this option allows you to gain the best of both worlds.
The added overhead is client/server communications, and a worker OS-process that runs next to your VM under VM supervision.
You do not have to do anything to start the worker process, everything is taken care of within the library.
If you are not running in distributed mode, your VM will automatically be assigned a `sname`.
The worker OS-process stays alive as long as it is under VM-supervision. If your VM goes down, the OS-process will die by itself.
If the worker OS-process dies for some reason, your VM stays unaffected and will attempt to restart it seamlessly.
### Call into Nif
If your application is aiming for ultimate parsing speed, and in the worst case can tolerate VM-level crashes, you can call directly into the Nif.
1. Require myhtmlex without runtime
in your `mix.exs`
def deps do
[
{:myhtmlex, ">= 0.0.0", runtime: false}
]
end
2. Configure the mode to `Myhtmlex.Nif`
e.g. in `config/config.exs`
config :myhtmlex, mode: Myhtmlex.Nif
3. Bonus: You can [open up in-memory references to parsed trees](https://hexdocs.pm/myhtmlex/Myhtmlex.html#open/1), without parsing + mapping erlang terms in one go
"""
- @type tag() :: String.t | atom()
- @type attr() :: {String.t, String.t}
+ @type tag() :: String.t() | atom()
+ @type attr() :: {String.t(), String.t()}
@type attr_list() :: [] | [attr()]
- @type comment_node() :: {:comment, String.t}
- @type comment_node3() :: {:comment, [], String.t}
- @type tree() :: {tag(), attr_list(), tree()}
- | {tag(), attr_list(), nil}
- | comment_node()
- | comment_node3()
+ @type comment_node() :: {:comment, String.t()}
+ @type comment_node3() :: {:comment, [], String.t()}
+ @type tree() ::
+ {tag(), attr_list(), tree()}
+ | {tag(), attr_list(), nil}
+ | comment_node()
+ | comment_node3()
@type format_flag() :: :html_atoms | :nil_self_closing | :comment_tuple3
defp module() do
Application.get_env(:myhtmlex, :mode, Myhtmlex.Nif)
end
@doc """
Returns a tree representation from the given html string.
## Examples
iex> Myhtmlex.decode("<h1>Hello world</h1>")
{"html", [], [{"head", [], []}, {"body", [], [{"h1", [], ["Hello world"]}]}]}
iex> Myhtmlex.decode("<span class='hello'>Hi there</span>")
{"html", [],
[{"head", [], []},
{"body", [], [{"span", [{"class", "hello"}], ["Hi there"]}]}]}
iex> Myhtmlex.decode("<body><!-- a comment --!></body>")
{"html", [], [{"head", [], []}, {"body", [], [comment: " a comment "]}]}
iex> Myhtmlex.decode("<br>")
{"html", [], [{"head", [], []}, {"body", [], [{"br", [], []}]}]}
"""
- @spec decode(String.t) :: tree()
+ @spec decode(String.t()) :: tree()
def decode(bin) do
decode(bin, format: [])
end
@doc """
Returns a tree representation from the given html string.
This variant allows you to pass in one or more of the following format flags:
* `:html_atoms` uses atoms for known html tags (faster), binaries for everything else.
* `:nil_self_closing` uses `nil` to designate self-closing tags and void elements.
For example `<br>` is then represented as `{"br", [], nil}`.
See http://w3c.github.io/html-reference/syntax.html#void-elements for a full list of void elements.
* `:comment_tuple3` uses 3-tuple elements for comments, instead of the default 2-tuple element.
## Examples
iex> Myhtmlex.decode("<h1>Hello world</h1>", format: [:html_atoms])
{:html, [], [{:head, [], []}, {:body, [], [{:h1, [], ["Hello world"]}]}]}
iex> Myhtmlex.decode("<br>", format: [:nil_self_closing])
{"html", [], [{"head", [], []}, {"body", [], [{"br", [], nil}]}]}
iex> Myhtmlex.decode("<body><!-- a comment --!></body>", format: [:comment_tuple3])
{"html", [], [{"head", [], []}, {"body", [], [{:comment, [], " a comment "}]}]}
iex> html = "<body><!-- a comment --!><unknown /></body>"
iex> Myhtmlex.decode(html, format: [:html_atoms, :nil_self_closing, :comment_tuple3])
{:html, [],
[{:head, [], []},
{:body, [], [{:comment, [], " a comment "}, {"unknown", [], nil}]}]}
"""
- @spec decode(String.t, format: [format_flag()]) :: tree()
+ @spec decode(String.t(), format: [format_flag()]) :: tree()
def decode(bin, format: flags) do
module().decode(bin, flags)
end
@doc """
Returns a reference to an internally parsed myhtml_tree_t. (Nif only!)
"""
- @spec open(String.t) :: reference()
+ @spec open(String.t()) :: reference()
def open(bin) do
Myhtmlex.Nif.open(bin)
end
@doc """
Returns a tree representation from the given reference. See `decode/1` for example output. (Nif only!)
"""
@spec decode_tree(reference()) :: tree()
def decode_tree(ref) do
Myhtmlex.Nif.decode_tree(ref)
end
@doc """
Returns a tree representation from the given reference. See `decode/2` for options and example output. (Nif only!)
"""
@spec decode_tree(reference(), format: [format_flag()]) :: tree()
def decode_tree(ref, format: flags) do
Myhtmlex.Nif.decode_tree(ref, flags)
end
end
-
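The tree shapes documented in `Myhtmlex.decode/2` (elements as `{tag, attrs, children}`, text nodes as binaries, comments as `{:comment, text}` or, with `:comment_tuple3`, `{:comment, [], text}`, and `nil` children for void elements under `:nil_self_closing`) can be consumed with ordinary pattern matching. A minimal sketch follows; `TreeText` is a hypothetical helper, not part of Myhtmlex, and it follows the doctest output, where the third element of an element tuple is a list of child nodes.

```elixir
defmodule TreeText do
  @moduledoc """
  Hypothetical helper (not part of Myhtmlex): collects the text nodes of a
  tree shaped like the output of `Myhtmlex.decode/1,2`, in document order.
  """

  # Text nodes are plain binaries.
  def text(node) when is_binary(node), do: [node]

  # Comments, in the default 2-tuple form or the :comment_tuple3 form,
  # contribute no visible text.
  def text({:comment, _text}), do: []
  def text({:comment, [], _text}), do: []

  # Void elements decoded with :nil_self_closing carry nil instead of children.
  def text({_tag, _attrs, nil}), do: []

  # Regular elements: recurse into the child list.
  def text({_tag, _attrs, children}) when is_list(children) do
    Enum.flat_map(children, &text/1)
  end
end
```

Applied to the doctest output of `Myhtmlex.decode("<h1>Hello world</h1>")`, `TreeText.text/1` returns `["Hello world"]`.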
diff --git a/lib/myhtmlex/nif.ex b/lib/myhtmlex/nif.ex
index da88abd..f698026 100644
--- a/lib/myhtmlex/nif.ex
+++ b/lib/myhtmlex/nif.ex
@@ -1,27 +1,26 @@
defmodule Myhtmlex.Nif do
@moduledoc false
- @on_load { :init, 0 }
+ @on_load {:init, 0}
- app = Mix.Project.config[:app]
+ app = Mix.Project.config()[:app]
def init do
path = :filename.join(:code.priv_dir(unquote(app)), 'myhtmlex')
:ok = :erlang.load_nif(path, 0)
end
def decode(bin)
def decode(_), do: exit(:nif_library_not_loaded)
def decode(bin, flags)
def decode(_, _), do: exit(:nif_library_not_loaded)
def open(bin)
def open(_), do: exit(:nif_library_not_loaded)
def decode_tree(tree)
def decode_tree(_), do: exit(:nif_library_not_loaded)
def decode_tree(tree, flags)
def decode_tree(_, _), do: exit(:nif_library_not_loaded)
end
-
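Two idioms in `Myhtmlex.Nif` may be unfamiliar. First, `app = Mix.Project.config()[:app]` runs once at compile time in the module body, and `unquote(app)` inside `def init` splices that value into the function (an "unquote fragment"). Second, each NIF-backed function is declared with an Elixir fallback clause that `:erlang.load_nif/2` replaces with the native implementation once the shared library loads. The hypothetical `StubSketch` module below reproduces both patterns in isolation; it hard-codes the app name instead of reading it from Mix and loads no NIF, so only the fallback ever runs.

```elixir
defmodule StubSketch do
  # Module-body code is evaluated once, at compile time. Myhtmlex.Nif reads
  # Mix.Project.config()[:app] here; this sketch hard-codes the value.
  app = :myhtmlex

  # Unquote fragment: the compile-time value of `app` is injected into
  # the function body.
  def app_name, do: unquote(app)

  # Bodiless head naming the argument, then the fallback clause.
  # No NIF is loaded in this sketch, so calling decode/1 always exits.
  def decode(bin)
  def decode(_), do: exit(:nif_library_not_loaded)
end
```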
diff --git a/lib/myhtmlex/safe.ex b/lib/myhtmlex/safe.ex
index 062aeb4..98b98d5 100644
--- a/lib/myhtmlex/safe.ex
+++ b/lib/myhtmlex/safe.ex
@@ -1,35 +1,39 @@
defmodule Myhtmlex.Safe do
@moduledoc false
use Application
- app = Mix.Project.config[:app]
-
+ app = Mix.Project.config()[:app]
defp random_sname, do: :crypto.strong_rand_bytes(4) |> Base.encode16(case: :lower)
defp sname, do: :"myhtmlex_#{random_sname()}"
def start(_type, _args) do
import Supervisor.Spec
- unless Node.alive? do
- Nodex.Distributed.up
+
+ unless Node.alive?() do
+ Nodex.Distributed.up()
end
+
myhtml_worker = Path.join(:code.priv_dir(unquote(app)), "myhtml_worker")
+
children = [
- worker(Nodex.Cnode, [%{exec_path: myhtml_worker, sname: sname()}, [name: Myhtmlex.Safe.Cnode]])
+ worker(Nodex.Cnode, [
+ %{exec_path: myhtml_worker, sname: sname()},
+ [name: Myhtmlex.Safe.Cnode]
+ ])
]
+
Supervisor.start_link(children, strategy: :one_for_one, name: Myhtmlex.Safe.Supervisor)
end
def decode(bin) do
decode(bin, [])
end
def decode(bin, flags) do
{:ok, res} = Nodex.Cnode.call(Myhtmlex.Safe.Cnode, {:decode, bin, flags})
res
end
-
end
-
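`Myhtmlex.Safe.start/2` builds its child list with `Supervisor.Spec.worker/2`, which has been deprecated in favour of map-based child specifications. A minimal sketch of the equivalent map, assuming `worker/2`'s defaults (id equal to the module, `start_link` as the start function, `:permanent` restart, 5000 ms shutdown); `ChildSpecSketch` is a hypothetical helper:

```elixir
defmodule ChildSpecSketch do
  # Hypothetical helper: the map-based equivalent of
  # Supervisor.Spec.worker(module, args) with worker/2's default options.
  def worker_spec(module, args) do
    %{
      id: module,
      # Supervisor calls apply(module, :start_link, args)
      start: {module, :start_link, args},
      restart: :permanent,
      shutdown: 5000,
      type: :worker
    }
  end
end
```

Inside `start/2`, `ChildSpecSketch.worker_spec(Nodex.Cnode, [%{exec_path: myhtml_worker, sname: sname()}, [name: Myhtmlex.Safe.Cnode]])` could then replace the `worker/2` call, leaving the `Supervisor.start_link/2` call unchanged.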
diff --git a/mix.exs b/mix.exs
index 11e8025..cbd2595 100644
--- a/mix.exs
+++ b/mix.exs
@@ -1,106 +1,110 @@
defmodule Myhtmlex.Mixfile do
use Mix.Project
def project do
[
app: :myhtmlex,
version: "0.2.1",
elixir: "~> 1.5",
deps: deps(),
package: package(),
- compilers: [:myhtmlex_make] ++ Mix.compilers,
- build_embedded: Mix.env == :prod,
- start_permanent: Mix.env == :prod,
+ compilers: [:myhtmlex_make] ++ Mix.compilers(),
+ build_embedded: Mix.env() == :prod,
+ start_permanent: Mix.env() == :prod,
name: "Myhtmlex",
description: """
A module to decode HTML into a tree,
porting all properties of the underlying
library myhtml, being fast and correct
in regards to the html spec.
""",
docs: docs()
]
end
def package do
[
maintainers: ["Lukas Rieder"],
licenses: ["GNU LGPL"],
links: %{
"Github" => "https://github.com/Overbryd/myhtmlex",
"Issues" => "https://github.com/Overbryd/myhtmlex/issues",
"MyHTML" => "https://github.com/lexborisov/myhtml"
},
files: [
"lib",
"c_src",
"priv/.gitignore",
"test",
"Makefile",
"Makefile.Darwin",
"Makefile.Linux",
"mix.exs",
"README.md",
"LICENSE"
]
]
end
def application do
[
extra_applications: [:logger],
mod: {Myhtmlex.Safe, []},
# used to detect conflicts with other applications named processes
registered: [Myhtmlex.Safe.Cnode, Myhtmlex.Safe.Supervisor],
env: [
mode: Myhtmlex.Safe
]
]
end
defp deps do
[
# documentation helpers
{:ex_doc, ">= 0.0.0", only: :dev},
# benchmarking helpers
{:benchfella, "~> 0.3.0", only: :dev},
# cnode helpers
- {:nodex, git: "https://github.com/rinpatch/nodex", ref: "12ca7a2c5b5791f1e847d73ed646cf006d4c8ca8"}
+ {:nodex,
+ git: "https://github.com/rinpatch/nodex", ref: "12ca7a2c5b5791f1e847d73ed646cf006d4c8ca8"}
]
end
defp docs do
[
main: "Myhtmlex"
]
end
end
defmodule Mix.Tasks.Compile.MyhtmlexMake do
@artifacts [
"priv/myhtmlex.so",
"priv/myhtml_worker"
]
def run(_) do
- if match? {:win32, _}, :os.type do
- IO.warn "Windows is not yet a target."
+ if match?({:win32, _}, :os.type()) do
+ IO.warn("Windows is not yet a target.")
exit(1)
else
- {result, _error_code} = System.cmd("make",
- @artifacts,
- stderr_to_stdout: true,
- env: [{"MIX_ENV", to_string(Mix.env)}]
- )
- IO.binwrite result
+ {result, _error_code} =
+ System.cmd(
+ "make",
+ @artifacts,
+ stderr_to_stdout: true,
+ env: [{"MIX_ENV", to_string(Mix.env())}]
+ )
+
+ IO.binwrite(result)
end
+
:ok
end
def clean() do
{result, _error_code} = System.cmd("make", ["clean"], stderr_to_stdout: true)
- Mix.shell.info result
+ Mix.shell().info(result)
:ok
end
end
-
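`Mix.Tasks.Compile.MyhtmlexMake.run/1` binds the exit status of `make` to `_error_code` and returns `:ok` regardless, so a failed native build does not stop compilation. A minimal sketch of surfacing the status instead, using only `System.cmd/3`; `MakeSketch` is a hypothetical helper:

```elixir
defmodule MakeSketch do
  # Hypothetical helper: runs an external command, merging stderr into the
  # captured output, and reports a non-zero exit status instead of
  # discarding it.
  def run(cmd, args, opts \\ []) do
    {output, status} = System.cmd(cmd, args, [stderr_to_stdout: true] ++ opts)

    case status do
      0 -> {:ok, output}
      code -> {:error, code, output}
    end
  end
end
```

On the error branch, a compiler task could call `Mix.raise/1` with the captured output so that `mix compile` fails loudly.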
diff --git a/test/myhtmlex.nif_test.exs b/test/myhtmlex.nif_test.exs
index 5c77a0e..c93126d 100644
--- a/test/myhtmlex.nif_test.exs
+++ b/test/myhtmlex.nif_test.exs
@@ -1,26 +1,29 @@
defmodule Myhtmlex.NifTest do
use MyhtmlexSharedTests, module: Myhtmlex.Nif
test "parse a larger file (341K)" do
html = File.read!("bench/github_trending_js.html")
ref = Myhtmlex.open(html)
assert is_reference(ref)
assert is_tuple(Myhtmlex.decode_tree(ref))
end
test "open" do
ref = Myhtmlex.open(~s'<dif class="a"></div><div class="b"></div>')
assert is_reference(ref)
end
test "open and decode_tree" do
ref = Myhtmlex.open(~s'text node')
assert is_reference(ref)
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- "text node"
- ]}
- ]} = Myhtmlex.decode_tree(ref, format: [:html_atoms])
+
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ "text node"
+ ]}
+ ]} = Myhtmlex.decode_tree(ref, format: [:html_atoms])
end
end
diff --git a/test/myhtmlex.safe_test.exs b/test/myhtmlex.safe_test.exs
index 0b60e53..0054815 100644
--- a/test/myhtmlex.safe_test.exs
+++ b/test/myhtmlex.safe_test.exs
@@ -1,8 +1,7 @@
defmodule Myhtmlex.SafeTest do
use MyhtmlexSharedTests, module: Myhtmlex.Safe
test "doesn't segfault when <!----> is encountered" do
assert {"html", _attrs, _children} = Myhtmlex.decode("<div> <!----> </div>")
end
end
-
diff --git a/test/myhtmlex_shared_tests.ex b/test/myhtmlex_shared_tests.ex
index 83b83f2..7ba8d79 100644
--- a/test/myhtmlex_shared_tests.ex
+++ b/test/myhtmlex_shared_tests.ex
@@ -1,119 +1,152 @@
defmodule MyhtmlexSharedTests do
defmacro __using__(opts) do
module = Keyword.fetch!(opts, :module)
+
quote do
use ExUnit.Case
doctest Myhtmlex
setup_all(_) do
Application.put_env(:myhtmlex, :mode, unquote(module))
:ok
end
test "builds a tree, formatted like mochiweb by default" do
- assert {"html", [], [
- {"head", [], []},
- {"body", [], [
- {"br", [], []}
- ]}
- ]} = Myhtmlex.decode("<br>")
+ assert {"html", [],
+ [
+ {"head", [], []},
+ {"body", [],
+ [
+ {"br", [], []}
+ ]}
+ ]} = Myhtmlex.decode("<br>")
end
test "builds a tree, html tags as atoms" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {:br, [], []}
- ]}
- ]} = Myhtmlex.decode("<br>", format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {:br, [], []}
+ ]}
+ ]} = Myhtmlex.decode("<br>", format: [:html_atoms])
end
test "builds a tree, nil self closing" do
- assert {"html", [], [
- {"head", [], []},
- {"body", [], [
- {"br", [], nil},
- {"esi:include", [], nil}
- ]}
- ]} = Myhtmlex.decode("<br><esi:include />", format: [:nil_self_closing])
+ assert {"html", [],
+ [
+ {"head", [], []},
+ {"body", [],
+ [
+ {"br", [], nil},
+ {"esi:include", [], nil}
+ ]}
+ ]} = Myhtmlex.decode("<br><esi:include />", format: [:nil_self_closing])
end
test "builds a tree, multiple format options" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {:br, [], nil}
- ]}
- ]} = Myhtmlex.decode("<br>", format: [:html_atoms, :nil_self_closing])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {:br, [], nil}
+ ]}
+ ]} = Myhtmlex.decode("<br>", format: [:html_atoms, :nil_self_closing])
end
test "attributes" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {:span, [{"id", "test"}, {"class", "foo garble"}], []}
- ]}
- ]} = Myhtmlex.decode(~s'<span id="test" class="foo garble"></span>', format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {:span, [{"id", "test"}, {"class", "foo garble"}], []}
+ ]}
+ ]} =
+ Myhtmlex.decode(~s'<span id="test" class="foo garble"></span>',
+ format: [:html_atoms]
+ )
end
test "single attributes" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {:button, [{"disabled", "disabled"}, {"class", "foo garble"}], []}
- ]}
- ]} = Myhtmlex.decode(~s'<button disabled class="foo garble"></span>', format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {:button, [{"disabled", "disabled"}, {"class", "foo garble"}], []}
+ ]}
+ ]} =
+ Myhtmlex.decode(~s'<button disabled class="foo garble"></span>',
+ format: [:html_atoms]
+ )
end
test "text nodes" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- "text node"
- ]}
- ]} = Myhtmlex.decode(~s'<body>text node</body>', format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ "text node"
+ ]}
+ ]} = Myhtmlex.decode(~s'<body>text node</body>', format: [:html_atoms])
end
test "broken input" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {:a, [{"<", "<"}], [" asdf"]}
- ]}
- ]} = Myhtmlex.decode(~s'<a <> asdf', format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {:a, [{"<", "<"}], [" asdf"]}
+ ]}
+ ]} = Myhtmlex.decode(~s'<a <> asdf', format: [:html_atoms])
end
test "namespaced tags" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {"svg:svg", [], [
- {"svg:path", [], []},
- {"svg:a", [], []}
- ]}
- ]}
- ]} = Myhtmlex.decode(~s'<svg><path></path><a></a></svg>', format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {"svg:svg", [],
+ [
+ {"svg:path", [], []},
+ {"svg:a", [], []}
+ ]}
+ ]}
+ ]} = Myhtmlex.decode(~s'<svg><path></path><a></a></svg>', format: [:html_atoms])
end
test "custom namespaced tags" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- {"esi:include", [], nil}
- ]}
- ]} = Myhtmlex.decode(~s'<esi:include />', format: [:html_atoms, :nil_self_closing])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ {"esi:include", [], nil}
+ ]}
+ ]} =
+ Myhtmlex.decode(~s'<esi:include />', format: [:html_atoms, :nil_self_closing])
end
test "html comments" do
- assert {:html, [], [
- {:head, [], []},
- {:body, [], [
- comment: " a comment "
- ]}
- ]} = Myhtmlex.decode(~s'<body><!-- a comment --></body>', format: [:html_atoms])
+ assert {:html, [],
+ [
+ {:head, [], []},
+ {:body, [],
+ [
+ comment: " a comment "
+ ]}
+ ]} = Myhtmlex.decode(~s'<body><!-- a comment --></body>', format: [:html_atoms])
end
- end # quote
- end # defmacro __using__
+ end
-end
+ # quote
+ end
+ # defmacro __using__
+end
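`MyhtmlexSharedTests` runs one suite against both backends by reading the `:module` option in `__using__/1` and splicing it into the quoted code with `unquote/1`. The hypothetical sketch below isolates that mechanism without ExUnit: each `use` site gets a function returning its own implementation module. Because `setup_all` switches the backend through global application env (`Application.put_env/3`), suites built this way should keep ExUnit's default `async: false`.

```elixir
defmodule SharedSketch do
  # Hypothetical: mirrors the shape of MyhtmlexSharedTests.__using__/1.
  defmacro __using__(opts) do
    # Read the option at macro-expansion time; fail fast if it is missing.
    module = Keyword.fetch!(opts, :module)

    quote do
      # unquote/1 injects the caller-specific module into the generated code.
      def implementation, do: unquote(module)
    end
  end
end

defmodule NifSuiteSketch do
  use SharedSketch, module: Myhtmlex.Nif
end

defmodule SafeSuiteSketch do
  use SharedSketch, module: Myhtmlex.Safe
end
```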
