Page MenuHomePhorge

No OneTemporary

Size
4 KB
Referenced Files
None
Subscribers
None
diff --git a/Makefile b/Makefile
index 52e4991..1977293 100644
--- a/Makefile
+++ b/Makefile
@@ -1,48 +1,54 @@
# SPDX-FileCopyrightText: 2017-2019 myhtmlex authors <https://github.com/Overbryd/myhtmlex>
# SPDX-FileCopyrightText: 2019-2022 Pleroma Authors <https://pleroma.social>
# SPDX-License-Identifier: LGPL-2.1-only
MIX = mix
CMAKE = cmake
CFLAGS ?= -g -O2 -pedantic -Wcomment -Wextra -Wno-old-style-declaration -Wall
# set erlang include path
ERLANG_PATH = $(shell erl -eval 'io:format("~s", [lists:concat([code:root_dir(), "/erts-", erlang:system_info(version)])])' -s init stop -noshell)
CNODE_CFLAGS += -I$(ERLANG_PATH)/include
# expecting lexbor as a submodule in c_src/
# that way we can pin a version and package the whole thing in hex
-# hex does not allow for non-app related dependencies.
LXB_PATH = c_src/lexbor
LXB_AR = $(LXB_PATH)/liblexbor_static.a
-LXB_CFLAGS = -I$(LXB_PATH)/source
-LXB_LDFLAGS = $(LXB_AR)
+ifeq ($(WITH_SYSTEM_LEXBOR),1)
+ LXB_CFLAGS =
+ LXB_LDFLAGS = -llexbor
+ LXB_DEPS =
+else
+ LXB_CFLAGS = -I$(LXB_PATH)/source
+ LXB_LDFLAGS = $(LXB_AR)
+ LXB_DEPS = $(LXB_AR)
+endif
# C-Node
ERL_INTERFACE = $(wildcard $(ERLANG_PATH)/../lib/erl_interface-*)
CNODE_CFLAGS += -I$(ERL_INTERFACE)/include
CNODE_LDFLAGS += -L$(ERL_INTERFACE)/lib -lei -lpthread
.PHONY: all
all: priv/fasthtml_worker
$(LXB_AR): $(LXB_PATH)
# Sadly, build components separately seems to sporadically fail
cd $(LXB_PATH); \
CFLAGS='$(CFLAGS)' \
cmake -DLEXBOR_BUILD_SEPARATELY=OFF -DLEXBOR_BUILD_SHARED=OFF
$(MAKE) -C $(LXB_PATH)
-priv/fasthtml_worker: c_src/fasthtml_worker.c $(LXB_AR)
+priv/fasthtml_worker: c_src/fasthtml_worker.c $(LXB_DEPS)
mkdir -p priv
$(CC) -std=c99 $(CFLAGS) $(CNODE_CFLAGS) $(LXB_CFLAGS) -o $@ $< $(LDFLAGS) $(CNODE_LDFLAGS) $(LXB_LDFLAGS)
clean: clean-myhtml
$(RM) -r priv/myhtmlex*
$(RM) priv/fasthtml_worker
$(RM) myhtmlex-*.tar
$(RM) -r package-test
clean-myhtml:
$(MAKE) -C $(LXB_PATH) clean
diff --git a/README.md b/README.md
index b603eae..572c826 100644
--- a/README.md
+++ b/README.md
@@ -1,34 +1,42 @@
<!--
SPDX-FileCopyrightText: 2017-2019 myhtmlex authors <https://github.com/Overbryd/myhtmlex>
SPDX-FileCopyrightText: 2019-2022 Pleroma Authors <https://pleroma.social>
SPDX-License-Identifier: LGPL-2.1-only
-->
# FastHTML
A C Node wrapping [lexbor](https://github.com/lexbor/lexbor).
Primarily used with [FastSanitize](https://git.pleroma.social/pleroma/fast_sanitize).
* Available as a hex package: `{:fast_html, "~> 2.0"}`
* [Documentation](https://hexdocs.pm/fast_html/fast_html.html)
+## Compiling
+- GNU Make
+- C Compiler
+- Erlang 22.0+ with development headers
+- (optional) [lexbor](https://github.com/lexbor/lexbor) 2.2.0+
+
+If you want to use a system installation of lexbor, you can set `WITH_SYSTEM_LEXBOR=1` during compilation time. By default it will used the vendored version present at `c_src/lexbor`.
+
## Benchmarks
The following table provides median times it takes to decode a string to a tree for html parsers that can be used from Elixir. Benchmarks were conducted on a machine with an `AMD Ryzen 9 3950X (32) @ 3.500GHz` CPU and 32GB of RAM. The `mix fast_html.bench` task can be used for running the benchmark by yourself.
| File/Parser | fast_html (Port) | mochiweb_html (erlang) | html5ever (Rust NIF) | Myhtmlex (NIF)¹ |
|----------------------|--------------------|------------------------|----------------------|----------------|
| document-large.html (6.9M) | 125.12 ms | 1778.34 ms | 395.21 ms | 327.17 ms |
| document-small.html (25K)| 0.50 ms | 2.76 ms | 1.72 ms | 1.19 ms |
| fragment-large.html (33K)| 0.93 ms | 4.78 ms | 2.34 ms | 2.15 ms |
| fragment-small.html² (757B)| 44.60 μs | 42.13 μs | 43.58 μs | 289.71 μs |
Full benchmark output can be seen in [this snippet](https://git.pleroma.social/pleroma/elixir-libraries/fast_html/snippets/3128)
1. Myhtmlex has a C-Node mode, but it wasn't benchmarked here because it segfaults on `document-large.html`
2. The slowdown on `fragment-small.html` is due to Port overhead. Unlike html5ever and Myhtmlex in NIF mode, `fast_html` has the parser process isolated and communicates with it over stdio, so even if a fatal crash in the parser happens, it won't bring down the entire VM.
## Contribution / Bug Reports
* Please make sure you do `git submodule update` after a checkout/pull
* The project aims to be fully tested

File Metadata

Mime Type
text/x-diff
Expires
Wed, Nov 27, 12:44 AM (1 d, 12 h)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
40517
Default Alt Text
(4 KB)

Event Timeline