Skip to content

Rewrite the lexer and parser in human_encoding#354

Open
ivanlele wants to merge 1 commit intoBlockstreamResearch:masterfrom
ivanlele:refactor/logos-based-parser
Open

Rewrite the lexer and parser in human_encoding#354
ivanlele wants to merge 1 commit intoBlockstreamResearch:masterfrom
ivanlele:refactor/logos-based-parser

Conversation

@ivanlele
Copy link
Copy Markdown
Contributor

This PR replaces the toolkit used in the human_encoding module with a logos lexer and a custom parser.

@ivanlele
Copy link
Copy Markdown
Contributor Author

ivanlele commented Mar 20, 2026

We used CI to scan one of our projects for license compliance, and found that a dependency of rust-simplicity has an incompatible license with the CC0 license used by rust-simplicity:
https://github.com/kamadorueda/santiago/blob/main/Cargo.toml.license

@ivanlele
Copy link
Copy Markdown
Contributor Author

@apoelstra, In src/human_parsing/README.md, I saw that we already planned to move away from this toolkit, so it shouldn't be a major issue? License compliance might currently block few of our releases

Copy link
Copy Markdown
Contributor

@delta1 delta1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 10b9730; ran tests locally

@apoelstra
Copy link
Copy Markdown
Collaborator

@ivanlele this is not a derivative work and we are not shipping any part of our dependencies without their license. We are allowed to use GPL 3 dependencies in a CC0 project.

Is there any other motivation for this 1500-line diff that adds new dependencies and which touches code that, as you say, we don't intend to use or expand?

@ivanlele
Copy link
Copy Markdown
Contributor Author

@apoelstra I’m not arguing that we are allowed to use GPL3 dependencies in our CC0 project. The concern is more about potential inconvenience for a user who wants to keep their code private and unknowingly uses the human_encoding module, in which case the GPL3 license would apply as it would be included in their distributing binary. human_encoding is part of the default public API, which can fasilitate such cases.

Perhaps it would make sense to put the human_encoding module behind a feature flag and add a warning to prevent accidental usage, or change it inner work, as this PR does.

as you say, we don't intend to use or expand?

I didn’t say that. human_encoding is useful for tests and potentially for prototyping combinators. What I was referring to in the README is the intention to eventually switch the underlying toolkit, and not intent to to discard or archive it.

@ivanlele
Copy link
Copy Markdown
Contributor Author

the intention to eventually switch the underlying toolkit

Which was motivation for creating this PR

@apoelstra
Copy link
Copy Markdown
Collaborator

Perhaps it would make sense to put the human_encoding module behind a feature flag and add a warning to prevent accidental usage, or change it inner work, as this PR does.

I wouldn't mind feature-gating it. We should not put a 'GPL3" warning on it though. People who want to be antisocial can pay their own lawyers to do the technical work.

I also wouldn't mind going in another direction and just relicensing this whole project as GPLv3.

@apoelstra
Copy link
Copy Markdown
Collaborator

concept ack 10b9730. I read through the code and other than the lockfile noise I'm happy with this:

  • logos is pretty low-dep (nearly as low-dependency as santiago) and no worse as far as macro dependence; the logos code is at least readable since it's made out of attributes on enum attributes, rather than making us implement the parser inside a giant macro which is throwing closures all over the place
  • ...relatedly, this hand-rolled parser is much nicer than the santiago one and (probably) has better error messages
  • it does not increase our MSRV at all

I would also like to feature-gate this whole module since it's kinda niche and involves a whole extra dependency. We can do that in a followup if you want.

@ivanlele ivanlele force-pushed the refactor/logos-based-parser branch from 10b9730 to 56a2e06 Compare March 26, 2026 23:24
@ivanlele
Copy link
Copy Markdown
Contributor Author

I would also like to feature-gate this whole module since it's kinda niche and involves a whole extra dependency. We can do that in a followup if you want.

Sure, I'll roll this into the next PR. human_encoding should be an appropriate name for this feature, I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants