Ren: a lightweight data-exchange text format

greggirwin · on Oct 20, 2017

Author of the Ren page being discussed here. Didn't expect to see it on HN. :^)

The Ren site (ren-data.org) redirects there, is quite old now, and was a playground and experimental area. Hence the empty links and such.

In addition to Bolek's Humanistic repo, I had set up https://github.com/Ren-data/Ren to discuss ideas. Ren is, effectively, Redbol (Red+Rebol). One of the initial goals was to define a subset of values and normalize the syntax (Rebol never formalized its format spec), which could be shared across Redbol langs as they evolved and went in different directions. And also as a bridge for loaders in other languages. JSON has taught us that a small spec is important. The balance between simplicity and expressive value types is key.

It may come back to life at some point, but my time was better spent elsewhere for a while. I'm focused on Red now (red-lang.org). It has a native bridge feature for embedding in other langs (https://doc.red-lang.org/en/libred.html), along with a lot more. I believe there is still value in formalizing the grammar so others can create their own implementations, but it's not a priority at this time. In the meantime, you can find the active Red community at https://gitter.im/red/red to get more information and examples of what it looks like in use.

Cheers.

pdfernhout · on Oct 21, 2017

For base types, rather than a set of specific data types which is limited, how about having a standard way to indicate a type and data, like "number:12" and "rational:12/17" and "string:foo" and "date:2017-10-20" and "bitmap-hex:cafebabea2b9c0..."? Then let the reader parse it however they want given their desired interpretation of the type tag.
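A minimal Python sketch of the tag-then-payload idea, assuming the `type:data` form splits on the first colon. The registry `CONVERTERS` and the function `read_tagged` are hypothetical names for illustration. Unknown tags are handed back raw, so the reader can interpret them however it likes.

```python
from datetime import date
from fractions import Fraction

# Hypothetical registry of type tags -> converters; tag names are illustrative only.
CONVERTERS = {
    "number": int,
    "rational": Fraction,          # Fraction("12/17") parses numerator/denominator
    "string": str,
    "date": date.fromisoformat,    # "2017-10-20" (Python 3.7+)
    "bitmap-hex": bytes.fromhex,   # hex digits -> raw bytes
}

def read_tagged(text):
    """Split "tag:payload" on the first colon and convert the payload by tag.

    Tags without a registered converter come back as (tag, payload) so the
    caller can apply its own interpretation.
    """
    tag, sep, payload = text.partition(":")
    if not sep:
        raise ValueError(f"missing type tag in {text!r}")
    convert = CONVERTERS.get(tag)
    return convert(payload) if convert else (tag, payload)

print(read_tagged("rational:12/17"), read_tagged("date:2017-10-20"))
```

A registry like this is also where the "organizational process to add more standard types" would plug in: registering a new tag does not require touching the parser.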

There could be a base set of standard types and, ideally, an organizational process to add more standard types, like MIME types. MIME types might also be included by default, perhaps like "application/json; charset=utf-8:[1, 2, 3]".

It is more characters to include a type for each primitive, but it is more expandable. Likely these files will mostly be generated and read by code anyway, with humans just looking at them now and then for debugging.

If that string version seems too cluttery, another option is something like: rational:12/17 and string:foo and string:"this has spaces".

Or given possible confusion with maps and colons, another option is rational/12/17 and string/foo and string/"this has spaces" and complex/−1+3i and real:30.564 and "application/json; charset=utf-8"/[1,2,3] and maybe even xml/<foo>bar\ baz</foo> and javascript/console.log("hello") and so on.

greggirwin · on Oct 21, 2017

That would be a very different format. A big part of the format is that the lexical forms allow you to write things as you normally would, to another human. That does impose limits, but is part of the fundamental design.

iainmerrick · on Oct 20, 2017

The very first entry in the FAQ ("Why is none used instead of null?") isn't very convincing! It says of null that "it isn't friendly to normal people who might be given configuration or message files to edit," and gives the example:

  Children: none
  Opinions: none

That certainly looks pleasant and human-readable, but a bit of a nightmare to interpret! Couldn't it just as well be "Children: 0" or "Children: []"? If the idea is to let non-technical people edit configuration files, the code reading the file will have to be very flexible and forgiving.

(Edit to add: maybe that's something you mostly get for free in REBOL? It would be a major headache in most other languages though)

rebolek · on Oct 20, 2017

Yup, the real reason is that `none` is free in Rebol/Red. Ren is basically nothing more than Rebol/Red syntax under a different name. I was one of the designers of Ren. It’s been a few years, but it’s cool to see it on HN.

rgchris · on Oct 20, 2017

Note that Ren-C (a fork of Rebol that is only tangentially related to Ren) uses `blank` or literal `_`.

The language concept is a bit more fundamental than config files. Rebol (and Ren-C and cousin Red) is a fully homoiconic language that itself has tools for parsing and interpreting DSLs ('Dialects' to use the language's nomenclature) that use the same language rules. In essence you do indeed get the lexer for free.

Would recommend a wee read of: http://blog.hostilefork.com/why-rebol-red-parse-cool/

I've argued that Ren is somewhat redundant, as it fills the same space that Rebol does, though Rebol lacks a formal specification at this time.

greggirwin · on Oct 20, 2017

Which Rebol fills this space, R2 or R3? The goal was also not strictly loadable Redbol compatibility (which would be nice, of course), but general utility. So Ren adds new types and changes others where it thinks that is best.

hood_syntax · on Oct 20, 2017

Valid points about the ambiguity in that example, but I do believe 'none' is easier to parse than 'null' for the average person. I'm assuming that none is meant to represent, like null, "no input was given", not "an input of zero/an empty list was given".

mh-cx · on Oct 20, 2017

But in that case, wouldn't “unknown” be even better? I'm not a native English speaker, though, so I may miss some fine nuances in the meaning of “none”.

hood_syntax · on Oct 20, 2017

Disclaimer: these are just my personal views, so I may be missing something here.

'unknown' is almost like saying "We don't know what should be here", while 'none' is closer to saying "We didn't get passed any information here".

Ultimately, I feel like 'unknown' is more specific; it conveys intent beyond that of 'none'. For example, it could imply we actually don't know what kind of data 'Opinions' should hold, rather than that there merely happens to be no data to put there.

greggirwin · on Oct 20, 2017

We can throw out options for a long time, as there is no perfect choice. Ren went with `none` as it is standard Rebol and Rebol's designer thought about it long and hard.

jpfed · on Oct 20, 2017

As a native English speaker, I agree that "unknown" is better than "none".

always_good · on Oct 20, 2017

I would hope that a native English speaker knows the difference between

Cookies in the jar: none

and

Cookies in the jar: unknown

greggirwin · on Oct 20, 2017

Unknown is a nice word for this, yes. And while we don't focus exclusively on being terse in the Redbol world, "unknown" is a lot more work to type than "none", and harder to scan visually as well.

vorotato · on Oct 20, 2017

Also, "Some | None" is a common functional programming idiom.

amyjess · on Oct 20, 2017

It's worth mentioning that Python refers to null as None.
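For comparison, a small Python sketch: the standard `json` module decodes JSON `null` to `None`, and a loader can keep that distinct from `0` and `[]`. The field names mirror the examples in this thread, and the `describe` helper is hypothetical.

```python
import json

# JSON null decodes to None; 0 and [] remain separate values.
doc = json.loads('{"Children": null, "Opinions": [], "Cookies": 0}')

def describe(value):
    """Distinguish "nothing was given" from "zero" and "an empty list"."""
    if value is None:
        return "no value given"
    if isinstance(value, list) and not value:
        return "empty list"
    if value == 0:
        return "zero"
    return "some value"

for key in ("Children", "Opinions", "Cookies"):
    print(key, "->", describe(doc[key]))
```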

greggirwin · on Oct 20, 2017

Noted. Thanks.

21 · on Oct 20, 2017

I'm annoyed when a text format doesn't support comments. Especially a Human one :)

Sometimes you want to comment a section out of a JSON without deleting it. Other times you want to annotate some generated JSON.

And because this is a common need, you have ad-hoc unofficial solutions which are not supported by all parsers.
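A sketch of the sort of ad-hoc preprocessing mentioned above, in Python: strip `//` line comments that sit outside string literals, then hand the result to the standard `json` module. `strip_line_comments` is a hypothetical helper, not any particular tool's behavior, and it ignores `/* ... */` block comments.

```python
import json

def strip_line_comments(text):
    """Remove // comments outside JSON string literals, keeping the newlines."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped; take it literally
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)     # skip to end of line, leave the newline
            i = len(text) if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)

src = '''{
  // generated by: make data
  "url": "http://example.com",  // a trailing note
  "n": 1
}'''
print(json.loads(strip_line_comments(src)))
```

Tracking string state is the part that matters: a naive search for `//` would cut `"http://example.com"` in half.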

taeric · on Oct 21, 2017

In the book Programming Pearls, one of the sections is on "provenance". One of the tips shared came from someone who put, at the head of every generated file, the command that was used to generate it. I absolutely love the idea and have wanted to do it in places where I generate files. JSON completely kills this, though. I don't want to encode the bloody command, just put a little comment "this file generated on DATE by COMMAND". Grr...

alecthomas · on Oct 21, 2017

Ironically, that is exactly why JSON doesn't have comments: https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...

taeric · on Oct 21, 2017

Strictly, that is different. He was against parser directives. And offered a solution for the case I gave.

Quite frankly, I think this is an area he made the wrong choice on. Which is fine, but still annoying. Like checked exceptions. Logic for the choice was sound, if misguided. End result sucks.

greggirwin · on Oct 21, 2017

I agree. I was sad to see comment support removed from the JSON spec.

bananicorn · on Oct 20, 2017

In case anyone wonders - red-lang[0] and rebol[1] both use this notation.

[0]http://www.red-lang.org/ [1]http://www.rebol.com/

9214 · on Oct 20, 2017

mirror site: http://red.github.io/

protonfish · on Oct 20, 2017

I wish this (and JSON) weren't ambiguous about number type (integers vs. floating point). They are very different types of data, in my opinion.

zitterbewegung · on Oct 20, 2017

A numeric tower could be useful. https://en.wikipedia.org/wiki/Numerical_tower

vorotato · on Oct 20, 2017

C R Q Z N: Complex, Real, Rational, Integer, Natural
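Python's standard `numbers` module encodes the upper part of this tower as abstract base classes (`Complex`, `Real`, `Rational`, `Integral`). A minimal sketch that reports the most specific level for a value. The module has no Natural class, so the sketch treats a non-negative integral value as Natural (counting 0 as natural), which is an assumption. It classifies by Python type, so a `float` lands on Real even though every finite float is mathematically rational.

```python
import numbers
from fractions import Fraction

def tower_level(x):
    """Return the most specific numeric-tower level for x."""
    if isinstance(x, numbers.Integral):
        return "Natural" if x >= 0 else "Integer"   # Natural is hand-rolled here
    if isinstance(x, numbers.Rational):
        return "Rational"
    if isinstance(x, numbers.Real):
        return "Real"
    if isinstance(x, numbers.Complex):
        return "Complex"
    raise TypeError(f"not a number: {x!r}")

for value in (12, -6, Fraction(12, 17), 30.564, complex(-1, 3)):
    print(value, tower_level(value))
```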

greggirwin · on Oct 20, 2017

This comment has generated a lot of interest, and rightly so. The thing to remember is that Ren (and JSON) are data exchange formats, not in-memory representations. If you need more control than their numeric syntax supports, metadata is your friend. Just include that where needed. That doesn't mean every endpoint will honor or treat them the same, of course.
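One way to read "metadata is your friend", sketched in Python under a hypothetical convention: the document carries a `types` object alongside its `values`, and the reader converts only the fields named there. An endpoint that does not know the convention still gets plain strings and numbers, which is the "not every endpoint will honor it" caveat in practice.

```python
import json
from decimal import Decimal

# Hypothetical convention: "types" maps field names to a type label.
CONVERT = {"int": int, "float": float, "decimal": Decimal}

def apply_metadata(doc):
    """Return doc["values"] with fields converted according to doc["types"]."""
    types = doc.get("types", {})
    return {
        key: CONVERT[types[key]](value) if key in types else value
        for key, value in doc["values"].items()
    }

doc = json.loads(
    '{"types": {"price": "decimal", "ratio": "float"},'
    ' "values": {"price": "12.50", "ratio": 1, "name": "widget"}}'
)
print(apply_metadata(doc))
```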

specialist · on Oct 20, 2017

"...are data exchange formats, not in-memory representations."

Real world values are still typed.

"If you need more control ... metadata is your friend."

aka implicit user typedefs. Explicit is better.

greggirwin · on Oct 21, 2017

Explicit is different, not better.

neoeldex · on Oct 20, 2017

They are different data types, but integer is also ambiguous, did you mean signed or unsigned, 32 or 64 bit. It's about your use case, and when you're parsing something with loose typing into stricter types, make sure you validate accordingly. But being able to specify the exact type kind of defeats the purpose of an interchangeable data format.

dragonwriter · on Oct 20, 2017

Not really; those int types differ in range, but while that matters for a schema format, it doesn't really for an interchange format.

OTOH, exact decimals (of which integers are a subset) differ fundamentally in meaning from limited precision binary (or decimal, though that's more rarely encountered) floating point approximations. Which matters in interchange as well as schema.

protonfish · on Oct 20, 2017

Yes, the specific type of integer and floating point can vary, but how operators work and how values are formatted for display should be consistent. For example, modulo should be integer-specific, and the concept of precision only applies to floating-point numbers.

cabalamat · on Oct 20, 2017

> integer is also ambiguous, did you mean signed or unsigned, 32 or 64 bit

These are all integers. There only needs to be one integer type for all of: 0 6 -6 8000000000000000000000. Python gets this right.

jerf · on Oct 20, 2017

Now 90%+ of programming language communities face a higher bar to implement your specification, because they have to emit a weird type out of deserialization that will break code in weird ways. And in the case of static languages, you may have created a deserialization format that always emits BigInts (several words of memory and possibly a mandatory indirection, depending on implementation) and will thus be unable to compete on benchmarks against serialization formats that specify integer sizes. And your serialization format is that much closer to withering on the vine with no usage.

Or you can go the JSON route, mumble your way through the numbers spec, let every language do its own thing, and tell the handful of people seriously interested in moving integers too large to be precisely specified by a 64-bit float to encode as strings or something and stop worrying everyone else with the complexity....

thaumasiotes · on Oct 21, 2017

> in the case of static languages, you may have created a deserialization format that always emits BigInts

I don't follow this one. If your language, static or otherwise, is capable of turning this JSON:

    "100"

into a string, and this:

    100

into a number of whatever type, then it's also capable of turning this hypothetical input:

    100

into one type of integer, and this:

    100000000000000000000000000000000000000000000000000000000

into another type of integer.
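The point can be shown with Python's `json` module, whose `parse_int` hook receives the source text of every integer literal. In this sketch the hypothetical `typed_int` mimics a static-language decoder by tagging each value `int64` or `bigint` depending on whether it fits, so the costlier type is only produced when a literal actually needs it.

```python
import json

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def typed_int(literal):
    """Hypothetical decoder hook: pick an integer "type" per literal by range."""
    value = int(literal)
    kind = "int64" if INT64_MIN <= value <= INT64_MAX else "bigint"
    return kind, value

big = "1" + "0" * 56
print(json.loads("[100, " + big + "]", parse_int=typed_int))
```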

Yaggo · on Oct 21, 2017

Because JSON is a text format, nobody prevents you from encoding floats like this:

    { "int": 1, "float": 1.0 }

Of course it's up to the parser implementation to interpret 1.0 as a float.
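Python's standard `json` module, for example, does honor that distinction: a literal with no fraction or exponent becomes an `int`, anything else a `float`.

```python
import json

doc = json.loads('{ "int": 1, "float": 1.0, "exp": 1e2 }')
for key, value in doc.items():
    # type follows the literal's form, not its value: 1e2 is a float
    print(key, type(value).__name__, value)
```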

The spec is very relaxed about the number type:

https://tools.ietf.org/html/rfc7159#section-6

_1qd4 · on Oct 20, 2017

I disagree. For 99% of applications, a number is a number is a number. Integers and floating points are a leaky abstraction; I expect the language/compiler to handle these.

amyjess · on Oct 20, 2017

The ideal solution is to parse all numbers as a decimal or rational.

(one of my favorite things about Perl 6 is that non-integer literals are rationals by default; floating-point is only used if the literal is specified in scientific notation)
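A rough Python analogue, assuming you are willing to pay for exact arithmetic: the `json` module's `parse_float` hook can decode every non-integer literal as a `fractions.Fraction`. Unlike the Perl 6 rule just described, this sketch also makes E-notation literals rational.

```python
import json
from fractions import Fraction

def loads_exact(text):
    """Decode JSON with non-integer numbers as exact rationals (integers stay int)."""
    return json.loads(text, parse_float=Fraction)

a, b, c = loads_exact("[0.1, 0.2, 0.3]")
print(a + b == c)        # True: exact rational arithmetic
print(0.1 + 0.2 == 0.3)  # False: binary floating point
```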

dmm · on Oct 20, 2017

Try passing around 64bit integers in json. The majority of json implementations only implement double float numbers and will mangle your 64bit ints. The usual solution is passing int64s as strings, nasty.

colanderman · on Oct 20, 2017

Those implementations are as broken as a database that stores ZIP codes as integers.

dmm · on Oct 20, 2017

Hit F12 in Chrome and enter the following:

    JSON.parse('[9223372036854775805]')[0] === JSON.parse('[9223372036854775806]')[0]

You should get:

    <- true

Is Chrome's implementation broken? Those are both numbers that a signed int64 can store precisely but a double float cannot.
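The same collision reproduced in Python: converting the two integers to a 64-bit float maps both to 2**63, while the standard `json` module, which decodes integer literals to arbitrary-precision `int`, keeps them apart.

```python
import json

a, b = 9223372036854775805, 9223372036854775806  # both <= 2**63 - 1, so valid int64
# Just below 2**63, adjacent float64 values are 1024 apart, so both round up to 2**63.
print(float(a) == float(b), float(a) == 2.0**63)  # True True

x, y = json.loads("[9223372036854775805, 9223372036854775806]")
print(x == y, x == a)  # False True
```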

colanderman · on Oct 20, 2017

Yes, the JavaScript JSON parser is broken (in design), in that it silently loses data/precision of JSON numbers. JSON numbers are not "int64s" or "double floats", they are arbitrary-precision decimal values. [1]

Postgres is the only popular system with built-in JSON functionality that I know of to correctly round-trip JSON numeric data. Python comes close but fails for any number with a decimal point or in E-notation.

One can argue that the JSON spec is too permissive (and I would disagree, though I'm somewhat a purist). Or a pedant could note that the JSON spec doesn't actually say whether any of the digits in a number are considered to carry information (and does in fact note that many parsers are faulty). But it's unfortunately true that most popular JSON parsers fail to round-trip valid JSON data due to flawed design.

[1] http://json.org/

dragonwriter · on Oct 20, 2017

> Or a pedant could note that the JSON spec doesn't actually say whether any of the digits in a number are considered to carry information

RFC 7159 not only specifies that all of the digits carry information, it specifies exactly what information they carry.

However, it also expressly permits implementations to limit the range and precision of numbers accepted, recommending (but not requiring) range and precision at least equivalent to IEEE 754 float64 be supported.

colanderman · on Oct 20, 2017

> RFC 7159 not only specifies that all of the digits carry information, it specifies exactly what information they carry.

Can you quote? I don't see where it does, except by reference to common knowledge.

But I should clarify. There are several redundancies in JSON numeric representation which may or may not be considered significant by an application. The ones I can think of, in order from "obviously not" to "well, maybe":

1. the case of "e" vs. "E"

2. the optional "+" sign after "e" or "E"

3. presence/absence of decimal point and/or E-notation (shouldn't make a difference, but does in many parsers, such as Python's)

4. the value of the exponent itself (e.g. 3.14 vs. 314e-2)

5. "-" sign in front of any number with a value of 0 (IEEE floats and ones-complement integers do have a negative zero)

6. excess trailing 0 digits after the decimal place (may be used to represent significant figures in scientific applications)

7. digits of lesser significance (obviously the most contentious)

JavaScript ignores all but #5. Python ignores all but #3, and due to #3, sometimes #5 and #7. PostgreSQL `json` ignores none; `jsonb` ignores all but #6 and #7. Personally I would draw the line between #4 and #5. But neither the RFC nor the ECMA spec tell us.
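A few of these cases checked against Python's standard `json` module (CPython behavior; other decoders differ). Passing `parse_float=decimal.Decimal` keeps non-integer literals exactly as written, including trailing zeros and a negative zero, though integer literals still go through `int`.

```python
import json
from decimal import Decimal

print(json.loads("3.140"))   # 3.14: trailing zero (#6) dropped
print(json.loads("1E400"))   # inf: beyond float64 range, digits (#7) lost
print(json.loads("-0"), json.loads("-0.0"))  # 0 -0.0: sign (#5) kept only in float form (#3)

exact = json.loads("[3.140, 1E400, -0.0]", parse_float=Decimal)
print([str(d) for d in exact])  # ['3.140', '1E+400', '-0.0']
```

Even the `Decimal` route normalizes #1, #2, and #4: `1E400` comes back as `1E+400`, and `314e-2` prints the same as `3.14`.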

hyperpape · on Oct 20, 2017

One of the multiple JSON specs indicates that implementations are free to introduce limits on precision or range of numbers: https://tools.ietf.org/html/rfc7159#section-3.

colanderman · on Oct 20, 2017

I concede that "broken" is too strong a word. "Subpar" would be more accurate. Like a 7-bit MIME gateway in an 8-bit world.

dragonwriter · on Oct 20, 2017

No, JavaScript simply has inadequate numeric support.

Chrome’s implementation is fine (within spec) and makes the natural choice for a general purpose JSON implementation for JS.

imron · on Oct 20, 2017

Yes, I think everyone is in agreement that Javascript is broken.

_1qd4 · on Oct 20, 2017

That's not a JSON problem, that's a problem with whatever language you're using to consume the JSON. Back to my point, you as a developer shouldn't have to worry about this at all. You should be able to add numbers to JSON and retrieve them without it harassing you. To a human, 123, 0.123, 12345789347590837452987543758973459734 are all the same "type". Why is it that you can save your int64s as strings but not as numbers? It's a stupid leak on the part of the language if you have to use this workaround.

burntsushi · on Oct 20, 2017

> Why is it that you can save your int64s as strings but not as numbers? It's a stupid leak on the part of the language if you have to use this workaround.

Javascript, for example, has this problem because everything is just a "number" and the number data type is an IEEE-754 double-precision float. A double-precision float can't represent all int64 values, so transmitting an int64 via JSON to Javascript and using the standard JSON parser is lossy. You could say that Javascript screwed up by making number have a specific precision, and you might be right. Python, for example, gives you seamless arbitrary-precision arithmetic.

But the decision to use an arbitrary-precision type for all numbers in every language is definitely not appropriate. Arbitrary precision has trade-offs, and sometimes it's important to work with machine integers.

colanderman · on Oct 20, 2017

The correct solution is to only convert a JSON value to an integer (or string, or array, or dictionary/object) when the application explicitly requests it. 99% of the time, 64-bit integers are not used by a JS app directly, but rather are passed right back to the backend they came from. If they were never converted to JS numbers in the first place, there can be no precision loss.

"Deep embedding" of JSON data into the host language is the cause of this issue. Recognition that JSON is a separate data type and does not always have an obvious encoding in the native language (including JSON arrays, objects, and null, even if they look like native arrays, objects, and null when you squint at them) is key.

The only widespread JSON implementation I know of to get this right is Postgres's: JSON values live as the "json" or "jsonb" type until you explicitly convert them. No data loss is incurred, with the exception that "jsonb" normalizes E-notation, conflates 0 and -0, drops duplicate object keys, and normalizes their order. Even numeric significant 0 digits are preserved.
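A minimal Python sketch of "convert only when asked": the `parse_int` and `parse_float` hooks of the standard `json` module receive each number's source text, so a wrapper can hold that text and defer the conversion. `RawNumber` and `loads_lazy` are hypothetical names, and writing the raw text back out would need a custom encoder, which is not shown.

```python
import json
from decimal import Decimal

class RawNumber:
    """Holds a JSON number's original text; converts only on request."""

    def __init__(self, text):
        self.text = text

    def as_int(self):
        return int(self.text)

    def as_float(self):
        return float(self.text)

    def as_decimal(self):
        return Decimal(self.text)

    def __repr__(self):
        return f"RawNumber({self.text!r})"

def loads_lazy(text):
    # Route integer and non-integer literals alike to RawNumber.
    return json.loads(text, parse_int=RawNumber, parse_float=RawNumber)

doc = loads_lazy('{"id": 9223372036854775807, "price": 12.50}')
print(doc["id"].text)             # original digits, never converted to a float
print(doc["price"].as_decimal())  # converted only here, as an exact decimal
```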

iainmerrick · on Oct 20, 2017

I'd put it at more like 50% of applications.

hbex5 · on Oct 20, 2017

It looks fairly similar to EDN used by Clojure too. Similar, but not, of course, the same.

https://learnxinyminutes.com/docs/edn/

greggirwin · on Oct 20, 2017

Some similarities, but different in that EDN has a tagging approach for some types, and a more limited set of literal value formats.

frou_dh · on Oct 21, 2017

It's a shame that EDN never got over the hump in terms of adoption outside of the Clojure community. At least that's my impression.

4lch3m1st · on Oct 20, 2017

The syntax itself reminds me a lot of Lisp (Scheme, specifically) but this is probably on purpose. I wonder if it can be used on Lisps the same way JSON is used in JS.

9214 · on Oct 20, 2017

check out Red and Rebol languages:

http://red-lang.org/

http://red.github.io/

http://rebol.com/

I think the first version of the Rebol interpreter was written in Scheme ;)

faitswulff · on Oct 20, 2017

A different kind of pedantry: which is it, 仁 or 人 ? Both appear on the page and the former seems more fitting.

imron · on Oct 20, 2017

When I saw the headline my first reaction was to think "hah I bet it's called Ren because it's a data-exchange format for people", then I saw 仁 and thought - ah that's a much better choice.

greggirwin · on Oct 20, 2017

Thanks for the input. The ideograms were found based on the Ren name and meaning, but I'm not a native Chinese speaker, so I will defer to those who know, and make things clear.

rebolek · on Oct 20, 2017

We agreed on 人, not sure where 仁 came from.

imron · on Oct 20, 2017

> not sure where 仁 came from

It means humaneness, benevolence, kindness, and is pronounced exactly the same as 人 which means person/people.

greggirwin · on Oct 20, 2017

Without a clear choice (as an English speaker) as to which might be more exact in translation, I think I pulled 仁 in because it also looks like the opening of a block, which is fundamental in Ren.

vesinisa · on Oct 20, 2017

Easiest way to understand the notation would be to see some examples, but the "Test files" link leads to nowhere.

draegtun · on Oct 20, 2017

Here's a link to another example: https://github.com/humanistic/REN

kiliancs · on Oct 20, 2017

On a small screen, the first thing shown is an example. On larger screens it still appears at the top, but floats to the right of the "Goals" section.

mamcx · on Oct 20, 2017

When I click on the resources links, they all open blank pages?

greggirwin · on Oct 20, 2017

Apologies for that. A lot were just for fun/future ideas. Many links to existing resources on other formats do work.

perryprog · on Oct 20, 2017

Not for me.

doodpants · on Oct 20, 2017

The links in the first column under References all have "http://" as their destination. The links in the other columns seem to be fine.

Also, the links under Implementations all result in a 404 page.

jepler · on Oct 20, 2017

It's cute that the spec believes money is denoted by "$".

evv · on Oct 20, 2017

I'm confused.. why is this cute?

Of course "$" is used as the symbol for several currencies, but it's not clear whether you have a better symbol in mind for the generic concept of money.

https://en.wikipedia.org/wiki/$_(disambiguation)

jepler · on Oct 20, 2017

Sarcastically cute. It appears very parochial (in the sense of "limited or narrow outlook") to think that having a single "currency type" is useful, or that using a "$" to denote it is appropriate. While "$" is used by at least one important world currency, I am pretty sure that more people use ¥ than $ by a long shot, and users of € are about as plentiful as people who spend in USD.

(The idea that in Rebol one might write EUR$1.00 to denote the value that would usually be written 1.00€ is also pretty horrible)

greggirwin · on Oct 20, 2017

@jepler,

1) Do you agree that a notation should support currency values? That is, they are useful to identify in data, as atomic units?

2) If so, do you agree there should be a single standard symbol and lexical form used to identify them? Because if we don't do that, we have to support every localized notation, correct?

3) If you agree to both of those questions, what symbol do you suggest? ¤ is generic, but not on any keyboard layout I know of. Also, see Chris's note about ASCII priority.

jepler · on Oct 20, 2017

No, I can honestly say I've never worked on a software project that dealt directly with currency values. Personally, in the kinds of projects I do, explicit support for units of measurement would be more beneficial: you'd like to use the type system to detect where units from different systems are mixed (e.g., km+mi) and behave appropriately by performing a conversion; or to detect where inappropriate units are mixed (e.g., kg+hz) and signal an error (at project build time if possible!)

With that background in mind, imagine the scenario where you have two data in a Ren document which are both the literal $1.00, but one is actually 1.00USD and the other is actually 1.00EUR: the notation doesn't prevent errors (for instance, when you want to perform an operation like + on the two data), because you still don't know what the data means. You have gained very little over just using the literal 1.00 instead.

So if I were making a proposal I'd be tempted to suggest a syntax like [1.00 USD], and maybe even giving up one of the remaining sigils ^[1.00 USD] if it is important to raise to being a special element in the syntax of a Ren file. Now that you're saying what you mean, you can use the same syntax for all units: ^[1.00 kg m -2] (1 kilogram per square meter), ^[1.00 V hz -.5] (1 volt per square-root-hertz, a typical units specification of noise in opamps).
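A rough Python sketch of this proposal, under stated assumptions: it reads only the inside of a block such as `[1.00 kg m -2]`, takes the first token as the magnitude, lets a numeric token set the exponent of the unit before it (default 1), and refuses to add quantities whose units differ. `parse_quantity` and `add` are hypothetical names, and unit conversion (km + mi) is not attempted.

```python
from fractions import Fraction

def parse_quantity(body):
    """Parse e.g. "1.00 kg m -2" into (magnitude, {unit: exponent})."""
    tokens = body.split()
    magnitude = Fraction(tokens[0])
    units = {}
    last_unit = None
    for token in tokens[1:]:
        try:
            exponent = Fraction(token)   # numeric token: exponent of the previous unit
        except ValueError:
            units[token] = units.get(token, 0) + 1   # unit name, default exponent 1
            last_unit = token
            continue
        if last_unit is None:
            raise ValueError(f"exponent {token!r} does not follow a unit")
        units[last_unit] += exponent - 1  # swap the default 1 for the given exponent
        last_unit = None
    return magnitude, {u: e for u, e in units.items() if e != 0}

def add(a, b):
    """Add two quantities, raising TypeError when their units differ."""
    (mag_a, units_a), (mag_b, units_b) = a, b
    if units_a != units_b:
        raise TypeError(f"unit mismatch: {units_a} vs {units_b}")
    return mag_a + mag_b, units_a

print(parse_quantity("1.00 V hz -.5"))
print(add(parse_quantity("1.00 USD"), parse_quantity("2.50 USD")))
```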

greggirwin · on Oct 20, 2017

No need for special syntax. ^ is already the escape character. Just use blocks. Though a `unit` syntax has been brought up: the notation would use path syntax, but start with a number instead of a word. Frink, of course, is the king of languages in this regard.

And while this may be more beneficial in your work, a lot of software does have to deal with money, where it's important not to use floating point, but BCD or something else.
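A small Python illustration of why binary floating point is avoided for money. Python's `decimal` module is decimal floating point rather than BCD, but it serves the same purpose here: amounts like 0.10 are represented exactly, and rounding to cents is explicit. The 7% rate is just an example figure.

```python
from decimal import Decimal, ROUND_HALF_EVEN

print(0.10 + 0.20 == 0.30)                                   # False with binary floats
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True with decimals

# Round a computed amount (7% of 19.99 = 1.3993) to whole cents, banker's rounding.
tax = (Decimal("19.99") * Decimal("0.07")).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(tax)  # 1.40
```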

juki · on Oct 20, 2017

> ¤ is generic, but not on any keyboard layout I know of.

Finnish (/ Swedish) keyboards have ¤ on `<Shift> 4` ($ is `<Alt Gr> 4`).

greggirwin · on Oct 20, 2017

Thanks! Good to know.

XR0CSWV3h3kZWg · on Oct 20, 2017

The use of $ is far larger than the US.

US: 325,365,189

Canada: 35,151,728

Taiwan: 23,550,077

Australia: 24,688,400

Ecuador: 16,385,068

Hong Kong: 7,374,900

El Salvador: 6,344,722

Singapore: 5,607,300

New Zealand: 4,826,660

Liberia: 4,503,000

Jamaica: 2,881,355

Namibia: 2,113,077

East Timor: 1,167,242

Belize: 387,879

Micronesia: 104,937

Marshall Islands: 53,066

Palau: 21,503

Caribbean Netherlands: 25,019

460,551,122 in total. Which is still less than 50% of China's 1.4B
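The total can be checked with a few lines of Python, using the figures exactly as listed in the comment and 1.4 billion for China.

```python
# Populations of $-using countries/territories, as listed above.
dollar_users = {
    "US": 325_365_189, "Canada": 35_151_728, "Taiwan": 23_550_077,
    "Australia": 24_688_400, "Ecuador": 16_385_068, "Hong Kong": 7_374_900,
    "El Salvador": 6_344_722, "Singapore": 5_607_300, "New Zealand": 4_826_660,
    "Liberia": 4_503_000, "Jamaica": 2_881_355, "Namibia": 2_113_077,
    "East Timor": 1_167_242, "Belize": 387_879, "Micronesia": 104_937,
    "Marshall Islands": 53_066, "Palau": 21_503, "Caribbean Netherlands": 25_019,
}
total = sum(dollar_users.values())
print(f"{total:,}")             # 460,551,122
print(f"{total / 1.4e9:.0%}")   # 33% of 1.4 billion
```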

jepler · on Oct 20, 2017

Yes, I tried to be careful in what I stated, but here's what I meant: "$" is the everyday currency symbol of far fewer people than "¥". "USD" is the everyday currency of only about as many people as "EUR".

XR0CSWV3h3kZWg · on Oct 20, 2017

Yeah well done, when I first looked at the list of countries I figured they might add up to ~1B, but they didn't even come close!

It's easy to forget just how much bigger China is.

greggirwin · on Oct 20, 2017

What do you suggest? ¤ is really the only other option, but $ seems much more universal, if you can choose only one.

rgchris · on Oct 20, 2017

For the most part, Ren is delimited by ASCII characters. Rebol permitted a three-letter code to denote currency: USD$10 GBP$15
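A minimal Python sketch of reading such literals, assuming the form is an optional three-letter uppercase code, a `$`, and a plain decimal amount (no sign, no digit separators). The pattern is illustrative, not taken from a Rebol or Ren grammar. Amounts are parsed with `Decimal` so cents stay exact.

```python
import re
from decimal import Decimal

# Hypothetical pattern: optional three-letter currency code, "$", decimal amount.
MONEY = re.compile(r"(?P<code>[A-Z]{3})?\$(?P<amount>\d+(?:\.\d+)?)")

def parse_money(text):
    """Return (currency code or None, amount) for literals like $10 or GBP$15."""
    match = MONEY.fullmatch(text)
    if match is None:
        raise ValueError(f"not a money literal: {text!r}")
    return match["code"], Decimal(match["amount"])

print(parse_money("USD$10"), parse_money("GBP$15"), parse_money("$1.00"))
```

Keeping the code attached to the amount is what would let a program refuse to add USD$10 and GBP$15 without an explicit conversion.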

lmm · on Oct 20, 2017

In another context it might be cute; with the weeaboo name it's rather horrifying.

pnathan · on Oct 20, 2017

Another go-round, trying to solve the same problem as TOML.

greggirwin · on Oct 20, 2017

Not so. From the TOML page: "TOML aims to be a minimal configuration file format."

Ren is intended to be a general purpose data exchange format.

pnathan · on Oct 26, 2017

welp, I'm wrong. So it goes. Better reading in the future. :-/

I'll just leave my comment above to not destroy the comment chain.