Ren: a lightweight data-exchange text format (pointillistic.com)
92 points by bananicorn on Oct 20, 2017 | 88 comments


Author of the Ren page being discussed here. Didn't expect to see it on HN. :^)

The Ren site (ren-data.org) redirects there, is quite old now, and was a playground and experimental area. Hence the empty links and such.

In addition to Bolek's Humanistic repo, I had set up https://github.com/Ren-data/Ren to discuss ideas. Ren is, effectively, Redbol (Red+Rebol). One of the initial goals was to define a subset of values and normalize the syntax (Rebol never formalized its format spec), which could be shared across Redbol langs as they evolved and went in different directions. And also as a bridge for loaders in other languages. JSON has taught us that a small spec is important. The balance between simplicity and expressive value types is key.

It may come back to life at some point, but my time was better spent elsewhere for a while. I'm focused on Red now (red-lang.org). It has a native bridge feature for embedding in other langs (https://doc.red-lang.org/en/libred.html), along with a lot more. I believe there is still value in formalizing the grammar so others can create their own implementations, but it's not a priority at this time. In the meantime, you can find the active Red community at https://gitter.im/red/red to get more information and examples of what it looks like in use.

Cheers.


For base types, rather than a set of specific data types which is limited, how about having a standard way to indicate a type and data, like "number:12" and "rational:12/17" and "string:foo" and "date:2017-10-20" and "bitmap-hex:cafebabea2b9c0..."? Then let the reader parse it however they want given their desired interpretation of the type tag.

There could be a base set of standard types and ideally an organizational process to add more standard types, like mimetypes. Mimetypes might also be included by default, perhaps like "application/json; charset=utf-8:[1, 2, 3]".

It is more characters to include a type for each primitive, but it is more expandable. Likely these files will mostly be generated and read by code anyway, with humans just looking at them now and then for debugging.

If that string version seems too cluttery, another option is something like: rational:12/17 and string:foo and string:"this has spaces".

Or given possible confusion with maps and colons, another option is rational/12/17 and string/foo and string/"this has spaces" and complex/−1+3i and real/30.564 and "application/json; charset=utf-8"/[1,2,3] and maybe even xml/<foo>bar\ baz</foo> and javascript/console.log("hello") and so on.

Or maybe pipes? Like: rational|12/17 and string|foo and string|"this has spaces" and complex|−1+3i and real|30.564 and "application/json; charset=utf-8"|[1,2,3] and maybe even xml|<foo>bar\ baz</foo> and javascript|console.log("hello") and so on.
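
To make the tagged-value idea concrete, here is a minimal Python sketch of a reader that dispatches on the tag. The `READERS` table and the `read_tagged` helper are hypothetical names made up for illustration; the tag names are taken from the examples above, and the pipe separator is just one of the options floated.

```python
from datetime import date
from fractions import Fraction

# Hypothetical tag -> reader table; tag names follow the examples in the comment.
READERS = {
    "number": int,
    "real": float,
    "rational": Fraction,                 # Fraction("12/17") parses both parts
    "string": lambda s: s.strip('"'),     # drop optional surrounding quotes
    "date": date.fromisoformat,
}

def read_tagged(token, sep="|"):
    """Split 'tag<sep>payload' once and hand the payload to that tag's reader."""
    tag, payload = token.split(sep, 1)
    reader = READERS.get(tag)
    if reader is None:
        # Unknown tags are passed through so the caller can interpret them.
        return (tag, payload)
    return reader(payload)

print(read_tagged("rational|12/17"))
print(read_tagged('string|"this has spaces"'))
print(read_tagged("number:12", sep=":"))
```

Splitting only on the first separator is what lets payloads such as 12/17 or a JSON array contain further punctuation.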


That would be a very different format. A big part of the format is that the lexical forms allow you to write things as you normally would, to another human. That does impose limits, but is part of the fundamental design.


The very first entry in the FAQ ("Why is none used instead of null?") isn't very convincing! It says of null that "it isn't friendly to normal people who might be given configuration or message files to edit," and gives the example:

  Children: none
  Opinions: none
That certainly looks pleasant and human-readable, but a bit of a nightmare to interpret! Couldn't it just as well be "Children: 0" or "Children: []"? If the idea is to let non-technical people edit configuration files, the code reading the file will have to be very flexible and forgiving.

(Edit to add: maybe that's something you mostly get for free in REBOL? It would be a major headache in most other languages though)


Yup, the right reason is that `none` is free in Rebol/Red. Ren is basically nothing other than Rebol/Red syntax under a different name. I was one of the designers of Ren; it’s been a few years, but it’s cool to see it on HN.


Note that Ren-C (a fork of Rebol that is only tangentially related to Ren) uses `blank` or literal `_`.

The language concept is a bit more fundamental than config files. Rebol (and Ren-C and cousin Red) is a fully homoiconic language that itself has tools for parsing and interpreting DSLs ('Dialects' to use the language's nomenclature) that use the same language rules. In essence you do indeed get the lexer for free.

Would recommend a wee read of: http://blog.hostilefork.com/why-rebol-red-parse-cool/

I've argued that Ren is somewhat redundant, as it fills the same space that Rebol does, though Rebol lacks a formal specification at this time.


Which Rebol fills this space, R2 or R3? The idea was also not loadable Redbol compatibility (which would be nice of course), but general utility. So Ren adds new types and changes others, where it thinks that is best.


Valid points about the ambiguity in that example, but I do believe 'none' is easier to parse than 'null' for the average person. I'm assuming that none, like null, is meant to represent "no input was given", not "an input of zero/an empty list was given".


But in that case, wouldn't “unknown” be even better? I'm not a native English speaker, though, so I may miss some fine nuances in the meaning of “none”.


Disclaimer: these are just my personal views, so I may be missing something here.

'unknown' is almost like saying "We don't know what should be here", while 'none' is closer to saying "We didn't get passed any information here".

Ultimately, I feel like 'unknown' is more specific: it conveys intent beyond that of 'none'. For instance, it could imply we actually don't know what kind of data 'Opinions' should hold, rather than that there merely happens to be no data to put there.


We can throw out options for a long time, as there is no perfect choice. Ren went with `none` as it is standard Rebol and Rebol's designer thought about it long and hard.


As a native English speaker, I agree that "unknown" is better than "none".


I would hope that a native English speaker knows the difference between

Cookies in the jar: none

and

Cookies in the jar: unknown


Unknown is a nice word for this, yes. And while we don't focus exclusively on being terse in the Redbol world, "unknown" is a lot more work to type than "none", harder to scan visually as well.


Also, "Some | None" is a common functional programming idiom.


It's worth mentioning that Python refers to null as None.


Noted. Thanks.


I'm annoyed when a text format doesn't support comments. Especially a Human one :)

Sometimes you want to comment a section out of a JSON without deleting it. Other times you want to annotate some generated JSON.

And because this is a common need, you have ad-hoc unofficial solutions which are not supported by all parsers.


In the book Programming Pearls, one of the sections is on "provenance". One of the tips shared was from someone who put, at the header of every generated file, the command that was used to generate it. I absolutely love the idea and have wanted to do it in places where I generate files. JSON completely kills this, though. I don't want to encode the bloody command, just put a little comment "this file generated on DATE by COMMAND". Grr...


Ironically, that is exactly why JSON doesn't have comments: https://plus.google.com/+DouglasCrockfordEsq/posts/RK8qyGVaG...


Strictly, that is different. He was against parser directives. And offered a solution for the case I gave.

Quite frankly, I think this is an area he made the wrong choice on. Which is fine, but still annoying. Like checked exceptions. Logic for the choice was sound, if misguided. End result sucks.


I agree. I was sad to see comment support removed from the JSON spec.


In case anyone wonders - red-lang[0] and rebol[1] both use this notation.

[0] http://www.red-lang.org/ [1] http://www.rebol.com/



I wish this (and JSON) weren't ambiguous about number type (integers vs. floating point). These are very different types of data in my opinion.


A numeric tower could be useful. https://en.wikipedia.org/wiki/Numerical_tower


C R Q Z N: Complex, Real, Rational, Integer, Natural


This comment has generated a lot of interest, and rightly so. The thing to remember is that Ren (and JSON) are data exchange formats, not in-memory representations. If you need more control than their numeric syntax supports, metadata is your friend. Just include that where needed. That doesn't mean every endpoint will honor or treat them the same, of course.


"...are data exchange formats, not in-memory representations."

Real world values are still typed.

"If you need more control ... metadata is your friend."

aka implicit user typedefs. Explicit is better.


Explicit is different, not better.


They are different data types, but integer is also ambiguous: did you mean signed or unsigned, 32 or 64 bit? It's about your use case, and when you're parsing something with loose typing into stricter types, make sure you validate accordingly. But being able to specify the exact type kind of defeats the purpose of an interchangeable data format.


Not really; those int types differ in range, but while that matters for a schema format, it doesn't really for an interchange format.

OTOH, exact decimals (of which integers are a subset) differ fundamentally in meaning from limited precision binary (or decimal, though that's more rarely encountered) floating point approximations. Which matters in interchange as well as schema.


Yes, the specific type of integer and floating-point can vary, but how operators work and display formatting should be consistent. For example, Modulo should be integer-specific, and the concept of precision only applies to floating-point numbers.


> integer is also ambiguous, did you mean signed or unsigned, 32 or 64 bit

These are all integers. There only needs to be one integer type for all of: 0 6 -6 8000000000000000000000. Python gets this right.
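
As a small illustration of the Python behaviour mentioned here, the standard `json` module reads all four of those literals into the same `int` type and writes them back unchanged. This only shows CPython's stock module; other languages' parsers may behave differently.

```python
import json

text = "[0, 6, -6, 8000000000000000000000]"
values = json.loads(text)

# Every element is a plain int, including the one far beyond 64 bits.
print([type(v).__name__ for v in values])

# Serializing again reproduces the original digits exactly.
print(json.dumps(values) == text)
```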


Now 90%+ of the programming language communities have a bar to implement your specification, because they have to emit a weird type out of the deserializations that will break code in weird ways, and in the case of static languages, you may have created a deserialization format that always emits BigInts (several words of memory and possibly a mandatory indirection based on implementation) and will thus be unable to compete on the benchmarks against the serialization formats that specified integer sizes. And your serialization format is that much closer to withering on the vine with no usage.

Or you can go the JSON route, mumble your way through the numbers spec, let every language do its own thing, and tell the handful of people seriously interested in moving integers too large to be precisely specified by a 64-bit float to encode as strings or something and stop worrying everyone else with the complexity....


> in the case of static languages, you may have created a deserialization format that always emits BigInts

I don't follow this one. If your language, static or otherwise, is capable of turning this JSON:

    "100"
into a string, and this:

    100
into a number of whatever type, then it's also capable of turning this hypothetical input:

    100
into one type of integer, and this:

    100000000000000000000000000000000000000000000000000000000
into another type of integer.
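
A rough sketch of that dispatch in Python, using the `parse_int` hook of the standard `json` module. The `classify_int` helper and the "int64"/"bigint" labels are hypothetical stand-ins for whatever two integer types a static language would actually pick.

```python
import json

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def classify_int(literal):
    """Label an integer literal by whether it fits a signed 64-bit machine int."""
    value = int(literal)
    kind = "int64" if INT64_MIN <= value <= INT64_MAX else "bigint"
    return (kind, value)

document = "[100, 100000000000000000000000000000000000000000000000000000000]"
small, huge = json.loads(document, parse_int=classify_int)

print(small[0])  # label chosen for 100
print(huge[0])   # label chosen for the very long literal
```

The parser only needs to look at the literal's magnitude, so the common case can stay on the fast fixed-width path.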


Because JSON is a text format, nobody prevents you from encoding floats like this:

{ "int": 1, "float": 1.0 }

Of course it's up to the parser implementation to interpret 1.0 as a float.

The spec is very relaxed about the number type:

https://tools.ietf.org/html/rfc7159#section-6
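
For what it's worth, Python's standard `json` module is one parser that does make this distinction based on the literal's form. A minimal check, which says nothing about how other parsers behave:

```python
import json

doc = json.loads('{ "int": 1, "float": 1.0 }')

# The literal without a decimal point becomes int, the other becomes float.
print(type(doc["int"]).__name__, type(doc["float"]).__name__)

# The two values still compare equal; only the types differ.
print(doc["int"] == doc["float"])
```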


I disagree. For 99% of applications, a number is a number is a number. Integers and floating points are a leaky abstraction, I expect the language/compiler to handle these.


The ideal solution is to parse all numbers as a decimal or rational.

(one of my favorite things about Perl 6 is that non-integer literals are rationals by default; floating-point is only used if the literal is specified in scientific notation)
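
In Python this can be approximated with the `parse_float` and `parse_int` hooks of the standard `json` module, which receive the literal's text. The sample document below is made up for illustration.

```python
import json
from decimal import Decimal
from fractions import Fraction

text = '{"price": 0.10, "ratio": 0.75, "count": 3}'

# Every number becomes an exact decimal; trailing zeros are kept.
as_decimal = json.loads(text, parse_float=Decimal, parse_int=Decimal)

# Every number becomes an exact rational, reduced to lowest terms.
as_rational = json.loads(text, parse_float=Fraction, parse_int=Fraction)

print(as_decimal["price"])    # 0.10
print(as_rational["ratio"])   # 3/4
```

Neither representation passes through binary floating point, so no rounding happens at parse time.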


Try passing around 64bit integers in json. The majority of json implementations only implement double float numbers and will mangle your 64bit ints. The usual solution is passing int64s as strings, nasty.


Those implementations are as broken as a database that stores ZIP codes as integers.


Hit F12 on chrome and enter the following:

    JSON.parse('[9223372036854775805]')[0] === JSON.parse('[9223372036854775806]')[0]
You should get:

    <- true
Is chrome's implementation broken? Those are both numbers that a signed int64 can store precisely but a double float cannot.


Yes, the JavaScript JSON parser is broken (in design), in that it silently loses data/precision of JSON numbers. JSON numbers are not "int64s" or "double floats", they are arbitrary-precision decimal values. [1]

Postgres is the only popular system with built-in JSON functionality that I know of to correctly round-trip JSON numeric data. Python comes close but fails for any number with a decimal point or in E-notation.

One can argue that the JSON spec is too permissive (and I would disagree, though I'm somewhat a purist). Or a pedant could note that the JSON spec doesn't actually say whether any of the digits in a number are considered to carry information (and does in fact note that many parsers are faulty). But it's unfortunately true that most popular JSON parsers fail to round-trip valid JSON data due to flawed design.

[1] http://json.org/
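
A quick way to see the Python behaviour described above is to push a few literals through the standard `json` module with default settings and compare the text that comes back. `round_trip` is a hypothetical helper name.

```python
import json

def round_trip(text):
    """Parse a JSON document and serialize it again with default settings."""
    return json.dumps(json.loads(text))

# Integers of any size survive unchanged.
print(round_trip("12345678901234567890123"))

# A decimal point sends the value through float, so the trailing zero is lost.
print(round_trip("1.10"))   # 1.1

# E-notation also becomes a float and is rewritten on output.
print(round_trip("1E2"))    # 100.0
```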


> Or a pedant could note that the JSON spec doesn't actually say whether any of the digits in a number are considered to carry information

RFC 7159 not only specifies that all of the digits carry information, it specified exactly what information they carry.

However, it also expressly permits implementations to limit the range and precision of numbers accepted, recommending (but not requiring) range and precision at least equivalent to IEEE 754 float64 be supported.


> RFC 7159 not only specifies that all of the digits carry information, it specified exactly what information they carry.

Can you quote? I don't see where it does, except by reference to common knowledge.

But I should clarify. There are several inefficiencies in JSON numeric representation which may or may not be considered significant by an application. The ones I can think of, in order from "obviously not" to "well, maybe":

1. the case of "e" vs. "E"

2. the optional "+" sign after "e" or "E"

3. presence/absence of decimal point and/or E-notation (shouldn't make a difference, but does in many parsers, such as Python's)

4. the value of the exponent itself (e.g. 3.14 vs. 314e-2)

5. "-" sign in front of any number with a value of 0 (IEEE floats and ones-complement integers do have a negative zero)

6. excess trailing 0 digits after the decimal place (may be used to represent significant figures in scientific applications)

7. digits of lesser significance (obviously the most contentious)

JavaScript ignores all but #5. Python ignores all but #3, and due to #3, sometimes #5 and #7. PostgreSQL `json` ignores none; `jsonb` ignores all but #6 and #7. Personally I would draw the line between #4 and #5. But neither the RFC nor the ECMA spec tell us.
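
Items 4 and 5 are easy to probe in Python. The sketch below only reflects the stock CPython `json` module with default settings, not any guarantee from the specs.

```python
import json
import math

# Item 4: the exponent's value is not preserved; both spellings give one float.
same_value = json.loads("3.14") == json.loads("314e-2")

# Item 5: "-0" is read as an int, which has no negative zero, so the sign is lost.
zero_int = json.loads("-0")

# With a decimal point (item 3) the value is a float, and the sign survives.
zero_float = json.loads("-0.0")

print(same_value, repr(zero_int), repr(zero_float))
print(math.copysign(1.0, zero_float))  # -1.0
```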


One of the multiple JSON specs indicates that implementations are free to introduce limits on precision or range of numbers: https://tools.ietf.org/html/rfc7159#section-3.


I concede that "broken" is too strong a word. "Subpar" would be more accurate. Like a 7-bit MIME gateway in an 8-bit world.


No, JavaScript simply has inadequate numeric support.

Chrome’s implementation is fine (within spec) and makes the natural choice for a general purpose JSON implementation for JS.


Yes, I think everyone is in agreement that Javascript is broken.


That's not a JSON problem, that's a problem with whatever language you're using to consume the JSON. Back to my point, you as a developer shouldn't have to worry about this at all. You should be able to add numbers to JSON and retrieve them without it harassing you. To a human, 123, 0.123, 12345789347590837452987543758973459734 are all the same "type". Why is it that you can save your int64s as strings but not as numbers? It's a stupid leak on the part of the language if you have to use this workaround.


> Why is it that you can save your int64s as strings but not as numbers? It's a stupid leak on the part of the language if you have to use this workaround.

Javascript, for example, has this problem because everything is just a "number" and the number data type is an IEEE-754 double precision float. A double precision float can't represent all int64 values, so transmitting an int64 via JSON to Javascript and using the standard JSON parser is lossy. You could say that Javascript screwed up by making number have a specific precision, and you might be right. Python, for example, gives you seamless arbitrary precision arithmetic.

But the decision to use an arbitrary precision type for all numbers in every language is definitely not appropriate. Arbitrary precision has trade offs, and sometimes it's important to work with machine integers.
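
The representability limit can be checked directly. In this sketch Python's `float` stands in for the IEEE-754 double that Javascript uses for every number, and `is_exact_in_double` is a hypothetical helper.

```python
def is_exact_in_double(n):
    """True if the integer n survives a round trip through a 64-bit float."""
    return int(float(n)) == n

# Doubles carry a 53-bit significand, so 2**53 + 1 is the first integer lost.
print(is_exact_in_double(2**53))        # True
print(is_exact_in_double(2**53 + 1))    # False

# Near the top of the int64 range, neighbouring integers collapse together.
print(float(9223372036854775805) == float(9223372036854775806))  # True
```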


The correct solution is to only convert a JSON value to an integer (or string, or array, or dictionary/object) when the application explicitly requests it. 99% of the time, 64-bit integers are not used by a JS app directly, but rather are passed right back to the backend they came from. If they were never converted to JS numbers in the first place, there can be no precision loss.

"Deep embedding" of JSON data into the host language is the cause of this issue. Recognition that JSON is a separate data type and does not always have an obvious encoding in the native language (including JSON arrays, objects, and null, even if they look like native arrays, objects, and null when you squint at them) is key.

The only widespread JSON implementation I know of to get this right is Postgres's: JSON values live as the "json" or "jsonb" type until you explicitly convert them. No data loss is incurred, with the exception that "jsonb" normalizes E-notation, conflates 0 and -0, drops duplicate object keys, and normalizes their order. Even numeric significant 0 digits are preserved.
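
Here is one way the "convert only on request" idea might look in Python. It is not how Postgres does it. `RawNumber` and `load_lazy` are hypothetical names, and the sketch relies on the standard `json` hooks being handed each number's original text.

```python
import json
from decimal import Decimal

class RawNumber:
    """Holds a JSON number exactly as written; converts only when asked."""

    def __init__(self, text):
        self.text = text

    def as_int(self):
        return int(self.text)

    def as_decimal(self):
        return Decimal(self.text)

    def __repr__(self):
        return f"RawNumber({self.text!r})"

def load_lazy(document):
    # Both hooks receive the literal's text, so nothing is rounded up front.
    return json.loads(document, parse_int=RawNumber, parse_float=RawNumber)

doc = load_lazy('{"id": 9223372036854775806, "score": 1.50e3}')
print(doc["id"].text)             # digits exactly as they arrived
print(doc["score"].as_decimal())  # converted only because we asked
```

An application that merely passes `id` back to the backend can send `doc["id"].text` and never touch a numeric type.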


I'd put it at more like 50% of applications.


It looks fairly similar to EDN used by Clojure too. Similar, but not, of course, the same.

https://learnxinyminutes.com/docs/edn/


Some similarities, but different in that EDN has a tagging approach for some types, and a more limited set of literal value formats.


It's a shame that EDN never got over the hump in terms of adoption outside of the Clojure community. At least that's my impression.


The syntax itself reminds me a lot of Lisp (Scheme, specifically) but this is probably on purpose. I wonder if it can be used on Lisps the same way JSON is used in JS.


Check out the Red and Rebol languages:

http://red-lang.org/

http://red.github.io/

http://rebol.com/

I think the first version of the Rebol interpreter was written in Scheme ;)


A different kind of pedantry: which is it, 仁 or 人 ? Both appear on the page and the former seems more fitting.


When I saw the headline my first reaction was to think "hah I bet it's called Ren because it's a data-exchange format for people", then I saw 仁 and thought - ah that's a much better choice.


Thanks for the input. The ideograms were found based on the Ren name and meaning, but I'm not a native Chinese speaker, so will defer to those who know, and make things clear.


We agreed on 人, not sure where 仁 came from.


> not sure where 仁 came from

It means humaneness, benevolence, kindness, and is pronounced exactly the same as 人 which means person/people.


Without a clear choice (as an English speaker) as to which might be more exact in translation, I think I pulled 仁 in because it also looks like the opening of a block, which is fundamental in Ren.


Easiest way to understand the notation would be to see some examples, but the "Test files" link leads to nowhere.


Here's a link to another example - https://github.com/humanistic/REN


The first thing on a small screen is an example. On larger screens it still appears at the top, but floating to the right of the "Goals" section.


When I click into the resources, the links all open in blank pages?


Apologies for that. A lot were just for fun/future ideas. Many links to existing resources on other formats do work.


Not for me.


The links in the first column under References all have "http://" as their destination. The links in the other columns seem to be fine.

Also, the links under Implementations all result in a 404 page.


It's cute that the spec believes money is denoted by "$".


I'm confused.. why is this cute?

Of course "$" is used as the symbol for several currencies, but it's not clear if you have a better symbol in mind for the generic concept of money.

https://en.wikipedia.org/wiki/$_(disambiguation)


Sarcastically cute. It appears very parochial (in the sense of "limited or narrow outlook") to think that having a single "currency type" is useful, or that using a "$" to denote it is appropriate. While "$" is used by at least one important world currency, I am pretty sure that more people use ¥ than $ by a long shot, and users of € are about as plentiful as people who spend in USD.

(The idea that in Rebol one might write EUR$1.00 to denote the value that would usually be written 1.00€ is also pretty horrible)


@jepler,

1) Do you agree that a notation should support currency values? That is, they are useful to identify in data, as atomic units?

2) If so, do you agree there should be a single standard symbol and lexical form used to identify them? Because if we don't do that, we have to support every localized notation, correct?

3) If you agree to both of those questions, what symbol do you suggest? ¤ is generic, but not on any keyboard layout I know of. Also, see Chris's note about ASCII priority.


No, I can honestly say I've never worked on a software project that dealt directly with currency values. Personally, in the kinds of projects I do, explicit support for units of measurement would be more beneficial: you'd like to use the type system to detect where units from different systems are mixed (e.g., km+mi) and behave appropriately by performing a conversion; or to detect where inappropriate units are mixed (e.g., kg+hz) and signal an error (at project build time if possible!)

With that background in mind, I imagine the scenario here you have two data in a Ren document which are both the literal $1.00, but one is actually 1.00USD and the other is actually 1.00EUR: it doesn't prevent errors (for instance, when you want to perform an operation like + on the two data), because you still don't know what the data means. You have gained very little over just using the literal 1.00 instead.

So if I were making a proposal I'd be tempted to suggest a syntax like [1.00 USD], and maybe even giving up one of the remaining sigils ^[1.00 USD] if it is important to raise to being a special element in the syntax of a Ren file. Now that you're saying what you mean, you can use the same syntax for all units: ^[1.00 kg m -2] (1 kilogram per square meter), ^[1.00 V hz -.5] (1 volt per square-root-hertz, a typical units specification of noise in opamps).
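
A minimal Python sketch of what carrying the unit alongside the number buys you. The `Quantity` class and `read_quantity` helper are hypothetical, and only the simple two-part `[1.00 USD]` form is handled, not compound units like the kg m -2 example.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Quantity:
    amount: Decimal
    unit: str

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.unit != other.unit:
            # Mixing units is an error here; a fuller version might convert.
            raise ValueError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.amount + other.amount, self.unit)

def read_quantity(block):
    """Parse a hypothetical '[1.00 USD]' block into a Quantity."""
    amount, unit = block.strip("[] ").split()
    return Quantity(Decimal(amount), unit)

print(read_quantity("[1.00 USD]") + read_quantity("[2.50 USD]"))
```

Adding `[1.00 USD]` to `[1.00 EUR]` raises instead of silently producing 2.00, which is the error a bare `$1.00` cannot catch.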


No need for special syntax. ^ is already the escape character. Just use blocks. Though a `unit` syntax has been brought up. The notation would use path syntax, but start with a number instead of a word. Frink, of course, is the king of languages in this regard.

And while this may be more beneficial in your work, a lot of software does have to deal with money, where it's important not to use floating point, but BCD or something else.
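
The usual illustration of why binary floating point is a poor fit for money, shown in Python with its `decimal` module standing in for "BCD or something else":

```python
from decimal import Decimal

# Binary floats cannot represent 0.10 or 0.20 exactly, so the sum drifts.
print(0.10 + 0.20 == 0.30)   # False

# Decimal arithmetic works on the digits as written.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))   # True
```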


> ¤ is generic, but not on any keyboard layout I know of.

Finnish (/ Swedish) keyboards have ¤ on `<Shift> 4` ($ is `<Alt Gr> 4`).


Thanks! Good to know.


The use of $ is far larger than the US.

US: 325,365,189

Canada: 35,151,728

Taiwan: 23,550,077

Australia: 24,688,400

Ecuador: 16,385,068

Hong Kong: 7,374,900

El Salvador: 6,344,722

Singapore: 5,607,300

New Zealand: 4,826,660

Liberia: 4,503,000

Jamaica: 2,881,355

Namibia: 2,113,077

East Timor: 1,167,242

Belize: 387,879

Micronesia: 104,937

Marshall Islands: 53,066

Palau: 21,503

Caribbean Netherlands: 25,019

460,551,122 in total. Which is still less than 50% of China's 1.4B


Yes, I tried to be careful in what I stated, but here's what I meant: "$" is the everyday currency symbol of far fewer people than "¥". "USD" is the everyday currency of only about as many people as "EUR"


Yeah well done, when I first looked at the list of countries I figured they might add up to ~1B, but they didn't even come close!

It's easy to forget just how much bigger China is.


What do you suggest? ¤ is really the only other option, but $ seems much more universal, if you can choose only one.


For the most part, Ren is delimited by ASCII characters. Rebol permitted a three-letter code to denote currency: USD$10 GBP$15
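
As a rough sketch, a loader in another language could recognize that form with a small pattern. The regular expression below is an assumption inferred only from the two examples above and from the plain `$1.00` form mentioned elsewhere in the thread; it is not taken from any Rebol or Ren grammar, and `read_money` is a hypothetical helper.

```python
import re
from decimal import Decimal

# Optional three-letter code, a dollar sign, then digits with optional fraction.
MONEY = re.compile(r"^([A-Z]{3})?\$(\d+(?:\.\d+)?)$")

def read_money(token):
    """Return (currency_code_or_None, exact_amount) for a money-like token."""
    match = MONEY.match(token)
    if match is None:
        raise ValueError(f"not a money literal: {token!r}")
    code, amount = match.groups()
    return (code, Decimal(amount))

print(read_money("USD$10"))
print(read_money("GBP$15"))
print(read_money("$1.00"))
```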


In another context it might be cute; with the weeaboo name it's rather horrifying.


Another go-round, trying to solve the same problem as TOML.


Not so. From the TOML page: "TOML aims to be a minimal configuration file format."

Ren is intended to be a general purpose data exchange format.


Welp, I'm wrong. So it goes. Better reading in the future. :-/

I'll just leave my comment above to not destroy the comment chain.



