Author of the Ren page being discussed here. Didn't expect to see it on HN. :^)
The Ren site (ren-data.org) redirects there, is quite old now, and was a playground and experimental area. Hence the empty links and such.
In addition to Bolek's Humanistic repo, I had set up https://github.com/Ren-data/Ren to discuss ideas. Ren is, effectively, Redbol (Red+Rebol). One of the initial goals was to define a subset of values and normalize the syntax (Rebol never formalized its format spec), which could be shared across Redbol langs as they evolved and went in different directions. And also as a bridge for loaders in other languages. JSON has taught us that a small spec is important. The balance between simplicity and expressive value types is key.
It may come back to life at some point, but my time was better spent elsewhere for a while. I'm focused on Red now (red-lang.org). It has a native bridge feature for embedding in other langs (https://doc.red-lang.org/en/libred.html), along with a lot more. I believe there is still value in formalizing the grammar so others can create their own implementations, but it's not a priority at this time. In the meantime, you can find the active Red community at https://gitter.im/red/red to get more information and examples of what it looks like in use.
For base types, rather than a limited set of specific data types, how about having a standard way to indicate a type and its data, like "number:12" and "rational:12/17" and "string:foo" and "date:2017-10-20" and "bitmap-hex:cafebabea2b9c0..."? Then let the reader parse it however they want, given their desired interpretation of the type tag.
There could be a base set of standard types and, ideally, an organizational process for adding more standard types -- like MIME types. MIME types might also be considered included by default, perhaps like "application/json; charset=utf-8:[1, 2, 3]".
It takes more characters to include a type for each primitive, but it is more extensible. These files will likely be generated and read by code anyway, with humans just looking at them now and then for debugging.
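As a rough sketch (in Python) of what a reader might do with such tags -- the tag names and the split-on-the-first-colon rule here are just my assumptions for illustration, not any kind of spec:

    from fractions import Fraction
    from datetime import date

    # map each type tag to a reader; unknown tags could fall back to raw strings
    READERS = {
        "number":     int,
        "rational":   Fraction,             # "12/17" -> Fraction(12, 17)
        "string":     str,
        "date":       date.fromisoformat,   # "2017-10-20" -> date(2017, 10, 20)
        "bitmap-hex": bytes.fromhex,
    }

    def read_tagged(token):
        tag, _, payload = token.partition(":")
        return READERS[tag](payload)

    print(read_tagged("rational:12/17"))       # 12/17
    print(read_tagged("date:2017-10-20"))      # 2017-10-20
    print(read_tagged("bitmap-hex:cafebabe"))  # b'\xca\xfe\xba\xbe'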
If that string version seems too cluttered, another option is something like: rational:12/17 and string:foo and string:"this has spaces".
Or given possible confusion with maps and colons, another option is rational/12/17 and string/foo and string/"this has spaces" and complex/−1+3i and real:30.564 and "application/json; charset=utf-8"/[1,2,3] and maybe even xml/<foo>bar\ baz</foo> and javascript/console.log("hello") and so on.
Or maybe pipes? Like: rational|12/17 and string|foo and string|"this has spaces" and complex|−1+3i and real|30.564 and "application/json; charset=utf-8"|[1,2,3] and maybe even xml|<foo>bar\ baz</foo> and javascript|console.log("hello") and so on.
That would be a very different format. A big part of the format is that the lexical forms allow you to write things as you normally would, to another human. That does impose limits, but is part of the fundamental design.
The very first entry in the FAQ ("Why is none used instead of null?") isn't very convincing! It says null "isn't friendly to normal people who might be given configuration or message files to edit," and gives the example:
    Children: none
    Opinions: none
That certainly looks pleasant and human-readable, but a bit of a nightmare to interpret! Couldn't it just as well be "Children: 0" or "Children: []"? If the idea is to let non-technical people edit configuration files, the code reading the file will have to be very flexible and forgiving.
(Edit to add: maybe that's something you mostly get for free in REBOL? It would be a major headache in most other languages though)
Yup, the right reason is that `none` is free in Rebol/Red. Ren is basically nothing other than Rebol/Red syntax under a different name. I was one of the designers of Ren; it's been a few years, but it's cool to see it on HN.
Note that Ren-C (a fork of Rebol that is only tangentially related to Ren) uses `blank` or literal `_`.
The language concept is a bit more fundamental than config files. Rebol (and Ren-C and cousin Red) is a fully homoiconic language that itself has tools for parsing and interpreting DSLs ('Dialects' to use the language's nomenclature) that use the same language rules. In essence you do indeed get the lexer for free.
Which Rebol fills this space, R2 or R3? The goal was also not just loadable Redbol compatibility (which would be nice, of course), but general utility. So Ren adds new types and changes others where it thinks that is best.
Valid points about the ambiguity in that example, but I do believe 'none' is easier to parse than 'null' for the average person. I'm assuming that none, like null, is meant to represent "no input was given", not "an input of zero/an empty list was given".
But in that case, wouldn't "unknown" be even better? I'm not a native English speaker, though, so I may miss some fine nuances in the meaning of "none".
Disclaimer: this is just my personal views, so I may be missing something here.
'unknown' is almost like saying "We don't know what should be here", while 'none' is closer to saying "We didn't get passed any information here".
Ultimately, I feel like 'unknown' is more specific; it conveys intent beyond that of 'none'. For example, it could imply we actually don't know what kind of data 'Opinions' should hold, rather than that there merely happens to be no data to put there.
We can throw out options for a long time, as there is no perfect choice. Ren went with `none` as it is standard Rebol and Rebol's designer thought about it long and hard.
Unknown is a nice word for this, yes. And while we don't focus exclusively on being terse in the Redbol world, "unknown" is a lot more work to type than "none", and harder to scan visually as well.
In the book Programming Pearls, one of the sections is on "provenance". One of the tips shared was from someone who put, at the top of every generated file, the command that was used to generate it. I absolutely love the idea and have wanted to do it in places where I generate files. JSON completely kills this, though. I don't want to encode the bloody command, just put a little comment "this file generated on DATE by COMMAND". Grr...
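One pragmatic workaround (sketched below in Python; this is not standard JSON, just a pre-processing step I'm assuming on the reader's side) is to put the provenance in leading "//" lines and strip them before parsing:

    import json

    def loads_with_header(text):
        """Parse JSON preceded by '//' comment lines (e.g. a provenance
        header). Not valid JSON per the spec -- the header lines are
        stripped before handing the rest to json.loads."""
        body = [ln for ln in text.splitlines()
                if not ln.lstrip().startswith("//")]
        return json.loads("\n".join(body))

    doc = '// this file generated on DATE by COMMAND\n{"children": 0, "opinions": []}'
    print(loads_with_header(doc))  # {'children': 0, 'opinions': []}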
Strictly, that is different. He was against parser directives. And offered a solution for the case I gave.
Quite frankly, I think this is an area he made the wrong choice on. Which is fine, but still annoying. Like checked exceptions. Logic for the choice was sound, if misguided. End result sucks.
This comment has generated a lot of interest, and rightly so. The thing to remember is that Ren (and JSON) are data exchange formats, not in-memory representations. If you need more control than their numeric syntax supports, metadata is your friend. Just include that where needed. That doesn't mean every endpoint will honor or treat them the same, of course.
They are different data types, but integer is also ambiguous: did you mean signed or unsigned, 32-bit or 64-bit?
It's about your use case, and when you're parsing something with loose typing into stricter types, make sure you validate accordingly. But being able to specify the exact type kind of defeats the purpose of an interchange format.
Not really; those int types differ in range, but while that matters for a schema format, it doesn't really for an interchange format.
OTOH, exact decimals (of which integers are a subset) differ fundamentally in meaning from limited precision binary (or decimal, though that's more rarely encountered) floating point approximations. Which matters in interchange as well as schema.
Yes, the specific type of integer and floating-point can vary, but how operators work and display formatting should be consistent. For example, Modulo should be integer-specific, and the concept of precision only applies to floating-point numbers.
Now 90%+ of programming language communities face a higher bar to implement your specification, because they have to emit a weird type from deserialization that will break code in weird ways. And in the case of static languages, you may have created a deserialization format that always emits BigInts (several words of memory and possibly a mandatory indirection, depending on implementation) and will thus be unable to compete in benchmarks against serialization formats that specified integer sizes. And your serialization format is that much closer to withering on the vine with no usage.
Or you can go the JSON route: mumble your way through the numbers spec, let every language do its own thing, and tell the handful of people seriously interested in moving integers too large to be precisely represented by a 64-bit float to encode them as strings or something, and stop worrying everyone else with the complexity...
I disagree. For 99% of applications, a number is a number is a number. Integers and floating points are a leaky abstraction; I expect the language/compiler to handle these.
The ideal solution is to parse all numbers as a decimal or rational.
(one of my favorite things about Perl 6 is that non-integer literals are rationals by default; floating-point is only used if the literal is specified in scientific notation)
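In Python, at least, that choice is already a keyword argument away in the stdlib json module (a small sketch; Decimal and Fraction are standard-library types):

    import json
    from decimal import Decimal
    from fractions import Fraction

    text = '{"price": 19.99, "ratio": 0.1}'

    # parse every non-integer literal as an exact decimal...
    print(json.loads(text, parse_float=Decimal))
    # {'price': Decimal('19.99'), 'ratio': Decimal('0.1')}

    # ...or as a rational, in the Perl 6 spirit mentioned above
    print(json.loads(text, parse_float=Fraction)["ratio"])  # 1/10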
Try passing around 64-bit integers in JSON. The majority of JSON implementations only implement double-precision float numbers and will mangle your 64-bit ints. The usual solution is passing int64s as strings, which is nasty.
Yes, the JavaScript JSON parser is broken by design, in that it silently loses data/precision of JSON numbers. JSON numbers are not "int64s" or "double floats"; they are arbitrary-precision decimal values. [1]
Postgres is the only popular system with built-in JSON functionality that I know of to correctly round-trip JSON numeric data. Python comes close but fails for any number with a decimal point or in E-notation.
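A quick check of those claims against CPython's stdlib json module (my own spot test, nothing authoritative):

    import json

    big = 2**63 - 1                            # largest int64
    assert json.loads(json.dumps(big)) == big  # ints survive: Python ints are arbitrary precision

    print(json.dumps(json.loads("1.10")))      # '1.1'   -- trailing zero lost once it becomes a float
    print(json.dumps(json.loads("1e2")))       # '100.0' -- E-notation not preserved either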
One can argue that the JSON spec is too permissive (and I would disagree, though I'm somewhat a purist). Or a pedant could note that the JSON spec doesn't actually say whether any of the digits in a number are considered to carry information (and does in fact note that many parsers are faulty). But it's unfortunately true that most popular JSON parsers fail to round-trip valid JSON data due to flawed design.
> Or a pedant could note that the JSON spec doesn't actually say whether any of the digits in a number are considered to carry information
RFC 7159 not only specifies that all of the digits carry information, it specifies exactly what information they carry.
However, it also expressly permits implementations to limit the range and precision of numbers accepted, recommending (but not requiring) range and precision at least equivalent to IEEE 754 float64 be supported.
> RFC 7159 not only specifies that all of the digits carry information, it specifies exactly what information they carry.
Can you quote? I don't see where it does, except by reference to common knowledge.
But I should clarify. There are several inefficiencies in JSON numeric representation which may or may not be considered significant by an application. The ones I can think of, in order from "obviously not" to "well, maybe":
1. the case of "e" vs. "E"
2. the optional "+" sign after "e" or "E"
3. presence/absence of decimal point and/or E-notation (shouldn't make a difference, but does in many parsers, such as Python's)
4. the value of the exponent itself (e.g. 3.14 vs. 314e-2)
5. "-" sign in front of any number with a value of 0 (IEEE floats and ones-complement integers do have a negative zero)
6. excess trailing 0 digits after the decimal place (may be used to represent significant figures in scientific applications)
7. digits of lesser significance (obviously the most contentious)
JavaScript ignores all but #5. Python ignores all but #3, and due to #3, sometimes #5 and #7. PostgreSQL `json` ignores none; `jsonb` ignores all but #6 and #7. Personally I would draw the line between #4 and #5. But neither the RFC nor the ECMA spec tell us.
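For the Python case, here is roughly what that looks like with the default parser (a spot check on CPython, not a spec claim):

    import json

    print(json.dumps(json.loads("314e-2")))  # '3.14'  -- #4 (exponent form) is gone
    print(json.dumps(json.loads("1.500")))   # '1.5'   -- #6 (trailing zeros) is gone
    print(json.dumps(json.loads("-0")))      # '0'     -- #5 lost for an integer zero...
    print(json.dumps(json.loads("-0.0")))    # '-0.0'  -- ...but kept for a float zero
    print(type(json.loads("1")), type(json.loads("1.0")))  # #3 (int vs float) is what it does keep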
That's not a JSON problem, that's a problem with whatever language you're using to consume the JSON. Back to my point, you as a developer shouldn't have to worry about this at all. You should be able to add numbers to JSON and retrieve them without it harassing you. To a human, 123, 0.123, 12345789347590837452987543758973459734 are all the same "type". Why is it that you can save your int64s as strings but not as numbers? It's a stupid leak on the part of the language if you have to use this workaround.
> Why is it that you can save your int64s as strings but not as numbers? It's a stupid leak on the part of the language if you have to use this workaround.
JavaScript, for example, has this problem because everything is just a "number" and the number data type is an IEEE-754 double-precision float. A double-precision float can't represent all int64 values, so transmitting an int64 via JSON to JavaScript and using the standard JSON parser is lossy. You could say that JavaScript screwed up by making number have a specific precision, and you might be right. Python, for example, gives you seamless arbitrary-precision arithmetic.
But the decision to use an arbitrary-precision type for all numbers in every language is definitely not appropriate. Arbitrary precision has trade-offs, and sometimes it's important to work with machine integers.
The correct solution is to only convert a JSON value to an integer (or string, or array, or dictionary/object) when the application explicitly requests it. 99% of the time, 64-bit integers are not used by a JS app directly, but rather are passed right back to the backend they came from. If they were never converted to JS numbers in the first place, there can be no precision loss.
"Deep embedding" of JSON data into the host language is the cause of this issue. Recognition that JSON is a separate data type and does not always have an obvious encoding in the native language (including JSON arrays, objects, and null, even if they look like native arrays, objects, and null when you squint at them) is key.
The only widespread JSON implementation I know of to get this right is Postgres's: JSON values live as the "json" or "jsonb" type until you explicitly convert them. No data loss is incurred, with the exception that "jsonb" normalizes E-notation, conflates 0 and -0, drops duplicate object keys, and normalizes their order. Even numeric significant 0 digits are preserved.
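For what it's worth, Python can approximate that "don't convert until asked" behavior with its parse hooks (a sketch of the idea, not what Postgres does internally): keep every numeric literal as its original lexeme and convert only at the point of use.

    import json

    # parse_int/parse_float receive the raw digit string, so `str` keeps it verbatim
    raw = json.loads('{"id": 9007199254740993, "price": 1.10}',
                     parse_int=str, parse_float=str)
    print(raw)             # {'id': '9007199254740993', 'price': '1.10'} -- nothing lost
    print(int(raw["id"]))  # 9007199254740993 -- converted only where actually needed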
The syntax itself reminds me a lot of Lisp (Scheme, specifically) but this is probably on purpose. I wonder if it can be used on Lisps the same way JSON is used in JS.
When I saw the headline my first reaction was to think "hah I bet it's called Ren because it's a data-exchange format for people", then I saw 仁 and thought - ah that's a much better choice.
Thanks for the input. The ideograms were found based on the Ren name and meaning, but I'm not a native Chinese speaker, so I will defer to those who know, and make things clear.
Without a clear choice (as an English speaker) as to which might be more exact in translation, I think I pulled 仁 in because it also looks like the opening of a block, which is fundamental in Ren.
Sarcastically cute. It appears very parochial (in the sense of "limited or narrow outlook") to think that having a single "currency type" is useful, or that using a "$" to denote it is appropriate. While "$" is used by at least one important world currency, I am pretty sure that more people use ¥ than $ by a long shot, and users of € are about as plentiful as people who spend in USD.
(The idea that in Rebol one might write EUR$1.00 to denote the value that would usually be written 1.00€ is also pretty horrible)
1) Do you agree that a notation should support currency values? That is, they are useful to identify in data, as atomic units?
2) If so, do you agree there should be a single standard symbol and lexical form used to identify them? Because if we don't do that, we have to support every localized notation, correct?
3) If you agree to both of those questions, what symbol do you suggest? ¤ is generic, but not on any keyboard layout I know of. Also, see Chris's note about ASCII priority.
No, I can honestly say I've never worked on a software project that dealt directly with currency values. Personally, in the kinds of projects I do, explicit support for units of measurement would be more beneficial: you'd like to use the type system to detect where units from different systems are mixed (e.g., km+mi) and behave appropriately by performing a conversion; or to detect where inappropriate units are mixed (e.g., kg+hz) and signal an error (at project build time if possible!)
With that background in mind, imagine the scenario where you have two data values in a Ren document which are both the literal $1.00, but one is actually 1.00 USD and the other is actually 1.00 EUR: it doesn't prevent errors (for instance, when you want to perform an operation like + on the two values), because you still don't know what the data means. You have gained very little over just using the literal 1.00 instead.
So if I were making a proposal I'd be tempted to suggest a syntax like [1.00 USD], and maybe even giving up one of the remaining sigils, ^[1.00 USD], if it is important to raise it to a special element in the syntax of a Ren file. Now that you're saying what you mean, you can use the same syntax for all units: ^[1.00 kg m -2] (1 kilogram per square meter), ^[1.00 V hz -.5] (1 volt per square-root-hertz, a typical units specification of noise in op-amps).
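To make the benefit concrete, here is a tiny sketch (hypothetical names, and an assumed mi-to-km factor) of what a reader could do once values carry their units:

    from dataclasses import dataclass

    CONVERSIONS = {("mi", "km"): 1.609344}  # known conversions; illustrative only

    @dataclass(frozen=True)
    class Quantity:
        value: float
        unit: str

        def __add__(self, other):
            if self.unit == other.unit:
                return Quantity(self.value + other.value, self.unit)
            factor = CONVERSIONS.get((other.unit, self.unit))
            if factor is not None:          # km + mi: convert, then add
                return Quantity(self.value + other.value * factor, self.unit)
            raise TypeError(f"cannot add {self.unit} and {other.unit}")  # kg + hz: error

    print(Quantity(1.0, "km") + Quantity(1.0, "mi"))  # Quantity(value=2.609344, unit='km')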
No need for special syntax. ^ is already the escape character. Just use blocks. Though a `unit` syntax has been brought up. The notation would use path syntax, but start with a number instead of a word. Frink, of course, is the king of languages in this regard.
And while this may be more beneficial in your work, a lot of software does have to deal with money, where it's important not to use floating point, but BCD or something else.
Yes, I tried to be careful in what I stated, but here's what I meant: "$" is the everyday currency symbol of far fewer people than "¥". "USD" is the everyday currency of only about as many people as "EUR"