Adds std.parseYaml #339

groodt · 2019-11-24T08:33:50Z

Adds std.parseYaml to address the YAML aspect of: google/jsonnet#460

CPP jsonnet implemented here: google/jsonnet#888

googlebot · 2019-11-24T08:33:54Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the Google project maintainer to go/cla#troubleshoot (Public version).
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

ℹ️ Googlers: Go here for more info.

coveralls · 2019-11-24T08:39:01Z

Coverage decreased (-0.2%) to 77.818% when pulling f4d034e on groodt:groodt-parseYaml into 0959f85 on google:master.

coveralls · 2019-11-24T08:39:01Z

Coverage decreased (-0.06%) to 81.715% when pulling 76fd830 on groodt:groodt-parseYaml into 7d81091 on google:master.

groodt · 2019-11-24T08:53:33Z

@googlebot I signed it!

googlebot · 2019-11-24T08:53:36Z

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

sbarzowski · 2019-11-24T09:55:38Z

Thanks!

> I imagine this pulls in too many dependencies

It certainly seems so. Probably using go-yaml directly instead of k8s.io/apimachinery/pkg/util/yaml would help - which seems to be a wrapper around a wrapper (https://github.com./kubernetes-sigs/yaml) around go-yaml. We may also look into using https://github.com./kubernetes-sigs/yaml if that makes sense. But generally every additional level of indirection increases the chance of running into some unexpected magic.

> I think the cpp-jsonnet implementation might be tricky. Would you vendor https://github.com./jbeder/yaml-cpp?

It's certainly harder than on the Go side. There are a few libraries for C++ available and yaml-cpp seems to be a pretty reasonable choice. But I have some reservations about vendoring the whole thing (as in, putting the whole thing in our source tree). And I also have reservations about requiring the existing users to provide a shared library dependency. I'm leaning towards a submodule and an option for CMake "USE_SYSTEM_YAML_CPP". It will require some discussion for sure.

groodt · 2019-11-30T03:19:38Z

@sbarzowski I've switched to https://github.com./kubernetes-sigs/yaml. It brings fewer dependencies along as you suggested.

How would you like the code structured? Into a yaml.go file?

sparkprime · 2019-12-03T14:18:20Z

Do we know if either of the two proposed implementations is fully compliant with the YAML standard?

sbarzowski · 2019-12-03T15:16:47Z

@sparkprime I don't think there exists such a thing as a "fully compliant YAML implementation". The underlying library in any case is https://github.com./go-yaml/yaml, which is the primary implementation in Go, and which in turn should be reasonably compatible with libyaml.

The additional layers here are not about parsing YAML per se, but mapping between JSON and YAML (see: https://github.com./kubernetes-sigs/yaml/blob/master/yaml.go#L166). I am not aware of any standard for that other than common sense. We may want to standardize it on our side, though.

sbarzowski · 2019-12-03T15:19:37Z

> How would you like the code structured? Into a yaml.go file?

Hmmm... sounds good. The builtinParseYAML function should stay in builtins and the helpers could go into yaml.go. We may consider moving other YAML-handling stuff (e.g. serialization) there too.

sbarzowski · 2019-12-03T15:24:50Z

Re failing CI:

I think you can fix Bazel by running the command from here: https://github.com./google/go-jsonnet#keeping-the-bazel-files-up-to-date
And as for Go 1.8, I think it's fine to drop support for it. It's been unsupported for a long time now, and I almost did that once already for a different reason.

sh0rez · 2019-12-03T15:42:53Z

A good library for decoding yaml using the json unmarshaller (this is what kubernetes does) is https://github.com./ghodss/yaml

sbarzowski · 2019-12-03T16:03:13Z

If I understand the code correctly, Kubernetes actually uses https://github.com./kubernetes-sigs/yaml (same as we have in this change) which is a fork of https://github.com./ghodss/yaml

groodt · 2019-12-03T21:05:37Z

> If I understand the code correctly, Kubernetes actually uses https://github.com./kubernetes-sigs/yaml (same as we have in this change) which is a fork of https://github.com./ghodss/yaml

This is correct I believe. Essentially, behind all the indirection and "Readers", the only useful piece of functionality in use here is this:
https://github.com./kubernetes-sigs/yaml/blob/master/yaml.go#L142

Perhaps a future refactoring could even be implemented directly with go-yaml. kubernetes-sigs/yaml doesn't bring much baggage with it, so it may not be worth it. Essentially any solution needs to read a YAML file as bytes or string, optionally split at the YAML stream separator (---), unmarshal these individual YAML documents to JSON and accumulate them into an array.
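A rough sketch of the splitting step described above, using only the Go standard library. This is not the code in this PR: `splitYAMLStream` is a hypothetical helper, and it only treats a line consisting solely of `---` as a separator (it ignores `...` end markers and content placed on the same line as `---`). Each returned chunk would then be handed to a YAML library for unmarshalling, which is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitYAMLStream splits s into document chunks at lines that consist
// solely of "---" (trailing whitespace allowed). Chunks that contain only
// whitespace are dropped. Hypothetical helper for illustration only.
func splitYAMLStream(s string) []string {
	var docs []string
	var cur []string

	// flush stores the lines collected so far as one document chunk.
	flush := func() {
		doc := strings.Join(cur, "\n")
		if strings.TrimSpace(doc) != "" {
			docs = append(docs, doc)
		}
		cur = nil
	}

	for _, line := range strings.Split(s, "\n") {
		// Only a separator at column 0 counts; indented "---" is content.
		if strings.TrimRight(line, " \t\r") == "---" {
			flush()
			continue
		}
		cur = append(cur, line)
	}
	flush()
	return docs
}

func main() {
	stream := "---\na: 1\n---\nb: 2\n"
	for i, doc := range splitYAMLStream(stream) {
		fmt.Printf("document %d: %q\n", i, doc)
	}
}
```

Accumulating the unmarshalled chunks into a slice then gives the array mentioned above.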

sparkprime · 2019-12-04T13:26:57Z

Fun fact - ghodss was one of the first major Jsonnet adopters about 5 years ago.

re: YAML that is not valid JSON -- my first instinct is to reject it with an error, until we decide to broaden the Jsonnet data model to include such values (in which case we would error only if they were attempted to manifest to JSON).

So some custom logic that is very conservative and converts from "parsed YAML" to "parsed JSON or error" sounds good.

This also helps control divergence between platforms -- keep it simple, etc.
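To make the "parsed YAML" to "parsed JSON or error" idea concrete, here is a minimal sketch using only the standard library. It assumes the decoded YAML arrives as generic Go values (`map[interface{}]interface{}`, `[]interface{}`, `int`, `float64`, `string`, `bool`, `nil`), which is an assumption about the decoder rather than something stated in this thread. `toJSONValue` is a hypothetical name, and anything not explicitly listed is rejected.

```go
package main

import (
	"fmt"
	"math"
)

// toJSONValue converts a decoded YAML value into JSON-compatible Go values
// (nil, bool, string, float64, []interface{}, map[string]interface{}).
// Anything else is rejected with an error instead of being coerced.
// Hypothetical helper; the accepted input types are an assumption.
func toJSONValue(v interface{}) (interface{}, error) {
	switch x := v.(type) {
	case nil, bool, string:
		return x, nil
	case int:
		return float64(x), nil
	case float64:
		// NaN and infinities are valid YAML floats but not valid JSON.
		if math.IsNaN(x) || math.IsInf(x, 0) {
			return nil, fmt.Errorf("number %v has no JSON representation", x)
		}
		return x, nil
	case []interface{}:
		out := make([]interface{}, len(x))
		for i, elem := range x {
			conv, err := toJSONValue(elem)
			if err != nil {
				return nil, fmt.Errorf("index %d: %w", i, err)
			}
			out[i] = conv
		}
		return out, nil
	case map[interface{}]interface{}:
		out := make(map[string]interface{}, len(x))
		for k, elem := range x {
			// JSON object keys must be strings; reject everything else.
			ks, ok := k.(string)
			if !ok {
				return nil, fmt.Errorf("non-string key %v (%T)", k, k)
			}
			conv, err := toJSONValue(elem)
			if err != nil {
				return nil, fmt.Errorf("key %q: %w", ks, err)
			}
			out[ks] = conv
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported YAML value of type %T", v)
	}
}

func main() {
	good := map[interface{}]interface{}{"ports": []interface{}{80, 443}}
	val, err := toJSONValue(good)
	fmt.Println(val, err)

	bad := map[interface{}]interface{}{1: "non-string key"}
	val, err = toJSONValue(bad)
	fmt.Println(val, err)
}
```

Rejecting by default keeps the behaviour easy to mirror on the C++ side, at the cost of refusing some YAML that a more permissive mapping would accept.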

groodt · 2020-08-13T12:34:05Z

For a hypothetical CPP version of parseYAML, would integrating with Rapid YAML be acceptable?

It seems relatively easy to slot into the existing CMAKE as a git submodule? https://github.com./biojppm/rapidyaml#using-ryml-as-cmake-subproject

sbarzowski · 2020-08-13T17:08:49Z

Yes, I think Rapid YAML would be quite a good choice. It is fast and seems to have reasonable coverage of YAML.

We will need to declare compatibility for parseYAML as "best effort" anyway. It's not possible to cover all the boundary cases. The only alternative I see is to define a supported subset of YAML and validate that, which would be a pretty big project by itself.

In C++ we will need to support not only CMake, but also pure Makefile and Bazel.

tomwilkie · 2020-10-20T19:47:23Z

@sparkprime wondering if there is anything I can do to help unblock this? We're using lots of YAML in the monitoring mixins work, and right now tanka and mixtool have to add a native function.

(Also - hey! how's it going? been a while :- )

groodt · 2021-02-22T10:07:18Z

.gitmodules

@@ -1,3 +1,4 @@
 [submodule "cpp-jsonnet"]
 	path = cpp-jsonnet
-	url = https://github.com./google/jsonnet.git
+	url = https://github.com./groodt/jsonnet.git


For testing purposes. Can be reverted when CPP implementation is merged.


groodt · 2021-02-22T10:12:39Z

There is now broad parity between the Go and CPP variants of std.parseYaml.

The CPP PR is ready for review here:
google/jsonnet#888

groodt · 2021-02-22T10:12:48Z

PTAL @sbarzowski

sbarzowski

Looks good. I'm happy with merging this once (a) the C++ version is ready and (b) we resolve the questions about streams.

sbarzowski · 2021-02-26T14:57:59Z

.gitmodules

@@ -1,3 +1,4 @@
 [submodule "cpp-jsonnet"]
 	path = cpp-jsonnet
-	url = https://github.com./google/jsonnet.git
+	url = https://github.com./groodt/jsonnet.git


Let's leave this comment open until then. We should definitely revert to the official one before merging.

sbarzowski · 2021-02-26T14:59:20Z

builtins.go

+	}
+	s := sval.getGoString()
+
+	isYamlStream := strings.Contains(s, "---")


I wonder if we can get bitten by some boundary case here. Can it be a part of a multi-line comment? How do other implementations handle that?

groodt · 2021-03-02

The shotgun parsing is a bit gross, I agree. Anyone adding those characters to a YAML document will be in for a difficult time if they aren't escaping it thoroughly. The .Contains check can probably be made more explicit to check for start of line or end of line characters if that makes you feel more comfortable?

Alternatively, we can adjust the user facing API. The only reason for "sniffing" whether a document is a "yaml stream" or not is to be perhaps more user-friendly by converting a single YAML document to an object {} and a stream of YAML documents into an array []. An alternative user API could be to require a user to choose. e.g. parseYaml that returns a single object {} and parseYamlStream which returns an array.
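For illustration, the boundary cases raised above can be reproduced with a few lines of standard-library Go. `naiveIsStream` mirrors the `strings.Contains` check from the diff; `lineAnchoredIsStream` is a hypothetical stricter variant that only matches a line consisting solely of `---`. It is a sketch, not a full treatment of YAML document markers (for example, it does not match `---` followed by content on the same line).

```go
package main

import (
	"fmt"
	"strings"
)

// naiveIsStream mirrors the substring check shown in the diff above.
func naiveIsStream(s string) bool {
	return strings.Contains(s, "---")
}

// lineAnchoredIsStream reports true only if some line consists solely of
// "---" (trailing whitespace allowed). Hypothetical stricter variant.
func lineAnchoredIsStream(s string) bool {
	for _, line := range strings.Split(s, "\n") {
		if strings.TrimRight(line, " \t\r") == "---" {
			return true
		}
	}
	return false
}

func main() {
	inputs := []string{
		"title: before --- after\n",     // "---" inside a plain scalar
		"# --- a comment ---\nfoo: bar\n", // "---" inside a comment
		"---\nfoo: bar\n",               // explicit document start
	}
	for _, in := range inputs {
		fmt.Printf("%q naive=%v anchored=%v\n",
			in, naiveIsStream(in), lineAnchoredIsStream(in))
	}
}
```

The first two inputs are single documents that the substring check would classify as streams.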

sbarzowski · 2021-03-03

I see. I think you could achieve the same effect in a cleaner way just by checking len(elems). The elems will only have multiple elements when there are multiple documents. If it's an array, it will still be just one document. Is that correct?

> An alternative user API could be to require a user to choose. e.g. parseYaml that returns a single object {} and parseYamlStream which returns an array.

That could work. It's more complicated, but cleaner in a way. I don't have a strong opinion. We can also add parseYamlStream later (which forces returning of an array, even if there is only one document).

groodt · 2021-03-04

> I see. I think you could achieve the same effect in a cleaner way just by checking len(elems).

How do we differentiate between a single YAML document with a scalar list vs a YAML stream of documents?

YAML, like JSON, can have a top level sequence or array. Some examples.

// Single doc, scalar array at root
std.parseYaml(
  |||
    - {a: 1, b: 2}
    - {a: 3, b: 4}
    - {a: 5, b: 6}
  |||
)

[
  { "a": 1, "b": 2 },
  { "a": 3, "b": 4 },
  { "a": 5, "b": 6 }
]

// Multi doc YAML stream with document start separators.
std.parseYaml(
  |||
    ---
    {a: 1, b: 2}
    ---
    {a: 3, b: 4}
    ---
    {a: 5, b: 6}
  |||
)

[
  { "a": 1, "b": 2 },
  { "a": 3, "b": 4 },
  { "a": 5, "b": 6 }
]

// Single doc, scalar array at root. Indentation instead of {
std.parseYaml(
  |||
    - a: 1
      b: 2
    - a: 3
      b: 4
    - a: 5
      b: 6
  |||
)

[
  { "a": 1, "b": 2 },
  { "a": 3, "b": 4 },
  { "a": 5, "b": 6 }
]

Since there can be scalar arrays at the root, checking the len(elems) isn't enough because it isn't possible to tell the difference between a yaml stream with 1 document len(elems) == 1 and a single document with a top-level scalar array of length 1.

The approach used in the current implementation assumes that if you specify a YAML stream using ---, you know you will receive an array of 1 or more documents. If you do not specify a YAML stream, you will get whatever YAML object you specified, either a Map or a Seq of 0, 1 or more elements.

YAML is complicated! :)

sbarzowski · 2021-03-04

> Since there can be scalar arrays at the root, checking the len(elems) isn't enough because it isn't possible to tell the difference between a yaml stream with 1 document len(elems) == 1 and a single document with a top-level scalar array of length 1.

Please correct me if I misunderstand something. I assumed that d.Decode(&elem) reads one document. So len(elems) will be the number of documents in a stream. If it's only one document (even if it's an array) it would read the whole thing. Is that correct?

Is your point that:

---
foo: bar

and

foo: bar

both produce {foo: "bar"}, but the user might have expected the array in the first case? Hmmm... I think it's a valid concern, because when processing streams, it makes it hard to handle the special case of a 1-element stream.

Sniffing for --- still seems pretty bad (even if we made sure that it's a line on its own). First, I'm afraid of special cases. Second, IIUC formally it's not a magic symbol for streams, but an explicit start of the document.

Perhaps an explicit parseYamlStream which always returns the array is the way out of this.

@sparkprime @sh0rez Any thoughts about handling YAML streams?
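The two options discussed in this thread can be compared with a small sketch that only looks at how already-decoded documents are shaped into a result. Both function names are hypothetical, and `docs` stands in for the `elems` slice mentioned above (one entry per decoded document).

```go
package main

import "fmt"

// shapeParseYaml follows the len(elems) idea: a single document is
// returned as-is, anything else is returned as an array of documents.
// Hypothetical helper for illustration only.
func shapeParseYaml(docs []interface{}) interface{} {
	if len(docs) == 1 {
		return docs[0]
	}
	return docs
}

// shapeParseYamlStream sketches the explicit stream variant: the result
// is always an array, even for a single document.
func shapeParseYamlStream(docs []interface{}) []interface{} {
	if docs == nil {
		return []interface{}{}
	}
	return docs
}

func main() {
	// One decoded document, regardless of whether the input began with "---".
	one := []interface{}{map[string]interface{}{"foo": "bar"}}

	fmt.Printf("parseYaml shape:       %v\n", shapeParseYaml(one))
	fmt.Printf("parseYamlStream shape: %v\n", shapeParseYamlStream(one))
}
```

With this shaping, a caller who always wants an array picks the stream variant, and no sniffing for `---` is needed to decide between the two results.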

groodt · 2021-03-04

> both produce {foo: "bar"}, but the user might have expected the array in the first case? Hmmm... I think it's a valid concern, because when processing streams, it makes it hard to handle the special case of a 1-element stream.

Yes, that's my concern. You are correct that it attempts to read a single document, but the question is whether we should treat a 1-element stream differently. Should the output of the following 2 examples be different or the same?

std.parseYaml(
  |||
    ---
    foo: bar
  |||
)

std.parseYaml(
  |||
    foo: bar
  |||
)

What about this one?

std.parseYaml(
  |||
    ---
    foo: bar
    ---
    wibble: wobble
  |||
)

std.parseYaml(
  |||
    - foo: bar
    - wibble: wobble
  |||
)

I think I can be convinced either way to be honest.

Init parseYaml.

f4d034e

googlebot added the cla: no label Nov 24, 2019

googlebot added cla: yes and removed cla: no labels Nov 24, 2019

Tabs.

21e3b5c

groodt added 3 commits November 30, 2019 14:02

Use kubernetes-sigs/yaml

fc0c0b0

Use kubernetes-sigs/yaml

630dc4a

Use kubernetes-sigs/yaml

a2db0ae

yaml.go

47ade21

groodt changed the title ~~Adds prototype std.parseYaml~~ Adds std.parseYaml Dec 3, 2019

groodt changed the title ~~Adds std.parseYaml~~ Adds std.parseYaml Dec 3, 2019

groodt added 2 commits March 18, 2020 07:57

Merge branch 'master' into groodt-parseYaml

ce7efb2

Updated modules.

973d881

sparkprime mentioned this pull request Apr 2, 2020

Importing external YAML files google/jsonnet#790

Closed

Merge branch 'master' into groodt-parseYaml

e5872ce

groodt added 4 commits August 13, 2020 21:46

Fix missed conflicts.

aacec92

Fix missed conflicts.

2384297

Fix missed conflicts.

73e417e

Fix missed conflicts.

5648104

sh0rez mentioned this pull request Aug 15, 2020

feat: Import post-processing #420

Open

sbarzowski mentioned this pull request Jan 8, 2021

std.parseYaml google/jsonnet#460

Closed

groodt mentioned this pull request Jan 26, 2021

Adds std.parseYaml google/jsonnet#888

Closed

groodt added 3 commits February 22, 2021 20:04

Merge branch 'master' into groodt-parseYaml

a5ea27a

Fix go compilation.

561f9a1

CPP parity.

713877e

groodt commented Feb 22, 2021


go fmt

f2a5e25

groodt added 4 commits February 22, 2021 21:17

golint

87616db

go mod tidy

aa08bfb

Update submodules

482e950

Remove EOL Go 1.11 and 1.12 from CI

76fd830

sbarzowski approved these changes Feb 26, 2021


olorin37 mentioned this pull request Apr 12, 2021

RFE: parseYAML databricks/sjsonnet#91

Closed

sbarzowski added 4 commits May 20, 2021 13:35

Revert submodule change back to upstream

40787e1

Merge branch 'master' into groodt-parseYaml

5ae48fb

Fix the bad merge

8c1f0d7

Deps fix

e109647

sbarzowski merged commit e6a9581 into google:master May 20, 2021

JeppeKlitgaard mentioned this pull request May 29, 2021

[feature] Computed Configuration files espanso/espanso#679

Closed

CertainLach mentioned this pull request Jul 12, 2021

Implement std.parseYaml CertainLach/jrsonnet#51

Closed
