Skip to content

Hive: Add tests for HiveCatalog no-arg constructor, setConf, and initialize#3252

Merged
rdblue merged 1 commit into
apache:masterfrom
samredai:hivecatalogtests
Oct 15, 2021
Merged

Hive: Add tests for HiveCatalog no-arg constructor, setConf, and initialize#3252
rdblue merged 1 commit into
apache:masterfrom
samredai:hivecatalogtests

Conversation

@samredai

@samredai samredai commented Oct 8, 2021

Copy link
Copy Markdown
Contributor

This PR is tied to issue #3251 and just adds a few tests around constructing a HiveCatalog. A fix was added recently that handles a NPE when using the no-arg HiveCatalog constructor and these tests may help catch this in the future.

@github-actions github-actions Bot added the hive label Oct 8, 2021
catalog.initialize("test", Collections.<String, String>emptyMap());

Mockito.verify(catalog, Mockito.times(1)).makeCachedClientPool(any(), any());
Assert.assertEquals(catalog.name(), "test");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you trying to test? It looks like this only tests that a cached client pool is made. I guess that demonstrates that the initialization completed successfully?

I think what you want to test is that creating a HiveCatalog and initializing it without calling setConf will work. Isn't that the case that broke for users? If so, then I'm surprised that setConf is called in this test.

@samredai samredai Oct 8, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular test was just to add some coverage generally to the initialize method but you're right, I am missing coverage for the actual case from that thread which was really the toString() method so I'll add a test for that as well. Playing around with the 0.12.x branch and the master branch (which has the NPE fix in), the root problem seems that toString() hits a NPE before setConf() is called. In that thread the user was in a scala repl so toString() was getting called and the following fails in 0.12.x but works after the NPE fix.

HiveCatalog catalog = new HiveCatalog();
catalog.toString()  // called by the scala repl, runs into NPE in 0.12.x

I thought maybe the user could do a one-liner to get the conf set before the repl tries to print but setConf() doesn't return this. (Should it return this?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, I packaged the following scala code which ran without issue in 0.12.x

import org.apache.iceberg.hive.HiveCatalog;
import org.apache.spark.sql.SparkSession;

object Simple {
  def main(args: Array[String]){
    val spark = SparkSession.builder.appName("Simple").getOrCreate()
    val catalog = new HiveCatalog()
    catalog.setConf(spark.sparkContext.hadoopConfiguration)
  }
}

so the issue in 0.12.x is just contained to any kind of REPL use or if the user explicitly calls toString() in their code for some reason.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your aim is just to make sure that catalog initialization succeeds when getConf isn't called or when getConf and initialize are called in an unexpected order, I'd recommend doing that more directly. Adding mocks is a lot of work (and future maintenance) just to make sure makeCachedClientPool is called.

Since this suite already has a Hive Metastore set up and a test that creates a table, I think it makes more sense to go through different initialization sequences and verify that the catalog still works. Something like this:

  1. Expected sequence: new HiveCatalog(), setConf, and then initialize
  2. Without config: new HiveCatalog() and initialize, where uri and warehouse are passed in the property map
  3. Without config: new HiveCatalog() and initialize, where uri and warehouse are not passed (connect error?)
  4. setConf last: new HiveCatalog(), initialize, and then setConf: should this work? I think that it should reject calls to setConf after initialization.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for the guidance here! Agreed the mocks come with more overhead than needed. Playing with this some more it seems that initialize can't work without a conf so scenario 1. might be the only acceptable sequence, even if uri and warehouse are provided. What are your thoughts on adding a runtime error in initialize if conf is null?

if (this.conf == null){
  throw new RuntimeException("Missing configuration, a configuration must be set using setConf before initializing a HiveCatalog");
}

Then the tests can be as simple as checking that initialize succeeds with a conf and raises an exception without it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could log a warning and lazily initialize conf with new Configuration. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! (Also makes for better quickstart docs 😁)

Map<String, String> properties = new HashMap<>();
HiveCatalog catalog = new HiveCatalog();
catalog.setConf(conf);
catalog.initialize("hive", properties);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - I understand the point of these tests, but it’s a bit strange to me seeing a test with no asssrtions.

I believe there are some JUnit functions or Assertions4J functions for asserting that something doesn’t throw. Maybe those would enhance readability? For people familiar with JUnit, arguably the lack of a @throws(…) tag is sufficient, but something to consider 🙂

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I can test name and maybe the type of the conf attribute (starting to feel like I'm just testing setters and getters here, hehe). I'm going to focus these in a bit more to clearly test for:

  1. Initializing properly when a config is set
  2. Throwing properly when initializing without a config (lmk if a scenario exists where this shouldn't raise!)
  3. Not throwing when toString() is used before a config is set (the REPL case)

HiveCatalog catalog = new HiveCatalog();

Throwable missingConfException = Assert.assertThrows(RuntimeException.class, () -> {
catalog.initialize("hive", properties);

@kbendick kbendick Oct 9, 2021

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing utility class, AssertHelpers iirc, that would allow you to combine this check with the following one that asserts on the message into one condensed call. It’s used quite liberally throughout the project. I’d definitely advise you take a look and see if this can be simplified via the existing utilities 🙂

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, I'll use that, thanks!

@samredai

Copy link
Copy Markdown
Contributor Author

Ok updated the PR with one more go at it. I cleaned it up to just warn when initializing without a config and then using a new Configuration() to initialize with. Then there's just 3 tests that check that:

  1. Initializing without a config does not raise
  2. toString does not raise if ran before setConf
  3. Properties passed to initialize are properly set even if the user did not setConf() explicitly

For 1. and 2. I used assertDoesNotThrow so the intention is clear (thanks @kbendick )

@Override
public void initialize(String inputName, Map<String, String> properties) {
this.name = inputName;
if (this.conf == null){

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after (

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed! ✅

@Test
public void testInitialize() {
Assertions.assertDoesNotThrow(() -> {
Map<String, String> properties = new HashMap<>();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe put this inside the call as the value is not used.
Also we usually suggest to use Maps.newHashMap()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed! ✅

@Override
public void initialize(String inputName, Map<String, String> properties) {
this.name = inputName;
if (this.conf == null){

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: No need for this. when reading an instance variable. We use this. when assigning to an instance variable so that it is obvious that there is a state change, but we don't do the same when reading state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed! ✅

public void initialize(String inputName, Map<String, String> properties) {
this.name = inputName;
if (this.conf == null){
LOG.warn("No configuration was set, using a default configuration");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should call out that this is a Hadoop Configuration. Otherwise, this warning is potentially misleading.

How about changing this to "No Hadoop Configuration was set, using the default environment Configuration"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed! ✅

@rdblue rdblue left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some minor nits to fix, but overall I think this is about ready to go. Thanks, @samredai!

@rdblue

rdblue commented Oct 15, 2021

Copy link
Copy Markdown
Contributor

Looks good! I'll merge when tests are passing.

@rdblue rdblue merged commit e7e932e into apache:master Oct 15, 2021
@rdblue

rdblue commented Oct 15, 2021

Copy link
Copy Markdown
Contributor

Thanks, @samredai! Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants