Hive: Add tests for HiveCatalog no-arg constructor, setConf, and initialize#3252
Conversation
| catalog.initialize("test", Collections.<String, String>emptyMap()); | ||
|
|
||
| Mockito.verify(catalog, Mockito.times(1)).makeCachedClientPool(any(), any()); | ||
| Assert.assertEquals(catalog.name(), "test"); |
There was a problem hiding this comment.
What are you trying to test? It looks like this only tests that a cached client pool is made. I guess that demonstrates that the initialization completed successfully?
I think what you want to test is that creating a HiveCatalog and initializing it without calling setConf will work. Isn't that the case that broke for users? If so, then I'm surprised that setConf is called in this test.
There was a problem hiding this comment.
This particular test was just to add some coverage generally to the initialize method but you're right, I am missing coverage for the actual case from that thread which was really the toString() method so I'll add a test for that as well. Playing around with the 0.12.x branch and the master branch (which has the NPE fix in), the root problem seems that toString() hits a NPE before setConf() is called. In that thread the user was in a scala repl so toString() was getting called and the following fails in 0.12.x but works after the NPE fix.
HiveCatalog catalog = new HiveCatalog();
catalog.toString() // called by the scala repl, runs into NPE in 0.12.xI thought maybe the user could do a one-liner to get the conf set before the repl tries to print but setConf() doesn't return this. (Should it return this?)
There was a problem hiding this comment.
Just to be sure, I packaged the following scala code which ran without issue in 0.12.x
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.spark.sql.SparkSession;
object Simple {
def main(args: Array[String]){
val spark = SparkSession.builder.appName("Simple").getOrCreate()
val catalog = new HiveCatalog()
catalog.setConf(spark.sparkContext.hadoopConfiguration)
}
}so the issue in 0.12.x is just contained to any kind of REPL use or if the user explicitly calls toString() in their code for some reason.
There was a problem hiding this comment.
If your aim is just to make sure that catalog initialization succeeds when getConf isn't called or when getConf and initialize are called in an unexpected order, I'd recommend doing that more directly. Adding mocks is a lot of work (and future maintenance) just to make sure makeCachedClientPool is called.
Since this suite already has a Hive Metastore set up and a test that creates a table, I think it makes more sense to go through different initialization sequences and verify that the catalog still works. Something like this:
- Expected sequence:
new HiveCatalog(),setConf, and theninitialize - Without config:
new HiveCatalog()andinitialize, whereuriandwarehouseare passed in the property map - Without config:
new HiveCatalog()andinitialize, whereuriandwarehouseare not passed (connect error?) setConflast:new HiveCatalog(),initialize, and thensetConf: should this work? I think that it should reject calls tosetConfafter initialization.
There was a problem hiding this comment.
Great, thanks for the guidance here! Agreed the mocks come with more overhead than needed. Playing with this some more it seems that initialize can't work without a conf so scenario 1. might be the only acceptable sequence, even if uri and warehouse are provided. What are your thoughts on adding a runtime error in initialize if conf is null?
if (this.conf == null){
throw new RuntimeException("Missing configuration, a configuration must be set using setConf before initializing a HiveCatalog");
}Then the tests can be as simple as checking that initialize succeeds with a conf and raises an exception without it.
There was a problem hiding this comment.
We could log a warning and lazily initialize conf with new Configuration. What do you think?
There was a problem hiding this comment.
Makes sense! (Also makes for better quickstart docs 😁)
| Map<String, String> properties = new HashMap<>(); | ||
| HiveCatalog catalog = new HiveCatalog(); | ||
| catalog.setConf(conf); | ||
| catalog.initialize("hive", properties); |
There was a problem hiding this comment.
Nit - I understand the point of these tests, but it’s a bit strange to me seeing a test with no asssrtions.
I believe there are some JUnit functions or Assertions4J functions for asserting that something doesn’t throw. Maybe those would enhance readability? For people familiar with JUnit, arguably the lack of a @throws(…) tag is sufficient, but something to consider 🙂
There was a problem hiding this comment.
Great point, I can test name and maybe the type of the conf attribute (starting to feel like I'm just testing setters and getters here, hehe). I'm going to focus these in a bit more to clearly test for:
- Initializing properly when a config is set
- Throwing properly when initializing without a config (lmk if a scenario exists where this shouldn't raise!)
- Not throwing when toString() is used before a config is set (the REPL case)
| HiveCatalog catalog = new HiveCatalog(); | ||
|
|
||
| Throwable missingConfException = Assert.assertThrows(RuntimeException.class, () -> { | ||
| catalog.initialize("hive", properties); |
There was a problem hiding this comment.
There is an existing utility class, AssertHelpers iirc, that would allow you to combine this check with the following one that asserts on the message into one condensed call. It’s used quite liberally throughout the project. I’d definitely advise you take a look and see if this can be simplified via the existing utilities 🙂
There was a problem hiding this comment.
Perfect, I'll use that, thanks!
|
Ok updated the PR with one more go at it. I cleaned it up to just
For |
| @Override | ||
| public void initialize(String inputName, Map<String, String> properties) { | ||
| this.name = inputName; | ||
| if (this.conf == null){ |
| @Test | ||
| public void testInitialize() { | ||
| Assertions.assertDoesNotThrow(() -> { | ||
| Map<String, String> properties = new HashMap<>(); |
There was a problem hiding this comment.
nit: Maybe put this inside the call as the value is not used.
Also we usually suggest to use Maps.newHashMap()
| @Override | ||
| public void initialize(String inputName, Map<String, String> properties) { | ||
| this.name = inputName; | ||
| if (this.conf == null){ |
There was a problem hiding this comment.
Style: No need for this. when reading an instance variable. We use this. when assigning to an instance variable so that it is obvious that there is a state change, but we don't do the same when reading state.
| public void initialize(String inputName, Map<String, String> properties) { | ||
| this.name = inputName; | ||
| if (this.conf == null){ | ||
| LOG.warn("No configuration was set, using a default configuration"); |
There was a problem hiding this comment.
I think we should call out that this is a Hadoop Configuration. Otherwise, this warning is potentially misleading.
How about changing this to "No Hadoop Configuration was set, using the default environment Configuration"
…fore a conf is set
|
Looks good! I'll merge when tests are passing. |
|
Thanks, @samredai! Merged. |
This PR is tied to issue #3251 and just adds a few tests around constructing a HiveCatalog. A fix was added recently that handles a NPE when using the no-arg HiveCatalog constructor and these tests may help catch this in the future.