Tomorrow_Farewell [any, they/them]

  • 4 Posts
  • 46 Comments
Joined 8 months ago
Cake day: January 30th, 2024

  • I don’t think you understand the effects of the relevant blockages. They don’t just cut people off from entertainment and pro-western propaganda on YouTube. They also prevent people from accessing things like libgen and pirate-jammin, getting extensions for some development environments, getting development-related software from repositories, and likely other things that I’m forgetting right now, not to mention that YouTube also hosts plenty of educational material that is likely not replicated on VK or other Russian platforms.

  • Just in case: if I install the library the first way, the logs for the same piece of code start with this:

    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    root
     |-- a: long (nullable = true)
     |-- b: double (nullable = true)
     |-- c: string (nullable = true)
     |-- d: date (nullable = true)
     |-- e: timestamp (nullable = true)
    
    24/07/22 19:04:46 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:612)
    	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$1.applyOrElse(PythonRunner.scala:594)
    	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:789)
    	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:766)
    	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:525)
    	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
    	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
    	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
    	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
    	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
    	at org.apache.spark.scheduler.Task.run(Task.scala:141)
    	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    	at java.base/java.lang.Thread.run(Thread.java:842)
    Caused by: java.io.EOFException
    	at java.base/java.io.DataInputStream.readInt(DataInputStream.java:398)
    	at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
    	... 26 more
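
    For the record, the “Python worker exited unexpectedly (crashed)” plus java.io.EOFException combination seems to point at the worker process dying rather than at the query itself, and one commonly suggested thing to rule out is a mismatch between the interpreter that runs the script and the one Spark spawns for its workers. A minimal sketch of pinning both to the same interpreter before anything Spark-related runs (whether that actually helps with this particular crash is an open question):

    import os
    import sys

    # Pin the worker and driver interpreters to the one running this script.
    # PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the documented variables;
    # whether this addresses the crash above is unverified.
    os.environ["PYSPARK_PYTHON"] = sys.executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()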
    



    • pip install pyspark plus installing the latest version of Apache Spark leads to errors when calling the pyspark.sql.DataFrame.show() method of DataFrame objects.
    • pip install pyspark plus installing an older version of Apache Spark, i.e. having a version mismatch between PySpark and Apache Spark, leads to errors even when instantiating a SparkSession (a version sanity check is sketched right after this list).
    • pip install pyspark==3.3.4 previously led to an error: the system was unable to build wheels for the package. Now it seems to install that way, but it behaves the same as in the previous case.
    • Trying to build the 3.3.4 PySpark package manually with ./build/mvn using Bash from the appropriate directory led to Caused by: org.apache.maven.plugin.PluginExecutionException: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:4.4.0:compile failed.
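
    As a sanity check for the mismatch cases above, this is the kind of thing that shows which versions are actually in play; pyspark.__version__ is the pip package, while spark-submit --version reports the Apache Spark installation (it prints its banner to stderr):

    import shutil
    import subprocess

    import pyspark

    # Version of the pip-installed PySpark package.
    print("pyspark package:", pyspark.__version__)

    # Version banner of whichever Spark installation the Path resolves;
    # spark-submit prints it to stderr rather than stdout.
    spark_submit = shutil.which("spark-submit")
    print("spark-submit at:", spark_submit)
    if spark_submit:
        result = subprocess.run([spark_submit, "--version"],
                                capture_output=True, text=True)
        print(result.stderr)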

    Running the following code after having installed everything as in case 3:

    from pyspark.sql import SparkSession
    from datetime import datetime, date
    import pandas as pd
    from pyspark.sql import Row
    
    spark = SparkSession.builder.getOrCreate()
    
    df = spark.createDataFrame([
        Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
        Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
        Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
    ])
    df.printSchema()
    df.show()
    

    leads to this:

    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Traceback (most recent call last):
      File "[python file path]", line 6, in <module>
        spark = SparkSession.builder.getOrCreate()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "[python file path]", line 269, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "[python file path]", line 483, in getOrCreate
        SparkContext(conf=conf or SparkConf())
      File "[python file path]", line 197, in __init__
        self._do_init(
      File "[python file path]", line 282, in _do_init
        self._jsc = jsc or self._initialize_context(self._conf._jconf)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "[python file path]", line 402, in _initialize_context
        return self._jvm.JavaSparkContext(jconf)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "[python file path]", line 1585, in __call__
        return_value = get_return_value(
                       ^^^^^^^^^^^^^^^^^
      File "[python file path]", line 326, in get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
    : java.lang.ExceptionInInitializerError
    	at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:56)
    	at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes$lzycompute(MemoryManager.scala:264)
    	at org.apache.spark.memory.MemoryManager.defaultPageSizeBytes(MemoryManager.scala:254)
    	at org.apache.spark.memory.MemoryManager.$anonfun$pageSizeBytes$1(MemoryManager.scala:273)
    	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
    	at scala.Option.getOrElse(Option.scala:189)
    	at org.apache.spark.memory.MemoryManager.<init>(MemoryManager.scala:273)
    	at org.apache.spark.memory.UnifiedMemoryManager.<init>(UnifiedMemoryManager.scala:58)
    	at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:207)
    	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:320)
    	at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
    	at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
    	at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
    	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
    	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
    	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:238)
    	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    	at java.base/java.lang.Thread.run(Thread.java:1570)
    Caused by: java.lang.IllegalStateException: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
    	at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:113)
    	... 25 more
    Caused by: java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int)
    	at java.base/java.lang.Class.getConstructor0(Class.java:3784)
    	at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2955)
    	at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:71)
    	... 25 more
    
    SUCCESS: The process with PID 21224 (child process of PID 9020) has been terminated.
    SUCCESS: The process with PID 9020 (child process of PID 15684) has been terminated.
    SUCCESS: The process with PID 15684 (child process of PID 4980) has been terminated.
    
    Process finished with exit code 1
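
    In case it narrows things down: the root cause line is java.lang.NoSuchMethodException: java.nio.DirectByteBuffer.<init>(long,int), i.e. Spark’s Platform class looks that constructor up reflectively and the running JDK no longer has it with that signature. As far as I can tell, the signature changed in recent JDK releases, and Spark 3.3.x officially targets Java 8/11/17, so an older Spark on a very new JDK would fail exactly like this. Checking which JVM actually gets picked up:

    import subprocess

    # java -version writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print(result.stderr)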
    

    The JAVA_HOME, HADOOP_HOME, and SPARK_HOME system environment variables are configured, and the relevant binary directories are included in the Path system environment variable.
    PYTHON_SPARK is set to python.
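
    For completeness, a quick dump of what the interpreter actually inherits (PYSPARK_PYTHON is included as well, since that is the variable name the PySpark docs use for choosing the worker interpreter):

    import os
    import shutil

    # Values the Python process actually sees at run time.
    for name in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYSPARK_PYTHON"):
        print(name, "=", os.environ.get(name))

    # Which executables the Path actually resolves to.
    for exe in ("java", "spark-submit"):
        print(exe, "->", shutil.which(exe))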

    EDIT: Great, and now Maven can’t even attempt to build the package, throwing this error:

    Error occurred during initialization of VM
    Could not reserve enough space for 2097152KB object heap
    

    Just great.
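
    For what it’s worth, 2097152 KB is a 2 GB heap that the JVM tries to reserve up front, and a 32-bit JVM or a machine that is low on memory can fail to do that. MAVEN_OPTS is the standard way to pass JVM flags to Maven, and build/mvn appears to respect it when it is already set, so one thing to try is capping the heap below 2 GB before retrying the build (a sketch; the 1 GB figure and the source directory name are assumptions):

    import os
    import subprocess

    env = dict(os.environ)
    # Cap the Maven JVM heap below the 2 GB it failed to reserve.
    env["MAVEN_OPTS"] = "-Xmx1g -XX:ReservedCodeCacheSize=512m"

    # Hypothetical location of the unpacked Spark 3.3.4 sources.
    subprocess.run(["bash", "./build/mvn", "-DskipTests", "clean", "package"],
                   cwd="spark-3.3.4", env=env, check=True)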