PCollection #
Distributed data set that Beam pipeline operates on.
It can be:
-
Bounded - from a fixed source
-
Unbounded - a source that continuously keep growing and changing
PCollection from In-Memory #
Create an initial PCollection from in-memory data (i.e., stuff you input).
with beam.Pipeline() as p:
# numerical PCollection
(p | beam.Create(range(1, 11)))
# PCollection of strings
(p | beam.Create(['This', 'is', 'a', 'string']))