pcp2spark(1) — Linux manual page


PCP2SPARK(1)             General Commands Manual            PCP2SPARK(1)

NAME         top

       pcp2spark - pcp-to-spark metrics exporter

SYNOPSIS         top

       pcp2spark [-5CGHIjLmnrRvV?]  [-4 action] [-8|-9 limit] [-a
       archive] [-A align] [--archive-folio folio] [-b|-B space-scale]
       [-c config] [--container container] [--daemonize] [-e derived]
       [-g server] [-h host] [-i instances] [-J rank] [-K spec] [-N
       predicate] [-O origin] [-p port] [-P|-0 precision] [-q|-Q count-
       scale] [-s samples] [-S starttime] [-t interval] [-T endtime]
       [-y|-Y time-scale] metricspec [...]

DESCRIPTION         top

       pcp2spark is a customizable performance metrics exporter tool
       from PCP to Apache Spark.  Any available performance metric, live
       or archived, system and/or application, can be selected for
       exporting using either command line arguments or a configuration

       pcp2spark acts as a bridge which provides a network socket stream
       on a given address/port which an Apache Spark worker task can
       connect to and pull the configured PCP metrics from pcp2spark
       exporting them using the streaming extensions of the Apache Spark

       pcp2spark is a close relative of pmrep(1).  Refer to pmrep(1) for
       the metricspec description accepted on pcp2spark command line.
       See pmrep.conf(5) for description of the pcp2spark.conf
       configuration file syntax.  This page describes pcp2spark
       specific options and configuration file differences with
       pmrep.conf(5).  pmrep(1) also lists some usage examples of which
       most are applicable with pcp2spark as well.

       Only the command line options listed on this page are supported,
       other options available for pmrep(1) are not supported.

       Options via environment values (see pmGetOptions(3)) override the
       corresponding built-in default values (if any).  Configuration
       file options override the corresponding environment variables (if
       any).  Command line options override the corresponding
       configuration file options (if any).

GENERAL USAGE         top

       A general setup for making use of pcp2spark would involve the
       user configuring pcp2spark for the PCP metrics to export followed
       by starting the pcp2spark application. The pcp2spark application
       will then wait and listen on the given address/port for a
       connection from an Apache Spark worker thread to be started.  The
       worker thread will then connect to pcp2spark.

       When an Apache Spark worker thread has connected, pcp2spark will
       begin streaming PCP metric data to Apache Spark until the worker
       thread completes or the connection is interrupted.  If the
       connection is interrupted or the socket is closed from the Apache
       Spark worker thread pcp2spark will exit.

       For an example Apache Spark worker job which will connect to an
       pcp2spark instance on a given address/port and pull in PCP metric
       data see the example provided in the PCP examples directory for
       pcp2spark (often provided by the PCP development package) or the
       online version at
       https://github.com/performancecopilot/pcp/blob/main/src/pcp2spark/ .


       pcp2spark uses a configuration file with syntax described in
       pmrep.conf(5).  The following options are common with pmrep.conf:
       version, source, speclocal, derived, header, globals, samples,
       interval, type, type_prefer, ignore_incompat, names_change,
       instances, live_filter, rank, limit_filter, limit_filter_force,
       invert_filter, predicate, omit_flat, include_labels, precision,
       precision_force, count_scale, count_scale_force, space_scale,
       space_scale_force, time_scale, time_scale_force.  The rest of the
       pmrep.conf options are recognized but ignored for compatibility.

   pcp2spark specific options
       spark_server (string)
           Specify the address on which pcp2spark will listen for
           connections from an Apache Spark worker thread.
           Corresponding command line option is -g.  Defaults to

       spark_port (integer)
           Specify the port on which pcp2spark will listen for
           connections.  Corresponding command line option is -p.
           Defaults to 44325.

OPTIONS         top

       The available command line options are:

       -0 precision, --precision-force=precision
            Like -P but this option will override per-metric

       -4 action, --names-change=action
            Specify which action to take on receiving a metric names
            change event during sampling.  These events occur when a
            PMDA discovers new metrics sometime after starting up, and
            informs running client tools like pcp2spark.  Valid values
            for action are update (refresh metrics being sampled),
            ignore (do nothing - the default behaviour) and abort (exit
            the program if such an event occurs).

       -5, --ignore-unknown
            Silently ignore any metric name that cannot be resolved.  At
            least one metric must be found for the tool to start.

       -8 limit, --limit-filter=limit
            Limit results to instances with values above/below limit.  A
            positive integer will include instances with values at or
            above the limit in reporting.  A negative integer will
            include instances with values at or below the limit in
            reporting.  A value of zero performs no limit filtering.
            This option will not override possible per-metric
            specifications.  See also -J and -N.

       -9 limit, --limit-filter-force=limit
            Like -8 but this option will override per-metric

       -a archive, --archive=archive
            Performance metric values are retrieved from the set of
            Performance Co-Pilot (PCP) archive files identified by the
            archive argument, which is a comma-separated list of names,
            each of which may be the base name of an archive or the name
            of a directory containing one or more archives.

       -A align, --align=align
            Force the initial sample to be aligned on the boundary of a
            natural time unit align.  Refer to PCPIntro(1) for a
            complete description of the syntax for align.

            Read metric source archives from the PCP archive folio
            created by tools like pmchart(1) or, less often, manually
            with mkaf(1).

       -b scale, --space-scale=scale
            Unit/scale for space (byte) metrics, possible values include
            bytes, Kbytes, KB, Mbytes, MB, and so forth.  This option
            will not override possible per-metric specifications.  See
            also pmParseUnitsStr(3).

       -B scale, --space-scale-force=scale
            Like -b but this option will override per-metric

       -c config, --config=config
            Specify the config file or directory to use.  In case config
            is a directory all files in it ending .conf will be
            included.  The default is the first found of:
            ./pcp2spark.conf, $HOME/.pcp2spark.conf,
            $HOME/pcp/pcp2spark.conf, and
            $PCP_SYSCONF_DIR/pcp2spark.conf.  For details, see the above
            section and pmrep.conf(5).

            Fetch performance metrics from the specified container,
            either local or remote (see -h).

       -C, --check
            Exit before reporting any values, but after parsing the
            configuration and metrics and printing possible headers.

            Daemonize on startup.

       -e derived, --derived=derived
            Specify derived performance metrics.  If derived starts with
            a slash (``/'') or with a dot (``.'') it will be interpreted
            as a PCP derived metrics configuration file, otherwise it
            will be interpreted as comma- or semicolon-separated derived
            metric expressions.  For complete description of derived
            metrics and PCP derived metrics configuration files see
            pmLoadDerivedConfig(3) and pmRegisterDerived(3).
            Alternatively, using pmrep.conf(5) configuration syntax
            allows defining derived metrics as part of metricsets.

       -g server, --spark-server=server
            pcp2spark local server address.

       -G, --no-globals
            Do not include global metrics in reporting (see

       -h host, --host=host
            Fetch performance metrics from pmcd(1) on host, rather than
            from the default localhost.

       -H, --no-header
            Do not print any headers.

       -i instances, --instances=instances
            Retrieve and report only the specified metric instances.  By
            default all instances, present and future, are reported.

            Refer to pmrep(1) for complete description of this option.

       -I, --ignore-incompat
            Ignore incompatible metrics.  By default incompatible
            metrics (that is, their type is unsupported or they cannot
            be scaled as requested) will cause pcp2spark to terminate
            with an error message.  With this option all incompatible
            metrics are silently omitted from reporting.  This may be
            especially useful when requesting non-leaf nodes of the PMNS
            tree for reporting.

       -j, --live-filter
            Perform instance live filtering.  This allows capturing all
            named instances even if processes are restarted at some
            point (unlike without live filtering).  Performing live
            filtering over a huge number of instances will add some
            internal overhead so a bit of user caution is advised.  See
            also -n.

       -J rank, --rank=rank
            Limit results to highest/lowest ranked instances of set-
            valued metrics.  A positive integer will include highest
            valued instances in reporting.  A negative integer will
            include lowest valued instances in reporting.  A value of
            zero performs no ranking.  Ranking does not imply sorting,
            see -6.  See also -8.

       -K spec, --spec-local=spec
            When fetching metrics from a local context (see -L), the -K
            option may be used to control the DSO PMDAs that should be
            made accessible.  The spec argument conforms to the syntax
            described in pmSpecLocalPMDA(3).  More than one -K option
            may be used.

       -L, --local-PMDA
            Use a local context to collect metrics from DSO PMDAs on the
            local host without PMCD.  See also -K.

       -m, --include-labels
            Include PCP metric labels in the output.

       -n, --invert-filter
            Perform ranking before live filtering.  By default instance
            live filtering (when requested, see -j) happens before
            instance ranking (when requested, see -J).  With this option
            the logic is inverted and ranking happens before live

       -N predicate, --predicate=predicate
            Specify a comma-separated list of predicate filter reference
            metrics.  By default ranking (see -J) happens for each
            metric individually.  With predicates, ranking is done only
            for the specified predicate metrics.  When reporting, rest
            of the metrics sharing the same instance domain (see
            PCPIntro(1)) as the predicate will include only the
            highest/lowest ranking instances of the corresponding
            predicate.  Ranking does not imply sorting, see -6.

            So for example, using proc.memory.rss (resident memory size
            of process) as the predicate metric together with
            proc.io.total_bytes and mem.util.used as metrics to be
            reported, only the processes using most/least (as per -J)
            memory will be included when reporting total bytes written
            by processes.  Since mem.util.used is a single-valued metric
            (thus not sharing the same instance domain as the process
            related metrics), it will be reported as usual.

       -O origin, --origin=origin
            When reporting archived metrics, start reporting at origin
            within the time window (see -S and -T).  Refer to
            PCPIntro(1) for a complete description of the syntax for

       -p port, --spark-port=port
            pcp2spark local port.

       -P precision, --precision=precision
            Use precision for numeric non-integer output values.  The
            default is to use 3 decimal places (when applicable).  This
            option will not override possible per-metric specifications.

       -q scale, --count-scale=scale
            Unit/scale for count metrics, possible values include count
            x 10^-1, count, count x 10, count x 10^2, and so forth from
            10^-8 to 10^7.  (These values are currently space-
            sensitive.)  This option will not override possible per-
            metric specifications.  See also pmParseUnitsStr(3).

       -Q scale, --count-scale-force=scale
            Like -q but this option will override per-metric

       -r, --raw
            Output raw metric values, do not convert cumulative counters
            to rates.  This option will override possible per-metric

       -R, --raw-prefer
            Like -r but this option will not override per-metric

       -s samples, --samples=samples
            The samples argument defines the number of samples to be
            retrieved and reported.  If samples is 0 or -s is not
            specified, pcp2spark will sample and report continuously (in
            real time mode) or until the end of the set of PCP archives
            (in archive mode).  See also -T.

       -S starttime, --start=starttime
            When reporting archived metrics, the report will be
            restricted to those records logged at or after starttime.
            Refer to PCPIntro(1) for a complete description of the
            syntax for starttime.

       -t interval, --interval=interval
            Set the reporting interval to something other than the
            default 1 second.  The interval argument follows the syntax
            described in PCPIntro(1), and in the simplest form may be an
            unsigned integer (the implied units in this case are
            seconds).  See also the -T option.

       -T endtime, --finish=endtime
            When reporting archived metrics, the report will be
            restricted to those records logged before or at endtime.
            Refer to PCPIntro(1) for a complete description of the
            syntax for endtime.

            When used to define the runtime before pcp2spark will exit,
            if no samples is given (see -s) then the number of reported
            samples depends on interval (see -t).  If samples is given
            then interval will be adjusted to allow reporting of samples
            during runtime.  In case all of -T, -s, and -t are given,
            endtime determines the actual time pcp2spark will run.

       -v, --omit-flat
            Report only set-valued metrics with instances (e.g.
            disk.dev.read) and omit single-valued ``flat'' metrics
            without instances (e.g.  kernel.all.sysfork).  See -i and

       -V, --version
            Display version number and exit.

       -y scale, --time-scale=scale
            Unit/scale for time metrics, possible values include
            nanosec, ns, microsec, us, millisec, ms, and so forth up to
            hour, hr.  This option will not override possible per-metric
            specifications.  See also pmParseUnitsStr(3).

       -Y scale, --time-scale-force=scale
            Like -y but this option will override per-metric

       -?, --help
            Display usage message and exit.

FILES         top

            pcp2spark configuration file (see -c)

            system provided default pmrep configuration files


       Environment variables with the prefix PCP_ are used to
       parameterize the file and directory names used by PCP.  On each
       installation, the file /etc/pcp.conf contains the local values
       for these variables.  The $PCP_CONF variable may be used to
       specify an alternative configuration file, as described in

       For environment variables affecting PCP tools, see

SEE ALSO         top

       PCPIntro(1), mkaf(1), pcp(1), pcp2elasticsearch(1),
       pcp2graphite(1), pcp2influxdb(1), pcp2json(1), pcp2xlsx(1),
       pcp2xml(1), pcp2zabbix(1), pmcd(1), pminfo(1), pmrep(1),
       pmGetOptions(3), pmLoadDerivedConfig(3), pmParseUnitsStr(3),
       pmRegisterDerived(3), pmSpecLocalPMDA(3), LOGARCHIVE(5),
       pcp.conf(5), pmrep.conf(5) and PMNS(5).

COLOPHON         top

       This page is part of the PCP (Performance Co-Pilot) project.
       Information about the project can be found at 
       ⟨http://www.pcp.io/⟩.  If you have a bug report for this manual
       page, send it to pcp@groups.io.  This page was obtained from the
       project's upstream Git repository
       ⟨https://github.com/performancecopilot/pcp.git⟩ on 2024-06-14.
       (At that time, the date of the most recent commit that was found
       in the repository was 2024-06-14.)  If you discover any rendering
       problems in this HTML version of the page, or you believe there
       is a better or more up-to-date source for the page, or you have
       corrections or improvements to the information in this COLOPHON
       (which is not part of the original manual page), send a mail to

Performance Co-Pilot               PCP                      PCP2SPARK(1)

Pages that refer to this page: pcp2elasticsearch(1)pcp2graphite(1)pcp2influxdb(1)pcp2json(1)pcp2openmetrics(1)pcp2template(1)pcp2xlsx(1)pcp2xml(1)pcp2zabbix(1)pmrep(1)pmrepconf(1)