README – Supplementary secretome dataset

File name
Supplementary secretome dataset.xlsx

General description
This workbook contains the curated supplementary dataset associated with the comparative analysis of the T3SS-independent secretome (T3-IS) across three strains/pathovars of the Pseudomonas syringae complex:
- Psv: Pseudomonas savastanoi pv. savastanoi NCPPB 3335
- Pto: Pseudomonas syringae pv. tomato DC3000
- Pph: Pseudomonas syringae pv. phaseolicola 1448A

The workbook is organized into three sheets that separate proteins into:
1) proteins shared by all three strains,
2) proteins shared by pairwise strain combinations,
3) strain-specific proteins.

All sheets preserve the original spreadsheet formatting of the supplementary dataset. In the original file, the first rows are used for visual spacing and the first column is mostly blank; the actual headers start below these spacer rows. The labels “Median PSMa” and “Consensus abundance percentile medianb” are retained exactly as they appear in the source workbook. The trailing “a” and “b” are footnote superscripts that belong to the original header text; the uploaded workbook contains no footnote definitions for them.
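The layout described above can be handled at import time. The following is a minimal sketch, not part of the original dataset, that assumes a sheet has already been read into a list of rows (each row a list of cell values, with None or empty text for empty cells), for example via openpyxl's `iter_rows(values_only=True)`. The function names are hypothetical, and the choice of “protein ID” as the marker for the header row is an assumption based on the column names listed in this README.

```python
import re

# Hypothetical helpers. `rows` is a list of lists of cell values,
# with None (or "") standing for empty cells.

def is_blank(cell):
    """Return True for empty cells (None or whitespace-only text)."""
    return cell is None or str(cell).strip() == ""

def find_header_row(rows, marker="protein ID"):
    """Return the index of the first row with a cell containing `marker`.

    Assumption: the spacer rows above the header are blank, so the first
    row mentioning "protein ID" is the header row. Returns None if absent.
    """
    for i, row in enumerate(rows):
        if any(not is_blank(c) and marker in str(c) for c in row):
            return i
    return None

def normalize_label(label):
    """Drop the trailing footnote letter from the two footnoted headers.

    Other labels are returned unchanged (apart from surrounding whitespace).
    """
    text = str(label).strip()
    text = re.sub(r"(Median PSM)a$", r"\1", text)
    text = re.sub(r"(Consensus abundance percentile(?: median)?)b$", r"\1", text)
    return text
```

Normalizing labels is optional: keep the original text if exact correspondence with the source workbook matters more than clean column names.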

Workbook structure
The workbook contains three worksheets:
- Supplementary table 1.
- Supplementary table 2.
- Supplementary table 3.

============================================================
SHEET 1: “Supplementary table 1.”
============================================================

Content
This sheet contains proteins detected in the common secretome core shared by Psv, Pto and Pph. Each row corresponds to one orthologous/shared protein entry represented by the corresponding identifiers in the three strains.

Main columns
- Psv protein ID: protein identifier in Psv
- Pto protein ID: protein identifier in Pto
- Pph protein ID: protein identifier in Pph
- Psv Median PSMa: median peptide-spectrum-match abundance measure for the Psv ortholog
- Pto Median PSMa: median peptide-spectrum-match abundance measure for the Pto ortholog
- Pph Median PSMa: median peptide-spectrum-match abundance measure for the Pph ortholog
- Consensus abundance percentile medianb: percentile-based ranking summarizing abundance within the shared/core dataset
- Top theme: manually assigned or curated major functional theme for the entry
- Theme interpro inferred: functional theme inferred from InterPro/domain-level annotation

Interpretation
This sheet should be read as the triple-shared secretome backbone. It enables direct comparison of orthologous proteins conserved across the three strains while also providing abundance-related ranking and functional categorization.

============================================================
SHEET 2: “Supplementary table 2.”
============================================================

Content
This sheet contains proteins shared by exactly two of the three strains. The table is divided into three consecutive blocks, each with its own header row:
- Psv–Pto shared proteins
- Psv–Pph shared proteins
- Pto–Pph shared proteins

Within each block, each row corresponds to one protein pair shared by the two strains of that comparison.

Main columns used in each pairwise block
- Strain protein ID: generic block label indicating that the identifier columns that follow hold the protein IDs of each strain in the pair
- Psv protein ID / Pto protein ID / Pph protein ID: identifiers for the two strains in the corresponding pairwise comparison
- Psv Median PSMa / Pto Median PSMa / Pph Median PSMa: abundance values for the proteins in the corresponding pair
- Consensus abundance percentile medianb: percentile-based abundance ranking within the corresponding pairwise subset
- Top theme: manually assigned or curated major functional theme
- Theme interpro inferred: functional theme inferred from InterPro/domain-level annotation

Interpretation
This sheet captures proteins detected in two strains but not in the third, according to the shared-set definition used for this supplementary dataset. It is therefore useful for identifying pairwise conservation patterns and differences that may be associated with host or pathovar.

Important formatting note
Because the sheet contains three separate pairwise sections, headers are repeated within the same worksheet. Users should treat each section independently when importing the file into analysis software.
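One way to treat each section independently is to split the sheet wherever a header row recurs. The sketch below is illustrative only. It assumes the sheet has been read into a list of rows (lists of cell values, None for empty cells) and that every header row contains a cell with the text “Median PSM”; `split_blocks` is a hypothetical name. A label that sits alone on its own row between blocks (for example “Strain protein ID”) would be kept as a data row of the preceding block and would need separate filtering.

```python
def is_blank(cell):
    """Return True for empty cells (None or whitespace-only text)."""
    return cell is None or str(cell).strip() == ""

def split_blocks(rows, marker="Median PSM"):
    """Split sheet rows into blocks, starting a new block at each header row.

    A row counts as a header when any cell contains `marker` (an assumption
    about this workbook). Fully blank rows are skipped, and rows that come
    before the first header are ignored.
    Returns a list of (header_row, data_rows) tuples, in sheet order.
    """
    blocks = []
    for row in rows:
        if all(is_blank(c) for c in row):
            continue  # spacer or separator row
        if any(marker in str(c) for c in row if not is_blank(c)):
            blocks.append((row, []))  # a repeated header opens a new block
        elif blocks:
            blocks[-1][1].append(row)  # data row of the current block
    return blocks
```

For this sheet, the result would be expected to hold three blocks (Psv–Pto, Psv–Pph, Pto–Pph), each of which can then be analysed as its own table.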

============================================================
SHEET 3: “Supplementary table 3.”
============================================================

Content
This sheet contains strain-specific secretome proteins. The worksheet is arranged in multiple blocks:
- a Psv-only block
- a Pto-only block
- a Pph-only block

The left side of the sheet lists strain-specific proteins for each strain in separate sections with repeated headers. In addition, the right side of the sheet (columns Q–V in the original workbook) contains a compact generic summary block headed by:
- Subset
- Original_ID
- Median_PSM
- Abundance_percentile
- Top_theme
- Top_theme_enrichment

In the uploaded workbook, this right-side block contains entries labelled “Pph-only”. These records follow the same general structure as the strain-specific entries and should be read as an auxiliary summary panel that is part of the original spreadsheet layout.
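Because the summary panel shares rows with the strain-specific sections, it can help to separate the two by column before any further parsing. The sketch below is a minimal illustration, assuming the sheet has been read into a list of rows (lists of cell values) and that the panel occupies columns Q–V as stated above. `col_index` and `split_side_panel` are hypothetical names.

```python
def col_index(letters):
    """Convert a spreadsheet column letter (e.g. "Q") to a zero-based index."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1

def split_side_panel(rows, first="Q", last="V"):
    """Separate each row into the part left of `first` and the first..last span.

    Returns (left_rows, panel_rows). Rows shorter than the panel's start
    column simply yield an empty panel slice.
    """
    lo = col_index(first)
    hi = col_index(last) + 1  # slice end is exclusive
    left_rows = [row[:lo] for row in rows]
    panel_rows = [row[lo:hi] for row in rows]
    return left_rows, panel_rows
```

The left part can then be processed like the other sheets, while the panel rows carry the Subset, Original_ID, Median_PSM, Abundance_percentile, Top_theme and Top_theme_enrichment columns.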

Main columns in the strain-specific sections
- Psv protein ID / Pto protein ID / Pph protein ID: strain-specific protein identifier
- Median PSMa: abundance value for the strain-specific protein
- Consensus abundance percentileb or abundance percentile: percentile-based abundance ranking within the strain-specific subset
- Top theme: manually assigned or curated major functional theme
- Theme interpro inferred / Top_theme_enrichment: theme inferred from InterPro/domain-level annotation or enrichment-based assignment

Interpretation
This sheet lists proteins that belong to neither the shared-core nor the pairwise-shared categories and are therefore treated as strain-specific. It supports comparison of subset-specific proteins, their relative abundance ranking, and their functional categorization.

============================================================
Meaning of the functional-theme columns
============================================================

Two theme-related columns are present throughout the workbook:
- Top theme: the main functional category assigned to the protein entry in the curated dataset
- Theme interpro inferred (or Top_theme_enrichment in the auxiliary summary block): theme inferred from InterPro/domain annotation or enrichment-derived assignment

Examples of recurring themes in the workbook include:
- Outer membrane channels/porins
- Carbohydrate-binding & processing
- Solute-binding & nutrient binding
- Proteolysis/peptidases & PPIases
- Transporters (non-channel)
- Cell envelope & PG remodeling
- Signal transduction/chemotaxis
- Metal uptake (TonB/siderophore/Zn)
- Motility/flagella
- Respiration/redox
- Secretion/outer membrane machinery
- Unassigned (no enriched term)
- Unmapped/Other

============================================================
How to use the dataset
============================================================

Recommended uses include:
- retrieving shared-core, pairwise-shared, or strain-specific protein sets,
- comparing relative abundance patterns across strains,
- linking protein IDs to broad functional themes,
- tracking proteins assigned to “Unassigned” or “Unmapped/Other” categories,
- integrating the supplementary tables with downstream comparative, functional, or network analyses.
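Two of the uses listed above, linking protein IDs to themes and tracking unresolved entries, can be sketched as follows. This is an illustrative example only: it assumes each protein entry is available as a dictionary keyed by column name, and the names `theme_counts`, `unresolved_ids` and `UNRESOLVED` are hypothetical. The two unresolved labels are taken from the list of recurring themes in this README.

```python
from collections import Counter

# Theme labels treated as unresolved, as listed among the recurring themes.
UNRESOLVED = {"Unassigned (no enriched term)", "Unmapped/Other"}

def theme_counts(records, column="Top theme"):
    """Count how many entries fall under each theme, ignoring empty values."""
    return Counter(r[column] for r in records if r.get(column))

def unresolved_ids(records, id_column, theme_column="Top theme"):
    """Return the IDs of entries whose theme is one of the unresolved labels."""
    return [r[id_column] for r in records if r.get(theme_column) in UNRESOLVED]
```

For the auxiliary summary block in Supplementary table 3, the same functions could be called with `column="Top_theme"` and `id_column="Original_ID"`.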

When importing this workbook into R, Python, or spreadsheet software, users should take into account that:
- the first rows in each sheet may contain spacing rather than data,
- headers may be repeated within a sheet,
- some sections are separated by blank rows,
- the right-side block in Supplementary table 3 is an additional summary panel embedded in the same worksheet.
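Once a header row and its data rows have been isolated, the mostly blank first column and any blank separator rows can be dropped while turning the rows into records. The sketch below is a minimal illustration with a hypothetical function name; it assumes rows are lists of cell values and that header labels are unique within a block (duplicate labels would overwrite each other in the resulting dictionaries).

```python
def is_blank(cell):
    """Return True for empty cells (None or whitespace-only text)."""
    return cell is None or str(cell).strip() == ""

def rows_to_records(header, data_rows):
    """Pair each non-blank header cell with the value in the same column.

    Columns without a header label (such as the blank first column) are
    dropped, fully blank rows are skipped, and cells missing from short
    rows are filled with None.
    """
    keep = [(i, str(h).strip()) for i, h in enumerate(header) if not is_blank(h)]
    records = []
    for row in data_rows:
        if all(is_blank(c) for c in row):
            continue  # blank separator row
        records.append({name: (row[i] if i < len(row) else None) for i, name in keep})
    return records
```

The resulting list of dictionaries can be passed to most downstream tools, for instance to build a data frame or to write a clean CSV file per block.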

============================================================
Suggested citation/association text
============================================================

This dataset provides the supplementary protein-level tables underlying the comparative analysis of the T3SS-independent secretome in Psv, Pto and Pph, including shared-core proteins, pairwise-shared subsets, and strain-specific proteins, together with abundance-related measures and functional-theme annotations.
