DATASET DESCRIPTION FOR RIUMA DEPOSIT
=====================================

Recommended dataset title
-------------------------
Supplementary vesicles dataset: curated proteomic tables of extracellular vesicle-associated proteins from Pseudomonas savastanoi pv. savastanoi, Pseudomonas syringae pv. tomato, and Pseudomonas syringae pv. phaseolicola

Described file name
-------------------
Supplementary vesicles dataset.xlsx

General description
-------------------
This file contains a curated set of supplementary tables derived from the proteomic analysis of extracellular vesicle (EV)-associated proteins from three members of the Pseudomonas syringae complex:

- Psv: Pseudomonas savastanoi pv. savastanoi NCPPB 3335
- Pto: Pseudomonas syringae pv. tomato DC3000
- Pph: Pseudomonas syringae pv. phaseolicola 1448A

The workbook does not contain raw mass spectrometry data or original MS/MS spectra. Instead, it provides final processed and organized tables prepared for presentation as supplementary material. Proteins are grouped according to their conservation pattern across species and according to an operational classification into two fractions:

1. Hydrophobic proteins:
   proteins associated with the hydrophobic or membrane-related vesicle fraction, operationally defined in the study as proteins containing at least one predicted transmembrane region.

2. Internal proteins:
   proteins associated with the internal or luminal vesicle fraction, operationally defined as proteins lacking predicted transmembrane regions.
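
The operational rule above can be sketched in a few lines. This is a minimal illustration, assuming the number of predicted transmembrane regions per protein is already available from a prediction tool; the function name ``classify_fraction`` and the example identifiers are hypothetical and do not come from the workbook.

```python
def classify_fraction(n_transmembrane_regions: int) -> str:
    """Return the operational fraction label for one protein.

    At least one predicted transmembrane region -> "hydrophobic";
    no predicted transmembrane region -> "internal".
    """
    if n_transmembrane_regions < 0:
        raise ValueError("number of transmembrane regions cannot be negative")
    return "hydrophobic" if n_transmembrane_regions >= 1 else "internal"


# Hypothetical prediction counts, for illustration only (not taken from the dataset).
example_counts = {"protein_A": 0, "protein_B": 1, "protein_C": 7}
labels = {pid: classify_fraction(n) for pid, n in example_counts.items()}
```

The workbook already stores the outcome of this classification as separate blocks, so the rule does not need to be reapplied to use the tables.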

The file summarizes proteins conserved across the three species, proteins shared by pairs of species, and species-specific proteins. For each entry, the dataset includes protein identifiers, semi-quantitative abundance values based on peptide-spectrum matches (PSMs), relative abundance percentiles within the corresponding subset, and condensed functional annotations inferred mainly from InterPro.

Brief experimental context
--------------------------
The tables derive from a comparative analysis of EVs isolated from Psv, Pto, and Pph. The original study included wild-type strains as well as hrpA and hrpL mutants. The abundance values summarized in this file are median values per species or subset, not raw intensities from individual replicates. This dataset should therefore be regarded as a processed and curated results dataset suitable for consultation, comparative reuse, functional meta-analysis, and supplementary documentation, but not as a substitute for primary proteomics files.

File structure
--------------
The workbook contains 3 worksheets:

1. Supplementary table 1.
2. Supplementary table 2.
3. Supplementary table 3.

Each worksheet contains separate blocks divided by blank rows and internal headers. Correct interpretation of the file requires reading each block independently.

Detailed contents by worksheet
------------------------------

1) Worksheet: "Supplementary table 1."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This worksheet contains proteins conserved across the three species (Psv-Pto-Pph), separated into two blocks:

A. Hydrophobic proteins
   - Number of entries: 9
   - Columns:
     * Psv protein ID
     * Pto protein ID
     * Pph protein ID
     * Psv Median PSMa
     * Pto Median PSMa
     * Pph Median PSMa
     * Consensus abundance percentile medianb
     * Top theme
     * Theme interpro inferred

B. Internal proteins
   - Number of entries: 115
   - Columns:
     * Psv protein ID
     * Pto protein ID
     * Pph protein ID
     * Psv Median PSMa
     * Pto Median PSMa
     * Pph Median PSMa
     * Consensus abundance percentile medianb
     * Top theme
     * Theme interpro inferred

Interpretation:
This worksheet represents the conserved core of vesicle-associated proteins shared by the three species, distinguishing between the hydrophobic fraction and the internal fraction.

2) Worksheet: "Supplementary table 2."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This worksheet contains proteins shared by pairs of species, also separated into hydrophobic and internal blocks.

A. Hydrophobic proteins
   - Subset Psv-Pto:
     * Number of entries: 3
     * Columns:
       - Psv protein ID
       - Pto protein ID
       - Psv Median PSMa
       - Pto Median PSMa
       - Consensus abundance percentileb
       - Top theme
       - Theme interpro inferred

   - Subset Pto-Pph:
     * Number of entries: 3
     * Columns:
       - Pto protein ID
       - Pph protein ID
       - Pto Median PSMa
       - Pph Median PSMa
       - Consensus abundance percentileb
       - Top theme
       - Theme interpro inferred

B. Internal proteins
   - Subset Psv-Pto:
     * Number of entries: 35

   - Subset Psv-Pph:
     * Number of entries: 6

   - Subset Pto-Pph:
     * Number of entries: 20

   - For each of these three subsets, the same basic structure is repeated:
     * protein IDs for the species involved
     * Median PSM for each species
     * Consensus abundance percentileb
     * Top theme
     * Theme interpro inferred

Interpretation:
This worksheet contains proteins that are not present in all three species but are shared by two of them. It allows partial overlaps to be identified and vesicle composition to be compared between pairs of pathovars or species.

3) Worksheet: "Supplementary table 3."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This worksheet contains species-specific proteins, separated into hydrophobic and internal fractions.

A. Hydrophobic proteins
   - Psv-specific:
     * Number of entries: 13
   - Pto-specific:
     * Number of entries: 21
   - Pph-specific:
     * Number of entries: 1

B. Internal proteins
   - Psv-specific:
     * Number of entries: 61
   - Pto-specific:
     * Number of entries: 76
   - Pph-specific:
     * Number of entries: 6

Columns in the species-specific blocks:
- Protein ID (depending on the block, the header appears as Strain protein ID, Psv protein ID, Pto protein ID, or Pph protein ID)
- Median PSMa
- Consensus abundance percentileb
- Top theme
- Theme interpro inferred

Interpretation:
This worksheet documents proteins exclusive to a single species within the comparison, which is useful for studying taxonomic or niche-associated specificity.

Definition and interpretation of columns
----------------------------------------

Protein ID columns
~~~~~~~~~~~~~~~~~~
Protein identifiers correspond to the proteome or annotation IDs for each organism. Depending on the block, one, two, or three IDs may appear in parallel to represent orthologs or corresponding proteins across species.

Median PSMa
~~~~~~~~~~~
PSM stands for peptide-spectrum match. In this file, abundance is summarized as the median PSM count for each protein within each species or subset. These values are semi-quantitative: they reflect the relative abundance detected by proteomics and should not be interpreted as absolute concentration measurements. The trailing "a" in the column header appears to be a footnote marker carried over from the workbook.
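
As a rough illustration of what a median PSM summary looks like, the sketch below collapses several per-sample PSM counts into one value per protein. It assumes the median is taken over a list of per-sample counts; the exact set of samples pooled in the study is not stated here, and the identifiers and counts are hypothetical.

```python
from statistics import median

# Hypothetical PSM counts per sample for two proteins (not taken from the dataset).
sample_psms = {
    "protein_A": [12, 15, 9],
    "protein_B": [3, 0, 4, 5],
}

# One semi-quantitative summary value per protein.
median_psm = {pid: median(counts) for pid, counts in sample_psms.items()}
```

With an even number of samples, ``statistics.median`` returns the mean of the two central values, so a median PSM does not have to be an integer.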

Consensus abundance percentile medianb / Consensus abundance percentileb
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a relative abundance ranking within the corresponding subset. Higher values indicate proteins that are relatively more abundant within that block. In Supplementary table 1 the header includes the word "median" ("Consensus abundance percentile medianb"), whereas in Supplementary tables 2 and 3 it does not ("Consensus abundance percentileb"). In all cases the value should be interpreted as a summary of the protein's relative abundance position.
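
To make the idea of a within-subset percentile concrete, the sketch below uses one common definition of percentile rank: the percentage of values in the block that are less than or equal to a given value. This is an assumption chosen for illustration; the formula actually used in the study is not described here, and the function name ``percentile_ranks`` and the input values are hypothetical.

```python
from typing import List, Sequence


def percentile_ranks(values: Sequence[float]) -> List[float]:
    """Percentile rank of each value within its own block.

    Rank = 100 * (number of values <= this value) / (block size).
    Ties receive the same rank; the largest value always gets 100.0.
    """
    n = len(values)
    return [100.0 * sum(1 for other in values if other <= v) / n for v in values]


# Hypothetical median PSM values for one block (not taken from the dataset).
block_psm = [2, 10, 5, 10]
ranks = percentile_ranks(block_psm)
```

Because the ranking is computed inside a single block, percentiles from different blocks or worksheets are not directly comparable.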

Top theme
~~~~~~~~~
Higher-order functional category assigned to each protein based on the enrichment and functional aggregation strategy used in the study. Examples observed in the file include:
- Translation & ribosome
- Small-molecule metabolism
- Redox & oxidoreductases
- Central carbon (TCA,glyoxylate)
- Amino-acid metabolism
- DNA/RNA processes
- Respiration & energy
- Transport & channels
- Cell envelope & lipoproteins
- Other membrane proteins
- Protein export / Sec
- Proteolysis & peptidases

Theme interpro inferred
~~~~~~~~~~~~~~~~~~~~~~~
Condensed functional description inferred mainly from InterPro annotations. This column contains the domain, superfamily, or motif description most representative for the protein and used to support thematic assignment.

Format and internal organization
--------------------------------
- The file is an Excel workbook (.xlsx) in which each worksheet holds several data blocks.
- Blank rows separate subsets.
- Headers are repeated at the beginning of each block.
- The tables are designed for human reading and supplementary presentation; therefore, they do not follow a single tidy-data structure across the full workbook.
- Automated computational analysis may require splitting each worksheet into blocks before importing into R, Python, or another environment.
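
The block-splitting step mentioned in the last point can be sketched with the Python standard library alone. The example assumes each worksheet has already been read into a sequence of row tuples (for instance with openpyxl's ``worksheet.iter_rows(values_only=True)``) and that blank rows are the only block separators. The helper names and example rows are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence

Row = Sequence[Optional[object]]


def is_blank(row: Row) -> bool:
    """A row is blank when every cell is None or whitespace-only text."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_blocks(rows: Iterable[Row]) -> List[List[Row]]:
    """Group consecutive non-blank rows into blocks separated by blank rows."""
    blocks: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        if is_blank(row):
            if current:  # close the block that just ended
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:  # last block has no trailing blank row
        blocks.append(current)
    return blocks


# Hypothetical worksheet content: two blocks separated by blank rows.
rows = [
    ("Psv protein ID", "Median PSMa"),
    ("ID_1", 12),
    ("ID_2", 5),
    (None, None),
    ("", "  "),
    ("Pto protein ID", "Median PSMa"),
    ("ID_3", 8),
]
blocks = split_blocks(rows)
header, *records = blocks[0]  # first row of a block is treated as its header
```

Each resulting block can then be turned into its own table. Whether the first row of a block is a column header or an internal title row should be checked against the actual workbook before relying on it.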

Subset sizes
------------
Summary of entries contained in the workbook:

Supplementary table 1
- Conserved hydrophobic proteins (Psv-Pto-Pph): 9
- Conserved internal proteins (Psv-Pto-Pph): 115

Supplementary table 2
- Pairwise-shared hydrophobic proteins:
  * Psv-Pto: 3
  * Pto-Pph: 3
- Pairwise-shared internal proteins:
  * Psv-Pto: 35
  * Psv-Pph: 6
  * Pto-Pph: 20

Supplementary table 3
- Species-specific hydrophobic proteins:
  * Psv: 13
  * Pto: 21
  * Pph: 1
- Species-specific internal proteins:
  * Psv: 61
  * Pto: 76
  * Pph: 6
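
The counts above can double as a sanity check after the workbook has been parsed. The sketch below records them in a dictionary and reports any block whose parsed size differs. The key layout (worksheet, fraction, subset) and the function name ``check_sizes`` are illustrative choices, not part of the dataset.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (worksheet, fraction, subset)

# Expected number of entries per block, as listed in this description.
EXPECTED_SIZES: Dict[Key, int] = {
    ("Supplementary table 1.", "hydrophobic", "Psv-Pto-Pph"): 9,
    ("Supplementary table 1.", "internal", "Psv-Pto-Pph"): 115,
    ("Supplementary table 2.", "hydrophobic", "Psv-Pto"): 3,
    ("Supplementary table 2.", "hydrophobic", "Pto-Pph"): 3,
    ("Supplementary table 2.", "internal", "Psv-Pto"): 35,
    ("Supplementary table 2.", "internal", "Psv-Pph"): 6,
    ("Supplementary table 2.", "internal", "Pto-Pph"): 20,
    ("Supplementary table 3.", "hydrophobic", "Psv"): 13,
    ("Supplementary table 3.", "hydrophobic", "Pto"): 21,
    ("Supplementary table 3.", "hydrophobic", "Pph"): 1,
    ("Supplementary table 3.", "internal", "Psv"): 61,
    ("Supplementary table 3.", "internal", "Pto"): 76,
    ("Supplementary table 3.", "internal", "Pph"): 6,
}


def check_sizes(observed: Dict[Key, int]) -> Dict[Key, Tuple[int, Optional[int]]]:
    """Return {block: (expected, observed)} for every block that does not match."""
    return {
        key: (expected, observed.get(key))
        for key, expected in EXPECTED_SIZES.items()
        if observed.get(key) != expected
    }


total_entries = sum(EXPECTED_SIZES.values())
```

An empty result from ``check_sizes`` means every parsed block has the documented number of entries; a missing block is reported with ``None`` as its observed size.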

Important notes for reuse
-------------------------
1. This file contains processed and curated data, not raw mass spectrometry data.
2. PSM values are semi-quantitative and should be interpreted as relative abundance metrics.
3. Functional categories are summarized and curated categories, not direct unprocessed output from a single annotation tool.
4. Protein correspondence across species is presented at the supplementary table level and depends on the conservation criteria used in the study.
5. Hydrophobic and internal blocks should be analyzed separately because they represent biologically distinct fractions.
6. The workbook contains a small number of residual rows at the end of the worksheet "Supplementary table 3" that appear to retain metadata or export traces for a specific entry. These rows are not part of the main tables and can be ignored for standard analytical reuse.

Potential uses of the dataset
-----------------------------
- Supplementary consultation of EV-associated proteins.
- Comparison of conserved versus species-specific proteins.
- Integration with external functional annotations.
- Comparative analysis of hydrophobic versus internal vesicle-associated fractions.
- Support dataset for figures, tables, and interpretations presented in the associated thesis or manuscript.